About
AI systems are increasingly relied upon to answer questions about organizations — who they are, what they do, what they stand for. Most of the time, the answers come from web pages that were never designed to be machine-read: unstructured, inconsistent, and often out of date. CrawlerFile exists to fix that. We publish verified, structured profiles for organizations — authoritative data, in a format built for machines, with a clear chain of authorization back to the source.
When someone asks an AI about your organization, the answer is assembled from whatever the model could find and interpret on the web. That might be accurate. It might be outdated. It might be a competitor's description of you, or a press release from three years ago. A CrawlerFile profile gives you a direct channel: structured, authorized, machine-readable data that AI systems can find, trust, and use.
Your profile contains exactly what you submit. We validate and publish — we don't editorialize, summarize, or interpret.
A bidirectional verification token links your CrawlerFile profile to your own domain. Any crawler can confirm you authorized it.
Formally state whether you consent to AI training use, require attribution, or have conditions on data use — on the record, with a timestamp.
Update your profile when things change. Your CrawlerFile is the most accurate version of your organization's data on the web.
Provide accurate, first-party information about your organization. You decide what to include.
Add a small verification snippet to your website. This proves to crawlers that you authorized the profile.
Your profile is published at a stable URL and immediately available to AI crawlers worldwide.
Ready to take control of how AI represents your organization? Get in touch to start your profile.
CrawlerFile profiles are designed from the ground up for machine consumption. Every profile uses Schema.org vocabulary wrapped in a consistent envelope — so your system knows exactly what it's looking at before it reads a single field. And every profile is verifiably authorized by the entity it describes.
Every file has the same Layer 1 structure regardless of content. Parse the envelope first, then read the data you need.
Entity data uses Schema.org Organization vocabulary — a standard your systems already understand.
Fetch the entity_verification_url and confirm the token. No API call to CrawlerFile required.
The schema_docs field points to the authoritative field definitions for the declared version. Schema changes never happen silently.
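The envelope-first pattern described above might look like the following sketch. Only entity_verification_url and schema_docs are named in the text; the other envelope field names (schema_version, verification_token) and the token format are assumptions for illustration, and the authoritative layout is whatever the profile's schema_docs defines.

```python
import json
from typing import Callable

def parse_envelope(profile_json: str) -> dict:
    """Parse the Layer 1 envelope before touching any entity data.

    Field names other than entity_verification_url and schema_docs are
    illustrative assumptions; consult schema_docs for the real layout.
    """
    profile = json.loads(profile_json)
    return {
        "schema_version": profile.get("schema_version"),          # assumed field
        "schema_docs": profile.get("schema_docs"),                # documented field
        "verification_url": profile.get("entity_verification_url"),
        "verification_token": profile.get("verification_token"),  # assumed field
    }

def verify_authorization(envelope: dict, fetch: Callable[[str], str]) -> bool:
    """Confirm authorization by fetching the entity's own domain.

    `fetch` is any function returning the body at a URL (e.g. a thin
    wrapper around urllib); no API call to CrawlerFile is required.
    """
    url = envelope["verification_url"]
    token = envelope["verification_token"]
    if not url or not token:
        return False
    # Authorized if the token published on the entity's domain matches.
    return token in fetch(url)
```

Injecting `fetch` keeps the check testable offline and lets a crawler reuse its own HTTP client, timeouts, and caching.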
Every CrawlerFile profile includes an aiPolicy section — a structured, timestamped declaration of the entity's stated preferences regarding AI data use. This is not a technical enforcement mechanism. It is a formal, machine-readable record: discoverable, citable, and legally meaningful in a rapidly shifting regulatory landscape.
trainingDataConsent: Whether this profile's data may be used to train AI models. Values: permitted, not_permitted, conditional.
retrievalConsent: Whether this data may be used in real-time retrieval systems, independent of training consent.
requiresAttribution: Whether the entity requires credit when its data is cited or used by AI systems.
dataFreshnessSLA: How frequently this profile is reviewed, so consumers know how current the data is likely to be.
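Putting the four fields together, an aiPolicy section might look like the sketch below. The field names and the permitted/not_permitted/conditional values come from the definitions above; the surrounding JSON shape, the declaredAt timestamp field, and the SLA value string are assumptions for illustration.

```python
import json

# Illustrative aiPolicy section. Field names and consent values follow the
# definitions above; the JSON shape and "declaredAt" are assumptions.
AI_POLICY_EXAMPLE = """
{
  "aiPolicy": {
    "trainingDataConsent": "conditional",
    "retrievalConsent": "permitted",
    "requiresAttribution": true,
    "dataFreshnessSLA": "quarterly",
    "declaredAt": "2025-01-15T00:00:00Z"
  }
}
"""

def may_train_on(profile: dict) -> bool:
    """Conservative consumer check: anything short of an explicit
    'permitted' (including 'conditional' or a missing field) is a no."""
    policy = profile.get("aiPolicy", {})
    return policy.get("trainingDataConsent") == "permitted"
```

A consumer treating `conditional` as non-consent until the conditions are reviewed is one defensible reading; since this is a declaration rather than an enforcement mechanism, each consumer decides its own policy.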
As AI regulation matures, the ability to demonstrate that data was used with documented consent — or that an organization's objection was on the record — will matter. CrawlerFile profiles create that record, consistently, at scale.
See full aiPolicy field definitions →