About
AI systems are increasingly relied upon to answer questions about organizations — who they are, what they do, what they stand for. Most of the time, the answers come from web pages that were never designed to be machine-read: unstructured, inconsistent, and often out of date. CrawlerFile exists to fix that. We publish verified, structured profiles for organizations — authoritative data, in a format built for machines, with a clear chain of authorization back to the source.
When someone asks an AI about your organization, the answer is assembled from whatever the model could find and interpret on the web. That might be accurate. It might be outdated. It might be a competitor's description of you, or a press release from three years ago. A CrawlerFile profile gives you a direct channel: structured, authorized, machine-readable data that AI systems can find, trust, and use.
Your profile contains exactly what you submit. We validate and publish — we don't editorialize, summarize, or interpret.
A bidirectional verification token links your CrawlerFile profile to your own domain. Any crawler can confirm you authorized it.
Formally state whether you consent to AI training use, require attribution, or have conditions on data use — on the record, with a timestamp.
Update your profile when things change. Your CrawlerFile is the most accurate version of your organization's data on the web.
Provide accurate, first-party information about your organization. You decide what to include.
Add a small verification snippet to your website. This proves to crawlers that you authorized the profile.
Your profile is published at a stable URL and immediately available to AI crawlers worldwide.
Ready to take control of how AI represents your organization? Get in touch to start your profile.
CrawlerFile profiles are designed from the ground up for machine consumption. Every profile uses Schema.org vocabulary wrapped in a consistent envelope — so your system knows exactly what it's looking at before it reads a single field. And every profile is verifiably authorized by the entity it describes.
Every file has the same Layer 1 structure regardless of content. Parse the envelope first, then read the data you need.
Entity data uses Schema.org Organization vocabulary — a standard your systems already understand.
Fetch the entity_verification_url and confirm the token. No API call to CrawlerFile required.
The schema_docs field points to the authoritative field definitions for the declared version. Schema changes never happen silently.
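The envelope-first pattern described above might look like the following sketch. Only entity_verification_url and schema_docs are named in the text; the other envelope field names (schema_version, verification_token) and the token format are assumptions for illustration, and the authoritative layout is whatever the profile's schema_docs defines.

```python
import json
from typing import Callable

def parse_envelope(profile_json: str) -> dict:
    """Parse the Layer 1 envelope before touching any entity data.

    Field names other than entity_verification_url and schema_docs are
    illustrative assumptions; consult schema_docs for the real layout.
    """
    profile = json.loads(profile_json)
    return {
        "schema_version": profile.get("schema_version"),          # assumed field
        "schema_docs": profile.get("schema_docs"),                # documented field
        "verification_url": profile.get("entity_verification_url"),
        "verification_token": profile.get("verification_token"),  # assumed field
    }

def verify_authorization(envelope: dict, fetch: Callable[[str], str]) -> bool:
    """Confirm authorization by fetching the entity's own domain.

    `fetch` is any function returning the body at a URL (e.g. a thin
    wrapper around urllib); no API call to CrawlerFile is required.
    """
    url = envelope["verification_url"]
    token = envelope["verification_token"]
    if not url or not token:
        return False
    # Authorized if the token published on the entity's domain matches.
    return token in fetch(url)
```

Injecting `fetch` keeps the check testable offline and lets a crawler reuse its own HTTP client, timeouts, and caching.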
Every CrawlerFile profile includes an aiPolicy section — a structured, timestamped declaration of the entity's stated preferences regarding AI data use. This is not a technical enforcement mechanism. It is a formal, machine-readable record: discoverable, citable, and legally meaningful in a rapidly shifting regulatory landscape.
trainingDataConsent: Whether this profile's data may be used to train AI models. Values: permitted, not_permitted, conditional.
retrievalConsent: Whether this data may be used in real-time retrieval systems, independent of training consent.
requiresAttribution: Whether the entity requires credit when its data is cited or used by AI systems.
dataFreshnessSLA: How frequently this profile is reviewed, so consumers know how current the data is likely to be.
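Putting the four fields together, an aiPolicy section might look like the sketch below. The field names and the permitted/not_permitted/conditional values come from the definitions above; the surrounding JSON shape, the declaredAt timestamp field, and the SLA value string are assumptions for illustration.

```python
import json

# Illustrative aiPolicy section. Field names and consent values follow the
# definitions above; the JSON shape and "declaredAt" are assumptions.
AI_POLICY_EXAMPLE = """
{
  "aiPolicy": {
    "trainingDataConsent": "conditional",
    "retrievalConsent": "permitted",
    "requiresAttribution": true,
    "dataFreshnessSLA": "quarterly",
    "declaredAt": "2025-01-15T00:00:00Z"
  }
}
"""

def may_train_on(profile: dict) -> bool:
    """Conservative consumer check: anything short of an explicit
    'permitted' (including 'conditional' or a missing field) is a no."""
    policy = profile.get("aiPolicy", {})
    return policy.get("trainingDataConsent") == "permitted"
```

A consumer treating `conditional` as non-consent until the conditions are reviewed is one defensible reading; since this is a declaration rather than an enforcement mechanism, each consumer decides its own policy.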
As AI regulation matures, the ability to demonstrate that data was used with documented consent — or that an organization's objection was on the record — will matter. CrawlerFile profiles create that record, consistently, at scale.
See full aiPolicy field definitions →