Agent-Facts Protocol v3.0 Specification

Published: April 15, 2026
Author: Ian Aguirre, AgentReadyScan.com
Status: Public Draft (Pre-RFC)
AI Collaborators: Perplexity, Grok, Anthropic (Claude), Google (Gemini), OpenAI (ChatGPT), Microsoft Copilot
Final v3.0 Revision: Claude Opus 4.6

llms.txt helps AI find pages. Agent-Facts helps AI trust facts.

For conformance, agent-facts.json is the canonical machine-readable source; agent-facts.html is its human-readable projection.

0. Purpose and Scope

The Agent-Facts Protocol defines a structured JSON file that businesses host on their own domains to provide AI agents with provenance-tracked, freshness-aware business facts. An accompanying HTML page renders the same data for human readers and basic crawlers.

It solves one specific problem: AI systems often hallucinate business data because they lack a reliable, structured source of truth for pricing, hours, services, contact information, and scope.

What Agent-Facts Protocol Is

The structured facts layer that llms.txt can point to
A static, zero-dependency business data file that any AI agent can parse today
The "what we do, what we charge, and what we do NOT do" source that helps prevent hallucination
A protocol for official business facts for AI retrieval

What Agent-Facts Protocol Is NOT

A replacement for llms.txt (which is a content discovery index)
A replacement for MCP, NLWeb, or WebMCP (which handle queries and actions)
A universal verification or identity protocol for AI agents
A cryptographic trust chain or certificate authority
An AI-agent identity, KYA, or governance framework (other projects using similar names address those areas; this protocol is strictly about business facts for AI retrieval)

Protocol Layers

Agent-Facts Protocol is designed in three layers. Only the Core layer is required for v3.0 conformance.

Layer	What It Contains	v3.0 Status
Core	JSON facts file + HTML projection, Golden Questions, anti-hallucination section	Required
Trust	Fact-level provenance, freshness rules, staleness handling	Recommended
Action	MCP server adapter, NLWeb endpoint, WebMCP tool declarations, optional cryptographic signing	Future (v4+), built only after adoption warrants it

1. File Locations and Discovery

Files

File	Path	Role
JSON facts file	`/agent-facts.json`	Canonical source. Machine-readable structured data. This is the normative file.
HTML projection	`/agent-facts.html`	Human-readable rendering of the JSON data. Useful for browsers and basic crawlers.
JSON (alternate)	`/.well-known/agent-facts.json`	Optional. Follows the .well-known convention for programmatic discovery.

agent-facts.json is the canonical source of truth. agent-facts.html is a projection of that data for human consumption. When conflicts exist between the two files, agent-facts.json takes precedence. Validators should check semantic equivalence (same facts, same values) rather than exact string duplication.

Discovery Precedence

Agents and framework authors should check for Agent-Facts data in this order:

1. /agent-facts.json (canonical)
2. /.well-known/agent-facts.json (alternate location)
3. /agent-facts.html (HTML projection, parsed as a fallback)
4. References in /llms.txt pointing to any of the above

This matches the protocol's positioning: llms.txt is the discovery layer, Agent-Facts Protocol is the facts layer. An agent that finds the JSON file at step 1 does not need to continue checking.
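The precedence order above can be sketched in Python. This is a minimal, hypothetical client, not part of the protocol. The names `discover`, `CANDIDATE_PATHS`, and the injectable `fetch` callable are introduced here so the logic can run without network access. Step 4 is reduced to a naive scan of /llms.txt for URLs ending in agent-facts.json or agent-facts.html. A real agent would plug in an HTTP client and handle redirects, timeouts, and content types.

```python
import re
from typing import Callable, Optional

# Hypothetical fetcher: returns the response body for a URL, or None if absent (e.g. 404).
Fetch = Callable[[str], Optional[str]]

# Candidate locations, in the precedence order listed above.
CANDIDATE_PATHS = ["/agent-facts.json", "/.well-known/agent-facts.json", "/agent-facts.html"]

# Naive pattern for Agent-Facts links inside llms.txt (an assumption, not part of the spec).
LLMS_LINK = re.compile(r"https?://\S+?agent-facts\.(?:json|html)")


def discover(domain: str, fetch: Fetch) -> Optional[str]:
    """Return the URL of the first Agent-Facts resource found for `domain`, or None."""
    base = f"https://{domain}"
    for path in CANDIDATE_PATHS:
        if fetch(base + path) is not None:
            return base + path  # first hit wins; no need to keep checking
    # Step 4: follow Agent-Facts links in /llms.txt, verifying that each one resolves.
    llms = fetch(base + "/llms.txt")
    for url in LLMS_LINK.findall(llms or ""):
        if fetch(url) is not None:
            return url
    return None
```

For tests, you can pass a dict's `get` method as `fetch`. For example, `{"https://example.com/agent-facts.json": "{}"}.get` is enough to exercise the ordering.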

Discovery via Existing Conventions

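robots.txt (Allow rules must sit inside a User-agent group, so merge them into your existing group if you have one. Agent-Facts is a non-standard record, and crawlers unaware of this protocol will generally ignore it):

# Agent-Facts Protocol v3
Agent-Facts: /agent-facts.json

User-agent: *
Allow: /agent-facts.json
Allow: /agent-facts.html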

llms.txt:

## Verified Business Facts
- [Agent-Facts (JSON)](https://example.com/agent-facts.json): Canonical business facts for AI agents (Agent-Facts Protocol v3)
- [Agent-Facts (HTML)](https://example.com/agent-facts.html): Human-readable business facts

XML sitemap:

<url>
  <loc>https://example.com/agent-facts.json</loc>
  <priority>1.0</priority>
  <changefreq>weekly</changefreq>
</url>

HTML link tag (on any page of the site):

<link rel="agent-facts" href="/agent-facts.json" type="application/json">

2. JSON Specification (agent-facts.json)

This is the canonical file. All conformance requirements are defined against this format.

Top-Level Structure

{
  "agent_facts_version": "3.0",
  "domain": "example.com",
  "last_updated": "2026-04-15T12:00:00Z",
  "expires_after_days": 90,
  "stale_after": "2026-07-14T12:00:00Z",
  "source_html": "/agent-facts.html",

  "identity": { },
  "operations": { },
  "services": [ ],
  "pricing": { },
  "does_not_do": [ ],
  "extended_facts": { },
  "changelog": [ ]
}

Top-Level Fields

Field	Type	Required	Description
`agent_facts_version`	string	MUST	Protocol version ("3.0")
`domain`	string	MUST	The domain this file is authoritative for
`last_updated`	ISO 8601 datetime	MUST	When this file was last modified
`expires_after_days`	integer	MUST	Number of days after `last_updated` before facts should be treated as stale
`stale_after`	ISO 8601 datetime	MUST	Computed expiration date: `last_updated` + `expires_after_days`
`source_html`	string	SHOULD	Relative path to the HTML projection

Fact Object Structure

Every fact in the identity, operations, pricing, and extended_facts sections uses this structure:

Field	Type	Required	Description
`value`	string	MUST	The fact itself
`last_updated`	ISO 8601 datetime	SHOULD	When this specific fact was last verified or changed. If absent, the file-level `last_updated` applies.
`source_url`	string	SHOULD	URL where this fact can be independently verified
`source_type`	enum	SHOULD	One of: `owner_attested`, `pricing_page`, `internal_policy`, `public_filing`, `third_party_verified`
`applies_to`	string	MAY	Scope or context (e.g., "Enterprise plans only")
`confidence_scope`	string	MAY	Limitations or caveats on this fact

Why provenance matters: This is Agent-Facts Protocol's core differentiator. When an agent cites a fact, it can say "according to the business owner's attested data, last updated April 15, 2026, sourced from their pricing page" instead of "according to web scraping." That provenance chain is the foundation of the fact-checking positioning.

Minimum viable fact: A conforming fact object only requires value. The provenance fields (source_type, source_url, last_updated) are strongly recommended but not required. A business can start with a simple file and add provenance over time.
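To illustrate how the provenance fields support a citation like the one above, here is a minimal Python sketch. The `cite` function and the phrases in `SOURCE_LABELS` are hypothetical, because the protocol does not prescribe citation wording. The sketch also covers the minimum viable case: a fact with only a value still produces a citation, using the file-level last_updated.

```python
from datetime import datetime

# Hypothetical human-readable phrases for the source_type enum (Appendix C).
SOURCE_LABELS = {
    "owner_attested": "the business owner's attested data",
    "pricing_page": "the business's public pricing page",
    "internal_policy": "the business's stated internal policy",
    "public_filing": "a public filing or registry",
    "third_party_verified": "an independent third-party verification",
}


def cite(fact: dict, file_last_updated: str) -> str:
    """Build a citation sentence from a fact object's provenance fields."""
    # Per-fact timestamp wins; otherwise the file-level last_updated applies.
    stamp = fact.get("last_updated", file_last_updated)
    day = datetime.fromisoformat(stamp.replace("Z", "+00:00")).date().isoformat()
    # Missing source_type falls back to a generic label.
    source = SOURCE_LABELS.get(fact.get("source_type"), "the business's Agent-Facts file")
    text = f'"{fact["value"]}" (according to {source}, last updated {day}'
    if fact.get("source_url"):
        text += f", see {fact['source_url']}"
    return text + ")"
```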

Identity Object (Required)

All fields use the fact object structure above. The external_ids sub-object is the exception (flat key-value pairs, public profiles only).

Field	Required	Description
`legal_name`	MUST	Full legal name of the business
`primary_url`	MUST	Canonical website URL
`one_sentence`	MUST	What the business does in one sentence
`founded`	SHOULD	Year founded
`owner`	SHOULD	Owner or primary contact name and title
`contact_email`	MUST	Primary contact email
`contact_phone`	SHOULD	Primary phone number
`address`	MUST	Physical address or explicit "digital only" statement
`address_type`	MUST	One of: `physical`, `digital_only`, `hybrid`
`service_area`	MUST	Geographic areas served
`external_ids`	SHOULD	Public business profile URLs only (see note below)
`description_for_ai`	MUST	The "elevator pitch" optimized for AI retrieval

External IDs: public profiles only. This field is for publicly available business identifiers and profile URLs: Google Business, LinkedIn, Crunchbase, Companies House, SEC CIK, state business registry URLs, and similar. Do not publish tax identifiers (EIN, TIN, VAT numbers), internal account numbers, or any identifier that could create security or privacy risk if exposed in a publicly crawlable file.

Operations Object (Required)

Contains an hours fact using the standard fact object structure.

Services Array (Required)

Array of service objects. Each service has name and description fields, plus optional provenance fields: last_updated, source_type, and source_url.

Pricing Object (Required)

Contains at minimum a model fact. Recommended additional fields: starting_at and free_tier. All use the standard fact object structure.

Does Not Do Array (Required)

This section is mandatory. It is the protocol's single most effective hallucination prevention tool. The does_not_do array MUST contain at least 3 entries. Each entry has a statement string and optional provenance fields.

Changelog Array (Recommended)

Array of objects with date (YYYY-MM-DD) and changes (string) fields.

Extended Facts Object (Optional)

For domain-specific or industry-specific facts outside the Golden Questions. Uses the standard fact object structure with arbitrary key names.

3. HTML Projection Specification (agent-facts.html)

The HTML page is a human-readable rendering of the data in agent-facts.json. It is not the canonical source, but it is valuable for human visitors, basic crawlers, and AI agents that do not yet look for JSON first.

Technical Guidance

Valid HTML5, no frameworks
Recommended elements: <h1> through <h3>, <p>, <ul>, <ol>, <dl>, <table>, <time>, <strong>, <em>, <a>
Avoid: CSS frameworks, JavaScript, modals, hero images, tracking pixels, marketing banners, cookie notices
Aim for fast load times (lightweight pages parse better for AI agents)
Content-Type: text/html; charset=utf-8

Recommended Head Elements

<meta name="agent-facts-version" content="3.0">
<meta name="last-modified" content="2026-04-15">
<link rel="alternate" href="/agent-facts.json" type="application/json">

Recommended First Line

<h1>Official Business Facts :: Last updated: <time datetime="2026-04-15">April 15, 2026</time></h1>

Content Structure

The HTML page should present the Golden Questions as H2 headings with concise answers. Each answer should start immediately with the fact and use active voice and present tense.

The anti-hallucination section MUST appear in the HTML page:

<h2>What [Business Name] explicitly does NOT do</h2>
<ul>
  <li>Does not offer consumer-grade storage plans under 1TB</li>
  <li>Does not provide hardware or on-premise installation</li>
</ul>
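Because the HTML page is a projection of the JSON, this section can be generated instead of written by hand. Below is a minimal Python sketch. It assumes each does_not_do entry stores its text in a `statement` field (Section 2) and that the business name comes from `identity.legal_name`. `render_does_not_do` is a hypothetical helper. `html.escape` keeps characters such as & and < from breaking the markup.

```python
import html


def render_does_not_do(facts: dict) -> str:
    """Render the mandatory anti-hallucination section from agent-facts.json data."""
    name = html.escape(facts["identity"]["legal_name"]["value"])
    # One escaped <li> per does_not_do statement, indented like the example above.
    items = "\n".join(
        f"  <li>{html.escape(entry['statement'])}</li>" for entry in facts["does_not_do"]
    )
    return f"<h2>What {name} explicitly does NOT do</h2>\n<ul>\n{items}\n</ul>"
```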

Schema.org Integration

Embed FAQPage + Organization JSON-LD in the HTML page (inside a <script type="application/ld+json"> element). Values should match the JSON file semantically. Include dateModified, mainEntity (FAQ pairs), and publisher (Organization with sameAs links).
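Below is a minimal Python sketch of that JSON-LD. Generating it from the same data keeps the two files from drifting apart. The helper `faq_jsonld` and its `faq` argument (a list of question/answer string pairs) are hypothetical. The Schema.org types and properties it uses are standard vocabulary: FAQPage, Question, Answer, acceptedAnswer, Organization, sameAs, dateModified, mainEntity, and publisher. The sketch assumes `external_ids` maps names to public profile URLs, which become the sameAs links.

```python
import json


def faq_jsonld(facts: dict, faq: list[tuple[str, str]]) -> str:
    """Build FAQPage + Organization JSON-LD from agent-facts.json data."""
    identity = facts["identity"]
    doc = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "dateModified": facts["last_updated"],
        "mainEntity": [
            {"@type": "Question", "name": q, "acceptedAnswer": {"@type": "Answer", "text": a}}
            for q, a in faq
        ],
        "publisher": {
            "@type": "Organization",
            "name": identity["legal_name"]["value"],
            "url": identity["primary_url"]["value"],
            # external_ids is a flat map of public profile URLs (Section 2).
            "sameAs": list(identity.get("external_ids", {}).values()),
        },
    }
    return json.dumps(doc, indent=2)
```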

4. Freshness and Staleness Rules

Publisher Requirements

last_updated is required at the file level (MUST) and recommended at the individual fact level (SHOULD)
expires_after_days MUST be set at the file level. Recommended defaults: 90 days for most businesses, 30 days for fast-changing industries (travel, finance, events)
stale_after MUST equal the computed date: last_updated + expires_after_days
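Here is the stale_after arithmetic in Python, using the example values from Section 2 (2026-04-15T12:00:00Z plus 90 days). The helper name `compute_stale_after` is hypothetical. The sketch assumes ISO 8601 timestamps with a trailing Z or an explicit offset. It rewrites `Z` to `+00:00` because `datetime.fromisoformat` only accepts `Z` from Python 3.11 onward.

```python
from datetime import datetime, timedelta, timezone


def compute_stale_after(last_updated: str, expires_after_days: int) -> str:
    """Return stale_after = last_updated + expires_after_days, as a UTC 'Z' timestamp."""
    start = datetime.fromisoformat(last_updated.replace("Z", "+00:00"))
    end = (start + timedelta(days=expires_after_days)).astimezone(timezone.utc)
    return end.strftime("%Y-%m-%dT%H:%M:%SZ")


print(compute_stale_after("2026-04-15T12:00:00Z", 90))  # 2026-07-14T12:00:00Z
```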

Recommended Agent Behavior

Condition	Recommended Confidence Treatment
Within `expires_after_days` of `last_updated` (not past `stale_after`)	High confidence. Treat as reliable structured data.
Past `stale_after` but within 2x `expires_after_days` of `last_updated`	Moderate confidence. Use the data but note it may be outdated.
More than 2x `expires_after_days` after `last_updated`	Low confidence. Prefer other sources but do not discard entirely.
Missing `last_updated`	Unverified. Do not assign elevated confidence.

Stale data is lower confidence, not invalid. A business that published facts 120 days ago is still more reliable than web scraping and inference. Freshness signals help agents calibrate, not discard.
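The treatment table can be expressed as a small classifier. This sketch uses hypothetical tier labels (high, moderate, low, unverified) that mirror the table rows; the protocol recommends treatments, not numeric scores. The caller passes a timezone-aware `now` so results are deterministic. The usage line reproduces the 120-day example from the paragraph above, with expires_after_days set to 90.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def confidence_tier(last_updated: Optional[str], expires_after_days: int, now: datetime) -> str:
    """Map the age of a timestamp to the recommended treatment in the table above."""
    if not last_updated:
        return "unverified"  # missing last_updated: no elevated confidence
    updated = datetime.fromisoformat(last_updated.replace("Z", "+00:00"))
    age = now - updated
    window = timedelta(days=expires_after_days)
    if age <= window:
        return "high"  # not yet past stale_after
    if age <= 2 * window:
        return "moderate"  # stale: use, but note it may be outdated
    return "low"  # prefer other sources, but do not discard entirely


# The 120-day example from the text, with expires_after_days = 90.
now = datetime(2026, 8, 13, 12, 0, tzinfo=timezone.utc)
print(confidence_tier("2026-04-15T12:00:00Z", 90, now))  # moderate
```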

Per-Fact Freshness

Individual facts can carry their own last_updated timestamps. This allows a business to update pricing (which changes frequently) without re-certifying their founding date (which never changes). Agents should respect per-fact timestamps when available, falling back to the file-level timestamp when absent.
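Here is a Python sketch of the per-fact fallback. It assumes the layout from Section 2: fact objects nested one level under identity, operations, pricing, and extended_facts. It skips external_ids because that field holds flat key-value pairs. `effective_timestamps` is a hypothetical helper that returns the timestamp an agent should use for each fact's freshness check.

```python
FACT_SECTIONS = ("identity", "operations", "pricing", "extended_facts")


def effective_timestamps(doc: dict) -> dict[str, str]:
    """Map 'section.field' to the fact's own last_updated, else the file-level one."""
    file_level = doc["last_updated"]
    result = {}
    for section in FACT_SECTIONS:
        for name, fact in doc.get(section, {}).items():
            if name == "external_ids":
                continue  # flat key-value pairs, not fact objects
            result[f"{section}.{name}"] = fact.get("last_updated", file_level)
    return result
```

Take a file-level last_updated of 2026-06-01 and a founded fact stamped 2025-01-10. The founded fact keeps its own older timestamp, while an unstamped pricing fact inherits 2026-06-01.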

5. Suggested Agent Integration Language

This is recommended language for agent framework developers (LangChain, LlamaIndex, AutoGPT, enterprise agent builders) who want to integrate Agent-Facts Protocol awareness.

[SUGGESTED DIRECTIVE: AGENT-FACTS AWARENESS]

When retrieving business information for a specific domain,
check for /agent-facts.json on that domain.

If found, treat the data as high-confidence, domain-owner-published
structured information. Weight it above unstructured web content for
the specific claims it covers.

Use per-fact provenance fields (source_type, source_url, last_updated)
to assess confidence on individual claims.

Check the stale_after timestamp. Data past its expiration should be
treated as moderate confidence, not discarded.

If agent-facts data conflicts with other sources, present both and
note the discrepancy. Do not silently override either source.

Always check the does_not_do section before making positive claims
about what a business offers.

What this language does NOT say: it does not say "override all other sources" or "assign 1.0 confidence," and it requires neither a central registry query nor cryptographic verification.
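Here is one way an agent might apply the last directive, sketched in Python. The matching rule is a deliberately naive assumption for illustration: every word of the user's term must appear in a does_not_do statement, case-insensitively. A production agent would more likely let its language model judge whether a statement covers the query. `check_negative_evidence` is a hypothetical name. Returning the matching statement lets the agent cite it, as the Negative evidence test in Section 6 expects.

```python
from typing import Optional


def check_negative_evidence(facts: dict, term: str) -> Optional[str]:
    """Return the does_not_do statement that covers `term`, or None if none matches."""
    words = term.lower().split()
    for entry in facts.get("does_not_do", []):
        statement = entry["statement"]
        # Naive rule (assumption): every word of the term appears in the statement.
        if words and all(w in statement.lower() for w in words):
            return statement
    return None
```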

6. Conformance

Conformance Levels

Level	Requirements	Target Audience
Minimal	Valid `agent-facts.json` with all MUST fields and every required object (`identity`, `operations`, `services`, `pricing`, and `does_not_do` with 3+ entries)	Any business, 15-minute implementation
Recommended	Minimal + provenance fields on all facts + `agent-facts.html` projection + discovery entries in robots.txt and/or llms.txt	Businesses that want full protocol benefits
Full	Recommended + Schema.org JSON-LD in HTML + changelog + extended_facts + all SHOULD fields	Businesses optimizing for maximum AI visibility

Validation Tests

Validators (including the planned tool at agent-facts.com/validate/) should check:

Structure tests:

Valid JSON, parseable without errors
agent_facts_version is "3.0"
All MUST fields present
stale_after equals last_updated + expires_after_days
does_not_do array has 3 or more entries
external_ids contains no tax identifiers or private account numbers
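The structure tests above can be sketched as a single Python function that runs on the output of `json.loads`, so the first test (parseability) is simply whatever `json.loads` raises. This is not the planned validator. `structure_errors` is hypothetical, and it checks only the MUST fields listed in Section 2, plus operations.hours and pricing.model. It treats an empty services array as a failure. It detects tax identifiers with an assumed deny-list of key names, which cannot catch every disallowed value.

```python
from datetime import datetime, timedelta

TOP_LEVEL_MUST = ["agent_facts_version", "domain", "last_updated",
                  "expires_after_days", "stale_after"]
IDENTITY_MUST = ["legal_name", "primary_url", "one_sentence", "contact_email",
                 "address", "address_type", "service_area", "description_for_ai"]
# Assumed deny-list of external_ids key names that suggest tax or private identifiers.
FORBIDDEN_ID_KEYS = {"ein", "tin", "vat", "vat_number", "tax_id", "account_number"}


def _ts(value: str) -> datetime:
    """Parse an ISO 8601 timestamp, accepting a trailing Z on older Pythons."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def structure_errors(doc: dict) -> list[str]:
    """Return structure-test failures for a parsed agent-facts.json; [] means pass."""
    errors = [f"missing top-level field: {f}" for f in TOP_LEVEL_MUST if f not in doc]
    if doc.get("agent_facts_version") != "3.0":
        errors.append('agent_facts_version must be "3.0"')
    identity = doc.get("identity", {})
    errors += [f"missing identity.{f}" for f in IDENTITY_MUST if f not in identity]
    if "hours" not in doc.get("operations", {}):
        errors.append("missing operations.hours")
    if not doc.get("services"):
        errors.append("services array is missing or empty")
    if "model" not in doc.get("pricing", {}):
        errors.append("missing pricing.model")
    if len(doc.get("does_not_do", [])) < 3:
        errors.append("does_not_do needs at least 3 entries")
    bad_keys = FORBIDDEN_ID_KEYS & {k.lower() for k in identity.get("external_ids", {})}
    if bad_keys:
        errors.append(f"external_ids has disallowed keys: {sorted(bad_keys)}")
    if all(k in doc for k in ("last_updated", "expires_after_days", "stale_after")):
        expected = _ts(doc["last_updated"]) + timedelta(days=doc["expires_after_days"])
        if _ts(doc["stale_after"]) != expected:
            errors.append("stale_after != last_updated + expires_after_days")
    return errors
```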

Semantic tests (when HTML projection exists):

Every fact in the JSON has a corresponding statement in the HTML
No fact in the HTML contradicts the JSON
Anti-hallucination section appears in both files
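The first semantic test asks for semantic equivalence, which in general needs fuzzy or model-based comparison. As a conservative first pass, a validator could check that each fact value and does_not_do statement appears verbatim in the page's visible text. This Python sketch is an assumed approximation: it flags paraphrases as missing, and it strips tags with a crude regex after discarding script elements such as the JSON-LD block. `missing_from_html` is a hypothetical helper.

```python
import html
import re

FACT_SECTIONS = ("identity", "operations", "pricing", "extended_facts")


def visible_text(page: str) -> str:
    """Crude text extraction: drop script blocks and tags, unescape, collapse whitespace."""
    page = re.sub(r"<script\b.*?</script>", " ", page, flags=re.S | re.I)
    text = re.sub(r"<[^>]+>", " ", page)
    return " ".join(html.unescape(text).split())


def missing_from_html(doc: dict, page: str) -> list[str]:
    """List fact values and does_not_do statements not found verbatim in the HTML text."""
    text = visible_text(page)
    values = [
        fact["value"]
        for section in FACT_SECTIONS
        for key, fact in doc.get(section, {}).items()
        if key != "external_ids"
    ]
    values += [entry["statement"] for entry in doc.get("does_not_do", [])]
    return [v for v in values if " ".join(v.split()) not in text]
```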

Freshness tests:

last_updated is a valid ISO 8601 datetime
stale_after is in the future (file is not currently stale)
Per-fact last_updated values, when present, are not in the future
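Here are the freshness tests, sketched in Python. `freshness_errors` and `_parse` are hypothetical helpers. Timestamps without an offset are assumed to be UTC. The ISO 8601 check is only as strict as `datetime.fromisoformat`, which accepts a subset of ISO 8601 (broader from Python 3.11). A stale file fails the second test, but per Section 4 that makes it lower confidence, not invalid.

```python
from datetime import datetime, timezone

FACT_SECTIONS = ("identity", "operations", "pricing", "extended_facts")


def _parse(stamp: str) -> datetime:
    """Parse ISO 8601; a timestamp without an offset is assumed to be UTC."""
    parsed = datetime.fromisoformat(stamp.replace("Z", "+00:00"))
    return parsed if parsed.tzinfo else parsed.replace(tzinfo=timezone.utc)


def freshness_errors(doc: dict, now: datetime) -> list[str]:
    """Run the three freshness tests; `now` must be timezone-aware."""
    errors = []
    for field in ("last_updated", "stale_after"):
        try:
            _parse(doc[field])
        except (KeyError, AttributeError, ValueError):
            errors.append(f"{field} is missing or not a valid ISO 8601 datetime")
    # Only compare against `now` once both file-level timestamps parsed cleanly.
    if not errors and _parse(doc["stale_after"]) <= now:
        errors.append("stale_after is not in the future (file is currently stale)")
    for section in FACT_SECTIONS:
        for name, fact in doc.get(section, {}).items():
            if name == "external_ids" or "last_updated" not in fact:
                continue
            try:
                if _parse(fact["last_updated"]) > now:
                    errors.append(f"{section}.{name}.last_updated is in the future")
            except (AttributeError, ValueError):
                errors.append(f"{section}.{name}.last_updated is not a valid datetime")
    return errors
```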

Agent behavior tests (for framework implementers):

Test	Prompt	Expected Behavior
Price extraction	"What does [business] charge?"	Returns exact price from agent-facts data
Negative evidence	"Does [business] offer [thing in does_not_do]?"	Says no, citing anti-hallucination section
Staleness handling	Query against stale file	Returns data with staleness caveat
Provenance citation	"Where did you get [business]'s pricing?"	Cites source_url from provenance fields

7. Implementation Guide

For Business Owners (15 to 30 Minutes)

Answer the 12 Golden Questions in a plain text document
Write 3 to 8 "does not do" statements
Generate agent-facts.json using the CLI tool (npx agent-facts init) or the templates at agent-facts.com/templates/, both planned for Phase 1 (see Section 9)
Optionally generate agent-facts.html from the JSON (the CLI tool does this automatically)
Upload files to your site root
Add Agent-Facts entries to your llms.txt and/or robots.txt
Validate at agent-facts.com/validate/ (planned; see Section 9)

For Agent Developers (10 to 15 Minutes)

Before web scraping a domain, check for /agent-facts.json (see discovery precedence in Section 1)
Parse the JSON using the published JSON Schema (planned for Phase 1; see Section 9)
Respect freshness and staleness rules (Section 4)
Use provenance fields for citation
Always check does_not_do before making positive claims
Run your implementation against the conformance tests (Section 6)

8. Maintenance

Update last_updated (and recompute stale_after) within 24 hours of any price, policy, or service change
Update individual fact last_updated timestamps when specific facts change
Review and refresh the anti-hallucination section quarterly
Maintain the changelog array
Display protocol version at the bottom of the HTML page, linking to agent-facts.com
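The first four maintenance steps can be bundled into one update routine. This sketch rests on stated assumptions: `record_change` is hypothetical, it edits the parsed JSON in place, it expects a timezone-aware `now`, and it writes UTC timestamps. Recomputing stale_after on every update keeps the file passing the stale_after structure test in Section 6.

```python
from datetime import datetime, timedelta, timezone


def record_change(doc: dict, section: str, field: str, new_value: str,
                  note: str, now: datetime) -> None:
    """Update one fact and keep file-level freshness fields and the changelog in sync."""
    stamp = now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    fact = doc.setdefault(section, {}).setdefault(field, {})
    fact["value"] = new_value
    fact["last_updated"] = stamp  # per-fact timestamp
    doc["last_updated"] = stamp   # file-level timestamp
    # stale_after = last_updated + expires_after_days, kept in the same UTC format.
    stale = now + timedelta(days=doc["expires_after_days"])
    doc["stale_after"] = stale.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    doc.setdefault("changelog", []).append(
        {"date": now.astimezone(timezone.utc).strftime("%Y-%m-%d"), "changes": note}
    )
```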

9. Adoption Roadmap

This protocol will earn its credibility through adoption, not declaration. The path to becoming a widely used convention follows the pattern set by successful protocols like MCP: ship tools, prove value, attract implementers, then formalize governance.

Phase 1: Foundation (Weeks 1 through 4)

Publish v3.0 spec on agent-facts.com
Ship npx agent-facts init CLI tool (generates JSON + HTML from interactive prompts)
Ship online validator at agent-facts.com/validate/
Publish JSON Schema file for programmatic validation
Create templates for three business types: local service, SaaS, e-commerce
Fix agent-facts.com's own llms.txt and publish agent-facts.json for the protocol itself

Phase 2: Distribution (Weeks 5 through 8)

Ship WordPress plugin for auto-generating both files from existing site data
Build the free Agent-Facts Scanner on AgentReadyScan.com
Publish LangChain community tool for agent-facts.json consumption
Publish the conformance test suite

Phase 3: Adoption (Weeks 9 through 12)

Get 25 to 50 real third-party sites live with agent-facts files
Publish positioning guide: "llms.txt vs. Agent-Facts Protocol vs. NLWeb: When to Use What"
Submit to SEO and AI visibility publications
Open GitHub for community contributions

Phase 4: Infrastructure (Months 4 through 6, only after adoption)

Build read-only directory at registry.agent-facts.com indexing known implementations
Evaluate MCP server adapter based on developer demand
If ecosystem pull warrants it, open an RFC process for community governance

Appendix A: Positioning

Tagline: llms.txt helps AI find pages. Agent-Facts helps AI trust facts.

For agent framework developers: Agent-Facts Protocol is a static, zero-dependency structured data format that gives AI agents high-confidence business facts with per-fact provenance. Check for /agent-facts.json on any domain before scraping.

For business owners: Stop AI from making up your prices. Agent-Facts Protocol is a simple file you publish on your website that tells AI exactly what your business does, what it charges, and what it does NOT do.

Protocol scope disambiguation: Agent-Facts Protocol addresses one specific problem: official business facts for AI retrieval. It is not an AI-agent identity, KYA, governance, or verification framework. Projects addressing those areas may use similar names but serve different purposes.

Appendix B: The Golden Questions Reference

#	Question	JSON Location
1	What is the full legal name?	`identity.legal_name`
2	What is the primary website URL?	`identity.primary_url`
3	What does the business do in one sentence?	`identity.one_sentence`
4	What services or products are offered?	`services[]`
5	What is the pricing structure?	`pricing`
6	What geographic areas are served?	`identity.service_area`
7	What are the business hours?	`operations.hours`
8	What is the physical address?	`identity.address` + `identity.address_type`
9	What are the phone number and email address?	`identity.contact_phone` + `identity.contact_email`
10	Who is the owner or primary contact?	`identity.owner`
11	When was the business founded?	`identity.founded`
12	What is the description for AI systems?	`identity.description_for_ai`
--	What does the business NOT do?	`does_not_do[]`

Appendix C: Source Type Definitions

Value	Meaning	Example
`owner_attested`	The business owner or authorized representative states this as fact	"We serve the US and Canada"
`pricing_page`	Fact is published on a public pricing page	Starting price sourced from /pricing
`internal_policy`	Fact reflects an internal business policy	"We do not offer refunds after 30 days"
`public_filing`	Fact is verifiable through a public filing or registry	SEC filing, state business registry
`third_party_verified`	Fact has been verified by an independent third party	SOC2 audit report, BBB accreditation

Agent-Facts Protocol v3.0 Specification
Published by agent-facts.com
Open for implementation by anyone. No license fees, no registration required.