Crawler Logs
Track AI crawler visits (ChatGPT, Perplexity, Claude, Google, etc.) to your website in real-time.
Why Track AI Crawlers?
AI-powered search and assistants are changing how people discover content. Asky helps you understand:
- Which AI bots are crawling your site - GPTBot, ClaudeBot, PerplexityBot, etc.
- Live retrieval traffic - When ChatGPT or Perplexity cite your content in real-time
- Which pages AI systems access - Understand what content AI finds valuable
- Citation patterns - See when your site is used as a source in AI answers
Different AI tools behave differently. ChatGPT fetches pages in real-time when answering queries, while Perplexity often uses pre-indexed content from background crawls.
Setup
Choose your platform to get started:
Cloudflare
Cloudflare Worker
Use a Cloudflare Worker to intercept requests and send visit data to Asky.
This requires a Cloudflare account with Workers enabled. The free tier includes 100,000 requests/day which is sufficient for most sites.
Get Your Domain ID
- Log in to your Asky dashboard
- Go to AI Search → Crawler Logs in the sidebar
- Click Setup Instructions
- Copy your Domain ID (a UUID like `1d44d1d2-7396-4721-949d-8817c8e50265`)
Create a Cloudflare Worker
- Log in to your Cloudflare Dashboard
- Select your website
- Go to Workers Routes in the sidebar (under Workers & Pages)
- Click Create Route or Manage Workers
- Create a new Worker
Add the Worker Code
Paste the following code into your Worker, replacing YOUR_DOMAIN_ID with your actual Domain ID:
```js
export default {
  async fetch(req, env, ctx) {
    const url = new URL(req.url);

    // Skip static assets - they don't need tracking
    if (url.pathname.match(/\.(png|jpg|jpeg|gif|svg|css|js|ico|woff2?|ttf|map)$/i)) {
      return fetch(req);
    }

    // Build the log payload
    const log = {
      ua: req.headers.get("user-agent") || "Unknown",
      referrer: req.headers.get("referer") || null,
      host: url.hostname, // Required for security validation
      path: url.pathname,
      query: url.search || null,
      ip: req.headers.get("cf-connecting-ip") || null,
      ts: Date.now()
    };

    // Send to Asky asynchronously (non-blocking)
    ctx.waitUntil(
      fetch("https://zmmfehojjrqgcfapzgqy.supabase.co/functions/v1/collect", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({
          domain_id: "YOUR_DOMAIN_ID", // Replace with your Domain ID
          ...log
        })
      }).catch((err) => console.error("Error sending log:", err))
    );

    // Continue serving the original request
    return fetch(req);
  }
};
```

The `host` field is required for security validation. Asky verifies that the hostname matches the domain registered with your Domain ID, preventing data pollution from unauthorized sources.
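To preview what the Worker will send for a given request, you can exercise the payload-building logic locally in Node 18+ (a sketch: `buildLog` is a helper name introduced here, and the mock request omits Cloudflare-specific headers like `cf-connecting-ip`):

```javascript
// Build the same log payload the Worker sends, from a mock request.
// Requires Node 18+ for the global Request and URL classes.
function buildLog(req) {
  const url = new URL(req.url);
  return {
    ua: req.headers.get("user-agent") || "Unknown",
    referrer: req.headers.get("referer") || null,
    host: url.hostname, // must match the domain registered with your Domain ID
    path: url.pathname,
    query: url.search || null,
    ts: Date.now()
  };
}

const mock = new Request("https://yourdomain.com/pricing?ref=x", {
  headers: { "User-Agent": "GPTBot/1.1" }
});
console.log(buildLog(mock));
```

Running this prints the payload fields (`ua`, `host`, `path`, etc.) exactly as the Worker would populate them, which is useful for checking the `host` value before deploying.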
Configure the Worker Route
- Go to Workers Routes for your domain
- Add a route pattern: `*yourdomain.com/*` (replace with your actual domain)
- Select your newly created Worker
- Click Save
Make sure to replace yourdomain.com with your actual domain. You may want to exclude certain subdomains like www. if they redirect to your main domain.
Verify Installation
Crawler Logs only shows bot traffic, not regular human visits. To test your setup:
- Go to ChatGPT and ask a question about your site (e.g., “What does [yoursite.com] offer?”)
- Check your Asky dashboard → AI Search → Crawler Logs
- You should see visits from OpenAI bots appear within seconds
You may see different OpenAI bots depending on how ChatGPT processes your query:
- ChatGPT Citations (OpenAI) - Live fetch when ChatGPT references your content
- OAI-SearchBot (OpenAI) - Search indexing
- GPTBot (OpenAI) - Training data crawler
If no visits appear, try asking about specific content on your pages. You can also use a browser extension to set your User-Agent to GPTBot/1.1 and visit your site directly.
Excluding Subdomains
If you have multiple subdomains and only want to track some of them:
```js
// Skip specific subdomains
if (url.hostname === "www.yourdomain.com") {
  return fetch(req);
}
if (url.hostname === "api.yourdomain.com") {
  return fetch(req);
}
```

Excluding Additional Paths
To skip tracking on certain paths (e.g., admin areas):
```js
// Skip admin and API routes
if (url.pathname.startsWith("/admin") || url.pathname.startsWith("/api/")) {
  return fetch(req);
}
```

Detected Bots
Asky automatically detects and categorizes the following AI crawlers:
| Bot | Displayed As | Company | Type |
|---|---|---|---|
| GPTBot | GPTBot (OpenAI) | OpenAI | Training crawler |
| ChatGPT-User | ChatGPT Citations (OpenAI) | OpenAI | Live retrieval |
| OAI-SearchBot | OAI-SearchBot (OpenAI) | OpenAI | Search indexing |
| ClaudeBot | ClaudeBot (Anthropic) | Anthropic | Training crawler |
| Claude-User | Claude Citations (Anthropic) | Anthropic | Live retrieval |
| PerplexityBot | PerplexityBot (Perplexity) | Perplexity | Crawler |
| Perplexity-User | Perplexity Citations (Perplexity) | Perplexity | Live retrieval |
| Googlebot | Googlebot (Google) | Google | Search crawler |
| Google-Extended | Google-Extended (Google) | Google | AI training |
| Bingbot | Bingbot (Microsoft) | Microsoft | Search crawler |
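To illustrate how this kind of user-agent classification works (a sketch only, not Asky's actual detection code; the display names mirror the table above):

```javascript
// Map user-agent substrings to display names, checked in order.
const BOT_PATTERNS = [
  [/ChatGPT-User/i, "ChatGPT Citations (OpenAI)"],
  [/OAI-SearchBot/i, "OAI-SearchBot (OpenAI)"],
  [/GPTBot/i, "GPTBot (OpenAI)"],
  [/ClaudeBot/i, "ClaudeBot (Anthropic)"],
  [/Claude-User/i, "Claude Citations (Anthropic)"],
  [/PerplexityBot/i, "PerplexityBot (Perplexity)"],
  [/Perplexity-User/i, "Perplexity Citations (Perplexity)"]
];

// Return the display name for a bot user-agent, or null for other traffic.
function detectBot(ua) {
  const hit = BOT_PATTERNS.find(([pattern]) => pattern.test(ua));
  return hit ? hit[1] : null;
}
```

A real matcher would cover more bots and guard against spoofed user-agents (e.g., by verifying source IP ranges), but the substring approach above is why a browser extension that sets your User-Agent to `GPTBot/1.1` shows up in the logs.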
Troubleshooting
No Data Appearing
- Check your Domain ID - Verify you’re using the correct UUID from your Asky dashboard
- Check logs - Look for errors in your platform’s logs (Cloudflare Workers, Vercel, etc.)
- Wait a moment - Data may take a few seconds to appear
403 Domain Mismatch Error
If you see 403 errors, the host field doesn’t match the domain registered with your Domain ID.
- Ensure `host` is included in your log payload
- Check your domain registration - The hostname must exactly match (e.g., `example.com` vs `www.example.com`)
- Handle www variants - Normalize the hostname: `host.replace(/^www\./, '')`
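In the Worker, that normalization can be applied before the payload is built (a sketch; `normalizeHost` is a helper name introduced here):

```javascript
// Strip a leading "www." so www.example.com and example.com
// both report the domain registered with your Domain ID.
function normalizeHost(hostname) {
  return hostname.replace(/^www\./, "");
}
```

Then use `host: normalizeHost(url.hostname)` in the log payload instead of `url.hostname` directly.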
High Request Volume
The tracking runs on every request. To reduce volume:
- Add more static file extensions to the skip list
- Skip paths that don’t need tracking (like `/api/*`)
- Consider only tracking specific paths you care about
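For example, tracking could be restricted to an allowlist of path prefixes (a sketch; `TRACKED_PREFIXES` and `shouldTrack` are illustrative names, not Asky settings):

```javascript
// Only send logs for paths you care about; everything else is
// served without tracking.
const TRACKED_PREFIXES = ["/blog/", "/docs/"];

function shouldTrack(pathname) {
  return TRACKED_PREFIXES.some((prefix) => pathname.startsWith(prefix));
}
```

In the Worker, return `fetch(req)` early when `shouldTrack(url.pathname)` is false, mirroring the static-asset skip at the top of the handler.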
Need Help?
If you’re stuck or have questions about setting up crawler tracking, reach out to us at hello@getasky.com.