Crawler Logs
Track AI crawler visits (ChatGPT, Perplexity, Claude, Google, etc.) to your website in real-time.
Why Track AI Crawlers?
AI-powered search and assistants are changing how people discover content. Asky helps you understand:
- Which AI bots are crawling your site - GPTBot, ClaudeBot, PerplexityBot, etc.
- Live retrieval traffic - When ChatGPT or Perplexity cite your content in real-time
- Which pages AI systems access - Understand what content AI finds valuable
- Citation patterns - See when your site is used as a source in AI answers
Different AI tools behave differently. ChatGPT fetches pages in real-time when answering queries, while Perplexity often uses pre-indexed content from background crawls.
Setup
Choose your platform to get started:
Cloudflare
Cloudflare Worker
Use a Cloudflare Worker to intercept requests and send visit data to Asky.
This requires a Cloudflare account with Workers enabled. The free tier includes 100,000 requests/day which is sufficient for most sites.
Get Your Domain ID
- Log in to your Asky dashboard
- Go to AI Search → Crawler Logs in the sidebar
- Click Setup Instructions
- Copy your Domain ID (a UUID like `1d44d1d2-7396-4721-949d-8817c8e50265`)
Create a Cloudflare Worker
- Log in to your Cloudflare Dashboard
- Select your website
- Go to Workers Routes in the sidebar (under Workers & Pages)
- Click Create Route or Manage Workers
- Create a new Worker
Add the Worker Code
Paste the following code into your Worker, replacing YOUR_DOMAIN_ID with your actual Domain ID:
```js
export default {
  async fetch(req, env, ctx) {
    const url = new URL(req.url);

    // Skip static assets - they don't need tracking
    if (url.pathname.match(/\.(png|jpg|jpeg|gif|svg|css|js|ico|woff2?|ttf|map)$/i)) {
      return fetch(req);
    }

    // Build the log payload
    const log = {
      ua: req.headers.get("user-agent") || "Unknown",
      referrer: req.headers.get("referer") || null,
      host: url.hostname, // Required for security validation
      path: url.pathname,
      query: url.search || null,
      ip: req.headers.get("cf-connecting-ip") || null,
      ts: Date.now()
    };

    // Send to Asky asynchronously (non-blocking)
    ctx.waitUntil(
      fetch("https://zmmfehojjrqgcfapzgqy.supabase.co/functions/v1/collect", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({
          domain_id: "YOUR_DOMAIN_ID", // Replace with your Domain ID
          ...log
        })
      }).catch((err) => console.error("Error sending log:", err))
    );

    // Continue serving the original request
    return fetch(req);
  }
};
```

The `host` field is required for security validation. Asky verifies that the hostname matches the domain registered with your Domain ID, preventing data pollution from unauthorized sources.
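To preview what the Worker will send for a given request, you can exercise the payload-building logic locally in Node 18+ (a sketch: `buildLog` is a helper name introduced here, and the mock request omits Cloudflare-specific headers like `cf-connecting-ip`):

```javascript
// Build the same log payload the Worker sends, from a mock request.
// Requires Node 18+ for the global Request and URL classes.
function buildLog(req) {
  const url = new URL(req.url);
  return {
    ua: req.headers.get("user-agent") || "Unknown",
    referrer: req.headers.get("referer") || null,
    host: url.hostname, // must match the domain registered with your Domain ID
    path: url.pathname,
    query: url.search || null,
    ts: Date.now()
  };
}

const mock = new Request("https://yourdomain.com/pricing?ref=x", {
  headers: { "User-Agent": "GPTBot/1.1" }
});
console.log(buildLog(mock));
```

Running this prints the payload fields (`ua`, `host`, `path`, etc.) exactly as the Worker would populate them, which is useful for checking the `host` value before deploying.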
Configure the Worker Route
- Go to Workers Routes for your domain
- Add a route pattern: `*yourdomain.com/*` (replace with your actual domain)
- Select your newly created Worker
- Click Save
Make sure to replace yourdomain.com with your actual domain. You may want to exclude certain subdomains like www. if they redirect to your main domain.
Verify Installation
Crawler Logs only shows bot traffic, not regular human visits. To test your setup:
- Go to ChatGPT and ask a question about your site (e.g., “What does [yoursite.com] offer?”)
- Check your Asky dashboard → AI Search → Crawler Logs
- You should see visits from OpenAI bots appear within seconds
You may see different OpenAI bots depending on how ChatGPT processes your query:
- ChatGPT Citations (OpenAI) - Live fetch when ChatGPT references your content
- OAI-SearchBot (OpenAI) - Search indexing
- GPTBot (OpenAI) - Training data crawler
If no visits appear, try asking about specific content on your pages. You can also use a browser extension to set your User-Agent to GPTBot/1.1 and visit your site directly.
Excluding Subdomains
If you have multiple subdomains and only want to track some of them:
```js
// Skip specific subdomains
if (url.hostname === "www.yourdomain.com") {
  return fetch(req);
}
if (url.hostname === "api.yourdomain.com") {
  return fetch(req);
}
```

Excluding Additional Paths
To skip tracking on certain paths (e.g., admin areas):
```js
// Skip admin and API routes
if (url.pathname.startsWith("/admin") || url.pathname.startsWith("/api/")) {
  return fetch(req);
}
```

Detected Bots
Asky automatically detects and categorizes the following AI crawlers:
| Bot | Displayed As | Company | Type |
|---|---|---|---|
| GPTBot | GPTBot (OpenAI) | OpenAI | Training crawler |
| ChatGPT-User | ChatGPT Citations (OpenAI) | OpenAI | Live retrieval |
| OAI-SearchBot | OAI-SearchBot (OpenAI) | OpenAI | Search indexing |
| ClaudeBot | ClaudeBot (Anthropic) | Anthropic | Training crawler |
| Claude-User | Claude Citations (Anthropic) | Anthropic | Live retrieval |
| PerplexityBot | PerplexityBot (Perplexity) | Perplexity | Crawler |
| Perplexity-User | Perplexity Citations (Perplexity) | Perplexity | Live retrieval |
| Googlebot | Googlebot (Google) | Google | Search crawler |
| Google-Extended | Google-Extended (Google) | Google | AI training |
| Bingbot | Bingbot (Microsoft) | Microsoft | Search crawler |
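To illustrate how this kind of user-agent classification works (a sketch only, not Asky's actual detection code; the display names mirror the table above):

```javascript
// Map user-agent substrings to display names, checked in order.
const BOT_PATTERNS = [
  [/ChatGPT-User/i, "ChatGPT Citations (OpenAI)"],
  [/OAI-SearchBot/i, "OAI-SearchBot (OpenAI)"],
  [/GPTBot/i, "GPTBot (OpenAI)"],
  [/ClaudeBot/i, "ClaudeBot (Anthropic)"],
  [/Claude-User/i, "Claude Citations (Anthropic)"],
  [/PerplexityBot/i, "PerplexityBot (Perplexity)"],
  [/Perplexity-User/i, "Perplexity Citations (Perplexity)"]
];

// Return the display name for a bot user-agent, or null for other traffic.
function detectBot(ua) {
  const hit = BOT_PATTERNS.find(([pattern]) => pattern.test(ua));
  return hit ? hit[1] : null;
}
```

A real matcher would cover more bots and guard against spoofed user-agents (e.g., by verifying source IP ranges), but the substring approach above is why a browser extension that sets your User-Agent to `GPTBot/1.1` shows up in the logs.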
Troubleshooting
No Data Appearing
- Check your Domain ID - Verify you’re using the correct UUID from your Asky dashboard
- Check logs - Look for errors in your platform’s logs (Cloudflare Workers, Vercel, etc.)
- Wait a moment - Data may take a few seconds to appear
403 Domain Mismatch Error
If you see 403 errors, the host field doesn’t match the domain registered with your Domain ID.
- Ensure `host` is included in your log payload
- Check your domain registration - The hostname must exactly match (e.g., `example.com` vs `www.example.com`)
- Handle www variants - Normalize the hostname: `host.replace(/^www\./, '')`
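In the Worker, that normalization can be applied before the payload is built (a sketch; `normalizeHost` is a helper name introduced here):

```javascript
// Strip a leading "www." so www.example.com and example.com
// both report the domain registered with your Domain ID.
function normalizeHost(hostname) {
  return hostname.replace(/^www\./, "");
}
```

Then use `host: normalizeHost(url.hostname)` in the log payload instead of `url.hostname` directly.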
High Request Volume
The tracking runs on every request. To reduce volume:
- Add more static file extensions to the skip list
- Skip paths that don’t need tracking (like `/api/*`)
- Consider only tracking specific paths you care about
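For example, tracking could be restricted to an allowlist of path prefixes (a sketch; `TRACKED_PREFIXES` and `shouldTrack` are illustrative names, not Asky settings):

```javascript
// Only send logs for paths you care about; everything else is
// served without tracking.
const TRACKED_PREFIXES = ["/blog/", "/docs/"];

function shouldTrack(pathname) {
  return TRACKED_PREFIXES.some((prefix) => pathname.startsWith(prefix));
}
```

In the Worker, return `fetch(req)` early when `shouldTrack(url.pathname)` is false, mirroring the static-asset skip at the top of the handler.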
Need Help?
If you’re stuck or have questions about setting up crawler tracking, reach out to us at hello@getasky.com.