How AdSense Preflight works

Honest methodology. What's real, what's approximated, what's impossible.

The three Google bots that matter

Google runs several specialized crawlers, and AdSense touches more than one of them. Telling them apart is the foundation of this tool.

Googlebot — the indexer

User-agent: Googlebot

Discovers and indexes pages for regular Google Search. We treat its crawl history as a signal: if Googlebot has never crawled your site successfully, AdSense is unlikely to consider it seriously.

AdsBot-Google — the reviewer

User-agent: AdsBot-Google / AdsBot-Google-Mobile

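The bot we treat as the one that decides whether you get approved. It checks reachability (was the site down?), verifies ads.txt, and evaluates pages against AdSense policies. Google's public crawler documentation describes AdsBot only as an ad-quality checker, so its role in AdSense review is our working assumption. As far as we can tell, it visits only during an active AdSense review.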

Mediapartners-Google — the ad matcher

User-agent: Mediapartners-Google

Reads page content to decide which ads to show. Only relevant after approval.
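To make the distinction concrete, here is a minimal sketch of sorting a request into one of the three roles by its User-Agent header. The function name and the role labels are ours, chosen for illustration. A user-agent string is trivial to spoof, so this only labels what a request claims to be.

```python
def classify_google_crawler(user_agent: str) -> str:
    """Map a User-Agent header to one of the three crawler roles above."""
    ua = user_agent.lower()
    # "adsbot-google" also matches AdsBot-Google-Mobile.
    if "adsbot-google" in ua:
        return "reviewer"
    if "mediapartners-google" in ua:
        return "ad matcher"
    if "googlebot" in ua:
        return "indexer"
    return "other"


if __name__ == "__main__":
    print(classify_google_crawler("Mediapartners-Google"))
```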

What we replicate from Google's review

✅ Lighthouse audits (exact engine)

We call googleapis.com/pagespeedonline/v5/runPagespeed (the PageSpeed Insights API) directly. The audits, scores, and recommendations come straight from Google's servers, produced by Google's own Lighthouse engine rather than a reimplementation.
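A rough sketch of that call, assuming the `url`, `strategy`, and `category` query parameters and the `lighthouseResult.categories.performance.score` field of the v5 response. The helper names are ours, and the tool's actual request may carry different options.

```python
from typing import Optional
from urllib.parse import urlencode

PSI_ENDPOINT = "https://www.googleapis.com/pagespeedonline/v5/runPagespeed"


def build_psi_url(page_url: str, strategy: str = "mobile") -> str:
    """Build the request URL for a single Lighthouse performance run."""
    params = {"url": page_url, "strategy": strategy, "category": "performance"}
    return f"{PSI_ENDPOINT}?{urlencode(params)}"


def performance_score(psi_response: dict) -> Optional[int]:
    """Return the Lighthouse performance score on a 0-100 scale, or None if absent."""
    try:
        score = psi_response["lighthouseResult"]["categories"]["performance"]["score"]
    except (KeyError, TypeError):
        return None
    # The API reports the score as a fraction between 0 and 1.
    return None if score is None else round(score * 100)
```

Fetching the URL (for example with `urllib.request.urlopen`) and decoding the JSON body gives the dictionary that `performance_score` expects.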

✅ Core Web Vitals (real Chrome data)

Google's Chrome User Experience Report (CrUX) aggregates anonymized performance data from real Chrome users who have opted in. It is the closest public equivalent to Google's private telemetry, and most likely the same field data AdSense reviewers see.
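One way to read this field data is from the `loadingExperience` block that a PageSpeed Insights response carries alongside the Lighthouse result. The sketch below assumes that route and that each metric exposes `percentile` and `category`; the tool may instead query the CrUX API directly, and the function names here are illustrative.

```python
def field_vitals(psi_response: dict) -> dict:
    """Return {metric_name: (percentile, category)} from CrUX field data, if any."""
    experience = psi_response.get("loadingExperience") or {}
    metrics = experience.get("metrics") or {}
    return {
        name: (metric.get("percentile"), metric.get("category"))
        for name, metric in metrics.items()
    }


def has_field_data(psi_response: dict) -> bool:
    """True when Chrome field data exists for the page or origin."""
    return bool(field_vitals(psi_response))
```

Very new or low-traffic sites typically return no metrics here, which is the case the scoring section treats separately.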

✅ Server reachability patterns

Our log parser detects zero-byte responses, 5xx errors, and slow responses, the patterns most likely to trigger AdSense's "Site down or unavailable" verdict. We catch these before the next AdsBot visit, not after.
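As an illustration of the idea (not our actual parser), the sketch below flags those three patterns in one access-log line. It assumes the combined log format with an optional trailing response time in seconds, as nginx can append via `$request_time`. The 3-second threshold and the issue labels are placeholders.

```python
import re

# Combined log format, plus an optional trailing response time in seconds.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
    r'(?: (?P<seconds>\d+(?:\.\d+)?))?\s*$'
)


def reachability_issues(line: str, slow_seconds: float = 3.0) -> list:
    """Return the reachability problems found in one log line (empty if none)."""
    match = LOG_LINE.match(line)
    if match is None:
        return []  # Unparseable lines are skipped, not flagged.
    status = int(match["status"])
    size = 0 if match["size"] == "-" else int(match["size"])
    issues = []
    if status == 200 and size == 0:
        issues.append("zero-byte")  # "OK" status but an empty body
    if 500 <= status <= 599:
        issues.append("5xx")
    if match["seconds"] is not None and float(match["seconds"]) > slow_seconds:
        issues.append("slow")
    return issues
```

Logs without a timing field still work; the slow-response check is simply skipped for them.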

✅ Bot verification

More than 15 classification rules separate real Googlebot and AdsBot-Google traffic (matched on user-agent patterns and on IP ranges that pass a reverse-DNS check) from fake Tencent bot farms, scraper tools, and brute-force probes.
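For reference, Google's documented way to confirm a crawler IP is a reverse DNS lookup followed by a forward lookup that must return the same address. The sketch below shows that check as it could run on a server with DNS access; it is not the in-browser rule set described above, and the function name and injectable lookup parameters are ours.

```python
import socket

GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_verified_google_crawler(ip, reverse_lookup=None, forward_lookup=None):
    """Reverse-resolve the IP, check the domain, then forward-confirm the hostname."""
    # Lookups are injectable so the logic can be exercised without network access.
    reverse_lookup = reverse_lookup or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward_lookup = forward_lookup or (lambda host: socket.gethostbyname_ex(host)[2])
    try:
        hostname = reverse_lookup(ip).lower().rstrip(".")
        if not hostname.endswith(GOOGLE_SUFFIXES):
            return False
        # The forward lookup must map back to the original IP.
        return ip in forward_lookup(hostname)
    except OSError:  # socket.herror and socket.gaierror are subclasses
        return False
```

The default forward lookup, `socket.gethostbyname_ex`, resolves IPv4 only, so IPv6 crawler addresses would need `socket.getaddrinfo` instead.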

✅ Domain history

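Checked automatically via the Internet Archive's Wayback Machine API, which returns the first-seen date and an approximate age. The first capture is only a lower bound on the domain's real age, but it matters given the stricter domain-age preferences publishers report in 2026.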
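A sketch of how a first-seen date can be derived, assuming the Wayback CDX server endpoint with `output=json`, `fl=timestamp`, and `limit=1` (captures are returned oldest first, after a header row). Which Wayback endpoint the tool actually calls is not specified here, so treat the endpoint choice and helper names as assumptions.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def first_capture_url(domain: str) -> str:
    """Build a CDX query that asks for only the earliest capture's timestamp."""
    params = {"url": domain, "output": "json", "fl": "timestamp", "limit": 1}
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def first_seen(cdx_rows: list):
    """Parse the decoded JSON rows: a header row, then one row per capture."""
    if len(cdx_rows) < 2:
        return None  # Never archived
    stamp = cdx_rows[1][0]  # Format: YYYYMMDDhhmmss
    return datetime.strptime(stamp, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)


def age_in_days(first: datetime, now: datetime) -> int:
    """Approximate domain age: days since the first archived capture."""
    return (now - first).days
```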

What we cannot replicate (and why)

❌ Content-quality ML classifier

Google trains models on web-scale data to detect "low value content," AI-generated content, and scraped content. The models and their weights are proprietary, so no public tool can replicate them.

❌ Domain trust graph

Google has decades of data on domains: previous owners, spam associations, backlink quality, ad ecosystem reputation. This database is internal.

❌ Human reviewer judgment

Edge cases go to Google employees who make subjective calls. We can't predict their decisions.

❌ WHOIS / Safe Browsing / live indexing

We found no free APIs for these that can be called directly from the browser. We provide one-click links to the official tools instead.

How the score is calculated

The site readiness score combines multiple weighted components:

If no Chrome field data is available (very new or low-traffic site), the overall score is reduced by 10%, because the absence of field data is itself an AdSense signal.
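The arithmetic can be sketched as a weighted average followed by the field-data penalty. The component names and weights in the example are placeholders, not the tool's real values, and we assume "reduced by 10%" means multiplying the score by 0.9 rather than subtracting 10 points.

```python
def readiness_score(components: dict, weights: dict, has_field_data: bool) -> float:
    """Weighted average of 0-100 component scores, with the no-field-data penalty."""
    total_weight = sum(weights[name] for name in components)
    weighted = sum(components[name] * weights[name] for name in components) / total_weight
    if not has_field_data:
        weighted *= 0.9  # Assumption: a 10% multiplicative reduction
    return round(weighted, 1)


if __name__ == "__main__":
    # Hypothetical components and weights, for illustration only.
    scores = {"lighthouse": 80, "server": 100}
    weights = {"lighthouse": 0.5, "server": 0.5}
    print(readiness_score(scores, weights, has_field_data=True))
    print(readiness_score(scores, weights, has_field_data=False))
```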

The server score uses log-based deductions:

The 2026 changes that matter

Publishers report that Google tightened AdSense approval criteria in 2026. Google has not published thresholds, so these are the unofficial benchmarks now circulating among publishers who were recently approved (or rejected):

Privacy

The site is pure HTML + JavaScript with no backend. Here's what gets sent where:

Recommended workflow

  1. Run a check before applying for AdSense for the first time
  2. If the score is below 75, fix the High-priority items and re-run
  3. If you're rejected, run a check before clicking "I confirm I have fixed the issues"
  4. Use server logs and the URL together for the most complete picture
  5. Aim for a combined score of 85+ before requesting any AdSense review
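The two thresholds in this workflow can be written as a small decision helper. The 75 and 85 cut-offs come from the steps above. The wording of the returned advice and the handling of scores between 75 and 85 are our reading of those steps, not a rule the tool enforces.

```python
def next_step(combined_score: float) -> str:
    """Suggest what to do next, given a combined readiness score (0-100)."""
    if combined_score < 75:
        return "fix the High-priority items and re-run"
    if combined_score < 85:
        # Assumption: between the two thresholds, keep improving before review.
        return "keep improving before requesting review"
    return "ready to request an AdSense review"
```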