Auto-Expire or Report Option for AMP Cached Pages from 404 Origin URLs
Description
Currently, AMP cached pages on cdn.ampproject.org continue to appear in Google SERPs even when the origin page is completely inaccessible (i.e., returns a 404 Not Found or 410 Gone). This creates significant potential for abuse, especially in cases involving:
- Hacked domains (e.g., .gov or .edu sites)
- Blackhat SEO using AMP for parasite hosting
- Expired or abandoned domains serving stale AMP content
The AMP cache still serves the page even after the source is long gone — creating the illusion of valid content and polluting search results.
🚀 Feature Request:
- Auto-refresh or invalidate AMP cache when the origin consistently returns 404/410 on fetch.
- Add a “Report this AMP page” button on the AMP cache viewer page (e.g., https://xxx.cdn.ampproject.org/c/s/domain.com/path) for pages whose origin no longer exists.
- Include a mechanism to revalidate stale cache from a source that no longer meets the serving criteria (expired cert, 404 origin, etc.).
Alternatives Considered
- Manually reporting AMP cache pages via Google Search “Feedback” system (not scalable, inconsistent results, and lacks visibility).
- Waiting for AMP cache expiration passively, which may take weeks or months — often long after the origin has disappeared.
- Using the “Remove Outdated Content” tool from Google Search Console, which requires user-side manual effort and is not always reliable for mass abuse cases.
Additional Context
🕵️ Example of Abuse Case:
- AMP Cache URL: https://kasih--profit-pages-dev.cdn.ampproject.org/c/s/kasih-profit.pages.dev/judi%20slot%20microstar88/
- Original Source: https://ppdb.man2kotabjm.sch.id/resources/?hkm=judi%20slot%20microstar88 → This returns 404 (confirmed).
These stale AMP cache entries are being widely exploited for blackhat SEO manipulation, especially via hacked government and educational websites. This undermines both the credibility of AMP as a technology and Google's index integrity.
A feature like automatic invalidation or a user-accessible report button would help significantly mitigate this abuse.
Thanks for the report @BellaAnasastasya. Starting to investigate.
@BellaAnasastasya could you clarify your example:
AMP Cache URL: https://kasih--profit-pages-dev.cdn.ampproject.org/c/s/kasih-profit.pages.dev/judi%20slot%20microstar88/
Original Source: https://ppdb.man2kotabjm.sch.id/resources/?hkm=judi%20slot%20microstar88
The original source for the AMP Cache URL you provided should be kasih-profit.pages.dev/judi%20slot%20microstar88, and it is still up. The AMP Cache URL you provided and the original source have completely different domains.
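To make the mapping concrete, here is a minimal sketch of how the origin can be read back out of an AMP Cache URL. It assumes the simple, human-readable cache URL form, where `/c/` marks an AMP document and a following `s/` marks an HTTPS origin; the function names are hypothetical, and the hashed-subdomain fallback that the cache uses for some domains is not handled.

```python
from urllib.parse import urlsplit


def cache_subdomain(origin_host: str) -> str:
    """Human-readable cache subdomain: '-' becomes '--', then '.' becomes '-'.

    Assumption: covers only the simple case, not the hashed fallback.
    """
    return origin_host.replace("-", "--").replace(".", "-")


def origin_from_cache_url(cache_url: str) -> str:
    """Rebuild the origin document URL encoded in an AMP Cache URL path."""
    parts = urlsplit(cache_url)
    if not parts.path.startswith("/c/"):
        raise ValueError("expected an AMP document path starting with /c/")
    rest = parts.path[len("/c/"):]
    scheme = "http"
    if rest.startswith("s/"):  # 's/' means the origin is fetched over HTTPS
        scheme, rest = "https", rest[len("s/"):]
    origin = f"{scheme}://{rest}"
    return f"{origin}?{parts.query}" if parts.query else origin


if __name__ == "__main__":
    url = ("https://kasih--profit-pages-dev.cdn.ampproject.org"
           "/c/s/kasih-profit.pages.dev/judi%20slot%20microstar88/")
    print(origin_from_cache_url(url))
    print(cache_subdomain("kasih-profit.pages.dev"))
```

Applied to the URL in question, this yields an origin on kasih-profit.pages.dev, not on the .sch.id domain.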
Thanks for the response and follow-up. Let me clarify the situation — this is a bit more nuanced than a typical AMP cache case.
The AMP Cache URL:
https://kasih--profit-pages-dev.cdn.ampproject.org/c/s/kasih-profit.pages.dev/judi%20slot%20microstar88/
did not originate from kasih-profit.pages.dev directly.
Instead, it originated from a hijacked .sch.id domain:
https://ppdb.man2kotabjm.sch.id/resources/?hkm=judi%20slot%20microstar88 (which is now 404)
At the time this .sch.id page was active, it included a <link rel="amphtml" href="https://kasih-profit.pages.dev/..."> tag, which told Google (and AMP crawlers) that the AMP version of that page was hosted on kasih-profit.pages.dev.
That AMP URL (hosted on a different domain) was then cached by AMP Cache and still appears in Google mobile SERPs — even though the source .sch.id page is now 404.
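For illustration, the discovery step described above can be sketched with Python's standard-library HTML parser. This is only a sketch of how a crawler might read the rel=amphtml reference from a page, not how Google's crawler actually works; the class and function names and the example.com URLs are hypothetical.

```python
from html.parser import HTMLParser


class AmpLinkFinder(HTMLParser):
    """Collects href values of <link> tags whose rel includes 'amphtml'."""

    def __init__(self):
        super().__init__()
        self.amp_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        # rel is a space-separated token list; compare case-insensitively.
        rel_tokens = (attr.get("rel") or "").lower().split()
        if "amphtml" in rel_tokens and attr.get("href"):
            self.amp_urls.append(attr["href"])


def find_amphtml_links(html: str) -> list:
    """Return every AMP URL that the given HTML document points to."""
    finder = AmpLinkFinder()
    finder.feed(html)
    return finder.amp_urls


if __name__ == "__main__":
    page = ('<html><head><link rel="amphtml" '
            'href="https://amp.example.com/page"></head></html>')
    print(find_amphtml_links(page))
```

Note that nothing in this step requires the href to be on the same domain as the page that declares it, which is what the abuse pattern relies on.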
Visual Evidence:
If you search in Google mobile:
slot site:ppdb.man2kotabjm.sch.id/resources/?hkm= or slot site:ppdb.man2kotabjm.sch.id
You’ll see that the .sch.id page shows up with AMP format, pointing to the cached version:
google.com/amp/s/kasih-profit.pages.dev/...
Why this matters:
This is a blackhat parasite-hosting trick:
- Bad actors hijack a trusted .sch.id domain,
- Inject spam content + a rel=amphtml pointing to their own domain (e.g., kasih-profit.pages.dev),
- Google sees it, crawls it, and caches the AMP version,
- Then they delete the original .sch.id page, but AMP Cache still serves the content.
So technically, the cached AMP page still works, but its original context is fraudulent, and the origin (where the AMP link came from) is now gone.
What we need:
If an AMP cache is based on a now-deleted origin (even if the AMP page is still live), there should be a way to:
- Invalidate the cache (because it’s detached from the original source context)
- Prevent these hijacked AMP pages from continuing to rank in Google Search
Important Notes:
Kindly, we’d like to emphasize that this request is not something that can be solved by simply pointing us to the “Remove Outdated Content” tool. This is a systemic issue — and trying to remove each case manually is not scalable, especially when there are hundreds or even thousands of URLs using this abuse method across many compromised domains.
Also, in countries like Indonesia, many of these government or educational institutions do not have the resources or urgency to respond quickly — so relying on them to fix or delist the content is unfortunately unrealistic.
We hope AMP can support a more automated and proactive solution for this problem, especially since the abuse is clearly growing and exploiting AMP’s own architecture.
Let us know if further clarification or technical logs are needed — we’re happy to assist.
Thanks @BellaAnasastasya for the explanation. Triaging, and we'll look into it. I might have a few more follow-up questions.
Thank you, sir @erwinmombay
Hi, I’m interested in working on this. Can I take it?
Hello sir @erwinmombay, is there any update? Thank you.
@BellaAnasastasya I'm still doing my research so I can propose a solution.
One thing I wanted to confirm, though: if you search for slot site:ppdb.man2kotabjm.sch.id/resources/?hkm= or slot site:ppdb.man2kotabjm.sch.id, are you still seeing it associated with the AMP document? Currently, when I make this search, I just get directed to https://ppdb.man2kotabjm.sch.id/plugins/?ez=JANDA+SLOT+LOGIN, which is a 404 page. I'm going to need this to prove to our PMs that this is an actual issue that needs to be escalated. Appreciate your help.
I will give you an example from another URL in a video, sir:
site:batuankaler.desa.id / bosku777 site:batuankaler.desa.id
https://github.com/user-attachments/assets/936e0f22-0e3f-46e1-b39a-160911246a3a
I hope this video helps, sir @erwinmombay. Thanks.
@BellaAnasastasya much appreciated, will try and reproduce the video
Okay, thanks @erwinmombay. One last thing I forgot to mention: the AMP result also appears when we search for "bosku777" and click the URL I mentioned before (batuankaler.desa.id). I used the site: form of the query only to make the example easy to reproduce. Thanks, sir.
Upon deeper inspection, especially after reviewing the AMP runtime file /src/document-fetcher.js, it's clear that AMP has no persistent handling of 404/410 origin errors, nor any system that re-checks whether the original page still includes the <link rel="amphtml">.
I completely understand that constant auto-refetching would be resource-intensive and unscalable. So instead of that, I’d like to propose a middle-ground solution that improves AMP cache integrity without harming performance.
✅ Updated Proposal
Core Idea:
If the origin page no longer contains <link rel="amphtml"> pointing to the AMP version, and/or the page returns a 404 or 410 consistently, then AMP Cache should offer a mechanism to "refresh" or "invalidate" the AMP cache — even without requiring Search Console access.
Implementation Details:
- Add a "Report Stale AMP" or "Refresh AMP Cache" button on AMP viewer pages (e.g., cdn.ampproject.org URLs).
- When clicked, the AMP system:
  - Checks whether the original page still contains the rel=amphtml tag.
  - Optionally retries after a 1–2 day grace period to avoid false positives.
- If after that grace period:
  - The origin still returns 404/410 or
  - The AMP reference (rel=amphtml) is no longer present,
  → Then AMP Cache should auto-expire or invalidate the cached AMP document.
🚨 Important Clarification Regarding Security & Abuse Scenarios
This proposed system must not rely on Google Search Console ownership cancellation, unlike the current “Remove Outdated Content” process — because in many blackhat SEO cases:
- The attacker has control of the GSC (Search Console) account due to domain hijacking or site compromise.
- If the system allows cancellation from GSC, the attacker could simply reject all cleanup attempts.
- Meanwhile, the legitimate domain owner may have lost access, and the cached AMP content continues to rank, enabling abuse.
That’s why this mechanism should treat the absence of <link rel="amphtml"> as a hard signal that the origin no longer supports or acknowledges the AMP page — and should not be reversible by GSC in this case.
Of course, if the origin still contains <link rel="amphtml"> but is temporarily unreachable (e.g., DDoS or hosting issue), then the AMP Cache can wait or retry before acting — since the origin clearly still intends to support AMP.
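The decision rule proposed above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not AMP Cache code: `OriginCheck`, `Action`, `decide`, and the 2-day `GRACE_PERIOD` are hypothetical names and values, and treating any non-404/410 failure (5xx, timeout) as "wait" is my reading of the temporary-outage case.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum
from typing import List, Optional


class Action(Enum):
    KEEP = "keep"
    WAIT = "wait"
    INVALIDATE = "invalidate"


GRACE_PERIOD = timedelta(days=2)  # assumption: upper end of the 1-2 day range


@dataclass
class OriginCheck:
    checked_at: datetime
    status: Optional[int]    # HTTP status of the origin; None if unreachable
    has_amphtml_link: bool   # only meaningful when status == 200


def is_detached(check: OriginCheck) -> bool:
    """True if the origin is gone (404/410) or no longer links to the AMP page."""
    gone = check.status in (404, 410)
    link_removed = check.status == 200 and not check.has_amphtml_link
    return gone or link_removed


def decide(checks: List[OriginCheck]) -> Action:
    """Invalidate only after an unbroken detached streak spanning the grace period."""
    if not checks:
        return Action.WAIT
    ordered = sorted(checks, key=lambda c: c.checked_at)
    latest = ordered[-1]
    if not is_detached(latest):
        healthy = latest.status == 200 and latest.has_amphtml_link
        # 5xx or unreachable is treated as temporary: wait and retry later.
        return Action.KEEP if healthy else Action.WAIT
    streak_start = latest.checked_at
    for check in reversed(ordered):
        if not is_detached(check):
            break  # any non-detached result resets the streak
        streak_start = check.checked_at
    if latest.checked_at - streak_start >= GRACE_PERIOD:
        return Action.INVALIDATE
    return Action.WAIT
```

A single 404 returns `WAIT`; a second 404 two or more days later returns `INVALIDATE`. Resetting the streak on any inconclusive result is a deliberately conservative choice to limit false positives.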
Why this matters
- Avoids blind dependence on Search Console (which can be weaponized in abuse cases)
- Uses a technical signal (rel=amphtml link presence + HTTP status) to determine legitimacy
- Balances scalability with abuse prevention
- Aligns with how “Remove Outdated Content” works, but more abuse-resilient
Happy to provide mockups or draft the revalidation logic if needed.
Thanks again for your time and support!
@BellaAnasastasya thanks for that! I'll integrate this with my internal documentation. I still need to make a proposal and go through review, so please bear with me.
Hello sir @erwinmombay, any update sir?
@BellaAnasastasya still conducting research. I'll find time to prioritize this.
thanks sir
Just updating this for visibility: I am looking into the expiration logic right now, and into the feasibility of some of the proposals.
@BellaAnasastasya with the slot site:ppdb.man2kotabjm.sch.id example, does it look like the links 404 now? I recognize this is still an issue, but I'm trying to see whether the documents expired, since I was using them as an example.
Please wait, sir, I'm looking for a good example that can help the case.
Try this one, sir: pinjoltogel site:sipp.pa-girimenang.go.id/search/
Hope this example helps.