For dupe checking of archive and 12ft proxy links, check the underlying URL as well
One remaining source of duplicates is when a Link Post points to a paywall-bypass page (e.g., Archive.today, 12ft.io, or the Internet Archive's Wayback Machine at Archive.org) and an earlier Link Post pointed to the original article, or vice versa. This request is to consider deducing the underlying source article link from the paywall-bypass link and then checking for dupes using that. For the other direction, where the earlier Link Post used the paywall-bypass link, the underlying link would be stored with that post, so that a later Link Post using the original article link finds the earlier content and shows it as a possible dupe.
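To illustrate the two directions, here is a rough sketch of what I have in mind, not a proposal for the actual implementation. Everything in it is hypothetical: `DupeIndex`, `canonical_key`, and `toy_resolve` are made-up names, the resolver only understands one Wayback-style link shape, and the key deliberately ignores scheme, `www.`, trailing slashes, and query strings, which may be too aggressive for some sites.

```python
import re
from typing import Callable, Dict, Optional
from urllib.parse import urlsplit


def toy_resolve(link: str) -> Optional[str]:
    """Hypothetical resolver: handles only Wayback-style '/web/<timestamp>/<url>' links."""
    m = re.match(r"https?://web\.archive\.org/web/[^/]+/(https?://.+)", link, re.IGNORECASE)
    return m.group(1) if m else None


def canonical_key(link: str, resolve: Callable[[str], Optional[str]] = toy_resolve) -> str:
    """Reduce a posted link to a comparison key based on the underlying article URL."""
    target = resolve(link) or link  # fall back to the link itself if it is not a bypass link
    parts = urlsplit(target)
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    # Assumption: scheme and query string are ignored when comparing.
    return host + parts.path.rstrip("/")


class DupeIndex:
    """Stores one key per post, so the order of original vs. bypass link does not matter."""

    def __init__(self, resolve: Callable[[str], Optional[str]] = toy_resolve) -> None:
        self._resolve = resolve
        self._by_key: Dict[str, int] = {}

    def find_or_add(self, post_id: int, link: str) -> Optional[int]:
        """Return the id of an earlier post with the same underlying URL, else record this one."""
        key = canonical_key(link, self._resolve)
        earlier = self._by_key.get(key)
        if earlier is None:
            self._by_key[key] = post_id
        return earlier
```

Because the underlying link is what gets stored, an original-article post followed by a bypass post, or the reverse, would both surface the earlier post as a possible dupe.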
For Archive.today, there are numerous alternate domains/mirrors (e.g., archive.is, archive.ph, archive.vn; see the [wiki article](the underlying source article link) for the full list).
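As a sketch of how the mirror list could feed the extraction step, something like the following might work. The host sets contain only the domains named in this issue and would need to be completed from the full list. The URL shapes are my assumptions about how these services embed the original link (a path-embedded URL for Archive.today and Wayback, and either a path-embedded URL or a `q` parameter for 12ft.io). `underlying_url` is a hypothetical name, and short Archive.today links that don't contain the original URL are simply reported as unresolvable here.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlsplit

# Only the hosts mentioned in this issue; the Archive.today list is incomplete.
ARCHIVE_TODAY_HOSTS = {"archive.today", "archive.is", "archive.ph", "archive.vn"}
WAYBACK_HOSTS = {"web.archive.org", "archive.org"}
TWELVE_FT_HOSTS = {"12ft.io"}
BYPASS_HOSTS = ARCHIVE_TODAY_HOSTS | WAYBACK_HOSTS | TWELVE_FT_HOSTS

# An http(s) URL embedded in a path, e.g. "/web/2023.../https://example.com/a".
EMBEDDED_URL = re.compile(r"https?:/{1,2}.+", re.IGNORECASE)


def underlying_url(link: str) -> Optional[str]:
    """Return the original article URL embedded in a bypass link, or None if not found."""
    parts = urlsplit(link)
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    if host not in BYPASS_HOSTS:
        return None  # not a known paywall-bypass host

    # Assumed 12ft.io form: /proxy?q=<percent-encoded original URL>
    if host in TWELVE_FT_HOSTS:
        q = parse_qs(parts.query).get("q")
        if q:
            return q[0]

    # Path-embedded form; urlsplit moves the embedded URL's query string
    # into parts.query, so re-attach it before searching.
    tail = parts.path + ("?" + parts.query if parts.query else "")
    match = EMBEDDED_URL.search(tail)
    if not match:
        return None  # e.g. a short link that carries no original URL
    # Restore "https://" if a single slash was left after the scheme.
    return re.sub(r"^(https?:)/+", r"\1//", match.group(0), flags=re.IGNORECASE)
```

Short links would need a different approach, such as fetching the archive page to find the saved-from URL, which is outside what this sketch covers.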
This has been happening more often recently as the number of posts under /recent has increased, and those submitting aren't taking the time to go through the list of posts to see whether their content was previously shared. It's still not all that significant, maybe occurring once every couple of days, but the frequency will likely increase as /recent gets even busier, so I am sharing this here.
Issue #179 is related but is not quite the same, so I created this Issue to specifically address paywall bypass methods.
This should be provided by a bot because archive turnover is high.