Remove www. from domain names (at least in viewer)

Open willsheppard opened this issue 2 years ago • 3 comments

When browsing for an archived website here: https://archive.fart.website/archivebot/viewer/domains/ -- I didn't expect "fanfiction.net" to be under "www.fanfiction.net".

www is the equivalent of "the" in a title -- it shouldn't be used in an index as it's not really part of the website's canonical name. Is it possible this might be implemented in the viewer only, with no other code changes required? Or whichever solution is best.

willsheppard avatar May 16 '22 13:05 willsheppard

Can't you use the search box, though?

TheTechRobo avatar May 16 '22 21:05 TheTechRobo

Never mind, saw the context in #archiveteam-bs.

The problem with this is that www. is sometimes different from "normal". It's not common, but I've seen it before, although I can't remember what website it was (some type of forum). Non-www was a home page with a link to both the English one, and the www one (I don't remember what language the www one was, but it wasn't English). Something like that, at least.

But, if you are just suggesting that www.fanfiction.net should also be under F in addition to W, I think that would be OK. Maybe a maintainer can say whether this is doable. :-)

TheTechRobo avatar May 16 '22 23:05 TheTechRobo

The problem with this is that www. is sometimes different from "normal". It's not common, but I've seen it before, although I can't remember what website it was (some type of forum).

Working protocol/subdomain difference examples (as of 2022-12-10 20:38:17 UTC):

  • http://scoobypanel.com/ - Inaccessible.
  • https://scoobypanel.com/ - Inaccessible.
  • http://www.scoobypanel.com/ - Redirects to https://www.scoobypanel.com/.
  • https://www.scoobypanel.com/ - Returns its homepage.

Minor page difference example (working as of 2022-12-10 20:38:17 UTC):

  • http://ftp.fau.de/archlinux/ - Bottom text: "ftp.fau.de Port 80"
  • http://www.ftp.fau.de/archlinux/ - Bottom text: "www.ftp.fau.de Port 80"
  • https://ftp.fau.de/archlinux/ - Bottom text: "ftp.fau.de Port 443"
  • https://www.ftp.fau.de/archlinux/ - Bottom text: "www.ftp.fau.de Port 443"

I know there are other instances of this—some returning vastly different pages or redirecting to other domains—but no further examples come to mind at the moment. I will update this message when I come across more (if I remember to do so, that is, haha).

To add further insult to injury, I have come across instances of prefixes such as www2., www3., www49., etc. Should these appear with the results as well? I assume, however, it may be rather unlikely one is looking for these variations without specifying them beforehand.

…if you are just suggesting that www.fanfiction.net should also be under F in addition to W, I think that would be OK.

I, too, agree with this implementation.
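
For illustration, a minimal sketch of that dual listing in Python. It is not taken from the ArchiveBot viewer code; the names `index_keys`, `build_index`, and the `strip_numbered` flag are hypothetical. Domains are kept exactly as archived, so www and non-www hosts stay separate entries, and a www. host is simply listed under a second letter. Numbered prefixes such as www2. are only treated this way when `strip_numbered` is set.

```python
import re
from collections import defaultdict

# Hypothetical patterns: plain "www." and numbered variants such as "www2."
PLAIN_WWW = re.compile(r"^www\.")
NUMBERED_WWW = re.compile(r"^www\d*\.")


def index_keys(domain, strip_numbered=False):
    """Return the index letters a non-empty domain should be listed under."""
    domain = domain.lower()
    keys = [domain[0]]  # always listed under its literal first character
    pattern = NUMBERED_WWW if strip_numbered else PLAIN_WWW
    stripped = pattern.sub("", domain, count=1)
    # Add a second letter only if a prefix was removed and the letter is new.
    if stripped and stripped != domain and stripped[0] not in keys:
        keys.append(stripped[0])
    return keys


def build_index(domains, strip_numbered=False):
    """Group domains by index letter; the domain strings are left unchanged."""
    index = defaultdict(list)
    for domain in domains:
        for key in index_keys(domain, strip_numbered):
            index[key].append(domain)
    return {key: sorted(names) for key, names in index.items()}


if __name__ == "__main__":
    print(build_index(["www.fanfiction.net", "ftp.fau.de"]))
```

Because only the listing changes, a site whose www and non-www hosts serve different content still appears as two distinct archived domains.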

systwi-again avatar Dec 10 '22 19:12 systwi-again