
Do not download large files such as FLAC and MP3

Open DamonHD opened this issue 1 year ago • 3 comments

Downloading these places a significant load on servers, and most are not going to contain URL metadata of use to the project.

This is probably true of image files too.

At the very least, please explicitly document a suitable robots.txt User-agent name that site owners can use to stop the tool scraping inappropriate sites/subtrees.
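To illustrate what such a User-agent name would enable, here is a minimal sketch using Python's standard `urllib.robotparser`. The token `DomainsProject` and the host `example.org` are assumptions for illustration only; no robots.txt token is documented in this thread, and the crawler itself is not necessarily written in Python.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical token: the project has not documented one in this thread.
CRAWLER_TOKEN = "DomainsProject"

# What a site owner might publish once a token is documented.
ROBOTS_TXT = """\
User-agent: DomainsProject
Disallow: /img/audio/

User-agent: *
Disallow:
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# The audio subtree is off limits for the named crawler only.
print(parser.can_fetch(CRAWLER_TOKEN, "https://example.org/img/audio/sample.flac"))
print(parser.can_fetch(CRAWLER_TOKEN, "https://example.org/index.html"))
```

A crawler that checks `can_fetch` before each request would skip the whole `/img/audio/` subtree, while other user agents remain unaffected by that rule.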

Rgds

Damon

DamonHD, Feb 15 '24 11:02

Hi,

Is it GET /some/large/file.mp3 or just HEAD?

Thanks

tb0hdan, Feb 15 '24 15:02

GET, e.g.:

"GET /img/audio/AudioMoth/20210402/20210402T1827Z-desk-ambient-AudioMoth-384ksps.flac HTTP/2.0" 200 18384032 "-" "Mozilla/5.0 (compatible; Domains Project/1.3.7; +https://domainsproject.org)"
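A transfer like the one logged above (about 18 MB for a single FLAC file) could in principle be avoided by checking the URL extension first and, where that is inconclusive, issuing a HEAD request and inspecting the response headers before any GET. The sketch below is one possible approach in Python, not the project's actual code; the extension list, the MIME prefixes and the `MAX_BODY_BYTES` limit are all hypothetical values chosen for illustration.

```python
import posixpath
import urllib.request
from urllib.parse import urlparse

# Hypothetical policy values, not taken from the project.
SKIP_EXTENSIONS = {".flac", ".mp3", ".wav", ".jpg", ".jpeg", ".png", ".gif"}
SKIP_TYPE_PREFIXES = ("audio/", "video/", "image/")
MAX_BODY_BYTES = 1_000_000


def skip_by_url(url: str) -> bool:
    """Cheap check: reject URLs whose path ends in a known media extension."""
    ext = posixpath.splitext(urlparse(url).path)[1].lower()
    return ext in SKIP_EXTENSIONS


def skip_by_headers(headers: dict) -> bool:
    """Reject responses that are media or larger than MAX_BODY_BYTES."""
    lowered = {key.lower(): value for key, value in headers.items()}
    content_type = lowered.get("content-type", "").split(";")[0].strip().lower()
    if content_type.startswith(SKIP_TYPE_PREFIXES):
        return True
    try:
        return int(lowered.get("content-length", "")) > MAX_BODY_BYTES
    except ValueError:
        return False  # Missing or malformed length: cannot decide from size.


def head_headers(url: str, timeout: float = 10.0) -> dict:
    """Fetch response headers only, so no body is transferred."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return dict(response.headers.items())


def should_download(url: str) -> bool:
    """Combine both checks; only touches the network if the URL looks fine."""
    if skip_by_url(url):
        return False
    return not skip_by_headers(head_headers(url))
```

With these checks, the FLAC path from the log line would be rejected by `skip_by_url` alone, without any request being sent to the server.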

DamonHD, Feb 15 '24 16:02

Got it, it's a duplicate of #28

tb0hdan, Feb 15 '24 16:02