twitter-archive-parser

Stop iterating on the content that is 404'd or DMCA'd

Open fl0werpowers opened this issue 3 years ago • 2 comments

Some media referenced in the archives no longer exists, either because the original uploader deleted it or because it was taken down by a DMCA claim. The tool reports both cases clearly (as 'Download failed with status "404 Not Found"' and 'Download failed with status "403 Forbidden"' respectively), and the 403 response explicitly states that the content was struck by DMCA. Retrying such downloads on every pass is a waste of time; this media can simply be skipped.

fl0werpowers avatar Nov 21 '22 15:11 fl0werpowers

These are the exceptions in question:

FAIL. Media couldn't be retrieved from https://pbs.twimg.com/media/EbH_bxcUYAgxbki.png:orig because of exception: Download failed with status "404 Not Found". Response content: ""

FAIL. Media couldn't be retrieved from https://video.twimg.com/ext_tw_video/1560406436982804480/pu/vid/1280x720/m7-vUTLunERc4auB.mp4?tag=12 because of exception: Download failed with status "403 Forbidden". Response content: "{"error_code":2,"error_response":"Dmcaed"}"
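One way the skip could work is to classify each failed download as permanent or transient from the status code and response body shown in the messages above. The following is only a minimal sketch: the function name `is_permanent_failure` is hypothetical and not part of twitter-archive-parser, and it assumes that DMCA takedowns always carry the `"error_response":"Dmcaed"` body seen in the second log line.

```python
import json


def is_permanent_failure(status_code: int, response_body: str) -> bool:
    """Return True when retrying the download is not expected to succeed.

    Hypothetical helper, not taken from twitter-archive-parser.
    """
    # A 404 means the media was deleted, so another attempt will not help.
    if status_code == 404:
        return True
    # A 403 is only treated as permanent when the body marks it as a DMCA
    # takedown (assumption: the body looks like the one in the log above).
    if status_code == 403:
        try:
            payload = json.loads(response_body)
        except ValueError:
            # Empty or non-JSON body: cause unknown, so keep it retryable.
            return False
        return isinstance(payload, dict) and payload.get("error_response") == "Dmcaed"
    # Anything else (timeouts, 5xx, rate limits) may succeed on a later pass.
    return False
```

A download loop could call this after each failed request and drop the URL from the retry list when it returns `True`, while still retrying other kinds of failure.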

fl0werpowers avatar Nov 21 '22 15:11 fl0werpowers

Agreed. I was going to raise this issue myself. Thanks for the well-written issue.

chibiconsulting avatar Dec 29 '22 06:12 chibiconsulting