Metadata download fails on particular file

Open justingist opened this issue 9 years ago • 13 comments

After downloading 123 metadata folders, it just keeps failing on the next one. I know I should have a lot more than this. Is there a way I can bypass the one that is failing, or go back to it later? I cleared out the directory and started again from scratch, and it produced the same result. It stays stuck on this one for hours, with no end in sight.

justingist avatar Oct 13 '16 14:10 justingist

Update: I created some dummy files and now it is going along nicely again... will have to see what happens when I try and download the photos. I figured 123 was a bit low when I have 1,857 moments (372,500 photos)...

justingist avatar Oct 13 '16 14:10 justingist

Several people are having similar issues. Evidently the quality of the data Narrative has isn't very good: a lot of people seem to have pointers to missing (I'm guessing deleted) data. Great to hear you got it working again. Alternatively, you could have opened the moments/page-1.json file, found the reference to the failing moment, and deleted it.
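For anyone who wants to script that manual edit, here is a minimal sketch. It assumes the page file is a JSON object whose moments sit in a list under a key such as `results`, each with a `uuid` field; that layout is a guess, not something confirmed in this thread, so check your own moments/page-1.json first. The function names are hypothetical and not part of narrative-ripper.

```python
import json


def drop_moment(page, failing_uuid, list_key="results"):
    """Return a copy of a parsed page with the failing moment removed.

    list_key is an assumption about the file layout; adjust it to
    whatever key actually holds the list of moments in your file.
    """
    cleaned = dict(page)
    cleaned[list_key] = [
        moment for moment in page.get(list_key, [])
        if moment.get("uuid") != failing_uuid
    ]
    return cleaned


def drop_moment_from_file(path, failing_uuid, list_key="results"):
    """Rewrite the page file at path without the failing moment."""
    with open(path) as fh:
        page = json.load(fh)
    with open(path, "w") as fh:
        json.dump(drop_moment(page, failing_uuid, list_key), fh)
```

Back up the page file before running something like `drop_moment_from_file("moments/page-1.json", "<failing uuid>")`, since it overwrites the file in place.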

What were the names of the dummy files you created? I'll try to determine whether you should delete the dummy files, the whole directory, or nothing at all to avoid issues when downloading the photos.

deadcyclo avatar Oct 13 '16 14:10 deadcyclo

I think it was mostly this positions file:

Failed https://narrativeapp.com/api/v2/moments/619637cb19b84208802d0677b7396287/positions/?limit=1500. Retrying in 1 second.
Failed https://narrativeapp.com/api/v2/moments/619637cb19b84208802d0677b7396287/positions/?limit=1500. Retrying in 1 second.
Failed https://narrativeapp.com/api/v2/moments/619637cb19b84208802d0677b7396287/positions/?limit=1500. Retrying in 1 second.
Failed https://narrativeapp.com/api/v2/moments/619637cb19b84208802d0677b7396287/positions/?limit=1500. Retrying in 1 second.
Failed https://narrativeapp.com/api/v2/moments/619637cb19b84208802d0677b7396287/positions/?limit=1500. Retrying in 1 second.
Failed https://narrativeapp.com/api/v2/moments/619637cb19b84208802d0677b7396287/positions/?limit=1500

justingist avatar Oct 13 '16 14:10 justingist

I should add that I think I got a 404 on it when I tried to open the URL in Chrome, but I wasn't sure if that was just because I wasn't using a REST client...

justingist avatar Oct 13 '16 14:10 justingist

Ah.. Position files shouldn't be an issue at all. They aren't used during image download at all...

deadcyclo avatar Oct 13 '16 15:10 deadcyclo

I potentially have tons of broken moments (1.3 million photos, not sure how many moments). I tried deleting the entries from page-1.json, but there are just more and more moments that it keeps retrying forever, so trying to delete them manually seems futile. Can we change the script so that it gives up retrying and moves on to the next moment?

thederan avatar Oct 13 '16 16:10 thederan

Is it always the position that fails?

deadcyclo avatar Oct 13 '16 16:10 deadcyclo

@thederan Could you try the code in the allow-failure branch? I tried to make a version that would skip after some retries, but I can't test it right now. Test it, and let me know if it works.
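As a rough illustration of the skip-after-some-retries idea (this is not the code in the allow-failure branch, just a minimal sketch), the download call can be wrapped so that it gives up after a fixed number of attempts. The names `fetch_with_retries`, `fetch`, `max_retries`, and `delay` are hypothetical; `fetch` stands for any callable that raises an exception when the request fails.

```python
import time


def fetch_with_retries(fetch, url, max_retries=10, delay=1.0, sleep=time.sleep):
    """Call fetch(url) up to max_retries times.

    Returns the result of the first successful call, or None when every
    attempt fails, so the caller can skip this item and carry on.
    """
    for attempt in range(1, max_retries + 1):
        try:
            return fetch(url)
        except Exception as exc:  # broad on purpose: any failure counts
            print("Failed {url} (attempt {n}/{total}): {exc}".format(
                url=url, n=attempt, total=max_retries, exc=exc))
            if attempt < max_retries:
                sleep(delay)  # wait before the next attempt
    return None
```

A caller would then treat `None` as "skip this moment and continue with the next one" instead of retrying forever.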

deadcyclo avatar Oct 13 '16 17:10 deadcyclo

Just to update: I only had an issue with that one file, so the rest of the metadata completed just fine. I'm now downloading using an 8-core Azure VM... so far I think I've got 40GB of photos in!

justingist avatar Oct 13 '16 20:10 justingist

@deadcyclo Yes, always failing at /moments/.../positions/?limit=1500

Thanks for the other branch! I reduced the retries from 10 to 2, and after a number of further failures it finally got to a point where the moments weren't failing anymore.

Does the 1500 limit mean that if I have more moments, they will get cut off?

thederan avatar Oct 14 '16 00:10 thederan

@thederan Great.

No. Basically, you can tell the service how many results you want for a single request, and 1500 is the maximum number of results it allows per request. So if you have more than 1500 moments, they will be split across multiple requests and multiple files, and you will get moments/page-1.json, moments/page-2.json, etc., depending on how many moments you actually have.
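To make the splitting concrete, here is a minimal sketch of paging through a listing and numbering the output files. It assumes each response is a JSON object with a `next` field holding the URL of the following page (or nothing on the last page); that field name is an assumption, and `fetch_all_pages` and `fetch_json` are hypothetical names rather than functions from the script.

```python
def fetch_all_pages(fetch_json, first_url, name_template="page-{cnt}.json"):
    """Follow 'next' links and pair each page with a numbered file name.

    fetch_json is any callable that takes a URL and returns the parsed
    JSON response as a dict. Returns a list of (file_name, page) tuples.
    """
    pages = []
    url = first_url
    cnt = 1
    while url:
        page = fetch_json(url)
        pages.append((name_template.format(cnt=cnt), page))
        url = page.get("next")  # assumed field; falsy on the last page
        cnt += 1
    return pages
```

With a limit of 1500 per request, an account with more than 1500 moments would yield two or more entries here, one per page file.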

deadcyclo avatar Oct 14 '16 08:10 deadcyclo

Now, without any error message, the ripper seems to be stuck in an endless loop over the same /moments/.../positions/?cursor=... URL. After restarting the script, it eventually ends up looping over the same URL again.

thederan avatar Oct 16 '16 16:10 thederan

@thederan I saw this potential issue when I was creating the script, but it didn't affect me because I didn't have any moments with more than 1500 positions. Basically, there is an error with pagination on the locations part of the service.

Could you try replacing line 75:

get_multiple(session, "https://narrativeapp.com/api/v2/moments/{uuid}/positions/?limit=1500".format(uuid=moment['uuid']), moment_path, "positions-{cnt}.json")

with:

get_from_file_or_service(session, "https://narrativeapp.com/api/v2/moments/{uuid}/positions/?limit=1500".format(uuid=moment['uuid']), moment_path, "positions-1.json")

and see if that fixes the issue for you?
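A different way to defend against this kind of pagination bug, shown here only as a sketch and not as what the script does, is to remember which page URLs have already been requested and stop as soon as one repeats. It assumes the next-page URL lives in a `next` field of each response; that field name, `fetch_pages_guarded`, `fetch_json`, and `max_pages` are all hypothetical.

```python
def fetch_pages_guarded(fetch_json, first_url, max_pages=1000):
    """Follow 'next' links, stopping if a URL repeats or max_pages is hit.

    fetch_json is any callable that takes a URL and returns the parsed
    JSON response as a dict. Returns the list of pages fetched so far.
    """
    seen = set()
    pages = []
    url = first_url
    while url and url not in seen and len(pages) < max_pages:
        seen.add(url)  # remember this URL so a repeat ends the loop
        page = fetch_json(url)
        pages.append(page)
        url = page.get("next")  # assumed field name
    return pages
```

This trades completeness for termination: if the service keeps handing back the same cursor, the positions after that point are simply not fetched.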

deadcyclo avatar Oct 18 '16 12:10 deadcyclo