
WIP fix derive logic and redundant hash calc

Open Dobatymo opened this issue 4 years ago • 7 comments

This addresses https://github.com/jjjake/internetarchive/issues/253 and https://github.com/jjjake/internetarchive/issues/288

Counts all files (even those which might be skipped). This is consistent with the newly added behavior for file-level metadata.

BUG: if the last file is skipped, then a derive is never queued

Any idea how to solve this? Is there a way to queue a derive WITHOUT uploading a file? The easiest way would be to upload everything, skipping whatever you like, and then, once everything is completed, simply queue a derive with a separate call. Then we could do away with this fragile file counting.

Dobatymo avatar May 28 '20 04:05 Dobatymo

Is there a way to queue a derive WITHOUT uploading a file?

Related: #252

This would be nice to have in general. I wrote a script for myself last year because I didn't want to click around the web interface all the time. I use --no-derive both to avoid the two bugs you're trying to fix and to make sure everything is fine before starting a derive, which can take a very long time. I'm not sure whether the script works for non-admin accounts, though. It emulates the website interaction, so it should definitely not be used as the basis for an implementation here, and I'm not going to link it.

JustAnotherArchivist avatar May 30 '20 02:05 JustAnotherArchivist

Yes, it's possible to queue a derive task without uploading anything:

$ ia tasks jj-test-2020-05-14 --cmd derive.php

Python:

>>> r = item.derive()

jjjake avatar Jun 01 '20 17:06 jjjake

Great, I will use this instead of the counting logic.
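A minimal sketch of that replacement, assuming `item` behaves like an `internetarchive.Item` (for example one returned by `internetarchive.get_item`) and that passing `queue_derive=False` to its `upload` method suppresses the per-upload derive. The helper name `upload_then_derive` is hypothetical and not part of the library:

```python
def upload_then_derive(item, files, **upload_kwargs):
    """Upload files without queueing a derive, then queue one derive task.

    `item` is assumed to behave like internetarchive.Item: it needs an
    upload(files, queue_derive=..., **kwargs) method and a derive() method.
    This helper is hypothetical and not part of the internetarchive library.
    """
    # Suppress the derive on every individual upload request.
    responses = item.upload(files, queue_derive=False, **upload_kwargs)
    # Queue a single derive once the upload call has returned,
    # regardless of which files were skipped.
    derive_response = item.derive()
    return responses, derive_response


# Possible usage (requires the internetarchive package and credentials):
# from internetarchive import get_item
# item = get_item("jj-test-2020-05-14")
# upload_then_derive(item, ["a.txt", "b.txt"], checksum=True)
```

With this shape no file counting is needed, at the cost of one extra request to the tasks API after the upload.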

Dobatymo avatar Jun 02 '20 00:06 Dobatymo

I removed all the counting logic and now just call derive() at the end. Do you think that is an acceptable solution? The tests don't seem to like it, though. If you think we should choose this solution, I will have a look at the tests.

Dobatymo avatar Jun 03 '20 02:06 Dobatymo

No, I think the derive task should still be queued in the upload request. I'd like to avoid submitting an extra request to another API if possible.
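One way to keep the derive inside the upload request without hitting the skipped-last-file bug is to decide up front which files will actually be uploaded and flag only the last of those. The following is only a sketch of that idea, not the library's implementation; `plan_uploads` and the caller-supplied `will_skip` predicate (for example a checksum comparison) are hypothetical names:

```python
from typing import Callable, Iterable, List, Tuple


def plan_uploads(
    files: Iterable[str],
    will_skip: Callable[[str], bool],
) -> List[Tuple[str, bool]]:
    """Return (file, queue_derive) pairs for the files that will be uploaded.

    Only the last file that is NOT skipped gets queue_derive=True, so the
    derive is still queued when trailing files are skipped.
    Hypothetical helper; `will_skip` is supplied by the caller.
    """
    # Drop skipped files first, so "last" means "last file actually sent".
    to_upload = [f for f in files if not will_skip(f)]
    last_index = len(to_upload) - 1
    return [(f, i == last_index) for i, f in enumerate(to_upload)]
```

If every file is skipped, the returned plan is empty and no derive is queued.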

jjjake avatar Jun 03 '20 17:06 jjjake

@jjjake

--cmd derive.php

Could this be added to the documentation? It only mentions make_dark.php and make_undark.php currently: https://archive.org/services/docs/api/tasks.html#supported-tasks. Maybe also an example in the ia tasks help.

JustAnotherArchivist avatar Jun 03 '20 19:06 JustAnotherArchivist