internetarchive
WIP fix derive logic and redundant hash calc
This addresses https://github.com/jjjake/internetarchive/issues/253 and https://github.com/jjjake/internetarchive/issues/288
Counts all files (even those which might be skipped). This is consistent with the newly added behavior for file-level metadata.
BUG: if the last file is skipped, then a derive is never queued
Any idea how to solve this? Is there a way to queue a derive WITHOUT uploading a file? The easiest way would be to upload everything, skipping whatever needs to be skipped, and then, after everything has completed, simply queue a derive with a separate call. Then we could do away with this fragile file counting.
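The failure mode in the BUG note can be reproduced with a toy model. This is only a sketch: `plan_uploads` and `derive_queued` are hypothetical names, not functions from the internetarchive library, and the model assumes the derive flag is attached to the request for the final file in the list.

```python
def plan_uploads(files, skipped):
    """Toy model: flag derive on the request for the last file in the list.

    `files` is an ordered list of names; `skipped` is the set of names that
    will not be uploaded. Returns (name, queue_derive) pairs for the requests
    that are actually sent.
    """
    requests = []
    for index, name in enumerate(files):
        is_last = index == len(files) - 1
        if name in skipped:
            continue  # no request is sent, so its derive flag is lost
        requests.append((name, is_last))
    return requests


def derive_queued(requests):
    """Return True if any sent request carries the derive flag."""
    return any(flag for _, flag in requests)
```

With `files = ["a", "b", "c"]`, skipping `"b"` still queues a derive, but skipping `"c"` does not, which is the bug described above.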
Is there a way to queue a derive WITHOUT uploading a file?
Related: #252
This would be nice to have in general. I wrote a script for myself last year since I didn't want to click around on the web interface all the time as I'm using --no-derive to avoid hitting those two bugs you're trying to solve as well as wanting to be sure everything's fine before initiating the derive that can take a very long time. I'm not sure if that works for non-admin accounts though, and it should most definitely not be used as a basis for an implementation here as it emulates the website interaction, so I'm not going to link it here.
Yes, it's possible to queue a derive task without uploading anything:
$ ia tasks jj-test-2020-05-14 --cmd derive.php
Python:
>>> r = item.derive()
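As a rough sketch of how the separate-call approach could look in Python: upload with derive suppressed, then queue a single derive at the end. `upload_then_derive` is a hypothetical helper, not part of the library; it assumes `item` behaves like an `internetarchive` Item whose `upload()` accepts `queue_derive` and which exposes the `derive()` call shown above.

```python
def upload_then_derive(item, files, **upload_kwargs):
    """Upload files without queueing a derive, then queue one derive task.

    Hypothetical helper: `item` is assumed to provide `upload()` with a
    `queue_derive` keyword and a `derive()` method.
    """
    # Suppress the per-upload derive so skipped files cannot affect it.
    responses = item.upload(files, queue_derive=False, **upload_kwargs)
    # One extra request, sent after all uploads have finished.
    derive_response = item.derive()
    return responses, derive_response


# Possible usage with a real item (requires configured credentials):
# from internetarchive import get_item
# item = get_item("jj-test-2020-05-14")
# upload_then_derive(item, ["file.txt"])
```

The trade-off is one additional request to the tasks API per upload run.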
Great, I will use this instead of the counting logic.
I removed all counting logic and just call derive() at the end. Do you think that is an acceptable solution? The tests don't seem to like it, though. If you think we should choose this solution, I will have a look at the tests.
No, I think the derive task should still be queued in the upload request. I'd like to avoid submitting an extra request to another API if possible.
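One way to keep the derive inside the upload request is to attach the flag to the last file that is actually sent rather than the last file in the list. The sketch below is a toy model with hypothetical names (`plan_uploads_with_derive` is not a library function), and it assumes the set of skipped files is known before any request goes out, which may not hold in the real client.

```python
def plan_uploads_with_derive(files, skipped):
    """Flag derive on the last file that will really be uploaded.

    `files` is an ordered list of names; `skipped` is the set of names that
    will not be uploaded. Returns (name, queue_derive) pairs for the requests
    that are sent; the final pair always carries the derive flag.
    """
    to_send = [name for name in files if name not in skipped]
    last = len(to_send) - 1
    return [(name, index == last) for index, name in enumerate(to_send)]
```

If every file is skipped, no request is sent and no derive is queued, so that case would still need a decision.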
@jjjake
--cmd derive.php
Could this be added to the documentation? It only mentions make_dark.php and make_undark.php currently: https://archive.org/services/docs/api/tasks.html#supported-tasks. Maybe also an example in the ia tasks help.