[feature request] stop a harvest job in progress
Hello,
I ran into the issue described here: https://groups.google.com/g/dataverse-community/c/O-NdDtgFrI0/m/os_KjdLxAQAJ
The process worked, but maybe it would be a good idea to have a button on the harvest page that does the same thing? Or maybe an API to reset a stuck harvest?
Here is what I've done:
# manually get the harvest job's id:
select * from harvestingclient;
# fix the issue - where {ID} is the database id of the harvesting client.
UPDATE clientharvestrun SET harvestresult=0 WHERE harvestingclient_id={ID} AND harvestresult = 2;
UPDATE harvestingclient SET harvestingnow = FALSE WHERE id={ID};
# restart payara "just in case"
systemctl restart payara
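For anyone who wants to script this reset, the two UPDATE statements above could be wrapped in a single transaction. This is only a rough sketch, not something Dataverse provides: the function name `reset_stuck_harvest` and the helper `make_demo_db` are hypothetical, the demo runs against an in-memory SQLite database instead of PostgreSQL, and the tables contain only the columns mentioned above. With a PostgreSQL driver such as psycopg2 the placeholder would be `%s` rather than `?`.

```python
import sqlite3


def reset_stuck_harvest(conn, client_id):
    """Apply both UPDATEs from the workaround in one transaction.

    Returns the number of clientharvestrun rows that were changed.
    """
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "UPDATE clientharvestrun SET harvestresult = 0 "
            "WHERE harvestingclient_id = ? AND harvestresult = 2",
            (client_id,),
        )
        runs_fixed = cur.rowcount
        conn.execute(
            "UPDATE harvestingclient SET harvestingnow = 0 WHERE id = ?",
            (client_id,),
        )
    return runs_fixed


def make_demo_db():
    """Build a tiny stand-in schema (assumed, heavily simplified)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE harvestingclient (id INTEGER PRIMARY KEY, harvestingnow INTEGER)"
    )
    conn.execute(
        "CREATE TABLE clientharvestrun ("
        "id INTEGER PRIMARY KEY, harvestingclient_id INTEGER, harvestresult INTEGER)"
    )
    conn.execute("INSERT INTO harvestingclient VALUES (1, 1)")  # client marked as harvesting
    conn.execute("INSERT INTO clientharvestrun VALUES (10, 1, 2)")  # the stuck run
    conn.commit()
    return conn


if __name__ == "__main__":
    db = make_demo_db()
    print(reset_stuck_harvest(db, 1))
```

Doing both updates in one transaction means a failure halfway through leaves neither table changed.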
Take care,
Virgile
Thanks @virgilejarrige, and good to see you at some #dataverse2021 sessions this week. I think it's a good idea not to have this rely on a DB update, but we should also examine and fix the failure cases that result in this condition in the first place.
Hey Danny!
We had two cases in which this happened:
1 - The harvest of an entire data repository (Nakala), which worked but was so huge it made our PostgreSQL database too big for the VM it was in. As it's on our "all-in-one" test server, we had to use a snapshot to restore it. That was a noob mistake, but it would have been nice to be able to stop it. ;-)
2 - The harvest of an ahp collection, which was ended with the DB update; the dashboard now shows "SUCCESS; 0 harvested, 0 deleted, 0 failed." For this one, here are the parameters if you want to test it on your side:
URL: https://archives.ahp-numerique.fr/index.php/;oai
OAI Set: oai:archives.ahp-numerique.fr:ahpoai_406
Metadata Format: oai_dc
Schedule: None
Archive Type: Generic OAI archive
Archive URL: https://archives.ahp-numerique.fr
Sprint:
- this is an older one. Frequent request.
- Harvesting jobs can take a long time, and currently once one starts you are stuck waiting for it to end.
- This does not have an immediate solution.
- Solution is not likely complex.
Desired behavior is to stop the harvesting that is in progress as opposed to a pause state. The current workaround is restarting the app server, so providing a stop will be a large improvement.
- Small, e.g. implement a binary "stop" flag that the client checks after every dataset import.
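The "stop" flag idea can be sketched roughly as follows. This is an illustration in Python, not Dataverse code: `run_harvest`, `import_dataset` and `stop_requested` are hypothetical names, and the flag is modelled with a `threading.Event` that the loop checks between dataset imports. An import that is already running is allowed to finish, so the job stops at a clean boundary rather than being paused or killed.

```python
import threading


def run_harvest(dataset_ids, import_dataset, stop_requested):
    """Import datasets one by one, checking the stop flag between imports.

    Returns (number_imported, stopped_early).
    """
    imported = 0
    for dataset_id in dataset_ids:
        # Checking here is equivalent to checking after the previous import.
        if stop_requested.is_set():
            return imported, True
        import_dataset(dataset_id)
        imported += 1
    return imported, False


if __name__ == "__main__":
    stop = threading.Event()
    seen = []

    def fake_import(dataset_id):
        # Stand-in for the real import; requests a stop after two datasets.
        seen.append(dataset_id)
        if len(seen) == 2:
            stop.set()

    print(run_harvest(["a", "b", "c", "d"], fake_import, stop))
```

In a real service the flag would be set from elsewhere, for example by a button or an API call, while the harvest loop runs in its own thread.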
Sprint:
- orphaned in OnDeck in pm.sprint.2022_05_11
Sprint:
- pm.sprint.2022_05_25 ended WIP
Waiting on PR8753 to clear. Leonid was looking to work on this but this is not important. Phil noted that customers have noticed this in the field. Gustavo: let's push this to the following sprint.
Is this issue in scope as well?
- #7052