scrapyd

Delete job from finished jobs

Open agauravdev opened this issue 2 years ago • 12 comments

I am maintaining a separate table with all the finished tasks in my Django project, so I would like to delete jobs from scrapyd's finished jobs list. That way I can move a task to my table and delete it from here. Is there a simpler solution to this? If not, I have already added this functionality and will open a PR soon.

agauravdev avatar Dec 14 '21 15:12 agauravdev

I'm not 100% sure I understand your use case. Take a look at finished_to_keep and jobstorage, which you could try by using the master branch.

mxdev88 avatar Dec 21 '21 06:12 mxdev88

@mxdev88 I have built a small internal tool in my company to track scraping tasks that ran on the server. But some tasks fail and we do not want to keep those tasks in the list of the finished tasks. So, I have written an API (with the web interface too) to delete a task from the finished list using the task id. Hope it's more clear now.

agauravdev avatar Dec 21 '21 12:12 agauravdev

I suppose an API endpoint deljob.json could be added similar to delversion for this purpose.

Personally, I think it could be useful. I would let the project maintainers comment on the idea. Feel free to submit a PR.
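For illustration, here is a rough sketch of what a client call to such an endpoint might look like. Note that deljob.json does not exist in scrapyd at this point in the thread; the endpoint name comes from the suggestion above, and the project/job parameter names are an assumption borrowed from cancel.json. The helper build_deljob_request is hypothetical, and it only builds the request without sending it.

```python
from urllib.parse import urlencode
from urllib.request import Request

SCRAPYD_URL = "http://localhost:6800"  # scrapyd's default port; adjust as needed


def build_deljob_request(project: str, job: str, base_url: str = SCRAPYD_URL) -> Request:
    """Build (but do not send) a POST to the hypothetical deljob.json endpoint.

    Assumption: the endpoint would take the same parameters as cancel.json
    (project, job). Nothing here is part of scrapyd's actual API.
    """
    body = urlencode({"project": project, "job": job}).encode("utf-8")
    return Request(f"{base_url}/deljob.json", data=body, method="POST")


# Example job id and project name are placeholders.
request = build_deljob_request("myproject", "6487ec79947edab326d6db28a2d86511")
print(request.get_method(), request.full_url)
```

If the endpoint were implemented this way, the request could be sent with urllib.request.urlopen(request) and the JSON response inspected for a status field, as with the existing endpoints.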

mxdev88 avatar Dec 22 '21 14:12 mxdev88

This seems reasonable!

jpmckinney avatar Dec 23 '21 19:12 jpmckinney

Coooll.. Will try to submit the PR this weekend. :D

agauravdev avatar Dec 24 '21 16:12 agauravdev

Has anyone created the PR?

javiersrf avatar Aug 22 '22 17:08 javiersrf

I wonder how this feature would be expected to behave if it is called for pending or running jobs. Should calling deljob behave as a cancel for pending and running jobs, or fail with an error message? There seems to be some overlap between the two features. Maybe deljob would supersede cancel, and cancel would eventually be deprecated. Any thoughts?

mxdev88 avatar Jan 31 '23 20:01 mxdev88

What state does a canceled job end up in?

I think there’s a semantic difference between canceling a running job and deleting a finished job. Only one of the two involves interrupting a process. (Similar to stopping vs removing a container.)

I would keep the APIs separate.

jpmckinney avatar Feb 01 '23 02:02 jpmckinney

Oh, I totally forgot about this PR. 🙇🏻‍♂️ I will try to finish it. I had written the code, but I have since changed laptops, so I will mostly have to write it again.

But I agree with @jpmckinney that it should be kept separate. Trying to delete a running process should return an error telling the caller to either cancel the process or let it finish. Thanks for making this issue active again 🙇🏻‍♂️

agauravdev avatar Feb 01 '23 04:02 agauravdev

> What state does a canceled job end up in?

Looking at the code it gets removed from the queue if pending or killed if running so no state; the job disappears.

> I think there’s a semantic difference between canceling a running job and deleting a finished job. Only one of the two involves interrupting a process. (Similar to stopping vs removing a container.)

Yep, fully agree on the semantic difference. I was just wondering because in the end the cancel removes the job as if it never existed, which is sort of a deletion.

> I would keep the APIs separate.

ok :)

mxdev88 avatar Feb 01 '23 19:02 mxdev88

Aha - presently the state change is the same, but I can imagine in the future that cancelling a running job puts it in the end-state "interrupted" rather than deleting it. (If we were to implement this in the future, then deljob could delete either interrupted or finished jobs.)
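To make the distinction concrete, here is a toy in-memory sketch of the semantics discussed in this thread. It is not scrapyd code: JobRegistry, JobStateError and the State enum are hypothetical names, and the "interrupted" state is the possible future end state mentioned above, not something scrapyd has today. In this model, cancel dequeues a pending job or interrupts a running one, while deljob only removes jobs that have already reached an end state.

```python
from enum import Enum


class State(Enum):
    PENDING = "pending"
    RUNNING = "running"
    FINISHED = "finished"
    INTERRUPTED = "interrupted"  # hypothetical end state, not in scrapyd today


class JobStateError(Exception):
    """Raised when an operation does not apply to the job's current state."""


class JobRegistry:
    """Toy stand-in for scrapyd's queue, launcher and job storage."""

    def __init__(self):
        self.jobs = {}  # job id -> State

    def cancel(self, job_id):
        state = self.jobs[job_id]
        if state is State.PENDING:
            del self.jobs[job_id]  # dequeued: it never ran, nothing to keep
        elif state is State.RUNNING:
            self.jobs[job_id] = State.INTERRUPTED  # process stopped, record kept
        else:
            raise JobStateError(f"{job_id} is {state.value}; nothing to cancel")

    def deljob(self, job_id):
        state = self.jobs[job_id]
        if state in (State.PENDING, State.RUNNING):
            raise JobStateError(f"{job_id} is {state.value}; cancel it or let it finish")
        del self.jobs[job_id]  # only end-state jobs are deleted


registry = JobRegistry()
registry.jobs.update({"a": State.PENDING, "b": State.RUNNING, "c": State.FINISHED})
registry.cancel("b")  # running -> interrupted
registry.deljob("b")  # interrupted jobs may be deleted
registry.deljob("c")  # finished jobs may be deleted
try:
    registry.deljob("a")  # pending jobs must be cancelled, not deleted
except JobStateError as error:
    print(error)
```

Keeping the two operations separate means deljob never has to interrupt a process, which matches the stop vs remove container analogy earlier in the thread.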

jpmckinney avatar Feb 01 '23 23:02 jpmckinney

Makes sense!

mxdev88 avatar Feb 02 '23 07:02 mxdev88