ricecooker icon indicating copy to clipboard operation
ricecooker copied to clipboard

Ricecooker does not clean up temp files and directories

Open kollivier opened this issue 7 years ago • 2 comments

Description

Forgot to file this as an issue when I ran into it, and just made a mental note to look into it later. Basically, ricecooker (and hence chefs by default) do not clean up temp files and directories, which can cause them to bloat considerably over time.

In particular, create_predictable_zip creates a zip file in the temp folder that is never deleted. What makes this particularly problematic is that these temp files remain until the OS cleans them up, which on Mac appears to be upon reboot. While debugging a PraDigi issue, I ended up filling up my HDD space after a couple runs because each run would create several GB of temp files and leave them on my drive.

The simplest solution is to have ricecooker create a root temp directory on start, and provide access to this directory to chefs (e.g. ricecooker.get_temp_dir()) so that they can store all the temp files they create inside it. Then, even if the chef errors out, we wipe out this temp directory before exiting. (Maybe an @atexit handler?)

What I Did

Ran the PraDigi chef locally on OS X then inspect my temp files directory.

kollivier avatar Sep 13 '18 22:09 kollivier

Yeah this would be good to fix. There are two cases:

  1. "one-off" temp files that we don't want to keep [especially ones auto-generated by ricecooker utils]
  2. files created in pre_run that we do want to keep---e.g. intermediate states for files we download and process locally before uploading. These shouldn't be deleted atexit, but should be deleted if --update directive is given. Currently this --update logic has to be done by chef.

Related to 2., I opened this to maybe turn it into a general purpose thing: https://github.com/learningequality/ricecooker/issues/138

And something to watch out for RE --update logic, many of the chefs use "forever caching" like this: https://github.com/learningequality/sushi-chef-pradigi/blob/master/chef.py#L69-L78 This is very useful when debugging, but we need to make chefs in production have a more reasonable expiry logic—otherwise we'll never re-scrape sites.

ivanistheone avatar Oct 29 '18 13:10 ivanistheone

Here is some recent chef code that sets the TMPDIR env variable which tmpfile package respcts, and as a result all the chef's temporary files are put into chefdata/tmp.

https://github.com/learningequality/sushi-chef-african-storybook/blob/master/chef.py#L31-L34

I don't see any blockers for this being the default behaviour for ALL chefs (implemented in ricecooker), possibly with a way to override somehow if chef author needs tmp files to go elsewhere...

The location chefdata/tmp seems like a good fit but open for other names?

Note also there are several places in the ricecooker code that don't delete tmp files, so now would be a good time to fixe those with finally blocks or other cleanups.

ivanistheone avatar Apr 19 '20 16:04 ivanistheone

Fixed in https://github.com/learningequality/ricecooker/pull/268

rtibbles avatar Jun 15 '25 03:06 rtibbles