pynsee icon indicating copy to clipboard operation
pynsee copied to clipboard

pynsee doesn't (always) clean tempfiles

Open tgrandje opened this issue 1 year ago • 12 comments

When running the tests for my actual PR, i've remarked pynsee is not (always ?) cleaning temporary files after it's finished. As a matter of fact, this has just filled the available space I still got on my windows/WSL partition.

For instance, that tmpfile is never cleaned afterwards.

tgrandje avatar Aug 07 '24 14:08 tgrandje

@hadrilec I think your in detail knowledge of INSEE-specific data is required to understand how long this tmpfiles actually need to last, please see my comment on the associated PR #206

tfardet avatar Aug 08 '24 08:08 tfardet

  • the functions using the save_df folder should delete the files they download as the data should be stored in the "pynsee folder" anyway thanks to the decorator.
  • data from _get_full_list_wfs should be kept for one python session
  • _dwn_idbank_files, _get_insee, _get_geo_list_simple, _get_geo_relation and _get_full_list_wfs are called only by a function using the save_df decorator
  • get_dataset_list uses the save_df decorator
  • get_last_release: data can be deleted

hadrilec avatar Aug 08 '24 15:08 hadrilec

Great, thanks!

functions using the save_df folder should delete the files

so for these, TemporaryFile, NamedTemporaryFile, or TemporaryDirectory should be used in a context manager

data from _get_full_list_wfs should be kept for one python session

so we need an atexit function (EDIT: no we don't, see below)

tfardet avatar Aug 08 '24 15:08 tfardet

well, actually for _get_full_list_wfs it is the same as for the others, as it is only called by get_geodata_list which is using save_df

hadrilec avatar Aug 08 '24 15:08 hadrilec

great, no atexit, then!

tfardet avatar Aug 08 '24 15:08 tfardet

I'll settle for removing the unnecessary tempfiles first :-) I've started it : it might (almost) be easier and (surely) cleaner than tracking the tempfiles which are opened in nested functions (and could therefore not really be handled through context managers).

tgrandje avatar Aug 09 '24 07:08 tgrandje

I'm not sure about _create_insee_folder. As far as I can see, this is to prevent an exception when permissions are currently too low to create an appdata folder. In that case, I'd say the tempdir should stay, and the lru_cache on _get_temp_dir() prevents creating multiple tempdir during a session.

What about cleaning that?

tgrandje avatar Aug 09 '24 07:08 tgrandje

Doesn't platformdirs basically guarantee that we'll have the rights to create the folder?

I wonder whether everything could not be replaced by platformdirs.user_cache_dir("pynsee", ensure_exists=True)

BTW: is there a reason why it's using a "pynsee" forlder inside the initial "pynsee" folder or should we clean that up?

tfardet avatar Aug 09 '24 08:08 tfardet

Doesn't platformdirs basically guarantee that we'll have the rights to create the folder?

I sure hope so, but I'm no expert on that. If there wasn't that comment, I would already have gone for the platformdirs.user_cache_dir("pynsee", ensure_exists=True)...

@hadrilec does that reminds you of anything?

EDIT: If we move to rely strictly on platformdirs, we can also get rid of the _get_temp_dir.py file: that call is the last one (I got rid of all the other to favor in-memory files instead of tempfiles).

tgrandje avatar Aug 09 '24 09:08 tgrandje

if platformdirs works as expected _get_temp_dir will not be used. As it is safe this way, I would prefer not to remove it because I dont want to debbug in case platformsdirs fails.

hadrilec avatar Aug 09 '24 09:08 hadrilec

BTW: is there a reason why it's using a "pynsee" forlder inside the initial "pynsee" folder or should we clean that up?

I suspect that's a typo. But if we don't want to break (too much) cache when upgrading pynsee, we should keep it that way.

tgrandje avatar Aug 09 '24 12:08 tgrandje

this is not a typo, back then this folder was in different path location it was more tidy this way. I would prefer not to change this feature.

hadrilec avatar Aug 09 '24 13:08 hadrilec

@tgrandje can we close this?

tfardet avatar Feb 21 '25 08:02 tfardet

yes

tgrandje avatar Feb 21 '25 10:02 tgrandje