
Managing Temporary Files

Open MohammadMahdiJavid opened this issue 2 years ago • 2 comments

Hi,

I'm running large crawls, but I noticed that temp files are not removed as time passes or as the crawl moves forward:

openwpm_profile_archive_{some random number} --> each about 2 GB or more

I was wondering whether I made a mistake in my experiments, or whether this feature is not implemented?

Thanks

MohammadMahdiJavid avatar Feb 02 '24 16:02 MohammadMahdiJavid

From a quick search, I can see the profile.tar being generated here: https://github.com/openwpm/OpenWPM/blob/f72e7ca1fc3edcc60b26c780c264176e1e384779/openwpm/browser_manager.py#L114-L134

It then gets used here: https://github.com/openwpm/OpenWPM/blob/f72e7ca1fc3edcc60b26c780c264176e1e384779/openwpm/deploy_browsers/deploy_firefox.py#L64-L73

It is never cleaned up. Since the recovery_tar is by definition generated by OpenWPM, OpenWPM should clean it up once the browser has been restored after a crash. Doing an os.remove and unsetting browser_params.recovery_tar after the restore seems reasonable.
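A minimal sketch of that cleanup step, under stated assumptions: `BrowserParamsStub` and `cleanup_recovery_tar` are hypothetical names used only for illustration, and the stub models nothing beyond a `recovery_tar` path attribute. It is not the actual OpenWPM code, only the "os.remove, then unset" idea.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class BrowserParamsStub:
    """Hypothetical stand-in for the browser params; only the field used here."""

    recovery_tar: Optional[Path] = None


def cleanup_recovery_tar(browser_params: BrowserParamsStub) -> bool:
    """Delete the recovery archive, if any, and unset the reference to it.

    Returns True if a file was actually removed.
    """
    tar_path = browser_params.recovery_tar
    if tar_path is None:
        return False
    removed = False
    try:
        os.remove(tar_path)
        removed = True
    except FileNotFoundError:
        # Already gone; nothing to delete, but still unset the reference.
        pass
    browser_params.recovery_tar = None
    return removed
```

Calling this only after the browser has been restored successfully keeps the archive available in case the restore itself fails and has to be retried.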

Do you have time to implement this?

vringar avatar Feb 07 '24 17:02 vringar

Hi, thanks for your time and the great insight.

https://github.com/openwpm/OpenWPM/blob/f72e7ca1fc3edcc60b26c780c264176e1e384779/openwpm/browser_manager.py#L221-L230

I see here that tempdir gets removed, although the variable name is quite unreadable :) and tempdir is the one used to create the directory:

https://github.com/openwpm/OpenWPM/blob/f72e7ca1fc3edcc60b26c780c264176e1e384779/openwpm/browser_manager.py#L116-L121

I think the issue comes from the profile dumping, since the directory gets removed when the spawn is successful. Looking further into the logs, I found various errors like:


  File "openwpm/commands/profile_commands.py", line 58, in dump_profile
    tar.add(browser_profile_path, arcname="")
    
  File "python3.9/tarfile.py", line 2172, in add
    self.add(os.path.join(name, f), os.path.join(arcname, f),
    
  File "python3.9/tarfile.py", line 2150, in add
    tarinfo = self.gettarinfo(name, arcname)
    
  File "python3.9/tarfile.py", line 2023, in gettarinfo
    statres = os.lstat(name)
    
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/firefox_profile_mp57p7k5/prefs-41.js'

or similar errors for other files like

prefs-41.js

storage.sqlite-journal

WebDriverBiDiServer.json

I was wondering: when the profile is being dumped, the previous browser has already crashed and closed, right? Does it maybe need a few seconds to remove temp files, or something like that?

I think this is the cause of the archived profiles not being removed.
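If the FileNotFoundError really is caused by files disappearing while the archive is being written, one option is a dump that skips vanished entries instead of aborting. The sketch below is only an illustration: `dump_profile_tolerant` is a hypothetical helper, not the `dump_profile` from profile_commands.py, and whether a profile restored without files such as prefs-41.js or storage.sqlite-journal is still usable is an assumption that would need checking.

```python
import os
import tarfile
from pathlib import Path
from typing import List


def dump_profile_tolerant(profile_dir: Path, tar_path: Path) -> List[str]:
    """Archive profile_dir into tar_path, skipping entries that vanish mid-walk.

    tar_path should live outside profile_dir. Returns the skipped paths.
    """
    skipped: List[str] = []
    with tarfile.open(tar_path, "w") as tar:
        for root, dirs, files in os.walk(profile_dir):
            for name in dirs + files:
                full = os.path.join(root, name)
                arcname = os.path.relpath(full, profile_dir)
                try:
                    # recursive=False: each entry is added on its own, so one
                    # missing file cannot abort the rest of the archive.
                    tar.add(full, arcname=arcname, recursive=False)
                except FileNotFoundError:
                    skipped.append(full)
    return skipped
```

Logging the returned list would show how often files disappear during a dump, which may help confirm or rule out the timing explanation.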

MohammadMahdiJavid avatar Feb 08 '24 21:02 MohammadMahdiJavid