
Preserve mtimes of files extracted from zip archives

Open zuzzurro opened this issue 3 years ago • 12 comments

Importing a disc from a zip file with the importadded plugin enabled results in file times that do not match those stored in the zip archive; the files get the current time instead. Repeating the same import on a folder extracted from the zip file with the unzip tool works fine.

I'm using 1.6.0 on Fedora 36 (self-compiled, as the distro still has 1.4.9).

I can provide more information, but the issue is quite easy to reproduce as I explained above.

This is the relevant section of my config file:

importadded:
  preserve_mtimes: yes
  preserve_write_mtimes: yes

zuzzurro avatar Jun 29 '22 16:06 zuzzurro

Interesting! I think the first step to understanding this would be to know whether the library we use to extract these files, namely the ZipFile.extractall method from Python's zipfile module, can preserve mtimes. If it does, then we would need to find where we're discarding those. If it doesn't, then we may be out of luck. Any chance you'd be able to investigate?

sampsyo avatar Jun 29 '22 21:06 sampsyo

Mmmh. It doesn't seem to work properly. I tested it by running:

python -m zipfile -e zz.zip zz/

and the extracted files carry the current time, not the time stored in the archive.
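
The same check can be scripted. This is only a minimal sketch: it builds a throwaway archive in a temporary directory, and the function name `extracted_vs_stored_mtime` and the member name `track.txt` are made up for illustration.

```python
import os
import tempfile
import time
import zipfile


def extracted_vs_stored_mtime():
    """Return (stored, extracted) timestamps for one member after extractall."""
    with tempfile.TemporaryDirectory() as tmp:
        archive_path = os.path.join(tmp, "zz.zip")
        # Write a member whose stored timestamp is well in the past.
        info = zipfile.ZipInfo("track.txt", date_time=(2010, 1, 2, 3, 4, 6))
        with zipfile.ZipFile(archive_path, "w") as zf:
            zf.writestr(info, "dummy")

        out_dir = os.path.join(tmp, "zz")
        with zipfile.ZipFile(archive_path) as zf:
            zf.extractall(out_dir)
            # date_time is a naive 6-tuple; interpret it as local time.
            stored = time.mktime(zf.getinfo("track.txt").date_time + (0, 0, -1))

        # Read the mtime before the temporary directory is cleaned up.
        extracted = os.path.getmtime(os.path.join(out_dir, "track.txt"))
        return stored, extracted


stored, extracted = extracted_vs_stored_mtime()
print("stored in archive:       ", time.ctime(stored))
print("on disk after extractall:", time.ctime(extracted))
```

If the two printed times are years apart, extractall did not carry the stored timestamp over to the extracted file.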

zuzzurro avatar Jun 30 '22 08:06 zuzzurro

I found the following, so it seems to be a limitation of Python's implementation of zip extraction:

https://stackoverflow.com/questions/9813243/extract-files-from-zip-file-and-retain-mod-date

arogl avatar Jun 30 '22 11:06 arogl

Would it be possible to adopt one of the proposed solutions? Since it's for internal use, I don't see much that could go wrong with setting the time by hand after extraction.

zuzzurro avatar Jun 30 '22 12:06 zuzzurro

Nice find, @arogl! It seems technically possible, but somewhat annoying to implement because we can no longer just use that extractall function… but I'll mark this as a feature request in case anyone is interested in giving it a shot.
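
For anyone picking this up, one possible shape of the per-member approach is sketched below. It is not beets code: the function name `extract_preserving_mtimes` is hypothetical, and it assumes the archive stores local-time timestamps, which is how the zip format is usually written.

```python
import os
import time
import zipfile


def extract_preserving_mtimes(archive_path, dest_dir):
    """Extract a zip archive member by member, restoring each file's stored mtime."""
    with zipfile.ZipFile(archive_path) as zf:
        for info in zf.infolist():
            # ZipFile.extract returns the (sanitized) path it actually wrote.
            target = zf.extract(info, dest_dir)
            if info.is_dir():
                # Directory mtimes change again as files are added, so skip them.
                continue
            # Pad the 6-tuple to the 9-tuple mktime expects; tm_isdst=-1
            # lets the C library decide whether DST applies.
            mtime = time.mktime(info.date_time + (0, 0, -1))
            os.utime(target, (mtime, mtime))
```

Using the path returned by `extract` avoids rebuilding the destination path by hand, which matters when member names get sanitized during extraction.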

sampsyo avatar Jun 30 '22 13:06 sampsyo

I haven't tested whether this works for RAR files, but since RAR unpacking is implemented by directly calling an external utility (unrar), maybe it does.

zuzzurro avatar Jun 30 '22 14:06 zuzzurro

I will try to look at all file extraction from archives over the weekend.

I was thinking of applying the time setting only when the preserve options are enabled.

arogl avatar Jun 30 '22 22:06 arogl

@sampsyo

Could this work?

In importer.py#L1080 add


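# From here:
# https://stackoverflow.com/questions/9813243/extract-files-from-zip-file-and-retain-mod-date
# fixing #4392

# needed only if the module does not already import them
import os
import time

def RestoreTimestampsOfArchiveContents(archive, extract_dir):
    # `archive` is the open zipfile.ZipFile object, not a file name
    for f in archive.infolist():
        # path to this extracted item
        fullpath = os.path.join(extract_dir, f.filename)
        # ZipInfo.date_time is a 6-tuple; pad it to the 9-tuple that
        # time.mktime expects (tm_isdst=-1 lets it work out DST)
        date_time = time.mktime(f.date_time + (0, 0, -1))
        # extraction stamps the item with the current time, so reset
        # both atime and mtime to the archived value
        os.utime(fullpath, (date_time, date_time))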

Then at importer.py#L1093:

if (config['preserve_mtimes'].get(bool)):
    RestoreTimestampsOfArchiveContents(archive, extract_to)

I have not thought much about whether the example is Python 2 or Python 3, nor about the config check.

arogl avatar Jul 01 '22 08:07 arogl

Yes, something like this could work! With the caveat that the preserve_mtimes option is located within the importadded configuration—not at the top level of config.

sampsyo avatar Jul 01 '22 11:07 sampsyo

By doing it this way are we setting the times also in the cases where they are already set by the unarchiver? Just curious.

zuzzurro avatar Jul 01 '22 13:07 zuzzurro

By doing it this way are we setting the times also in the cases where they are already set by the unarchiver? Just curious.

At the moment, yes: it runs on every extraction, regardless of archive type.

Further testing to be done
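
One way to limit this to zip archives is sketched below. It is only an illustration, not the pushed change: the function name `restore_zip_mtimes` is hypothetical, and it assumes the open archive object is passed in, as in the snippet above. Python's tarfile module already restores mtimes on extraction, so only `zipfile.ZipFile` objects are handled here.

```python
import os
import time
import zipfile


def restore_zip_mtimes(archive, extract_dir):
    """Restore stored mtimes for zip archives only; return the number of files touched."""
    if not isinstance(archive, zipfile.ZipFile):
        # Other archive types are left alone in this sketch.
        return 0
    restored = 0
    for info in archive.infolist():
        path = os.path.join(extract_dir, info.filename)
        # Skip directory entries and anything that was not extracted.
        if info.is_dir() or not os.path.isfile(path):
            continue
        mtime = time.mktime(info.date_time + (0, 0, -1))
        os.utime(path, (mtime, mtime))
        restored += 1
    return restored
```

The return value is just a convenience for testing; a caller that only needs the side effect can ignore it.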

arogl avatar Jul 01 '22 21:07 arogl

Initial change pushed in #4396.

arogl avatar Jul 02 '22 06:07 arogl