`ExtractorProcessor.__call__()` does not retain the order given by the members filter

Open carlocastoldi opened this issue 10 months ago • 1 comments

Description of the problem: When I use an ExtractorProcessor (e.g. Unzip) and specify a list of files I want to extract, I would expect the extracted paths to be returned in the same order as that list. Instead, ExtractorProcessor uses os.walk(self.extract_dir) https://github.com/fatiando/pooch/blob/5860444bd19565791eecdb24dd620cf60b848e11/pooch/processors.py#L126 and then filters the extracted files against the given members, so the paths come back in an arbitrary order.

I suggest inverting the check: loop over the members (if any) and look each one up among the extracted files.
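A minimal sketch of the inverted check, to make the idea concrete. This is not pooch's actual code: `list_extracted` is a hypothetical helper, it only handles members that are files (not directories), and sorting the unfiltered case is my own assumption for getting a deterministic result.

```python
import os
from pathlib import Path


def list_extracted(extract_dir, members=None):
    """Return extracted file paths, following the order of ``members`` if given.

    Hypothetical helper for illustration only; not part of pooch.
    """
    # Index every file under extract_dir by its POSIX-style path relative to it.
    extracted = {}
    for root, _, files in os.walk(extract_dir):
        for name in files:
            full = Path(root) / name
            extracted[full.relative_to(extract_dir).as_posix()] = str(full)
    if members is None:
        # No filter given: return everything, sorted for a deterministic order.
        return [extracted[key] for key in sorted(extracted)]
    # Iterate over members rather than over the os.walk results,
    # so the caller's order is the order of the returned list.
    return [extracted[member] for member in members if member in extracted]
```

With this shape, the order of the result depends only on `members`, not on the order in which the file system happens to list the directory.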

Also: why not use pathlib.Path?

Example

>>> pooch.Unzip(
...     extract_dir=".",
...     members=[
...         'E11.5/E11-5_DevCCF_Annotations_31.5um.nii.gz',
...         'E11.5/E11-5_MRI-adc_31.5um.nii.gz',
...     ],
... )

Expected output

['/home/castoldi/brainglobe_workingdir/kim_dev_mouse/downloads/./E11.5/E11-5_DevCCF_Annotations_31.5um.nii.gz',
 '/home/castoldi/brainglobe_workingdir/kim_dev_mouse/downloads/./E11.5/E11-5_MRI-adc_31.5um.nii.gz']

Actual output

['/home/castoldi/brainglobe_workingdir/kim_dev_mouse/downloads/./E11.5/E11-5_MRI-adc_31.5um.nii.gz',
 '/home/castoldi/brainglobe_workingdir/kim_dev_mouse/downloads/./E11.5/E11-5_DevCCF_Annotations_31.5um.nii.gz']

System information

  • Operating system: Ubuntu 24.04
  • Python installation (Anaconda, system, ETS): system
  • Version of Python: 3.12.3
  • Version of this package: 1.8.2

carlocastoldi avatar Jan 10 '25 23:01 carlocastoldi