pooch
`ExtractorProcessor.__call__()` does not retain the order given by the members filter
Description of the problem:
When I use an ExtractorProcessor (e.g. Unzip) and specify a list of files I want to extract, I would expect the extracted paths to be returned in the same order as that list.
Instead, ExtractorProcessor walks the extraction directory with os.walk(self.extract_dir) (https://github.com/fatiando/pooch/blob/5860444bd19565791eecdb24dd620cf60b848e11/pooch/processors.py#L126) and then filters the extracted files against the given members, so the paths come back in an arbitrary order.
I would like to suggest inverting the check: loop over the members (if any) and look each one up among the extracted files.
Also: why not use pathlib.Path?
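To make the suggestion concrete, here is a minimal sketch of the inverted check. It is not pooch's actual code: the function name `ordered_member_paths` is hypothetical, and it assumes that every entry in `members` is a file path relative to the extraction directory, written with forward slashes.

```python
import os
from pathlib import Path


def ordered_member_paths(extract_dir, members=None):
    """Return extracted file paths, in the order of `members` when it is given.

    Hypothetical helper for illustration; not part of pooch.
    """
    extract_dir = Path(extract_dir)
    # Index every extracted file once, keyed by its path relative to extract_dir.
    extracted = {}
    for root, _dirs, files in os.walk(extract_dir):
        for name in files:
            path = Path(root) / name
            extracted[path.relative_to(extract_dir).as_posix()] = str(path)
    if not members:
        # No filter given: return everything in a deterministic (sorted) order.
        return [extracted[key] for key in sorted(extracted)]
    # Loop over the members so the output follows the requested order.
    return [extracted[member] for member in members if member in extracted]
```

With this approach the directory is still walked only once, and the returned order no longer depends on how the file system happens to list its entries.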
Example
>>> pooch.Unzip(
...     extract_dir=".",
...     members=[
...         'E11.5/E11-5_DevCCF_Annotations_31.5um.nii.gz',
...         'E11.5/E11-5_MRI-adc_31.5um.nii.gz',
...     ])
Expected output
['/home/castoldi/brainglobe_workingdir/kim_dev_mouse/downloads/./E11.5/E11-5_DevCCF_Annotations_31.5um.nii.gz',
 '/home/castoldi/brainglobe_workingdir/kim_dev_mouse/downloads/./E11.5/E11-5_MRI-adc_31.5um.nii.gz']
Actual output
['/home/castoldi/brainglobe_workingdir/kim_dev_mouse/downloads/./E11.5/E11-5_MRI-adc_31.5um.nii.gz',
 '/home/castoldi/brainglobe_workingdir/kim_dev_mouse/downloads/./E11.5/E11-5_DevCCF_Annotations_31.5um.nii.gz']
System information
- Operating system: Ubuntu 24.04
- Python installation (Anaconda, system, ETS): system
- Version of Python: 3.12.3
- Version of this package: 1.8.2