fastdup
fastdup copied to clipboard
[Bug]: Fastdup will create a copy of all images in the 'cdn' folder inside work_dir.
What happened?
When trying to run fastdup on a dataset, it ends up copying all these images to specific sub-directories inside a 'cdn' directory inside the work-dir specified. This becomes a challenge with disk storage and also a bottleneck when dealing with network volumes that have slow read/write speeds.
What did you expect to see?
Expected fastdup to not create copies of all images inside work-dir.
What version of fastdup were you runnning on?
2.3
What version of Python were you running on?
Python 3.10
Operating System
Ubuntu 22.04
Reproduction steps
- Download an image dataset.
- Run fastdup on this dataset using the command:
fd = fastdup.create(input_dir=f"{data_dir}/images/", work_dir=f"{data_dir}/work_dir")
fd.run()
- Navigate to the directory work_dir/cdn. This would contain subdirectories where all the images have been copied.
Relevant log output
No response
Attach a screenshot [Optional]
No response
Contact Details [Optional]
This is a valid concern. Thanks for reporting!
@shantanusingh16 we've released fastdup==2.5 which addressed this issue. Would you please update fastdup and see if this is still an issue?
Hey @dnth . I was able to verify that this problem is solved with fastdup==2.5. Thank you for the prompt fix!
Thanks for confirming again! I will close this issue.