[Bug]: Fastdup will create a copy of all images in the 'cdn' folder inside work_dir.

Open shantanusingh16 opened this issue 1 year ago • 2 comments

What happened?

When I run fastdup on a dataset, it copies every image into sub-directories of a 'cdn' directory inside the specified work_dir. This consumes extra disk space and becomes a bottleneck on network volumes with slow read/write speeds.

What did you expect to see?

Expected fastdup to not create copies of all images inside work-dir.

What version of fastdup were you running on?

2.3

What version of Python were you running on?

Python 3.10

Operating System

Ubuntu 22.04

Reproduction steps

  1. Download an image dataset.
  2. Run fastdup on this dataset using the following code:
     import fastdup
     fd = fastdup.create(input_dir=f"{data_dir}/images/", work_dir=f"{data_dir}/work_dir")
     fd.run()
  3. Navigate to the directory work_dir/cdn. It contains subdirectories into which all the images have been copied.
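
To put a number on the copying described in the steps above, a small standard-library sketch like the following could count the files under work_dir/cdn and sum their sizes. It does not call fastdup at all; the helper name `summarize_dir` and the example path are hypothetical, and symlinks are skipped on the assumption that only real copies matter for disk usage.

```python
import os
from pathlib import Path


def summarize_dir(path):
    """Return (file_count, total_bytes) for regular files under path.

    Symlinked files are skipped, since they do not duplicate image data.
    A missing directory is reported as (0, 0).
    """
    root = Path(path)
    if not root.is_dir():
        return 0, 0
    count = 0
    total = 0
    # os.walk does not descend into symlinked directories by default.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            file_path = os.path.join(dirpath, name)
            if os.path.islink(file_path) or not os.path.isfile(file_path):
                continue
            count += 1
            total += os.path.getsize(file_path)
    return count, total


if __name__ == "__main__":
    data_dir = "/path/to/dataset"  # hypothetical; use your own dataset root
    n_files, n_bytes = summarize_dir(Path(data_dir) / "work_dir" / "cdn")
    print(f"cdn holds {n_files} files, {n_bytes / 1e6:.1f} MB")
```

Comparing this count with the number of files in the input directory would show whether every image was duplicated.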

Relevant log output

No response

Attach a screenshot [Optional]

No response

Contact Details [Optional]

[email protected]

shantanusingh16, Jun 21 '24 09:06

This is a valid concern. Thanks for reporting!

dnth, Jun 21 '24 17:06

@shantanusingh16 we've released fastdup==2.5 which addressed this issue. Would you please update fastdup and see if this is still an issue?
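
Before retesting, it may help to confirm which fastdup version is actually installed in the environment. The following is a minimal sketch using only the standard library; the helper names `version_tuple` and `has_fix` are hypothetical, and the parsing is deliberately simple (it compares leading numeric components only, so a pre-release such as "2.5rc1" would be treated like "2.5").

```python
from importlib import metadata


def version_tuple(version):
    """Turn '2.5' or '2.5.1' into a tuple of ints, stopping at non-numeric parts."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def has_fix(installed, minimum="2.5"):
    """Return True if the installed version is at least the minimum version."""
    return version_tuple(installed) >= version_tuple(minimum)


if __name__ == "__main__":
    try:
        installed = metadata.version("fastdup")
    except metadata.PackageNotFoundError:
        print("fastdup is not installed in this environment")
    else:
        print(f"fastdup {installed}; at least 2.5: {has_fix(installed)}")
```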

dnth, Jun 27 '24 04:06

Hey @dnth. I was able to verify that this problem is solved with fastdup==2.5. Thank you for the prompt fix!

shantanusingh16, Jul 05 '24 13:07

Thanks for confirming again! I will close this issue.

dnth, Jul 09 '24 05:07