imagededup
Tensorflow Requirement
I have tensorflow-gpu v1.10 installed on my system. When running the setup I get this error:
```
Collecting tensorflow==2.0.0 (from imagededup)
  Could not find a version that satisfies the requirement tensorflow==2.0.0 (from imagededup)
  (from versions: 0.12.1, 1.0.0, 1.0.1, 1.1.0rc0, 1.1.0rc1, 1.1.0rc2, 1.1.0, 1.2.0rc0, 1.2.0rc1,
  1.2.0rc2, 1.2.0, 1.2.1, 1.3.0rc0, 1.3.0rc1, 1.3.0rc2, 1.3.0, 1.4.0rc0, 1.4.0rc1, 1.4.0, 1.4.1,
  1.5.0rc0, 1.5.0rc1, 1.5.0, 1.5.1, 1.6.0rc0, 1.6.0rc1, 1.6.0, 1.7.0rc0, 1.7.0rc1, 1.7.0, 1.7.1,
  1.8.0rc0, 1.8.0rc1, 1.8.0, 1.9.0rc0, 1.9.0rc1, 1.9.0rc2, 1.9.0, 1.10.0rc0, 1.10.0rc1, 1.10.0,
  1.10.1, 1.11.0rc0, 1.11.0rc1, 1.11.0rc2, 1.11.0, 1.12.0rc0, 1.12.0rc1, 1.12.0rc2, 1.12.0,
  1.12.2, 1.12.3, 1.13.0rc0, 1.13.0rc1, 1.13.0rc2, 1.13.1, 1.13.2, 1.14.0rc0, 1.14.0rc1, 1.14.0,
  2.0.0a0, 2.0.0b0, 2.0.0b1)
No matching distribution found for tensorflow==2.0.0 (from imagededup)
```
Is it necessary to install tensorflow v2.0?
Hi, it's not! We will relax the requirements in the next release (see this PR #36). We will also check whether TF >1.0 works as well.
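For reference, a hypothetical `setup.py` fragment showing what a relaxed pin could look like (the version bounds here are an assumption for illustration, not what will actually ship):

```python
# Hypothetical setup.py fragment: replace the hard tensorflow==2.0.0 pin
# with a range so existing 1.x installs are accepted as well.
from setuptools import find_packages, setup

setup(
    name='imagededup',
    packages=find_packages(),
    install_requires=[
        'tensorflow>=1.14,<2.1',  # assumed bounds, for illustration only
        # ... remaining dependencies unchanged ...
    ],
)
```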
@varishtsg we just released our new version. Can you try again?
I ended up removing my existing tensorflow install and installing 1.14 instead. After that I was able to install imagededup. Now I need to test it to see if it works well.
@datitran the cnn encoder worked fine the first time I ran it.
However, when using it on a directory with a large number of images, I get this error:
duplicates_list = cnn_encoder.find_duplicates(image_dir=desktop_directory, scores=True)
```
2019-10-31 23:32:08,777: INFO Start: Image encoding generation

AttributeError                            Traceback (most recent call last)
~/.local/lib/python3.6/site-packages/imagededup/methods/cnn.py in find_duplicates(self, image_dir, encoding_map, min_similarity_threshold, scores, outfile)
    339                 min_similarity_threshold=min_similarity_threshold,
    340                 scores=scores,
--> 341                 outfile=outfile,
    342             )
    343         elif encoding_map:

~/.local/lib/python3.6/site-packages/imagededup/methods/cnn.py in _find_duplicates_dir(self, image_dir, min_similarity_threshold, scores, outfile)
    278         'image1_duplicate2.jpg'], 'image2.jpg':['image1_duplicate1.jpg',..], ..}
    279         """
--> 280         self.encode_images(image_dir=image_dir)
    281
    282         return self._find_duplicates_dict(

~/.local/lib/python3.6/site-packages/imagededup/methods/cnn.py in encode_images(self, image_dir)
    173             raise ValueError('Please provide a valid directory path!')
    174
--> 175         return self._get_cnn_features_batch(image_dir)
    176
    177     @staticmethod

~/.local/lib/python3.6/site-packages/imagededup/methods/cnn.py in _get_cnn_features_batch(self, image_dir)
     94
     95         feat_vec = self.model.predict_generator(
---> 96             self.data_generator, len(self.data_generator), verbose=self.verbose
     97         )
     98         self.logger.info('End: Image encoding generation')

~/.local/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py in predict_generator(self, generator, steps, callbacks, max_queue_size, workers, use_multiprocessing, verbose)
   1553         use_multiprocessing=use_multiprocessing,
   1554         verbose=verbose,
-> 1555         callbacks=callbacks)
   1556
   1557   def _validate_compile_param_for_distribution_strategy(

~/.local/lib/python3.6/site-packages/tensorflow/python/keras/engine/training_generator.py in model_iteration(model, data, steps_per_epoch, epochs, verbose, callbacks, validation_data, validation_steps, validation_freq, class_weight, max_queue_size, workers, use_multiprocessing, shuffle, initial_epoch, mode, batch_size, steps_name, **kwargs)
    330       # Epochs only apply to `fit`.
    331       callbacks.on_epoch_end(epoch, epoch_logs)
--> 332       progbar.on_epoch_end(epoch, epoch_logs)
    333
    334   # Recreate dataset iterator for the next epoch.

~/.local/lib/python3.6/site-packages/tensorflow/python/keras/callbacks.py in on_epoch_end(self, epoch, logs)
    779           self.log_values.append((k, logs[k]))
    780         if self.verbose:
--> 781           self.progbar.update(self.seen, self.log_values)
    782
    783

AttributeError: 'ProgbarLogger' object has no attribute 'log_values'
```
EDIT: The folder being used was actually empty. It would be great if you could add a check for this.
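Until such a check lands in the package, a minimal caller-side guard could look like this (a sketch; `desktop_directory` is a placeholder for the path used in the call above):

```python
from pathlib import Path

from imagededup.methods import CNN

desktop_directory = '/path/to/images'  # placeholder for the directory used above

# Fail early with a clear message instead of hitting the ProgbarLogger
# AttributeError when the directory contains no files.
if not any(p.is_file() for p in Path(desktop_directory).iterdir()):
    raise ValueError(f'No files found in {desktop_directory}')

cnn_encoder = CNN()
duplicates_list = cnn_encoder.find_duplicates(image_dir=desktop_directory, scores=True)
```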
Good point. Will add it to the next release.
Now that a recursive flag has been added to several methods through #104 (only available in dev at the moment), a comprehensive check for empty directories can be added.
This is the current behaviour for hashing:
- If there are no files in the directory, no error/warning is raised, the logic runs through and returns an empty dictionary.
- The above holds even if all files in the directory have a format that is not supported by the package.
- The behaviour is the same for `encode_images`, `find_duplicates` and `find_duplicates_to_remove` (see the sketch after this list).
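For illustration, the hashing behaviour above reproduces as follows (a sketch; `empty_dir/` is assumed to be an empty directory, or one containing only unsupported formats):

```python
from imagededup.methods import PHash

phash = PHash()

# No error or warning is raised; both calls simply return an empty dictionary.
encodings = phash.encode_images(image_dir='empty_dir/')
print(encodings)   # {}

duplicates = phash.find_duplicates(image_dir='empty_dir/')
print(duplicates)  # {}
```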
This is the current behaviour for cnn:
- If there are no files in the directory, the code breaks.
- If all files in the directory have a format that is not supported by the package, `encode_images` runs through and returns an empty dictionary, but `find_duplicates` breaks.
This points to an inconsistency in the current implementation and should be addressed by a fix to this issue (the behaviour needs to be the same across all deduplication methods).
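One way to make the behaviour uniform would be a single validation step shared by the hashing and cnn code paths; a sketch only (the helper name and the error message are made up here, not the actual implementation):

```python
from pathlib import Path


def _check_directory_files(image_dir: Path) -> None:
    """Raise a consistent error when a directory has nothing to encode.

    Shared by the hashing and cnn code paths, so that an empty directory
    (or one containing only unsupported formats, once format checking is
    decided on) fails the same way everywhere, instead of silently
    returning an empty dictionary in one method and crashing in another.
    """
    if not any(p.is_file() for p in image_dir.iterdir()):
        raise ValueError(f'No files found in the directory: {image_dir}')
```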
Apart from the above inconsistency, the support for recursive directories implemented in #104 adds more considerations:
- With the recursive parameter set to False, only the `image_dir` provided needs to be checked for the presence of files (ignoring the subdirectories). At this point, should the validity of file formats also be checked (to catch the case where none of the files adhere to supported formats)? Currently, format validation responsibility lies with imagededup/utils/image_utils.py/load_image. Spreading the same responsibility across different functions does not seem like a good idea to me.
- With recursive set to True, should each subdirectory be visited and format checking applied to each? (See the sketch below.)
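To make the options concrete, here is a sketch of what a recursive-aware file listing with optional format checking might look like (the helper name and the extension set are assumptions; actual format validation currently lives in `load_image`):

```python
from pathlib import Path
from typing import List

# Assumed extension set, for illustration only.
SUPPORTED_EXTENSIONS = {'.jpg', '.jpeg', '.png', '.bmp'}


def list_image_files(image_dir: Path, recursive: bool = False,
                     check_format: bool = False) -> List[Path]:
    """Return candidate files under image_dir.

    With recursive=False only the top-level directory is scanned; with
    recursive=True every subdirectory is visited as well. Enabling
    check_format here would duplicate the responsibility that currently
    sits with imagededup/utils/image_utils.py/load_image.
    """
    paths = image_dir.rglob('*') if recursive else image_dir.glob('*')
    files = [p for p in paths if p.is_file()]
    if check_format:
        files = [p for p in files if p.suffix.lower() in SUPPORTED_EXTENSIONS]
    return files
```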
@datitran @clennan Would be happy to have your inputs.