
Tensorflow Requirement

Open • varishtsg opened this issue on Oct 26, 2019 • 6 comments

I have tensorflow-gpu v1.10 installed on my system. When running the setup, I get this error:

```
Collecting tensorflow==2.0.0 (from imagededup)
  Could not find a version that satisfies the requirement tensorflow==2.0.0 (from imagededup)
  (from versions: 0.12.1, 1.0.0, 1.0.1, 1.1.0rc0, 1.1.0rc1, 1.1.0rc2, 1.1.0, 1.2.0rc0, 1.2.0rc1,
  1.2.0rc2, 1.2.0, 1.2.1, 1.3.0rc0, 1.3.0rc1, 1.3.0rc2, 1.3.0, 1.4.0rc0, 1.4.0rc1, 1.4.0, 1.4.1,
  1.5.0rc0, 1.5.0rc1, 1.5.0, 1.5.1, 1.6.0rc0, 1.6.0rc1, 1.6.0, 1.7.0rc0, 1.7.0rc1, 1.7.0, 1.7.1,
  1.8.0rc0, 1.8.0rc1, 1.8.0, 1.9.0rc0, 1.9.0rc1, 1.9.0rc2, 1.9.0, 1.10.0rc0, 1.10.0rc1, 1.10.0,
  1.10.1, 1.11.0rc0, 1.11.0rc1, 1.11.0rc2, 1.11.0, 1.12.0rc0, 1.12.0rc1, 1.12.0rc2, 1.12.0,
  1.12.2, 1.12.3, 1.13.0rc0, 1.13.0rc1, 1.13.0rc2, 1.13.1, 1.13.2, 1.14.0rc0, 1.14.0rc1, 1.14.0,
  2.0.0a0, 2.0.0b0, 2.0.0b1)
No matching distribution found for tensorflow==2.0.0 (from imagededup)
```

Is it necessary to install tensorflow v2.0?

varishtsg avatar Oct 26 '19 14:10 varishtsg

Hi, it's not! We will relax the requirements in the next release (see this PR #36). We will also see whether TF > 1.0 should be okay.
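For reference, relaxing the pin could look roughly like this in setup.py (a sketch only; the exact version bounds chosen in PR #36 may differ):

```python
# Sketch of a relaxed TensorFlow requirement in setup.py
# (illustrative only; the actual bounds used in PR #36 may differ)
install_requires = [
    'tensorflow>=1.0',
]
```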

datitran avatar Oct 29 '19 13:10 datitran

@varishtsg we just released our new version. Can you try again?

datitran avatar Oct 30 '19 16:10 datitran

I ended up removing my existing tensorflow install and installing 1.14 instead. After that I was able to install imagededup. Now I need to test it to see if it works well.
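A quick sanity check of the environment after the downgrade (just a sketch):

```python
# Quick environment check after downgrading TensorFlow (sketch)
import tensorflow as tf
import imagededup.methods  # confirms imagededup imports against this TF build

print(tf.__version__)  # should report 1.14.x after the downgrade
```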

varishtsg avatar Oct 31 '19 17:10 varishtsg

@datitran The CNN encoder worked fine the first time I ran it.

However, when using it on a directory with a large number of images, I'm getting this error:

```python
duplicates_list = cnn_encoder.find_duplicates(image_dir=desktop_directory, scores=True)
```

```
2019-10-31 23:32:08,777: INFO Start: Image encoding generation

AttributeError                            Traceback (most recent call last)
<ipython-input> in <module>
----> 1 duplicates_list = cnn_encoder.find_duplicates(image_dir=desktop_directory, scores=True)

~/.local/lib/python3.6/site-packages/imagededup/methods/cnn.py in find_duplicates(self, image_dir, encoding_map, min_similarity_threshold, scores, outfile)
    339                 min_similarity_threshold=min_similarity_threshold,
    340                 scores=scores,
--> 341                 outfile=outfile,
    342             )
    343         elif encoding_map:

~/.local/lib/python3.6/site-packages/imagededup/methods/cnn.py in _find_duplicates_dir(self, image_dir, min_similarity_threshold, scores, outfile)
    278         'image1_duplicate2.jpg'], 'image2.jpg':['image1_duplicate1.jpg',..], ..}
    279         """
--> 280         self.encode_images(image_dir=image_dir)
    281
    282         return self._find_duplicates_dict(

~/.local/lib/python3.6/site-packages/imagededup/methods/cnn.py in encode_images(self, image_dir)
    173             raise ValueError('Please provide a valid directory path!')
    174
--> 175         return self._get_cnn_features_batch(image_dir)
    176
    177     @staticmethod

~/.local/lib/python3.6/site-packages/imagededup/methods/cnn.py in _get_cnn_features_batch(self, image_dir)
     94
     95         feat_vec = self.model.predict_generator(
---> 96             self.data_generator, len(self.data_generator), verbose=self.verbose
     97         )
     98         self.logger.info('End: Image encoding generation')

~/.local/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py in predict_generator(self, generator, steps, callbacks, max_queue_size, workers, use_multiprocessing, verbose)
   1553         use_multiprocessing=use_multiprocessing,
   1554         verbose=verbose,
-> 1555         callbacks=callbacks)
   1556
   1557   def _validate_compile_param_for_distribution_strategy(

~/.local/lib/python3.6/site-packages/tensorflow/python/keras/engine/training_generator.py in model_iteration(model, data, steps_per_epoch, epochs, verbose, callbacks, validation_data, validation_steps, validation_freq, class_weight, max_queue_size, workers, use_multiprocessing, shuffle, initial_epoch, mode, batch_size, steps_name, **kwargs)
    330       # Epochs only apply to fit.
    331       callbacks.on_epoch_end(epoch, epoch_logs)
--> 332       progbar.on_epoch_end(epoch, epoch_logs)
    333
    334   # Recreate dataset iterator for the next epoch.

~/.local/lib/python3.6/site-packages/tensorflow/python/keras/callbacks.py in on_epoch_end(self, epoch, logs)
    779           self.log_values.append((k, logs[k]))
    780       if self.verbose:
--> 781         self.progbar.update(self.seen, self.log_values)
    782
    783

AttributeError: 'ProgbarLogger' object has no attribute 'log_values'
```

EDIT: The folder being used was actually empty. It would be great if you could add a check for this.
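In the meantime, a simple guard before calling the encoder works around this (a minimal sketch; `desktop_directory` is the same path variable as in the snippet above):

```python
from pathlib import Path

from imagededup.methods import CNN

# Guard against an empty folder before invoking the CNN encoder
# (sketch; desktop_directory is assumed to point to a valid directory)
image_dir = Path(desktop_directory)
if not any(p.is_file() for p in image_dir.iterdir()):
    raise ValueError(f'No files found in {image_dir}!')

cnn_encoder = CNN()
duplicates_list = cnn_encoder.find_duplicates(image_dir=str(image_dir), scores=True)
```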

varishtsg avatar Oct 31 '19 18:10 varishtsg

Good point. Will add it to the next release.

clennan avatar Nov 18 '19 08:11 clennan

Now that a recursive flag has been added to several methods through #104 (only available in dev at the moment), a comprehensive check for empty directories can be added.

This is the current behaviour for hashing:

  1. If there are no files in the directory, no error or warning is raised; the logic runs through and returns an empty dictionary.
  2. The above holds even if all files in the directory have a format that is not supported by the package.
  3. The behaviour is the same for encode_images and find_duplicates/find_duplicates_to_remove (see the sketch after this list).
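
For illustration, the hashing behaviour on an empty (or unsupported-format-only) directory currently looks roughly like this (a sketch; the directory path is hypothetical):

```python
from imagededup.methods import PHash

phasher = PHash()

# On an empty directory (or one containing only unsupported formats), this
# currently runs through silently and returns an empty dict instead of raising.
encodings = phasher.encode_images(image_dir='path/to/empty_dir')
print(encodings)  # {}

duplicates = phasher.find_duplicates(image_dir='path/to/empty_dir')
print(duplicates)  # {}
```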

This is the current behaviour for cnn:

  1. If there are no files in the directory, the code breaks.
  2. If all files in the directory have a format that is not supported by the package, encode_images runs through and returns an empty dictionary, but find_duplicates breaks.

This points to an inconsistency in the current implementation, which should be addressed by a fix for this issue (the behaviour needs to be the same across all deduplication methods).

Apart from the above inconsistency, support for recursive directories as implemented in #104 adds more considerations:

  1. With the recursive parameter set to False, only the provided image_dir needs to be checked for the presence of files (ignoring subdirectories). At this point, should the validity of file formats also be checked (to catch the case where none of the files have a supported format)? Currently, format-validation responsibility lies with imagededup/utils/image_utils.py/load_image, and spreading that responsibility across different functions does not seem like a good idea to me.
  2. With recursive set to True, should each subdirectory be visited and format checking applied to each? (A sketch of one possible check follows this list.)
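
For concreteness, such a check could look roughly like the sketch below (the helper name, signature, and placement are assumptions, not the current API):

```python
from pathlib import Path


def _validate_image_dir(image_dir, recursive: bool = False) -> None:
    """Hypothetical empty-directory check (sketch only).

    Raises if image_dir contains no files, descending into subdirectories
    when recursive is True. Format validation would still be left to
    imagededup/utils/image_utils.py/load_image.
    """
    pattern = '**/*' if recursive else '*'
    if not any(p.is_file() for p in Path(image_dir).glob(pattern)):
        raise ValueError(f'No files found in {image_dir}!')
```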

@datitran @clennan Would be happy to have your inputs.

tanujjain avatar Dec 22 '20 12:12 tanujjain