fastdup [Bug]: RuntimeError: Fastdup execution failed

What happened?

imgs_embs_array is a NumPy array of image embeddings.

np.save(ip_dir+ip_file_name, imgs_embs_array)

from fastdup.engine import Fastdup
fd = Fastdup(input_dir=ip_dir)
imgs_embs_array_loaded = np.load(ip_dir+ip_file_name)
fd.run(embeddings=imgs_embs_array_loaded, annotations=annotations_df, overwrite=True)

2025-01-07 07:16:21 [FATAL] Failed to read any features
fastdup C++ error received:  2025-01-07 07:16:21 [FATAL] Failed to read any features
RuntimeError: Fastdup execution failed

What did you expect to see?

No response

What version of fastdup were you running on?

2.14

What version of Python were you running on?

Python 3.10

Operating System

[GCC 13.3.0]

Reproduction steps

No response

Relevant log output

No response

Attach a screenshot [Optional]

Screen Shot 2025-01-07 at 13 20 00 PM

Contact Details [Optional]

[email protected]

Jan 07 '25 07:01 rapidcrawler

Even with the code simplified as below, I still get the same error message:

np.save("./input_dir/img_embds_numpy.npy", imgs_embs_array)
import fastdup
fd = fastdup.create(work_dir="work_dir/", input_dir="input_dir/")
fd.run()

NoneType: None
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[66], line 4
      2 import fastdup
      3 fd = fastdup.create(work_dir="work_dir/", input_dir="input_dir/")
----> 4 fd.run()

File /opt/conda/lib/python3.10/site-packages/fastdup/engine.py:157, in Fastdup.run(self, input_dir, annotations, embeddings, subset, data_type, overwrite, model_path, distance, nearest_neighbors_k, threshold, outlier_percentile, num_threads, num_images, verbose, license, high_accuracy, cc_threshold, **kwargs)
    154     fastdup_func_params['model_path'] = model_path
    155 fastdup_func_params.update(kwargs)
--> 157 return super().run(annotations=annotations, input_dir=input_dir, subset=subset, data_type=data_type,
    158             overwrite=overwrite, embeddings=embeddings, **fastdup_func_params)

File /opt/conda/lib/python3.10/site-packages/fastdup/sentry.py:146, in v1_sentry_handler.<locals>.inner_function(*args, **kwargs)
    144     else:
    145         fastdup_capture_exception(f"V1:{func.__name__}", ex)
--> 146     raise ex
    148 except Exception as ex:
    149     fastdup_capture_exception(f"V1:{func.__name__}", ex)

File /opt/conda/lib/python3.10/site-packages/fastdup/sentry.py:137, in v1_sentry_handler.<locals>.inner_function(*args, **kwargs)
    135 try:
    136     start_time = time.time()
--> 137     ret = func(*args, **kwargs)
    138     fastdup_performance_capture(f"V1:{func.__name__}", start_time)
    139     return ret

File /opt/conda/lib/python3.10/site-packages/fastdup/fastdup_controller.py:618, in FastdupController.run(self, input_dir, annotations, subset, embeddings, data_type, overwrite, print_summary, print_vl_datasets_ref, run_explore, dataset_name, verbose, run_fast, **fastdup_kwargs)
    616 if not run_fast:
    617     if fastdup.run(fastdup_input, work_dir=str(self._work_dir), logger=self._logger, **fastdup_kwargs) != 0:
--> 618         raise RuntimeError('Fastdup execution failed')
    620     # post process - map fastdup-id to image (for bbox this is done in self._set_fastdup_input)
    621     if self._dtype == FD.IMG or self._run_mode == FD.MODE_CROP:

RuntimeError: Fastdup execution failed

Jan 07 '25 09:01 rapidcrawler

Hello @rapidcrawler. The proper way to save binary features so that fastdup can read them is the save_binary_feature call: https://visual-layer.readme.io/docs/v02xx-api#save_binary_feature

An example of loading features is here: https://colab.research.google.com/github/visual-layer/fastdup/blob/main/examples/feature_vectors.ipynb. You need to pass the embeddings to fastdup like this:

fd = fastdup.create(input_dir="images/", work_dir='output')  
fd.run(annotations=filenames, embeddings=feature_vec)

Please try it out and let us know if this works. BTW, to make debugging easier, please pass verbose=True to the run() call.

Jan 07 '25 12:01 dbickson

Thanks @dbickson, it's working now.

The general idea helped. Since I currently have only the image embeddings and no direct access to the images, I couldn't use save_binary_feature.

However, passing the available embeddings via fd.run(embeddings=np.array(embs)) worked. Updated working code below:

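import numpy as np
from fastdup.engine import Fastdup

# No image directory is available, so "/" is used as input_dir.
fd = Fastdup(input_dir="/")
fd.run(embeddings=np.array(embs),    # precomputed image embeddings
       annotations=annotations_df,
       overwrite=True)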
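Since the call above hands a raw array and an annotations table to fd.run(), a quick sanity check beforehand can rule out simple input problems. The sketch below is only an illustration: check_embeddings is a hypothetical helper (not part of fastdup), and the conditions it tests (a 2-D float array, finite values, one row per annotation) are assumptions about what a well-formed input looks like, not documented fastdup requirements.

```python
import numpy as np


def check_embeddings(embs, n_annotations):
    """Hypothetical helper: return a list of problems (empty list = checks passed)."""
    arr = np.asarray(embs)
    problems = []
    # Assumption: embeddings form a 2-D matrix, one row per image.
    if arr.ndim != 2:
        problems.append(f"expected a 2-D array, got {arr.ndim}-D")
        return problems
    # Assumption: each embedding row corresponds to one annotation row.
    if arr.shape[0] != n_annotations:
        problems.append(
            f"{arr.shape[0]} embedding rows vs {n_annotations} annotation rows"
        )
    if not np.issubdtype(arr.dtype, np.floating):
        problems.append(f"non-float dtype {arr.dtype}")
    elif not np.isfinite(arr).all():
        # Report (up to ten of) the rows that contain NaN or infinity.
        bad_rows = np.unique(np.argwhere(~np.isfinite(arr))[:, 0])
        problems.append(f"non-finite values in rows {bad_rows[:10].tolist()}")
    return problems
```

With the names from this thread, it could be called as check_embeddings(np.array(embs), len(annotations_df)) before fd.run(); an empty list means none of the assumed conditions were violated.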

Jan 07 '25 16:01 rapidcrawler

BTW @dbickson, is there any reason why the library returns an error if I pass more than 5k embeddings at a time?

For example, the code below slices the first 5,000 rows; it runs successfully and returns the expected result:

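from datetime import datetime as dt

import numpy as np
from fastdup.engine import Fastdup

start = dt.now()
fd = Fastdup(input_dir="/")

# Pass only the first 5,000 embeddings and their matching annotation rows.
fd.run(embeddings=np.array(embs)[:5000],
       annotations=annotations_df.head(5000),
       overwrite=True,
       verbose=True)

df_sim = fd.similarity()
end = dt.now()
df_sim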

However, if I increase the slice to 6k or 10k rows, it fails with the error below (RuntimeError: Fastdup execution failed):

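from datetime import datetime as dt

import numpy as np
from fastdup.engine import Fastdup

start = dt.now()
fd = Fastdup(input_dir="/")

# Same call as before, but with the first 6,000 rows; this one fails.
fd.run(embeddings=np.array(embs)[:6000],
       annotations=annotations_df.head(6000),
       overwrite=True,
       verbose=True)

df_sim = fd.similarity()
end = dt.now()
df_sim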


fastdup By Visual Layer, Inc. 2024. All rights reserved.

A fastdup dataset object was created!

Input directory is set to "/"
Work directory is set to "work_dir"

The next steps are:
   1. Analyze your dataset with the .run() function of the dataset object
   2. Interactively explore your data on your local machine with the .explore() function of the dataset object

For more information, use help(fastdup) or check our documentation https://docs.visual-layer.com/docs/getting-started-with-fastdup.

2025-01-07 18:05:25 [FATAL] Failed to read any features
NoneType: None
fastdup C++ error received:  2025-01-07 18:05:25 [FATAL] Failed to read any features
 

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[42], line 7
      2 from fastdup.engine import Fastdup
      4 fd = Fastdup(input_dir="/")
----> 7 fd.run(embeddings=np.array(embs)[:6000]
      8        , annotations=annotations_df.head(6000)
      9        , overwrite=True)
     11 df_sim  = fd.similarity()
     12 end = dt.now()

File /opt/conda/lib/python3.10/site-packages/fastdup/engine.py:157, in Fastdup.run(self, input_dir, annotations, embeddings, subset, data_type, overwrite, model_path, distance, nearest_neighbors_k, threshold, outlier_percentile, num_threads, num_images, verbose, license, high_accuracy, cc_threshold, **kwargs)
    154     fastdup_func_params['model_path'] = model_path
    155 fastdup_func_params.update(kwargs)
--> 157 return super().run(annotations=annotations, input_dir=input_dir, subset=subset, data_type=data_type,
    158             overwrite=overwrite, embeddings=embeddings, **fastdup_func_params)

File /opt/conda/lib/python3.10/site-packages/fastdup/sentry.py:146, in v1_sentry_handler.<locals>.inner_function(*args, **kwargs)
    144     else:
    145         fastdup_capture_exception(f"V1:{func.__name__}", ex)
--> 146     raise ex
    148 except Exception as ex:
    149     fastdup_capture_exception(f"V1:{func.__name__}", ex)

File /opt/conda/lib/python3.10/site-packages/fastdup/sentry.py:137, in v1_sentry_handler.<locals>.inner_function(*args, **kwargs)
    135 try:
    136     start_time = time.time()
--> 137     ret = func(*args, **kwargs)
    138     fastdup_performance_capture(f"V1:{func.__name__}", start_time)
    139     return ret

File /opt/conda/lib/python3.10/site-packages/fastdup/fastdup_controller.py:618, in FastdupController.run(self, input_dir, annotations, subset, embeddings, data_type, overwrite, print_summary, print_vl_datasets_ref, run_explore, dataset_name, verbose, run_fast, **fastdup_kwargs)
    616 if not run_fast:
    617     if fastdup.run(fastdup_input, work_dir=str(self._work_dir), logger=self._logger, **fastdup_kwargs) != 0:
--> 618         raise RuntimeError('Fastdup execution failed')
    620     # post process - map fastdup-id to image (for bbox this is done in self._set_fastdup_input)
    621     if self._dtype == FD.IMG or self._run_mode == FD.MODE_CROP:

RuntimeError: Fastdup execution failed

Jan 07 '25 18:01 rapidcrawler

Today it only runs if the embeddings are limited to about 500 rows. Anything more than 500 embeddings throws RuntimeError: Fastdup execution failed. Is there an API rate limit or internal throttling that reduces, day by day, the number of embeddings that can be processed?
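One way to pin down where the row-count cutoff sits is to bisect on the prefix length. The sketch below is a generic binary search, not anything fastdup provides: largest_passing_prefix and run_ok are hypothetical names, and it assumes that once a prefix of n rows fails, every longer prefix fails too. Given that the cutoff appears to have moved from about 5k to about 500 between runs, that assumption may not hold, so the result should be treated as a hint only.

```python
def largest_passing_prefix(run_ok, n_total):
    """Return the largest n in [0, n_total] for which run_ok(n) is true.

    Assumes monotonic failure: if n rows fail, any larger n fails as well.
    run_ok is a caller-supplied (hypothetical) callable taking a row count.
    """
    lo, hi = 0, n_total  # invariant: a prefix of lo rows passes; the answer is <= hi
    while lo < hi:
        mid = (lo + hi + 1) // 2  # round up so the loop always makes progress
        if run_ok(mid):
            lo = mid
        else:
            hi = mid - 1
    return lo
```

Here run_ok(n) could wrap the call from this thread, fd.run(embeddings=np.array(embs)[:n], annotations=annotations_df.head(n), overwrite=True), returning True on success and False when RuntimeError is raised. If the search lands on the same row every time, that row is worth inspecting; if the result drifts between runs, the cause is probably not a single bad row.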

Jan 08 '25 06:01 rapidcrawler

Hi @rapidcrawler, this is weird. Can you call run() with verbose=1 so we can see what the issue is?

Alternatively, you can use the v0.2 API: call fastdup.save_binary_feature(work_dir, file_list, embedding) to save the binary feature files to work_dir, and then fastdup.run(input_dir, work_dir, run_mode=2, threshold=0) to compute the similarities. The output will be at work_dir/similarity.csv.

Let us know if this worked for you.

Jan 09 '25 12:01 dbickson

Hi @rapidcrawler, did you manage to run it? Thanks.

Jan 16 '25 09:01 dbickson
