fastdup
[Bug]: RuntimeError: Fastdup execution failed
What happened?
imgs_embs_array is a NumPy array of image embeddings.
import numpy as np
np.save(ip_dir+ip_file_name, imgs_embs_array)
from fastdup.engine import Fastdup
fd = Fastdup(input_dir=ip_dir)
imgs_embs_array_loaded = np.load(ip_dir+ip_file_name)
fd.run(embeddings=imgs_embs_array_loaded, annotations=annotations_df, overwrite=True)
2025-01-07 07:16:21 [FATAL] Failed to read any features
fastdup C++ error received: 2025-01-07 07:16:21 [FATAL] Failed to read any features
RuntimeError: Fastdup execution failed
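One detail in the reproduction above is worth ruling out: np.save appends ".npy" to a file name that lacks that suffix, while np.load uses the path exactly as given, so ip_dir+ip_file_name only round-trips when the name already ends in ".npy" (plain string concatenation also silently drops a missing "/"). This is unlikely to explain the fastdup error on its own, but here is a minimal sketch of a safer round-trip. save_and_reload is a hypothetical helper, and the 2-D/float32 checks are assumptions about what an embedding reader expects, not documented fastdup requirements.

```python
import os
import tempfile

import numpy as np


def save_and_reload(embeddings, directory, name):
    """Hypothetical helper: save an (n, dim) embedding matrix as .npy and load it back."""
    # np.save appends ".npy" when the name lacks it, but np.load does not,
    # so normalise the suffix once and use the same path for both calls.
    if not name.endswith(".npy"):
        name += ".npy"
    path = os.path.join(directory, name)  # avoids the missing-"/" pitfall of dir + name
    np.save(path, np.asarray(embeddings))
    loaded = np.load(path)
    # Assumed sanity checks, not fastdup-documented requirements.
    if loaded.ndim != 2:
        raise ValueError(f"expected an (n, dim) matrix, got shape {loaded.shape}")
    return loaded.astype(np.float32, copy=False)


if __name__ == "__main__":
    embs = np.random.rand(4, 8).astype(np.float32)
    with tempfile.TemporaryDirectory() as d:
        out = save_and_reload(embs, d, "img_embds")
        print(out.shape, os.listdir(d))  # (4, 8) ['img_embds.npy']
```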
What did you expect to see?
No response
What version of fastdup were you running on?
2.14
What version of Python were you running on?
Python 3.10
Operating System
[GCC 13.3.0]
Reproduction steps
No response
Relevant log output
No response
Attach a screenshot [Optional]
Contact Details [Optional]
Even with the simplified code below, I still get the same error message:
import numpy as np
np.save("./input_dir/img_embds_numpy.npy", imgs_embs_array)
import fastdup
fd = fastdup.create(work_dir="work_dir/", input_dir="input_dir/")
fd.run()
NoneType: None
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
Cell In[66], line 4
2 import fastdup
3 fd = fastdup.create(work_dir="work_dir/", input_dir="input_dir/")
----> 4 fd.run()
File /opt/conda/lib/python3.10/site-packages/fastdup/engine.py:157, in Fastdup.run(self, input_dir, annotations, embeddings, subset, data_type, overwrite, model_path, distance, nearest_neighbors_k, threshold, outlier_percentile, num_threads, num_images, verbose, license, high_accuracy, cc_threshold, **kwargs)
154 fastdup_func_params['model_path'] = model_path
155 fastdup_func_params.update(kwargs)
--> 157 return super().run(annotations=annotations, input_dir=input_dir, subset=subset, data_type=data_type,
158 overwrite=overwrite, embeddings=embeddings, **fastdup_func_params)
File /opt/conda/lib/python3.10/site-packages/fastdup/sentry.py:146, in v1_sentry_handler.<locals>.inner_function(*args, **kwargs)
144 else:
145 fastdup_capture_exception(f"V1:{func.__name__}", ex)
--> 146 raise ex
148 except Exception as ex:
149 fastdup_capture_exception(f"V1:{func.__name__}", ex)
File /opt/conda/lib/python3.10/site-packages/fastdup/sentry.py:137, in v1_sentry_handler.<locals>.inner_function(*args, **kwargs)
135 try:
136 start_time = time.time()
--> 137 ret = func(*args, **kwargs)
138 fastdup_performance_capture(f"V1:{func.__name__}", start_time)
139 return ret
File /opt/conda/lib/python3.10/site-packages/fastdup/fastdup_controller.py:618, in FastdupController.run(self, input_dir, annotations, subset, embeddings, data_type, overwrite, print_summary, print_vl_datasets_ref, run_explore, dataset_name, verbose, run_fast, **fastdup_kwargs)
616 if not run_fast:
617 if fastdup.run(fastdup_input, work_dir=str(self._work_dir), logger=self._logger, **fastdup_kwargs) != 0:
--> 618 raise RuntimeError('Fastdup execution failed')
620 # post process - map fastdup-id to image (for bbox this is done in self._set_fastdup_input)
621 if self._dtype == FD.IMG or self._run_mode == FD.MODE_CROP:
RuntimeError: Fastdup execution failed
Hello @rapidcrawler, the proper way to save binary features so that fastdup can read them is the save_binary_feature call: https://visual-layer.readme.io/docs/v02xx-api#save_binary_feature
An example of loading the features is here: https://colab.research.google.com/github/visual-layer/fastdup/blob/main/examples/feature_vectors.ipynb
You need to pass the embeddings to fastdup directly so it can read them, like this:
fd = fastdup.create(input_dir="images/", work_dir='output')
fd.run(annotations=filenames, embeddings=feature_vec)
Please try it out and let us know if this works. BTW, for easier debugging, please pass verbose=True to the run() call.
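The call fd.run(annotations=filenames, embeddings=feature_vec) assumes that row i of the embedding matrix belongs to the i-th file name. Here is a minimal pre-flight check for that pairing. check_alignment is a hypothetical helper, not part of fastdup, and converting to a C-contiguous float32 matrix is a conservative assumption about what the C++ backend accepts rather than documented behaviour.

```python
import numpy as np


def check_alignment(filenames, embeddings):
    """Hypothetical check that filenames and embedding rows pair up one-to-one."""
    # Stack a list of vectors (or an existing array) into one C-contiguous float32 matrix.
    feats = np.ascontiguousarray(embeddings, dtype=np.float32)
    if feats.ndim != 2:
        raise ValueError(f"embeddings must be 2-D (n, dim), got shape {feats.shape}")
    if len(filenames) != feats.shape[0]:
        raise ValueError(f"{len(filenames)} filenames but {feats.shape[0]} embedding rows")
    if len(set(filenames)) != len(filenames):
        raise ValueError("duplicate filenames would make rows ambiguous")
    return feats
```

The returned matrix would then be passed as embeddings= and the unchanged filenames as annotations=.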
Thanks @dbickson, it's working now.
The general idea helped. Since I currently have access only to the image embeddings, not the images themselves, I couldn't use save_binary_feature.
But passing the available embeddings via fd.run(embeddings=np.array(embs)) worked. Updated working code below:
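import numpy as np
from fastdup.engine import Fastdup

# embs: list of embedding vectors; annotations_df: one annotation row per embedding
fd = Fastdup(input_dir="/")
fd.run(embeddings=np.array(embs),
       annotations=annotations_df,
       overwrite=True)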
BTW @dbickson, is there a reason the library returns an error if I pass more than 5k embeddings at a time?
That is, the code below slices the first 5k rows, executes successfully, and returns the expected results:
from datetime import datetime as dt
import numpy as np
from fastdup.engine import Fastdup

start = dt.now()
fd = Fastdup(input_dir="/")
fd.run(embeddings=np.array(embs)[:5000],
       annotations=annotations_df.head(5000),
       overwrite=True,
       verbose=True)
df_sim = fd.similarity()
end = dt.now()
df_sim
However, if I increase the slice to 6k or 10k, it fails with RuntimeError: Fastdup execution failed:
from datetime import datetime as dt
import numpy as np
from fastdup.engine import Fastdup

start = dt.now()
fd = Fastdup(input_dir="/")
fd.run(embeddings=np.array(embs)[:6000],
       annotations=annotations_df.head(6000),
       overwrite=True,
       verbose=True)
df_sim = fd.similarity()
end = dt.now()
df_sim
fastdup By Visual Layer, Inc. 2024. All rights reserved.
A fastdup dataset object was created!
Input directory is set to "/"
Work directory is set to "work_dir"
The next steps are:
1. Analyze your dataset with the .run() function of the dataset object
2. Interactively explore your data on your local machine with the .explore() function of the dataset object
For more information, use help(fastdup) or check our documentation https://docs.visual-layer.com/docs/getting-started-with-fastdup.
2025-01-07 18:05:25 [FATAL] Failed to read any features
NoneType: None
fastdup C++ error received: 2025-01-07 18:05:25 [FATAL] Failed to read any features
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
Cell In[42], line 7
2 from fastdup.engine import Fastdup
4 fd = Fastdup(input_dir="/")
----> 7 fd.run(embeddings=np.array(embs)[:6000]
8 , annotations=annotations_df.head(6000)
9 , overwrite=True)
11 df_sim = fd.similarity()
12 end = dt.now()
File /opt/conda/lib/python3.10/site-packages/fastdup/engine.py:157, in Fastdup.run(self, input_dir, annotations, embeddings, subset, data_type, overwrite, model_path, distance, nearest_neighbors_k, threshold, outlier_percentile, num_threads, num_images, verbose, license, high_accuracy, cc_threshold, **kwargs)
154 fastdup_func_params['model_path'] = model_path
155 fastdup_func_params.update(kwargs)
--> 157 return super().run(annotations=annotations, input_dir=input_dir, subset=subset, data_type=data_type,
158 overwrite=overwrite, embeddings=embeddings, **fastdup_func_params)
File /opt/conda/lib/python3.10/site-packages/fastdup/sentry.py:146, in v1_sentry_handler.<locals>.inner_function(*args, **kwargs)
144 else:
145 fastdup_capture_exception(f"V1:{func.__name__}", ex)
--> 146 raise ex
148 except Exception as ex:
149 fastdup_capture_exception(f"V1:{func.__name__}", ex)
File /opt/conda/lib/python3.10/site-packages/fastdup/sentry.py:137, in v1_sentry_handler.<locals>.inner_function(*args, **kwargs)
135 try:
136 start_time = time.time()
--> 137 ret = func(*args, **kwargs)
138 fastdup_performance_capture(f"V1:{func.__name__}", start_time)
139 return ret
File /opt/conda/lib/python3.10/site-packages/fastdup/fastdup_controller.py:618, in FastdupController.run(self, input_dir, annotations, subset, embeddings, data_type, overwrite, print_summary, print_vl_datasets_ref, run_explore, dataset_name, verbose, run_fast, **fastdup_kwargs)
616 if not run_fast:
617 if fastdup.run(fastdup_input, work_dir=str(self._work_dir), logger=self._logger, **fastdup_kwargs) != 0:
--> 618 raise RuntimeError('Fastdup execution failed')
620 # post process - map fastdup-id to image (for bbox this is done in self._set_fastdup_input)
621 if self._dtype == FD.IMG or self._run_mode == FD.MODE_CROP:
RuntimeError: Fastdup execution failed
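Since the first 5,000 rows succeed and 6,000 fail, one cheap thing to rule out is a malformed vector in rows 5,000 to 5,999, such as NaN/inf values or an all-zero row (which has no defined cosine similarity). Whether fastdup actually rejects such rows is not established in this thread; the sketch below only locates them, and find_bad_rows is a hypothetical helper.

```python
import numpy as np


def find_bad_rows(embeddings, start=0, stop=None):
    """Return absolute indices in [start, stop) of rows that are non-finite or all zeros.

    start is assumed to be non-negative so the returned indices stay absolute.
    """
    feats = np.asarray(embeddings, dtype=np.float32)[start:stop]
    non_finite = ~np.isfinite(feats).all(axis=1)  # any NaN or +/-inf in the row
    all_zero = ~non_finite & ~feats.any(axis=1)   # zero vector: cosine is undefined
    return (np.flatnonzero(non_finite | all_zero) + start).tolist()
```

For example, find_bad_rows(embs, 5000, 6000); an empty list would point away from the data and toward the environment.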
Today it runs only with about 500 embedding rows; anything more than 500 throws RuntimeError: Fastdup execution failed. Is there an API rate limit or internal throttling that reduces the number of embeddings that can be processed each day?
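If the usable size really changes from day to day, measuring today's limit is more informative than guessing at it. Here is a minimal bisection sketch, assuming that a run which succeeds with n rows also succeeds with fewer. try_run is a hypothetical callable you would supply, for example one that runs the Fastdup(input_dir="/").run(...) call above on the first n rows inside try/except RuntimeError and returns True on success.

```python
def largest_passing_size(try_run, lo, hi):
    """Largest n in [lo, hi] with try_run(n) True.

    Assumes monotonicity: once some size fails, every larger size fails too.
    """
    if not try_run(lo):
        raise ValueError(f"try_run({lo}) already fails")
    while lo < hi:
        mid = (lo + hi + 1) // 2  # round up so the loop always makes progress
        if try_run(mid):
            lo = mid              # mid works: the limit is at least mid
        else:
            hi = mid - 1          # mid fails: the limit is below mid
    return lo
```

Between 1 and 10,000 this makes at most 15 calls to try_run (one initial check plus 14 halvings), so only that many fastdup runs are needed instead of a manual sweep.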
Hi @rapidcrawler, this is weird. Can you call run() with verbose=1 so we can see what the issue is?
Alternatively, you can use the v0.2 API: fastdup.save_binary_feature(work_dir, file_list, embedding) saves the binary features to work_dir, and then fastdup.run(input_dir, work_dir, run_mode=2, threshold=0) computes the similarities. The output will be at work_dir/similarity.csv.
Let us know if this worked for you.
Hi @rapidcrawler, did you manage to get it running? Thanks!