minhash deduplication error

Open bowspider-man opened this issue 1 year ago • 1 comments

When I run the MinHash example from the code, I get the following error. I believe this is an environment issue, but I don't know how to fix it.

Iterating MinHashes...: 0%| | 0/1 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/data/miniconda3/envs/env-novelai/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/data/miniconda3/envs/env-novelai/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/cfs/cfs-5197cf3ac/jarvisjhhe/text-dedup-main/text_dedup/minhash.py", line 314, in <module>
    main()
  File "/data/miniconda3/envs/env-novelai/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/data/miniconda3/envs/env-novelai/lib/python3.10/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/data/miniconda3/envs/env-novelai/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/data/miniconda3/envs/env-novelai/lib/python3.10/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/cfs/cfs-5197cf3ac/jarvisjhhe/text-dedup-main/text_dedup/utils/args.py", line 61, in wrapper
    return func(*args, **kwargs, io_args=io_args)
  File "/cfs/cfs-5197cf3ac/jarvisjhhe/text-dedup-main/text_dedup/utils/args.py", line 85, in wrapper
    return func(*args, **kwargs, meta_args=meta_args)
  File "/cfs/cfs-5197cf3ac/jarvisjhhe/text-dedup-main/text_dedup/utils/args.py", line 144, in wrapper
    return func(*args, **kwargs, minhash_args=minhash_args)
  File "/cfs/cfs-5197cf3ac/jarvisjhhe/text-dedup-main/text_dedup/minhash.py", line 215, in main
    with timer("Total"):
  File "/cfs/cfs-5197cf3ac/jarvisjhhe/text-dedup-main/text_dedup/utils/timer.py", line 18, in __exit__
    raise exc_val
  File "/cfs/cfs-5197cf3ac/jarvisjhhe/text-dedup-main/text_dedup/minhash.py", line 248, in main
    with timer("Clustering"):
  File "/cfs/cfs-5197cf3ac/jarvisjhhe/text-dedup-main/text_dedup/utils/timer.py", line 18, in __exit__
    raise exc_val
  File "/cfs/cfs-5197cf3ac/jarvisjhhe/text-dedup-main/text_dedup/minhash.py", line 255, in main
    embedded_shard = embedded.shard(
AttributeError: 'DatasetDict' object has no attribute 'shard'. Did you mean: 'shape'?

bowspider-man, Oct 29 '24 16:10

It seems that a DatasetDict ({train: [...], test: [...]}) was loaded instead of a single Dataset ([...]), and only a Dataset has the shard method. Could you share the command you used?

ChenghaoMou, Oct 29 '24 18:10

Yes, I found this problem too, and it is solved now. Thanks for your reply.

bowspider-man, Oct 30 '24 11:10
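
For anyone hitting the same AttributeError: the thread does not say how it was fixed, so the following is only a sketch of one possible workaround, not the author's fix. It assumes the loaded object is a mapping of split names to datasets (Hugging Face `DatasetDict` subclasses `dict`) and picks one split before any `.shard(...)` call. The helper name `pick_split` is hypothetical and not part of text-dedup. Plain Python containers stand in for the real dataset objects so the sketch runs without third-party packages.

```python
from collections.abc import Mapping


def pick_split(loaded, split="train"):
    """Return one split if `loaded` maps split names to datasets.

    Hypothetical helper, not part of text-dedup. Anything that is not a
    mapping (for example a single dataset) is returned unchanged.
    """
    if isinstance(loaded, Mapping):
        if split not in loaded:
            raise KeyError(
                f"split {split!r} not found; available: {sorted(loaded)}"
            )
        return loaded[split]
    return loaded


# Stand-in for a DatasetDict: split name -> records.
splits = {"train": ["doc a", "doc b"], "test": ["doc c"]}

train = pick_split(splits, split="train")  # a single split
same = pick_split(train)                   # already a single split, unchanged
```

With the Hugging Face `datasets` library, passing a split when loading, as in `load_dataset(path, split="train")`, returns a single `Dataset` instead of a `DatasetDict`, which avoids the problem at the source.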