When I run the MinHash example from the code, I get the following error. I believe it is an environment issue, but I don't know how to fix it.
Iterating MinHashes...: 0%| | 0/1 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/data/miniconda3/envs/env-novelai/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/data/miniconda3/envs/env-novelai/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/cfs/cfs-5197cf3ac/jarvisjhhe/text-dedup-main/text_dedup/minhash.py", line 314, in <module>
main()
File "/data/miniconda3/envs/env-novelai/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "/data/miniconda3/envs/env-novelai/lib/python3.10/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/data/miniconda3/envs/env-novelai/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/data/miniconda3/envs/env-novelai/lib/python3.10/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/cfs/cfs-5197cf3ac/jarvisjhhe/text-dedup-main/text_dedup/utils/args.py", line 61, in wrapper
return func(*args, **kwargs, io_args=io_args)
File "/cfs/cfs-5197cf3ac/jarvisjhhe/text-dedup-main/text_dedup/utils/args.py", line 85, in wrapper
return func(*args, **kwargs, meta_args=meta_args)
File "/cfs/cfs-5197cf3ac/jarvisjhhe/text-dedup-main/text_dedup/utils/args.py", line 144, in wrapper
return func(*args, **kwargs, minhash_args=minhash_args)
File "/cfs/cfs-5197cf3ac/jarvisjhhe/text-dedup-main/text_dedup/minhash.py", line 215, in main
with timer("Total"):
File "/cfs/cfs-5197cf3ac/jarvisjhhe/text-dedup-main/text_dedup/utils/timer.py", line 18, in __exit__
raise exc_val
File "/cfs/cfs-5197cf3ac/jarvisjhhe/text-dedup-main/text_dedup/minhash.py", line 248, in main
with timer("Clustering"):
File "/cfs/cfs-5197cf3ac/jarvisjhhe/text-dedup-main/text_dedup/utils/timer.py", line 18, in __exit__
raise exc_val
File "/cfs/cfs-5197cf3ac/jarvisjhhe/text-dedup-main/text_dedup/minhash.py", line 255, in main
embedded_shard = embedded.shard(
AttributeError: 'DatasetDict' object has no attribute 'shard'. Did you mean: 'shape'?
It seems that a DatasetDict ({train: [...], test: [...]}) was loaded instead of a single Dataset ([...]), and only a Dataset has a shard method. Could you share the command you used?
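For anyone hitting the same error, here is a minimal sketch of the usual workaround: pick one split before calling shard. The helper name `ensure_single_split` and the `demo` function are made up for illustration and are not part of text-dedup. The sketch relies only on the fact that `DatasetDict` is a `dict` subclass keyed by split name, while `Dataset` is not.

```python
def ensure_single_split(ds, split="train"):
    """Return one split when `ds` is dict-like (e.g. a DatasetDict);
    return `ds` unchanged when it is already a single dataset."""
    if isinstance(ds, dict):
        if split not in ds:
            raise KeyError(f"split {split!r} not found; available: {sorted(ds)}")
        return ds[split]
    return ds


def demo():
    # Hypothetical usage; requires the `datasets` package.
    from datasets import Dataset, DatasetDict

    # A tiny in-memory DatasetDict standing in for whatever was loaded.
    dd = DatasetDict({"train": Dataset.from_dict({"text": ["a", "b", "c", "d"]})})

    ds = ensure_single_split(dd, "train")     # now a Dataset, which has .shard
    shard = ds.shard(num_shards=2, index=0)   # the call that failed on the DatasetDict
    return len(shard)
```

When loading with `datasets.load_dataset`, passing `split="train"` returns a single Dataset directly, so no helper is needed. If your copy of the script exposes a split option on the command line, setting it should have the same effect, though I have not checked that against this exact version.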
Yes, I found this problem too, and it has been solved now. Thanks for your reply.