
How about adding a Ray executor for deduplication?

Open • simplew2011 opened this issue 3 months ago • 1 comment

  • Existing Spark implementation: https://github.com/ChenghaoMou/text-dedup/blob/main/text_dedup/minhash_spark.py
  • Reference Ray executor (data-juicer): https://github.com/alibaba/data-juicer/blob/main/data_juicer/core/ray_executor.py
  • Ray is simpler and faster than Spark
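
To make the request a bit more concrete, here is a rough sketch of how MinHash signatures might be computed in parallel with Ray tasks. It is not the code from `minhash_spark.py` or from data-juicer's `ray_executor.py`. The names `signature`, `band_keys`, `keep_indices`, `dedup_with_ray` and the constants are hypothetical, and the sketch assumes `ray` is installed.

```python
import hashlib

NUM_PERM = 64  # hypothetical: hash functions per signature
BANDS = 16     # hypothetical: LSH bands (must divide NUM_PERM)
NGRAM = 3      # hypothetical: word n-gram size used for shingling


def shingles(text, n=NGRAM):
    """Split a text into a set of lowercase word n-grams."""
    tokens = text.lower().split()
    if len(tokens) < n:
        return {" ".join(tokens)}
    return {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def signature(text):
    """MinHash signature: for each seed, the minimum salted hash over all shingles."""
    grams = shingles(text)
    sig = []
    for seed in range(NUM_PERM):
        salt = seed.to_bytes(8, "little")
        sig.append(min(
            int.from_bytes(
                hashlib.blake2b(g.encode("utf-8"), digest_size=8, salt=salt).digest(),
                "little",
            )
            for g in grams
        ))
    return sig


def signature_batch(batch):
    """Compute signatures for a list of texts (the unit of work sent to Ray)."""
    return [signature(t) for t in batch]


def band_keys(sig):
    """Cut a signature into BANDS hashable keys for LSH bucketing."""
    rows = NUM_PERM // BANDS
    return [(b, tuple(sig[b * rows:(b + 1) * rows])) for b in range(BANDS)]


def keep_indices(signatures):
    """Drop any document that shares a band with an earlier document (kept or not)."""
    seen = set()
    kept = []
    for idx, sig in enumerate(signatures):
        keys = band_keys(sig)
        if not any(k in seen for k in keys):
            kept.append(idx)
        seen.update(keys)
    return kept


def dedup_with_ray(texts, batch_size=1000):
    """Compute signatures as Ray tasks, then filter duplicates on the driver."""
    import ray  # imported lazily so the helpers above also work without Ray

    ray.init(ignore_reinit_error=True)
    remote_batch = ray.remote(signature_batch)
    batches = [texts[i:i + batch_size] for i in range(0, len(texts), batch_size)]
    refs = [remote_batch.remote(b) for b in batches]
    signatures = [sig for part in ray.get(refs) for sig in part]
    return [texts[i] for i in keep_indices(signatures)]
```

Calling `dedup_with_ray(texts)` on a list of strings would return the texts with near-duplicates dropped. Only the signature computation is distributed in this sketch; the bucket lookup still runs on the driver, so a real executor would probably need to shard that step as well.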

simplew2011 • Mar 08 '24 11:03