datatrove

Spark support

Open jordane95 opened this issue 2 years ago • 3 comments

I'm wondering whether it would be possible to add support for other popular large-scale data processing frameworks such as Spark, since most operations are compatible with Spark's map operation. This would greatly improve the efficiency and scalability of the processing pipeline when working with large datasets.

jordane95 Jan 30 '24 15:01

Is there any update on this? @guipenedo

xiaohanzhan-db Feb 26 '24 23:02

Is there any update on this? @jordane95 @guipenedo

maoxiangyi Jun 17 '24 10:06

Is there any update on this?

I could use this one instead of Spark support, but I think that repo is incomplete (only MinHash is provided, I think).

aiqwe Jul 02 '24 09:07