datatrove
Spark support
I'm wondering if it is possible to add support for other popular large-scale data processing frameworks like Spark, since most operations are compatible with the map operation in Spark. This would greatly improve the efficiency and scalability of the processing pipeline when working with large datasets.
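To illustrate the idea, here is a minimal sketch of per-document steps expressed as iterator-to-iterator functions, which is the shape Spark's `RDD.mapPartitions` expects. All names here (`drop_short`, `normalize_whitespace`, `make_partition_fn`, `run_on_spark`) are hypothetical and are not datatrove's API; the Spark part assumes `pyspark` is installed and is only imported when `run_on_spark` is called.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Doc = Dict[str, str]
Step = Callable[[Iterable[Doc]], Iterable[Doc]]


def normalize_whitespace(docs: Iterable[Doc]) -> Iterator[Doc]:
    """Hypothetical step: collapse runs of whitespace in each document."""
    for doc in docs:
        yield {**doc, "text": " ".join(doc["text"].split())}


def drop_short(docs: Iterable[Doc], min_chars: int = 20) -> Iterator[Doc]:
    """Hypothetical step: keep only documents with at least min_chars characters."""
    for doc in docs:
        if len(doc["text"]) >= min_chars:
            yield doc


def make_partition_fn(steps: List[Step]) -> Step:
    """Chain steps into one function that consumes a whole partition lazily."""
    def run(docs: Iterable[Doc]) -> Iterable[Doc]:
        for step in steps:
            docs = step(docs)
        return docs
    return run


def run_on_spark(docs: List[Doc], steps: List[Step], num_partitions: int = 4) -> List[Doc]:
    """Assumed Spark wiring: apply the chained steps to each partition."""
    from pyspark.sql import SparkSession  # imported lazily; requires pyspark

    spark = SparkSession.builder.appName("pipeline-sketch").getOrCreate()
    rdd = spark.sparkContext.parallelize(docs, num_partitions)
    return rdd.mapPartitions(make_partition_fn(steps)).collect()


if __name__ == "__main__":
    sample = [
        {"id": "a", "text": "too short"},
        {"id": "b", "text": "this   document  is long   enough to keep"},
    ]
    # Runs locally without Spark; the same function could be passed to mapPartitions.
    print(list(make_partition_fn([normalize_whitespace, drop_short])(sample)))
```

Steps that need a global view of the data, such as deduplication, would not fit this per-partition pattern and would need a shuffle or a separate stage.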
Is there any update on this? @guipenedo
Is there any update on this? @jordane95 @guipenedo
is there any update on this?
I could use this one instead of Spark support, but I think this repo is not complete (only MinHash is provided, I think).