datatrove
Spark support
I'm wondering whether it would be possible to add support for other popular large-scale data processing frameworks such as Spark, since most operations in the pipeline are compatible with Spark's map operation. This would greatly improve the efficiency and scalability of the processing pipeline when working with large datasets.
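To illustrate what I mean by "compatible with the map operation", here is a minimal sketch. It does not use datatrove's actual API: `normalize_text`, `is_long_enough`, `run_locally`, and `run_with_spark` are hypothetical names, and documents are assumed to be plain dicts with a `text` field. The point is only that a step which handles one document at a time can be passed unchanged to an RDD's `map` and `filter`.

```python
def normalize_text(doc: dict) -> dict:
    """Hypothetical per-document step: strip and lowercase the text field."""
    return {**doc, "text": doc["text"].strip().lower()}


def is_long_enough(doc: dict, min_chars: int = 5) -> bool:
    """Hypothetical per-document filter: keep documents with enough text."""
    return len(doc["text"]) >= min_chars


def run_locally(docs):
    """Apply both steps with plain Python, one document at a time."""
    return [d for d in map(normalize_text, docs) if is_long_enough(d)]


def run_with_spark(docs):
    """Apply the same two steps through Spark's map and filter.

    Assumes pyspark is installed; it is imported lazily so the rest of
    this sketch runs without it.
    """
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.master("local[*]")
        .appName("datatrove-spark-sketch")  # hypothetical app name
        .getOrCreate()
    )
    try:
        rdd = spark.sparkContext.parallelize(list(docs))
        return rdd.map(normalize_text).filter(is_long_enough).collect()
    finally:
        spark.stop()
```

Calling `run_locally(docs)` and `run_with_spark(docs)` on the same list should keep the same documents, because the per-document functions are identical; only the execution engine differs. Steps that need a global view of the data (deduplication, for example) would not fit this pattern as directly.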