datatrove
Support Ray as executor
Ray (https://github.com/ray-project/ray) has become a popular choice for running distributed Python ML applications. Its Python interface makes it easy to scale a workload from a local laptop to a distributed cluster. It would be good to add Ray as an executor backend (and we are happy to contribute).
Some more information related to this topic:
- RAG embedding generation w/ Ray and Pinecone - https://www.anyscale.com/blog/rag-at-scale-10x-cheaper-embedding-computations-with-anyscale-and-pinecone
- Building RAG-based LLM Applications for Production w/ Ray - https://www.anyscale.com/blog/a-comprehensive-guide-for-building-rag-based-llm-applications-part-1
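
To make the request a bit more concrete, here is a rough sketch of how per-shard work could be dispatched as Ray tasks. It is not based on datatrove's actual executor interface: `process_shard` and `run_with_ray` are hypothetical names used only for illustration, and the "pipeline" is a stand-in that slices a list of fake document ids. The Ray calls used (`ray.init`, `ray.remote`, `ray.get`, `ray.shutdown`) are part of Ray Core; the sketch falls back to a plain local loop if Ray is not installed.

```python
from typing import Callable, List

try:
    import ray  # optional dependency; the sketch degrades to a local loop without it
except ImportError:
    ray = None


def process_shard(rank: int, world_size: int) -> List[int]:
    """Hypothetical stand-in for running a pipeline on one shard of the input."""
    documents = list(range(10))  # pretend corpus of 10 document ids
    # Each rank takes every world_size-th document, starting at its own rank.
    return documents[rank::world_size]


def run_with_ray(task: Callable[[int, int], List[int]], world_size: int) -> List[List[int]]:
    """Run task(rank, world_size) once per rank, as Ray tasks when Ray is available."""
    if ray is None:
        # Assumption for this sketch: run sequentially when Ray is missing.
        return [task(rank, world_size) for rank in range(world_size)]

    # With no address given, this starts a local Ray instance.
    ray.init(ignore_reinit_error=True)
    try:
        remote_task = ray.remote(task)  # wrap the plain function as a Ray task
        futures = [remote_task.remote(rank, world_size) for rank in range(world_size)]
        return ray.get(futures)  # results come back in submission order
    finally:
        ray.shutdown()


if __name__ == "__main__":
    for rank, shard in enumerate(run_with_ray(process_shard, world_size=3)):
        print(rank, shard)
```

A real executor would presumably also need to handle retries, logging, and skipping already-completed ranks; those are left out here on purpose.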