
Support Ray as executor

Open · c21 opened this issue 1 year ago · 2 comments

Ray (https://github.com/ray-project/ray) has become a popular choice for running distributed Python ML applications. Its Python interface makes it easy to scale a workload from a local laptop to a distributed cluster. It would be good to add Ray as an executor backend (and we are happy to contribute).

Some more info related to this topic:

  • RAG embedding generation w/ Ray and Pinecone - https://www.anyscale.com/blog/rag-at-scale-10x-cheaper-embedding-computations-with-anyscale-and-pinecone
  • Building RAG-based LLM Applications for Production w/ Ray - https://www.anyscale.com/blog/a-comprehensive-guide-for-building-rag-based-llm-applications-part-1

c21 · Jan 23 '24 19:01