
How to store other types of data in the ObjectStore in a distributed way?

Open YeahNew opened this issue 3 years ago • 2 comments

After using RayDP to read the data, I ran GNN training and got the output in tensor format. Now I want to store it in the ObjectStore in a distributed way, so that I can later take it out and combine it with other feature data for logistic regression (Distributed Scikit-learn / Joblib). But when calling create_ml_dataset_from_spark(), I found that this method only accepts data of type sql.DataFrame. Do I need to convert the data? Could you provide methods to store other data types in the future?

YeahNew avatar Aug 24 '21 07:08 YeahNew

Hi @YeahNew, it seems this is no longer related to Spark. If your data is not in Spark, you can probably use Ray's API directly instead of the APIs from RayDP. If you still want to create an MLDataset, you can create a Ray parallel iterator and use the Ray MLDataset API from_parallel_it. By the way, we are collecting some real RayDP use cases. Your workload sounds very interesting. Are you using RayDP and other frameworks for a real use case in your company?

carsonwang avatar Aug 24 '21 09:08 carsonwang

OK, I get it. Yes. It seems that the Ray APIs do not provide a method that can directly store data into the ObjectStore in a distributed manner.

YeahNew avatar Aug 24 '21 10:08 YeahNew

Closing as stale. Putting data into the Ray object store ensures that you can fetch it from any node of the Ray cluster. If the data is already distributed when you put it, it stays distributed. If it is not, you don't need to do anything extra either, because it will be fetched to the node where you use it when you call ray.get.

kira-lin avatar Apr 14 '23 08:04 kira-lin