Zhi Lin comments

Results: 84 comments of Zhi Lin

update to use Spark 3.3

OK, but we'll need a shim layer to add support for 3.3. I've told @KepingYan to look into it. I'll drop Python 3.6 [here](https://github.com/oap-project/raydp/pull/214), and it seems like raydp...

`val_accuracy` equals 0 when running pytorch_nyctaxi.py

This example just serves as a demonstration of how to train pytorch models on data loaded/processed by raydp. The data is randomly generated, so it is not expected that the...

How do I store the full graph data in the ObjectStore of each node?

Hi, glad you tried raydp. Ray's object store is shared among nodes. By calling our `create_ml_dataset_from_spark`, you create an `MLDataset`, which is partitioned. That means your data is probably distributed...

How do I store the full graph data in the ObjectStore of each node?

Is it a GNN application? Like each node needs a full graph, but node/edge features can be partitioned? Anyway, I guess you can save the graph to parquet first, and...

How do I store the full graph data in the ObjectStore of each node?

Hi @YeahNew, I cannot find your reply, but I saw it in my mailbox. Have you solved the problem? I think you don't need to use MLDataset for...

RuntimeError: A Ray actor died during training and the maximum number of retries (1) is exhausted

I'm not sure about this, but it seems to have something to do with OpenMP. Can you run some xgboost_ray examples to verify if it's raydp's problem?

RayDP on Databricks

Hi @LAITRUNGMINHDUC, RayDP does not have its own SQL implementation for now, so I think the performance would be very similar to vanilla Spark. If you want to have better...

How to use hive in Spark on RayDp？

This error indicates that the RayDP JAR is not included in the pyspark driver's classpath for some reason. Can you check driver_cp in ray_cluster.py and see if it is a...
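One rough way to do that check is to split the classpath string into entries and look for the JAR by name. This is only a sketch: it assumes the value of driver_cp is an ordinary classpath string joined with the platform path separator, and `find_jars` and the example paths are hypothetical, not part of raydp.

```python
import os
from pathlib import PurePath
from typing import List


def find_jars(classpath: str, name_fragment: str, sep: str = os.pathsep) -> List[str]:
    """Return classpath entries whose file name contains name_fragment.

    Hypothetical helper: assumes `classpath` is a plain string of entries
    joined by `sep` (":" on Linux/macOS, ";" on Windows).
    """
    entries = [entry for entry in classpath.split(sep) if entry]
    fragment = name_fragment.lower()
    # Compare only the last path component, case-insensitively.
    # Wildcard entries such as "/opt/jars/*" have the name "*" and will not
    # match, so inspect those directories by hand.
    return [entry for entry in entries if fragment in PurePath(entry).name.lower()]


if __name__ == "__main__":
    # Made-up classpath for illustration only.
    example_cp = "/opt/spark/jars/spark-core.jar:/opt/raydp/jars/raydp-example.jar"
    print(find_jars(example_cp, "raydp", sep=":"))
```

An empty result for the real classpath value would be consistent with the JAR missing from the driver's classpath, though it does not rule out other causes.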

How to use hive in Spark on RayDp？

Are you using Java 9? In our tests, we use Java 8 and Spark 3.2.1; you can try this configuration. Can you use pyspark to start a session without raydp?
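To answer the Java question quickly, something like the following could report the major version of the `java` binary on the PATH. It is a minimal sketch, not part of raydp: the function names are made up, and it assumes the usual `java -version` banner format, where Java 8 and earlier identify themselves as `1.x`.

```python
import re
import subprocess


def parse_java_major(version_output: str) -> int:
    """Extract the Java major version from `java -version` output."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        raise ValueError("no Java version string found")
    first = int(match.group(1))
    second = match.group(2)
    # Java 8 and earlier use the 1.x scheme, for example "1.8.0_292".
    if first == 1 and second is not None:
        return int(second)
    return first


def installed_java_major() -> int:
    """Run `java -version` and return the major version it reports."""
    result = subprocess.run(
        ["java", "-version"], capture_output=True, text=True, check=True
    )
    # The banner is normally written to stderr; stdout is included as a fallback.
    return parse_java_major(result.stderr + result.stdout)


if __name__ == "__main__":
    print(installed_java_major())
```

If this prints 8, the local setup matches the Java version used in the tests mentioned above.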