Zhi Lin

Results: 84 comments by Zhi Lin

You can try using `sdf.mapInPandas` instead of an RDD flatMap. Here is a [doc](https://spark.apache.org/docs/latest/api/python/user_guide/arrow_pandas.html). This step is quite similar to the `to_spark` function in the previously mentioned PR (arrow table and...
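As a rough illustration, a `mapInPandas` call might look like the sketch below; the column names, the doubling transform, and the schema string are placeholders rather than anything from the original thread:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
sdf = spark.createDataFrame([(1, 2.0), (2, 3.0)], ["id", "value"])

def map_func(batches):
    # `batches` is an iterator of pandas DataFrames, one per Arrow record batch.
    for pdf in batches:
        pdf["value"] = pdf["value"] * 2
        yield pdf

# The schema string describes the output columns produced by map_func.
result = sdf.mapInPandas(map_func, schema="id long, value double")
result.show()
```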

Notice that the `x` passed to `map_func` will be an iterator of pandas DataFrames. If this is not clear, please search for `mapInPandas` in the [doc](https://spark.apache.org/docs/latest/api/python/user_guide/arrow_pandas.html). The function should look like this one in...
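For example, if the whole partition is needed at once, the function can drain the iterator first. This is just a sketch with an assumed `value` column, not the function from the linked discussion:

```python
import pandas as pd

def map_func(batches):
    # mapInPandas passes an iterator of pandas DataFrames, not a single DataFrame.
    chunks = list(batches)
    if not chunks:  # an empty partition produces no batches
        return
    pdf = pd.concat(chunks, ignore_index=True)
    yield pdf.assign(value=pdf["value"] + 1)
```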

> I wonder what implications I might encounter without using Dataset as the intermediary.

I don't think there is a big difference. We need to use `to_pandas` because the data stored...
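A minimal sketch, assuming the `to_pandas` in question is `ray.data.Dataset.to_pandas` (the blocks of a Ray Dataset are stored in Arrow format, so they have to be converted before pandas-based code can consume them):

```python
import ray

ray.init(ignore_reinit_error=True)
ds = ray.data.range(100)   # small illustrative dataset
pdf = ds.to_pandas()       # materialize the Arrow blocks as one pandas DataFrame
print(pdf.head())
```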

When printing available resources, have all executor actors started? It might take some time. Do all nodes have the same resources (at least CPU and memory)?
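One way to check this from the driver, assuming Ray is already initialized and connected to the cluster:

```python
import ray

ray.init(address="auto", ignore_reinit_error=True)
print(ray.cluster_resources())    # total resources registered by all nodes
print(ray.available_resources())  # resources not currently held by actors/tasks
```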

Hmm, I think you can use 2 GB of memory per core, so that you can use all the cores on your cluster. If that's not enough for your workload, then you have...
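A hedged sketch of that sizing with `raydp.init_spark`; the executor count, core count, and app name are placeholders, not values from this thread:

```python
import ray
import raydp

ray.init(address="auto")
spark = raydp.init_spark(
    app_name="example",
    num_executors=2,
    executor_cores=4,
    executor_memory="8GB",  # 4 cores * 2 GB per core
)
```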

Hi @gbraes, we are very excited to see that you have tried many pipelines with raydp! We have never tested raydp with Kafka, though. Can you please give a...

Hi @gbraes, glad you made it work! This is quite strange, since 0.4.2 just added support for Ray 1.11 and 1.12; there are no major changes. We'll look into...

This might be a good idea! Thanks for your advice. I have one concern, though: Ray Dataset uses the Arrow format while Spark DataFrame uses its own internal format. But we'll...

Hi @Hoeze, `applyInPandas` will start Python workers, and these workers are not connected to Ray. An actor is itself a process, so it's not really possible to 'reuse' its session. In...

Sorry, this file is pretty stale now; we have not updated it for a long time. Please refer to [here](https://docs.ray.io/en/latest/cluster/kubernetes/index.html) to start a Ray cluster on k8s.