Zhi Lin
Maybe this link helps: https://stackoverflow.com/questions/57519804/using-jackson-2-9-9-in-java-spark. I'm not sure whether this error is related to RayDP.
Hi, it seems the logging level can only be set by combining `ray.init('auto', logging_level='warn')` with `configs={"spark.driver.extraJavaOptions": "-Dlog4j.configuration=/home/lzhi/log4j.properties"}` in `raydp.init_spark`. We'll look into it. Thanks for raising this issue.
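For reference, a minimal sketch of that combination (the app name, executor sizing, and log4j path below are placeholders, not values from the issue):

```python
import ray
import raydp

# 'warn' quiets ray's own logging
ray.init('auto', logging_level='warn')

# point log4j at your own properties file so spark honours the same level;
# replace the file: URL with the real path to your log4j.properties
spark = raydp.init_spark(
    app_name='logging_example',
    num_executors=2,
    executor_cores=1,
    executor_memory='1GB',
    configs={
        'spark.driver.extraJavaOptions':
            '-Dlog4j.configuration=file:/path/to/log4j.properties',
    })
```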
Hi @gbraes, the problem is that ray's jar bundles slf4j, while pyspark/jars/ also contains slf4j. When spark starts, we have to add ray's jar to the classpath, so it finds two bindings, and...
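To see the pyspark side of the conflict for yourself, something like this lists the slf4j jars bundled with a pip-installed pyspark (a quick diagnostic sketch, not part of raydp):

```python
import glob
import os
import pyspark

# pip-installed pyspark ships its jars inside the package directory
spark_jars = os.path.join(os.path.dirname(pyspark.__file__), 'jars')
print(glob.glob(os.path.join(spark_jars, 'slf4j*')))
```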
That is also possible: as slf4j's documentation says, which binding gets loaded when multiple jars are found on the classpath is effectively random (up to the JVM). Glad to hear that...
You might want to take a look at `ray.experimental.data`, for which we are currently adding support. But it requires ray-nightly, so for now you can probably use our MLDataset. You...
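If it helps, a rough sketch of the MLDataset route; treat the `RayMLDataset.from_spark` import path and signature as an assumption based on RayDP docs of that era, not something confirmed in this thread:

```python
import ray
import raydp
from raydp.spark import RayMLDataset  # import path is an assumption

ray.init()
spark = raydp.init_spark(app_name='mldataset_example',
                         num_executors=2,
                         executor_cores=1,
                         executor_memory='1GB')

df = spark.range(0, 1000)
# shard the spark dataframe into an MLDataset backed by ray's object store
ds = RayMLDataset.from_spark(df, num_shards=2)
```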
1. Yes, they are not related. I was just suggesting that maybe you can try our MLDataset; or, if you want to solve the problem, you should use cloudpickle.
2. No,...
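A tiny, self-contained illustration of the cloudpickle point (not from the thread): ray's bundled cloudpickle can round-trip an `ObjectRef` as plain bytes, which is what lets the ref travel through a spark dataframe:

```python
import pandas as pd
import ray

ray.init()

ref = ray.put(pd.DataFrame({'a': [1, 2, 3]}))

# round-trip the ObjectRef as bytes, e.g. so it can sit in a
# spark dataframe column as a binary value
blob = ray.cloudpickle.dumps(ref)
restored = ray.cloudpickle.loads(blob)
assert ray.get(restored).equals(ray.get(ref))
```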
What you want to do is probably this:
```python
import numpy as np
import pandas as pd
import ray

@ray.remote
def create_small_dataframe(i):
    return pd.DataFrame(data=np.random.randint(5 * i, size=(3, 4)))

# these are ObjectRef[pd.DataFrame]
obj_ref1 = create_small_dataframe.remote(1)
obj_ref2 = create_small_dataframe.remote(2)
```
...
Oh yes, I see. Just FYI, a similar feature is under development in ray-nightly, in `ray.experimental.data`. If you don't need it to be a spark dataframe, you might...
The python function passed to `flatMap` is executed by pyspark workers, and those are not controlled by raydp. Those processes are therefore not connected to ray, hence the exception. You need...
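A sketch of that pattern, connecting inside the function that pyspark executes; `df` and the `Pandas_df_ref` column are assumptions carried over from this thread, and `mapPartitions` is used here so the connection happens once per task rather than once per row:

```python
import ray

def process_partition(rows):
    # each pyspark worker process must join the ray cluster itself;
    # address='auto' assumes ray is running on every spark worker node
    ray.init(address='auto', ignore_reinit_error=True)
    for row in rows:
        yield ray.get(ray.cloudpickle.loads(row['Pandas_df_ref']))

# assuming df is a spark dataframe whose 'Pandas_df_ref' column
# holds cloudpickled ObjectRefs
results = df.rdd.mapPartitions(process_partition).collect()
```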
Have you tried this:
```python
import ray

def map_func(x):
    # command for executors to connect to ray cluster
    # ray.init will also work
    ray.client().connect()
    # actual work using ray
    ray.get(ray.cloudpickle.loads(x['Pandas_df_ref']))

myrdd = ...
```