Zhi Lin
Maybe this link helps: https://stackoverflow.com/questions/57519804/using-jackson-2-9-9-in-java-spark. I'm not sure whether this error is related to RayDP.
Hi, it seems the logging level can only be set by combining `ray.init('auto', logging_level='warn')` with `configs={"spark.driver.extraJavaOptions": "-Dlog4j.configuration=/home/lzhi/log4j.properties"}` in `raydp.init_spark`. We'll look into it. Thanks for raising this issue.
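For reference, a minimal sketch of that combination (the app name, executor sizing, and log4j path below are placeholders, not values from the issue):

```python
import ray
import raydp

# 'warn' quiets ray's own logging
ray.init('auto', logging_level='warn')

# point log4j at your own properties file so spark honours the same level;
# replace the file: URL with the real path to your log4j.properties
spark = raydp.init_spark(
    app_name='logging_example',
    num_executors=2,
    executor_cores=1,
    executor_memory='1GB',
    configs={
        'spark.driver.extraJavaOptions':
            '-Dlog4j.configuration=file:/path/to/log4j.properties',
    })
```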
Hi @gbraes, the problem is that ray's jar bundles slf4j, while pyspark/jars/ also contains slf4j. When spark starts, we have to add ray's jar to the classpath, so it finds two bindings, and...
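To see the pyspark side of the conflict for yourself, something like this lists the slf4j jars bundled with a pip-installed pyspark (a quick diagnostic sketch, not part of raydp):

```python
import glob
import os
import pyspark

# pip-installed pyspark ships its jars inside the package directory
spark_jars = os.path.join(os.path.dirname(pyspark.__file__), 'jars')
print(glob.glob(os.path.join(spark_jars, 'slf4j*')))
```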
That is also possible: as slf4j's documentation says, which binding gets loaded when multiple jars are found on the classpath is effectively random (up to the JVM). Glad to hear that...
You might want to take a look at `ray.experimental.data`, for which we are currently adding support. But it requires ray-nightly, so for now you can probably use our MLDataset. You...
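If it helps, a rough sketch of the MLDataset route; treat the `RayMLDataset.from_spark` import path and signature as an assumption based on RayDP docs of that era, not something confirmed in this thread:

```python
import ray
import raydp
from raydp.spark import RayMLDataset  # import path is an assumption

ray.init()
spark = raydp.init_spark(app_name='mldataset_example',
                         num_executors=2,
                         executor_cores=1,
                         executor_memory='1GB')

df = spark.range(0, 1000)
# shard the spark dataframe into an MLDataset backed by ray's object store
ds = RayMLDataset.from_spark(df, num_shards=2)
```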
1. Yes, they are not related. I was just suggesting that maybe you can try our MLDataset; or, if you want to solve the problem, you should use cloudpickle.
2. No,...
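A tiny, self-contained illustration of the cloudpickle point (not from the thread): ray's bundled cloudpickle can round-trip an `ObjectRef` as plain bytes, which is what lets the ref travel through a spark dataframe:

```python
import pandas as pd
import ray

ray.init()

ref = ray.put(pd.DataFrame({'a': [1, 2, 3]}))

# round-trip the ObjectRef as bytes, e.g. so it can sit in a
# spark dataframe column as a binary value
blob = ray.cloudpickle.dumps(ref)
restored = ray.cloudpickle.loads(blob)
assert ray.get(restored).equals(ray.get(ref))
```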
What you want to do is probably this:
```python
import numpy as np
import pandas as pd
import ray

@ray.remote
def create_small_dataframe(i):
    return pd.DataFrame(data=np.random.randint(5 * i, size=(3, 4)))

# these are ObjectRef[pd.DataFrame]
obj_ref1 = create_small_dataframe.remote(1)
obj_ref2 = create_small_dataframe.remote(2)
```
...
Oh yes, I see. Just FYI, a similar feature is under development in ray-nightly, in `ray.experimental.data`. If you don't need it to be a spark dataframe, you might...
The python function passed to `flatMap` is executed by pyspark workers, and those are not controlled by raydp. Those processes are therefore not connected to ray, hence the exception. You need...
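A sketch of that pattern, connecting inside the function that pyspark executes; `df` and the `Pandas_df_ref` column are assumptions carried over from this thread, and `mapPartitions` is used here so the connection happens once per task rather than once per row:

```python
import ray

def process_partition(rows):
    # each pyspark worker process must join the ray cluster itself;
    # address='auto' assumes ray is running on every spark worker node
    ray.init(address='auto', ignore_reinit_error=True)
    for row in rows:
        yield ray.get(ray.cloudpickle.loads(row['Pandas_df_ref']))

# assuming df is a spark dataframe whose 'Pandas_df_ref' column
# holds cloudpickled ObjectRefs
results = df.rdd.mapPartitions(process_partition).collect()
```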
Have you tried this:
```python
import ray

def map_func(x):
    # command for executors to connect to ray cluster
    # ray.init will also work
    ray.client().connect()
    # actual work using ray
    ray.get(ray.cloudpickle.loads(x['Pandas_df_ref']))

myrdd = ...
```