
Can't start raydp when ray head node is not the same as the raydp node

tdeboer-ilmn opened this issue on Apr 07 '22

I am trying to set up raydp on my ray cluster, and I am creating the ray client like this:

import ray, raydp
ray.init(address='ray://10.112.80.176:10001')
spark = raydp.init_spark(app_name='RayDP Example',
                         num_executors=2,
                         executor_cores=2,
                         executor_memory='4GB'
                     )

But this results in the following error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Input In [4], in <cell line: 1>()
----> 1 spark = raydp.init_spark(app_name='RayDP Example',
      2                          num_executors=2,
      3                          executor_cores=2,
      4                          executor_memory='4GB'
      5                      )

File ~/raymodin/lib/python3.8/site-packages/raydp/context.py:126, in init_spark(app_name, num_executors, executor_cores, executor_memory, configs)
    123 try:
    124     _global_spark_context = _SparkContext(
    125         app_name, num_executors, executor_cores, executor_memory, configs)
--> 126     return _global_spark_context.get_or_create_session()
    127 except:
    128     _global_spark_context = None

File ~/raymodin/lib/python3.8/site-packages/raydp/context.py:70, in _SparkContext.get_or_create_session(self)
     68     return self._spark_session
     69 self.handle = RayDPConversionHelper.options(name=RAYDP_OBJ_HOLDER_NAME).remote()
---> 70 spark_cluster = self._get_or_create_spark_cluster()
     71 self._spark_session = spark_cluster.get_spark_session(
     72     self._app_name,
     73     self._num_executors,
     74     self._executor_cores,
     75     self._executor_memory,
     76     self._configs)
     77 return self._spark_session

File ~/raymodin/lib/python3.8/site-packages/raydp/context.py:63, in _SparkContext._get_or_create_spark_cluster(self)
     61 if self._spark_cluster is not None:
     62     return self._spark_cluster
---> 63 self._spark_cluster = SparkCluster(self._configs)
     64 return self._spark_cluster

File ~/raymodin/lib/python3.8/site-packages/raydp/spark/ray_cluster.py:34, in SparkCluster.__init__(self, configs)
     32 self._app_master_bridge = None
     33 self._configs = configs
---> 34 self._set_up_master(None, None)
     35 self._spark_session: SparkSession = None

File ~/raymodin/lib/python3.8/site-packages/raydp/spark/ray_cluster.py:40, in SparkCluster._set_up_master(self, resources, kwargs)
     37 def _set_up_master(self, resources: Dict[str, float], kwargs: Dict[Any, Any]):
     38     # TODO: specify the app master resource
     39     self._app_master_bridge = RayClusterMaster(self._configs)
---> 40     self._app_master_bridge.start_up()

File ~/raymodin/lib/python3.8/site-packages/raydp/spark/ray_cluster_master.py:56, in RayClusterMaster.start_up(self, popen_kwargs)
     54 self._gateway = self._launch_gateway(extra_classpath, popen_kwargs)
     55 self._app_master_java_bridge = self._gateway.entry_point.getAppMasterBridge()
---> 56 self._set_properties()
     57 self._host = ray.util.get_node_ip_address()
     58 self._create_app_master(extra_classpath)

File ~/raymodin/lib/python3.8/site-packages/raydp/spark/ray_cluster_master.py:145, in RayClusterMaster._set_properties(self)
    142 node = ray.worker.global_worker.node
    144 options["ray.run-mode"] = "CLUSTER"
--> 145 options["ray.node-ip"] = node.node_ip_address
    146 options["ray.address"] = node.redis_address
    147 options["ray.redis.password"] = node.redis_password

AttributeError: 'NoneType' object has no attribute 'node_ip_address'

It seems to assume that the local machine is the ray head node... Is there a way to configure raydp for this setup?

tdeboer-ilmn · Apr 07 '22 21:04

Hi @tdeboer-ilmn, glad you tried raydp. You are right: raydp.init_spark is assumed to be called from inside the ray cluster. If you need to use ray client with the current stable release, you have to wrap your driver program in a ray actor so that it executes on a node in the ray cluster (see the sketch below). If you are willing to try raydp-nightly, you can call raydp.init_spark on your local machine and it works fine with ray client. However, to_spark does not work yet because ray has not merged my PR.
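
For the stable release, a minimal sketch of that actor workaround could look like the following; the SparkDriver class and its run method are illustrative names I made up for this example, not part of the RayDP API:

import ray
import raydp

ray.init(address='ray://10.112.80.176:10001')  # ray client connection to the cluster

@ray.remote
class SparkDriver:
    def run(self):
        # This runs on a node inside the ray cluster, so init_spark can
        # read the node's address information as it expects.
        spark = raydp.init_spark(app_name='RayDP Example',
                                 num_executors=2,
                                 executor_cores=2,
                                 executor_memory='4GB')
        count = spark.range(0, 1000).count()
        raydp.stop_spark()
        return count

driver = SparkDriver.remote()
print(ray.get(driver.run.remote()))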

kira-lin · Apr 08 '22 02:04

RayDP now works directly in ray client mode. Closing this as stale.
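
For readers hitting this later: with a RayDP version that includes ray client support, the original snippet should work as-is; a minimal sketch (reusing the address from this issue, and stopping Spark when done):

import ray, raydp

ray.init(address='ray://10.112.80.176:10001')  # ray client connection
spark = raydp.init_spark(app_name='RayDP Example',
                         num_executors=2,
                         executor_cores=2,
                         executor_memory='4GB')
# ... use spark ...
raydp.stop_spark()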

kira-lin · Apr 17 '23 01:04