GraphScope
[BUG] session config k8s_coordinator_pod_node_selector Not effective
Describe the bug
Add a label to the k8s worker node:
kubectl label nodes node-worker graphscope=1
Create a session:
session = graphscope.session(
    k8s_coordinator_cpu=1,
    k8s_coordinator_mem="1Gi",
    k8s_vineyard_cpu=4,
    k8s_vineyard_mem="5Gi",
    vineyard_shared_mem="5Gi",
    k8s_engine_cpu=2,
    k8s_namespace='gs-new-orc-jacky100',
    k8s_engine_mem="5Gi",
    num_workers=3,
    k8s_coordinator_pod_node_selector={"graphscope":"1"},
    k8s_engine_pod_node_selector={"graphscope":"1"},
    k8s_image_tag="latest",
    k8s_client_config='~/.kube/config')
Error log:
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Deployment in
version \"v1\" cannot be handled as a Deployment: json: cannot unmarshal string into Go struct field
PodSpec.spec.template.spec.nodeSelector of type map[string]string","reason":"BadRequest","code":400}
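The message says the API server received a JSON string where it expected an object (`map[string]string`) for `nodeSelector`. The sketch below only illustrates that difference with the standard library. The helper `node_selector_is_valid` and the two example specs are hypothetical, and nothing here is taken from GraphScope's code or claims to show where the string comes from.

```python
import json


def node_selector_is_valid(pod_spec: dict) -> bool:
    """Return True if nodeSelector is a str -> str mapping.

    Hypothetical helper: it mirrors the map[string]string type
    named in the error message, nothing more.
    """
    selector = pod_spec.get("nodeSelector", {})
    return isinstance(selector, dict) and all(
        isinstance(key, str) and isinstance(value, str)
        for key, value in selector.items()
    )


selector = {"graphscope": "1"}

# The selector stays a mapping, so it serializes to a JSON object.
good_spec = {"nodeSelector": selector}

# Assumed failure mode: the selector was already serialized once,
# so it ends up in the payload as a JSON string.
bad_spec = {"nodeSelector": json.dumps(selector)}

print(json.dumps(good_spec))
print(json.dumps(bad_spec))
print(node_selector_is_valid(good_spec), node_selector_is_valid(bad_spec))
```

Comparing the two printed payloads shows the shape the API server accepts (an object) next to the shape that would trigger a "cannot unmarshal string" error.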
Hi, @JackyYangPassion. Thanks for the report.
Could you please provide the full error log? Thanks.
Thanks for your reply, @dashanji. Here is the full log from the Jupyter notebook:
2024-01-10 20:50:11,794 [INFO][cluster:235]: Launching coordinator...
2024-01-10 20:50:12,802 [INFO][cluster:414]: Stopping coordinator
2024-01-10 20:50:12,825 [INFO][cluster:434]: Stopped coordinator
2024-01-10 20:50:12,825 [INFO][cluster:414]: Stopping coordinator
2024-01-10 20:50:12,826 [INFO][cluster:434]: Stopped coordinator
Traceback (most recent call last):

in <module>:31

   28                             k8s_coordinator_pod_node_selector={"graphscope":"1"},
   29                             k8s_engine_pod_node_selector={"graphscope":"1"},
   30                             k8s_image_tag="latest",
❱  31                             k8s_client_config='~/.kube/config')
   32 print('========= Session created. ==========')
   33

/usr/local/python3/lib/python3.7/site-packages/graphscope/client/session.py:563 in __init__

  560         atexit.register(self.close)
  561         # create and connect session
  562         with CaptureKeyboardInterrupt(self.close):
❱ 563             self._connect()
  564
  565         self._disconnected: bool = False
  566

/usr/local/python3/lib/python3.7/site-packages/graphscope/client/session.py:909 in _connect

  906
  907         # launching graphscope service
  908         if self._launcher is not None:
❱ 909             self._launcher.start()
  910             self._coordinator_endpoint = self._launcher.coordinator_endpoint
  911
  912         # waiting service ready

/usr/local/python3/lib/python3.7/site-packages/graphscope/deploy/kubernetes/cluster.py:389 in start

  386             self._create_namespace()
  387             self._create_role_and_binding()
  388
❱ 389             self._create_services()
  390             time.sleep(1)
  391
  392             self._waiting_for_services_ready()

/usr/local/python3/lib/python3.7/site-packages/graphscope/deploy/kubernetes/cluster.py:301 in _create_services

  298         return args
  299
  300     def _create_services(self):
❱ 301         self._create_coordinator()
  302
  303     def _waiting_for_services_ready(self):
  304         response = self._app_api.read_namespaced_deployment_status(

/usr/local/python3/lib/python3.7/site-packages/graphscope/deploy/kubernetes/cluster.py:274 in _create_coordinator

  271
  272         deployment = coordinator.get_coordinator_deployment()
  273         response = self._app_api.create_namespaced_deployment(
❱ 274             self._namespace, deployment
  275         )
  276         targets.append(response)
  277

/usr/local/python3/lib/python3.7/site-packages/kubernetes/client/api/apps_v1_api.py:353 in create_namespaced_deployment

  350                 returns the request thread.
  351         """
  352         kwargs['_return_http_data_only'] = True
❱ 353         return self.create_namespaced_deployment_with_http_info(namespace, body, **kwarg
  354
  355     def create_namespaced_deployment_with_http_info(self, namespace, body, **kwargs): #
  356         """create_namespaced_deployment # noqa: E501

/usr/local/python3/lib/python3.7/site-packages/kubernetes/client/api/apps_v1_api.py:466 in create_namespaced_deployment_with_http_info

  463             _return_http_data_only=local_var_params.get('_return_http_data_only'), # no
  464             _preload_content=local_var_params.get('_preload_content', True),
  465             _request_timeout=local_var_params.get('_request_timeout'),
❱ 466             collection_formats=collection_formats)
  467
  468     def create_namespaced_replica_set(self, namespace, body, **kwargs): # noqa: E501
  469         """create_namespaced_replica_set # noqa: E501

/usr/local/python3/lib/python3.7/site-packages/kubernetes/client/api_client.py:353 in call_api

  350                                 body, post_params, files,
  351                                 response_type, auth_settings,
  352                                 _return_http_data_only, collection_formats,
❱ 353                                 _preload_content, _request_timeout, _host)
  354
  355         return self.pool.apply_async(self.__call_api, (resource_path,
  356                                                     method, path_params,

/usr/local/python3/lib/python3.7/site-packages/kubernetes/client/api_client.py:184 in __call_api

  181             method, url, query_params=query_params, headers=header_params,
  182             post_params=post_params, body=body,
  183             _preload_content=_preload_content,
❱ 184             _request_timeout=_request_timeout)
  185
  186         self.last_response = response_data
  187

/usr/local/python3/lib/python3.7/site-packages/kubernetes/client/api_client.py:397 in request

  394                                         post_params=post_params,
  395                                         _preload_content=_preload_content,
  396                                         _request_timeout=_request_timeout,
❱ 397                                         body=body)
  398         elif method == "PUT":
  399             return self.rest_client.PUT(url,
  400                                         query_params=query_params,

/usr/local/python3/lib/python3.7/site-packages/kubernetes/client/rest.py:285 in POST

  282                             post_params=post_params,
  283                             _preload_content=_preload_content,
  284                             _request_timeout=_request_timeout,
❱ 285                             body=body)
  286
  287     def PUT(self, url, headers=None, query_params=None, post_params=None,
  288             body=None, _preload_content=True, _request_timeout=None):

/usr/local/python3/lib/python3.7/site-packages/kubernetes/client/rest.py:238 in request

  235             logger.debug("response body: %s", r.data)
  236
  237         if not 200 <= r.status <= 299:
❱ 238             raise ApiException(http_resp=r)
  239
  240         return r
  241

ApiException: (400)
Reason: Bad Request
HTTP response headers: HTTPHeaderDict({'Audit-Id': '6966ccb2-e7da-461a-a521-e4864dda18c4', 'Cache-Control':
'no-cache, private', 'Content-Type': 'application/json', 'X-Kubernetes-Pf-Flowschema-Uid':
'0c2b55b8-02df-4c93-956c-e04dc793d0cb', 'X-Kubernetes-Pf-Prioritylevel-Uid':
'0ddb7b8c-60c6-44e5-ac99-c6e7df6626ae', 'Date': 'Wed, 10 Jan 2024 12:50:11 GMT', 'Content-Length': '295'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Deployment in
version \"v1\" cannot be handled as a Deployment: json: cannot unmarshal string into Go struct field
PodSpec.spec.template.spec.nodeSelector of type map[string]string","reason":"BadRequest","code":400}
@JackyYangPassion Thanks, we will try to reproduce the bug.
/cc @yecol @sighingnow, this issue/PR has had no activity for a long time. Please help review its status and assign people to work on it.