
[BUG] session config k8s_coordinator_pod_node_selector Not effective

Open JackyYangPassion opened this issue 1 year ago • 4 comments

Describe the bug

Add a label to the k8s worker node:

kubectl label nodes  node-worker  graphscope=1

Create the session:

session = graphscope.session(
                             k8s_coordinator_cpu=1,
                             k8s_coordinator_mem="1Gi",
                             k8s_vineyard_cpu=4,
                             k8s_vineyard_mem="5Gi",
                             vineyard_shared_mem="5Gi",
                             k8s_engine_cpu=2,
                             k8s_namespace='gs-new-orc-jacky100',
                             k8s_engine_mem="5Gi",
                             num_workers=3,
                             k8s_coordinator_pod_node_selector={"graphscope":"1"},
                             k8s_engine_pod_node_selector={"graphscope":"1"},
                             k8s_image_tag="latest",
                             k8s_client_config='~/.kube/config')

Error log:

HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Deployment in 
version \"v1\" cannot be handled as a Deployment: json: cannot unmarshal string into Go struct field 
PodSpec.spec.template.spec.nodeSelector of type map[string]string","reason":"BadRequest","code":400}
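
The message says the API server received a JSON string where `nodeSelector` must be an object mapping label keys to label values. Below is a minimal stdlib-only sketch of that difference. `build_pod_spec` is a hypothetical helper, not GraphScope code, and whether GraphScope stringifies the selector somewhere between the session and the coordinator Deployment is an assumption that this thread does not confirm.

```python
import json


def build_pod_spec(node_selector):
    """Hypothetical helper: return a minimal pod spec fragment as JSON text."""
    return json.dumps({"spec": {"nodeSelector": node_selector}})


selector = {"graphscope": "1"}

# Dict passed through unchanged: serialized as a JSON object,
# which is what a Go map[string]string field expects.
good = build_pod_spec(selector)

# Dict stringified before being placed in the spec (assumed failure mode):
# serialized as a JSON string, which cannot be unmarshalled into a map.
bad = build_pod_spec(json.dumps(selector))

good_type = type(json.loads(good)["spec"]["nodeSelector"]).__name__
bad_type = type(json.loads(bad)["spec"]["nodeSelector"]).__name__
print(good_type, bad_type)
```

If the second shape is what reaches the API server, the request would be rejected with a 400 before any pod is scheduled, so the node label itself would never be evaluated.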

JackyYangPassion avatar Jan 10 '24 12:01 JackyYangPassion

Hi, @JackyYangPassion. Thanks for the report.

Could you please provide the full error log? Thanks.

dashanji avatar Jan 10 '24 12:01 dashanji

Thanks for your reply, @dashanji. Here is the full log from the Jupyter notebook:

2024-01-10 20:50:11,794 [INFO][cluster:235]: Launching coordinator...
2024-01-10 20:50:12,802 [INFO][cluster:414]: Stopping coordinator
2024-01-10 20:50:12,825 [INFO][cluster:434]: Stopped coordinator
2024-01-10 20:50:12,825 [INFO][cluster:414]: Stopping coordinator
2024-01-10 20:50:12,826 [INFO][cluster:434]: Stopped coordinator
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ in <module>:31                                                                                   │
│                                                                                                  │
│   28 │   │   │   │   │   │   │    k8s_coordinator_pod_node_selector={"graphscope":"1"},          │
│   29 │   │   │   │   │   │   │    k8s_engine_pod_node_selector={"graphscope":"1"},               │
│   30 │   │   │   │   │   │   │    k8s_image_tag="latest",                                        │
│ ❱ 31 │   │   │   │   │   │   │    k8s_client_config='~/.kube/config')                            │
│   32 print('========= Session created. ==========')                                              │
│   33                                                                                             │
│                                                                                                  │
│ /usr/local/python3/lib/python3.7/site-packages/graphscope/client/session.py:563 in __init__      │
│                                                                                                  │
│    560 │   │   atexit.register(self.close)                                                       │
│    561 │   │   # create and connect session                                                      │
│    562 │   │   with CaptureKeyboardInterrupt(self.close):                                        │
│ ❱  563 │   │   │   self._connect()                                                               │
│    564 │   │                                                                                     │
│    565 │   │   self._disconnected: bool = False                                                  │
│    566                                                                                           │
│                                                                                                  │
│ /usr/local/python3/lib/python3.7/site-packages/graphscope/client/session.py:909 in _connect      │
│                                                                                                  │
│    906 │   │                                                                                     │
│    907 │   │   # launching graphscope service                                                    │
│    908 │   │   if self._launcher is not None:                                                    │
│ ❱  909 │   │   │   self._launcher.start()                                                        │
│    910 │   │   │   self._coordinator_endpoint = self._launcher.coordinator_endpoint              │
│    911 │   │                                                                                     │
│    912 │   │   # waiting service ready                                                           │
│                                                                                                  │
│ /usr/local/python3/lib/python3.7/site-packages/graphscope/deploy/kubernetes/cluster.py:389 in    │
│ start                                                                                            │
│                                                                                                  │
│   386 │   │   │   self._create_namespace()                                                       │
│   387 │   │   │   self._create_role_and_binding()                                                │
│   388 │   │   │                                                                                  │
│ ❱ 389 │   │   │   self._create_services()                                                        │
│   390 │   │   │   time.sleep(1)                                                                  │
│   391 │   │   │                                                                                  │
│   392 │   │   │   self._waiting_for_services_ready()                                             │
│                                                                                                  │
│ /usr/local/python3/lib/python3.7/site-packages/graphscope/deploy/kubernetes/cluster.py:301 in    │
│ _create_services                                                                                 │
│                                                                                                  │
│   298 │   │   return args                                                                        │
│   299 │                                                                                          │
│   300 │   def _create_services(self):                                                            │
│ ❱ 301 │   │   self._create_coordinator()                                                         │
│   302 │                                                                                          │
│   303 │   def _waiting_for_services_ready(self):                                                 │
│   304 │   │   response = self._app_api.read_namespaced_deployment_status(                        │
│                                                                                                  │
│ /usr/local/python3/lib/python3.7/site-packages/graphscope/deploy/kubernetes/cluster.py:274 in    │
│ _create_coordinator                                                                              │
│                                                                                                  │
│   271 │   │                                                                                      │
│   272 │   │   deployment = coordinator.get_coordinator_deployment()                              │
│   273 │   │   response = self._app_api.create_namespaced_deployment(                             │
│ ❱ 274 │   │   │   self._namespace, deployment                                                    │
│   275 │   │   )                                                                                  │
│   276 │   │   targets.append(response)                                                           │
│   277                                                                                            │
│                                                                                                  │
│ /usr/local/python3/lib/python3.7/site-packages/kubernetes/client/api/apps_v1_api.py:353 in       │
│ create_namespaced_deployment                                                                     │
│                                                                                                  │
│    350 │   │   │   │    returns the request thread.                                              │
│    351 │   │   """                                                                               │
│    352 │   │   kwargs['_return_http_data_only'] = True                                           │
│ ❱  353 │   │   return self.create_namespaced_deployment_with_http_info(namespace, body, **kwarg  │
│    354 │                                                                                         │
│    355 │   def create_namespaced_deployment_with_http_info(self, namespace, body, **kwargs):  #  │
│    356 │   │   """create_namespaced_deployment  # noqa: E501                                     │
│                                                                                                  │
│ /usr/local/python3/lib/python3.7/site-packages/kubernetes/client/api/apps_v1_api.py:466 in       │
│ create_namespaced_deployment_with_http_info                                                      │
│                                                                                                  │
│    463 │   │   │   _return_http_data_only=local_var_params.get('_return_http_data_only'),  # no  │
│    464 │   │   │   _preload_content=local_var_params.get('_preload_content', True),              │
│    465 │   │   │   _request_timeout=local_var_params.get('_request_timeout'),                    │
│ ❱  466 │   │   │   collection_formats=collection_formats)                                        │
│    467 │                                                                                         │
│    468 │   def create_namespaced_replica_set(self, namespace, body, **kwargs):  # noqa: E501     │
│    469 │   │   """create_namespaced_replica_set  # noqa: E501                                    │
│                                                                                                  │
│ /usr/local/python3/lib/python3.7/site-packages/kubernetes/client/api_client.py:353 in call_api   │
│                                                                                                  │
│   350 │   │   │   │   │   │   │   │      body, post_params, files,                               │
│   351 │   │   │   │   │   │   │   │      response_type, auth_settings,                           │
│   352 │   │   │   │   │   │   │   │      _return_http_data_only, collection_formats,             │
│ ❱ 353 │   │   │   │   │   │   │   │      _preload_content, _request_timeout, _host)              │
│   354 │   │                                                                                      │
│   355 │   │   return self.pool.apply_async(self.__call_api, (resource_path,                      │
│   356 │   │   │   │   │   │   │   │   │   │   │   │   │      method, path_params,                │
│                                                                                                  │
│ /usr/local/python3/lib/python3.7/site-packages/kubernetes/client/api_client.py:184 in __call_api │
│                                                                                                  │
│   181 │   │   │   method, url, query_params=query_params, headers=header_params,                 │
│   182 │   │   │   post_params=post_params, body=body,                                            │
│   183 │   │   │   _preload_content=_preload_content,                                             │
│ ❱ 184 │   │   │   _request_timeout=_request_timeout)                                             │
│   185 │   │                                                                                      │
│   186 │   │   self.last_response = response_data                                                 │
│   187                                                                                            │
│                                                                                                  │
│ /usr/local/python3/lib/python3.7/site-packages/kubernetes/client/api_client.py:397 in request    │
│                                                                                                  │
│   394 │   │   │   │   │   │   │   │   │   │    post_params=post_params,                          │
│   395 │   │   │   │   │   │   │   │   │   │    _preload_content=_preload_content,                │
│   396 │   │   │   │   │   │   │   │   │   │    _request_timeout=_request_timeout,                │
│ ❱ 397 │   │   │   │   │   │   │   │   │   │    body=body)                                        │
│   398 │   │   elif method == "PUT":                                                              │
│   399 │   │   │   return self.rest_client.PUT(url,                                               │
│   400 │   │   │   │   │   │   │   │   │   │   query_params=query_params,                         │
│                                                                                                  │
│ /usr/local/python3/lib/python3.7/site-packages/kubernetes/client/rest.py:285 in POST             │
│                                                                                                  │
│   282 │   │   │   │   │   │   │   post_params=post_params,                                       │
│   283 │   │   │   │   │   │   │   _preload_content=_preload_content,                             │
│   284 │   │   │   │   │   │   │   _request_timeout=_request_timeout,                             │
│ ❱ 285 │   │   │   │   │   │   │   body=body)                                                     │
│   286 │                                                                                          │
│   287 │   def PUT(self, url, headers=None, query_params=None, post_params=None,                  │
│   288 │   │   │   body=None, _preload_content=True, _request_timeout=None):                      │
│                                                                                                  │
│ /usr/local/python3/lib/python3.7/site-packages/kubernetes/client/rest.py:238 in request          │
│                                                                                                  │
│   235 │   │   │   logger.debug("response body: %s", r.data)                                      │
│   236 │   │                                                                                      │
│   237 │   │   if not 200 <= r.status <= 299:                                                     │
│ ❱ 238 │   │   │   raise ApiException(http_resp=r)                                                │
│   239 │   │                                                                                      │
│   240 │   │   return r                                                                           │
│   241                                                                                            │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
ApiException: (400)
Reason: Bad Request
HTTP response headers: HTTPHeaderDict({'Audit-Id': '6966ccb2-e7da-461a-a521-e4864dda18c4', 'Cache-Control': 
'no-cache, private', 'Content-Type': 'application/json', 'X-Kubernetes-Pf-Flowschema-Uid': 
'0c2b55b8-02df-4c93-956c-e04dc793d0cb', 'X-Kubernetes-Pf-Prioritylevel-Uid': 
'0ddb7b8c-60c6-44e5-ac99-c6e7df6626ae', 'Date': 'Wed, 10 Jan 2024 12:50:11 GMT', 'Content-Length': '295'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Deployment in 
version \"v1\" cannot be handled as a Deployment: json: cannot unmarshal string into Go struct field 
PodSpec.spec.template.spec.nodeSelector of type map[string]string","reason":"BadRequest","code":400}
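
For anyone triaging similar failures, the response body is a Kubernetes `Status` object and can be parsed to pull out the offending field. This is a sketch with the body from this log pasted in as a literal. The regular expression is an assumption about the wording of this particular message, not a stable API contract.

```python
import json
import re

# Response body copied from the log above, with the wrapped lines rejoined.
body = (
    '{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure",'
    '"message":"Deployment in version \\"v1\\" cannot be handled as a Deployment: '
    'json: cannot unmarshal string into Go struct field '
    'PodSpec.spec.template.spec.nodeSelector of type map[string]string",'
    '"reason":"BadRequest","code":400}'
)

status = json.loads(body)

# Extract the field path and the type the server expected from the message text.
match = re.search(r"Go struct field (\S+) of type (\S+)", status["message"])
field, expected_type = match.group(1), match.group(2)

print(status["reason"], status["code"])
print(field, expected_type)
```

With the Python Kubernetes client, the same text is available on the caught `ApiException` as its `body` attribute.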

JackyYangPassion avatar Jan 10 '24 13:01 JackyYangPassion

@JackyYangPassion Thanks, we will try to reproduce the bug.

dashanji avatar Jan 10 '24 13:01 dashanji

/cc @yecol @sighingnow, this issue/PR has had no activity for a long time, please help to review the status and assign people to work on it.

github-actions[bot] avatar Feb 27 '24 13:02 github-actions[bot]