Running Fairing on GKE uses the secret (IAM user) credential instead of the basic-auth user credential

Open · Akashdesarda opened this issue 6 years ago · 5 comments

/kind bug

What steps did you take and what happened: I tried to run examples/prediction/xgboost-high-level-apis.ipynb on a GKE cluster (which does not have a secret user or Cloud IAM access). After the Fairing job starts, it tries to get credentials as a secret user (Cloud IAM), but my cluster doesn't have one. I deployed Kubeflow using basic authentication (i.e. a username and password).

What did you expect to happen: I expected it not to try to get the secret credential, but to use the basic credential instead. The problem arises when it tries to deploy the completed job on GKE, because it does not have the correct credential.

Anything else you would like to add: I think the problem lies in obtaining the correct credential. The example Jupyter notebook doesn't explain where it takes the credential from; if it did, I could have provided the correct one.

Environment:

  • Fairing version: 0.5.3

  • Kubeflow version: 0.5.0

  • Kubernetes version: Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.1", GitCommit:"4485c6f18cee9a5d3c3b4e523bd27972b1b53892", GitTreeState:"clean", BuildDate:"2019-07-18T09:18:22Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"} Server Version: version.Info{Major:"1", Minor:"12+", GitVersion:"v1.12.8-gke.10", GitCommit:"f53039cc1e5295eed20969a4f10fb6ad99461e37", GitTreeState:"clean", BuildDate:"2019-06-19T20:48:40Z", GoVersion:"go1.10.8b4", Compiler:"gc", Platform:"linux/amd64"}

  • OS: Ubuntu 18.04 bionic

Akashdesarda · Jul 30 '19 05:07

Issue-Label Bot is automatically applying the label kind/bug to this issue, with a confidence of 0.98. Please mark this comment with :thumbsup: or :thumbsdown: to give our bot feedback!

issue-label-bot[bot] · Jul 30 '19 05:07

@Akashdesarda so you have your own secret volume, which includes the user name and password, and you just want to mount that secret volume instead of mounting user-gcp-sa, right?

I just logged ticket #328 to enhance the pod_spec_mutators mechanism. I also think we should let users specify the secret.
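The idea in #328 is to let users choose which secret gets mounted into the job pod, rather than always mounting user-gcp-sa. A minimal sketch of such a mutator, assuming the pod spec is handled as a plain manifest-shaped dict; `mount_user_secret`, its arguments, the volume name `user-credentials` and the secret name `my-basic-auth` are all hypothetical illustrations, not Fairing's real pod_spec_mutators signature (which may pass Kubernetes client objects instead).

```python
# Hypothetical mutator factory: mounts a user-chosen secret instead of the
# hard-coded user-gcp-sa secret. Pod specs are plain dicts shaped like a
# Kubernetes manifest; this is NOT Fairing's actual mutator interface.

def mount_user_secret(secret_name, mount_path, volume_name="user-credentials"):
    """Return a mutator(pod_spec) that mounts `secret_name` at `mount_path`
    in every container of the pod spec."""

    def mutator(pod_spec):
        volumes = pod_spec.setdefault("volumes", [])
        # Skip adding the volume again if the mutator is applied twice.
        if not any(v.get("name") == volume_name for v in volumes):
            volumes.append({
                "name": volume_name,
                "secret": {"secretName": secret_name},
            })
        for container in pod_spec.get("containers", []):
            mounts = container.setdefault("volumeMounts", [])
            if not any(m.get("name") == volume_name for m in mounts):
                mounts.append({
                    "name": volume_name,
                    "mountPath": mount_path,
                    "readOnly": True,
                })
        return pod_spec

    return mutator


if __name__ == "__main__":
    spec = {"containers": [{"name": "fairing-job", "image": "python:3.6.7"}]}
    mutate = mount_user_secret("my-basic-auth", "/secrets/basic-auth")
    mutate(spec)
    print(spec["volumes"][0]["secret"]["secretName"])  # my-basic-auth
```

Returning a closure keeps the secret name configurable per job while the mutator itself only needs the pod spec, which is the shape a "let the user specify the secret" option would plausibly take.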

jinchihe · Jul 30 '19 06:07

I think that may solve the problem.

Akashdesarda · Jul 31 '19 10:07

I believe I'm also running into this issue on the same xgboost-high-level-apis.ipynb notebook.

My environment is similar to @Akashdesarda's:

  • fairing: version 0.5.3. I have tried different versions of fairing, installed both via pip3 install fairing and via pip3 install git+git://github.com/kubeflow/fairing.git@master
  • kubeflow: version 0.6 - installed on GKE with basic authentication (instead of Cloud IAP), per the Deploy Kubeflow Using CLI instructions

(Maybe unrelated, but I also had to follow the steps listed in kubeflow/issues/3571 to mount the GCP credentials at /secrets/gcp-service-account-credentials/user-gcp-sa.json and make them visible to my Kubeflow notebooks.)
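To rule that workaround in or out, a notebook cell can check whether the key file is actually visible. A minimal diagnostic sketch, assuming the path above and assuming the workaround also sets GOOGLE_APPLICATION_CREDENTIALS (the standard variable Google client libraries read for Application Default Credentials); `gcp_credentials_status` is a hypothetical helper, not part of Fairing.

```python
import os

# Path taken from the kubeflow/issues/3571 workaround mentioned above.
DEFAULT_GCP_KEY = "/secrets/gcp-service-account-credentials/user-gcp-sa.json"


def gcp_credentials_status(path=DEFAULT_GCP_KEY, environ=None):
    """Report whether a GCP service-account key is visible to this process."""
    environ = os.environ if environ is None else environ
    env_path = environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    return {
        "expected_path": path,
        "file_exists": os.path.isfile(path),
        "env_var": env_path,
        # True only when the env var points at the same file we expect.
        "env_matches": env_path is not None
        and os.path.abspath(env_path) == os.path.abspath(path),
    }


if __name__ == "__main__":
    for key, value in gcp_credentials_status().items():
        print(f"{key}: {value}")
```

A missing file or an unset variable points at the mount step; if both look fine, the failure is more likely the permission problem shown in the error below.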

Problematic cell in the notebook

from fairing import TrainJob

train_job = TrainJob(HousingServe, 
                     input_files=['ames_dataset/train.csv', "requirements.txt"],
                     docker_registry=DOCKER_REGISTRY,
                     backend=BackendClass(build_context_source=BuildContext))

train_job.submit()

Error message

Using default base docker image: registry.hub.docker.com/library/python:3.6.7
---------------------------------------------------------------------------
ApiException                              Traceback (most recent call last)
<ipython-input-16-25d8ef90c104> in <module>
      4                      input_files=['ames_dataset/train.csv', "requirements.txt"],
      5                      docker_registry=DOCKER_REGISTRY,
----> 6                      backend=BackendClass(build_context_source=BuildContext))
      7 
      8 train_job.submit()

/opt/conda/lib/python3.6/site-packages/fairing/ml_tasks/tasks.py in __init__(self, entry_point, base_docker_image, docker_registry, input_files, backend, pod_spec_mutators)
     72                  input_files=None, backend=None, pod_spec_mutators=None):
     73         super().__init__(entry_point, base_docker_image, docker_registry,
---> 74                          input_files, backend, pod_spec_mutators)
     75 
     76     def submit(self):

/opt/conda/lib/python3.6/site-packages/fairing/ml_tasks/tasks.py in __init__(self, entry_point, base_docker_image, docker_registry, input_files, backend, pod_spec_mutators)
     58                                                  base_image=self.base_docker_image,
     59                                                  registry=self.docker_registry,
---> 60                                                  needs_deps_installation=needs_deps_installation)
     61         logger.warning("Using builder: {}".format(type(self.builder)))
     62 

/opt/conda/lib/python3.6/site-packages/fairing/backends/backends.py in get_builder(self, preprocessor, base_image, registry, needs_deps_installation, pod_spec_mutators)
    115         elif (fairing.utils.is_running_in_k8s() or
    116               not ml_tasks_utils.is_docker_daemon_exists()) and \
--> 117                 KubeManager().secret_exists(constants.GCP_CREDS_SECRET_NAME, self._namespace):
    118             return ClusterBuilder(preprocessor=preprocessor,
    119                                   base_image=base_image,

/opt/conda/lib/python3.6/site-packages/fairing/kubernetes/manager.py in secret_exists(self, name, namespace)
     94 
     95     def secret_exists(self, name, namespace):
---> 96         secrets = client.CoreV1Api().list_namespaced_secret(namespace)
     97         secret_names = [secret.metadata.name for secret in secrets.items]
     98         return name in secret_names

/opt/conda/lib/python3.6/site-packages/kubernetes/client/apis/core_v1_api.py in list_namespaced_secret(self, namespace, **kwargs)
  12995             return self.list_namespaced_secret_with_http_info(namespace, **kwargs)
  12996         else:
> 12997             (data) = self.list_namespaced_secret_with_http_info(namespace, **kwargs)
  12998             return data
  12999 

/opt/conda/lib/python3.6/site-packages/kubernetes/client/apis/core_v1_api.py in list_namespaced_secret_with_http_info(self, namespace, **kwargs)
  13098                                         _preload_content=params.get('_preload_content', True),
  13099                                         _request_timeout=params.get('_request_timeout'),
> 13100                                         collection_formats=collection_formats)
  13101 
  13102     def list_namespaced_service(self, namespace, **kwargs):

/opt/conda/lib/python3.6/site-packages/kubernetes/client/api_client.py in call_api(self, resource_path, method, path_params, query_params, header_params, body, post_params, files, response_type, auth_settings, async_req, _return_http_data_only, collection_formats, _preload_content, _request_timeout)
    332                                    body, post_params, files,
    333                                    response_type, auth_settings,
--> 334                                    _return_http_data_only, collection_formats, _preload_content, _request_timeout)
    335         else:
    336             thread = self.pool.apply_async(self.__call_api, (resource_path, method,

/opt/conda/lib/python3.6/site-packages/kubernetes/client/api_client.py in __call_api(self, resource_path, method, path_params, query_params, header_params, body, post_params, files, response_type, auth_settings, _return_http_data_only, collection_formats, _preload_content, _request_timeout)
    166                                      post_params=post_params, body=body,
    167                                      _preload_content=_preload_content,
--> 168                                      _request_timeout=_request_timeout)
    169 
    170         self.last_response = response_data

/opt/conda/lib/python3.6/site-packages/kubernetes/client/api_client.py in request(self, method, url, query_params, headers, post_params, body, _preload_content, _request_timeout)
    353                                         _preload_content=_preload_content,
    354                                         _request_timeout=_request_timeout,
--> 355                                         headers=headers)
    356         elif method == "HEAD":
    357             return self.rest_client.HEAD(url,

/opt/conda/lib/python3.6/site-packages/kubernetes/client/rest.py in GET(self, url, headers, query_params, _preload_content, _request_timeout)
    229                             _preload_content=_preload_content,
    230                             _request_timeout=_request_timeout,
--> 231                             query_params=query_params)
    232 
    233     def HEAD(self, url, headers=None, query_params=None, _preload_content=True, _request_timeout=None):

/opt/conda/lib/python3.6/site-packages/kubernetes/client/rest.py in request(self, method, url, query_params, headers, body, post_params, _preload_content, _request_timeout)
    220 
    221         if not 200 <= r.status <= 299:
--> 222             raise ApiException(http_resp=r)
    223 
    224         return r

ApiException: (403)
Reason: Forbidden
HTTP response headers: HTTPHeaderDict({'Audit-Id': 'b61ef1a1-13c3-4132-8a87-ea459c6015c5', 'Content-Type': 'application/json', 'X-Content-Type-Options': 'nosniff', 'Date': 'Wed, 04 Sep 2019 17:14:47 GMT', 'Content-Length': '300'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"secrets is forbidden: User \"system:serviceaccount:kubeflow:default-editor\" cannot list resource \"secrets\" in API group \"\" in the namespace \"kubeflow\"","reason":"Forbidden","details":{"kind":"secrets"},"code":403}
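The 403 says that system:serviceaccount:kubeflow:default-editor cannot list secrets in the kubeflow namespace, and the traceback shows that Fairing's builder selection calls `secret_exists()`, which lists secrets, before anything is submitted. One possible workaround is to grant that permission with a namespaced Role and RoleBinding. A minimal sketch that builds those objects as Python dicts and prints them as JSON; the names `fairing-secret-reader` and `secret_reader_rbac` are hypothetical. Note that this exposes every secret in the namespace to notebooks running as default-editor, and Fairing may hit further permission errors once this check passes.

```python
import json

# Namespace and service account come from the 403 message; the
# Role/RoleBinding name is hypothetical.
NAMESPACE = "kubeflow"
SERVICE_ACCOUNT = "default-editor"
ROLE_NAME = "fairing-secret-reader"


def secret_reader_rbac(namespace=NAMESPACE, service_account=SERVICE_ACCOUNT,
                       role_name=ROLE_NAME):
    """Build a Role + RoleBinding letting one service account list Secrets."""
    role = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        "metadata": {"name": role_name, "namespace": namespace},
        "rules": [{
            "apiGroups": [""],        # "" is the core API group, where Secrets live
            "resources": ["secrets"],
            "verbs": ["list"],        # the verb the failing list_namespaced_secret call needs
        }],
    }
    binding = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {"name": role_name, "namespace": namespace},
        "subjects": [{
            "kind": "ServiceAccount",
            "name": service_account,
            "namespace": namespace,
        }],
        "roleRef": {
            "apiGroup": "rbac.authorization.k8s.io",
            "kind": "Role",
            "name": role_name,
        },
    }
    # A v1 List lets both objects go through a single `kubectl apply -f`.
    return {"apiVersion": "v1", "kind": "List", "items": [role, binding]}


if __name__ == "__main__":
    print(json.dumps(secret_reader_rbac(), indent=2))
```

Usage: save the output to rbac.json and run `kubectl apply -f rbac.json` as a user with permission to create RBAC objects in the kubeflow namespace.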

dawu76 · Sep 04 '19 19:09

/priority p2
/area engprod

jtfogarty · Jan 15 '20 22:01