Failed to launch Katib experiment - 404 page not found
/kind bug
What steps did you take and what happened: Set up Kubeflow and installed Katib manually as described in https://github.com/kubeflow/katib/issues/1415. Started a Katib experiment with Kale from a Jupyter notebook. The experiment was created and the pipeline was uploaded, but the Katib experiment failed to launch.
Type: RPC
Method: katib.create_katib_experiment()
Code: 6 (UnhandledError)
Transaction ID: ylpewg72bh
Message: Failed to launch Katib experiment
Details: (404)
Reason: Not Found
HTTP response headers: HTTPHeaderDict({'Cache-Control': 'no-cache, private', 'Content-Type': 'text/plain; charset=utf-8', 'X-Content-Type-Options': 'nosniff', 'Date': 'Mon, 08 Mar 2021 12:14:43 GMT', 'Content-Length': '19'})
HTTP response body: 404 page not found
kale.log:
2021-03-08 12:14:42 run:83 [[DEBUG]] [TID=axkqxleth9] [] Decoding ctx of RPC function 'kfp.create_experiment'
2021-03-08 12:14:42 run:95 [[DEBUG]] [TID=axkqxleth9] [/home/jovyan/medium/minikf/titanic-katib.ipynb] Decoding kwargs of RPC function 'kfp.create_experiment'
2021-03-08 12:14:42 run:104 [[DEBUG]] [TID=axkqxleth9] [/home/jovyan/medium/minikf/titanic-katib.ipynb] Importing RPC function 'kfp.create_experiment'
2021-03-08 12:14:42 run:114 [[INFO]] [TID=axkqxleth9] [/home/jovyan/medium/minikf/titanic-katib.ipynb] Executing RPC function 'create_experiment(experiment_name=test-v1ef9)'
2021-03-08 12:14:43 _client:352 [[INFO]] Creating experiment test-v1ef9.
2021-03-08 12:14:43 run:83 [[DEBUG]] [TID=ylpewg72bh] [] Decoding ctx of RPC function 'katib.create_katib_experiment'
2021-03-08 12:14:43 run:95 [[DEBUG]] [TID=ylpewg72bh] [/home/jovyan/medium/minikf/titanic-katib.ipynb] Decoding kwargs of RPC function 'katib.create_katib_experiment'
2021-03-08 12:14:43 run:104 [[DEBUG]] [TID=ylpewg72bh] [/home/jovyan/medium/minikf/titanic-katib.ipynb] Importing RPC function 'katib.create_katib_experiment'
2021-03-08 12:14:43 run:114 [[INFO]] [TID=ylpewg72bh] [/home/jovyan/medium/minikf/titanic-katib.ipynb] Executing RPC function 'create_katib_experiment(pipeline_id=832dfc28-61be-4fb5-af12-7877778b26ef, pipeline_metadata={'autosnapshot': True, 'docker_image': 'jupyter-kale:latest', 'experiment': {'id': '7f611f1b-bf8e-4709-80ef-c55d6644931c', 'name': 'test'}, 'experiment_name': 'test-v1ef9', 'katib_metadata': {'parameters': [{'feasibleSpace': {'max': '2000', 'min': '100', 'step': '100'}, 'name': 'N_ESTIMATORS', 'parameterType': 'int'}, {'feasibleSpace': {'list': ['10', '20', '30', '40', '50', '100']}, 'name': 'MAX_DEPTH', 'parameterType': 'categorical'}, {'feasibleSpace': {'max': '4', 'min': '1', 'step': '1'}, 'name': 'MIN_SAMPLES_LEAF', 'parameterType': 'int'}, {'feasibleSpace': {'list': ['2', '5', '10']}, 'name': 'MIN_SAMPLES_SPLIT', 'parameterType': 'categorical'}], 'objective': {'additionalMetricNames': [], 'goal': 0.85, 'objectiveMetricName': 'random-forest-accuracy', 'type': 'maximize'}, 'algorithm': {'algorithmName': 'random', 'algorithmSettings': [{'name': 'random_state', 'value': '10'}, {'name': 'acq_optimizer', 'value': 'auto'}, {'name': 'acq_func', 'value': 'gp_hedge'}, {'name': 'base_estimator', 'value': 'GP'}]}, 'maxTrialCount': 10, 'maxFailedTrialCount': 3, 'parallelTrialCount': 5}, 'katib_run': True, 'pipeline_description': 'Fine tune a RF classifier on the Titanic dataset', 'pipeline_name': 'titanic-hp-tuning', 'snapshot_volumes': True, 'steps_defaults': [], 'volumes': []}, output_path=/home/jovyan/medium/minikf)'
2021-03-08 12:14:43 katib:181 [[INFO]] [TID=ylpewg72bh] [/home/jovyan/medium/minikf/titanic-katib.ipynb] Saving Katib experiment definition at /home/jovyan/medium/minikf/test-v1ef9.katib.yaml
2021-03-08 12:14:43 katib:91 [[DEBUG]] [TID=ylpewg72bh] [/home/jovyan/medium/minikf/titanic-katib.ipynb] Launching Katib Experiment 'test-v1ef9'...
2021-03-08 12:14:43 katib:97 [[ERROR]] [TID=ylpewg72bh] [/home/jovyan/medium/minikf/titanic-katib.ipynb] Failed to launch Katib experiment
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/kale/rpc/katib.py", line 95, in _launch_katib_experiment
katib_experiment)
File "/usr/local/lib/python3.6/dist-packages/kubernetes/client/apis/custom_objects_api.py", line 178, in create_namespaced_custom_object
(data) = self.create_namespaced_custom_object_with_http_info(group, version, namespace, plural, body, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/kubernetes/client/apis/custom_objects_api.py", line 277, in create_namespaced_custom_object_with_http_info
collection_formats=collection_formats)
File "/usr/local/lib/python3.6/dist-packages/kubernetes/client/api_client.py", line 334, in call_api
_return_http_data_only, collection_formats, _preload_content, _request_timeout)
File "/usr/local/lib/python3.6/dist-packages/kubernetes/client/api_client.py", line 168, in __call_api
_request_timeout=_request_timeout)
File "/usr/local/lib/python3.6/dist-packages/kubernetes/client/api_client.py", line 377, in request
body=body)
File "/usr/local/lib/python3.6/dist-packages/kubernetes/client/rest.py", line 265, in POST
body=body)
File "/usr/local/lib/python3.6/dist-packages/kubernetes/client/rest.py", line 221, in request
raise ApiException(http_resp=r)
kubernetes.client.rest.ApiException: (404)
Reason: Not Found
HTTP response headers: HTTPHeaderDict({'Cache-Control': 'no-cache, private', 'Content-Type': 'text/plain; charset=utf-8', 'X-Content-Type-Options': 'nosniff', 'Date': 'Mon, 08 Mar 2021 12:14:43 GMT', 'Content-Length': '19'})
HTTP response body: 404 page not found
What did you expect to happen: The katib experiment is launched.
Anything else you would like to add: Can I figure out which DNS address is being requested?
Environment:
- Kubeflow version: kfctl v1.2.0-0-gbc038f9
- OnPremise Kubernetes Cluster
- Kubernetes version: v1.17.0
- OS: Ubuntu 18.04.5 LTS
@anneum When you create the Katib SDK client, you can pass the kubeconfig path: https://github.com/kubeflow/katib/blob/master/sdk/python/v1beta1/kubeflow/katib/api/katib_client.py#L33-L43.
@andreyvelich thank you for your response. I don't fully understand what you mean by that.
The notebook server was created from the kubeflow Notebook Servers Section and is therefore already inside the cluster.

I click "compile and run katib job" and there is no option where I can pass something like the kubeconfig path.
Got it. This issue might be related to Kale itself. /cc @StefanoFioravanzo @yanniszark
@andreyvelich thanks for the ping. Kale currently doesn't support providing a custom kubeconfig. Can you make sure the notebook Pod has a proper kubeconfig and that you can query for experiments with kubectl?
Thanks @StefanoFioravanzo.
@anneum Please try to execute a kubectl command from your notebook.
@StefanoFioravanzo I am very surprised about this but yes, I see the pods in my namespace.
tf-docker ~ > kubectl get pods
NAME READY STATUS RESTARTS AGE
jupyter-kale-0 2/2 Running 4 15d
ml-pipeline-ui-artifact-8669b444d8-mq4wd 2/2 Running 4 15d
ml-pipeline-visualizationserver-744ffd6cdf-x57tb 2/2 Running 2 15d
test-0 2/2 Running 2 4d21h
I asked which URL is being called because we have a company proxy, and I would like to rule that out as the cause.
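For anyone who wants to check which address the failing request is sent to: the traceback above goes through `create_namespaced_custom_object`, so the request targets the Kubernetes API server rather than a Katib service. The following is a minimal sketch, assuming the client inside the notebook Pod uses in-cluster configuration (which reads the `KUBERNETES_SERVICE_HOST` and `KUBERNETES_SERVICE_PORT` environment variables) and that Kale asks for group `kubeflow.org`, version `v1beta1`, plural `experiments`. Those three values and the namespace are assumptions for illustration; the helper names are hypothetical and not part of Kale or Katib.

```python
import os


def in_cluster_api_host(env=None):
    """Return the API server base URL an in-cluster client would use, or None."""
    env = os.environ if env is None else env
    host = env.get("KUBERNETES_SERVICE_HOST")
    port = env.get("KUBERNETES_SERVICE_PORT")
    if not host or not port:
        return None  # not running inside a Pod, or the variables are unset
    if ":" in host:
        host = f"[{host}]"  # bracket IPv6 literals so the URL stays valid
    return f"https://{host}:{port}"


def custom_object_path(group, version, namespace, plural):
    """Build the REST path for a namespaced custom resource collection."""
    return f"/apis/{group}/{version}/namespaces/{namespace}/{plural}"


if __name__ == "__main__":
    base = in_cluster_api_host() or "https://<api-server>"
    # Assumed values: group, version and plural are guesses at what Kale sends,
    # and "my-namespace" is a placeholder for the notebook's namespace.
    print(base + custom_object_path("kubeflow.org", "v1beta1", "my-namespace", "experiments"))
```

If the printed host is the cluster-internal service address, a company proxy is unlikely to be involved, and a plain-text `404 page not found` would more plausibly mean that the API server does not serve the requested group/version path.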
@anneum Did you also try to run KFP pipelines? I'd like to understand if this issue is confined to creating Katib experiments or if it is an issue on the Kale side not being able to contact the K8s API Server.
@StefanoFioravanzo I can create and run standard pipelines (without a katib job) out of my notebook server.
tf-docker ~ > kfp pipeline list
+--------------------------------------+-------------------------------------------------+---------------------------+
| Pipeline ID | Name | Uploaded at |
+======================================+=================================================+===========================+
| c5fda645-e075-4f09-964f-66593d1ce87e | pipeline-p147r | 2021-03-03T13:13:43+00:00 |
+--------------------------------------+-------------------------------------------------+---------------------------+
Same problem here: I get a 404 error when I try to create a Katib experiment with the SDK. Did you fix your problem?
@Ulov888 How did you install Katib?
I am getting a 404 error when I try to create katib_experiment by SDK using the Notebook servers.
I'm trying to implement this example.
Versions:
Python : 3.6.8
Kubeflow : kfctl_k8s_istio.v1.0.1
kubeflow-katib 0.10.1
kubernetes 10.0.1
@Siddarth-Pattnaik I think you should update your Kubeflow version to at least 1.1 to use the Katib SDK. Kubeflow 1.0.1 shipped Katib v1alpha3, which the SDK doesn't support.
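A version mismatch like this can be checked from the notebook terminal. The sketch below assumes you have saved the output of `kubectl get --raw /apis` and `kubectl get --raw /apis/kubeflow.org/v1beta1` as JSON; it only inspects the standard discovery documents (`APIGroupList` and `APIResourceList`). The helper names are hypothetical and the sample data is invented for illustration, not taken from any cluster in this thread.

```python
import json


def served_versions(api_group_list, group):
    """List the versions the API server advertises for one API group."""
    for entry in api_group_list.get("groups", []):
        if entry.get("name") == group:
            return [v["version"] for v in entry.get("versions", [])]
    return []  # the group is not served at all


def serves_resource(api_resource_list, plural):
    """Check whether a group/version discovery document lists a resource."""
    return any(r.get("name") == plural for r in api_resource_list.get("resources", []))


if __name__ == "__main__":
    # Invented sample shaped like `kubectl get --raw /apis` output.
    groups = json.loads(
        '{"groups": [{"name": "kubeflow.org", "versions": ['
        '{"groupVersion": "kubeflow.org/v1alpha3", "version": "v1alpha3"}]}]}'
    )
    versions = served_versions(groups, "kubeflow.org")
    print("kubeflow.org versions:", versions)
    if "v1beta1" not in versions:
        print("v1beta1 is not served; a client requesting it would get a 404")
```

Because `kubeflow.org` hosts several custom resources, the group-level check is only a first step; `serves_resource` on the per-version document confirms whether `experiments` itself is available at that version.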
@StefanoFioravanzo as with @anneum, I can open a terminal from within the notebook that is capable of running both kubectl get pods as well as kfp pipeline list without issue.
The issue remains as follows:
2021-09-17 03:14:28 run:120 [ERROR] [TID=jhypgoxrcf] [/home/jovyan/Untitled.ipynb] RPC function 'create_katib_experiment' raised an RPCError
Traceback (most recent call last):
File "/kale/backend/kale/rpc/katib.py", line 107, in _launch_katib_experiment
katib_experiment)
File "/usr/local/lib/python3.6/dist-packages/kubernetes/client/api/custom_objects_api.py", line 183, in create_namespaced_custom_object
(data) = self.create_namespaced_custom_object_with_http_info(group, version, namespace, plural, body, **kwargs) # noqa: E501
File "/usr/local/lib/python3.6/dist-packages/kubernetes/client/api/custom_objects_api.py", line 289, in create_namespaced_custom_object_with_http_info
collection_formats=collection_formats)
File "/usr/local/lib/python3.6/dist-packages/kubernetes/client/api_client.py", line 345, in call_api
_preload_content, _request_timeout)
File "/usr/local/lib/python3.6/dist-packages/kubernetes/client/api_client.py", line 176, in __call_api
_request_timeout=_request_timeout)
File "/usr/local/lib/python3.6/dist-packages/kubernetes/client/api_client.py", line 388, in request
body=body)
File "/usr/local/lib/python3.6/dist-packages/kubernetes/client/rest.py", line 278, in POST
body=body)
File "/usr/local/lib/python3.6/dist-packages/kubernetes/client/rest.py", line 231, in request
raise ApiException(http_resp=r)
kubernetes.client.rest.ApiException: (404)
Reason: Not Found
HTTP response headers: HTTPHeaderDict({'Cache-Control': 'no-cache, private', 'Content-Type': 'text/plain; charset=utf-8', 'X-Content-Type-Options': 'nosniff', 'Date': 'Fri, 17 Sep 2021 03:14:28 GMT', 'Content-Length': '19'})
HTTP response body: 404 page not found
When I attempt to create a job, the pipeline is uploaded and the KFP experiment is created without issue (both of which attempt to access the k8s API). What breaks is the Katib experiment itself.

This issue has persisted across multiple Kale notebook versions. My KF version is 1.3.
@andreyvelich I can use the SDK normally (following this guide) within the notebook terminal.

Please advise?
I see the same issue on Kubeflow 1.3. Was anyone able to fix this?
2021-12-03 23:15:24 run:120 [ERROR] [TID=kl824ee10y] [/home/jovyan/kale/examples/dog-breed-classification/dog-breed-katib.ipynb] RPC function 'create_katib_experiment' raised an RPCError
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/kale/rpc/katib.py", line 104, in _launch_katib_experiment
katib_experiment)
File "/usr/local/lib/python3.6/dist-packages/kubernetes/client/apis/custom_objects_api.py", line 178, in create_namespaced_custom_object
(data) = self.create_namespaced_custom_object_with_http_info(group, version, namespace, plural, body, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/kubernetes/client/apis/custom_objects_api.py", line 277, in create_namespaced_custom_object_with_http_info
collection_formats=collection_formats)
File "/usr/local/lib/python3.6/dist-packages/kubernetes/client/api_client.py", line 334, in call_api
_return_http_data_only, collection_formats, _preload_content, _request_timeout)
File "/usr/local/lib/python3.6/dist-packages/kubernetes/client/api_client.py", line 168, in __call_api
_request_timeout=_request_timeout)
File "/usr/local/lib/python3.6/dist-packages/kubernetes/client/api_client.py", line 377, in request
body=body)
File "/usr/local/lib/python3.6/dist-packages/kubernetes/client/rest.py", line 266, in POST
body=body)
File "/usr/local/lib/python3.6/dist-packages/kubernetes/client/rest.py", line 222, in request
raise ApiException(http_resp=r)
kubernetes.client.rest.ApiException: (404)
Reason: Not Found
HTTP response headers: HTTPHeaderDict({'Cache-Control': 'no-cache, private', 'Content-Type': 'text/plain; charset=utf-8', 'X-Content-Type-Options': 'nosniff', 'X-Kubernetes-Pf-Flowschema-Uid': 'ab77065d-ed04-4fa5-bd00-a66298a0e074', 'X-Kubernetes-Pf-Prioritylevel-Uid': '55506d34-196f-4b07-b92d-80a43a2898e6', 'Date': 'Fri, 03 Dec 2021 23:15:24 GMT', 'Content-Length': '19'})
HTTP response body: 404 page not found
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
This issue has been automatically closed because it has not had recent activity. Please comment "/reopen" to reopen it.