KF v1.6.0-rc.1 - MNIST E2E on Kubeflow on Vanilla k8s - TypeError: write() argument must be str, not <class 'bytes'>
Hello,
Testing the KF v1.6.0-rc.1 MNIST E2E example on vanilla K8s, I get an error when executing tfjob_client.wait_for_job:
from kubeflow.tfjob import TFJobClient
tfjob_client = TFJobClient()
tfjob_client.wait_for_job(train_name, namespace=namespace, watch=True)
Using KF v1.5 this was successful, but with v1.6.0-rc.1 an exception is raised:
TypeError: write() argument must be str, not <class 'bytes'>
Replacing /opt/conda/lib/python3.8/site-packages/ipykernel/iostream.py with the previous version, the execution succeeds.
Comparing the previous version with the latest one, the following code seems to cause the issue:
if not isinstance(string, str):
    raise TypeError(f"write() argument must be str, not {type(string)}")
I don't know how to fix the issue, but my current workaround is to replace the iostream.py file with the previous version.
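For reference, the new check can be reproduced in isolation, independent of the TFJob client. A minimal sketch, assuming it runs in a notebook cell with the newer ipykernel where sys.stdout is the OutStream mentioned above:

import sys

# With the newer ipykernel, OutStream.write() only accepts str, so passing
# bytes raises the same error as above:
# TypeError: write() argument must be str, not <class 'bytes'>
sys.stdout.write(b"hello\n")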
Thank you
@kubeflow/wg-training-leads any input on this?
There are no changes in the SDK with respect to dependencies. BTW, ipykernel is not a dependency of the training SDK: https://pypi.org/project/kubeflow-training/
And this is tested in CI as well: https://github.com/kubeflow/training-operator/blob/master/.github/workflows/integration-tests.yaml#L38
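One way to confirm the declared dependencies locally is a sketch like the following; it assumes the kubeflow-training package from the PyPI link above is installed in the current (Python 3.8+) environment:

from importlib.metadata import requires

# Lists the dependencies declared by the installed kubeflow-training package;
# ipykernel should not appear in this list.
print(requires("kubeflow-training"))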
@johnugeorge
Reproduced the same situation with JupyterLab version 3.4.3 using jupyter-tensorflow-full:v1.6.0-rc.1.
Executing
from kubeflow.tfjob import TFJobClient
tfjob_client = TFJobClient()
tfjob_client.wait_for_job(train_name, namespace=namespace, watch=True)
I get the following error:
TypeError Traceback (most recent call last)
Input In [18], in <cell line: 3>()
1 from kubeflow.tfjob import TFJobClient
2 tfjob_client = TFJobClient()
----> 3 tfjob_client.wait_for_job(train_name, namespace=namespace, watch=True)
File ~/git_tf-operator/sdk/python/kubeflow/tfjob/api/tf_job_client.py:220, in TFJobClient.wait_for_job(self, name, namespace, timeout_seconds, polling_interval, watch, status_callback)
217 namespace = utils.get_default_target_namespace()
219 if watch:
--> 220 tfjob_watch(
221 name=name,
222 namespace=namespace,
223 timeout_seconds=timeout_seconds)
224 else:
225 return self.wait_for_condition(
226 name,
227 ["Succeeded", "Failed"],
(...)
230 polling_interval=polling_interval,
231 status_callback=status_callback)
File ~/.local/lib/python3.8/site-packages/retrying.py:49, in retry.<locals>.wrap.<locals>.wrapped_f(*args, **kw)
47 @six.wraps(f)
48 def wrapped_f(*args, **kw):
---> 49 return Retrying(*dargs, **dkw).call(f, *args, **kw)
File ~/.local/lib/python3.8/site-packages/retrying.py:212, in Retrying.call(self, fn, *args, **kwargs)
209 if self.stop(attempt_number, delay_since_first_attempt_ms):
210 if not self._wrap_exception and attempt.has_exception:
211 # get() on an attempt with an exception should cause it to be raised, but raise just in case
--> 212 raise attempt.get()
213 else:
214 raise RetryError(attempt)
File ~/.local/lib/python3.8/site-packages/retrying.py:247, in Attempt.get(self, wrap_exception)
245 raise RetryError(self)
246 else:
--> 247 six.reraise(self.value[0], self.value[1], self.value[2])
248 else:
249 return self.value
File /opt/conda/lib/python3.8/site-packages/six.py:703, in reraise(tp, value, tb)
701 if value.__traceback__ is not tb:
702 raise value.with_traceback(tb)
--> 703 raise value
704 finally:
705 value = None
File ~/.local/lib/python3.8/site-packages/retrying.py:200, in Retrying.call(self, fn, *args, **kwargs)
198 while True:
199 try:
--> 200 attempt = Attempt(fn(*args, **kwargs), attempt_number, False)
201 except:
202 tb = sys.exc_info()
File ~/git_tf-operator/sdk/python/kubeflow/tfjob/api/tf_job_watch.py:55, in watch(name, namespace, timeout_seconds)
52 status = last_condition.get('type', '')
53 update_time = last_condition.get('lastTransitionTime', '')
---> 55 tbl(tfjob_name, status, update_time)
57 if name == tfjob_name:
58 if status == 'Succeeded' or status == 'Failed':
File /opt/conda/lib/python3.8/site-packages/table_logger/table_logger.py:204, in TableLogger.__call__(self, *args)
200 raise ValueError('Expected number of columns is {}. Got {}.'.format(
201 len(self.formatters), len(row_cells)))
203 line = self.format_row(*row_cells)
--> 204 self.print_line(line)
File /opt/conda/lib/python3.8/site-packages/table_logger/table_logger.py:308, in TableLogger.print_line(self, text)
307 def print_line(self, text):
--> 308 self.file.write(text.encode(self.encoding))
309 self.file.write(b'\n')
310 self.file.flush()
File /opt/conda/lib/python3.8/site-packages/ipykernel/iostream.py:529, in OutStream.write(self, string)
519 """Write to current stream after encoding if necessary
520
521 Returns
(...)
525
526 """
528 if not isinstance(string, str):
--> 529 raise TypeError(f"write() argument must be str, not {type(string)}")
531 if self.echo is not None:
532 try:
TypeError: write() argument must be str, not <class 'bytes'>
Replacing the iostream.py file with the previous version, I get the proper result:
from kubeflow.tfjob import TFJobClient
tfjob_client = TFJobClient()
tfjob_client.wait_for_job(train_name, namespace=namespace, watch=True)
mnist-train-05e7 Created 2022-08-22T14:52:59Z
mnist-train-05e7 Running 2022-08-22T14:53:08Z
mnist-train-05e7 Running 2022-08-22T14:53:08Z
mnist-train-05e7 Succeeded 2022-08-22T14:53:30Z
Note that this error isn't blocking; the example is still served and deployed successfully.
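Since the traceback above points at table_logger's print_line writing encoded bytes to ipykernel's OutStream, a less invasive workaround than swapping iostream.py might be to patch print_line so it writes str. This is only a sketch based on that traceback, not an official fix:

import table_logger.table_logger as tl  # module path as shown in the traceback

def _print_line_as_str(self, text):
    # Same as the original print_line, but writes str instead of
    # text.encode(self.encoding), which the newer ipykernel rejects.
    self.file.write(text)
    self.file.write('\n')
    self.file.flush()

# Apply the patch before calling tfjob_client.wait_for_job(..., watch=True)
tl.TableLogger.print_line = _print_line_as_str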
Hi Julioo, I am a newbie to Kubeflow and feel a little confused by this MNIST E2E on Kubeflow on vanilla k8s example. Please help. First, should we run the jupyter-tensorflow-full:v1.6.0-rc.1 image on a k8s cluster that has Kubeflow installed, or can we run jupyter-tensorflow-full:v1.6.0-rc.1 anywhere with a Docker runtime? Second, I found that the notebook first imports the Kubernetes client, but I don't see a pip install of it anywhere. And don't we need to explicitly configure how to connect to our k8s cluster?