CI case test_lightgbm fails intermittently
Step #2: ________________________________ test_lightgbm _________________________________
Step #2: [gw6] linux -- Python 3.7.5 /usr/local/bin/python
Step #2:
Step #2: def test_lightgbm():
Step #2: file_dir = os.path.dirname(__file__)
Step #2: notebook_rel_path = "../../../examples/lightgbm/distributed-training.ipynb"
Step #2: notebook_abs_path = os.path.normpath(
Step #2: os.path.join(file_dir, notebook_rel_path))
Step #2: expected_messages = [
Step #2: "Copying gs://fairing-lightgbm/regression-example/regression.train.weight",
Step #2: "[LightGBM] [Info] Finished initializing network", # dist training setup
Step #2: "[LightGBM] [Info] Iteration:10, valid_1 l2 : 0.2",
Step #2: "[LightGBM] [Info] Finished training",
Step #2: "Prediction mean: 0.5",
Step #2: ", count: 500"
Step #2: ]
Step #2: > run_notebook_test(notebook_abs_path, expected_messages)
Step #2:
Step #2: tests/integration/gcp/test_running_in_notebooks.py:50:
Step #2: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
Step #2: tests/integration/helpers.py:18: in run_notebook_test
Step #2: output_path = execute_notebook(notebook_path, parameters=parameters)
Step #2: tests/integration/helpers.py:14: in execute_notebook
Step #2: parameters=parameters)
Step #2: /usr/local/lib/python3.7/site-packages/papermill/execute.py:108: in execute_notebook
Step #2: raise_for_execution_errors(nb, output_path)
Step #2: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
Step #2:
Step #2: nb = {'cells': [{'cell_type': 'code', 'metadata': {'inputHidden': True, 'hide_input': True}, 'execution_count': None, 'sour...nd_time': '2019-11-22T07:14:42.232762', 'duration': 60.111942, 'exception': True}}, 'nbformat': 4, 'nbformat_minor': 2}
Step #2: output_path = '/tmp/tmp6ga1cg6y/out.ipynb'
Step #2:
Step #2: def raise_for_execution_errors(nb, output_path):
...
> raise error
Step #2: E papermill.exceptions.PapermillExecutionError:
Step #2: E ---------------------------------------------------------------------------
Step #2: E Exception encountered at "In [8]":
Step #2: E ---------------------------------------------------------------------------
Step #2: E RuntimeError Traceback (most recent call last)
Step #2: E <ipython-input-8-0e4bf631467b> in <module>
Step #2: E ----> 1 lightgbm.execute(config=predict_params, docker_registry=DOCKER_REGISTRY)
Step #2: E
Step #2: E /usr/local/lib/python3.7/site-packages/kubeflow_fairing-0.7.0.1-py3.7.egg/kubeflow/fairing/frameworks/lightgbm.py in execute(config, docker_registry, base_image, namespace, stream_log, cores_per_worker, memory_per_worker, pod_spec_mutators)
Step #2: E 315 config['machine_list_file'] = "mlist.txt"
Step #2: E 316 output_map = generate_context_files(
Step #2: E --> 317 config, config_file_name, num_machines)
Step #2: E 318
Step #2: E 319 preprocessor = BasePreProcessor(
Step #2: E
Step #2: E /usr/local/lib/python3.7/site-packages/kubeflow_fairing-0.7.0.1-py3.7.egg/kubeflow/fairing/frameworks/lightgbm.py in generate_context_files(config, config_file_name, num_machines)
Step #2: E 253 MLIST_FILE_NAME)]
Step #2: E 254 entrypoint_file_name = _generate_entrypoint(
Step #2: E --> 255 copy_files_before, copy_files_after, config_in_docker, init_cmds, copy_patitioned_files)
Step #2: E 256 output_map[entrypoint_file_name] = ENTRYPOINT
Step #2: E 257 output_map[utils.__file__] = os.path.join(
Step #2: E
Step #2: E /usr/local/lib/python3.7/site-packages/kubeflow_fairing-0.7.0.1-py3.7.egg/kubeflow/fairing/frameworks/lightgbm.py in _generate_entrypoint(copy_files_before, copy_files_after, config_file, init_cmds, copy_patitioned_files)
Step #2: E 123
Step #2: E 124 # copying files that are common to all workers
Step #2: E --> 125 buf.extend(_get_commands_for_file_ransfer(copy_files_before))
Step #2: E 126
Step #2: E 127 buf.append("echo 'All files are copied!'")
Step #2: E
Step #2: E /usr/local/lib/python3.7/site-packages/kubeflow_fairing-0.7.0.1-py3.7.egg/kubeflow/fairing/frameworks/lightgbm.py in _get_commands_for_file_ransfer(files_map)
Step #2: E 92 cmds.append(storage_obj.copy_cmd(k, v))
Step #2: E 93 else:
Step #2: E ---> 94 raise RuntimeError("Remote file {} does't exist".format(k))
Step #2: E 95 return cmds
Step #2: E 96
Step #2: E
Step #2: E RuntimeError: Remote file gs://kubeflow-ci-fairing/lightgbm/example/model_2019_11_22_07_13_48.txt does't exist
Step #2:
Step #2: /usr/local/lib/python3.7/site-packages/papermill/execute.py:192: PapermillExecutionError
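The traceback shows the predict step in cell In [8] failing because the model file (model_2019_11_22_07_13_48.txt) was not found at the moment `_get_commands_for_file_ransfer` checked for it. The root cause is not established here. If, and this is only an assumption, the file is sometimes not yet visible when the predict cell runs, one possible mitigation is to poll for it before calling `lightgbm.execute`. A minimal sketch follows; `wait_until_exists` and its `exists` callable are hypothetical names for illustration, not part of kubeflow-fairing.

```python
import time
from typing import Callable


def wait_until_exists(
    exists: Callable[[str], bool],
    uri: str,
    timeout_s: float = 60.0,
    interval_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll `exists(uri)` until it returns True or `timeout_s` elapses.

    `exists` is a caller-supplied check (hypothetical here); it could wrap
    whatever storage lookup the notebook already uses. Returns True if the
    file showed up in time, False otherwise. The check always runs at least
    once, even with a zero timeout.
    """
    deadline = clock() + timeout_s
    while True:
        if exists(uri):
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


if __name__ == "__main__":
    # Demo with a fake check that succeeds on the third call.
    calls = {"n": 0}

    def fake_exists(_uri: str) -> bool:
        calls["n"] += 1
        return calls["n"] >= 3

    found = wait_until_exists(
        fake_exists,
        "gs://example-bucket/model.txt",  # placeholder URI, not the CI bucket
        interval_s=0.0,
    )
    print(found, calls["n"])
```

In the notebook, the predict cell could call this first and raise a clearer error when it returns False. This would only mask the flakiness if the real problem is that the training job never uploads the model, so it is worth confirming the cause from the training logs first.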
@abhi-g Any idea for the problem? Thanks.
I'll take a look at this.
On Thu, Jan 2, 2020 at 6:36 PM Jin Chi He wrote:
@abhi-g Any idea for the problem? Thanks.
/area engprod /priority p2