neptune-client
BUG: neptune sync is not fault-tolerant
Describe the bug
When an error occurs for a single run during neptune sync, the script stops. Instead, it should skip that run and print the error.
Reproduction
- Write a Neptune log from inside a Docker container, so that permission errors occur.
- Try to sync from outside the Docker container.
The same failure occurs with other kinds of file corruption as well.
Expected behavior
When neptune encounters a run it can't sync, it should skip it, continue with the next one, and at the end list all runs it couldn't sync.
Traceback
```
cornelius@pssr2:~/PCJax/logs$ neptune sync -p user/Project
Traceback (most recent call last):
  File "/users-2/cornelius/.conda/envs/pcjax/bin/neptune", line 8, in <module>
    sys.exit(main())
  File "/users-2/cornelius/.conda/envs/pcjax/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/users-2/cornelius/.conda/envs/pcjax/lib/python3.10/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/users-2/cornelius/.conda/envs/pcjax/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/users-2/cornelius/.conda/envs/pcjax/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/users-2/cornelius/.conda/envs/pcjax/lib/python3.10/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/users-2/cornelius/.conda/envs/pcjax/lib/python3.10/site-packages/neptune/new/cli/commands.py", line 173, in sync
    sync_runner.sync_all_containers(path, project_name)
  File "/users-2/cornelius/.conda/envs/pcjax/lib/python3.10/site-packages/neptune/new/cli/sync.py", line 242, in sync_all_containers
    self.sync_all_offline_containers(base_path, project_name)
  File "/users-2/cornelius/.conda/envs/pcjax/lib/python3.10/site-packages/neptune/new/cli/sync.py", line 220, in sync_all_offline_containers
    self.sync_offline_containers(base_path, project_name, offline_dirs)
  File "/users-2/cornelius/.conda/envs/pcjax/lib/python3.10/site-packages/neptune/new/cli/sync.py", line 213, in sync_offline_containers
    registered_containers = self.register_offline_containers(base_path, project, offline_dirs)
  File "/users-2/cornelius/.conda/envs/pcjax/lib/python3.10/site-packages/neptune/new/cli/sync.py", line 191, in register_offline_containers
    self._move_offline_container(
  File "/users-2/cornelius/.conda/envs/pcjax/lib/python3.10/site-packages/neptune/new/cli/sync.py", line 177, in _move_offline_container
    (base_path / OFFLINE_DIRECTORY / offline_dir).rename(
  File "/users-2/cornelius/.conda/envs/pcjax/lib/python3.10/pathlib.py", line 1234, in rename
    self._accessor.rename(self, target)
PermissionError: [Errno 13] Permission denied: '/users-2/cornelius/PCJax/logs/.neptune/offline/run__ba1c7901-881f-4af6-820e-014e0a698319' -> '/users-2/cornelius/PCJax/logs/.neptune/async/run__9877526b-8f3d-4c95-a813-a66b26e926cd/exec-0-offline'
```
Neptune Version
neptune-client 0.16.16 pypi_0 pypi
Hey @cemde
Could you try updating to the latest release and let me know if the issue persists?
@Blaizzy The issue still exists.
That's odd.
Has it worked in the past?
I never noticed it before, but I had also never logged from inside a Docker container. The PermissionError itself is justified; it should just be caught properly and then logged. In pseudo-Python:
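```python
import traceback

from tqdm import tqdm

objects2sync = [obj1, obj2, ...]  # placeholder: the runs to sync
failed_objs = []
for obj in tqdm(objects2sync):
    try:
        sync_object(obj)  # placeholder for the per-run sync step
    except Exception:
        # record the failing run and its traceback, then continue with the next run
        failed_objs.append((obj._id, obj.short_id, traceback.format_exc()))
failed_ids = {failed[0] for failed in failed_objs}
print("Successful:", [obj for obj in objects2sync if obj._id not in failed_ids])
print("Failed:", failed_objs)
```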
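For illustration, here is a self-contained, runnable sketch of the skip-and-report behavior described in this issue. `sync_all`, `SyncReport`, and `fake_sync` are hypothetical names, not part of neptune-client's API; `fake_sync` just stands in for the real per-run sync step and raises a `PermissionError` for one run to simulate the failure from the traceback.

```python
import traceback
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Tuple


@dataclass
class SyncReport:
    """Outcome of a fault-tolerant sync pass (hypothetical helper, not neptune-client API)."""
    succeeded: List[str] = field(default_factory=list)
    failed: List[Tuple[str, str]] = field(default_factory=list)  # (run id, formatted traceback)


def sync_all(run_ids: Iterable[str], sync_one: Callable[[str], None]) -> SyncReport:
    """Sync each run; on error, record it and move on instead of aborting."""
    report = SyncReport()
    for run_id in run_ids:
        try:
            sync_one(run_id)
        except Exception:
            # Catch Exception rather than using a bare except, so Ctrl+C
            # (KeyboardInterrupt) still stops the whole loop.
            report.failed.append((run_id, traceback.format_exc()))
        else:
            report.succeeded.append(run_id)
    return report


def fake_sync(run_id: str) -> None:
    """Hypothetical stand-in for the per-run sync; fails for one run."""
    if run_id == "run-b":
        raise PermissionError(13, "Permission denied", run_id)


if __name__ == "__main__":
    report = sync_all(["run-a", "run-b", "run-c"], fake_sync)
    print("Successful:", report.succeeded)
    print("Failed:", [run_id for run_id, _ in report.failed])
```

Run as a script, this prints `run-a` and `run-c` as successful and `run-b` as failed. Keeping the full traceback per failed run lets the CLI print a short summary at the end while still making the details available for debugging.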
Let me check with the team and get back to you.
Hey @cemde
I've discussed it with the team and decided to send your issue to our product team as a feature request. They will take it from here and explore how to incorporate it into our future plans.
While I don't have an ETA for this feature, I do want to keep you in the loop. You can stay up-to-date with our product roadmap by checking out our portal at https://portal.neptune.ai/tabs/15-planned.
Thanks for sharing your feedback! Really appreciate it :)