torchx
torchx copied to clipboard
moved image diff logging to their respective workspace
#809 was flawed and didn't change the programming logic. I tried to think of any way for build_workspace_and_update_role() to return whether a change in the built image is detected. There isn't one, so the logging message needs to be printed at the workspace level:
- tmpDirworkspace -> always creates a directory using workspace
- dirworkspace -> has no logic to be able to determine if the directory is changed
- dockerworkspace -> used docker_client to determine whether a new image is built
The alternative would be to not test for changes to the image and always print the same thing, or modify build_workspace_and_update_role() abstract function and its implementations to return a BOOL of whether a change happened.
In docker workspace we add a test whether a cfg.image_repo is provided because local_docker does not define it/ use it (it is accepted as an argument because its a runopts for dockerworkspace)
After reviewing, the easiest way to not affect other implementations would just be to print an extra message within the docker_workspace. You would then end up with contradictory message when reusing an image however. E.g
torchx 2024-03-28 20:38:29 INFO Building workspace docker image (this may take a while)...
torchx 2024-03-28 20:38:29 INFO Reused image with tag `sha256:4aad7a6e24d58bb66404dead0c5b2f91e97acb76875c69e238dbabb9b8cd3be0`
torchx 2024-03-28 20:38:29 INFO Built new image `sha256:4aad7a6e24d58bb66404dead0c5b2f91e97acb76875c69e238dbabb9b8cd3be0` based on original image `nvcr.io/nvidia/pytorch:23.10-py3` and changes in workspace `#######`
Personally, i no longer require knowing through the console whether or not my image tag was updated or reused. Implementing the log message properly seems to be more likely to cause harm then good. Going forward, i can either:
- Modify the PR to just be a change within docker workspace that prints an extra message within it.
- I can close this PR, but it would be good to rollback #809 in that case since it added complexity without actually fixing the problem.