test-infra icon indicating copy to clipboard operation
test-infra copied to clipboard

Nova Linux build job runs as root inside its Docker container

Open huydhn opened this issue 1 year ago • 7 comments

The container in question https://hub.docker.com/r/pytorch/manylinux-builder/tags.

There are reports from ExecuTorch and Torchtune about the problematic way of running Nova Linux build job as root inside the container.

  1. On ExecuTorch, buck2 refuses to run as root https://github.com/pytorch/executorch/actions/runs/8655199802/job/23733768589#step:14:125, which blocks ET from using the workflow. For the context, ET use buck2 to gather the source files before passing them to cmake.
  2. On TorchTune, running the build as root leaves some artifacts like the conda folder on the runner, which could prevent subsequent jobs to clean it up as @clee2000 discovered it https://github.com/pytorch/torchtune/actions/runs/8639310064/job/23685409883?pr=688.

Usually, running thing as root is bad, and we should figure out a way to not do it anymore

cc @seemethere @atalman @kit1980 @clee2000 @dbort @malfet

huydhn avatar Apr 12 '24 01:04 huydhn

cc @atalman for NOVA workflows

@malfet: This sort of runner clean up shoud be done by the runner and not in the workflows, ARC should be able to do this cc @DanilBaibak @jeanschmidt

clee2000 avatar Apr 16 '24 21:04 clee2000

AI: To figure out a POC going forward for Nova workflow.

huydhn avatar Apr 30 '24 21:04 huydhn

cc @atalman

huydhn avatar Apr 30 '24 21:04 huydhn

https://github.com/pytorch/executorch/issues/3502 is caused by the hack I added to executorch to work around this "running as root" issue: buck2 does not like running as root.

dbort avatar May 03 '24 22:05 dbort

This issue has been partially resolved now that we have a hook to chown everything back to the correct runner user.

huydhn avatar Jun 11 '24 21:06 huydhn

The files are now chowned, but does the job still run with root as the user?

dbort avatar Jun 11 '24 23:06 dbort

Checking with @atalman on this, and we have no plan to fix the run by root part atm. But, the files are now chowned to the correct user before and after the job finishes. I close the issue because there is already the buck workaround on ET. However, if you plan to remove that patch eventually, I could keep the issue open for a future fix.

huydhn avatar Jun 12 '24 17:06 huydhn