FoundationPose
Improvements to docker
Hello friends, thank you for building such a wonderful project. I noticed a few non-standard uses of Docker, and this PR suggests some improvements. I don't recommend merging it until the team has discussed and tested it.
The current dockerfile prepares the container for developing the code, but none of its commands touch any of the FoundationPose code. Then run_container.sh mounts several volumes, which gives the container access to the code. Because some build steps still remain, build_all.sh then has to be run inside the container.
Normally, a Docker image is self-contained, and all someone needs to do to run the code is pull the image. That's not the case here. These extra steps require more setup work and make it very difficult to simply deploy the FP image to a server.
Luckily, Docker has other approaches that are widely accepted. For development, when you want to share files with the container, there is Docker Compose. Docker Compose makes it easy to set up networks, mount volumes, and do just about everything else that run_container.sh was doing. Plus, since the volumes are available to every RUN command, everything in build_all.sh can be moved to the dockerfile.
So now, instead of run_container.sh, you can simply use the native docker compose up command!
For use on servers, Docker provides a COPY command that places files from the project into the image. So I added a second dockerfile for that, dockerfile.prod. If you build that image, it will include everything from the git repo and will run all the commands from build_all.sh. This image can then run anywhere quickly, without requiring the git repo code. You could even create a GitHub Action that automatically publishes this image whenever a PR is merged.
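A minimal sketch of what such a dockerfile.prod could look like, assuming it builds on top of the image produced by the existing dev dockerfile (tagged here with the hypothetical name foundationpose-dev) and that build_all.sh works when run from the repository root. The /bin/bash --login -c shell matches the one shown in the build log further down in this thread, so the conda environment should be active for the build steps.

```dockerfile
# Sketch only: assumes the existing dev dockerfile was built and tagged
# as "foundationpose-dev" (hypothetical tag name).
ARG BASE_IMAGE=foundationpose-dev
FROM ${BASE_IMAGE}

# Login shell, so the conda environment set up by the dev image is active
SHELL ["/bin/bash", "--login", "-c"]

# Bake the repository into the image instead of mounting it at runtime
COPY . /foundationpose
WORKDIR /foundationpose

# Run the build steps at image build time rather than inside a running container
RUN bash build_all.sh
```

It would be built from the repo root with something like `docker build -f dockerfile.prod -t foundationpose:prod .`, where the trailing `.` is the build context that COPY reads from. A .dockerignore that excludes large local data keeps that context small.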
I'd love to hear people's thoughts on this. I'm not a FoundationPose developer and I realize you have a workflow, but I've been using Docker for 10 years and thought I could share some of my experience. If you switch to Docker Compose, things will become easier and experiences will be more consistent across developers. Having a second dockerfile may seem weird, but it's actually a common practice.
If you think this is too much to do all at once, I can keep the old dockerfile, run_container.sh, and build_all.sh, and present this new way as an alternative. If and when Compose proves easier, you can remove those files later.
Also, it should be possible to make a much more lightweight container, which is safer and easier to deploy. There are quite a few build tools (g++, gcc, build-essential, cmake, etc.) that could be removed from the final container if we took advantage of Docker's multi-stage builds. These allow you to have multiple FROM statements in your dockerfile. You can use some of the stages to build the code, then simply copy the built code over to the final container. I could certainly help with this if you're interested.
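A rough sketch of the multi-stage idea, assuming the conda environment lives at /opt/conda/envs/my (the path that appears in the build log later in this thread) and that copying /opt/conda and /foundationpose to the same paths is enough for the editable installs to keep working. Anything else the build writes elsewhere (for example other editable installs), plus runtime system libraries such as OpenGL/EGL, would still need to be copied or installed in the final stage; that is not shown here.

```dockerfile
# ---- Stage 1: builder, with compilers, CUDA headers and cmake ----
FROM nvidia/cuda:11.8.0-devel-ubuntu20.04 AS builder
SHELL ["/bin/bash", "--login", "-c"]
# (the existing dockerfile's steps that install conda, the "my" env and
#  the build tools would go here; omitted in this sketch)
COPY . /foundationpose
WORKDIR /foundationpose
RUN bash build_all.sh

# ---- Stage 2: runtime, without g++/gcc/build-essential/cmake ----
FROM nvidia/cuda:11.8.0-runtime-ubuntu20.04
# Copy the conda install and the built repo to the same paths, so that
# editable installs (pip install -e .) still point at valid locations
COPY --from=builder /opt/conda /opt/conda
COPY --from=builder /foundationpose /foundationpose
# No login-shell activation in this stage, so put the env on PATH directly
ENV PATH=/opt/conda/envs/my/bin:/opt/conda/bin:${PATH}
WORKDIR /foundationpose
```

Only the last FROM ends up in the shipped image; the builder stage and its toolchain are discarded unless explicitly referenced with COPY --from.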
Thanks, I hope you like this. Have a great day!
Hi @StevePotter, thanks for your great suggestion, this seems very useful! I still need to find time to test this myself, as I'm swamped with other projects right now. To stay backward compatible, would you mind creating a separate folder alongside docker/ (e.g. docker_compose) and putting the new files there? For now I'd prefer to keep the old one as it is, but later, once many folks have verified this, I'd be happy to replace it.
Okay great, I will do that
I got a little sidetracked, but will devote some time to this next week. I also plan the following improvements:
- Use requirements.txt or conda environment.yml to declare packages
- Include weights in the docker image
- Use a multi-stage build so the runtime image uses a base image like nvidia/cudagl:11.3.0-runtime-ubuntu20.04. The current image is about 20 GB, and when I tried this out, it came down to about 10 GB.
- Supply an argument to toggle the CUDA version. I tested on 11.8 and 12.1, and both work. It would be nice for users to have a choice.
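Toggling the CUDA version could be done with a build argument in the FROM line. This is a sketch; the argument name CUDA_VERSION is hypothetical, and the nvidia/cuda tags are the two mentioned in this thread.

```dockerfile
# CUDA version selectable at build time; 11.8.0 is the default.
# An ARG declared before FROM is only in scope for the FROM line itself.
ARG CUDA_VERSION=11.8.0
FROM nvidia/cuda:${CUDA_VERSION}-devel-ubuntu20.04

# Re-declare (without a value) to use it in later instructions;
# it inherits the default above or the value passed via --build-arg.
ARG CUDA_VERSION
LABEL cuda.version="${CUDA_VERSION}"
```

A 12.1 image would then be built with `docker build --build-arg CUDA_VERSION=12.1.0 -t foundationpose:cu121 .`. The PyTorch build installed later in the dockerfile would also have to match the chosen CUDA version, and, per the comment below, 12.1 additionally needed C++17 in bundlesdf/mycuda/setup.py.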
Hi! I was trying this method and ran into an issue where the line cd /foundationpose/mycpp/ failed because it couldn't find the directory. I was wondering if this could be because I placed the docker-compose file in a different location than intended; I currently have it in a subdirectory of main.
@EquilibriaW you are right, somehow I messed it up. I'll fix it.
Hey @StevePotter, thank you for the amazing work condensing everything into one docker-compose setup and saving us all lots of time!
I've tried to run your Dockerfile.prod inside WSL2 with Ubuntu 20.04, using the 4090 fix, i.e. FROM nvidia/cuda:12.1.0-devel-ubuntu20.04 instead of FROM nvidia/cuda:11.8.0-devel-ubuntu20.04, and also switching to C++17 in /bundlesdf/mycuda/setup.py according to issue #27. Everything works fine until this line (the same error also appeared with the default approach, so it is very likely a configuration error on my side, with CUDA not being detected correctly, rather than an error in your docker compose file):
=> ERROR [foundationpose 15/15] RUN cd /foundationpose/bundlesdf/mycuda && rm -rf build *egg* && 5.1s
------
> [foundationpose 15/15] RUN cd /foundationpose/bundlesdf/mycuda && rm -rf build *egg* && pip install -e .:
0.866 Obtaining file:///foundationpose/bundlesdf/mycuda
0.867 Preparing metadata (setup.py): started
2.359 Preparing metadata (setup.py): finished with status 'done'
3.434 Installing collected packages: common
3.435 Running setup.py develop for common
4.969 error: subprocess-exited-with-error
4.969
4.969 × python setup.py develop did not run successfully.
4.969 │ exit code: 1
4.969 ╰─> [94 lines of output]
4.969 running develop
4.969 running egg_info
4.969 creating common.egg-info
4.969 writing common.egg-info/PKG-INFO
4.969 writing dependency_links to common.egg-info/dependency_links.txt
4.969 writing top-level names to common.egg-info/top_level.txt
4.969 writing manifest file 'common.egg-info/SOURCES.txt'
4.969 reading manifest file 'common.egg-info/SOURCES.txt'
4.969 writing manifest file 'common.egg-info/SOURCES.txt'
4.969 running build_ext
4.969 building 'common' extension
4.969 creating /foundationpose/bundlesdf/mycuda/build
4.969 creating /foundationpose/bundlesdf/mycuda/build/temp.linux-x86_64-cpython-38
4.969 No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
4.969 /opt/conda/envs/my/lib/python3.8/site-packages/setuptools/_distutils/dist.py:266: UserWarning: Unknown distribution option: 'extra_cflags'
4.969 warnings.warn(msg)
4.969 /opt/conda/envs/my/lib/python3.8/site-packages/setuptools/_distutils/dist.py:266: UserWarning: Unknown distribution option: 'extra_cuda_cflags'
4.969 warnings.warn(msg)
4.969 /opt/conda/envs/my/lib/python3.8/site-packages/setuptools/command/develop.py:40: EasyInstallDeprecationWarning: easy_install command is deprecated.
4.969 !!
4.969
4.969 ********************************************************************************
4.969 Please avoid running ``setup.py`` and ``easy_install``.
4.969 Instead, use pypa/build, pypa/installer or other
4.969 standards-based tools.
4.969
4.969 See https://github.com/pypa/setuptools/issues/917 for details.
4.969 ********************************************************************************
4.969
4.969 !!
4.969 easy_install.initialize_options(self)
4.969 /opt/conda/envs/my/lib/python3.8/site-packages/setuptools/_distutils/cmd.py:66: SetuptoolsDeprecationWarning: setup.py install is deprecated.
4.969 !!
4.969
4.969 ********************************************************************************
4.969 Please avoid running ``setup.py`` directly.
4.969 Instead, use pypa/build, pypa/installer or other
4.969 standards-based tools.
4.969
4.969 See https://blog.ganssle.io/articles/2021/10/setup-py-deprecated.html for details.
4.969 ********************************************************************************
4.969
4.969 !!
4.969 self.initialize_options()
4.969 /opt/conda/envs/my/lib/python3.8/site-packages/torch/utils/cpp_extension.py:424: UserWarning: There are no g++ version bounds defined for CUDA version 12.1
4.969 warnings.warn(f'There are no {compiler_name} version bounds defined for CUDA version {cuda_str_version}')
4.969 Traceback (most recent call last):
4.969 File "<string>", line 2, in <module>
4.969 File "<pip-setuptools-caller>", line 34, in <module>
4.969 File "/foundationpose/bundlesdf/mycuda/setup.py", line 21, in <module>
4.969 setup(
4.969 File "/opt/conda/envs/my/lib/python3.8/site-packages/setuptools/__init__.py", line 104, in setup
4.969 return distutils.core.setup(**attrs)
4.969 File "/opt/conda/envs/my/lib/python3.8/site-packages/setuptools/_distutils/core.py", line 184, in setup
4.969 return run_commands(dist)
4.969 File "/opt/conda/envs/my/lib/python3.8/site-packages/setuptools/_distutils/core.py", line 200, in run_commands
4.969 dist.run_commands()
4.969 File "/opt/conda/envs/my/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
4.969 self.run_command(cmd)
4.969 File "/opt/conda/envs/my/lib/python3.8/site-packages/setuptools/dist.py", line 967, in run_command
4.969 super().run_command(command)
4.969 File "/opt/conda/envs/my/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
4.969 cmd_obj.run()
4.969 File "/opt/conda/envs/my/lib/python3.8/site-packages/setuptools/command/develop.py", line 34, in run
4.969 self.install_for_development()
4.969 File "/opt/conda/envs/my/lib/python3.8/site-packages/setuptools/command/develop.py", line 111, in install_for_development
4.969 self.run_command('build_ext')
4.969 File "/opt/conda/envs/my/lib/python3.8/site-packages/setuptools/_distutils/cmd.py", line 316, in run_command
4.969 self.distribution.run_command(command)
4.969 File "/opt/conda/envs/my/lib/python3.8/site-packages/setuptools/dist.py", line 967, in run_command
4.969 super().run_command(command)
4.969 File "/opt/conda/envs/my/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
4.969 cmd_obj.run()
4.969 File "/opt/conda/envs/my/lib/python3.8/site-packages/setuptools/command/build_ext.py", line 91, in run
4.969 _build_ext.run(self)
4.969 File "/opt/conda/envs/my/lib/python3.8/site-packages/setuptools/_distutils/command/build_ext.py", line 359, in run
4.969 self.build_extensions()
4.969 File "/opt/conda/envs/my/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 873, in build_extensions
4.969 build_ext.build_extensions(self)
4.969 File "/opt/conda/envs/my/lib/python3.8/site-packages/setuptools/_distutils/command/build_ext.py", line 479, in build_extensions
4.969 self._build_extensions_serial()
4.969 File "/opt/conda/envs/my/lib/python3.8/site-packages/setuptools/_distutils/command/build_ext.py", line 505, in _build_extensions_serial
4.969 self.build_extension(ext)
4.969 File "/opt/conda/envs/my/lib/python3.8/site-packages/setuptools/command/build_ext.py", line 252, in build_extension
4.969 _build_ext.build_extension(self, ext)
4.969 File "/opt/conda/envs/my/lib/python3.8/site-packages/setuptools/_distutils/command/build_ext.py", line 560, in build_extension
4.969 objects = self.compiler.compile(
4.969 File "/opt/conda/envs/my/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 677, in unix_wrap_ninja_compile
4.969 cuda_post_cflags = unix_cuda_flags(cuda_post_cflags)
4.969 File "/opt/conda/envs/my/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 576, in unix_cuda_flags
4.969 cflags + _get_cuda_arch_flags(cflags))
4.969 File "/opt/conda/envs/my/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1980, in _get_cuda_arch_flags
4.969 arch_list[-1] += '+PTX'
4.969 IndexError: list index out of range
4.969 [end of output]
4.969
4.969 note: This error originates from a subprocess, and is likely not a problem with pip.
------
failed to solve: process "/bin/bash --login -c cd /foundationpose/bundlesdf/mycuda && rm -rf build *egg* && pip install -e ." did not complete successfully: exit code: 1
Facing the same issue as @mrtnbm here, hope it gets resolved :)
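The log hints at a likely cause. "No CUDA runtime is found" means no GPU was visible during docker build (build steps typically don't get GPU access). When TORCH_CUDA_ARCH_LIST is not set, PyTorch's _get_cuda_arch_flags derives the target architectures from the visible GPUs, ends up with an empty list, and then fails on arch_list[-1]. A common workaround, sketched below as a suggestion rather than a verified fix for this exact setup, is to set TORCH_CUDA_ARCH_LIST explicitly before the step that compiles the extension. 8.9 is the compute capability of the RTX 4090; other GPUs need their own value, and several values can be separated with semicolons.

```dockerfile
# No GPU is visible during `docker build`, so PyTorch cannot detect the
# target architecture by itself; list it explicitly.
# 8.9 = RTX 4090 (Ada); e.g. "8.6;8.9" would cover RTX 30- and 40-series.
ENV TORCH_CUDA_ARCH_LIST="8.9"

RUN cd /foundationpose/bundlesdf/mycuda && rm -rf build *egg* && pip install -e .
```

Because ENV persists for all later instructions, placing it early in the dockerfile also applies it to any other CUDA extensions compiled afterwards.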