gramine
Python fails to load all site-packages/system path inside Gramine
Description of the problem
On CentOS, Python fails to load the PyTorch example workload when the required libraries are installed in different site-packages directories, even after those directories are mounted.
Ref: https://github.com/gramineproject/examples/tree/master/pytorch
I was debugging the PyTorch example inside a CentOS Docker container. Initially it failed with the following error:
Traceback (most recent call last):
  File "./pytorchexample.py", line 4, in <module>
    from torchvision import models
  File "/usr/local/lib64/python3.6/site-packages/torchvision/__init__.py", line 4, in <module>
    from .extension import _HAS_OPS
  File "/usr/local/lib64/python3.6/site-packages/torchvision/extension.py", line 6, in <module>
    import torch
  File "/usr/local/lib64/python3.6/site-packages/torch/__init__.py", line 29, in <module>
    from .torch_version import __version__ as __version__
  File "/usr/local/lib64/python3.6/site-packages/torch/torch_version.py", line 3, in <module>
    from pkg_resources import packaging # type: ignore[attr-defined]
ModuleNotFoundError: No module named 'pkg_resources'
I searched for pkg_resources on my system and found that it lives in a different site-packages directory than torchvision:
torchvision: /usr/local/lib64/python3.6/site-packages
pkg_resources: /usr/lib/python3.6/site-packages
As the traceback shows, torch (imported by torchvision) requires the pkg_resources library, which is installed under /usr/lib. I mounted the pkg_resources path in manifest.template:
[[fs.mounts]]
type = "chroot"
uri = "file:/usr/lib/python3.6/site-packages/"
path = "/usr/lib/python3.6/site-packages/"
Even after mounting this path, the test still fails with the same error.
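A mount only makes the directory visible inside Gramine; for the import to succeed, the directory also has to be on sys.path. The sketch below is one way to separate those conditions. The helper name diagnose is hypothetical and not part of Gramine or the example; it uses only the standard library.

```python
import importlib.util
import os
import sys


def diagnose(module_name, directory):
    """Report whether a directory is visible, whether it is on sys.path,
    and whether the import system can locate the given top-level module."""
    normalized = [p.rstrip("/") for p in sys.path]
    return {
        # True if the directory exists in the current filesystem view
        # (inside Gramine this reflects the fs.mounts entries).
        "dir_visible": os.path.isdir(directory),
        # True if Python will actually search this directory on import.
        "dir_on_sys_path": directory.rstrip("/") in normalized,
        # True if the module can be found anywhere on sys.path.
        "module_found": importlib.util.find_spec(module_name) is not None,
    }


if __name__ == "__main__":
    # Paths taken from the report above.
    print(diagnose("pkg_resources", "/usr/lib/python3.6/site-packages"))
```

If dir_visible is true but dir_on_sys_path is false, the mount worked and the problem lies in how the interpreter computed its search path.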
After debugging for a while, we noticed that Python under gramine-direct does not have the same sys.path as Python run natively. None of the /usr/lib paths are present in sys.path, so Python never looks there and fails to find libraries installed in the other site-packages locations:
[intel@fc881c4f9bda sample]$ gramine-direct ./python get_path.py
['/', '/lib64/python36.zip', '/lib64/python3.6', '/lib64/python3.6/lib-dynload', '/usr/local/lib64/python3.6/site-packages', '/usr/local/lib/python3.6/site-packages', '/lib64/python3.6/site-packages']
[intel@fc881c4f9bda sample]$ python3 get_path.py
['/home/intel/gramine_install/usr/lib64/python3.6/site-packages', '/home/intel/anjali/gramine/Scripts', '/usr/lib64/python36.zip', '/usr/lib64/python3.6', '/usr/lib64/python3.6/lib-dynload', '/usr/local/lib64/python3.6/site-packages', '/usr/local/lib/python3.6/site-packages', '/usr/lib64/python3.6/site-packages', '/usr/lib/python3.6/site-packages']
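The get_path.py script itself is not included in this report. A minimal sketch of what it presumably does (an assumption, based on the printed output) is:

```python
import sys


def get_paths():
    """Return a copy of the interpreter's module search path."""
    return list(sys.path)


if __name__ == "__main__":
    # Assumed behavior of get_path.py: print sys.path as a Python list.
    print(get_paths())
```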
Steps to reproduce
1. Install setuptools in a different location, such as /usr/lib/python3.6/site-packages.
2. Install torch (by default it is installed in /usr/local/lib64 or /home/username/local).
Expected results
sys.path should contain all the system paths:
['/usr/lib64/python36.zip', '/usr/lib64/python3.6', '/usr/lib64/python3.6/lib-dynload', '/usr/local/lib64/python3.6/site-packages', '/usr/local/lib/python3.6/site-packages', '/usr/lib64/python3.6/site-packages', '/usr/lib/python3.6/site-packages']
Actual results
['/', '/lib64/python36.zip', '/lib64/python3.6', '/lib64/python3.6/lib-dynload', '/usr/local/lib64/python3.6/site-packages', '/usr/local/lib/python3.6/site-packages', '/lib64/python3.6/site-packages']
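To make the difference between the expected and actual lists easier to see, the following sketch diffs them. The two lists are copied from this report; the helper name missing_entries is hypothetical.

```python
# Expected sys.path entries (native python3, system paths only).
expected = [
    '/usr/lib64/python36.zip',
    '/usr/lib64/python3.6',
    '/usr/lib64/python3.6/lib-dynload',
    '/usr/local/lib64/python3.6/site-packages',
    '/usr/local/lib/python3.6/site-packages',
    '/usr/lib64/python3.6/site-packages',
    '/usr/lib/python3.6/site-packages',
]

# Actual sys.path entries reported under gramine-direct.
actual = [
    '/',
    '/lib64/python36.zip',
    '/lib64/python3.6',
    '/lib64/python3.6/lib-dynload',
    '/usr/local/lib64/python3.6/site-packages',
    '/usr/local/lib/python3.6/site-packages',
    '/lib64/python3.6/site-packages',
]


def missing_entries(expected_paths, actual_paths):
    """Return expected entries absent from the actual list, in order."""
    present = set(actual_paths)
    return [p for p in expected_paths if p not in present]


missing = missing_entries(expected, actual)
for path in missing:
    print(path)
```

Every missing entry starts with /usr, while the corresponding Gramine entries start at /. This suggests (though it does not prove) that the interpreter derived its prefix as / rather than /usr when started under Gramine.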
Gramine commit hash
49413a8a5817e67f055a22edeb8804def851deb8
I can confirm this. Anjali (@anjalirai-intel) and I had a debug session. Neither of us has any clue why this discrepancy happens.
@woju ?
loader.insecure__use_host_env = true in the manifest could be the culprit. I suspect you fiddle with PYTHONPATH and such on the host (which is often the case with Gramine) and then forward these into Gramine, where the paths are different (at least some of them).
Hi @boryspoplawski
I tried your suggestion and removed loader.insecure__use_host_env = true from the manifest. Here is sys.path in both cases:
With loader.insecure__use_host_env = true:
['/', '/home/intel/gramine_install/usr/lib64/python3.6/site-packages', '/home/intel/anjali/test/gramine/Scripts', '/', '/lib64/python36.zip', '/lib64/python3.6', '/lib64/python3.6/lib-dynload', '/usr/local/lib64/python3.6/site-packages', '/usr/local/lib/python3.6/site-packages', '/lib64/python3.6/site-packages']
With loader.insecure__use_host_env = false:
['/', '/lib64/python36.zip', '/lib64/python3.6', '/lib64/python3.6/lib-dynload', '/usr/local/lib64/python3.6/site-packages', '/usr/local/lib/python3.6/site-packages', '/lib64/python3.6/site-packages']
I still don't see /usr/lib
Maybe you can try adding loader.argv0_override="/usr/libexec/platform-python3.6" to the manifest? I think python guesses the path from argv[0].
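One way to check this hypothesis is to print the values the interpreter derived at startup, once natively and once under Gramine. This is only a diagnostic sketch using standard sys attributes; the function name interpreter_layout is hypothetical.

```python
import sys


def interpreter_layout():
    """Collect the startup values Python uses to build sys.path."""
    return {
        "executable": sys.executable,    # path Python believes it was started from
        "prefix": sys.prefix,            # root for platform-independent libraries
        "exec_prefix": sys.exec_prefix,  # root for platform-dependent libraries
        "base_prefix": sys.base_prefix,  # same as prefix outside a virtualenv
        "path": list(sys.path),
    }


if __name__ == "__main__":
    for key, value in interpreter_layout().items():
        print(f"{key}: {value}")
```

If prefix differs between the two runs, that would be consistent with the search path being derived from argv[0].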
Hi @lejunzhu
After adding this to the manifest file, the workload passes and sys.path contains the correct system paths:
gramine-direct ./python get_path.py
['/', '/usr/lib64/python36.zip', '/usr/lib64/python3.6', '/usr/lib64/python3.6/lib-dynload', '/usr/local/lib64/python3.6/site-packages', '/usr/local/lib/python3.6/site-packages', '/usr/lib64/python3.6/site-packages', '/usr/lib/python3.6/site-packages']
Quick update: @woju said that this is hard to explain because the Python distribution for each OS distro (Ubuntu, CentOS, Arch, etc.) is slightly different, and the path-finding routines differ between them.
There seems to be no general workaround that would work for all users and all OS distros.
The only reasonable workaround seems to be specifying the loader.argv0_override manifest option, as @lejunzhu mentioned, in a specific Docker image with well-known paths (so that the paths can be hard-coded in the Gramine manifest file). @woju will hopefully join this thread with some explanations.
@anjalirai-intel Is this something we need to keep open? It looks like the problem is non-trivial to solve and is not really Gramine's fault. The workaround also seems to work (well, the workaround now requires using loader.argv, since loader.argv0_override was deprecated).
@dimakuv Yes, there is a workaround. If you want to close the issue with the workaround, that is also fine.