mxnet
[BUGFIX] add cudatoolkit dll dir to dll_directory
Description
For people who create Python environments with conda, installing cudatoolkit is the most efficient way to get CUDA support.
However, the package does not set any system environment variables, so mxnet cannot find the corresponding CUDA DLLs.
Adding $CONDA_PREFIX/Library/bin to the DLL search path solves this problem.
For people who work in Jupyter, adding "CUDA_PREFIX" to the env section of the Jupyter kernelspec file is also necessary.
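The idea described above can be sketched roughly as follows. This is only an illustration, not the actual diff of this PR: the helper names `conda_dll_dir` and `register_conda_dll_dir` are hypothetical, and the sketch assumes that `CONDA_PREFIX` is set by an activated conda environment and that the CUDA DLLs live in its `Library/bin` folder. On anything other than Windows with Python 3.8+ it does nothing.

```python
import os
import sys
from pathlib import Path


def conda_dll_dir(environ=os.environ):
    """Return <CONDA_PREFIX>/Library/bin as a Path, or None if CONDA_PREFIX is unset."""
    prefix = environ.get("CONDA_PREFIX")
    if not prefix:
        return None
    return Path(prefix) / "Library" / "bin"


def register_conda_dll_dir(environ=os.environ):
    """Add the conda DLL directory to the DLL search path (Windows, Python 3.8+).

    Returns the handle from os.add_dll_directory, or None when nothing was added.
    """
    dll_dir = conda_dll_dir(environ)
    if dll_dir is None or not dll_dir.is_dir():
        return None
    # os.add_dll_directory only exists on Windows builds of Python 3.8+.
    if sys.platform != "win32" or not hasattr(os, "add_dll_directory"):
        return None
    return os.add_dll_directory(str(dll_dir))
```

Calling `register_conda_dll_dir()` before `import mxnet` would be the intended use. The returned handle has a `close()` method if the directory should later be removed from the search path.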
BTW, I also wrote another, more brute-force approach:
import os
import pathlib

# Register every directory on PATH that contains at least one DLL.
_ = list(
    map(
        lambda x: os.add_dll_directory(str(x)),
        (p for p in map(pathlib.Path, set(map(str.lower, os.environ['path'].split(';')))) if next(p.glob('*.dll'), None) is not None),
    )
)
This adds every directory in $PATH that contains a DLL to the DLL search path. However, I'm not sure whether a long list of DLL directories hurts performance. In a small test on my machine, it had only a tiny effect on module import time.
I'm an mxnet beginner who has just finished the installation. Sorry for any inconvenience this PR may cause.
Checklist
Essentials
- [x] PR's title starts with a category (e.g. [BUGFIX], [MODEL], [TUTORIAL], [FEATURE], [DOC], etc)
- [x] Changes are complete (i.e. I finished coding on this PR)
- [ ] All changes have test coverage
- [x] Code is well-documented
Changes
- [ ] Feature1, tests, (and when applicable, API doc)
- [ ] Feature2, tests, (and when applicable, API doc)
Comments
- If this change is a backward incompatible change, why must this change be made.
- Interesting edge cases to note here
Hey @specter119, thanks for submitting the PR. All tests are already queued to run once. If tests fail, you can trigger one or more tests again with the following commands:
- To trigger all jobs: @mxnet-bot run ci [all]
- To trigger specific jobs: @mxnet-bot run ci [job1, job2]
CI supported jobs: [sanity, unix-gpu, clang, edge, website, unix-cpu, miscellaneous, windows-gpu, windows-cpu, centos-cpu, centos-gpu]
Note: Only the following 3 categories can trigger CI: PR Author, MXNet Committer, Jenkins Admin. All CI tests must pass before the PR can be merged.
@leezu Thanks for your reply. Yeah, I'm sure I have activated the conda env in the right way, and other deep learning frameworks work well with CUDA in this env. But I don't know why conda does not set up the DLL directories.
I have also run into import errors like https://github.com/apache/incubator-mxnet/issues/18276 and https://github.com/apache/incubator-mxnet/issues/17887. I also found that manually loading the DLLs with ctypes.cdll.LoadLibrary fixes this.
Sorry for the delayed response. Do you mean that ctypes.cdll.LoadLibrary does not require os.add_dll_directory?
I mean they achieve the same thing. Both loading all the DLL files with ctypes.cdll.LoadLibrary and adding the DLL folder with os.add_dll_directory fix the import error.
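For comparison, the ctypes-based alternative mentioned here might look something like the sketch below. The function name `preload_dlls` and its `loader` parameter are hypothetical and exist only for illustration. The sketch loads files in alphabetical order and skips any that fail, which is an assumption: a DLL that depends on another DLL in the same folder may need a different load order.

```python
import ctypes
from pathlib import Path


def preload_dlls(directory, loader=None):
    """Try to load every *.dll in `directory`; return the names that loaded."""
    if loader is None:
        loader = ctypes.cdll.LoadLibrary
    loaded = []
    for dll in sorted(Path(directory).glob("*.dll")):
        try:
            loader(str(dll))
        except OSError:
            # Skip files that cannot be loaded (e.g. missing dependencies).
            continue
        loaded.append(dll.name)
    return loaded
```

Compared with os.add_dll_directory, this loads every library eagerly instead of only telling the loader where to look, so it is likely to cost more at import time.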
> Yeah, I'm sure I have activated the conda env in the right way
Conda documentation says "For now, most DLLs are installed into (install prefix)\Library\bin. This path is added to os.environ["PATH"] for all Python processes, so that DLLs can be located, regardless of the value of the system's PATH environment variable." https://docs.conda.io/projects/conda-build/en/latest/resources/use-shared-libraries.html#shared-libraries-in-windows
Please check whether os.environ["PATH"] is correct in your case, i.e. os.environ["PATH"] is expected to contain the (install prefix)\Library\bin directory.
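One way to run the check requested here is sketched below. The helpers `path_has_dir` and `conda_bin_on_path` are hypothetical names, and the sketch assumes that `CONDA_PREFIX` holds the install prefix. It uses `ntpath` so that the comparison follows Windows rules (case-insensitive, `/` and `\` treated alike) regardless of where it runs.

```python
import ntpath
import os


def path_has_dir(path_value, target, sep=";"):
    """True if `target` is one of the entries of a Windows-style PATH string."""
    def norm(p):
        # Normalize separators, trailing slashes, and case.
        return ntpath.normcase(ntpath.normpath(p))

    wanted = norm(target)
    return any(norm(entry) == wanted for entry in path_value.split(sep) if entry)


def conda_bin_on_path(environ=os.environ):
    """True if <CONDA_PREFIX>\\Library\\bin appears in PATH."""
    prefix = environ.get("CONDA_PREFIX")
    if not prefix:
        return False
    return path_has_dir(environ.get("PATH", ""), ntpath.join(prefix, "Library", "bin"))
```

Running `conda_bin_on_path()` inside the activated environment reports whether the directory is present in PATH. It says nothing about whether Python actually searches PATH when resolving DLLs.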
@leezu Yeah, 'Library\bin' is in os.environ['PATH'].
Please refer to https://docs.python.org/3/library/os.html#os.add_dll_directory:
> New in version 3.8: Previous versions of CPython would resolve DLLs using the default behavior for the current process. This led to inconsistencies, such as only sometimes searching PATH or the current working directory, and OS functions such as AddDllDirectory having no effect.
I think $PATH has no effect on the DLL search path with Python >= 3.8 on Windows.
Ok, great. Thank you for pointing out the change. Is there an issue in the Conda repository that discusses how to address the change in Python 3.8+?
@leezu it seems that the conda repository doesn't have a similar issue.
I don't know much about DLL files, but I have a hypothesis.
The packages provided by conda can find the DLL files they need because, for example, the GPU-enabled packages were built in an environment with cudatoolkit provided. However, mxnet provides whl files, which were not built in such an environment. An import error like mxnet's is therefore not a common case to be discussed in the conda repository's issues.
This one solved my problem on Python 3.10 with "mxnet-1.9.0+mkl-cp310-cp310-win_amd64.whl". Thank the Lord, this was a tough one, I must say.