[BUGFIX] add cudatoolkit dll dir to dll_directory

Open specter119 opened this issue 3 years ago • 8 comments

Description

For people who create Python environments with conda, installing cudatoolkit is the most efficient way to get CUDA support. However, the package does not set any system environment variables, so mxnet cannot find the corresponding CUDA DLLs. Adding $CONDA_PREFIX/Library/bin to the DLL search path solves this problem. For people who always work in Jupyter, adding "CUDA_PREFIX" to the env of the Jupyter kernelspec file will be necessary.
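A minimal sketch of the idea described above, assuming a Windows conda environment in which `CONDA_PREFIX` is set. The helper name `add_conda_dll_dir` is hypothetical and is not necessarily the code in this PR:

```python
import os
import sys
from pathlib import Path


def add_conda_dll_dir(environ=os.environ):
    """Register <CONDA_PREFIX>/Library/bin as a DLL search directory.

    Returns the registered directory, or None if nothing was done.
    (Hypothetical helper, for illustration only.)
    """
    prefix = environ.get("CONDA_PREFIX")
    if not prefix:
        return None  # not running inside an activated conda env
    # os.add_dll_directory only exists on Windows with Python 3.8+.
    if sys.platform != "win32" or not hasattr(os, "add_dll_directory"):
        return None
    dll_dir = Path(prefix) / "Library" / "bin"
    if not dll_dir.is_dir():
        return None
    os.add_dll_directory(str(dll_dir))
    return dll_dir
```

Such a helper would need to run before `import mxnet`, since the DLL lookup happens at import time.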

BTW, I also wrote another, more brute-force way:

import os
import pathlib

# Register every directory on PATH that contains at least one DLL.
_ = list(
    map(
        lambda x: os.add_dll_directory(str(x)),
        (p for p in map(pathlib.Path, set(map(str.lower, os.environ['path'].split(';')))) if next(p.glob('*.dll'), None) is not None),
    )
)

This adds every directory in $PATH that contains a DLL to the DLL search path. However, I am not sure whether a very long list of DLL directories will hurt performance. From a quick test on my machine, it has only a tiny influence on module import time.
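The same brute-force approach can be written as an explicit loop, which is easier to inspect and test. This is only a sketch: the function names `dirs_with_dlls` and `add_all_dll_dirs` are hypothetical, and unlike the one-liner above it keeps the original case of each path and lowercases only for de-duplication.

```python
import os
from pathlib import Path


def dirs_with_dlls(path_value, sep=";"):
    """Return the unique, existing PATH entries that contain at least one *.dll file."""
    seen = set()
    result = []
    for entry in path_value.split(sep):
        entry = entry.strip()
        key = entry.lower()  # Windows paths are case-insensitive
        if not key or key in seen:
            continue
        seen.add(key)
        p = Path(entry)
        if p.is_dir() and next(p.glob("*.dll"), None) is not None:
            result.append(p)
    return result


def add_all_dll_dirs(path_value=None):
    """Register every DLL-containing PATH entry; returns the list of directories added."""
    if not hasattr(os, "add_dll_directory"):
        return []  # not Windows, or Python < 3.8
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    dirs = dirs_with_dlls(path_value)
    for d in dirs:
        os.add_dll_directory(str(d))
    return dirs
```

Returning the list of directories makes it possible to print what was added when debugging an import error.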

I'm a beginner with mxnet who has just finished the installation. Sorry for any inconvenience this PR may bring.

Checklist

Essentials

  • [x] PR's title starts with a category (e.g. [BUGFIX], [MODEL], [TUTORIAL], [FEATURE], [DOC], etc)
  • [x] Changes are complete (i.e. I finished coding on this PR)
  • [ ] All changes have test coverage
  • [x] Code is well-documented

Changes

  • [ ] Feature1, tests, (and when applicable, API doc)
  • [ ] Feature2, tests, (and when applicable, API doc)

Comments

  • If this change is a backward incompatible change, why must this change be made.
  • Interesting edge cases to note here

specter119 avatar Jul 09 '21 14:07 specter119

Hey @specter119, thanks for submitting the PR. All tests are already queued to run once. If tests fail, you can trigger one or more tests again with the following commands:

  • To trigger all jobs: @mxnet-bot run ci [all]
  • To trigger specific jobs: @mxnet-bot run ci [job1, job2]

CI supported jobs: [sanity, unix-gpu, clang, edge, website, unix-cpu, miscellaneous, windows-gpu, windows-cpu, centos-cpu, centos-gpu]


Note: Only the following 3 categories can trigger CI: PR Author, MXNet Committer, Jenkins Admin. All CI tests must pass before the PR can be merged.

mxnet-bot avatar Jul 09 '21 14:07 mxnet-bot

@leezu Thanks for your reply. Yeah, I'm sure I have activated the conda env in the right way, and other deep learning frameworks work well with CUDA in this env, but I don't know why conda does not set up the DLL directories.

I have also hit import errors like https://github.com/apache/incubator-mxnet/issues/18276 and https://github.com/apache/incubator-mxnet/issues/17887. I also found that manually loading the DLLs with ctypes.cdll.LoadLibrary fixes this.

specter119 avatar Jul 10 '21 02:07 specter119

Sorry for the delayed response. Do you mean that ctypes.cdll.LoadLibrary does not require os.add_dll_directory?

leezu avatar Jul 17 '21 18:07 leezu

I mean they do the same thing. Both loading all dll files by ctypes.cdll.LoadLibrary and adding the dll folder by os.add_dll_directory can fix the import error.
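For illustration, the manual-loading alternative mentioned here might look like the sketch below. The helper `preload_dlls` is hypothetical; it uses `ctypes.CDLL` directly and records which files failed to load instead of stopping at the first error.

```python
import ctypes
from pathlib import Path


def preload_dlls(directory, pattern="*.dll"):
    """Try to load every matching file in `directory`.

    Returns (loaded, failed) as two lists of file names.
    (Hypothetical helper, for illustration only.)
    """
    loaded, failed = [], []
    for dll in sorted(Path(directory).glob(pattern)):
        try:
            ctypes.CDLL(str(dll))  # raises OSError if the library cannot be loaded
            loaded.append(dll.name)
        except OSError:
            failed.append(dll.name)
    return loaded, failed
```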

specter119 avatar Jul 18 '21 03:07 specter119

Yeah, I'm sure I have activated the conda env in the right way

Conda documentation says "For now, most DLLs are installed into (install prefix)\Library\bin. This path is added to os.environ["PATH"] for all Python processes, so that DLLs can be located, regardless of the value of the system's PATH environment variable." https://docs.conda.io/projects/conda-build/en/latest/resources/use-shared-libraries.html#shared-libraries-in-windows

Please check whether os.environ["PATH"] is correct in your case, i.e. os.environ["PATH"] is expected to contain the (install prefix)\Library\bin directory.
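One way to run this check is sketched below. The function name `path_contains_dir` is hypothetical; it compares entries using Windows path rules (case-insensitive, `/` and `\` treated alike, trailing separators ignored) via the standard `ntpath` module.

```python
import ntpath


def path_contains_dir(path_value, target, sep=";"):
    """Return True if `target` is one of the entries of a Windows-style PATH string."""

    def norm(p):
        # Strip whitespace and optional quotes, then normalize separators and case.
        return ntpath.normcase(ntpath.normpath(p.strip().strip('"')))

    wanted = norm(target)
    return any(norm(e) == wanted for e in path_value.split(sep) if e.strip())
```

Assuming `CONDA_PREFIX` is set, a call such as `path_contains_dir(os.environ["PATH"], os.path.join(os.environ["CONDA_PREFIX"], "Library", "bin"))` would report whether the directory is present.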

leezu avatar Jul 19 '21 15:07 leezu

@leezu Yeah, 'Library\bin' is in os.environ['PATH']. Please refer to https://docs.python.org/3/library/os.html#os.add_dll_directory:

New in version 3.8: Previous versions of CPython would resolve DLLs using the default behavior for the current process. This led to inconsistencies, such as only sometimes searching PATH or the current working directory, and OS functions such as AddDllDirectory having no effect

I think $PATH has no effect on the DLL search directories when Python >= 3.8 on the Windows platform.

specter119 avatar Jul 20 '21 08:07 specter119

Ok, great. Thank you for pointing out the change. Is there an issue in the Conda repository that discusses how to address the change in Python 3.8+?

leezu avatar Jul 20 '21 14:07 leezu

@leezu It seems that the conda repository doesn't have a similar issue.

I don't know much about DLL files, but I have a hypothesis.

Packages provided by conda can find the DLL files they need; for example, the GPU-enabled packages are built in an environment where cudatoolkit is provided. mxnet, however, provides whl files, which were not built in such an environment. So an import error like mxnet's is not a common case that would be discussed in the conda repository's issues.

specter119 avatar Jul 21 '21 04:07 specter119

This one solved my problem on Python 3.10 with "mxnet-1.9.0+mkl-cp310-cp310-win_amd64.whl". Thank the Lord, this was a tough one, I must say.

XstormLeigh avatar Apr 07 '23 07:04 XstormLeigh