transfer-learning-conv-ai
No APEX Issue
If I don't have CUDA support, this code won't work, right, since you're using NVIDIA's apex, which requires CUDA? Just wondering if there's an alternative?
You don't need apex to use the codebase; it's only needed if you want to do fp16 training. The codebase also runs on CPU, but I'm not sure you can do the training there, it would be very slow. If you only want to do inference (the interact.py script), it should work. The interact.py script works fine on my laptop on CPU.
Yeah, I suppose I won't be able to run the training due to the infeasible training time, but when I run train.py, I get AttributeError: 'NoneType' object has no attribute 'split' for the line

```python
return tuple(int(x) for x in torch.version.cuda.split('.'))
```

and I'm guessing it's because there is no CUDA on my laptop. I think the problem is that pytorch_pretrained_bert uses apex to implement OpenAIGPTDoubleHeadsModel and some of the other things imported in this module.
In which file is this line?

```python
return tuple(int(x) for x in torch.version.cuda.split('.'))
```

I can't find it in our codebase.
By the way, if you have installed apex and don't have a GPU, you should uninstall it. It doesn't like having no GPUs.
It's not in your code; it's in apex's code, which you import in modeling.py, which is imported in pytorch_pretrained_bert, which is imported in train.py.
Thanks for the advice I'll uninstall it.
edit:
Here is the full traceback if it's helpful:

```
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/Applications/PyCharm.app/Contents/helpers/pydev/_pydev_bundle/pydev_umd.py", line 197, in runfile
    pydev_imports.execfile(filename, global_vars, local_vars)  # execute the script
  File "/Applications/PyCharm.app/Contents/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/Users/tnguyen/Desktop/recourse-nlp/transfer-learning-conv-ai/train.py", line 19, in <module>
    from pytorch_pretrained_bert import (OpenAIAdam, OpenAIGPTDoubleHeadsModel, OpenAIGPTTokenizer,
  File "/Applications/PyCharm.app/Contents/helpers/pydev/_pydev_bundle/pydev_import_hook.py", line 21, in do_import
    module = self._system_import(name, *args, **kwargs)
  File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pytorch_pretrained_bert/__init__.py", line 7, in <module>
    from .modeling import (BertConfig, BertModel, BertForPreTraining,
  File "/Applications/PyCharm.app/Contents/helpers/pydev/_pydev_bundle/pydev_import_hook.py", line 21, in do_import
    module = self._system_import(name, *args, **kwargs)
  File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pytorch_pretrained_bert/modeling.py", line 228, in <module>
    from apex.normalization.fused_layer_norm import FusedLayerNorm as BertLayerNorm
  File "/Applications/PyCharm.app/Contents/helpers/pydev/_pydev_bundle/pydev_import_hook.py", line 21, in do_import
    module = self._system_import(name, *args, **kwargs)
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 668, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 638, in _load_backward_compatible
  File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/apex-0.1-py3.7.egg/apex/__init__.py", line 2, in <module>
    from . import amp
  File "/Applications/PyCharm.app/Contents/helpers/pydev/_pydev_bundle/pydev_import_hook.py", line 21, in do_import
    module = self._system_import(name, *args, **kwargs)
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 668, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 638, in _load_backward_compatible
  File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/apex-0.1-py3.7.egg/apex/amp/__init__.py", line 1, in <module>
    from .amp import init, half_function, float_function, promote_function,
  File "/Applications/PyCharm.app/Contents/helpers/pydev/_pydev_bundle/pydev_import_hook.py", line 21, in do_import
    module = self._system_import(name, *args, **kwargs)
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 668, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 638, in _load_backward_compatible
  File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/apex-0.1-py3.7.egg/apex/amp/amp.py", line 3, in <module>
    from .lists import functional_overrides, torch_overrides, tensor_overrides
  File "/Applications/PyCharm.app/Contents/helpers/pydev/_pydev_bundle/pydev_import_hook.py", line 21, in do_import
    module = self._system_import(name, *args, **kwargs)
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 668, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 638, in _load_backward_compatible
  File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/apex-0.1-py3.7.egg/apex/amp/lists/torch_overrides.py", line 69, in <module>
    if utils.get_cuda_version() >= (9, 1, 0):
  File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/apex-0.1-py3.7.egg/apex/amp/utils.py", line 9, in get_cuda_version
    return tuple(int(x) for x in torch.version.cuda.split('.'))
AttributeError: 'NoneType' object has no attribute 'split'
```
I'm getting the same error. The last two lines indicate that torch.version.cuda is returning None. The problem is that the method get_cuda_version in .../python3.7/site-packages/apex/amp/utils.py on line 9 looks like:

```python
def get_cuda_version():
    return tuple(int(x) for x in torch.version.cuda.split('.'))
```

...where instead it should be:

```python
def get_cuda_version():
    return tuple(int(x) for x in torch.__version__.split('.'))
```
This is an issue with torch.
@DannyDannyDanny
If no GPU is detected on the system, you won't be able to use apex.
We should improve the error message on importing apex and raise an exception if some apex methods are used.
A workaround would be to guard the apex import with torch.cuda.is_available().
Your suggestion won't work, since torch.version.cuda returns the CUDA version (e.g. 10.0.130), while torch.__version__ returns the PyTorch version (e.g. 1.3.0.dev20190923).
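Given that distinction, a None-safe variant of the version parsing (a sketch under the assumption that callers pass torch.version.cuda in, not apex's actual code) would handle CPU-only builds explicitly:

```python
def parse_cuda_version(cuda_version):
    """Parse a CUDA version string like '10.0.130' into an int tuple.

    torch.version.cuda is None on CPU-only PyTorch builds, which is
    exactly what triggers the AttributeError above, so treat None as
    "no CUDA" instead of calling .split() on it.
    """
    if cuda_version is None:
        return None
    return tuple(int(x) for x in cuda_version.split('.'))


# A caller would then guard its comparison, e.g.:
# version = parse_cuda_version(torch.version.cuda)
# if version is not None and version >= (9, 1, 0): ...
```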
Is there any older version that works without a GPU?
If you are using a 3090, CUDA 11.0 doesn't seem to work with apex, but CUDA 11.1 does. I did the following and successfully installed apex 0.1:

```shell
pip install torch==1.8.1+cu111 torchvision==0.9.1+cu111 torchaudio==0.8.1 -f https://download.pytorch.org/whl/torch_stable.html
cd apex
pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
```