
DIGITS server won't start: device_query issue

Open · AliaMYH opened this issue 7 years ago · 2 comments

After working normally for a long time, my DIGITS server suddenly fails with this error when I try to run it, and I'm not sure why. I also ran device_query.py and posted its output below. @lukeyeager

$$$$$$$$$$:~/digits$ ./digits-devserver -p 5004
  ___ ___ ___ ___ _____ ___
 |   \_ _/ __|_ _|_   _/ __|
 | |) | | (_ || |  | | \__ \
 |___/___\___|___| |_| |___/ 5.1-dev

Traceback (most recent call last):
  File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/home/aliahassan95/digits/digits/__main__.py", line 70, in <module>
    main()
  File "/home/aliahassan95/digits/digits/__main__.py", line 55, in main
    import digits.webapp
  File "digits/webapp.py", line 64, in <module>
    import digits.model.images.classification.views  # noqa
  File "/usr/lib/python2.7/dist-packages/gevent/builtins.py", line 93, in __import__
    result = _import(*args, **kwargs)
  File "digits/model/images/classification/views.py", line 12, in <module>
    from .forms import ImageClassificationModelForm
  File "/usr/lib/python2.7/dist-packages/gevent/builtins.py", line 93, in __import__
    result = _import(*args, **kwargs)
  File "digits/model/images/classification/forms.py", line 4, in <module>
    from ..forms import ImageModelForm
  File "/usr/lib/python2.7/dist-packages/gevent/builtins.py", line 93, in __import__
    result = _import(*args, **kwargs)
  File "digits/model/images/forms.py", line 6, in <module>
    from ..forms import ModelForm
  File "/usr/lib/python2.7/dist-packages/gevent/builtins.py", line 93, in __import__
    result = _import(*args, **kwargs)
  File "digits/model/forms.py", line 18, in <module>
    class ModelForm(Form):
  File "digits/model/forms.py", line 321, in ModelForm
    ) for index in config_value('gpu_list').split(',') if index],
  File "digits/device_query.py", line 259, in get_nvml_info
    raise RuntimeError('nvmlDeviceGetHandleByPciBusId() failed with error #%s' % rc)
RuntimeError: nvmlDeviceGetHandleByPciBusId() failed with error #2
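
The traceback shows the failure happening inside the body of `class ModelForm` (forms.py, line 321), where a comprehension walks `config_value('gpu_list').split(',')` and queries each GPU. A class body executes when its module is imported, so one failed device query aborts `import digits.webapp` and the server never starts. Below is a simplified, hypothetical reconstruction of that pattern; `query_device`, `build_choices` and `GPU_LIST` are stand-ins, not DIGITS's actual code or config.

```python
GPU_LIST = "0,"  # hypothetical config value; the trailing comma yields an empty entry


def query_device(index):
    """Hypothetical stand-in for device_query.get_nvml_info(); fails like the traceback."""
    raise RuntimeError("nvmlDeviceGetHandleByPciBusId() failed with error #2")


def build_choices(gpu_list):
    # Same shape as the comprehension in the traceback: empty entries are skipped,
    # every remaining index triggers a device query
    return [(index, query_device(int(index)))
            for index in gpu_list.split(",") if index]


try:
    class ModelForm:
        # The class body runs at definition (import) time, so the query runs here too
        gpu_choices = build_choices(GPU_LIST)
except RuntimeError as exc:
    print("import would fail:", exc)
```

Because the query sits in a class body rather than in a request handler, the error surfaces as a startup crash instead of a per-page error.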


$$$$$$$:~/digits/digits$ ./device_query.py 
Device #0:
>>> CUDA attributes:
  name                         GeForce GTX TITAN X
  totalGlobalMem               12799574016
  clockRate                    1076000
  major                        5
  minor                        2
Traceback (most recent call last):
  File "./device_query.py", line 318, in <module>
    info = get_nvml_info(i)
  File "./device_query.py", line 259, in get_nvml_info
    raise RuntimeError('nvmlDeviceGetHandleByPciBusId() failed with error #%s' % rc)
RuntimeError: nvmlDeviceGetHandleByPciBusId() failed with error #2
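
The CUDA attributes printed above follow the units of CUDA's `cudaDeviceProp` struct: `totalGlobalMem` is in bytes, `clockRate` is in kilohertz, and `major`/`minor` together form the compute capability. A small sketch that converts the reported values into friendlier units (the helper `summarize_device` is hypothetical, not part of DIGITS):

```python
def summarize_device(name, total_global_mem, clock_rate_khz, major, minor):
    """Convert raw cudaDeviceProp-style values into human-readable units."""
    gib = total_global_mem / 2**30   # bytes -> GiB
    mhz = clock_rate_khz / 1000      # kHz -> MHz
    return {
        "name": name,
        "memory_gib": round(gib, 2),
        "clock_mhz": mhz,
        "compute_capability": f"{major}.{minor}",
    }


# Values copied from the device_query.py output above
info = summarize_device("GeForce GTX TITAN X", 12799574016, 1076000, 5, 2)
print(info)
# {'name': 'GeForce GTX TITAN X', 'memory_gib': 11.92, 'clock_mhz': 1076.0, 'compute_capability': '5.2'}
```

So the card reports about 11.92 GiB of global memory, a 1076 MHz clock and compute capability 5.2. CUDA itself could query the device; only the subsequent NVML call failed.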

AliaMYH · Mar 16 '17 00:03

cudaErrorMemoryAllocation = 2: "The API call failed because it was unable to allocate enough memory to perform the requested operation." http://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html#group__CUDART__TYPES_1g3f51e3575c2178246db0a94a430e0038
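
One caveat when decoding the number: the `rc` in the traceback is returned by `nvmlDeviceGetHandleByPciBusId()`, an NVML function. NVML has its own `nvmlReturn_t` enum, in which 2 is `NVML_ERROR_INVALID_ARGUMENT`. It may therefore be worth keeping both readings in mind. A minimal lookup sketch covering a few values from each enum (the tables are abbreviated and the function `describe_rc` is hypothetical):

```python
# Abbreviated tables: NVML values from nvml.h (nvmlReturn_t),
# CUDA runtime values from cudaError_t. Only a few entries are listed.
NVML_RETURN_CODES = {
    0: "NVML_SUCCESS",
    1: "NVML_ERROR_UNINITIALIZED",
    2: "NVML_ERROR_INVALID_ARGUMENT",
    3: "NVML_ERROR_NOT_SUPPORTED",
    4: "NVML_ERROR_NO_PERMISSION",
    6: "NVML_ERROR_NOT_FOUND",
    9: "NVML_ERROR_DRIVER_NOT_LOADED",
    15: "NVML_ERROR_GPU_IS_LOST",
    999: "NVML_ERROR_UNKNOWN",
}

CUDA_RUNTIME_ERRORS = {
    0: "cudaSuccess",
    2: "cudaErrorMemoryAllocation",
}


def describe_rc(rc):
    """Return how an integer return code reads under each enum ('unknown' if unlisted)."""
    return {
        "nvml": NVML_RETURN_CODES.get(rc, "unknown"),
        "cuda_runtime": CUDA_RUNTIME_ERRORS.get(rc, "unknown"),
    }


print(describe_rc(2))
# {'nvml': 'NVML_ERROR_INVALID_ARGUMENT', 'cuda_runtime': 'cudaErrorMemoryAllocation'}
```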

Is there another process using up all the memory on your GPU?
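
To check this, `nvidia-smi` can list the compute processes currently holding GPU memory. The following is a minimal sketch that assumes `nvidia-smi` is on the `PATH`. It uses the `--query-compute-apps` and `--format=csv,noheader,nounits` options, and the helper names are hypothetical:

```python
import shutil
import subprocess

# nvidia-smi options: list compute processes as headerless CSV, memory in MiB
QUERY = [
    "nvidia-smi",
    "--query-compute-apps=pid,process_name,used_memory",
    "--format=csv,noheader,nounits",
]


def parse_compute_apps(csv_text):
    """Parse nvidia-smi CSV rows into (pid, process_name, used_mib) tuples.

    used_mib is None when nvidia-smi reports a non-numeric value such as [N/A].
    """
    apps = []
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue
        pid, rest = line.split(",", 1)      # pid is the first field
        name, used = rest.rsplit(",", 1)    # used_memory is the last field
        used = used.strip()
        apps.append((int(pid), name.strip(), int(used) if used.isdigit() else None))
    return apps


def gpu_memory_users(min_mib=0):
    """Run nvidia-smi and return processes using more than min_mib MiB of GPU memory."""
    if shutil.which("nvidia-smi") is None:
        raise FileNotFoundError("nvidia-smi not found on PATH")
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return [app for app in parse_compute_apps(result.stdout)
            if app[2] is not None and app[2] > min_mib]
```

On the affected machine, `gpu_memory_users(1000)` would list any process holding more than about 1000 MiB. If an unexpected large consumer shows up, stopping it before restarting DIGITS is a reasonable first step.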

lukeyeager · Mar 16 '17 00:03

Thanks @lukeyeager, it was a memory issue for me.

sumsuddin · Jul 28 '20 06:07