
DeepStack GPU Docker image timeout

Open rickydua opened this issue 4 years ago • 18 comments

System: Debian 10 (Buster), Linux drivers from buster-backports, GTX 960 GPU

Hey there, I'm using the latest image from Docker Hub, deepquestai/deepstack:gpu. After following the guide here, I managed to launch the deepstack:gpu container, but every time I send an image for detection I get a timeout error.

{'success': False, 'error': 'failed to process request before timeout', 'duration': 0}

Steps I took:

  • Installed the latest Docker, nvidia-docker2, and the deepstack:gpu Docker image
  • Started container with sudo docker run --gpus all -e VISION-DETECTION=True -v localstorage:/datastore -p 5000:5000 deepquestai/deepstack:gpu
  • Tried to send an image to the running container on localhost using the same Python code from here (a sketch of that client code is below)
  • Got a timeout error after 1 minute
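
For reference, the client code is essentially the sample detection request from the DeepStack docs; a minimal sketch (the file name test-image.jpg is just a placeholder):

import requests

# Hypothetical local test image; any JPEG will do.
image_data = open("test-image.jpg", "rb").read()

response = requests.post(
    "http://localhost:5000/v1/vision/detection",
    files={"image": image_data},
    timeout=120,
)
print(response.json())  # here this comes back as the timeout error shown above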

More info:

sudo nvidia-docker run --name=deepstack --gpus all -e MODE=High -e VISION-DETECTION=True -v deepstack:/datastore -p 5000:5000 deepquestai/deepstack:gpu
DeepStack: Version 2021.02.1
/v1/vision/detection
---------------------------------------
---------------------------------------
v1/backup
---------------------------------------
v1/restore
[GIN] 2021/04/02 - 22:05:09 | 500 |          1m0s |      172.17.0.1 | POST     /v1/vision/detection

Host Nvidia SMI

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.39       Driver Version: 460.39       CUDA Version: N/A      |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GTX 960     On   | 00000000:07:00.0 Off |                  N/A |
|  7%   45C    P8    14W / 130W |      1MiB /  2000MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

root@21951afe7542:/app/server# cat logs/stderr.txt
exit status 1
chdir intelligencelayer\shared: The system cannot find the path specified

root@21951afe7542:/app/server# cat ../logs/stderr.txt 
Process Process-1:
Traceback (most recent call last):
  File "/usr/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/usr/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/app/intelligencelayer/shared/detection.py", line 69, in objectdetection
    detector = YOLODetector(model_path, reso, cuda=CUDA_MODE)
  File "/app/intelligencelayer/shared/./process.py", line 36, in __init__
    self.model = attempt_load(model_path, map_location=self.device)
  File "/app/intelligencelayer/shared/./models/experimental.py", line 159, in attempt_load
    torch.load(w, map_location=map_location)["model"].float().fuse().eval()
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 584, in load
    return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 842, in _load
    result = unpickler.load()
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 834, in persistent_load
    load_tensor(data_type, size, key, _maybe_decode_ascii(location))
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 823, in load_tensor
    loaded_storages[key] = restore_location(storage, location)
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 803, in restore_location
    return default_restore_location(storage, str(map_location))
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 174, in default_restore_location
    result = fn(storage, location)
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 150, in _cuda_deserialize
    device = validate_cuda_device(location)
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 134, in validate_cuda_device
    raise RuntimeError('Attempting to deserialize object on a CUDA '
RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.
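
The traceback boils down to PyTorch not seeing a GPU inside the container. For what it's worth, this can be checked directly; a minimal sketch, run for example via docker exec -it deepstack python3 (the container name is assumed from the run command above):

import torch

print(torch.__version__)          # PyTorch version bundled in the image
print(torch.version.cuda)         # CUDA version PyTorch was built against
print(torch.cuda.is_available())  # False here, matching the RuntimeError above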

rickydua avatar Apr 02 '21 22:04 rickydua

Same. Shame as it has been solid for months.

Kosh42 avatar Apr 15 '21 20:04 Kosh42

I wonder when there is going to be an update? It has been two months since the last check-in on this project. It is a great project and it would be a shame to see it abandoned!

gillonba avatar Apr 21 '21 20:04 gillonba

Hello @rickx34 @Kosh42 @gillonba

Thanks for reporting this. Sorry we have not been able to attend to issues for a while now. We have an update to DeepStack coming this month.

On this issue, it appears DeepStack is unable to detect the GPU. Also, I notice from the nvidia-smi output above that the CUDA version is N/A (CUDA Version: N/A).

Did you attempt to install CUDA, and if so, what version of CUDA was installed?

johnolafenwa avatar Apr 22 '21 09:04 johnolafenwa

@johnolafenwa I think that nvidia-smi output is from the host. I presume the Docker image has CUDA installed; I can run nvidia-smi inside the Docker image and it reports a CUDA version.

rickydua avatar Apr 27 '21 04:04 rickydua

Folks, it's a shame, but we have to update the docs for Docker on Linux... When you run a container with GPU support on Linux, you have to pass the --privileged parameter so the container can access the NVIDIA devices on the host. You can also mess with the --device param, but the quickest way is just --privileged: docker run --gpus all --privileged ...<rest of the parameters>

chorus12 avatar Sep 11 '21 10:09 chorus12

I'm having this exact problem and the same error on Debian 11, but haven't been able to get past it. I tried --privileged as well.

bbrendon avatar Mar 11 '22 06:03 bbrendon

Have the CPU version working for all three of VISION-SCENE, VISION-DETECTION, and VISION-FACE. Really nice work!

Now, with the GPU option, only VISION-SCENE and VISION-DETECTION are working. VISION-FACE is timing out:

[GIN] 2022/03/18 - 22:53:21 | 500 | 1m0s | 172.17.0.1 | POST "/v1/vision/face/"

docker run --gpus all --privileged -e VISION-FACE=True -v /mnt/user/security/datastore:/datastore -p 5000:5000 deepquestai/deepstack:gpu-2022.01.1

Also tried deepquestai/deepstack:gpu-x5-beta with the same result.

Running Intel Core i5-6500 and GeForce GTX 1050 on Ubuntu 20.04 LTS, downloaded today, fresh install.

CUDA is working inside Docker; the test command and its output are below:

docker run --gpus all nvidia/cuda:11.0-base nvidia-smi
NVIDIA-SMI 510.54    Driver Version: 510.54    CUDA Version: 11.6

william-bohannan avatar Mar 18 '22 23:03 william-bohannan

Simple install notes to make this quick to replicate; they also include Python test scripts:

install-notes.txt python.zip

Also installed NVIDIA cuDNN 8, with the same timeout happening with DeepStack GPU face detection; steps taken below:

OS="ubuntu2004" sudo apt-get update get https://developer.download.nvidia.com/compute/cuda/repos/${OS}/x86_64/cuda-${OS}.pin sudo mv cuda-${OS}.pin /etc/apt/preferences.d/cuda-repository-pin-600 sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/${OS}/x86_64/7fa2af80.pub sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/${OS}/x86_64/ /" apt search libcudnn apt-get install libcudnn8 libcudnn8-dev

william-bohannan avatar Mar 19 '22 07:03 william-bohannan

Must have been the memory of the graphics card; everything is working on the 1080 Ti, which has 11GB of memory.

william-bohannan avatar Mar 24 '22 12:03 william-bohannan

Hi,

gpu-2022.01.1 does not work for me on any endpoint. I get a timeout after 1m.

gpu-2021.09.1 works for me on every endpoint, even without --privileged.

I have a GeForce GTX 1650 4G.

Here's my nvidia-smi on gpu-2022.01.1:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.91.03    Driver Version: 460.91.03    CUDA Version: 11.3     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GTX 1650    Off  | 00000000:01:00.0 Off |                  N/A |
| 36%   40C    P8     7W /  75W |      0MiB /  3909MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Here's my nvidia-smi on gpu-2021.09.1:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.91.03    Driver Version: 460.91.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GTX 1650    Off  | 00000000:01:00.0 Off |                  N/A |
| 37%   43C    P0    21W /  75W |   3072MiB /  3909MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

EDIT: notice the memory usage is 0MiB / 3909MiB on gpu-2022.01.1... so I guess nothing has been loaded.

mikegleasonjr avatar Mar 30 '22 22:03 mikegleasonjr

Same issue for me with a GTX 750 Ti: it works perfectly with gpu-2021.09.1 but not with gpu-2022.01.1.

Brando47 avatar May 04 '22 07:05 Brando47

Any update to this?

ghzgod avatar Jul 08 '22 10:07 ghzgod

Just an FYI, I ran across the same issue after Redis crashed on the Docker host I was running on. Probably not the most common cause, but the timeouts looked the same, and tailing /app/logs/stderr.txt in the container revealed the issue.

rocket357 avatar Jul 16 '22 11:07 rocket357

So a year and a half later, and still no news on something as basic as GPU support in image-recognition software.

JPM-git avatar Dec 27 '22 20:12 JPM-git

Any news on this subject?

LeorFinacre avatar Apr 03 '24 20:04 LeorFinacre

Any news on this subject?

The project has been dead for over two years. You can switch to CodeProject AI. https://www.codeproject.com/AI/docs/

michaelyorkpa avatar Apr 03 '24 23:04 michaelyorkpa

Oh, thank you for pointing me to the successor. Just a quick question; I see:

The Docker GPU version is specific to nVidia's CUDA enabled cards with compute capability >= 6.0

So my graphics card with compute capability 3.0 is useless with this project, I guess? Just to be sure: is there a way to use it anyway or not?
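
For anyone else checking: with a working PyTorch + CUDA install, the card's compute capability can be read like this (a minimal sketch, not part of CodeProject AI itself):

import torch

# Raises if CUDA is not available; index 0 is the first GPU.
major, minor = torch.cuda.get_device_capability(0)
print(f"compute capability: {major}.{minor}")  # e.g. 3.0 on older Kepler cards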

LeorFinacre avatar Apr 05 '24 11:04 LeorFinacre

I use the Windows version so I can't speak to specifics. But I'd grab the Docker CPU version; once it's installed, there are multiple modules you can use that will operate on older GPUs. Within CPAI there are multiple types of processing modules available to install and use for image processing (and a few for sound, facial recognition, text processing like license plates, etc.).

michaelyorkpa avatar Apr 05 '24 12:04 michaelyorkpa