What is the known working GPU config?


I am using an Amazon prebuilt Ubuntu 16 Deep Learning AMI, which includes CUDA 10, 10.1, 10.2, and 11.

I am using Mambaforge with Python 3.6 or 3.7.

Tensorflow 2 is automatically used. I plan to try Tensorflow 1.x next.

The process is loaded into GPU memory, but the GPU is never used.

Is there a known working full-stack config for eynollah on the GPU (OS+version, CUDA+version, Python+version, Tensorflow+version, etc.) that you don't mind sharing?

Thanks,

Fractal-Anaphora avatar Aug 05 '22 20:08 Fractal-Anaphora

Hi @mach881040, for me it works well with an NVIDIA 2070S GPU on Ubuntu 18.04, Python 3.7, Tensorflow 2.4.1 and CUDA 10.1. Note that there is still a lot of room for improvement wrt GPU utilization - we hope to optimize this, but for our use case the quality of results is much more important than throughput speed.

cneud avatar Aug 23 '22 19:08 cneud

The process is loaded into GPU memory, but the GPU is never used.

I can confirm this with Ubuntu 22.04, Python 3.8, TF 2.10. It's not about low utilisation: the OP says no utilisation, and that's what I see too. Memory consumption is only 107 MB (and not increasing), and GPU utilisation never rises above 0%.
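
For reference, here is a quick, eynollah-independent way to check whether TF can see the GPU at all (standard TF 2.x API):

```python
import tensorflow as tf

# An empty list here means TF silently fell back to CPU,
# which would explain the constant 0% GPU utilisation.
print(tf.config.list_physical_devices('GPU'))

# Log device placement to verify ops actually land on the GPU.
tf.debugging.set_log_device_placement(True)
a = tf.random.uniform((1000, 1000))
b = tf.matmul(a, a)
print(b.device)  # expect something ending in GPU:0, not CPU:0
```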

bertsky avatar Feb 11 '23 12:02 bertsky

Sorry, error on my part. The cause was an incomplete CUDA/TF installation. I probably ran into #72 as well.

(I am on CUDA 11.7 though, and now it does work. So the note in the README might not be correct.)
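
In case others hit the same problem: recent TF 2.x can report the CUDA/cuDNN versions the wheel was built against, which makes mismatches with the system installation easy to spot (the exact keys are version-dependent and may be absent in CPU-only builds):

```python
import tensorflow as tf

# Versions the TF wheel was compiled against; a large mismatch with the
# installed CUDA/cuDNN makes TF fall back to CPU without a hard error.
info = tf.sysconfig.get_build_info()
print(info['cuda_version'], info['cudnn_version'])
print(tf.config.list_physical_devices('GPU'))
```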

bertsky avatar Feb 11 '23 16:02 bertsky

BTW, is there a particular reason for keeping the TF1-style session management? I found that if I remove it completely (including the explicit GC calls), and avoid repeating load_model calls by storing the model refs in Eynollah's instance, it gets about 9% faster on average (while max RSS of course does increase from 4 GB to 7 GB).
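
For illustration, roughly the caching I mean - a minimal sketch, not eynollah's actual code (class and method names are mine):

```python
from tensorflow.keras.models import load_model

class ModelCache:
    def __init__(self):
        # path -> loaded Keras model; kept for the lifetime of the
        # instance, hence the higher steady-state RSS (4 GB -> 7 GB).
        self._models = {}

    def get(self, path):
        # Load each model exactly once instead of repeating
        # load_model() plus TF1-style session setup/teardown per call.
        if path not in self._models:
            self._models[path] = load_model(path)
        return self._models[path]
```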

bertsky avatar Feb 11 '23 17:02 bertsky

BTW, is there a particular reason for keeping the TF1-style session management? I found that if I remove it completely (including the explicit GC calls), and avoid repeating load_model calls by storing the model refs in Eynollah's instance, it gets about 9% faster on average (while max RSS of course does increase from 4 GB to 7 GB).

This should already be fixed with https://github.com/qurator-spk/eynollah/commit/7345f6bf678f36cf3a51576b0fa94df0919925d7 (which has since been merged), right?

The working config for (limited) GPU use is now documented in the README.

cneud avatar May 13 '23 11:05 cneud