Switch to cuda causes trouble with pygpu

Open antje-schweitzer opened this issue 5 years ago • 3 comments

I have a student writing a thesis involving speech synthesis and we would like to use Merlin.

I installed Merlin in a virtual environment, using pip install for the requirements. I can successfully run both the demo and a voice of our own on a machine that does not have a GPU.

After switching to a machine with a GPU, I get this error when running the demo:

[...] ValueError: You are tring to use the old GPU back-end. It was removed from Theano. Use device=cuda* now. See https://github.com/Theano/Theano/wiki/Converting-to-the-new-gpu-back-end%28gpuarray%29 for more information.

So I edited scripts/submit.sh, changing device=gpu$gpu_id to device=cuda$gpu_id, as follows:

THEANO_FLAGS="mode=FAST_RUN,device=cuda$gpu_id,"$MERLIN_THEANO_FLAGS
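
The same rename can be sketched as a small string rewrite, which may help when several scripts carry old-style flags. This is only an illustration: `to_new_backend_flags` is a hypothetical helper, not part of Merlin or Theano, and it touches nothing except the `device` entry.

```python
import re


def to_new_backend_flags(flags):
    """Rewrite device=gpuN to device=cudaN in a THEANO_FLAGS string.

    Hypothetical helper for illustration; not part of Merlin or Theano.
    """
    # Split on commas and drop empty pieces (e.g. from a trailing comma).
    parts = [p.strip() for p in flags.split(",") if p.strip()]
    rewritten = []
    for part in parts:
        key, sep, value = part.partition("=")
        # Only old-style device names (gpu, gpu0, gpu1, ...) are changed.
        if key == "device" and re.fullmatch(r"gpu\d*", value):
            value = "cuda" + value[len("gpu"):]
        rewritten.append(key + sep + value)
    return ",".join(rewritten)


print(to_new_backend_flags("mode=FAST_RUN,device=gpu0,floatX=float32"))
```

Entries such as device=cpu or device=cuda1 pass through unchanged, so running the helper twice is harmless.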

However, I now get an error stating that pygpu is missing:

ERROR (theano.gpuarray): pygpu was configured but could not be imported or is too old (version 0.7 or higher required)
Traceback (most recent call last):
  File "/mount/projekte19/sfb-732/a4/Leute/Antje/Teaching/SpeechSynthesis/Preparation/Merlin/merlin2/merlin-venv/merlin-venv/lib/python2.7/site-packages/theano/gpuarray/__init__.py", line 23, in <module>
    import pygpu
ImportError: No module named pygpu

and then it carries on running without the GPU.

pygpu does not seem to be available via pip, only via conda. Conda, however, does not provide bandmat, so switching to conda is not an option either.
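
A quick way to see which of these packages the active environment can actually find is to look them up without importing them. A minimal sketch using only the standard library; `missing_modules` is a hypothetical helper name, and the module list is taken from this thread. Note that a module being found does not guarantee that it imports cleanly or that its version is new enough (Theano asks for pygpu 0.7 or higher).

```python
import importlib.util


def missing_modules(names):
    """Return the names from `names` that cannot be located in this environment."""
    # find_spec returns None for a top-level module that is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Module names mentioned in this thread; adjust as needed.
    missing = missing_modules(["theano", "pygpu", "bandmat"])
    print("missing:", ", ".join(missing) if missing else "none")
```

Run it with the same interpreter that runs Merlin, otherwise it reports on a different environment.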

I would be very grateful if someone could look into this. Surely others must have the same problem (indeed, a closed issue mentioned pygpu but did not provide explicit help).

How can I fix this? As it stands, my student and I can only use the CPU, but we would like to build a voice from a considerable amount of data, so I think we will need to use a GPU.

Thanks, Antje

antje-schweitzer, Apr 04 '19 13:04

If you have been able to solve the issue by installing pygpu with conda, then you should be fine.

While the conda environment is active, running pip install bandmat will install bandmat into that conda environment.

ZackHodari, Apr 06 '19 14:04

I solved this a while ago and forgot to post here. I am pasting my notes, which may not be 100% accurate but should help :)

First, change line 13 in merlin/src/setup_env.sh to the following:

MERLIN_THEANO_FLAGS="device=cuda,floatX=float32,on_unused_input=ignore"

Create a new conda environment

conda create -n merlin_gpu python=3
source activate merlin_gpu
pip install --upgrade pip

conda install matplotlib numpy scikit-learn scipy lxml theano
conda install -c conda-forge cudatoolkit
pip install bandmat

Test bandmat

python -m unittest discover bandmat

If you do not see the final output "OK", you most likely hit the error "ValueError: numpy.ufunc has the wrong size, try recompiling". In that case, run the following; the last command should again end with the output "OK".

pip uninstall bandmat
cd ~/
git clone https://github.com/MattShannon/bandmat.git
cd bandmat
conda install cython
python setup.py build_ext --inplace
python setup.py install
python -m unittest discover bandmat

Test Theano

If the test below prints "Used the cpu", then something went wrong in setting up your environment.

First run export THEANO_FLAGS="floatX=float32,on_unused_input=ignore,device=cuda", then run the script below with python.

test_theano.py

from theano import function, config, shared, tensor
import numpy
import time
rng = numpy.random.RandomState(22)

# 10 x #cores x #threads per core
x = shared(numpy.asarray(rng.rand(10 * 30 * 768), config.floatX))
f = function([], tensor.exp(x))
print(f.maker.fgraph.toposort())

t0 = time.time()
for i in range(1000):
    r = f()
t1 = time.time()

print("Looping 1000 times took {} seconds".format(t1 - t0))
print("Result is {}".format(r))

# Any Elemwise op without "Gpu" in its class name means the graph ran on the CPU.
if numpy.any([isinstance(node.op, tensor.Elemwise) and
              ('Gpu' not in type(node.op).__name__)
              for node in f.maker.fgraph.toposort()]):
    print('Used the cpu')
else:
    print('Used the gpu')

If it used the CPU and you got no other error, then start again and make sure to repeat each step from a new session with a new environment.
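
One thing worth checking in a new session is that the exported THEANO_FLAGS is still set, since an export does not survive into a fresh shell. A minimal sketch of such a check, assuming the flags come from the environment variable only (Theano can also read settings from a .theanorc file, which this does not inspect); `theano_device_from_env` is a hypothetical helper name.

```python
import os


def theano_device_from_env(environ=None):
    """Return the device named in THEANO_FLAGS, or None if none is set there."""
    environ = os.environ if environ is None else environ
    for item in environ.get("THEANO_FLAGS", "").split(","):
        key, sep, value = item.strip().partition("=")
        if key == "device" and sep:
            return value
    return None


device = theano_device_from_env()
if device is None or not device.startswith("cuda"):
    print("THEANO_FLAGS does not request a cuda device:", device)
else:
    print("THEANO_FLAGS requests", device)
```

If this reports no cuda device, re-run the export line before running test_theano.py again.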

Test Merlin

Run the following commands

cd merlin/egs/slt_arctic/s1
./01_setup.sh slt_arctic_demo
./02_prepare_conf_files.sh conf/global_settings.cfg
./03_train_duration_model.sh conf/duration_slt_arctic_demo.conf

If this does not fail, then you should be running on a GPU. To confirm this fully, however, try training a voice and check whether the training phase is quicker.
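
To compare training speed, the same command can be timed once with the CPU settings and once with the GPU settings. A minimal sketch using only the standard library; `time_command` is a hypothetical helper, and the command in the demo section is a placeholder so the sketch runs anywhere.

```python
import subprocess
import sys
import time


def time_command(cmd, env=None):
    """Run `cmd` (a list of arguments) and return (exit code, seconds elapsed)."""
    start = time.perf_counter()
    completed = subprocess.run(cmd, env=env)
    return completed.returncode, time.perf_counter() - start


if __name__ == "__main__":
    # Placeholder command; swap in the training script to time a real run.
    code, seconds = time_command([sys.executable, "-c", "pass"])
    print("exit code {}, {:.2f} s".format(code, seconds))
```

For a real comparison, the argument list would be something like ["./03_train_duration_model.sh", "conf/duration_slt_arctic_demo.conf"], run from merlin/egs/slt_arctic/s1.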

ZackHodari, Oct 25 '19 15:10

If I remember correctly, the snagging points were:

  • fixing the THEANO_FLAGS in Merlin's setup_env.sh
  • ensuring bandmat installed successfully, and recompiling it if not
  • installing the cudatoolkit package with conda

ZackHodari, Oct 25 '19 15:10