
High CPU usage, low GPU usage

Open KitEJohnson opened this issue 4 years ago • 11 comments

Hi there

When I train textgenrnn on a text file it seems to progress fairly slowly (28ms/step), with high CPU usage (>40%), and low GPU usage (c.10%). As I've got a fairly beefy GPU, a 2070, I'd have expected faster performance with the GPU taking more of the load. Is there any option to pass more of the work onto the GPU? It's recognized by tensorflow, and I've got CUDA and CuDNN installed.

Thanks in advance

KitEJohnson avatar Apr 01 '20 15:04 KitEJohnson

Hi, what parameters are you using? Some parameters, like batch_size, need to be increased for the GPU to be able to stretch its legs.
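As a rough illustration of why this matters (the corpus size here is hypothetical, not from textgenrnn): Keras runs one optimizer step per batch, so a small batch size means many more steps per epoch, each with its own Python-side generator overhead and a small chunk of GPU work.

```python
import math

def steps_per_epoch(num_sequences, batch_size):
    # One optimizer step per batch; fewer, larger batches mean
    # less per-step overhead and more work per GPU kernel launch.
    return math.ceil(num_sequences / batch_size)

corpus = 1_000_000  # hypothetical number of training sequences
for bs in (128, 1024, 2048):
    print(f"batch_size={bs}: {steps_per_epoch(corpus, bs)} steps/epoch")
```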

ZerxXxes avatar Apr 01 '20 17:04 ZerxXxes

Hi there.

I've upped the batch size and this seems to have resolved the problem. It's currently set to 2048 and far faster than before. Are there diminishing returns at some point for further increases?

KitEJohnson avatar Apr 01 '20 17:04 KitEJohnson

You can modify three lines of code to enable mixed precision in TensorFlow. This will let your RTX 2070 also use its Tensor Cores, which for me (on an RTX 2080 Ti) gave about a 2.2x speed increase.

You need to add these two lines near the top of the textgenrnn_model function in model.py:

def textgenrnn_model(num_classes, cfg, context_size=None,
                     weights_path=None,
                     dropout=0.0,
                     optimizer=Adam(lr=4e-3)):
    '''
    Builds the model architecture for textgenrnn and
    loads the specified weights for the model.
    '''

    # Insert these two lines; the rest of the function is unchanged
    policy = mixed_precision.Policy('mixed_float16')
    mixed_precision.set_policy(policy)

    input = Input(shape=(cfg['max_length'],), name='input')

And further down in the same file you need to set the dtype on the output layer:

    output = Dense(num_classes, name='output', dtype='float32', activation='softmax')(attention)
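Keeping the final softmax layer in float32 is the standard mixed-precision pattern: IEEE binary16 has a narrow range, so small probabilities can underflow to zero and large logits can overflow. For orientation, the format's limits worked out from its definition (plain arithmetic, not textgenrnn code):

```python
# IEEE 754 binary16: 1 sign bit, 5 exponent bits, 10 mantissa bits
MAX_FINITE = (2 - 2 ** -10) * 2 ** 15   # largest representable value
SMALLEST_NORMAL = 2.0 ** -14            # below this, precision degrades
SMALLEST_SUBNORMAL = 2.0 ** -24         # anything smaller rounds to 0.0

print(MAX_FINITE)          # 65504.0
print(SMALLEST_SUBNORMAL)  # ~5.96e-08
```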

Tensor Cores can only be used when your model's dimensions are multiples of 8, so you also need to change some values from their defaults when you train a new model. Both max_length and dim_embeddings need to be changed (from the defaults of 40 and 100) to multiples of 8, such as 32 and 128, when you create your model.
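If you'd rather derive the values than pick them by hand, rounding each default up to the next multiple of 8 also works. A small hypothetical helper (not part of textgenrnn):

```python
def round_up_to_multiple(value, base=8):
    # Round value up to the nearest multiple of base,
    # so the dimension is Tensor Core friendly.
    return ((value + base - 1) // base) * base

# textgenrnn defaults for max_length and dim_embeddings
print(round_up_to_multiple(40))   # 40 is already a multiple of 8
print(round_up_to_multiple(100))  # 104
```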

ZerxXxes avatar Apr 01 '20 18:04 ZerxXxes

Thanks! I've also had the following error after training:

tensorflow/core/kernels/data/generator_dataset_op.cc:103] Error occurred when finalizing GeneratorDataset iterator: Cancelled: Operation was cancelled

Would there be a fix for this?

KitEJohnson avatar Apr 01 '20 18:04 KitEJohnson

Yes, I believe this is a bug in the Keras Sequence multiprocessing implementation. It's fixed in tensorflow-nightly builds and will be fixed in TensorFlow 2.2: https://github.com/tensorflow/tensorflow/issues/35100

ZerxXxes avatar Apr 01 '20 18:04 ZerxXxes

Thank you for the quick response! I've edited model.py, but don't see any speed increase (still about 280ms/step). Would there be any particular reason for this?

I've attached my model.py in case I didn't correctly edit it. model.txt

KitEJohnson avatar Apr 01 '20 18:04 KitEJohnson

Hey, you pasted too much; only these two rows should be inserted into the code:

    policy = mixed_precision.Policy('mixed_float16')
    mixed_precision.set_policy(policy)

The other lines were just for orientation on where to paste them.

ZerxXxes avatar Apr 01 '20 19:04 ZerxXxes

Thanks again. Now when I run textgen = textgenrnn() I get this NameError:

NameError: name 'mixed_precision' is not defined

Don't suppose there's something obvious I'm missing?

KitEJohnson avatar Apr 01 '20 21:04 KitEJohnson

Ah, no, it's my fault. I forgot you also need to import mixed precision support. At the top of the file, with all the other imports, add:

    from tensorflow.keras.mixed_precision import experimental as mixed_precision

ZerxXxes avatar Apr 02 '20 05:04 ZerxXxes

It won't run on the GPU for me with batch size 8096.

test1230-lab avatar Jun 02 '20 00:06 test1230-lab

It's CPU-only; not everything runs on the GPU.

breadbrowser avatar Jul 06 '22 00:07 breadbrowser