
Chap 16, cell[49], Reusing Pretrained Embeddings and Language Models.

Open · Asjad22 opened this issue 2 years ago · 1 comment

Hello, I'm using an Nvidia GeForce 1050 Ti (4 GB). After running cell [49], I'm getting the error below. Could you please share the GPU requirements needed to run the book's code smoothly?

W tensorflow/core/framework/op_kernel.cc:1733] RESOURCE_EXHAUSTED: failed to allocate memory

Asjad22 · Aug 12 '22 14:08

Hi @Asjad22 ,

Thanks for your question, and sorry for the late response.

The module is about 1 GB in size (as indicated on the model's page), but that's just the size of the weights: you must also have enough RAM for the activations, which depend on the batch size (32 in the notebook). And since the code trains the model, the activations must be preserved across all layers for backprop.
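As a rough, back-of-the-envelope sketch (assuming model is the one built in cell [49]), you can estimate the memory needed just for the weights and the optimizer state; the actual footprint also includes activations, gradients, and TensorFlow's own buffers:

params = model.count_params()       # total parameters of the USE-based model
weights_gb = params * 4 / 1e9       # float32 weights: 4 bytes per parameter
optimizer_gb = 2 * weights_gb       # Adam-style optimizers such as Nadam keep ~2 extra slots per variable
print(f"{params:,} params: ~{weights_gb:.1f} GB weights + ~{optimizer_gb:.1f} GB optimizer state")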

I just ran the model in Colab, which allocated a T4 GPU with 16 GB of RAM, and the Runtime > Manage sessions page showed that 8.7 GB were used:

[Screenshot: Colab's Runtime > Manage sessions page showing about 8.7 GB of GPU RAM in use]

So 4 GB will definitely not be enough without changing the code. However, if you reduce the batch size, you may be able to get it to work. Since the datasets are batched with .batch() before being passed to model.fit(), reduce the batch size there: try 1 first, just to see whether it fits at all, and if it does, increase the value gradually until you find the limit.
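A minimal sketch of that change, assuming the raw_train_set and raw_valid_set datasets and the model from earlier in the notebook are still in scope:

BATCH_SIZE = 1  # start tiny just to confirm training fits in GPU memory, then increase gradually
train_set = raw_train_set.shuffle(5000, seed=42).batch(BATCH_SIZE).prefetch(1)
valid_set = raw_valid_set.batch(BATCH_SIZE).prefetch(1)
model.fit(train_set, validation_data=valid_set, epochs=10)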

Also, make sure you are not running any other GPU process in parallel.
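One quick way to check is to look at which processes are currently holding GPU memory. nvidia-smi is available on Colab GPU runtimes and most CUDA machines (in a notebook cell, !nvidia-smi does the same thing):

import subprocess

# Print the nvidia-smi report: per-GPU memory usage and the processes using each GPU.
print(subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout)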

Hope this helps.

ageron · Sep 26 '22 01:09

Thank you for these details.

Asjad22 · Nov 29 '22 11:11

Hello, I'm resurrecting this closed issue as I've encountered exactly the same problem, but on Colab (free tier, T4 GPU). I kept getting the message "Your session crashed after using all available RAM" partway through the second epoch. I tried reducing the batch size to 16, and it got to the 4th epoch (very slowly) before telling me I'd used up my free GPU runtime allotment.

The minimal code is as follows:

import os

import tensorflow as tf
import tensorflow_datasets as tfds
import tensorflow_hub as hub

# Load the IMDB reviews dataset: 90% of the original training split for training,
# the remaining 10% for validation, plus the full test split.
raw_train_set, raw_valid_set, raw_test_set = tfds.load(
    name="imdb_reviews",
    split=["train[:90%]", "train[90%:]", "test"],
    as_supervised=True
)
BATCH_SIZE = 32
tf.random.set_seed(42)
train_set = raw_train_set.shuffle(5000, seed=42).batch(BATCH_SIZE).prefetch(1)
valid_set = raw_valid_set.batch(BATCH_SIZE).prefetch(1)
test_set = raw_test_set.batch(BATCH_SIZE).prefetch(1)

# Cache the TF Hub module locally, then fine-tune the Universal Sentence Encoder
# (trainable=True) with a small classification head on top.
os.environ["TFHUB_CACHE_DIR"] = "my_tfhub_cache"
tf.random.set_seed(42)  # extra code – ensures reproducibility on CPU
model = tf.keras.Sequential([
    hub.KerasLayer("https://tfhub.dev/google/universal-sentence-encoder/4",
                   trainable=True, dtype=tf.string, input_shape=[]),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid")
])
model.compile(loss="binary_crossentropy", optimizer="nadam",
              metrics=["accuracy"])
model.fit(train_set, validation_data=valid_set, epochs=10)

This is as far as I got:

[Screenshot: Colab training output shortly before the session crashed]

Edit: I tried throwing it at an HPC node with a 32 GB GPU and set the batch size to 16, and I still ran out of memory! I'm guessing something changed in TensorFlow?

The relevant portion of the log is:

2024-03-15 09:42:30.715786: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1886] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 31141 MB memory:  -> device: 0, name: Tesla V100-SXM2-32GB, pci bus id: 0000:86:00.0, compute capability: 7.0
Epoch 1/10
2024-03-15 09:43:12.730064: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x2b56ce04c480 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2024-03-15 09:43:12.730287: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Tesla V100-SXM2-32GB, Compute Capability 7.0
2024-03-15 09:43:12.819729: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:269] disabling MLIR crash reproducer, set env var `MLIR_CRASH_REPRODUCER_DIRECTORY` to enable.
2024-03-15 09:43:12.879504: I tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:442] Loaded cuDNN version 8600
2024-03-15 09:43:13.260439: I ./tensorflow/compiler/jit/device_compiler.h:186] Compiled cluster using XLA!  This line is logged at most once for the lifetime of the process.
 287/1407 [=====>........................] - ETA: 56:36 - loss: 0.3923 - accuracy: 0.8277
2024-03-15 09:58:20.272597: W tensorflow/compiler/xla/service/gpu/runtime/graph_launch.cc:156] Evict all gpu graphs from executor 0x49e4450
2024-03-15 09:58:20.655647: W tensorflow/compiler/xla/service/gpu/runtime/support.cc:58] Intercepted XLA runtime error:
INTERNAL: There was an error before calling cuModuleGetFunction (2): cudaErrorMemoryAllocation : out of memory
Traceback (most recent call last):
  File "/project/6066126/w24_ml/imdb_transfer.py", line 40, in <module>
    model.fit(train_set, validation_data=valid_set, epochs=10, callbacks=callbacks)
  File "/project/6066126/w24_ml/venv/lib/python3.10/site-packages/keras/src/utils/traceback_utils.py", line 70, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/project/6066126/w24_ml/venv/lib/python3.10/site-packages/tensorflow/python/eager/execute.py", line 60, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.InternalError: Graph execution error:

This is with the following:

  • Python version 3.10.2
  • TensorFlow version 2.14.0
  • CUDA version 12.2

cfcurtis · Mar 14 '24 22:03