handson-ml3
Chap 16, cell[49], Reusing Pretrained Embeddings and Language Models.
Hello, I'm using an NVIDIA GeForce GTX 1050 Ti (4 GB). After running cell [49], I get the error below. Kindly share the GPU requirements needed to run the book's code smoothly.
W tensorflow/core/framework/op_kernel.cc:1733] RESOURCE_EXHAUSTED: failed to allocate memory
Hi @Asjad22 ,
Thanks for your question, and sorry for the late response.
The module is about 1 GB in size (as indicated on the model's page), but that's just the size of the weights: you must also have enough RAM for the activations, which depend on the batch size (defaults to 32). And since the code trains the model, the activations must be preserved throughout all layers for backprop.
I just ran the model in Colab, which allocated a T4 GPU with 16 GB of RAM, and the Runtime > Manage sessions page showed that 8.7 GB were used:

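(If you want to check GPU memory usage from inside the notebook rather than through the Colab UI, something like the snippet below should work; it assumes TensorFlow 2.5 or later, which provides tf.config.experimental.get_memory_info, and it only reports memory allocated by TensorFlow itself, not by the whole process.)

import tensorflow as tf

# Current and peak GPU memory allocated by TensorFlow, in GB.
info = tf.config.experimental.get_memory_info("GPU:0")
print(f"current: {info['current'] / 1e9:.2f} GB, peak: {info['peak'] / 1e9:.2f} GB")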
So 4 GB will definitely not be enough without changing the code. However, if you reduce the batch size, you may be able to get it to work. In model.fit(), try setting batch_size=1, just to see if it's possible. If so, then try increasing this value gradually until you find the limit.
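Since this notebook batches the data with tf.data, the equivalent change is made where .batch() is called rather than in fit() itself (fit()'s batch_size argument doesn't apply to an already-batched tf.data.Dataset). As a rough sketch, reusing the notebook's raw_train_set, raw_valid_set, raw_test_set and compiled model, the change could look like this:

# Rough sketch: shrink the batch size in the tf.data pipeline, then retrain.
# Assumes the notebook's raw_*_set datasets and compiled model already exist.
BATCH_SIZE = 1  # start tiny just to check it fits in memory, then increase
train_set = raw_train_set.shuffle(5000, seed=42).batch(BATCH_SIZE).prefetch(1)
valid_set = raw_valid_set.batch(BATCH_SIZE).prefetch(1)
test_set = raw_test_set.batch(BATCH_SIZE).prefetch(1)
model.fit(train_set, validation_data=valid_set, epochs=10)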
Also, make sure you are not running any other GPU process in parallel.
Hope this helps.
Thank you for these details.
Hello, I'm resurrecting this closed issue as I've encountered exactly the same problem, but on Colab (free tier, T4 GPU). I kept getting a message that my session crashed after using all available RAM partway through the second epoch. I tried reducing the batch size to 16, and it got to the 4th epoch very slowly before telling me I had used up my free GPU runtime allotment.
The minimal code is as follows:
import tensorflow as tf
import tensorflow_datasets as tfds

raw_train_set, raw_valid_set, raw_test_set = tfds.load(
    name="imdb_reviews",
    split=["train[:90%]", "train[90%:]", "test"],
    as_supervised=True
)
BATCH_SIZE = 32
tf.random.set_seed(42)
train_set = raw_train_set.shuffle(5000, seed=42).batch(BATCH_SIZE).prefetch(1)
valid_set = raw_valid_set.batch(BATCH_SIZE).prefetch(1)
test_set = raw_test_set.batch(BATCH_SIZE).prefetch(1)

import os
import tensorflow_hub as hub

os.environ["TFHUB_CACHE_DIR"] = "my_tfhub_cache"

tf.random.set_seed(42)  # extra code – ensures reproducibility on CPU
model = tf.keras.Sequential([
    hub.KerasLayer("https://tfhub.dev/google/universal-sentence-encoder/4",
                   trainable=True, dtype=tf.string, input_shape=[]),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid")
])
model.compile(loss="binary_crossentropy", optimizer="nadam",
              metrics=["accuracy"])
model.fit(train_set, validation_data=valid_set, epochs=10)
This is as far as I got:
Edit: I tried throwing it at an HPC node with a 32 GB GPU and set the batch size to 16, and I still ran out of memory! I'm guessing something changed in TensorFlow?
The relevant portion of the log is:
2024-03-15 09:42:30.715786: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1886] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 31141 MB memory: -> device: 0, name: Tesla V100-SXM2-32GB, pci bus id: 0000:86:00.0, compute capability: 7.0
Epoch 1/10
2024-03-15 09:43:12.730064: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x2b56ce04c480 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2024-03-15 09:43:12.730287: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Tesla V100-SXM2-32GB, Compute Capability 7.0
2024-03-15 09:43:12.819729: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:269] disabling MLIR crash reproducer, set env var `MLIR_CRASH_REPRODUCER_DIRECTORY` to enable.
2024-03-15 09:43:12.879504: I tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:442] Loaded cuDNN version 8600
2024-03-15 09:43:13.260439: I ./tensorflow/compiler/jit/device_compiler.h:186] Compiled cluster using XLA! This line is logged at most once for the lifetime of the process.
287/1407 [=====>........................] - ETA: 56:36 - loss: 0.3923 - accuracy: 0.82772024-03-15 09:58:20.272597: W tensorflow/compiler/xla/service/gpu/runtime/graph_launch.cc:156] Evict all gpu graphs from executor 0x49e4450
2024-03-15 09:58:20.655647: W tensorflow/compiler/xla/service/gpu/runtime/support.cc:58] Intercepted XLA runtime error:
INTERNAL: There was an error before calling cuModuleGetFunction (2): cudaErrorMemoryAllocation : out of memory
Traceback (most recent call last):
File "/project/6066126/w24_ml/imdb_transfer.py", line 40, in <module>
model.fit(train_set, validation_data=valid_set, epochs=10, callbacks=callbacks)
File "/project/6066126/w24_ml/venv/lib/python3.10/site-packages/keras/src/utils/traceback_utils.py", line 70, in error_handler
raise e.with_traceback(filtered_tb) from None
File "/project/6066126/w24_ml/venv/lib/python3.10/site-packages/tensorflow/python/eager/execute.py", line 60, in quick_execute
tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.InternalError: Graph execution error:
This is with the following:
- Python version 3.10.2
- TensorFlow version 2.14.0
- CUDA version 12.2