Significant difference in RSS memory usage between TF1 and TF2
Issue type
Bug
Have you reproduced the bug with TensorFlow Nightly?
Yes
Source
source
TensorFlow version
TF 2.13.1
Custom code
Yes
OS platform and distribution
Redhat Enterprise Linux 8.9
Mobile device
No response
Python version
3.11.4
Bazel version
5.4.0
GCC/compiler version
10.4
CUDA/cuDNN version
CUDA 12.2, cuDNN 8.9.5
GPU model and memory
A100 80GB
Current behavior?
When running the same Keras workload on TF1 and TF2, I'm seeing a significant increase in memory usage under TF2. This happens on both CPU and GPU. Under TF2, RSS climbs after almost every epoch, while under TF1 it stays essentially flat. See the per-epoch measurements below for TF2 vs. TF1:
# For TF2
Memory usage after epoch 0 [mem_usage = 3.41 GB]
Memory usage after epoch 1 [mem_usage = 3.88 GB]
Memory usage after epoch 2 [mem_usage = 3.88 GB]
Memory usage after epoch 3 [mem_usage = 4.32 GB]
Memory usage after epoch 4 [mem_usage = 4.81 GB]
Memory usage after epoch 5 [mem_usage = 5.26 GB]
Memory usage after epoch 6 [mem_usage = 5.70 GB]
Memory usage after epoch 7 [mem_usage = 6.14 GB]
Memory usage after epoch 8 [mem_usage = 6.70 GB]
Memory usage after epoch 9 [mem_usage = 7.15 GB]
Memory usage after epoch 10 [mem_usage = 7.36 GB]
Memory usage after epoch 11 [mem_usage = 7.36 GB]
Memory usage after epoch 12 [mem_usage = 7.36 GB]
Memory usage after epoch 13 [mem_usage = 7.37 GB]
Memory usage after epoch 14 [mem_usage = 7.37 GB]
Memory usage after epoch 15 [mem_usage = 7.37 GB]
Memory usage after epoch 16 [mem_usage = 7.37 GB]
Memory usage after epoch 17 [mem_usage = 7.59 GB]
Memory usage after epoch 18 [mem_usage = 7.81 GB]
Memory usage after epoch 19 [mem_usage = 7.81 GB]
# For TF1
Memory usage after epoch 0 [mem_usage = 5.13 GB]
Memory usage after epoch 1 [mem_usage = 5.14 GB]
Memory usage after epoch 2 [mem_usage = 5.14 GB]
Memory usage after epoch 3 [mem_usage = 5.15 GB]
Memory usage after epoch 4 [mem_usage = 5.15 GB]
Memory usage after epoch 5 [mem_usage = 5.15 GB]
Memory usage after epoch 6 [mem_usage = 5.15 GB]
Memory usage after epoch 7 [mem_usage = 5.15 GB]
Memory usage after epoch 8 [mem_usage = 5.15 GB]
Memory usage after epoch 9 [mem_usage = 5.15 GB]
Memory usage after epoch 10 [mem_usage = 5.15 GB]
Memory usage after epoch 11 [mem_usage = 5.15 GB]
Memory usage after epoch 12 [mem_usage = 5.15 GB]
Memory usage after epoch 13 [mem_usage = 5.15 GB]
Memory usage after epoch 14 [mem_usage = 5.15 GB]
Memory usage after epoch 15 [mem_usage = 5.15 GB]
Memory usage after epoch 16 [mem_usage = 5.15 GB]
Memory usage after epoch 17 [mem_usage = 5.15 GB]
Memory usage after epoch 18 [mem_usage = 5.15 GB]
Memory usage after epoch 19 [mem_usage = 5.15 GB]
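These figures are process RSS as reported by psutil (see the repro script below). To tell whether the per-epoch growth sits on the Python heap or in native allocations, one option is to compare tracemalloc's numbers against RSS after a forced garbage collection. The following is a diagnostic sketch added here for reference, not part of the original measurements:

import gc
import os
import tracemalloc

import psutil

tracemalloc.start()

# ... run one training epoch here, e.g. model.fit(X, y, epochs=1, ...) ...

gc.collect()  # discard unreachable Python objects before measuring
py_current, py_peak = tracemalloc.get_traced_memory()
rss = psutil.Process(os.getpid()).memory_info().rss
print('python heap = {:.1f} MiB, rss = {:.2f} GiB'.format(
    py_current / 1024 ** 2, rss / 1024 ** 3))
# If RSS keeps climbing while the Python-heap figure stays flat, the
# growth is happening in native (C/C++) allocations, not Python objects.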
Standalone code to reproduce the issue
import tensorflow as tf
import psutil
import time
import os


def mem_usage_str():
    # Report this process's resident set size (RSS) in GB.
    process = psutil.Process(os.getpid())
    gb = process.memory_info().rss / (1024. ** 3)
    return ' [mem_usage = {:5.2f} GB]'.format(gb)


if int(tf.__version__.split('.')[0]) < 2:
    # Patch to fix a TF1 / numpy >= 1.20 compatibility issue.
    from tensorflow.math import reduce_prod
    from tensorflow.python.ops import array_ops

    def _constant_if_small(value, shape, dtype, name):
        try:
            if reduce_prod(shape) < 1000:  # monkey patch
                return array_ops.constant(value, shape=shape, dtype=dtype,
                                          name=name)
        except TypeError:
            # Happens when shape is a Tensor, list with Tensor elements, etc.
            pass
        return None

    array_ops._constant_if_small = _constant_if_small
    # End of patch.
def build_model():
    inputs = [tf.keras.layers.Input(shape=(300, 6), name='input_layer')]
    current_layer = inputs[0]
    current_layer = tf.keras.layers.LSTM(
        50,
        dropout=0.1,
        recurrent_dropout=0.1,
        return_sequences=False,
        name='lstm',
    )(current_layer)
    current_layer = tf.keras.layers.Dense(1)(current_layer)
    optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)
    model = tf.keras.models.Model(inputs=inputs, outputs=current_layer)
    model.compile(loss='mse', optimizer=optimizer)
    return model
def run(model, X, y, n_epochs):
    tot_time = 0.
    print('Memory usage before training' + mem_usage_str())
    for i in range(n_epochs):
        start = time.time()
        model.fit(X, y, epochs=1, batch_size=4096, verbose=0)
        tot_time += time.time() - start
        print(f'Memory usage after epoch {i}' + mem_usage_str())
    print(f'Avg. time = {tot_time / n_epochs} seconds')


def run_example(p, n_epochs):
    import numpy as np
    model = build_model()
    X = np.random.randn(2 ** p, 300, 6)
    y = np.random.randn(2 ** p)
    run(model, X, y, n_epochs)
def main():
    run_example(
        16,  # 2 ** 16 samples
        20,  # 20 epochs
    )


# ------------------------------------------------------------------------------
if __name__ == "__main__":
    main()
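For comparison, the same training can be driven by a single fit() call with a callback doing the per-epoch logging, which separates growth that happens per epoch inside fit() from overhead of repeated fit() invocations. This is a variant sketch; MemLogger is a name introduced here and is not part of the original script:

class MemLogger(tf.keras.callbacks.Callback):
    """Print RSS after every epoch, mirroring the loop in run() above."""

    def on_epoch_end(self, epoch, logs=None):
        print(f'Memory usage after epoch {epoch}' + mem_usage_str())


def run_single_fit(model, X, y, n_epochs):
    # One fit() call covering all epochs, instead of n_epochs separate calls.
    model.fit(X, y, epochs=n_epochs, batch_size=4096, verbose=0,
              callbacks=[MemLogger()])

If the RSS growth still appears with this variant, it accumulates per epoch rather than per fit() invocation.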
Relevant log output
No response