
Significant difference in RSS memory usage between TF1 and TF2

Open bergentruckung opened this issue 1 year ago • 9 comments

Issue type

Bug

Have you reproduced the bug with TensorFlow Nightly?

Yes

Source

source

TensorFlow version

TF 2.13.1

Custom code

Yes

OS platform and distribution

Redhat Enterprise Linux 8.9

Mobile device

No response

Python version

3.11.4

Bazel version

5.4.0

GCC/compiler version

10.4

CUDA/cuDNN version

CUDA 12.2, cuDNN 8.9.5

GPU model and memory

A100 80GB

Current behavior?

When running the same Keras workload on TF1 vs. TF2, I'm seeing a significant increase in RSS memory usage. This happens on both CPU and GPU: under TF2 the usage keeps climbing after every epoch, whereas under TF1 it stays essentially flat. See the TF2 and TF1 logs below:

# For TF2

Memory usage after epoch 0 [mem_usage =  3.41 GB] 
Memory usage after epoch 1 [mem_usage =  3.88 GB] 
Memory usage after epoch 2 [mem_usage =  3.88 GB] 
Memory usage after epoch 3 [mem_usage =  4.32 GB] 
Memory usage after epoch 4 [mem_usage =  4.81 GB] 
Memory usage after epoch 5 [mem_usage =  5.26 GB] 
Memory usage after epoch 6 [mem_usage =  5.70 GB] 
Memory usage after epoch 7 [mem_usage =  6.14 GB] 
Memory usage after epoch 8 [mem_usage =  6.70 GB] 
Memory usage after epoch 9 [mem_usage =  7.15 GB] 
Memory usage after epoch 10 [mem_usage =  7.36 GB]
Memory usage after epoch 11 [mem_usage =  7.36 GB]
Memory usage after epoch 12 [mem_usage =  7.36 GB]
Memory usage after epoch 13 [mem_usage =  7.37 GB]
Memory usage after epoch 14 [mem_usage =  7.37 GB]
Memory usage after epoch 15 [mem_usage =  7.37 GB]
Memory usage after epoch 16 [mem_usage =  7.37 GB]
Memory usage after epoch 17 [mem_usage =  7.59 GB]
Memory usage after epoch 18 [mem_usage =  7.81 GB]
Memory usage after epoch 19 [mem_usage =  7.81 GB]

# For TF1

Memory usage after epoch 0 [mem_usage =  5.13 GB] 
Memory usage after epoch 1 [mem_usage =  5.14 GB] 
Memory usage after epoch 2 [mem_usage =  5.14 GB] 
Memory usage after epoch 3 [mem_usage =  5.15 GB] 
Memory usage after epoch 4 [mem_usage =  5.15 GB] 
Memory usage after epoch 5 [mem_usage =  5.15 GB] 
Memory usage after epoch 6 [mem_usage =  5.15 GB] 
Memory usage after epoch 7 [mem_usage =  5.15 GB] 
Memory usage after epoch 8 [mem_usage =  5.15 GB] 
Memory usage after epoch 9 [mem_usage =  5.15 GB] 
Memory usage after epoch 10 [mem_usage =  5.15 GB]
Memory usage after epoch 11 [mem_usage =  5.15 GB]
Memory usage after epoch 12 [mem_usage =  5.15 GB]
Memory usage after epoch 13 [mem_usage =  5.15 GB]
Memory usage after epoch 14 [mem_usage =  5.15 GB]
Memory usage after epoch 15 [mem_usage =  5.15 GB]
Memory usage after epoch 16 [mem_usage =  5.15 GB]
Memory usage after epoch 17 [mem_usage =  5.15 GB]
Memory usage after epoch 18 [mem_usage =  5.15 GB]
Memory usage after epoch 19 [mem_usage =  5.15 GB]

Standalone code to reproduce the issue

import tensorflow as tf
import psutil
import time
import os

def mem_usage_str():
    process = psutil.Process(os.getpid())
    gb =  process.memory_info().rss / (1024.**3)
    return ' [mem_usage = {:5.2f} GB]'.format(gb)

if int(tf.__version__.split('.')[0]) < 2:
    # Patch to fix a TF1 / numpy 1.20 compatibility issue.
    from tensorflow.math import reduce_prod
    from tensorflow.python.ops import array_ops

    def _constant_if_small(value, shape, dtype, name):
        try:
            if reduce_prod(shape) < 1000:  # monkey patch
                return array_ops.constant(value, shape=shape, dtype=dtype,
                                          name=name)
        except TypeError:
            # Happens when shape is a Tensor, a list with Tensor elements, etc.
            pass
        return None

    array_ops._constant_if_small = _constant_if_small
    # End of patch

def build_model():
    inputs = [tf.keras.layers.Input(shape=(300, 6), name='input_layer')]
    current_layer = inputs[0]

    current_layer = tf.keras.layers.LSTM(
        50,
        dropout=0.1,
        recurrent_dropout=0.1,
        return_sequences=False,
        name='lstm',
    )(current_layer)

    current_layer = tf.keras.layers.Dense(1)(current_layer)
    optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)

    model = tf.keras.models.Model(inputs=inputs, outputs=current_layer)
    model.compile(loss='mse', optimizer=optimizer)

    return model

def run(model, X, y, n_epochs):
    tot_time = 0.

    print('Memory usage before training' + mem_usage_str())
    for i in range(n_epochs):
        start = time.time()
        model.fit(X, y, epochs=1, batch_size=4096, verbose=0)
        tot_time += time.time() - start
        print(f'Memory usage after epoch {i}' + mem_usage_str())

    print(f'Avg. time = {tot_time / n_epochs} seconds')

def run_example(p, n_epochs):
    import numpy as np

    model = build_model()
    X = np.random.randn(2 ** p, 300, 6)
    y = np.random.randn(2 ** p)

    run(model, X, y, n_epochs)

def main():
    run_example(
        16, # 2 ** 16 samples
        20, # 20 epochs
    )


# ------------------------------------------------------------------------------

if __name__ == "__main__":
    main()
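
For reference, below is a rough sketch of a variant of the same measurement: a single model.fit(epochs=20) call with a Keras callback that logs RSS after each epoch and forces a gc.collect(), to rule out the per-call fit() loop and un-collected Python objects as the source of the growth. It assumes build_model() and mem_usage_str() from the script above are in scope.

import gc
import numpy as np
import tensorflow as tf

class MemoryLogger(tf.keras.callbacks.Callback):
    """Logs process RSS after every epoch, forcing a GC pass first."""
    def on_epoch_end(self, epoch, logs=None):
        gc.collect()  # rule out ordinary Python garbage
        print(f'Memory usage after epoch {epoch}' + mem_usage_str())

model = build_model()
X = np.random.randn(2 ** 16, 300, 6)
y = np.random.randn(2 ** 16)
model.fit(X, y, epochs=20, batch_size=4096, verbose=0,
          callbacks=[MemoryLogger()])

If RSS still grows with this variant, tf.keras.backend.clear_session() is another commonly suggested thing to try when models are rebuilt between runs, though it resets Keras global state and is not a drop-in fix inside a training loop.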

Relevant log output

No response

bergentruckung · Jan 24 '24 10:01

Is it possible to have it both as a tagged template and as a "namespace" with runEventLoop and the other classes and interfaces?

I don't believe that is possible. It would have to be

import { slint }, * as the_rest from "slint-ui";
let instance = slint`...`;

I thought about this, but I'm no longer sure it's really worth the "convenience", for two reasons:

  • We can't make it type safe.
  • It's not an idiomatic use of template literals. We wouldn't really make use of templating after all - unless we add something like "inline" javascript handlers. But that brings in additional complications for the tooling.

If we want to have a way of creating a component instance from just a string, why don't we use a regular function?

let instance = slint.createInstanceFromString(`export component App { ... }`);

tronical · Jan 30 '24 12:01