
TensorFlow model cannot be parsed within the memory limit

Open tillwf opened this issue 3 years ago • 9 comments

Hello,

I'm trying to upload a model generated with TFRanking (32 MB) to BigQuery. I saved it like this:

signatures = {
    'serving_default':
        make_keras_tft_serving_fn(
            ranker,
            tf_transform_output,
            context_cols,
            example_cols
        ).get_concrete_function(
            tf.TensorSpec(
                shape=[None],
                dtype=tf.string,
                name='examples'
            )
        ),
}
ranker.load_weights(checkpoint)
ranker.save(model_dir, save_format='tf', signatures=signatures)

but I got this error:

Error while reading data, error message: TensorFlow model cannot be parsed within the memory limit; try reducing the model size

(screenshot: error_bq_tfranking)

Previously I managed to upload a bigger model (>200 MB) created with regular TF 1.13 code, so I don't understand the message.

Has anyone encountered this before?

Thanks

On Ubuntu 18.04, Python 3.7.3

tensorflow==2.4.1
tensorflow-addons==0.12.1
tensorflow-datasets==4.2.0
tensorflow-estimator==2.4.0
tensorflow-hub==0.11.0
tensorflow-metadata==0.29.0
tensorflow-model-optimization==0.5.0
tensorflow-ranking==0.3.3
tensorflow-serving-api==2.4.1
tensorflow-transform==0.29.0

tillwf avatar Apr 26 '21 16:04 tillwf

I tried with the latest version of tensorflow-ranking (0.4.0) and it is still not working. Could someone help me? Thank you.

tillwf avatar Jun 02 '21 09:06 tillwf

Here is an ncdu listing of the model folder:

  335.0 MiB [##########] /train
   31.5 MiB [          ]  saved_model.pb
    3.1 MiB [          ] /validation
    2.4 MiB [          ] /variables
  792.0 KiB [          ]  keras_metadata.pb
   84.0 KiB [          ] /assets

I tried without the train folder, but it didn't change the message.

Any clue?

tillwf avatar Jun 06 '21 18:06 tillwf

I tracked the memory consumption of a script doing:

import tensorflow as tf
model = tf.saved_model.load("model_path")

(the model path does not contain the train folder)

and we see this:

(plot: memory usage while loading the model)

Is there a way to reduce this memory usage? The model weighs only 30 MiB on disk but grows to 2 GiB in memory.
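For anyone who wants to reproduce this kind of measurement without an external profiler, peak RSS can be read from the standard library (Unix only). `peak_rss_mib` below is a hypothetical helper, not part of any TensorFlow API, and the allocation is a stand-in for the `tf.saved_model.load` call:

```python
import resource
import sys

def peak_rss_mib():
    """Peak resident set size of this process, in MiB.

    Note: ru_maxrss is reported in KiB on Linux but in bytes on macOS.
    """
    rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    if sys.platform == "darwin":
        rss //= 1024  # bytes -> KiB
    return rss / 1024  # KiB -> MiB

before = peak_rss_mib()
# In the real script this is where tf.saved_model.load("model_path")
# would go; allocate ~100 MiB here so the sketch is self-contained.
blob = bytearray(100 * 1024 * 1024)
after = peak_rss_mib()
print(f"peak RSS grew by about {after - before:.0f} MiB")
```

Sampling this value at intervals from a watcher thread would reproduce the plot above without needing an external monitoring tool.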

tillwf avatar Jun 26 '21 20:06 tillwf

I tried to reduce the size of the model by doing:

converter = tf.lite.TFLiteConverter.from_keras_model(ranker)
converter.optimizations = [tf.lite.Optimize.OPTIMIZE_FOR_SIZE]
converter.convert().save(model_dir, save_format='tf', signatures=signatures)

but I got this error:

2021-06-28 14:55:07.469989: W tensorflow/python/util/util.cc:348] Sets are not currently considered sequences, but this may change in the future, so consider avoiding using them.
W0628 14:55:07.577553 140086528587584 signature_serialization.py:151] Function `_wrapped_model` contains input name(s) args_0 with unsupported characters which will be renamed to args_0_275 in the SavedModel.
W0628 14:55:34.146763 140086528587584 save.py:243] Found untraced functions such as listwise_dense_features_layer_call_and_return_conditional_losses, listwise_dense_features_layer_call_fn, dense_3_layer_call_and_return_conditional_losses, dense_3_layer_call_fn, listwise_dense_features_layer_call_and_return_conditional_losses while saving (showing 5 of 65). These functions will not be directly callable after loading.
INFO:tensorflow:Assets written to: /tmp/tmpte46l_mb/assets
I0628 14:55:41.276194 140086528587584 builder_impl.py:775] Assets written to: /tmp/tmpte46l_mb/assets
2021-06-28 14:55:48.804822: I tensorflow/core/grappler/devices.cc:69] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 0
2021-06-28 14:55:48.804951: I tensorflow/core/grappler/clusters/single_machine.cc:357] Starting new session
2021-06-28 14:55:48.949690: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:1144] Optimization results for grappler item: graph_to_optimize
  function_optimizer: function_optimizer did nothing. time = 0.042ms.
  function_optimizer: function_optimizer did nothing. time = 0ms.
*** tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot convert a Tensor of dtype resource to a NumPy array.

Could it be related to this answer https://github.com/tensorflow/tensorflow/issues/37441#issuecomment-775747315 ?

Does anyone have any idea how I can reduce the model's memory size?
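For what it's worth, `TFLiteConverter.convert()` returns the model as a flatbuffer byte string, so the result has no `save` method; the usual pattern (sketched here on a toy stand-in model, not the actual ranker) writes the bytes to a `.tflite` file:

```python
import tensorflow as tf

# Toy stand-in model; the real ranker would go here.
model = tf.keras.Sequential([tf.keras.layers.Dense(2, input_shape=(4,))])

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # OPTIMIZE_FOR_SIZE is a deprecated alias of DEFAULT
tflite_bytes = converter.convert()  # a flatbuffer as bytes, not a Keras model

with open("model.tflite", "wb") as f:
    f.write(tflite_bytes)
```

Note that a `.tflite` flatbuffer is a different format from a SavedModel, so even when conversion succeeds it cannot simply be re-saved with `ranker.save(...)`.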

tillwf avatar Jul 05 '21 12:07 tillwf

We suffer from the same issue; it seems to come from the structure of the model (number of ops). For now we tried reducing the number of layers, and that seems to have reduced the issue. I'm not sure TF Lite would work: you won't be able to export it back to the SavedModel format.

Did you find anything else on your side?

tanguycdls avatar Aug 26 '21 14:08 tanguycdls

Hello @tanguycdls. We have not found any solution yet, and it is critical for us: we won't be able to use TFRanking without this. We only have 3 layers at the moment, which does not seem like a big number. We will try with one layer just to see, but that is not a viable solution either.

tillwf avatar Aug 27 '21 09:08 tillwf

We think we found a workaround, but we're still not sure it's viable: we tried converting our Keras models to the old (TF1-style) frozen-graph format, which we then re-attach to a SavedModel. It seems to reduce the RAM.

Take a look at this: https://leimao.github.io/blog/Save-Load-Inference-From-TF2-Frozen-Graph/

and then, when you have your concrete function, re-attach it to a tf.Module:

# requires: from tensorflow.python.framework.convert_to_constants import convert_variables_to_constants_v2
full_model = tf.function(lambda x: model(x))
full_model = full_model.get_concrete_function(
    tf.TensorSpec(model.inputs[0].shape, model.inputs[0].dtype))  # you should fix that to the correct input shapes

# Get frozen ConcreteFunction: variables become graph constants
frozen_func = convert_variables_to_constants_v2(full_model)
frozen_func.graph.as_graph_def()

module = tf.Module()
module.func = frozen_func
tf.saved_model.save(...)  # must specify signature
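Put together, a self-contained version of this recipe might look like the sketch below. The toy model, export path, and signature key are illustrative stand-ins, not the thread authors' actual setup:

```python
import tensorflow as tf
from tensorflow.python.framework.convert_to_constants import (
    convert_variables_to_constants_v2,
)

# Toy model standing in for the ranker.
model = tf.keras.Sequential([tf.keras.layers.Dense(4, input_shape=(8,))])

# Wrap the model in a tf.function and trace it with a fixed input spec.
full_model = tf.function(lambda x: model(x))
concrete = full_model.get_concrete_function(
    tf.TensorSpec(model.inputs[0].shape, model.inputs[0].dtype))

# Freeze: variables are baked into the graph as constants, so loading
# the export no longer has to restore a variable checkpoint.
frozen_func = convert_variables_to_constants_v2(concrete)

# Re-attach the frozen ConcreteFunction to a tf.Module and save it.
module = tf.Module()
module.func = frozen_func
export_dir = "/tmp/frozen_export"
tf.saved_model.save(module, export_dir,
                    signatures={"serving_default": frozen_func})
```

Reloading with `tf.saved_model.load(export_dir)` then exposes the frozen graph via `reloaded.signatures["serving_default"]`.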

Again, we're still working on the topic, so we don't have any long-term view of the solution: there might be an issue somewhere...

If you find an issue or have a better idea, please tell us!

and some links: https://github.com/search?q=convert_variables_to_constants_v2&type=code

tanguycdls avatar Aug 27 '21 09:08 tanguycdls

Hello @tanguycdls. Thank you again for your help. Did you find a proper solution? Yours does not work for us, as it raises another exception.

tillwf avatar Feb 22 '22 09:02 tillwf

> Hello @tanguycdls Thank you again for your help. Did you find any proper solution ? Yours does not work for us as it raise another exception.

We still use the solution above; this, plus some Grappler optimizations, fixed the issue for most domains. Are you using a model that cannot be frozen?

tanguycdls avatar Feb 24 '22 16:02 tanguycdls