
Unable to use "mixed_float16" in the Object Detection API

Open tq3940 opened this issue 8 months ago • 3 comments

Prerequisites

Please answer the following questions for yourself before submitting an issue.

  • [√] I am using the latest TensorFlow Model Garden release and TensorFlow 2.
  • [√] I am reporting the issue to the correct repository. (Model Garden official or research directory)
  • [√] I checked to make sure that this issue has not already been filed.

1. The entire URL of the file you are using

https://github.com/tensorflow/models/tree/master/research/object_detection

2. Describe the bug

I'm trying to use "mixed_float16" to speed up training on an RTX 4090. Following the official mixed_precision guide, I added mixed_precision.set_global_policy('mixed_float16') just before tf.compat.v1.app.run() in my train_tf2.py. However, TensorFlow reported the following error:

        return _compute_losses_and_predictions_dicts(model, features, labels,
    File "/root/miniconda3/lib/python3.8/site-packages/object_detection/model_lib_v2.py", line 130, in _compute_losses_and_predictions_dicts  *
        losses_dict = model.loss(
    File "/root/miniconda3/lib/python3.8/site-packages/object_detection/meta_architectures/center_net_meta_arch.py", line 3967, in loss  *
        object_center_loss = self._compute_object_center_loss(
    File "/root/miniconda3/lib/python3.8/site-packages/object_detection/meta_architectures/center_net_meta_arch.py", line 3099, in _compute_object_center_loss  *
        loss += object_center_loss(
    File "/root/miniconda3/lib/python3.8/site-packages/object_detection/core/losses.py", line 94, in __call__  *
        return self._compute_loss(prediction_tensor, target_tensor, **params)
    File "/root/miniconda3/lib/python3.8/site-packages/object_detection/core/losses.py", line 855, in _compute_loss  *
        negative_loss = (tf.math.pow((1 - target_tensor), self._beta)*

    TypeError: Input 'y' of 'Mul' Op has type float16 that does not match type float32 of argument 'x'.
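
The mismatch is easy to reproduce outside the Object Detection API: under the 'mixed_float16' policy, Keras layers compute in float16 while label/target tensors stay float32, so the multiply inside the loss mixes dtypes. A minimal standalone sketch (illustrative names, not Model Garden code):

    import tensorflow as tf
    from tensorflow.keras import mixed_precision

    mixed_precision.set_global_policy('mixed_float16')

    @tf.function
    def toy_loss(prediction_tensor, target_tensor):
        # Mirrors the failing line in losses.py: a float32 factor
        # multiplied by a float16 factor fails at graph-build time.
        beta = 4.0
        return tf.math.pow(1.0 - target_tensor, beta) * prediction_tensor

    dense = tf.keras.layers.Dense(1)         # computes in float16 under the policy
    preds = dense(tf.ones([2, 3]))           # dtype: float16
    targets = tf.zeros([2, 1], tf.float32)   # targets stay float32

    toy_loss(preds, targets)
    # TypeError: Input 'y' of 'Mul' Op has type float16 that does not match
    # type float32 of argument 'x'.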

I also tried tf.compat.v2.keras.mixed_precision.set_global_policy('mixed_float16'), adapted from the tf.compat.v2.keras.mixed_precision.set_global_policy('mixed_bfloat16') call found in model_lib_v2.py, and setting the environment variable os.environ['TF_ENABLE_AUTO_MIXED_PRECISION'] = '1', as suggested in this answer (a sketch of both variants is shown below).

All of these attempts failed with the same error shown above. How can I get mixed_float16 training to work?
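
For reference, a sketch of those two variants as they were placed in train_tf2.py (both before tf.compat.v1.app.run()):

    import os
    import tensorflow as tf

    # Variant 2: the compat-style call, adapted from the 'mixed_bfloat16'
    # line in object_detection/model_lib_v2.py.
    tf.compat.v2.keras.mixed_precision.set_global_policy('mixed_float16')

    # Variant 3: the legacy graph-rewrite auto mixed precision switch.
    os.environ['TF_ENABLE_AUTO_MIXED_PRECISION'] = '1'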

3. Steps to reproduce

Add mixed_precision.set_global_policy('mixed_float16') just before tf.compat.v1.app.run() in train_tf2.py, as in the sketch below.
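
A minimal sketch of that change, assuming train_tf2.py follows the structure of the standard object_detection/model_main_tf2.py entry point (flag parsing and the train_loop call are elided):

    # train_tf2.py (sketch, modelled on object_detection/model_main_tf2.py)
    import tensorflow.compat.v2 as tf
    from tensorflow.keras import mixed_precision

    def main(unused_argv):
        # ... parse flags and call model_lib_v2.train_loop(...) as usual ...
        pass

    if __name__ == '__main__':
        # Set the global dtype policy before the training entry point runs.
        mixed_precision.set_global_policy('mixed_float16')
        tf.compat.v1.app.run()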

4. Expected behavior

The model can be trained with the "mixed_float16" (mixed precision) policy.

5. Additional context

None

6. System information

  • OS Platform and Distribution : Linux Ubuntu 22.04
  • TensorFlow installed from (source or binary): source
  • TensorFlow version (use command below): 2.13.1
  • Python version: 3.8.10
  • CUDA/cuDNN version: CUDA 12.2 / cuDNN 8.6.0.163
  • GPU model and memory: NVIDIA GeForce RTX 4090, 24 GB

tq3940 · Jun 02 '24 07:06