Kernel initialization methods "Orthogonal" and "Identity" do not support the "half" datatype when using Convolution layers

Open maybeLee opened this issue 3 years ago • 2 comments

System information.

  • Have I written custom code (as opposed to using a stock example script provided in Keras): Yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu
  • TensorFlow installed from (source or binary): binary
  • TensorFlow version (use command below): 2.8.2, 2.9.0
  • Python version: 3.7
  • Bazel version (if compiling from source):
  • GPU model and memory:
  • Exact command to reproduce:

Describe the problem.

When I use convolution layers with the "orthogonal" or "identity" kernel initializer, the layers cannot be built if I set the datatype to "half". Other kernel initializers do support the "half" datatype. It would therefore be very helpful if you could fix this issue so that convolutional layers can use orthogonal kernel initialization and the "half" datatype at the same time.

InvalidArgumentError: Value for attr 'T' of bfloat16 is not in the list of allowed values: double, float, half, complex64, complex128
	; NodeDef: {{node Qr}}; Op<name=Qr; signature=input:T -> q:T, r:T; attr=full_matrices:bool,default=false; attr=T:type,allowed=[DT_DOUBLE, DT_FLOAT, DT_HALF, DT_COMPLEX64, DT_COMPLEX128]> [Op:Qr]

Describe the current behavior. For six convolutional layers (Conv[1-3]D and Conv[1-3]DTranspose), I iterated over all available kernel_initializer options: ["Zeros", "Ones", "Constant", "RandomNormal", "RandomUniform", "TruncatedNormal", "VarianceScaling", "Orthogonal", "lecun_uniform", "glorot_normal", "glorot_uniform", "he_normal", "lecun_normal", "Identity"] with the "half" datatype. Two of them, Orthogonal and Identity, fail with the "half" datatype, although I would expect them to support it.
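A sweep like the one described above could be sketched as follows. This is only an illustration, not the code from the linked Colab: the `sweep` helper, the input shapes, and the choice to record the exception class name are all assumptions, and which combinations fail will depend on the installed TensorFlow/Keras version.

```python
import tensorflow as tf

# Initializer names taken from the list in this report.
INITIALIZERS = [
    "Zeros", "Ones", "Constant", "RandomNormal", "RandomUniform",
    "TruncatedNormal", "VarianceScaling", "Orthogonal", "lecun_uniform",
    "glorot_normal", "glorot_uniform", "he_normal", "lecun_normal", "Identity",
]

# Layer class and number of spatial dimensions for each convolution layer.
LAYERS = {
    "Conv1D": (tf.keras.layers.Conv1D, 1),
    "Conv2D": (tf.keras.layers.Conv2D, 2),
    "Conv3D": (tf.keras.layers.Conv3D, 3),
    "Conv1DTranspose": (tf.keras.layers.Conv1DTranspose, 1),
    "Conv2DTranspose": (tf.keras.layers.Conv2DTranspose, 2),
    "Conv3DTranspose": (tf.keras.layers.Conv3DTranspose, 3),
}


def sweep(dtype="half"):
    """Hypothetical helper: try to build every layer/initializer pair.

    Returns a dict mapping (layer_name, initializer_name) to "ok" or to
    the class name of the exception raised while creating the kernel.
    """
    results = {}
    for layer_name, (layer_cls, rank) in LAYERS.items():
        # Assumed input shape: batch, `rank` spatial dims of size 4, 5 channels.
        input_shape = (None,) + (4,) * rank + (5,)
        for init_name in INITIALIZERS:
            try:
                layer = layer_cls(filters=3, kernel_size=2, padding="same",
                                  kernel_initializer=init_name, dtype=dtype)
                layer.build(input_shape)  # kernel creation runs the initializer
                results[(layer_name, init_name)] = "ok"
            except Exception as exc:  # record the failure instead of stopping
                results[(layer_name, init_name)] = type(exc).__name__
    return results


if __name__ == "__main__":
    for key, status in sweep().items():
        if status != "ok":
            print(key, status)
```

Recording the exception class rather than stopping at the first error makes it easier to see whether the two initializers fail for the same reason.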

Describe the expected behavior. Both initializers should support the "half" datatype, so that convolutional layers can use the "half" datatype together with these initialization methods.

Contributing.

  • Do you want to contribute a PR? (yes/no): please let me know if I can help
  • If yes, please read this page for instructions
  • Briefly describe your candidate solution(if contributing):

Standalone code to reproduce the issue. To reproduce the results for all layers, please visit this link: https://colab.research.google.com/drive/1vZFTa_9ig1BI0bQ0VsD6DDq3_TbbGXqv?usp=sharing

Here I provide a simple code snippet to help you understand the issue:

# Show Error Message
import keras

kernel_initializer = "Orthogonal"
dtype = "half"

# Building the Conv1D layer creates its kernel, which runs the initializer.
x = keras.layers.Input((1, 5), dtype=dtype)
y = keras.layers.Conv1D(filters=3, kernel_size=[2],
                        padding="same", strides=[1],
                        kernel_initializer=kernel_initializer,
                        dtype=dtype)(x)

model = keras.models.Model(x, y)

The error message is:

NotFoundError: Could not find device for node: {{node Qr}} = Qr[T=DT_HALF, full_matrices=false]
All kernels registered for op Qr:
  device='CPU'; T in [DT_FLOAT]
  device='CPU'; T in [DT_DOUBLE]
  device='CPU'; T in [DT_COMPLEX64]
  device='CPU'; T in [DT_COMPLEX128]
  device='GPU'; T in [DT_FLOAT]
  device='GPU'; T in [DT_DOUBLE]
  device='GPU'; T in [DT_COMPLEX128]
 [Op:Qr]

maybeLee • Aug 31 '22 18:08

@gowthamkpr I was able to replicate the issue on colab, please find the gist here for reference. Thank you!

sushreebarsa • Sep 03 '22 04:09

The suggested workaround on the Keras side is to force the initializer to compute in float32 and then cast the result back to the layer's dtype.
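Until something like that lands in Keras, a user-side version of the same idea might look like the sketch below. The `Float32ThenCast` class is a hypothetical wrapper written for this thread, not an existing Keras API: it runs the wrapped initializer in float32 and casts the result to whatever dtype the layer requested.

```python
import tensorflow as tf


class Float32ThenCast(tf.keras.initializers.Initializer):
    """Hypothetical wrapper: run `base` in float32, then cast the result."""

    def __init__(self, base):
        # Accepts an initializer name such as "orthogonal" or an instance.
        self.base = tf.keras.initializers.get(base)

    def __call__(self, shape, dtype=None, **kwargs):
        # dtype requested by the layer, e.g. "half" / float16.
        target = tf.as_dtype(dtype or tf.keras.backend.floatx())
        # Compute in float32, for which the Qr op has registered kernels.
        value = self.base(shape, dtype="float32")
        return tf.cast(value, target)


# Direct call with the kernel shape a Conv1D(filters=3, kernel_size=2)
# would request for 5 input channels.
init = Float32ThenCast("orthogonal")
kernel = init(shape=(2, 5, 3), dtype="half")
```

An instance can be passed as `kernel_initializer=Float32ThenCast("orthogonal")` to a layer created with `dtype="half"`. The sketch does not implement `get_config`, so a model using it would need extra work before it can be serialized.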

qlzh727 • Oct 06 '22 17:10