`tf.vectorized_map` fallback causes
This is the current list of fallbacks we have on master, with the related cause, for our KPLs ~(it needs tensorflow/tensorflow#55562 or tf-nightly when merged)~:
```
pytest -rP keras_cv/layers/preprocessing/ | grep while_loop | sort | uniq

WARNING tensorflow:pfor.py:1082 Using a while_loop for converting AdjustHue cause Input "delta" of op 'AdjustHue' expected to be loop invariant.
WARNING tensorflow:pfor.py:1082 Using a while_loop for converting AdjustSaturation cause Input "scale" of op 'AdjustSaturation' expected to be loop invariant.
WARNING tensorflow:pfor.py:1082 Using a while_loop for converting DepthwiseConv2dNative cause Input "filter" of op 'DepthwiseConv2dNative' expected to be not loop invariant.
WARNING tensorflow:pfor.py:1082 Using a while_loop for converting DynamicPartition cause there is no registered converter for this op.
WARNING tensorflow:pfor.py:1082 Using a while_loop for converting DynamicStitch cause there is no registered converter for this op.
WARNING tensorflow:pfor.py:1082 Using a while_loop for converting HistogramFixedWidth cause there is no registered converter for this op.
WARNING tensorflow:pfor.py:1082 Using a while_loop for converting ImageProjectiveTransformV3 cause there is no registered converter for this op.
WARNING tensorflow:pfor.py:1082 Using a while_loop for converting RandomShuffle cause there is no registered converter for this op.
WARNING tensorflow:pfor.py:1082 Using a while_loop for converting StridedSlice cause Input "input" of op 'StridedSlice' expected to be not loop invariant.
```
You can find exactly which test is involved by running:
pytest -rP keras_cv/layers/preprocessing/
Some of these could be fixed with a refactor in our code; others need a contribution (trivial or not) of a registered converter (e.g. see https://github.com/keras-team/keras-cv/issues/259#issuecomment-1094140155).
I think we should care about the following:
- monitor how many fallbacks we are accumulating directly in the CI, with a GitHub Action, so that we can check this at the PR stage (see the sketch after this list)
- contribute the (trivial?) converters that we care about in https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/ops/parallel_for/pfor.py#L920-L997
- (extra) extend the same policy to `jit_compile` when we have our compile API ready
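A rough sketch of the first bullet, assuming the fallback warnings show up in the pytest output exactly as in the list above (the script and `ALLOWED_FALLBACK_OPS` allowlist are hypothetical, not existing CI code; the allowlist simply mirrors the current fallbacks):

```python
# check_pfor_fallbacks.py - fail a PR when a new tf.vectorized_map fallback appears.
import subprocess
import sys

# Fallback ops we currently tolerate; a PR introducing a new one fails the check.
ALLOWED_FALLBACK_OPS = {
    "AdjustHue", "AdjustSaturation", "DepthwiseConv2dNative",
    "DynamicPartition", "DynamicStitch", "HistogramFixedWidth",
    "ImageProjectiveTransformV3", "RandomShuffle", "StridedSlice",
}

# Run the preprocessing test suite and keep its combined output.
proc = subprocess.run(
    ["pytest", "-rP", "keras_cv/layers/preprocessing/"],
    capture_output=True, text=True,
)

# Extract the op name from every "Using a while_loop for converting <Op> ..." line.
seen = {
    line.split("converting ")[1].split()[0]
    for line in proc.stdout.splitlines()
    if "Using a while_loop for converting" in line
}

new_fallbacks = seen - ALLOWED_FALLBACK_OPS
if new_fallbacks:
    sys.exit(f"New tf.vectorized_map fallbacks introduced: {sorted(new_fallbacks)}")
```

A GitHub Action would then just run this script on every PR.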
/cc @wangpengmit @rohan100jain
Just for reference, I am copy-pasting the current pfor heuristic:
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/ops/parallel_for/pfor.py#L3372-L3389
> The conversion process uses a simple greedy heuristic. It walks the loop body
> and tries to express the functionality of running each node in a loop with a
> new set of nodes. When converting an op several cases are possible:
> - The op is not inside the loop body. Hence it can be used as is.
> - The op does not depend on the iteration number and is stateless. In this
>   case, it can be used as is.
> - The op is not stateful, and depends on iteration number only through control
>   dependencies. In this case, we can create a single op with same inputs and
>   attributes, but with "converted" control dependencies.
> - The op is not stateful, and all its inputs are loop invariant. In this
>   case, similar to above, we can create a single op with same inputs and
>   attributes, but with "converted" control dependencies.
> - The op is stateful or at least one of the inputs is not loop invariant. In
>   this case, we run the registered converter for that op to create a set of
>   converted ops. All nodes in the set will have converted control dependencies
>   corresponding to control dependencies of the original op. If the op returned
>   multiple outputs, "converted outputs" could be produced by different ops in
>   this set.
I want to add an example, picking the first fallback in the list that already has a registered converter:
```
WARNING tensorflow:pfor.py:1082 Using a while_loop for converting AdjustHue cause Input "delta" of op 'AdjustHue' expected to be loop invariant.
```
This is the converter:
```python
@RegisterPFor("AdjustHue")
def _convert_adjust_hue(pfor_input):
  images = pfor_input.stacked_input(0)
  delta = pfor_input.unstacked_input(1)
  return wrap(gen_image_ops.adjust_hue(images, delta), True)
```
So we cannot get around the limitation of the original design of `tf.image.adjust_hue`: it is already vectorized over the batch dimension, but not over the `delta` arg, as that is just a scalar float value:
```python
def adjust_hue(image, delta, name=None):
  ....
  ....
  Args:
    image: RGB image or images. The size of the last dimension must be 3.
    delta: float. How much to add to the hue channel.
```
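To make the failure mode concrete, here is a minimal repro sketch (shapes and values are hypothetical): when `delta` varies per sample, pfor cannot treat the `delta` input of `AdjustHue` as loop invariant and falls back to the while_loop, logging the warning quoted above.

```python
import tensorflow as tf

images = tf.random.uniform((8, 32, 32, 3))
# One hue delta per image: this is what makes "delta" non-loop-invariant.
deltas = tf.random.uniform((8,), minval=-0.1, maxval=0.1)

def adjust(args):
    image, delta = args
    return tf.image.adjust_hue(image, delta)

# Triggers: "Using a while_loop for converting AdjustHue ..."
out = tf.vectorized_map(adjust, (images, deltas))
```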
IMHO this points back again to the original topic: what kind of empirical evidence we can sustain for within-batch randomization, given the current set of native ops we have in TF, versus alternative solutions like https://github.com/keras-team/keras-cv/pull/146#issuecomment-1048315498, at least as a workaround to avoid generating a random augmentation for each sample in the batch for each op.
As `tf.image.*` is also currently MIA/orphaned, I don't think we can extend the API and kernels to add non-scalar support for the `delta`, `scale`, etc. params.
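For illustration, a minimal sketch of that batch-level workaround (the function name is hypothetical): draw a single random delta per batch, so `tf.image.adjust_hue` runs in its natively vectorized form, with no `tf.vectorized_map` (and hence no while_loop) involved. The trade-off is that all samples in the batch share the same augmentation.

```python
import tensorflow as tf

def random_hue_batchwise(images, max_delta=0.1):
    # One scalar delta for the whole batch, instead of one per sample.
    delta = tf.random.uniform((), minval=-max_delta, maxval=max_delta)
    # adjust_hue is already vectorized over the batch dimension.
    return tf.image.adjust_hue(images, delta)

augmented = random_hue_batchwise(tf.random.uniform((8, 32, 32, 3)))
```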
Have you talked about this internally between teams? Any action items?
We (the TF core team) don't monitor keras-team/keras-cv. The issue needs to be reposted to tensorflow/tensorflow to be on our radar.
@wangpengmit I understand, but this is really a boundary topic. What kind of ticket do you want for TF (core)? Just one for the missing converters?
It can be just a title and a link to this one. TF has a system for alerting TF developers of tensorflow/tensorflow issues.
Ok, let's see what we can do in TF at https://github.com/tensorflow/tensorflow/issues/55639 and what we need to change in KerasCV.
I updated the fallback list on the current master (https://github.com/keras-team/keras-cv/commit/b54292159c3af5239a150aede34fa07a4051d35f):
```
WARNING tensorflow:pfor.py:1082 Using a while_loop for converting AdjustContrastv2 cause Input "contrast_factor" of op 'AdjustContrastv2' expected to be loop invariant.
WARNING tensorflow:pfor.py:1082 Using a while_loop for converting AdjustHue cause Input "delta" of op 'AdjustHue' expected to be loop invariant.
WARNING tensorflow:pfor.py:1082 Using a while_loop for converting AdjustSaturation cause Input "scale" of op 'AdjustSaturation' expected to be loop invariant.
WARNING tensorflow:pfor.py:1082 Using a while_loop for converting Bitcast cause there is no registered converter for this op.
WARNING tensorflow:pfor.py:1082 Using a while_loop for converting DepthwiseConv2dNative cause Input "filter" of op 'DepthwiseConv2dNative' expected to be not loop invariant.
WARNING tensorflow:pfor.py:1082 Using a while_loop for converting DynamicPartition cause there is no registered converter for this op.
WARNING tensorflow:pfor.py:1082 Using a while_loop for converting DynamicStitch cause there is no registered converter for this op.
WARNING tensorflow:pfor.py:1082 Using a while_loop for converting HistogramFixedWidth cause there is no registered converter for this op.
WARNING tensorflow:pfor.py:1082 Using a while_loop for converting ImageProjectiveTransformV3 cause there is no registered converter for this op.
WARNING tensorflow:pfor.py:1082 Using a while_loop for converting RandomShuffle cause there is no registered converter for this op.
WARNING tensorflow:pfor.py:1082 Using a while_loop for converting RngReadAndSkip cause there is no registered converter for this op.
WARNING tensorflow:pfor.py:1082 Using a while_loop for converting StatelessRandomGetKeyCounter cause there is no registered converter for this op.
WARNING tensorflow:pfor.py:1082 Using a while_loop for converting StatelessRandomUniformFullIntV2 cause there is no registered converter for this op.
WARNING tensorflow:pfor.py:1082 Using a while_loop for converting StatelessRandomUniformV2 cause there is no registered converter for this op.
WARNING tensorflow:pfor.py:1082 Using a while_loop for converting StridedSlice cause Input "input" of op 'StridedSlice' expected to be not loop invariant.
```
@qlzh727 Just an extra note, as you have already seen in https://github.com/tensorflow/tensorflow/issues/56242
We need to extend this analysis also to the Keras repo for: https://github.com/keras-team/keras/blob/master/keras/layers/preprocessing/image_preprocessing.py
Ack. Will take a look later this week.
> @qlzh727 Just an extra note, as you have already seen in tensorflow/tensorflow#56242
> We need to extend this analysis also to the Keras repo for: https://github.com/keras-team/keras/blob/master/keras/layers/preprocessing/image_preprocessing.py
We could write a script to iterate over the symbols in the `keras_cv.layers.preprocessing` namespace. That should cover all KPLs from Keras core and KerasCV, as we alias the core ones there.
> We could write a script to iterate over the symbols in the `keras_cv.layers.preprocessing` namespace. That should cover all KPLs from Keras core and KerasCV, as we alias the core ones there.
Yes, I used some quick and dirty bash commands locally to produce these "reports". A new CI Action would be ideal, so that we also have a detector for newly introduced fallbacks in candidate PRs.
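A quick sketch of the script idea quoted above (all names are hypothetical; it assumes each layer can be instantiated with default arguments and called on a plain float image batch, skipping the ones that cannot):

```python
import inspect
import io
import logging

import tensorflow as tf
import keras_cv

def report_kpl_fallbacks():
    # Attach a handler to TF's logger so the pfor "Using a while_loop ..."
    # warnings can be captured and filtered programmatically.
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    tf.get_logger().addHandler(handler)

    images = tf.random.uniform((4, 64, 64, 3))
    for name, cls in inspect.getmembers(keras_cv.layers.preprocessing, inspect.isclass):
        if not issubclass(cls, tf.keras.layers.Layer):
            continue
        try:
            layer = cls()  # skip layers that require constructor arguments
        except TypeError:
            continue
        stream.seek(0)
        stream.truncate()
        try:
            layer(images, training=True)  # skip layers needing non-image inputs
        except Exception:
            continue
        for line in sorted(set(stream.getvalue().splitlines())):
            if "Using a while_loop" in line:
                print(f"{name}: {line}")

    tf.get_logger().removeHandler(handler)

if __name__ == "__main__":
    report_kpl_fallbacks()
```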
> (extra) extend the same policy to `jit_compile` when we have https://github.com/keras-team/keras-cv/issues/165#issuecomment-1083502165
This would also be nice to have later, when we have the mentioned API in Keras master.
CCing @ishark.
Does this problem still exist?
> Does this problem still exist?
Yes, this is the updated list as of today, only for the preprocessing layers:
```
pytest -rP keras_cv/layers/preprocessing/ | grep while_loop | sort | uniq

WARNING tensorflow:pfor.py:1081 Using a while_loop for converting AdjustContrastv2
WARNING tensorflow:pfor.py:1081 Using a while_loop for converting AdjustHue
WARNING tensorflow:pfor.py:1081 Using a while_loop for converting AdjustSaturation
WARNING tensorflow:pfor.py:1081 Using a while_loop for converting Bitcast
WARNING tensorflow:pfor.py:1081 Using a while_loop for converting CropAndResize
WARNING tensorflow:pfor.py:1081 Using a while_loop for converting DecodeJpeg
WARNING tensorflow:pfor.py:1081 Using a while_loop for converting DepthwiseConv2dNative
WARNING tensorflow:pfor.py:1081 Using a while_loop for converting EncodeJpegVariableQuality
WARNING tensorflow:pfor.py:1081 Using a while_loop for converting HistogramFixedWidth
WARNING tensorflow:pfor.py:1081 Using a while_loop for converting ImageProjectiveTransformV3
WARNING tensorflow:pfor.py:1081 Using a while_loop for converting RandomShuffle
WARNING tensorflow:pfor.py:1081 Using a while_loop for converting RngReadAndSkip
WARNING tensorflow:pfor.py:1081 Using a while_loop for converting StatelessRandomGetKeyCounter
WARNING tensorflow:pfor.py:1081 Using a while_loop for converting StatelessRandomUniformFullIntV2
WARNING tensorflow:pfor.py:1081 Using a while_loop for converting StatelessRandomUniformV2
```
@bhack Thank you for the quick reply! Saved me a lot of container-rebuilding time.
Also: :(
@wangpengmit Can I ask why you mentioned @ishark a month ago? Is there a new API ownership plan?
Hi all,
Thank you for your work on this issue. I was wondering if there is a recommended workaround (besides using TF 2.8), or if there is an estimate for when this may be fixed for the most commonly used augmentation layers. While my augmentation pipeline doesn't directly rely on any of the items listed in https://github.com/keras-team/keras-cv/issues/291#issuecomment-1230707124, some of the tf.keras.layers.RandomX layers rely on Bitcast, StatelessRandomUniformV2, RngReadAndSkip, and ImageProjectiveTransformV3. I'd be willing to help contribute a fix, but I'm not familiar with this part of the codebase or what the vectorization process typically looks like.
I've tried both TF Nightly and TF 2.9, and my time per epoch increases from 21 seconds on TF 2.8 to 9 minutes, a 25x slowdown. Removing all augmentations from my model restores the expected training time. This is a similar issue to https://github.com/keras-team/keras-cv/issues/581, though perhaps worse since I am using a few different augmentations.
If this is going to take a while to fix, would it make sense to roll back https://github.com/keras-team/keras/commit/dbc4526978865b372e794accbaa7c584f9c86a0f in the meantime? I don't see an indication in the documentation that these layers are effectively unusable at the moment if you want acceptable training speed, and it looks like nightly has already been broken for a while. Thank you for your time.
> While my augmentation pipeline doesn't directly rely on any of the items listed in https://github.com/keras-team/keras-cv/issues/291#issuecomment-1230707124, some of the tf.keras.layers.RandomX layers rely on Bitcast, StatelessRandomUniformV2, RngReadAndSkip, and ImageProjectiveTransformV3
My list just covers the "failing" TF ops that were used in the KerasCV preprocessing layers implementation, so it is not a list of the involved layers.
> I'd be willing to help contribute a fix, but I'm not familiar with this part of the codebase or what the vectorization process typically looks like.
As we have had a list of open points on this topic to clarify since the early days of this repo, it is not so easy to identify a clear action-item/contribution path.
Please check my list at: https://github.com/keras-team/keras-cv/issues/581#issuecomment-1184758866
> @wangpengmit Can I ask why you mentioned @ishark a month ago? Is there a new API ownership plan?
Yeah, we still lack a primary owner of tf.vectorized_map, but @ishark is the secondary owner of it now.
> I've tried both TF Nightly and TF 2.9, and my time per epoch increases from 21 seconds on TF 2.8 to 9 minutes, a 25x slowdown. Removing all augmentations from my model restores the expected training time. This is a similar issue to #581, though perhaps worse since I am using a few different augmentations.
Can you post which layers you're using?
Hi Luke,
Here is my augmentation code. I am using augment_inputs() to wrap multiple layers within my model, and model.fit() is being fed with a tf.data.Dataset. Please let me know if there is anything else I can provide.
```python
def augment_inputs(inputs):
    inputs = tf.keras.layers.RandomZoom(height_factor=(-0.20, 0))(inputs)
    inputs = tf.keras.layers.RandomFlip()(inputs)
    inputs = tf.keras.layers.RandomRotation(factor=0.1)(inputs)
    inputs = tf.keras.layers.RandomTranslation(height_factor=0.1,
                                               width_factor=0.1)(inputs)
    return inputs

def build_model():
    ## Input 1
    old_inputs = layers.Input(shape=IMAGE_SIZE, name='input_oldest')
    # First input for 'data_a'
    inputs = scaler(old_inputs)
    inputs = augment_inputs(inputs)
    # .... rest of the model building code below
```
> Hi Luke,
> Here is my augmentation code. I am using augment_inputs() to wrap multiple layers within my model, and model.fit() is being fed with a tf.data.Dataset. [...]
I think I see the problem: you're creating a new layer every function call, so you'll retrace every single call, which is expensive. Create the layers outside the function.
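A minimal sketch of what that suggestion looks like (the layer choice here is just illustrative): build each layer once and reuse the instances, so repeated calls don't construct new layers.

```python
import tensorflow as tf

# Created once, at build time, and reused on every call.
random_flip = tf.keras.layers.RandomFlip()
random_rotation = tf.keras.layers.RandomRotation(factor=0.1)

def augment_inputs(inputs):
    inputs = random_flip(inputs)
    return random_rotation(inputs)
```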
Hello, my apologies, I missed the GitHub notifications. Will go through the issue history and get back to you. Thanks.
@LukeWood Thank you for the recommendation. I rewrote my augmentation following your recommendation to create the layers outside of the function, as shown below, but I am still seeing the 9-minute epochs, versus the 21-second epochs I expect.
```python
def build_model():
    backbone = tf.keras.applications.efficientnet_v2.EfficientNetV2B0(
        weights='imagenet', input_shape=IMAGE_SIZE, include_top=False)
    backbone.trainable = False

    ## Input 1
    old_inputs = layers.Input(shape=IMAGE_SIZE, name='input_oldest')
    # First input for 'data_a'
    inputs = scaler(old_inputs)
    # inputs = augment_inputs(inputs)
    inputs = tf.keras.layers.RandomZoom(height_factor=(-0.20, 0))(inputs)
    inputs = tf.keras.layers.RandomFlip()(inputs)
    inputs = tf.keras.layers.RandomRotation(factor=0.1)(inputs)
    inputs = tf.keras.layers.RandomTranslation(height_factor=0.1,
                                               width_factor=0.1)(inputs)
    oldest_outputs = tf.keras.layers.GlobalAveragePooling2D()(
        backbone(inputs, training=False))
    # ... rest of the model
    return model, backbone

model, backbone = build_model()
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.01),
              loss=tf.keras.losses.BinaryCrossentropy(),
              metrics=METRICS)
model.fit(train,
          validation_data=validation,
          epochs=50)
```
If you, like me, are still facing this issue, my "workaround" is:

```
pip uninstall keras -y
pip install keras==2.8.0
```
Hey @Mirandatz - we rolled back changes to the KPLs in the most recent keras release, and we'll be fixing them forwards in KerasCV! Sorry for any inconvenience.