
How to set "tf.data.experimental.AutoShardPolicy.OFF" if tf.data is not used? The auto-shard message still appears while using tf.keras Model.fit.

Open rrklearn2020 opened this issue 4 years ago • 6 comments

System information

  • TensorFlow version (you are using): 2.5

Describe the feature and the current behavior/state. How can "tf.data.experimental.AutoShardPolicy.OFF" be set if tf.data is not used? The auto-shard message still appears while using tf.keras Model.fit.

rrklearn2020 avatar Jun 10 '21 17:06 rrklearn2020

@rrklearn2020, can you please elaborate on your feature request? Also, please specify the use cases for this feature. Thanks!

tilakrayal avatar Jun 11 '21 08:06 tilakrayal

It's difficult to use tf.data.Dataset with a relational database that holds camera, point cloud, map, and other data. On a single system with multiple GPUs, it is better to use distributed training with TensorFlow's MirroredStrategy. It would be great if AutoShardPolicy could be set to OFF in such use cases, either in the model.fit call (tf.keras) or in some other way.

Please also explain the effect of setting AutoShardPolicy to OFF in a single-machine application, since 'Sharding is a method for distributing data across multiple machines'.

rrklearn2020 avatar Jun 11 '21 23:06 rrklearn2020

Keras may want to make autosharding configurable for non-dataset inputs. Assigning to @fchollet for triage.

aaudiber avatar Jun 14 '21 22:06 aaudiber

@fchollet and @aaudiber, it would be a great help if you could address this concern about AutoShardPolicy.

With the auto-shard policy in effect, training spends a long time searching for a tf.data.Dataset, even when no tf.data.Dataset is used. This consumes a lot of time during each epoch.

Setting AutoShardPolicy to OFF reduces the training time on a single machine with multiple GPUs.

rrklearn2020 avatar Jun 21 '21 15:06 rrklearn2020

Thanks for providing your insight. I think the issue you are referring to is in the code here: https://github.com/keras-team/keras/blob/v2.11.0/keras/engine/training.py#L2303, where tf.data.experimental.AutoShardPolicy is set to DATA.

sachinprasadhs avatar Feb 22 '23 21:02 sachinprasadhs

In the Keras training process, every type of input is converted to tf.data format internally. Even if you pass non-tf.data input to the model, Keras converts it to tf.data format before feeding it to the model. Given this behavior, tf.data.experimental.AutoShardPolicy.OFF does not fit into the existing logic. Thanks!
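Since the input ends up as a tf.data.Dataset either way, one possible workaround is to do that wrapping yourself and attach the option before calling fit. The following is only a minimal sketch and not something confirmed in this thread: the arrays `x` and `y`, their shapes, the batch size, and the tiny model are hypothetical placeholders, and whether this removes the message or the per-epoch delay in your setup is an assumption you would need to verify.

```python
import numpy as np
import tensorflow as tf

# Hypothetical in-memory arrays standing in for the non-tf.data input.
x = np.random.rand(64, 8).astype("float32")
y = np.random.rand(64, 1).astype("float32")

# Wrap the arrays in a tf.data.Dataset explicitly so that options can be attached.
dataset = tf.data.Dataset.from_tensor_slices((x, y)).batch(16)

# Turn auto-sharding off for this dataset.
options = tf.data.Options()
options.experimental_distribute.auto_shard_policy = (
    tf.data.experimental.AutoShardPolicy.OFF
)
dataset = dataset.with_options(options)

# Single machine, all visible GPUs (falls back to one CPU replica if none).
strategy = tf.distribute.MirroredStrategy()
with strategy.scope():
    # Placeholder model, only here to make the sketch runnable.
    model = tf.keras.Sequential(
        [tf.keras.Input(shape=(8,)), tf.keras.layers.Dense(1)]
    )
    model.compile(optimizer="sgd", loss="mse")

# Pass the dataset itself to fit instead of the raw arrays.
history = model.fit(dataset, epochs=1, verbose=0)
```

With this approach the batch size is set on the dataset (the global batch across replicas), not through the `batch_size` argument of fit.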

sachinprasadhs avatar Feb 23 '23 19:02 sachinprasadhs

Closing due to lack of recent activity. Please update the issue when new information becomes available, and we will reopen the issue. Thanks!

sachinprasadhs avatar Mar 25 '23 02:03 sachinprasadhs