[Nano] How-To Guides: Training - TensorFlow
Description
1. Why the change?
Add Nano how-to guides to our documentation for a better user experience. How-to guides aim to give multiple bite-sized, task-oriented, and executable examples that users can consult when they need to perform similar tasks.
2. Summary of the change
In this PR, guides for Training - TensorFlow Keras are covered. The guides' source files are located under the folder BigDL/python/nano/tutorial/notebook/training/tensorflow.
- [x] How to accelerate a TensorFlow Keras application on training workloads through multiple instances (based on this)
- [x] How to optimize your model with a sparse Embedding layer and SparseAdam optimizer (based on this); a combined API sketch for both guides follows after this list
And a non-runnable guide:
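For a quick sense of what the two runnable guides above cover, here is a minimal combined sketch. The API names are written from memory of Nano's documentation (`Model` from `bigdl.nano.tf.keras`, the `num_processes` argument to `fit`, `Embedding` from `bigdl.nano.tf.keras.layers`, and `SparseAdam` from `bigdl.nano.tf.optimizers`) and should be verified against the guides themselves:

```python
import tensorflow as tf
from tensorflow.keras import layers

# Nano drop-in replacements (import paths assumed; verify in the guides)
from bigdl.nano.tf.keras import Model
from bigdl.nano.tf.keras.layers import Embedding   # sparse-gradient Embedding
from bigdl.nano.tf.optimizers import SparseAdam    # optimizer for sparse grads

inputs = tf.keras.layers.Input(shape=(10,), dtype=tf.int32)
x = Embedding(input_dim=1000, output_dim=64)(inputs)
x = layers.GlobalAveragePooling1D()(x)
outputs = layers.Dense(2, activation='softmax')(x)

model = Model(inputs=inputs, outputs=outputs)
model.compile(loss='sparse_categorical_crossentropy', optimizer=SparseAdam())

# toy data, just to make the sketch self-contained
x_train = tf.random.uniform((256, 10), maxval=1000, dtype=tf.int32)
y_train = tf.random.uniform((256,), maxval=2, dtype=tf.int32)

# multi-instance training: Nano's fit() accepts a num_processes argument
model.fit(x_train, y_train, batch_size=32, epochs=1, num_processes=2)
```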
3. How to test?
- [x] Document test: https://yuwentestdocs.readthedocs.io/en/nano-howto-training-tf/doc/Nano/Howto/index.html#tensorflow
- [x] GitHub Notebook Preview
- How to accelerate a TensorFlow Keras application on training workloads through multiple instances
- How to optimize your model with a sparse Embedding layer and SparseAdam optimizer
- [x] Notebook test on github action
Notebook test locally (in an empty conda environment created with python=3.7)
- [x] How to accelerate a TensorFlow Keras application on training workloads through multiple instances
- [x] How to optimize your model with a sparse Embedding layer and SparseAdam optimizer
When importing `Model` from `bigdl.nano.tf.keras` after creating datasets etc., the error `Inter op parallelism cannot be modified after initialization` may occur (for example, in the code here).
@TheaperDeng Is this something we should fix? Or is this a limitation of `bigdl.nano.tf.keras.Model` that we should mention in the how-to guide?
I think we met a similar issue when writing the TF-based Chronos model, right? How did we resolve it? @liangs6212
I think it can be solved by changing `define_model_inputs_outputs` to `define_model`, like this:
```python
# imports added for completeness; `Model` here is Nano's drop-in replacement
import tensorflow as tf
from tensorflow.keras import layers
from tensorflow.keras.applications import ResNet50
from bigdl.nano.tf.keras import Model

def define_model(img_size, num_classes):  # parameters inferred from the body
    inputs = tf.keras.layers.Input(shape=(img_size, img_size, 3))
    x = tf.cast(inputs, tf.float32)
    x = tf.keras.applications.resnet50.preprocess_input(x)
    backbone = ResNet50(weights='imagenet')
    backbone.trainable = False
    x = backbone(x)
    x = layers.Dense(512, activation='relu')(x)
    outputs = layers.Dense(num_classes, activation='softmax')(x)
    model = Model(inputs=inputs, outputs=outputs)
    model.compile(loss="categorical_crossentropy", optimizer="adam",
                  metrics=['accuracy'])
    return model
```
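(A brief usage sketch, not from the original thread; the class count and step counts are illustrative, and `train_ds` is assumed to come from the dataset-creation snippet shown below:)

```python
model = define_model(img_size=224, num_classes=10)
model.fit(train_ds, epochs=1, steps_per_epoch=10)
```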
It seems that this cannot solve the problem. The problem occurs when importing `Model` from `bigdl.nano.tf.keras` after we have already imported `tensorflow` to create the datasets, like this:
```python
import tensorflow as tf
import tensorflow_datasets as tfds

def create_datasets(img_size, batch_size):
    (train_ds, test_ds), info = tfds.load('imagenette/320px-v2',
                                          data_dir='/tmp/data',
                                          split=['train', 'validation'],
                                          with_info=True,
                                          as_supervised=True)
    num_classes = info.features['label'].num_classes

    def preprocessing(img, label):
        return tf.image.resize(img, (img_size, img_size)), \
               tf.one_hot(label, num_classes)

    train_ds = train_ds.repeat().map(preprocessing).batch(batch_size)
    test_ds = test_ds.map(preprocessing).batch(batch_size)
    return train_ds, test_ds, info

train_ds, test_ds, ds_info = create_datasets(img_size=224, batch_size=32)

from bigdl.nano.tf.keras import Model  # <= error occurs here
```
I currently implemented a workaround for this: import `Model` from `bigdl.nano.tf.keras` at the very beginning of the code snippet, like this:
```python
import tensorflow as tf
import tensorflow_datasets as tfds

from bigdl.nano.tf.keras import Model  # <= import here

def create_datasets(img_size, batch_size):
    (train_ds, test_ds), info = tfds.load('imagenette/320px-v2',
                                          data_dir='/tmp/data',
                                          split=['train', 'validation'],
                                          with_info=True,
                                          as_supervised=True)
    num_classes = info.features['label'].num_classes

    def preprocessing(img, label):
        return tf.image.resize(img, (img_size, img_size)), \
               tf.one_hot(label, num_classes)

    train_ds = train_ds.repeat().map(preprocessing).batch(batch_size)
    test_ds = test_ds.map(preprocessing).batch(batch_size)
    return train_ds, test_ds, info

train_ds, test_ds, ds_info = create_datasets(img_size=224, batch_size=32)
```
What I am confused about here is: do we need to remind users of this import order to avoid the error? Or is this error something we should fix in `bigdl.nano.tf.keras.Model`? @TheaperDeng
I think we set up inter-op parallelism when importing Nano, which must be done before the user imports any TensorFlow code. Maybe we need to introduce something like `patch_sklearn()` in the scikit-learn extension? @TheaperDeng @yangw1234
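To make the idea concrete, here is a purely hypothetical sketch of what such a patch-style API could look like; `patch_tensorflow` is an invented name for illustration, not an existing Nano function:

```python
import sys

def patch_tensorflow(inter_op_threads=1):
    """Hypothetical patch_sklearn()-style entry point for Nano.

    It must run before the rest of the program imports tensorflow,
    because the TF thread pools cannot be changed once the context
    has been initialized (e.g. by running any op or tfds.load).
    """
    assert 'tensorflow' not in sys.modules, \
        "patch_tensorflow() must be called before importing tensorflow"
    import tensorflow as tf
    tf.config.threading.set_inter_op_parallelism_threads(inter_op_threads)
```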
This looks like a TensorFlow problem. The following code does not work either: you cannot set threads after running `tfds.load`. Not sure how to work around it yet.
```python
import tensorflow as tf
import tensorflow_datasets as tfds

(train_ds, test_ds), info = tfds.load('mnist',
                                      data_dir='/tmp/data',
                                      split=['train', 'test'],
                                      with_info=True,
                                      as_supervised=True)

print(tf.config.threading.get_inter_op_parallelism_threads())  # returns 0
tf.config.threading.set_inter_op_parallelism_threads(1)        # error
```
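For contrast, a minimal check of my own (not from the thread, assuming standard TF behavior): the same call succeeds when it runs before any TF operation initializes the context, which supports the import-order diagnosis above:

```python
import tensorflow as tf

# OK: the TF context has not been initialized yet, so threading is mutable
tf.config.threading.set_inter_op_parallelism_threads(1)

import tensorflow_datasets as tfds
(train_ds, test_ds), info = tfds.load('mnist',
                                      data_dir='/tmp/data',
                                      split=['train', 'test'],
                                      with_info=True,
                                      as_supervised=True)

print(tf.config.threading.get_inter_op_parallelism_threads())  # now returns 1
```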