
[Nano] How-To Guides: Training - TensorFlow

Open · Oscilloscope98 opened this issue 1 year ago • 6 comments

Description

1. Why the change?

Add Nano how-to guides to our documentation for a better user experience. How-to guides aim to provide bite-sized, task-oriented, and executable examples that users can consult when they need to perform similar tasks.

2. Summary of the change

This PR covers the guides for Training - TensorFlow Keras. The guide source files are located under the folder BigDL/python/nano/tutorial/notebook/training/tensorflow.

A non-runnable guide is also included:

3. How to test?

Notebooks tested locally (in a fresh conda environment created with python=3.7):

  • [x] How to accelerate a TensorFlow Keras application on training workloads through multiple instances (see the sketch after this list)
  • [x] How to optimize your model with a sparse Embedding layer and SparseAdam optimizer
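
For context, here is a minimal sketch of the pattern the multi-instance guide demonstrates. It assumes, following the Nano how-to guide, that bigdl.nano.tf.keras.Model.fit accepts a num_processes argument; treat the exact argument name as an assumption rather than a verified API:

from bigdl.nano.tf.keras import Model  # import Nano before running any TensorFlow code (see the discussion below)
import tensorflow as tf

inputs = tf.keras.layers.Input(shape=(28, 28, 1))
x = tf.keras.layers.Flatten()(inputs)
outputs = tf.keras.layers.Dense(10, activation='softmax')(x)

model = Model(inputs=inputs, outputs=outputs)
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam')

(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
# num_processes launches multiple training instances on one machine
# (argument name assumed from the guide's topic, not verified here).
model.fit(x_train[..., None] / 255.0, y_train, batch_size=32, epochs=1,
          num_processes=2)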

Oscilloscope98, Sep 19 '22 10:09

When importing Model from bigdl.nano.tf.keras after creating datasets etc., the error Inter op parallelism cannot be modified after initialization may occur (for example, in the code here; screenshot attached). @TheaperDeng Is this something we should fix? Or is this a limitation of bigdl.nano.tf.keras.Model that we should mention in the how-to guide?

Oscilloscope98, Sep 20 '22 01:09

I think we met a similar issue when we wrote the TF-based Chronos models, right? How did we resolve it? @liangs6212

TheaperDeng, Sep 20 '22 01:09

I think it can be solved by changing define_model_inputs_outputs into a single define_model function, like this:

import tensorflow as tf
from tensorflow.keras import layers
from tensorflow.keras.applications import ResNet50
from bigdl.nano.tf.keras import Model

def define_model(img_size, num_classes):
    inputs = tf.keras.layers.Input(shape=(img_size, img_size, 3))
    x = tf.cast(inputs, tf.float32)
    x = tf.keras.applications.resnet50.preprocess_input(x)
    backbone = ResNet50(weights='imagenet')
    backbone.trainable = False  # freeze the pretrained backbone
    x = backbone(x)
    x = layers.Dense(512, activation='relu')(x)
    outputs = layers.Dense(num_classes, activation='softmax')(x)
    model = Model(inputs=inputs, outputs=outputs)
    model.compile(loss="categorical_crossentropy", optimizer="adam",
                  metrics=['accuracy'])
    return model

liangs6212, Sep 20 '22 06:09

It seems that this cannot solve the problem. The error occurs when Model is imported from bigdl.nano.tf.keras after tensorflow has already been imported and used to create the datasets, like this:

import tensorflow as tf
import tensorflow_datasets as tfds

def create_datasets(img_size, batch_size):
    (train_ds, test_ds), info = tfds.load('imagenette/320px-v2',
                                          data_dir='/tmp/data',
                                          split=['train', 'validation'],
                                          with_info=True,
                                          as_supervised=True)
    
    num_classes = info.features['label'].num_classes
    
    def preprocessing(img, label):
        return tf.image.resize(img, (img_size, img_size)), \
               tf.one_hot(label, num_classes)

    train_ds = train_ds.repeat().map(preprocessing).batch(batch_size)
    test_ds = test_ds.map(preprocessing).batch(batch_size)
    return train_ds, test_ds, info

train_ds, test_ds, ds_info = create_datasets(img_size=224, batch_size=32)

from bigdl.nano.tf.keras import Model  # <= error: Inter op parallelism cannot be modified after initialization

My current workaround is to import Model from bigdl.nano.tf.keras at the beginning of the code snippet, before any other TensorFlow work runs, like this:

import tensorflow as tf
import tensorflow_datasets as tfds

from bigdl.nano.tf.keras import Model # <= import here

def create_datasets(img_size, batch_size):
    (train_ds, test_ds), info = tfds.load('imagenette/320px-v2',
                                          data_dir='/tmp/data',
                                          split=['train', 'validation'],
                                          with_info=True,
                                          as_supervised=True)
    
    num_classes = info.features['label'].num_classes
    
    def preprocessing(img, label):
        return tf.image.resize(img, (img_size, img_size)), \
               tf.one_hot(label, num_classes)

    train_ds = train_ds.repeat().map(preprocessing).batch(batch_size)
    test_ds = test_ds.map(preprocessing).batch(batch_size)
    return train_ds, test_ds, info

train_ds, test_ds, ds_info = create_datasets(img_size=224, batch_size=32)

What I am unsure about is: do we need to remind users of this import order to avoid the error, or is this something we should fix in bigdl.nano.tf.keras.Model? @TheaperDeng

Oscilloscope98, Sep 20 '22 06:09

I think we set up inter-op parallelism when importing Nano, which must happen before the user imports any TensorFlow code. Maybe we need to introduce something like patch_sklearn() in the scikit-learn extension? @TheaperDeng @yangw1234
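
(For illustration only: a hypothetical patch_tensorflow() in the spirit of scikit-learn-intelex's patch_sklearn() might look like the sketch below. The function name and design are assumptions, not an existing Nano API.)

# Hypothetical sketch; name and design are assumptions modeled on
# patch_sklearn(), not an existing Nano API.
import multiprocessing

def patch_tensorflow(inter_op_threads=1, intra_op_threads=None):
    # Must run before any other TensorFlow code executes in this process;
    # otherwise the set_*_parallelism_threads calls below raise the
    # "Inter op parallelism cannot be modified after initialization" error.
    import tensorflow as tf
    if intra_op_threads is None:
        intra_op_threads = multiprocessing.cpu_count()
    tf.config.threading.set_inter_op_parallelism_threads(inter_op_threads)
    tf.config.threading.set_intra_op_parallelism_threads(intra_op_threads)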

jason-dai, Sep 20 '22 06:09

This looks like a TensorFlow limitation. The following code does not work either: you cannot set the thread counts after running tfds.load. Not sure how to work around it yet.

import tensorflow as tf
import tensorflow_datasets as tfds

(train_ds, test_ds), info = tfds.load('mnist',
                                      data_dir='/tmp/data',
                                      split=['train', 'test'],
                                      with_info=True,
                                      as_supervised=True)

print(tf.config.threading.get_inter_op_parallelism_threads())  # returns 0
tf.config.threading.set_inter_op_parallelism_threads(1)  # raises: Inter op parallelism cannot be modified after initialization
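
(One possible workaround, an assumption on my part rather than something verified in this thread: TensorFlow also reads the TF_NUM_INTEROP_THREADS / TF_NUM_INTRAOP_THREADS environment variables when the runtime initializes, so setting them before importing tensorflow may avoid the error.)

import os
os.environ["TF_NUM_INTEROP_THREADS"] = "1"  # read once at TF runtime init
os.environ["TF_NUM_INTRAOP_THREADS"] = "4"

import tensorflow as tf  # must come after the env vars are set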

yangw1234, Sep 21 '22 20:09