RFC: On-device training with TensorFlow Lite

We're sharing this RFC to reflect our latest thinking on implementing on-device training in TensorFlow Lite. We haven't set a deadline for closing comments. We want to surface the RFC early for transparency and to get feedback.

Status	Draft
Author(s)	Yu-Cheng Ling ([email protected]), Haoliang Zhang ([email protected]), Jaesung Chung ([email protected])
Sponsor	Jared Duke ([email protected])
Updated	2021-06-04

Introduction

TensorFlow Lite is TensorFlow's solution for on-device machine learning. Initially it only focused on inference use cases. We have increasingly heard from users regarding the need for on-device training. This proposal lays out the concrete plan & roadmap for supporting training in TensorFlow Lite.

Jun 07 '21 16:06 miaout17

Thanks for sharing the RFC. I wonder whether it is possible to change the model architecture on device at runtime. The first on-device training use case that comes to mind is transfer learning, where we need to add new classes as the user wants. In that case, I think we need to change the model architecture (for example, the number of units in a dense layer). Is that possible with the current setup? I also wonder whether there are optimization techniques that make on-device training realistic. I mean that we might need substantial optimization of memory and computation. Could you introduce some of them?

Jun 09 '21 08:06 jijoongmoon

I suggest taking a look at Continual Learning on the Edge with TensorFlow Lite

And https://arxiv.org/abs/2105.13127

Jun 13 '21 22:06 bhack

/cc @vlomonaco

Jun 13 '21 22:06 bhack

Another interesting scenario to evaluate is training in the context of Edge federated learning:

https://github.com/tensorflow/federated/issues/749 https://arxiv.org/abs/2104.03042 https://arxiv.org/abs/1909.11875 https://www.sciencedirect.com/science/article/pii/S266729522100009X

Jun 14 '21 10:06 bhack

Thanks @bhack for the tag! @lrzpellegrini, the main author of "Continual Learning at the Edge: Real-Time Training on Smartphone Devices" will take a look and provide some feedback.

Jun 14 '21 12:06 vlomonaco

/cc @gdemos01 @akhilmathurs

Jun 14 '21 13:06 bhack

Replying to @jijoongmoon

I wonder whether it is possible to change the model architecture on device at runtime. The first on-device training use case that comes to mind is transfer learning, where we need to add new classes as the user wants. In that case, I think we need to change the model architecture (for example, the number of units in a dense layer). Is that possible with the current setup?

Great question.

When doing transfer learning on a classifier, changing the number of classes does not require changing the model "structure" (adding/removing ops). Changing the shape of the weights tensor should be sufficient. This proposal can handle the use case with no problem.

I also wonder whether there are optimization techniques that make on-device training realistic. I mean that we might need substantial optimization of memory and computation. Could you introduce some of them?

For sure. We're focusing on making it work in general first. Once we reach that point, we can do more benchmarking and profiling to figure out what most needs optimizing, and work on it.

Jun 14 '21 17:06 miaout17

Thanks @bhack. @vlomonaco @lrzpellegrini thanks for taking a look and please feel free to comment.

Jun 14 '21 17:06 miaout17

Replying to @jijoongmoon

I wonder whether it is possible to change the model architecture on device at runtime. The first on-device training use case that comes to mind is transfer learning, where we need to add new classes as the user wants. In that case, I think we need to change the model architecture (for example, the number of units in a dense layer). Is that possible with the current setup?

Great question.

When doing transfer learning on a classifier, changing the number of classes does not require changing the model "structure" (adding/removing ops). Changing the shape of the weights tensor should be sufficient. This proposal can handle the use case with no problem.

@miaout17 can you elaborate on how such a shape change process would work? I do not see such a use case in the current proposal. Thanks!

Jun 14 '21 20:06 lc0

Hi @miaout17, I had a more in-depth look. This direction looks promising and we are excited to finally see training on-device on the TFLite radar. I think for many Transfer Learning problems these features would be great. However, for Continual Learning (CL) flexibility is all that matters.

Can the model architecture, optimizer, loss function be changed over time?

It would be difficult to implement a CL approach without those features, apart from basic experience replay. @lrzpellegrini will provide more details.

Jun 15 '21 17:06 vlomonaco

Hi there, I had a look at the RFC. It seems to me that it moves in a very good direction.

I'm not aware of the current capabilities of TF-Lite, as I have only had the chance to use it in a very high-level way, but I really appreciate that the focus of the RFC is on the ability to transfer whole tf.functions to the final model. This can really boost the ability to learn on-device without forcing the programmer to delve too deeply into the low-level side of mobile implementations.

As a comparison, while implementing the CORe app described in "Continual Learning at the Edge: Real-Time Training on Smartphone Devices" I had to manually translate the Python version of our Continual Learning algorithm into C++ so that it could be used alongside the Caffe deep learning library. In this scenario even simple things like moving data or accessing tensors (weights, inputs, ...) add a lot of complexity, and with that comes an absurd overhead on the programming side, so I really appreciate this tf.functions based approach 👍.

As Vincenzo pointed out, the main issues are on the flexibility side. In the simple scenario of a limited on-device fine-tuning, a simple fit based approach seems the best solution. However, this would really limit the capabilities of the framework: as I suspect, a fit-based approach would only allow for a very simple instance replay mechanism, which may be insufficient when working with Continual Learning algorithms.

On the other hand, supporting Continual Learning algorithms may require some flexibility on:

Ability to easily manipulate (read, write, store, load) tensors linked to weights, activations, gradients, etcetera.
Ability to change the model architecture (alas, not limited to changing the number of outputs of a certain layer). Some algorithms also require the ability to dynamically add new layers to the existing model or even to add new detached/undetached models
Ability to change the optimizer, loss, lr schedulers and other training related components
Ability to selectively freeze and unfreeze certain parts of the model

Of course not all CL algorithms need all these capabilities.

Consider that CL is a very varied field, but most algorithms leverage an instance replay mechanism (implemented by inserting/replacing new instances into the dataset) plus some simple regularization/distillation/bias normalization algorithm (which mostly requires flexibility on the tensor manipulation side). More recent algorithms really push on the idea of manipulating the architecture of the model, but I guess that supporting this behavior would be the most problematic part of this.

Alas, I don't have a clear understanding of the translation capabilities of tf.functions from Python to TFLite models, so I'm not able to fully grasp the complexity required to accomplish this kind of flexibility.

Jun 17 '21 15:06 lrzpellegrini

I think that federated and continual learning are more relevant in the on-device/edge use case because, in this context, it is still hard to achieve few-shot/zero-shot learning with (recent) very large scale "general purpose" models. At least until we figure out how knowledge "hard distillation" on these models could be achieved efficiently on constrained devices.

Jun 17 '21 20:06 bhack

can you elaborate on how such a shape change process would work?

Replying to @lc0

For example

Imagine you have a classifier where the last layer is a simple fully connected layer (e.g. tf.nn.relu(tf.matmul(x, weight) + bias))
We can define a set_classes_num(classes_num) TF function, which re-initializes the weight and bias variables to a different size. For example, if the number of hidden units before the last layer is 1024, the weight can have shape [1024, classes_num] and the bias can have shape [classes_num]. The function can re-initialize the weight and bias to random values close to 0, and the last layer will then be ready to retrain.
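The two steps above could be sketched roughly as follows. This is only an illustration in regular TensorFlow, not the RFC's implementation: the `Head` class, `HIDDEN_UNITS`, and `infer` are hypothetical names, and the sketch assumes the variables are declared with an unspecified class dimension so that they can later be assigned values of a different size. Whether the TFLite converter accepts such variables is not confirmed here.

```python
import tensorflow as tf

HIDDEN_UNITS = 1024  # number of hidden units feeding the last layer


class Head(tf.Module):
    """Hypothetical classifier head whose number of classes can change."""

    def __init__(self, classes_num):
        super().__init__()
        # Assumption: the class dimension is left unspecified (None) so that
        # later assignments may use a different number of classes.
        self.weight = tf.Variable(
            tf.random.normal([HIDDEN_UNITS, classes_num], stddev=0.01),
            shape=tf.TensorShape([HIDDEN_UNITS, None]))
        self.bias = tf.Variable(
            tf.zeros([classes_num]), shape=tf.TensorShape([None]))

    @tf.function(input_signature=[tf.TensorSpec([], tf.int32)])
    def set_classes_num(self, classes_num):
        # Re-initialize both variables to small values with the new size.
        self.weight.assign(
            tf.random.normal([HIDDEN_UNITS, classes_num], stddev=0.01))
        self.bias.assign(tf.zeros([classes_num]))
        return tf.shape(self.weight)

    @tf.function(
        input_signature=[tf.TensorSpec([None, HIDDEN_UNITS], tf.float32)])
    def infer(self, x):
        return tf.nn.relu(tf.matmul(x, self.weight) + self.bias)
```

After `head = Head(10)`, calling `head.set_classes_num(tf.constant(12))` leaves the head with 12 untrained outputs, ready for retraining.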

We're building low-level features that make it possible to describe these semantics. We are considering wrapping them in an easier-to-use API to make them friendlier for developers.

Let me know if this makes sense. I'm happy to try to write this up as more concrete pseudocode as well.

Jun 23 '21 04:06 miaout17

Replying to @vlomonaco and @lrzpellegrini

Thanks for the feedback!

For clarification: It sounds like continual learning can modify the model structure automatically, without human intervention. Is my rough understanding correct?

This seems more advanced than what we're currently targeting. Trying to break down the requirements:

Ability to easily manipulate (read, write, store, load) tensors linked to weights, activations, gradients, etcetera.

I think this should be doable (by wrapping required logic into TF functions).

Ability to change the model architecture (alas, not limited to changing the number of outputs of a certain layer). Some algorithms also require the ability to dynamically add new layers to the existing model or even to add new detached/undetached models
Ability to change the optimizer, loss, lr schedulers and other training related components

We haven't tried these yet. However I think in theory:

A TFLite model is like a TF function. There is no easy way to change it (e.g. adding a layer) after the TFLite model is created.
However, I think it's possible to model some of these behaviors with control flow (e.g. if a value is true, skip a layer or switch to another optimizer algorithm).
In the future, we can also explore on-device generation / modification of TFLite models, but that would be an even more advanced route.
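As a rough illustration of the control-flow idea, the sketch below switches between two hand-written update rules based on a boolean tensor. The names `w`, `velocity`, and `apply_update` are hypothetical, the update rules are written out by hand rather than taken from a Keras optimizer, and nothing here shows how the TFLite converter treats the resulting graph.

```python
import tensorflow as tf

w = tf.Variable([1.0, -2.0])     # toy parameter
velocity = tf.Variable(tf.zeros([2]))  # state used only by the momentum rule


@tf.function
def apply_update(grad, use_momentum, lr):
    # AutoGraph turns this `if` on a tensor into graph-level control flow,
    # so both update rules live in the same traced function.
    if use_momentum:
        velocity.assign(0.9 * velocity + grad)
        w.assign_sub(lr * velocity)
    else:
        w.assign_sub(lr * grad)  # plain gradient descent
    return tf.identity(w)
```

Passing `use_momentum` as a tensor (for example `tf.constant(True)`) keeps the choice of update rule a runtime input rather than something fixed when the function is traced.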

Ability to selectively freeze and unfreeze certain parts of the model

This should be doable with control flow (e.g. skip some gradient computation and variable update when a boolean value is true)
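A minimal sketch of such a freeze flag, assuming a toy two-variable model (`w_base`, `w_head`, and `train_step` are hypothetical names). For simplicity it gates only the variable update; skipping the gradient computation as well would mean taking the gradient inside the conditional branch.

```python
import tensorflow as tf

w_base = tf.Variable([[1.0], [2.0]])  # stands in for the frozen "backbone"
w_head = tf.Variable([[0.5]])         # stands in for the trainable "head"


@tf.function
def train_step(x, y, freeze_base, lr):
    with tf.GradientTape() as tape:
        pred = tf.matmul(tf.matmul(x, w_base), w_head)
        loss = tf.reduce_mean(tf.square(pred - y))
    g_base, g_head = tape.gradient(loss, [w_base, w_head])
    w_head.assign_sub(lr * g_head)
    # The backbone is updated only when the boolean tensor is False.
    if not freeze_base:
        w_base.assign_sub(lr * g_base)
    return loss
```

Calling `train_step(x, y, tf.constant(True), tf.constant(0.1))` trains the head and leaves `w_base` untouched.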

Jun 23 '21 04:06 miaout17

Thanks for sharing this, excited to see progress here. As one of the authors of the Flower federated learning framework, I can say that on-device training support is one of the biggest challenges for cross-device federated learning right now.

After reading the RFC I was wondering how setting/changing hyperparameters would work on-device. Would we just add additional arguments (like epochs) to e.g. the train method

  @tf.function
  def train(self, inputs, labels, epochs):
    self.model.fit(inputs, labels, epochs=epochs)

and then call train(train_input, train_labels, epochs=3)?
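One possible shape for this, sketched under assumptions rather than taken from the RFC: expose a single training step as a tf.function that takes the hyperparameter (here a learning rate) as a scalar argument, and keep the epoch loop in the calling code. `Trainable`, `train_step`, and the linear model are hypothetical, and the step is written with `tf.GradientTape` instead of `model.fit`.

```python
import tensorflow as tf


class Trainable(tf.Module):
    """Hypothetical model exposing one training step as a tf.function."""

    def __init__(self):
        super().__init__()
        self.w = tf.Variable(tf.zeros([3, 1]))

    @tf.function(input_signature=[
        tf.TensorSpec([None, 3], tf.float32),  # inputs
        tf.TensorSpec([None, 1], tf.float32),  # labels
        tf.TensorSpec([], tf.float32),         # learning rate (hyperparameter)
    ])
    def train_step(self, inputs, labels, learning_rate):
        with tf.GradientTape() as tape:
            error = tf.matmul(inputs, self.w) - labels
            loss = tf.reduce_mean(tf.square(error))
        grad = tape.gradient(loss, self.w)
        self.w.assign_sub(learning_rate * grad)
        return {"loss": loss}


def train(model, inputs, labels, epochs, learning_rate):
    # The epoch loop lives in the caller, so `epochs` never enters the graph.
    loss = None
    for _ in range(epochs):
        loss = model.train_step(inputs, labels, learning_rate)["loss"]
    return loss
```

With this split, `train(model, train_input, train_labels, epochs=3, learning_rate=0.1)` changes hyperparameters without re-exporting the model.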

Jun 25 '21 11:06 danieljanes

On changing the model in training mode, see:

https://discuss.tensorflow.org/t/how-to-implement-layerdrop-in-tensorflow-transformers/2396

Jun 27 '21 16:06 bhack

/cc @vassilisvas, who is the co-author of Continual Learning on the Edge with TensorFlow Lite and the leader of the Learning Agents & Robots MRG. This is an interesting conversation to keep our eyes on and maybe contribute to.

Jul 20 '21 06:07 gdemos01

Thank you for bringing on-device training to TFLite!

Based on this proposal, I am not sure where you plan to manage the training loop. Are you thinking of (1) keeping it inside TFLite or (2) letting the developer decide how the training loop will be structured on device?

As @danieljanes pointed out, the API doesn’t show how the actual training step or training phase would be controlled. Moreover, the optimizer and loss do not seem to be accessible from the saved model. How would the train method know which ones to use?

Aug 07 '21 01:08 martinkersner

I have a similar question to @martinkersner's regarding the training loop, in the context of federated ML with TFLite. It would be fantastic to let developers decide how to train and how to structure the training loop on device. That opens up the possibility of forwarding the gradients from the training loop to a further orchestration layer, enabling both centralised and decentralised federated ML.

I can understand the benefits of keeping the training loop and its structure inside TFLite, so that it can be distributed uniformly across all platforms. If the training loop is opened up to different platforms, you might need an additional library extension for Android, IoT and so on. But with additional library extensions controlling the training loop, you can reduce the dependencies on different platforms and speed up the development cycle for TFLite, since each extension library can have its own deployment cycle.
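To make the gradient-forwarding idea concrete, here is a small sketch in regular TensorFlow. It assumes the model exposes one function that returns gradients without applying them and another that applies externally supplied gradients. `compute_gradients`, `apply_gradients`, and `average` are hypothetical names, and `average` is only a stand-in for whatever the orchestration layer does.

```python
import tensorflow as tf

w = tf.Variable(tf.zeros([2, 1]))  # toy model parameter


@tf.function
def compute_gradients(x, y):
    # Return the gradient instead of applying it, so the caller can forward
    # it to an orchestration layer.
    with tf.GradientTape() as tape:
        loss = tf.reduce_mean(tf.square(tf.matmul(x, w) - y))
    return {"loss": loss, "grad_w": tape.gradient(loss, w)}


@tf.function
def apply_gradients(grad_w, lr):
    # Apply a gradient that may have been aggregated elsewhere.
    w.assign_sub(lr * grad_w)
    return tf.identity(w)


def average(grads):
    # Stand-in for server-side aggregation across clients.
    return tf.add_n(grads) / len(grads)
```

A caller could collect `compute_gradients(...)["grad_w"]` from several batches or devices, pass the list to `average`, and feed the result to `apply_gradients`.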

Nov 06 '21 11:11 yingding

We already had some research work at ICML 2021 on joint federated and continual learning, with a TF reference implementation:

https://github.com/wyjeong/FedWeIT

It would be nice to open this research subdomain to edge devices with TFLite.

Nov 06 '21 11:11 bhack

Is this finalized/approved? https://blog.tensorflow.org/2021/11/on-device-training-in-tensorflow-lite.html?m=1

Nov 10 '21 18:11 bhack

https://www.tensorflow.org/lite/examples/on_device_training/overview This went live yesterday (Nov 9) on the ML Community Day stream.

Nov 10 '21 23:11 yingding

Another interesting use case, even if ImageNet is probably too large a dataset for many edge-computing TFLite platforms, is this recent DeepMind paper, One Pass ImageNet:

https://arxiv.org/abs/2111.01956

Nov 13 '21 14:11 bhack

Is this ready for community feedback? Are you ready to take this through review?

Jan 24 '22 18:01 ematejska
