
630/add efficientnet lite

Open sebastian-sz opened this issue 2 years ago • 1 comments

What does this PR do?

Adds EfficientNet Lite variants to keras_cv models.

Fixes #630

This is a port of a PR from the Keras repository, as per [this comment](https://github.com/keras-team/keras/pull/16905#issuecomment-1262811641).

Before submitting

  • [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • [x] Did you read the contributor guideline, Pull Request section?
  • [x] Was this discussed/approved via a Github issue? Please add a link to it if that's the case.
  • [x] Did you write any new necessary tests?
  • [ ] If this adds a new model, can you run a few training steps on TPU in Colab to ensure that no XLA-incompatible ops are used?

Who can review?

@LukeWood

Let me know if I should tag anyone else :)

sebastian-sz avatar Oct 14 '22 04:10 sebastian-sz

@LukeWood I do have weights converted from the original tpu repository, but they expect different preprocessing:
Normalization(mean=127.0, variance=128.0**2) instead of Keras-CV's Rescaling(1.0 / 255.0).

Using the Rescaling layer with these weights will probably result in lower ImageNet accuracy.
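The difference between the two preprocessing schemes can be sketched with plain numpy (a minimal illustration of the math, not the actual Keras layers):

```python
import numpy as np

def normalize_tpu(x):
    # tpu-repo preprocessing: (x - mean) / sqrt(variance) = (x - 127) / 128,
    # mapping uint8 pixel values to roughly [-1, 1].
    mean, variance = 127.0, 128.0**2
    return (x - mean) / np.sqrt(variance)

def rescale_keras_cv(x):
    # Keras-CV default: x / 255, mapping pixel values to [0, 1].
    return x * (1.0 / 255.0)

pixels = np.array([0.0, 127.0, 255.0])
print(normalize_tpu(pixels))     # approx [-0.992, 0.0, 1.0]
print(rescale_keras_cv(pixels))  # approx [0.0, 0.498, 1.0]
```

Weights trained against one input distribution will see shifted and rescaled activations under the other, which is why the converted weights would likely lose accuracy behind a Rescaling layer.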

sebastian-sz avatar Oct 14 '22 10:10 sebastian-sz

If not, I will run the script, but I currently have a backlog of training experiments, so this would be at least a week from now.

I don't know if you can discuss this with the team, but it would be very nice to be able to launch a training job from a PR with a GitHub Action (after a manual trigger).

It would help us all follow the training in one place, and we could automate uploading the logs to https://tensorboard.dev/.

If not, I think this will be a little hard to scale at some point.
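A manually triggered training workflow along these lines could be sketched as a `workflow_dispatch` GitHub Action (all names, paths, and flags below are hypothetical, not part of this PR or repository):

```yaml
# Hypothetical manually triggered training workflow (illustrative only).
name: train-model
on:
  workflow_dispatch:          # manual trigger from the Actions tab
    inputs:
      model:
        description: "Model variant to train (e.g. EfficientNetLiteB0)"
        required: true
jobs:
  train:
    runs-on: ubuntu-latest    # a GPU/TPU runner would be needed in practice
    steps:
      - uses: actions/checkout@v3
      - name: Launch training
        run: python examples/training/basic_training.py --model "${{ github.event.inputs.model }}"
```

The manual trigger keeps compute costs under maintainer control while still tying each training run to the PR that produced it.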

bhack avatar Oct 19 '22 22:10 bhack

I don't know if you can discuss this with the team, but it would be very nice to be able to launch a training job from a PR with a GitHub Action (after a manual trigger).

It would help us all follow the training in one place, and we could automate uploading the logs to https://tensorboard.dev/.

If not, I think this will be a little hard to scale at some point.

This is on our radar. It's something we'd like to include eventually, but haven't yet prioritized.

ianstenbit avatar Oct 19 '22 22:10 ianstenbit

This is on our radar. It's something we'd like to include eventually, but haven't yet prioritized.

Good. When you are ready, I hope we can find a space to discuss this with the community before you finalize the details.

bhack avatar Oct 19 '22 22:10 bhack

@ianstenbit

Are you able to run our training script to verify that these models (just one is fine for now) converge on ImageNet / potentially provide weights alongside this PR?

Sorry, I do not have the compute resources to run an ImageNet training job :(

sebastian-sz avatar Oct 20 '22 14:10 sebastian-sz

@ianstenbit

Are you able to run our training script to verify that these models (just one is fine for now) converge on ImageNet / potentially provide weights alongside this PR?

Sorry, I do not have the compute resources to run an ImageNet training job :(

No worries at all -- we need a solution to provide GCP resources for training in situations like these. For now, I will try to train one of these in the next week.

ianstenbit avatar Oct 20 '22 21:10 ianstenbit

Thank you! Once I can produce some benchmark training results / weights for one of these I will merge.

ianstenbit avatar Oct 20 '22 21:10 ianstenbit

@ianstenbit Thanks! Looking forward to the results, as I am curious as well.

sebastian-sz avatar Oct 21 '22 05:10 sebastian-sz

@ianstenbit Thanks! Looking forward to the results, as I am curious as well.

I've just started a training run of EfficientNetLiteB0. I'm using this as the benchmark, so we're looking for close to 74.83% top-1 accuracy on ImageNet. (Our baseline for now is that we need to match 95% of that result to merge this and add weights, so that would be 71.09%.)
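For reference, the 95% merge bar works out as follows (a trivial arithmetic check):

```python
# Reference EfficientNetLiteB0 top-1 accuracy (%) from the benchmark cited above.
baseline_top1 = 74.83

# Weights must reach at least 95% of the reference score to be merged.
merge_threshold = 0.95 * baseline_top1  # approx 71.09%
```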

ianstenbit avatar Oct 25 '22 15:10 ianstenbit

@ianstenbit Thank you for the update.

I'm using this as the benchmark

It seems the metrics from timm and Papers with Code are slightly higher than the ones from the tensorflow/tpu repository.

Looking forward to the results!

sebastian-sz avatar Oct 26 '22 05:10 sebastian-sz

First training run had ~70.1% top-1 accuracy for EfficientNetLiteB0. I am starting a new run with a higher batch size and a warmed-up cosine decay LR schedule, and I'll see how that goes. May need to add weight decay as well -- we shall see.
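The warmup + cosine decay schedule mentioned above can be sketched as follows (a minimal sketch; the step counts and base learning rate are made-up values, not the ones used in the actual run):

```python
import math

def lr_at_step(step, base_lr=0.1, warmup_steps=500, total_steps=10000):
    # Linear warmup from 0 up to base_lr over the first warmup_steps,
    # then cosine decay from base_lr down to 0 at total_steps.
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))
```

Warmup avoids large, destabilizing updates early in training (especially with big batch sizes), while the cosine tail anneals the learning rate smoothly to zero.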

ianstenbit avatar Oct 27 '22 20:10 ianstenbit

After a few more training runs, I'm not able to get SOTA weights for this model just yet.

That said, the implementation looks correct to me. I think weight regularization in our basic_training script is probably necessary to get good scores here. @LukeWood are you fine with merging this? I am still actively working on getting good weights.
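One common form of weight regularization is decoupled weight decay (the AdamW/SGDW style). A minimal sketch of a single SGD update step with decay, not the actual basic_training implementation:

```python
def sgd_weight_decay_step(w, grad, lr=0.1, weight_decay=1e-4):
    # Standard gradient step, plus a decoupled shrinkage of the weight
    # toward zero that is independent of the loss gradient.
    return w - lr * grad - lr * weight_decay * w
```

In Keras this is typically achieved either via an optimizer's weight-decay option or by attaching an L2 kernel_regularizer to the layers.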

ianstenbit avatar Nov 02 '22 17:11 ianstenbit

/gcbrun

ianstenbit avatar Nov 02 '22 19:11 ianstenbit

@ianstenbit Sorry to hear about the results. If I have more time, I will probably look into it too.

Thanks for the merge!

sebastian-sz avatar Nov 03 '22 06:11 sebastian-sz