
[Bug]: EMA adjusts with epoch count, not "ema update step interval"

Open ppbrown opened this issue 1 year ago • 7 comments

What happened?

I had previously wondered why changing the "ema update step interval" seemed to make no difference.

Today I found out why: changing the epoch count changes EMA behaviour.

What did you expect would happen?

Sample images should not change when I change the epoch count, if I have scheduler=linear, etc.

Relevant log output

No response

Output of pip freeze

No response

ppbrown avatar Jun 14 '24 12:06 ppbrown

If you use a learning rate scheduler other than constant, the epochs will be used to calculate the schedule. Different learning behavior should be expected.

Nerogar avatar Jun 16 '24 08:06 Nerogar
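
A minimal sketch of why this happens, assuming a standard linear schedule (illustrative only, not OneTrainer's actual code; `steps_per_epoch` and the learning rates are made-up numbers): the total step count, derived from the epoch count, sets the slope of the schedule, so the same step number gets a different learning rate in runs with different epoch counts.

```python
# Sketch only: how a linear LR schedule depends on the total step count,
# which is derived from the epoch count.

def linear_lr(base_lr: float, step: int, total_steps: int) -> float:
    """Linearly decay the learning rate from base_lr to 0 over total_steps."""
    return base_lr * max(0.0, 1.0 - step / total_steps)

steps_per_epoch = 250  # hypothetical: dataset size / batch size

# Same step, different epoch counts -> different learning rates.
print(linear_lr(1e-4, step=2500, total_steps=20 * steps_per_epoch))   # 20-epoch run:  5e-05
print(linear_lr(1e-4, step=2500, total_steps=100 * steps_per_epoch))  # 100-epoch run: 9e-05
```

Since the EMA averages the weights produced by those differently-scheduled steps, the EMA model diverges as well, even though the EMA settings themselves are unchanged.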

Then this is a documentation bug rather than a behaviour bug. The tooltip should mention this. Otherwise, the term says "steps", so the reasonable expectation is that it actually means "training steps".

And/or rename the term to something that doesn't have "steps" in the name. If you just remove "step" and call it "Update interval", that's a start.

ppbrown avatar Jun 16 '24 13:06 ppbrown

The EMA step interval is an optimization option. It's not supposed to have a meaningful effect on the result. Higher values increase training speed, but can reduce the quality of the EMA model.

Nerogar avatar Jun 16 '24 13:06 Nerogar
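
For context, a minimal sketch of an interval-based EMA update, assuming a standard exponential moving average of the weights (not OneTrainer's actual implementation): raising the decay to the power of the interval compensates for the skipped updates, which is why the interval is intended as a speed/quality trade-off rather than something that changes training behaviour.

```python
# Sketch of an interval-based EMA update; the decay compensation is the key idea.
import torch

@torch.no_grad()
def ema_update(ema_params, model_params, decay: float, step: int, interval: int = 1):
    if step % interval != 0:
        return  # skip the parameter copy on most steps -> faster training
    effective_decay = decay ** interval  # compensate for the skipped updates
    for ema_p, p in zip(ema_params, model_params):
        ema_p.mul_(effective_decay).add_(p, alpha=1.0 - effective_decay)
```

The approximation degrades when the weights change substantially within a single interval, which is presumably where the quality loss at large intervals comes from.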

All the tuning settings are related, and that affects how I have to organize my strategy for evaluating runs. I am tuning on some subtle features. Previously I was training with the linear scheduler. When doing comparison runs, I could set my EMA value, choose whatever total number of epochs I liked, and things would progress consistently: epoch 10 of run A would be reasonably consistent with epoch 10 of run B.

I changed to adafactor and could do the same thing... IF I don't enable EMA.

But if I turn on EMA (which I need in some cases) and I only actually want a short run, then for consistency I have to always set the total number of epochs to 100, even if I only really want a 20 or 30 epoch run for some sets. Very annoying.

ppbrown avatar Jun 16 '24 14:06 ppbrown

ppbrown, does EMA only work well with a large epoch count?

oO0 avatar Jun 16 '24 22:06 oO0

Depends on the dataset (and dataset size) and the current learning rate.

Sometimes it works best for what I'm doing with epochs=20; other times, more like 100. The higher the LR I'm using, the stronger the EMA effect I may need.

Going from memory now, but if I recall correctly:

For high learning rates, I typically want a shorter epoch count. But I then usually also need a stronger EMA effect... which needs a LONGER epoch count.

ppbrown avatar Jun 16 '24 22:06 ppbrown
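
Some back-of-the-envelope numbers on that trade-off, using the common rule of thumb that an EMA with decay d averages over roughly 1/(1-d) recent updates (the steps-per-epoch figure below is made up; actual values depend on dataset and batch size):

```python
# Rough arithmetic only: how large an averaging window different decay values imply.
steps_per_epoch = 250  # hypothetical

for decay in (0.99, 0.999, 0.9999):
    window = 1.0 / (1.0 - decay)
    print(f"decay={decay}: averages over ~{window:.0f} updates "
          f"(~{window / steps_per_epoch:.1f} epochs at {steps_per_epoch} steps/epoch)")
```

At a fixed decay, the same averaging window covers a very different fraction of a 20-epoch run versus a 100-epoch run, which fits the observation above that the "right" EMA settings shift with run length and learning rate.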

Funny thing: I just noticed that the importance of adjusting EMA strength based on learning rate is mentioned in this paper:

https://arxiv.org/abs/2312.02696

ppbrown avatar Jun 28 '24 13:06 ppbrown

Added link https://github.com/Nerogar/OneTrainer/wiki/Training

O-J1 avatar Feb 03 '25 04:02 O-J1