
train from scratch on ucf101 dataset

Open leviswind opened this issue 6 years ago • 10 comments

We tried to train the I3D model on UCF101 from scratch, but it converges much more slowly, with a final validation accuracy of around 60%. Can you offer some suggestions on training the I3D model without ImageNet pretraining?

leviswind avatar Jan 16 '19 03:01 leviswind

Hi,

I think 50-60% accuracy is to be expected when training I3D from scratch on RGB on UCF101. If you do the same on flow, it should get ~80%. Averaging both, we got 88% in the last version of the Quo Vadis paper.

In summary, I think your training setup should be fine.

Best,

Joao


joaoluiscarreira avatar Jan 16 '19 10:01 joaoluiscarreira

How do we train I3D on optical flow with the ImageNet-pretrained model? Can you offer some details of training on the UCF101 dataset?

leviswind avatar Jan 17 '19 07:01 leviswind

Also, what should the convergence speed be when training on optical flow compared with RGB, with ImageNet pretraining? @joaoluiscarreira

leviswind avatar Jan 17 '19 07:01 leviswind

The way we did it was to inflate the weights of the ImageNet model into 3D, then train the model normally from there, without freezing batch norm. I think you can find code online for training the model if you search on Google. I seem to remember that the flow model converges faster, but this was a long time ago.

Best,

Joao


joaoluiscarreira avatar Jan 17 '19 09:01 joaoluiscarreira
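The inflation Joao describes can be sketched as follows. This is a minimal NumPy illustration, not the repository's actual code: a 2D kernel is repeated along a new time axis and rescaled by the temporal size, so a static video initially produces the same activations as the 2D network did on a single frame (the scheme from the Quo Vadis paper).

```python
import numpy as np

def inflate_conv_weights(w2d, time_dim):
    """Inflate 2D conv weights of shape (kh, kw, in_ch, out_ch) into 3D
    weights of shape (t, kh, kw, in_ch, out_ch) by repeating along the
    new time axis and dividing by t, so the temporal sum equals the
    original 2D kernel."""
    w3d = np.repeat(w2d[np.newaxis, ...], time_dim, axis=0)
    return w3d / time_dim
```

Summing the inflated kernel over the time axis recovers the original 2D weights, which is what preserves the response on a "boring" (temporally constant) video.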

However, the input to the first conv layer has 2 channels for flow data compared with 3 for RGB. How do we deal with the difference? @joaoluiscarreira. I really appreciate your help.

leviswind avatar Jan 17 '19 23:01 leviswind

I think we just discarded the weights for one of the input channels in that first layer, before inflating.

Best,

Joao


joaoluiscarreira avatar Jan 18 '19 09:01 joaoluiscarreira

Actually, I went back to check what exactly we did, and for that particular layer we averaged the original weights over the 3 input channels, then copied the average twice -- so the initial weights are the same for both flow input dimensions. But I think it did not make much of a difference compared to the other option.

Joao


joaoluiscarreira avatar Jan 18 '19 09:01 joaoluiscarreira
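The channel adaptation Joao describes can be sketched like this (an illustrative NumPy version, assuming a `(kh, kw, in_ch, out_ch)` weight layout; the actual checkpoint code may differ): average the kernel over its 3 RGB input channels, then duplicate the average for the 2 flow channels.

```python
import numpy as np

def rgb_to_flow_weights(w_rgb):
    """Adapt first-layer conv weights from 3 RGB input channels to 2
    flow channels: average over the input-channel axis (axis 2 in a
    (kh, kw, in_ch, out_ch) layout), then copy that average into both
    flow channels, so both start with identical weights."""
    mean = w_rgb.mean(axis=2, keepdims=True)   # (kh, kw, 1, out_ch)
    return np.repeat(mean, 2, axis=2)          # (kh, kw, 2, out_ch)
```

The alternative mentioned earlier in the thread, simply dropping one of the three channels, would be `w_rgb[:, :, :2, :]`; per Joao, the two options performed similarly.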

We suffered severe overfitting when training I3D on optical flow, similar to training on RGB data without ImageNet pretraining. The test accuracy is only 50%. Have you encountered such problems? @joaoluiscarreira

leviswind avatar Jan 19 '19 05:01 leviswind

As mentioned earlier in the thread, training from scratch on flow got close to 80%. You could try testing with batch statistics to see if there's a batch norm moving-average problem.


joaoluiscarreira avatar Jan 19 '19 08:01 joaoluiscarreira
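The diagnostic Joao suggests, evaluating with the current batch's statistics instead of the stored moving averages, can be sketched as follows. This is a hypothetical standalone NumPy helper, not the repository's code: if normalizing with batch statistics recovers accuracy while the moving averages do not, the moving averages are stale or mismatched.

```python
import numpy as np

def batchnorm(x, moving_mean, moving_var, use_batch_stats, eps=1e-5):
    """Normalize activations x of shape (N, C) either with stored
    moving averages (standard inference) or with the current batch's
    statistics -- a quick check for moving-average problems."""
    if use_batch_stats:
        mean, var = x.mean(axis=0), x.var(axis=0)
    else:
        mean, var = moving_mean, moving_var
    return (x - mean) / np.sqrt(var + eps)
```

In a framework like PyTorch or Sonnet, the equivalent check is to keep the batch norm layers in training mode at evaluation time while freezing everything else.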

@leviswind Hi, were you able to train I3D on UCF101 successfully? I want to use I3D on UCF101. How can I fine-tune the I3D model on UCF101? Where is the training code? Can you give me some advice? Thanks. Best wishes.

poweryin avatar May 13 '19 08:05 poweryin