Question about pretrained model
Hi Yabin, thanks for sharing the work! I have a question about backbone training. I read the code and found that the training step does not save the backbone weights, which means we have to use a pretrained backbone to extract features at inference time. What if I have no pretrained model? For example, if I want to train a video classification model in multiple stages, how should I start training without a pretrained backbone?
That's an excellent question. ESN may be able to work without a backbone pre-trained on a large dataset, but in my experiments, training from scratch on only the ten base classes gave poor results. Additionally, backbones pre-trained on relatively small datasets, like ImageNet, are now inexpensive to obtain and readily available. I'm not certain, but there are likely robust pre-trained video classification backbones you could try.
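If you do end up training your own backbone, one option is to save its weights together with the head so that inference no longer depends on an external pre-trained file. Below is a minimal PyTorch sketch of that idea, not code from this repository: `TinyBackbone`, `save_checkpoint`, and `load_checkpoint` are hypothetical names, and the tiny convolutional network only stands in for a real image or video backbone.

```python
import io

import torch
import torch.nn as nn


class TinyBackbone(nn.Module):
    """Hypothetical stand-in for a real image/video backbone."""

    def __init__(self, feat_dim: int = 16):
        super().__init__()
        self.conv = nn.Conv2d(3, feat_dim, kernel_size=3, padding=1)
        self.pool = nn.AdaptiveAvgPool2d(1)

    def forward(self, x):
        # (N, 3, H, W) -> (N, feat_dim)
        return self.pool(torch.relu(self.conv(x))).flatten(1)


def save_checkpoint(backbone, head, target):
    # Store backbone and head weights in one file so that inference
    # can restore both without a separate pre-trained model.
    torch.save({"backbone": backbone.state_dict(), "head": head.state_dict()}, target)


def load_checkpoint(backbone, head, source):
    state = torch.load(source, map_location="cpu")
    backbone.load_state_dict(state["backbone"])
    head.load_state_dict(state["head"])


torch.manual_seed(0)
backbone, head = TinyBackbone(), nn.Linear(16, 10)

buffer = io.BytesIO()  # in-memory file; a real run would pass a file path
save_checkpoint(backbone, head, buffer)

# Restore into freshly initialised modules, as an inference script would.
buffer.seek(0)
restored_backbone, restored_head = TinyBackbone(), nn.Linear(16, 10)
load_checkpoint(restored_backbone, restored_head, buffer)

x = torch.randn(2, 3, 8, 8)
with torch.no_grad():
    logits = head(backbone(x))
    restored_logits = restored_head(restored_backbone(x))
same = torch.allclose(logits, restored_logits)
```

For multi-stage training, the same checkpoint could be reloaded at the start of each later stage, with the backbone either frozen or fine-tuned depending on how much data that stage has.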