
[Discussion] Solutions for Loading pretrained models

Open · Shashi456 opened this issue · 0 comments

Currently, model loading is done by explicitly writing out the name of each tensor, as in the BERT and VGG loaders in #334. While VGG has only 16 layers, going forward with ResNet and other image models the depth will increase, and this would result in very verbose declarations. [Note: we do currently have a hacky approach of loading model blocks in a loop, but it's not entirely intuitive and is a workaround.]

We need to start looking at approaches to solve this soon. Looking at other frameworks:

1] PyTorch has a state_dict approach to saving models: a flat dictionary mapping parameter names to values is maintained, and on load the entries are matched by name and the model's values are replaced.

2] Keras has a naming approach: each layer carries a name, which is then used to match and load weights.
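To make the state_dict idea concrete, here is a minimal, hypothetical sketch in plain Swift: a checkpoint is just a flat name-to-weights dictionary, and loading means matching names and replacing values. `[Float]` stands in for a real tensor type, and all names (`Checkpoint`, `ToyModel`, `load(from:)`) are illustrative, not an actual swift-models API.

```swift
// Hypothetical sketch: a state_dict-style checkpoint as a flat
// name-to-weights dictionary. [Float] stands in for Tensor.
struct Checkpoint {
    var weights: [String: [Float]] = [:]
}

// A toy "model" whose parameters are keyed by name, so loading
// is just matching dictionary entries and replacing values.
struct ToyModel {
    var parameters: [String: [Float]] = [
        "conv1.weight": [0, 0, 0],
        "conv1.bias": [0],
    ]

    // Replace each parameter whose name appears in the checkpoint;
    // names in the checkpoint with no match are ignored.
    mutating func load(from checkpoint: Checkpoint) {
        for (name, value) in checkpoint.weights {
            if parameters[name] != nil {
                parameters[name] = value
            }
        }
    }
}

var model = ToyModel()
let checkpoint = Checkpoint(weights: ["conv1.weight": [1, 2, 3]])
model.load(from: checkpoint)
print(model.parameters["conv1.weight"]!)
```

The appeal of this scheme is that the loading code never has to spell out the model's structure; it only needs names to agree between the saved dictionary and the model.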

A naming approach seems viable in our case, and would be (sort of) trivial for us, since we could just add an optional name property to the Layer protocol. It has a two-fold advantage:

1] In saving and loading models.

2] In retrieving specific weights by name for particular use cases: the original neural style transfer paper retrieves specific weights of a layer and then does Gram-matrix normalization before computing the content loss, and DeepDream has a similar use case.
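The optional-name idea could be sketched as follows. This is not the actual swift-models Layer protocol; `NamedLayer`, `DenseStub`, and `weights(named:in:)` are hypothetical stand-ins, with `[Float]` again in place of a tensor type.

```swift
// Hypothetical sketch: layers carry an optional name, so specific
// weights can be retrieved by name, as style transfer and DeepDream
// use cases require.
protocol NamedLayer {
    var name: String? { get }
    var weights: [Float] { get }
}

struct DenseStub: NamedLayer {
    var name: String?
    var weights: [Float]
}

// Look up the weights of the first layer whose name matches;
// returns nil when no layer carries that name.
func weights(named target: String, in layers: [NamedLayer]) -> [Float]? {
    layers.first { $0.name == target }?.weights
}

let layers: [NamedLayer] = [
    DenseStub(name: "conv1_1", weights: [0.1, 0.2]),
    DenseStub(name: "conv2_1", weights: [0.3, 0.4]),
]
print(weights(named: "conv2_1", in: layers)!)
```

Because the name is optional, existing models that never set it would be unaffected, while models that opt in get both name-based checkpoint matching and name-based weight retrieval for free.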

Shashi456 · Mar 27 '20 07:03