Metalhead.jl code style

trafficstars

I'm having a hard time understanding how a simple model like ResNet(18) is created.

I think in general for a library like this it would be good to favor

simplicity
readability
locality

at the cost of some code duplication and verbosity. This is basically the philosophy adopted by hugging face transformers. So each model should have its own file and define its own submodules.

Apr 24 '23 06:04 CarloLucibello

This is exactly the opposite approach of what we set out to do with the Metalhead re-write in 0.6. Copying and pasting a bottleneck block over and over is good for pedagogy or examples like in the model zoo, but it isn't great software practice. If we saw that level of repetition in a codebase, someone would end up writing tooling or a framework to generalize that.

There's a balance between HF transformers and generalizing to the point of confusion. Perhaps we've gone too far in one direction, but right now I think code organization/naming and documentation are what's keeping the current codebase from being clear to fresh eyes.

Apr 24 '23 11:04 darsnack

We made an effort to separate the building blocks from the models. But maybe we can reorganize to have the building block files (mbconv.jl, etc.) live right next to the model file that uses them. So still following the model + sub-module organization that Carlo suggested above.

Apr 24 '23 12:04 darsnack

I think most of the readability question can easily be addressed by good docs. I was short on time and secretly hoping that Pollen would be more mature by the time I could find a block to work on this, but it looks like that might take a while longer. I am busy for a little bit more but over the course of this summer I hope to polish the docs, make other improvements and get a proper v0.8 release in.

Apr 24 '23 12:04 theabhirath

It's not all on you. I will make time this week to do these cleanup steps so that additional contributors (like Carlo) can help, and we can release 0.8 soon (it is way overdue).

Apr 24 '23 12:04 darsnack

now that I'm more familiar with the library I can see the logic of it

Apr 25 '23 18:04 CarloLucibello

Reopening with another stylistic comment. Models here rely a lot on Chain, Parallel and SkipConnection instead of defining custom blocks and their forward pass. This is fine, makes the implementation concise, but as a byproduct the instantiated model becomes very hard to "navigate" by field access.

As an example, below I access a specific layer inside two equivalent implementations of ViT, one from here and the other from torchvision:

julia> pymodel.encoder.layers[0].ln_1
Python LayerNorm: LayerNorm((768,), eps=1e-06, elementwise_affine=True)

julia> jlmodel.layers[1][5][1][1].layers[1]
LayerNorm(768)      # 1_536 parameters

Apr 27 '23 03:04 CarloLucibello

One thing we want to do but have not done yet is to use the named syntax for Chain and Parallel. So we should have

Chain(resblock0 = Parallel(..., residual = ..., bottleneck = ...), ...)

Similar to Flax but without defining new types or monolith types that "do it all." Part of the reason that syntax got added to Flux is to address this style issue in Metalhead.

Apr 27 '23 10:04 darsnack

Metalhead.jl Metalhead.jl copied to clipboard

code style

Metalhead.jl
Metalhead.jl copied to clipboard