
clarifying differences between available models

Open zeke opened this issue 2 years ago • 2 comments

Hi @mehdidc 👋🏼 I'm a new team member at @replicate.

I was trying out your model on replicate.ai and noticed that the names of the models are a bit cryptic, so it's hard to know what differences to expect when using each:

[Screenshot: the model dropdown on replicate.ai listing the checkpoint filenames]

Here's where those are declared:

https://github.com/mehdidc/feed_forward_vqgan_clip/blob/dd640c0ee5f023ddf83379e6b3906529511ce025/predict.py#L10-L14

Looking at the source for cog's Input class, it looks like options can be a list of anything:

options: Optional[List[Any]] = None

I'm not sure if this is right, but maybe this means that each model could be declared as a tuple with an accompanying label:

MODELS = [
    ("cc12m_32x1024_vitgan_v0.1.th", "This model does x"),
    ("cc12m_32x1024_vitgan_v0.2.th" "This model does y"),,
    ("cc12m_32x1024_mlp_mixer_v0.2.th", "This model does z"),
]

We could then display those labels on the model form on replicate.ai to make the available options more clear to users.
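
For illustration only, here is one hypothetical way the (filename, label) pairs could be split into the values the predictor validates against and the labels the form displays; the labels are placeholders, and the exact cog wiring is assumed rather than taken from predict.py:

MODELS = [
    ("cc12m_32x1024_vitgan_v0.1.th", "This model does x"),
    ("cc12m_32x1024_vitgan_v0.2.th", "This model does y"),
    ("cc12m_32x1024_mlp_mixer_v0.2.th", "This model does z"),
]

# The bare filenames remain the option values the predictor accepts...
MODEL_NAMES = [name for name, _ in MODELS]

# ...while the labels live in a lookup table the replicate.ai form (or help text) can use.
MODEL_LABELS = dict(MODELS)

def label_for(model_name):
    # Fall back to the raw filename if no label is registered for it.
    return MODEL_LABELS.get(model_name, model_name)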

Curious to hear your thoughts!

cc @CJWBW @bfirsh @andreasjansson

zeke avatar Sep 27 '21 17:09 zeke

Hi @zeke, sorry for my late answer, and thanks for the proposal, you are absolutely right, the model names are not very informative. The thing is that the models do the same thing in a sense (they are also trained on the same prompt dataset); it's just that the architecture differs (vitgan vs mlp_mixer), and between v0.1 and v0.2 I used a different set of data augmentations. The reason they are all provided is that a user might prefer one over another for a specific prompt. One way to avoid the naming issue would be to not provide a model choice explicitly at all, but rather display a grid of images as the output, as in ICGAN (https://replicate.ai/arantxacasanova/ic_gan), where each cell of the grid would be the image generated by one of the models.
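
Just to illustrate the grid idea (this is not code from the repo, and it assumes the per-model generations already exist as PIL images), tiling the outputs could look roughly like this:

from PIL import Image

def tile_outputs(images, cell_size=256):
    # One generated image per model, tiled into a single horizontal strip.
    cells = [img.resize((cell_size, cell_size)) for img in images]
    grid = Image.new("RGB", (cell_size * len(cells), cell_size))
    for i, cell in enumerate(cells):
        grid.paste(cell, (i * cell_size, 0))
    return grid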

So I am not totally sure yet; I will think about it. If you or anyone else has other suggestions, I would be glad to hear them.

mehdidc avatar Oct 01 '21 11:10 mehdidc

@mehdidc @zeke

The distinguishing information is:

- modelType: ["mlp_mixer", "vitgan"] -> basically "experimental (mlp_mixer) versus established (vitgan)"
- version: ["v0.1", "v0.2"] -> not sure what the precise differences are here, @mehdidc?
- dimension: [128, 256, 512, 1024] -> correlates directly with the accuracy of the model; bigger is better, but slower
- depth: [8, 16, 32] -> number of hidden layers; correlates directly with the accuracy of the model; bigger is better, but slower

This info is contained in the filename (albeit cryptically). The format is {dataset}_{depth}x{dimension}_{type}_{version}, where the curly-braced parts are placeholders. So cc12m_32x1024_vitgan_v0 gives you: dataset cc12m, depth 32, dimension 1024, type vitgan, version v0.
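
A small sketch of that parsing, purely illustrative (the regex and helper are not part of the repo):

import re

# Matches e.g. "cc12m_32x1024_vitgan_v0.1.th" or "cc12m_32x1024_mlp_mixer_v0.2.th".
FILENAME_PATTERN = re.compile(
    r"(?P<dataset>[^_]+)_(?P<depth>\d+)x(?P<dimension>\d+)_"
    r"(?P<type>vitgan|mlp_mixer)_(?P<version>v[\d.]+)\.th"
)

def describe_checkpoint(filename):
    # Returns e.g. {"dataset": "cc12m", "depth": "32", "dimension": "1024",
    #               "type": "vitgan", "version": "v0.1"}.
    match = FILENAME_PATTERN.fullmatch(filename)
    if match is None:
        raise ValueError("unrecognized checkpoint filename: " + filename)
    return match.groupdict()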

From skimming your post, @zeke, am I correct in assuming you have a somewhat limited API to work with on replicate? There are a few ways this information could be presented. Perhaps the easiest would be to summarize it and make it easy to get to from replicate.

afiaka87 avatar Oct 03 '21 13:10 afiaka87