MLEM-loaded model performs consistently worse
I have a PyTorch text classification model whose architecture I cannot disclose. When the model is loaded directly with its own library, it consistently performs slightly better than the same model saved and then re-loaded with MLEM. As detailed in the Discord discussion with @aguschin:
It's a PyTorch sequence classification model. I ran the eval four times each on:
- the original model
- the mlem_model saved and loaded with:
```python
# load the model with its PyTorch model class
model = MyModel.from_pretrained('./model_path')

# save with MLEM
from mlem.api import save
save(model, "./checkpoints/v070_mlem")

# load back with MLEM
from mlem.api import load
mlem_model = load("./checkpoints/v070_mlem")
```
Each evaluation ran on 5k samples, giving these accuracies:
- original:
- 0.7868
- 0.7874
- 0.7844
- 0.7864
- mlem_model:
- 0.7778
  - 0.7830
- 0.7808
- 0.7816
So almost the same, but the MLEM-loaded model is consistently lower, by about 0.55 percentage points on average.
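For reference, the average gap can be recomputed directly from the numbers quoted above:

```python
# Accuracies as reported in the issue
original = [0.7868, 0.7874, 0.7844, 0.7864]
mlem = [0.7778, 0.7830, 0.7808, 0.7816]

orig_mean = sum(original) / len(original)  # 0.78625
mlem_mean = sum(mlem) / len(mlem)          # 0.7808
gap = orig_mean - mlem_mean                # ~0.0055, i.e. ~0.55 percentage points

print(f"original={orig_mean:.4f} mlem={mlem_mean:.4f} gap={gap:.4f}")
```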
@aguschin mentioned:
I think we can [try] one of Pytorch examples to see if this can be reproduced there. If that won't help us, we can try to dig deeper into some specifics.
Let me know if you need any additional info.
@mike0sv do you have any ideas why this could be the case?
Under the hood, MLEM saves and loads models with torch.save and torch.load (or torch.jit.save and torch.jit.load). We do not do anything else with the model. Can you confirm that this logic is to blame by running something like this
```python
# load the model with its PyTorch model class
model = MyModel.from_pretrained('./model_path')

# save and re-load with plain torch
torch.save(model, "...")
model = torch.load("...")
```
and running evaluation?
If `isinstance(model, torch.jit.ScriptModule)`, use the `torch.jit` variants instead.
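As a side note, a torch.save/torch.load round trip should preserve the weights bit-for-bit, which can be checked directly instead of (or in addition to) re-running the evaluation. A minimal sketch, using a hypothetical stand-in module since the real architecture is undisclosed:

```python
import os
import tempfile

import torch
import torch.nn as nn

# Hypothetical stand-in for the undisclosed classifier; any nn.Module works
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 2))
model.eval()

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.pt")
    torch.save(model, path)
    # weights_only=False is needed on newer PyTorch to unpickle a full module
    loaded = torch.load(path, weights_only=False)
    loaded.eval()

# Weights should be bit-identical after the round trip
weights_match = all(
    torch.equal(param, loaded.state_dict()[name])
    for name, param in model.state_dict().items()
)

# And so should the outputs on a fixed input (eval mode disables dropout)
x = torch.randn(4, 8)
with torch.no_grad():
    outputs_match = torch.equal(model(x), loaded(x))

print(weights_match, outputs_match)
```

If either check fails only for the MLEM path but passes for plain torch, that would point at something MLEM does around serialization rather than serialization itself.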
That yielded:
- 0.7832
- 0.7862
- 0.7888
- 0.7892
Consistent with the original model's performance.