MLEM-loaded model performs consistently worse
I have a PyTorch text classification model whose architecture I cannot disclose. When the model is loaded directly with its own library, it consistently performs slightly better than the same model saved and then re-loaded with MLEM. As detailed in the Discord discussion with @aguschin:
It's a PyTorch sequence classification model. I ran the eval four times each on:
- the original model
- the mlem_model saved and loaded with:
```python
# load the model with its PyTorch model class
model = MyModel.from_pretrained('./model_path')

# save with MLEM
from mlem.api import save
save(model, "./checkpoints/v070_mlem")

# load back with MLEM
from mlem.api import load
mlem_model = load("./checkpoints/v070_mlem")
```
Each evaluation ran on 5k samples, giving these accuracies:
- original:
- 0.7868
- 0.7874
- 0.7844
- 0.7864
- mlem_model:
- 0.7778
  - 0.7830
- 0.7808
- 0.7816
So almost the same, but the MLEM-loaded model is consistently lower, by about 0.55 percentage points on average.
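For reference, the average gap can be recomputed directly from the numbers quoted above:

```python
# Accuracies as reported in the issue
original = [0.7868, 0.7874, 0.7844, 0.7864]
mlem = [0.7778, 0.7830, 0.7808, 0.7816]

orig_mean = sum(original) / len(original)  # 0.78625
mlem_mean = sum(mlem) / len(mlem)          # 0.7808
gap = orig_mean - mlem_mean                # ~0.0055, i.e. ~0.55 percentage points

print(f"original={orig_mean:.4f} mlem={mlem_mean:.4f} gap={gap:.4f}")
```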
@aguschin mentioned:
I think we can [try] one of Pytorch examples to see if this can be reproduced there. If that won't help us, we can try to dig deeper into some specifics.
Let me know if you need any additional info.
@mike0sv do you have any ideas why this could be the case?
Under the hood, MLEM saves and loads models with torch.save and torch.load (or torch.jit.save and torch.jit.load). We do not do anything else with the model. Can you confirm that this logic is to blame by running something like this
```python
# load the model with its PyTorch model class
model = MyModel.from_pretrained('./model_path')

# save and re-load with plain torch
torch.save(model, "...")
model = torch.load("...")
```
and running evaluation?
If `isinstance(model, torch.jit.ScriptModule)`, use the `torch.jit` variants instead.
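As a side note, a torch.save/torch.load round trip should preserve the weights bit-for-bit, which can be checked directly instead of (or in addition to) re-running the evaluation. A minimal sketch, using a hypothetical stand-in module since the real architecture is undisclosed:

```python
import os
import tempfile

import torch
import torch.nn as nn

# Hypothetical stand-in for the undisclosed classifier; any nn.Module works
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 2))
model.eval()

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.pt")
    torch.save(model, path)
    # weights_only=False is needed on newer PyTorch to unpickle a full module
    loaded = torch.load(path, weights_only=False)
    loaded.eval()

# Weights should be bit-identical after the round trip
weights_match = all(
    torch.equal(param, loaded.state_dict()[name])
    for name, param in model.state_dict().items()
)

# And so should the outputs on a fixed input (eval mode disables dropout)
x = torch.randn(4, 8)
with torch.no_grad():
    outputs_match = torch.equal(model(x), loaded(x))

print(weights_match, outputs_match)
```

If either check fails only for the MLEM path but passes for plain torch, that would point at something MLEM does around serialization rather than serialization itself.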
That yielded:
- 0.7832
- 0.7862
- 0.7888
- 0.7892
Consistent with the original model's performance.