torchmd-net icon indicating copy to clipboard operation
torchmd-net copied to clipboard

Add version hyperparameter to the checkpoints.

Open RaulPPelaez opened this issue 1 year ago • 22 comments

Check version when loading a model.

Print a warning if the checkpoint is loaded with a version different to the one used to create it.

Closes #291

RaulPPelaez avatar Feb 20 '24 10:02 RaulPPelaez

@peastman could you please review this one?

RaulPPelaez avatar Feb 21 '24 08:02 RaulPPelaez

@stefdoerr I merged #291 here so that it is intertwined with the checkpoint version functionality.

RaulPPelaez avatar Feb 23 '24 10:02 RaulPPelaez

I've never used versioneer, but from their documentation it looks like it does basically the same thing your code did? When you're doing an official build by cloning the repository it should work, but if you're doing a local build in a repository you've been using for a long time the recent tags will be missing and it will get the version number wrong.

The only ways I can see to reliably keep the version number in the code in sync with the tag on github are 1) update it by hand, or 2) have something that runs on github to automatically update the code whenever you create a release.

In OpenMM we update the version number by hand, but then check for consistency in the recipe to build conda packages. If we forget to update the version number, the conda build will fail until we fix it.

https://github.com/conda-forge/openmm-feedstock/blob/8dfb41fddc4f2561d1adc29d9ace70656bb3045b/recipe/test_openmm.sh#L48-L56

peastman avatar Feb 23 '24 17:02 peastman

I don't understand why it's an issue. If you git pull it pulls all the tags from remote. I've never had an issue with this. If you do local builds with untagged commits, versioneer also adds a dirty-COMMITHASH to the version to indicate that you are not on a specific version.

(htmd)  sdoerr@yoga  ~/Work/moleculekit   main  ipython
In [1]: from moleculekit import __version__

In [2]: __version__
Out[2]: '1.8.21+3.g9f6320d'

stefdoerr avatar Feb 24 '24 09:02 stefdoerr

If you git pull it pulls all the tags from remote.

It only pulls tags if you specify the --tags option. Here's the tags in my local repository.

$ git tag
0.0.1
0.1.0
0.1.1
0.1.2
0.1.3
0.1.4
0.1.5
0.2.0
0.2.1
0.2.2
0.2.3
0.2.4

peastman avatar Feb 24 '24 16:02 peastman

Are we on different git versions? For me git pull works fine pulling tags

stefdoerr avatar Feb 26 '24 07:02 stefdoerr

This also slipped past me because I use magit, which pulls tags explicitly. Realistically versioneer cannot access more information than my initial solution, in the end only the git tag contains version information. tbh it starts to look like manual version is the best option. I like the idea of having a CI test checking the version, but if the check is done post-release then if one releases with an incorrect version it stays there forever.

What about this? https://github.com/marketplace/actions/auto-release This would make it so that a commit updating the version number is what creates a release automatically.

RaulPPelaez avatar Feb 26 '24 08:02 RaulPPelaez

Really, I don't see the issue. Git pull pulls the tags fine as I show above. If the issue only happens on forks (?) then make an alias if you want for git pull --tags. Seems to me like a very minor issue to end up having to manually write versions into files.

stefdoerr avatar Feb 26 '24 08:02 stefdoerr

Still, if we make the version depend on the git info, local installs could register things like "v2.5.3+some_hash-dirty" into checkpoints. At least some additional code to handle that is needed.

RaulPPelaez avatar Feb 26 '24 08:02 RaulPPelaez

Why would that cause an issue though? It is the exact version of that commit. If you prefer not to have the hash-dirty parse it out in torchmd-net when it does the version checks between ckpt and current version

stefdoerr avatar Feb 26 '24 08:02 stefdoerr

Git pull pulls the tags fine as I show above.

Git does not pull tags as I show above! :) I can't explain why it works one way for you and a different way for me. But we clearly can't assume it will.

I just checked my local torchmd-net repository on a different computer, and that one has no tags at all.

peastman avatar Feb 26 '24 17:02 peastman

Can you check git --version? I'm curious now. I'm on 2.34.1

stefdoerr avatar Feb 26 '24 17:02 stefdoerr

On my laptop:

$ git --version
git version 2.39.2 (Apple Git-143)

And on a server:

$ git --version
git version 2.34.1

peastman avatar Feb 26 '24 17:02 peastman

Did you perhaps clone the repository from the master torchmd-net repo rather than a personal fork?

peastman avatar Feb 26 '24 17:02 peastman

But you are on a fork, correct? So I assume you are also running git fetch upstream? I just feel like it's a bit of an edge case which users won't normally encounter. I never had that issue before and this issue only affects some devs.

stefdoerr avatar Feb 26 '24 17:02 stefdoerr

Yes I'm not on a fork. On a fork I assume it requires a few more steps to get everything in sync.

stefdoerr avatar Feb 26 '24 17:02 stefdoerr

Whenever I want to pull the latest code I do

git pull upstream main
git push

peastman avatar Feb 26 '24 17:02 peastman

this issue only affects some devs.

It affects anyone who creates a fork.

Possibly this really indicates a deeper issue with this PR: do we really not intend to provide any backward compatibility at all for models? If I train a model and post it for people to use, is it really expected that my model will only work with the exact version of torchmd-net I trained it with and no other?

peastman avatar Feb 26 '24 17:02 peastman

git fetch upstream seems to fetch the tags fine for me. Give it a try.

About the backwards comp I don't know.

The reason I'm arguing against hard-coding versions is that it's a chore and people forget to do their chores. If it takes one more command to fetch tags and it only affects developers I think it's better to not switch to hard-coded versions. Considering it's already two commands to pull from upstream, might as well make it three and throw them into an alias :)

stefdoerr avatar Feb 26 '24 18:02 stefdoerr

Possibly this really indicates a deeper issue with this PR: do we really not intend to provide any backward compatibility at all for models? If I train a model and post it for people to use, is it really expected that my model will only work with the exact version of torchmd-net I trained it with and no other?

Its not like we do not want to provide backwards comp. The code in this PR only prints a warning. Some default behaviors might change, some bugs might be fixed or names changed... For instance, last time I checked, this repo https://github.com/torchmd/torchmd-protein-thermodynamics cannot run with the latest version. It was written for something like 0.2.0, where some hparams had slightly different names. The changes are easy enough, but not having version information in the hparams makes it so one has to do a bit of archeology instead of simply "conda install torchmdnet==some_version".

Its intended to be simply informative.

RaulPPelaez avatar Feb 27 '24 12:02 RaulPPelaez

I will release version 2.0.0 soon, from which point we will be using semantic versioning, so we can relax the version comparison then. Perhaps not even printing a warning if a checkpoint with 2.1.0 is loaded in any 2.x.y version. I would like to add the version to the checkpoints at that point.

RaulPPelaez avatar Feb 27 '24 12:02 RaulPPelaez

I want to train a model and post it online for other people to use. Is that supposed to be a supported use case? A few years from now, will people who want to use it be expected to install obsolete versions of torchmd-net and all its dependencies? Will that even be possible? (New GPUs will require new versions of CUDA, which will require new versions of PyTorch.)

peastman avatar Feb 27 '24 16:02 peastman