FTorch
make FTorch more robust/resilient against version mismatching
I think there are 3 important points at which to consider PyTorch/libtorch versioning:
- The version used for training
- The version that is used to script the model
- The version that FTorch is linked against (which will ultimately be used for inference)
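To make the three stages concrete, here is a rough sketch of an end-to-end check, assuming the version string for each stage has somehow been recorded (FTorch does not store these today). `StageVersions` and `find_version_mismatches` are hypothetical names. It uses exact string comparison, the strictest possible policy, so `"2.1.0+cu118"` and `"2.1.0"` count as different.

```cpp
#include <string>
#include <vector>

// Hypothetical record of the PyTorch/libtorch version used at each stage.
// The strings would have to be captured by the user (e.g. torch.__version__).
struct StageVersions {
    std::string training;   // stage 1: version used to train the model
    std::string scripting;  // stage 2: version used to TorchScript the model
    std::string linked;     // stage 3: libtorch version FTorch is linked against
};

// Describe every pair of stages whose version strings differ exactly.
// An empty result means all three stages used identical version strings.
std::vector<std::string> find_version_mismatches(const StageVersions& v) {
    std::vector<std::string> mismatches;
    if (v.training != v.scripting) {
        mismatches.push_back("training (" + v.training + ") vs scripting (" +
                             v.scripting + ")");
    }
    if (v.scripting != v.linked) {
        mismatches.push_back("scripting (" + v.scripting + ") vs linked (" +
                             v.linked + ")");
    }
    if (v.training != v.linked) {
        mismatches.push_back("training (" + v.training + ") vs linked (" +
                             v.linked + ")");
    }
    return mismatches;
}
```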
Current anecdotal experience indicates that FTorch can be extremely sensitive to version differences between stages 2 and 3 at least. I haven't tested version differences involving stage 1, but I am concerned this will also be an issue.
The problem is that FTorch will still run even when linked against a libtorch version that differs from the one the model was TorchScripted with.
This leads to silent errors/differences in the inference output, which in my experience have been of order 10^-3 relative to the test reference data.
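Because these differences are silent, one way to catch them is to compare inference output against reference data with an explicit tolerance. The sketch below is a minimal, hypothetical helper (`max_abs_difference` and `matches_reference` are not part of FTorch). It assumes the output and reference have already been flattened into `std::vector<double>`, and the appropriate tolerance depends on the model and the floating-point precision in use.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Largest element-wise absolute difference between an inference result and
// the reference data it should reproduce. A NaN difference is returned
// immediately so that it can never pass a tolerance check.
double max_abs_difference(const std::vector<double>& output,
                          const std::vector<double>& reference) {
    if (output.size() != reference.size()) {
        throw std::invalid_argument("output and reference differ in size");
    }
    double max_diff = 0.0;
    for (std::size_t i = 0; i < output.size(); ++i) {
        const double diff = std::fabs(output[i] - reference[i]);
        if (std::isnan(diff)) {
            return diff;
        }
        max_diff = std::max(max_diff, diff);
    }
    return max_diff;
}

// True if every element agrees with the reference to within `tolerance`.
bool matches_reference(const std::vector<double>& output,
                       const std::vector<double>& reference,
                       double tolerance) {
    return max_abs_difference(output, reference) <= tolerance;
}
```

With a tolerance well below 10^-3, a discrepancy of the size described above would be reported instead of passing silently.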
I would like FTorch to at least warn when the version the model was scripted with differs from the libtorch version it is linked against.
I am not sure whether we can get end-to-end version checking, but I think by wrapping the C++ API I can at least check stages 2 and 3.
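A stage 2 vs stage 3 check could look roughly like the sketch below. It assumes the scripting-time version is saved alongside the model, for example via the `_extra_files` argument of `torch.jit.save` (e.g. `{"torch_version": torch.__version__}`), and read back with the `torch::jit::load` overload that accepts an `ExtraFilesMap`. The linked version could be assembled from the `TORCH_VERSION_MAJOR`/`MINOR`/`PATCH` macros in libtorch's `torch/version.h`. These are assumptions about how it might be wired up; `parse_torch_version` and `check_model_version` are hypothetical names, and the core logic below is plain C++ so it does not depend on libtorch.

```cpp
#include <iostream>
#include <sstream>
#include <string>
#include <tuple>

// Numeric part of a version string such as "2.1.0" or "2.1.0+cu118".
struct TorchVersion {
    int major = 0;
    int minor = 0;
    int patch = 0;
};

// Parse "MAJOR.MINOR.PATCH", ignoring any local suffix after '+'
// (Python wheels report e.g. "2.1.0+cu118"). Returns false if malformed.
bool parse_torch_version(const std::string& text, TorchVersion& out) {
    const std::string core = text.substr(0, text.find('+'));
    std::istringstream in(core);
    char dot1 = 0;
    char dot2 = 0;
    if (!(in >> out.major >> dot1 >> out.minor >> dot2 >> out.patch)) {
        return false;
    }
    return dot1 == '.' && dot2 == '.';
}

// Compare the version the model was scripted with (stage 2) against the
// libtorch version FTorch is linked against (stage 3). Writes a warning to
// `log` and returns false on a mismatch or if either string is unparseable.
bool check_model_version(const std::string& scripted_with,
                         const std::string& linked_against,
                         std::ostream& log = std::cerr) {
    TorchVersion a;
    TorchVersion b;
    if (!parse_torch_version(scripted_with, a) ||
        !parse_torch_version(linked_against, b)) {
        log << "Warning: could not parse version strings '" << scripted_with
            << "' and '" << linked_against << "'\n";
        return false;
    }
    if (std::tie(a.major, a.minor, a.patch) !=
        std::tie(b.major, b.minor, b.patch)) {
        log << "Warning: model was scripted with PyTorch " << scripted_with
            << " but FTorch is linked against libtorch " << linked_against
            << "; inference results may differ.\n";
        return false;
    }
    return true;
}
```

Only the numeric `MAJOR.MINOR.PATCH` triple is compared, because the Python version string usually carries a build suffix (such as `+cu118`) that a libtorch build may not report. Comparing down to the patch level is deliberately strict given how sensitive inference appears to be.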
This is part of a wider discussion around the general problem of building and packaging FTorch (specifically, whether we can pre-install it on HPC systems as a loadable module).