
Make FTorch more robust/resilient against version mismatches

Open · TomMelt opened this issue 8 months ago · 0 comments

I think there are three important points to consider regarding PyTorch/LibTorch versioning:

  1. The version used for training
  2. The version that is used to script the model
  3. The version that FTorch is linked against (which will ultimately be used for inference)
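One way to keep track of stages 1 and 2 is to record both version strings next to the model itself. The sketch below assumes the training version was saved by the user at training time (e.g. alongside the checkpoint) and that the scripting version is `torch.__version__` at scripting time. The helper names and JSON keys are hypothetical, not an FTorch or PyTorch convention.

```python
import json

# Hypothetical key names; not an FTorch or PyTorch convention.
TRAINING_KEY = "training_torch_version"
SCRIPTING_KEY = "scripting_torch_version"


def encode_version_metadata(training_version: str, scripting_version: str) -> bytes:
    """Serialise the PyTorch versions used at stages 1 and 2 as UTF-8 JSON."""
    record = {TRAINING_KEY: training_version, SCRIPTING_KEY: scripting_version}
    return json.dumps(record, sort_keys=True).encode("utf-8")


def decode_version_metadata(blob) -> dict:
    """Parse the metadata back; accepts either bytes or str."""
    if isinstance(blob, bytes):
        blob = blob.decode("utf-8")
    return json.loads(blob)
```

The resulting bytes could, for example, be embedded in the TorchScript archive through the `_extra_files` argument of `torch.jit.save` and read back with `torch.jit.load(..., _extra_files=...)`, so the versions travel with the model file rather than in separate documentation.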

Anecdotal experience so far indicates that FTorch can be extremely sensitive to version differences, at least between stages 2 and 3. I haven't tested version differences involving stage 1, but I am concerned that these will also be an issue.

The problem is that FTorch will still run even if it is linked against a LibTorch version that differs from the version the model was TorchScripted with.

This leads to silent errors in the inference output, which in my experience have been off from the test reference data by the order of 10⁻³.

I would like FTorch to at least warn when the version the model was scripted with differs from the LibTorch version it is linked against.

I am not sure whether we can get end-to-end version checking, but by wrapping the C++ API I think I can at least check stages 2 and 3.
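A minimal sketch of the stage 2 vs stage 3 check, written in Python for brevity; in FTorch it would presumably live in the C++ wrapper. It assumes the scripting-time version string is available (e.g. `torch.__version__` recorded when the model was saved) and that the linked version can be read from LibTorch, for example from the `TORCH_VERSION` macro in `torch/version.h`. The names `parse_version` and `check_versions` are hypothetical and not part of FTorch.

```python
from __future__ import annotations

import warnings


# Hypothetical helper; not part of FTorch or PyTorch.
def parse_version(version: str, width: int = 3) -> tuple[int, ...]:
    """Turn e.g. '2.1.0+cu118' into (2, 1, 0).

    The local build suffix after '+' is dropped, each component keeps only
    its leading digits (so '0a0' -> 0), and missing components are padded
    with zeros so that '2.1' compares equal to '2.1.0'.
    """
    core = version.split("+", 1)[0]
    parts = []
    for piece in core.split(".")[:width]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    parts += [0] * (width - len(parts))
    return tuple(parts)


# Hypothetical helper; warns instead of failing, as requested in this issue.
def check_versions(scripted_with: str, linked_against: str, width: int = 3) -> bool:
    """Return True if the first `width` components match, else warn and return False."""
    if parse_version(scripted_with, width) == parse_version(linked_against, width):
        return True
    warnings.warn(
        f"model was TorchScripted with PyTorch {scripted_with}, but FTorch is "
        f"linked against LibTorch {linked_against}; inference results may differ",
        RuntimeWarning,
        stacklevel=2,
    )
    return False
```

With the default `width=3` any major, minor or patch difference triggers the warning (e.g. `check_versions("2.1.0", "2.2.1")`), while build suffixes such as `+cpu` are ignored; `width=2` would compare only major.minor. Given the sensitivity described above, the stricter default seems the safer choice.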

This is part of a wider discussion about the general problem of building and packaging FTorch (specifically, whether we can pre-install it on HPC systems as a loadable module).

TomMelt, Mar 10 '25 11:03