Trouble loading TorchSharp Linux CUDA into F# Interactive
@gbaydin spotted problems with dynamically loading TorchSharp for Linux + CUDA into F# Interactive. This will also hit .NET Notebooks.
- If there is a #I "include-path" directive that includes LibTorchSharp and libtorch-cpu-* CPU binaries, then they will be preferred and the CUDA load will fail.
- The TorchSharp native component loader for F#/.NET Interactive has an explicit version number wired into it for the matching libtorch packages to look for. This version number hasn't been updated, and we should pick it up from the TorchProperties build instead of hard-wiring it into the source: https://github.com/dotnet/TorchSharp/blob/decd474288196e8f4119991a69def32f0e106eff/src/TorchSharp/Torch.cs#L17
- The diagnostic given when a libtorch CPU backend is loaded during CUDA initialization could be much more detailed. Currently it fails with "System.InvalidOperationException: Torch device type CUDA did not initialise on the current machine." Instead, it could report that a CPU libtorch was loaded, give the location of the native DLLs that were loaded, and so on.
We think fixing these together will solve the problem.
Typical failure:
TorchSharp: LoadNativeBackend: Initialising native backend
TorchSharp: LoadNativeBackend: Try loading torch_cuda native component
TorchSharp: LoadNativeBackend: Loading LibTorchSharp
TorchSharp: LoadNativeBackend: Loaded LibTorchSharp, ok = False
TorchSharp: LoadNativeBackend: Native backend not found in application loading TorchSharp directly from packages directory.
TorchSharp: LoadNativeBackend: Trying dynamic load for .NET/F# Interactive by consolidating native libtorch-cuda-11.1-linux-x64-* binaries to /home/gunes/.nuget/packages/torchsharp/0.91.52719/lib/netcoreapp3.1/cuda-11.1...
CopyNativeComponentsIntoSingleDirectory: packagesDir = /home/gunes/.nuget/packages
System.NotSupportedException: The libtorch-cuda-11.1-linux-x64 package version 1.9.0.7 is not restored on this system. If using F# Interactive or .NET Interactive you may need to add a reference to this package, e.g.
#r "nuget: libtorch-cuda-11.1-linux-x64, 1.9.0.7"
at TorchSharp.torch.LoadNativeBackend(Boolean useCudaBackend) in TorchSharp.dll:token 0x60001be+0x39a
at TorchSharp.torch.TryInitializeDeviceType(DeviceType deviceType) in TorchSharp.dll:token 0x60001bf+0x0
at TorchSharp.torch.InitializeDeviceType(DeviceType deviceType) in TorchSharp.dll:token 0x60001c1+0x0
at TorchSharp.torch.InitializeDevice(Device device) in TorchSharp.dll:token 0x60001c2+0xa
at <StartupCode$FSI_0002>.$FSI_0002.main@() in RefEmit_InMemoryManifestModule:token 0x600000b+0x92
Stopped due to error
To turn on tracing of the load process we used this:
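open System.Diagnostics
// Send Trace output (including the TorchSharp loader messages) to the console
let tracer = new ConsoleTraceListener()
// Add returns the index of the new listener; discard it to avoid a warning in scripts
Trace.Listeners.Add(tracer) |> ignore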
To test out one of the load-native-library probes we used this:
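open TorchSharp
open System.Runtime.InteropServices
// The assembly whose location and attributes drive the native library search
let assembly = typeof<torch>.Assembly
// A null search path means the default DllImport search rules are used;
// the out parameter (the native handle) is returned as the second tuple element
let ok, result = NativeLibrary.TryLoad("LibTorchSharp", assembly, System.Nullable())
printfn $"ok = {ok}, result = {result}"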
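To check which libtorch binaries actually ended up in the process (the kind of information the diagnostic point above asks for), something like the following sketch could be run after a load attempt. It uses only System.Diagnostics from the base class library; the helper name loadedModulesMatching is made up for illustration and is not part of TorchSharp.

```fsharp
open System
open System.Diagnostics

/// Hypothetical helper (not a TorchSharp API): returns (module name, full path)
/// for every module loaded into the current process whose file name contains
/// the given fragment, ignoring case.
let loadedModulesMatching (fragment: string) =
    Process.GetCurrentProcess().Modules
    |> Seq.cast<ProcessModule>
    |> Seq.filter (fun m ->
        not (isNull m.ModuleName)
        && m.ModuleName.Contains(fragment, StringComparison.OrdinalIgnoreCase))
    |> Seq.map (fun m -> m.ModuleName, m.FileName)
    |> Seq.toList

// Print each matching native library together with the path it was loaded from
for name, path in loadedModulesMatching "torch" do
    printfn $"{name} -> {path}"
```

If the list shows libtorch CPU binaries coming from an include path rather than the consolidated cuda-11.1 directory, that would be consistent with the first point above. An empty list would suggest that no torch native library was loaded at all.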
@dsyme -- is this still a problem for DiffSharp?