Helix tests fail on latest TorchSharp (0.102.5) and its runtime

Open LittleLittleCloud opened this issue 1 year ago • 2 comments

Describe the bug

Microsoft.ML.TorchSharp.Tests fails in the following Helix tests when I update TorchSharp and its runtime to 0.102.5 and 2.2.1.1, respectively.

  • centos x64
  • ubuntu x64

The error message from the Helix tests indicates that some dependencies of libLibTorchSharp are missing. After turning on LD_DEBUG, it seems that one of the missing dependencies is GLIBC_2.34. Note that the image for the Helix tests is still CentOS Stream 8, whose glibc version is 2.28. This could be why the TorchSharp tests fail.

file=/temp/workitems/Microsoft.ML.TorchSharp.Tests/runtimes/linux-x64/native/libtorch_cpu.so
       260:
       260:     find library=libgomp-98b21ff3.so.1 [0]; searching
       260:      search path=/usr/local/lib:/usr/local/lib64            (LD_LIBRARY_PATH)
       260:       trying file=/usr/local/lib/libgomp-98b21ff3.so.1
       260:       trying file=/usr/local/lib64/libgomp-98b21ff3.so.1
       260:      search path=/temp/workitems/Microsoft.ML.TorchSharp.Tests/runtimes/linux-x64/native            (RUNPATH from file /temp/workitems/Microsoft.ML.TorchSharp.Tests/runtimes/linux-x64/native/libtorch_cpu.so)
       260:       trying file=/temp/workitems/Microsoft.ML.TorchSharp.Tests/runtimes/linux-x64/native/libgomp-98b21ff3.so.1
       260:
       260:     /lib64/libc.so.6: error: version lookup error: version `GLIBC_2.34' not found (required by /temp/workitems/Microsoft.ML.TorchSharp.Tests/runtimes/linux-x64/native/libLibTorchSharp.so) (fatal)
       260:     find library=libLibTorchSharp.so [0]; searching
       260:      search path=/usr/local/lib:/usr/local/lib64            (LD_LIBRARY_PATH)
       260:       trying file=/usr/local/lib/libLibTorchSharp.so
       260:       trying file=/usr/local/lib64/libLibTorchSharp.so
       260:      search cache=/etc/ld.so.cache
       260:      search path=/lib64/tls:/lib64:/usr/lib64/tls:/usr/lib64                (system search path)
       260:       trying file=/lib64/tls/libLibTorchSharp.so
       260:       trying file=/lib64/libLibTorchSharp.so
       260:       trying file=/usr/lib64/tls/libLibTorchSharp.so
       260:       trying file=/usr/lib64/libLibTorchSharp.so
       260:
       260:     find library=LibTorchSharp [0]; searching
       260:      search path=/usr/local/lib:/usr/local/lib64            (LD_LIBRARY_PATH)
       260:       trying file=/usr/local/lib/LibTorchSharp
       260:       trying file=/usr/local/lib64/LibTorchSharp
       260:      search cache=/etc/ld.so.cache
       260:      search path=/lib64/tls:/lib64:/usr/lib64/tls:/usr/lib64                (system search path)
       260:       trying file=/lib64/tls/LibTorchSharp
       260:       trying file=/lib64/LibTorchSharp
       260:       trying file=/usr/lib64/tls/LibTorchSharp
       260:       trying file=/usr/lib64/LibTorchSharp
       260:
       260:     find library=libLibTorchSharp [0]; searching
       260:      search path=/usr/local/lib:/usr/local/lib64            (LD_LIBRARY_PATH)
       260:       trying file=/usr/local/lib/libLibTorchSharp
       260:       trying file=/usr/local/lib64/libLibTorchSharp
       260:      search cache=/etc/ld.so.cache
       260:      search path=/lib64/tls:/lib64:/usr/lib64/tls:/usr/lib64                (system search path)
       260:       trying file=/lib64/tls/libLibTorchSharp
       260:       trying file=/lib64/libLibTorchSharp
       260:       trying file=/usr/lib64/tls/libLibTorchSharp
       260:       trying file=/usr/lib64/libLibTorchSharp
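
For anyone triaging similar logs, here is a minimal Python sketch that pulls the `GLIBC_x.y' not found` errors out of LD_DEBUG output and compares them with the glibc of the machine it runs on. The helper names (`missing_glibc_versions`, `host_glibc_version`) are made up for illustration and are not part of ML.NET or TorchSharp. The regex only assumes the message format shown in the log above.

```python
import platform
import re

# Matches the loader message format seen in the LD_DEBUG output above.
LOOKUP_ERROR = re.compile(
    r"version `GLIBC_(?P<version>\d+(?:\.\d+)+)' not found "
    r"\(required by (?P<library>[^)]+)\)"
)


def parse_version(text):
    """Turn '2.34' into (2, 34) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def missing_glibc_versions(log_text):
    """Return (library, required_version) pairs, one per lookup error."""
    return [
        (match.group("library"), parse_version(match.group("version")))
        for match in LOOKUP_ERROR.finditer(log_text)
    ]


def host_glibc_version():
    """Return the host glibc version as a tuple, or None if it is not glibc."""
    name, version = platform.libc_ver()
    if name == "glibc" and version:
        return parse_version(version)
    return None


# The error line from the log above, used as sample input.
sample = (
    "/lib64/libc.so.6: error: version lookup error: version `GLIBC_2.34' "
    "not found (required by /temp/workitems/Microsoft.ML.TorchSharp.Tests/"
    "runtimes/linux-x64/native/libLibTorchSharp.so) (fatal)"
)

for library, required in missing_glibc_versions(sample):
    print(library, "requires GLIBC", required, "host has", host_glibc_version())
```

Run on the Helix image itself, the host version should come back lower than the required one, which would match the 2.28 vs 2.34 mismatch described above.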

LittleLittleCloud, Jun 24 '24 16:06

The glibc dependency change in TorchSharp might have been introduced in this PR, which upgrades its build image from Ubuntu 18 to Ubuntu 22. I'm not familiar with C++, so this is just my guess.

  • https://github.com/dotnet/TorchSharp/commit/2c321846e79a2e3b7301a8b715fdb0f3410ee027#diff-7915b9b726a397ae7ba6af7b9703633d21c031ebf21682f3ee7e6a4ec52837a5R23
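
One way to check this guess would be to compare the highest glibc symbol version required by libLibTorchSharp.so in the old and new packages. The Python sketch below is only an illustration: it assumes binutils' `objdump` is installed, and the function names are hypothetical. It scans the `objdump -T` dynamic symbol table for `GLIBC_x.y` version tags and reports the largest one.

```python
import re
import subprocess

# GLIBC_2.34 style tags; GLIBC_PRIVATE and GLIBCXX_* are deliberately not matched.
GLIBC_TAG = re.compile(r"GLIBC_(\d+(?:\.\d+)+)")


def max_glibc_requirement(symbol_dump):
    """Return the highest GLIBC version tag in the dump as a tuple, or None."""
    versions = {
        tuple(int(part) for part in tag.split("."))
        for tag in GLIBC_TAG.findall(symbol_dump)
    }
    return max(versions) if versions else None


def dump_dynamic_symbols(library_path):
    """Return the text of `objdump -T` for a shared library (needs binutils)."""
    completed = subprocess.run(
        ["objdump", "-T", library_path],
        check=True,
        capture_output=True,
        text=True,
    )
    return completed.stdout


# Example call (the path is a placeholder, adjust to the extracted package):
# print(max_glibc_requirement(dump_dynamic_symbols("path/to/libLibTorchSharp.so")))
```

If the library from the older package reports a maximum at or below 2.28 and the 0.102.5 one reports 2.34, that would support the build image upgrade as the cause.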

LittleLittleCloud, Jun 24 '24 16:06

We need to make sure we have an answer for this for ML.NET 4.0. Ideally, we can update ML.NET to the latest TorchSharp without dropping support for any of our platforms.

ericstj, Jun 24 '24 20:06