torch Building lantern with CUDA and existing libtorch

I'll split this in three sections:

1. allowing for existing libtorch to be used when building lantern
2. bug in pytorch cmake config
3. allowing for existing libtorch to be used when installing torch

We could allow people to specify a LIBTORCH_DIR and pass it to cmake as -DCMAKE_PREFIX_PATH in buildlantern.R. Here I'm hardcoding the path, but it could come from an env var:

withr::with_dir("lantern/build", {
    system("cmake -DCMAKE_PREFIX_PATH=/home/key/libtorch ..")
    # ...
  })
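
A minimal sketch of the env-var variant, assuming the variable is called LIBTORCH_DIR as suggested above. The helper name cmake_configure_cmd() is hypothetical and not part of buildlantern.R:

```r
# Hypothetical helper: build the cmake configure command from LIBTORCH_DIR.
# Falls back to a plain configure step when the variable is unset or empty.
cmake_configure_cmd <- function(libtorch_dir = Sys.getenv("LIBTORCH_DIR")) {
  if (!nzchar(libtorch_dir)) {
    return("cmake ..")
  }
  # normalizePath() expands "~"; shQuote() protects paths containing spaces.
  prefix <- normalizePath(libtorch_dir, mustWork = FALSE)
  paste0("cmake -DCMAKE_PREFIX_PATH=", shQuote(prefix), " ..")
}

# Possible use inside buildlantern.R:
# withr::with_dir("lantern/build", system(cmake_configure_cmd()))
```

With LIBTORCH_DIR unset, this degrades to the plain configure step, so setups without a local libtorch would not need to change anything.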

I'm running into the same Pytorch-level bug in cmake config I documented here:

https://github.com/mlverse/torch/issues/3 https://github.com/mlverse/lantern/issues/11

Basically, this only affects people whose CUDA installation is not under /usr/local/cuda. Still, it is a bug that, it seems, was never resolved in PyTorch: see https://discuss.pytorch.org/t/libtorch-cmake-issues/28246/7.

My workaround is the two sed lines between the calls to cmake, here:

withr::with_dir("lantern/build", {
    system("cmake -DCMAKE_PREFIX_PATH=/home/key/libtorch ..")
    system("sed -i 's;/local/cuda;;g' CMakeFiles/lantern.dir/build.make")
    system("sed -i 's;/local/cuda;;g' CMakeFiles/example-app.dir/link.txt")
    system("cmake --build . --target lantern --config Release --parallel 8")  
  })
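
The two sed calls rely on GNU sed's in-place flag; BSD/macOS sed expects `-i ''` instead. A rough pure-R equivalent of the same substitution could look like this, where strip_from_file() is a hypothetical helper and not something in the repo:

```r
# Hypothetical helper: remove a literal substring from every line of a file,
# mirroring `sed -i 's;/local/cuda;;g' <file>` without depending on sed.
strip_from_file <- function(path, pattern = "/local/cuda") {
  lines <- readLines(path, warn = FALSE)
  # fixed = TRUE treats the pattern as plain text, not a regular expression.
  writeLines(gsub(pattern, "", lines, fixed = TRUE), path)
  invisible(path)
}

# Possible use between the two cmake calls:
# strip_from_file("CMakeFiles/lantern.dir/build.make")
# strip_from_file("CMakeFiles/example-app.dir/link.txt")
```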

With the changes above, install_torch in buildlantern.R will run, but it will download the 800M zip for nothing. I would suggest that if a user has set LIBTORCH_DIR, we copy the libs directly from there.
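
One way the copy step might look, as a sketch only. It assumes the shared libraries sit in a lib/ subdirectory of LIBTORCH_DIR and that the target is the deps directory used later in the build; copy_libtorch_libs() is a hypothetical name:

```r
# Hypothetical helper: copy libraries from an existing libtorch into `dest`
# instead of downloading the zip. Assumes they live in <libtorch_dir>/lib.
copy_libtorch_libs <- function(libtorch_dir = Sys.getenv("LIBTORCH_DIR"),
                               dest = "deps") {
  lib_dir <- file.path(libtorch_dir, "lib")
  if (!nzchar(libtorch_dir) || !dir.exists(lib_dir)) {
    return(FALSE)  # nothing to copy; the caller falls back to the download
  }
  dir.create(dest, recursive = TRUE, showWarnings = FALSE)
  libs <- list.files(lib_dir, full.names = TRUE)
  libs <- libs[!dir.exists(libs)]  # plain files only, skip subdirectories
  length(libs) > 0 && all(file.copy(libs, dest, overwrite = TRUE))
}
```

The function returns FALSE whenever LIBTORCH_DIR is unset or has no lib/ directory, so the existing download path could stay as the default.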

Jul 07 '20 19:07 skeydan

Note (in case I manage to somehow get rid of this local file ;-)) - this custom buildlantern.R seems to make my build survive pulls ;-))

# git pull

withr::with_dir("lantern/build", {
    system("cmake -DCMAKE_PREFIX_PATH=/home/key/libtorch ..")
    system("sed -i 's;/local/cuda;;g' CMakeFiles/lantern.dir/build.make")
    system("sed -i 's;/local/cuda;;g' CMakeFiles/example-app.dir/link.txt")
    system("cmake --build . --target lantern --config Release --parallel 8")  
  })

# copy lantern
source("R/lantern_sync.R")
lantern_sync(TRUE)  

# copy deps to inst
if (fs::dir_exists("inst/deps")) fs::dir_delete("inst/deps/")
fs::dir_copy("deps/", new_path = "inst/deps/")

# restart session

devtools::load_all()

Jul 09 '20 13:07 skeydan

For 1) IMHO the download is fine since it's only done once (the first time you clone the repo and run buildlantern) and it's definitely what we want on the CI and on most setups...

For 2) I think we should open a bug-report in the pytorch repo and see if they are inclined to fix it before doing anything on our side.

For 3) I think it's already possible to use a pre-installed libtorch version using the TORCH_HOME env var, see: https://github.com/mlverse/torch/blob/master/R/install.R#L62-L86

Jul 09 '20 22:07 dfalbel

would also solve the ARM / Jetson install issue for me - see https://github.com/znmeb/edgyR/issues/32

Oct 02 '20 16:10 znmeb

I might have a weird edge case, but using {renv} means every project that uses {torch} installs everything all over again. Sure, {torch} is kept in the cache, but the underlying libtorch keeps needing to be reinstalled.

Then, to make matters worse, my server happens to have an oddly slow internet connection, so the download takes longer than the 60-second timeout. So a local option would be good, since I successfully downloaded the zip using wget. Or at least a way to change the timeout duration, beyond setting options(timeout=600), which may not be obvious to users.
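
For reference, a small sketch of scoping the longer timeout to a single call rather than the whole session. with_longer_timeout() is a hypothetical helper written in base R, not something {torch} provides:

```r
# Hypothetical helper: raise R's download timeout only while `code` runs,
# then restore whatever value was set before.
with_longer_timeout <- function(seconds, code) {
  old <- options(timeout = seconds)
  on.exit(options(old), add = TRUE)
  force(code)  # `code` is evaluated lazily, so it sees the new timeout
}

# Possible use:
# with_longer_timeout(600, torch::install_torch())
```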

Nov 30 '20 06:11 jaredlander

That makes sense! In theory you could do something like:

install_torch(path = "~/libtorch")

and then use the TORCH_HOME="~/libtorch" env var to always use the same installation directory.
Does that work for your use case?
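
A sketch of how the variable could be made permanent, assuming TORCH_HOME is honored as described above. set_torch_home() is a hypothetical helper that writes the entry to an .Renviron file so later R sessions pick it up:

```r
# Hypothetical helper: record TORCH_HOME in an .Renviron file and apply it
# to the current session as well.
set_torch_home <- function(dir, renviron = "~/.Renviron") {
  dir <- path.expand(dir)
  renviron <- path.expand(renviron)
  existing <- if (file.exists(renviron)) {
    readLines(renviron, warn = FALSE)
  } else {
    character()
  }
  # Replace any previous TORCH_HOME entry rather than appending a duplicate.
  existing <- existing[!grepl("^TORCH_HOME=", existing)]
  writeLines(c(existing, paste0("TORCH_HOME=", dir)), renviron)
  Sys.setenv(TORCH_HOME = dir)
  invisible(dir)
}

# Possible use, once per machine:
# set_torch_home("~/libtorch")
```

Note that R reads a project-level .Renviron in place of the user-level one when both exist, so a project that ships its own file would need the entry there.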

Dec 03 '20 01:12 dfalbel

I do wonder where it keeps getting installed. I vaguely recall (from a late night) that it was getting installed in the package repo, which thanks to {renv} was symlinked to a global cache, but I'm not positive about that at all.

As for install_torch(path = "~/libtorch"), it complained about the path not being a properly formatted URL. But, to be fair, I didn't call that so much as debug the functions, and then inside the debugger I changed the URL to the path on disk. So that wasn't exactly a fair test either.

Dec 03 '20 01:12 jaredlander
