Building lantern with CUDA and existing libtorch

Open skeydan opened this issue 4 years ago • 6 comments

I'll split this into three sections:

  • allowing for existing libtorch to be used when building lantern
  • bug in pytorch cmake config
  • allowing for existing libtorch to be used when installing torch
  1. We could allow people to specify a LIBTORCH_DIR and pass that as -DCMAKE_PREFIX_PATH to cmake in buildlantern.R. Here I'm hardcoding it, but it could be an env var:
withr::with_dir("lantern/build", {
    system("cmake -DCMAKE_PREFIX_PATH=/home/key/libtorch ..")
    # ...
  })
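
A minimal sketch of the env-var variant, assuming a variable named LIBTORCH_DIR as proposed above. The helper name `lantern_cmake_command` is hypothetical and not part of buildlantern.R; it only builds the command string and falls back to plain `cmake ..` when the variable is unset:

```r
# Hypothetical helper: build the cmake configure command for lantern.
# LIBTORCH_DIR is the proposed environment variable, not an existing one.
lantern_cmake_command <- function(libtorch_dir = Sys.getenv("LIBTORCH_DIR", unset = "")) {
  if (!nzchar(libtorch_dir)) {
    # No existing libtorch given: configure as before.
    return("cmake ..")
  }
  # Quote the path so directories containing spaces survive the shell.
  paste0("cmake -DCMAKE_PREFIX_PATH=", shQuote(libtorch_dir), " ..")
}

cmd <- lantern_cmake_command("/home/key/libtorch")
print(cmd)
```

Inside `withr::with_dir("lantern/build", { ... })`, the hardcoded call could then become `system(lantern_cmake_command())`.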
  2. I'm running into the same PyTorch-level bug in the cmake config that I documented here:

https://github.com/mlverse/torch/issues/3 https://github.com/mlverse/lantern/issues/11

Basically this will only happen to people who don't have CUDA installed under /usr/local/cuda. Still, it is a bug that, it seems, was never resolved in PyTorch: see https://discuss.pytorch.org/t/libtorch-cmake-issues/28246/7.

My workaround is the two sed lines between the calls to cmake, here:

withr::with_dir("lantern/build", {
    system("cmake -DCMAKE_PREFIX_PATH=/home/key/libtorch ..")
    system("sed -i 's;/local/cuda;;g' CMakeFiles/lantern.dir/build.make")
    system("sed -i 's;/local/cuda;;g' CMakeFiles/example-app.dir/link.txt")
    system("cmake --build . --target lantern --config Release --parallel 8")  
  })
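
To make the effect of the two sed calls concrete, here is a small sketch of the same substitution in R. The sample line is made up for illustration and is not copied from a real build.make; the point is only that every occurrence of `/local/cuda` is deleted, so a path under /usr/local/cuda becomes one under /usr:

```r
# Made-up sample of a generated linker line (an assumption, for illustration).
line <- "/usr/local/cuda/lib64/libcudart.so /usr/local/cuda/lib64/libnvrtc.so"

# Same effect as: sed 's;/local/cuda;;g'
# fixed = TRUE treats the pattern as a literal string, not a regex.
patched <- gsub("/local/cuda", "", line, fixed = TRUE)
print(patched)
```

Whether the resulting paths are right depends on where CUDA actually lives on the machine, so this is a local workaround rather than a general fix.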
  3. With the changes above, install_torch in buildlantern.R will run, but it will download the 800 MB zip for nothing. I would suggest that if a user has set LIBTORCH_DIR, we copy the libs directly from there.
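
A rough sketch of what "copy the libs from there" could look like, using base R only. The function name `copy_libtorch_libs` is hypothetical, and the code assumes the shared libraries sit in a `lib` subdirectory of the libtorch folder; the demo runs against a throwaway directory rather than a real libtorch:

```r
# Hypothetical helper: copy libraries from an existing libtorch into a
# destination directory instead of downloading the zip.
copy_libtorch_libs <- function(libtorch_dir, dest_dir) {
  lib_dir <- file.path(libtorch_dir, "lib")  # assumed layout: <libtorch>/lib
  if (!dir.exists(lib_dir)) {
    stop("No 'lib' directory found under ", libtorch_dir)
  }
  dir.create(dest_dir, recursive = TRUE, showWarnings = FALSE)
  libs <- list.files(lib_dir, full.names = TRUE)
  ok <- file.copy(libs, dest_dir, overwrite = TRUE)
  # Return the paths that were actually copied.
  invisible(file.path(dest_dir, basename(libs))[ok])
}

# Demo with a fake libtorch layout in a temporary directory.
src <- file.path(tempdir(), "libtorch-demo")
dir.create(file.path(src, "lib"), recursive = TRUE, showWarnings = FALSE)
file.create(file.path(src, "lib", "libtorch.so"))
dest <- file.path(tempdir(), "deps-demo")
copied <- copy_libtorch_libs(src, dest)
print(basename(copied))
```

In a real build, `libtorch_dir` would come from `Sys.getenv("LIBTORCH_DIR")` and `dest_dir` would be wherever the package expects its dependencies.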

skeydan avatar Jul 07 '20 19:07 skeydan

Note (in case I manage to somehow get rid of this local file ;-)) - this custom buildlantern.R seems to make my build survive pulls ;-))

# git pull

withr::with_dir("lantern/build", {
    system("cmake -DCMAKE_PREFIX_PATH=/home/key/libtorch ..")
    system("sed -i 's;/local/cuda;;g' CMakeFiles/lantern.dir/build.make")
    system("sed -i 's;/local/cuda;;g' CMakeFiles/example-app.dir/link.txt")
    system("cmake --build . --target lantern --config Release --parallel 8")  
  })

# copy lantern
source("R/lantern_sync.R")
lantern_sync(TRUE)  

# copy deps to inst
if (fs::dir_exists("inst/deps")) fs::dir_delete("inst/deps/")
fs::dir_copy("deps/", new_path = "inst/deps/")

# restart session

devtools::load_all()

skeydan avatar Jul 09 '20 13:07 skeydan

For 1) IMHO the download is fine since it's only done once (the first time you clone the repo and run buildlantern) and it's definitely what we want on the CI and on most setups...

For 2) I think we should open a bug-report in the pytorch repo and see if they are inclined to fix it before doing anything on our side.

For 3) I think it's already possible to use a pre-installed libtorch version using the TORCH_HOME env var, see: https://github.com/mlverse/torch/blob/master/R/install.R#L62-L86

dfalbel avatar Jul 09 '20 22:07 dfalbel

  1. would also solve the ARM / Jetson install issue for me - see https://github.com/znmeb/edgyR/issues/32

znmeb avatar Oct 02 '20 16:10 znmeb

I might have a weird edge case, but using {renv} means every project that uses {torch} installs everything all over again. Sure, {torch} is kept in the cache, but the underlying libtorch keeps needing to be reinstalled.

Then to make matters worse, my server happens to have an oddly slow internet connection, so the download takes longer than the 60-second timeout. So a local option would be good, since I successfully downloaded the zip using wget. Or at least a way to change the timeout duration, beyond setting options(timeout=600), which may not be obvious to users.
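
For reference, a minimal sketch of raising the timeout only for the duration of one call and restoring it afterwards. The wrapper name `with_download_timeout` is made up for this example; it relies on nothing beyond base R's `options()`:

```r
# Temporarily raise R's download timeout, then restore the old value.
with_download_timeout <- function(seconds, code) {
  old <- options(timeout = seconds)
  on.exit(options(old), add = TRUE)
  # `code` is evaluated lazily, so it runs with the new timeout in effect.
  force(code)
}

before <- getOption("timeout")
inside <- with_download_timeout(600, getOption("timeout"))
after <- getOption("timeout")
print(inside)
```

Something like `with_download_timeout(600, torch::install_torch())` would then apply the longer limit to the install only, assuming the download honours the `timeout` option as described above.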

jaredlander avatar Nov 30 '20 06:11 jaredlander

That makes sense! In theory you could do something like:

install_torch(path = "~/libtorch")

and then use the TORCH_HOME="~/libtorch" env var to always use the same installation directory.
Does that work for your use case?
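
As a sketch of the second half of that suggestion, the variable can be set from within R for the current session. The directory is only an example, and the path uses a forward slash because a backslash inside an R string would need escaping:

```r
# Point the current session at one shared libtorch directory.
# The directory is an example; TORCH_HOME is the variable named above.
torch_home <- path.expand("~/libtorch")
Sys.setenv(TORCH_HOME = torch_home)
print(Sys.getenv("TORCH_HOME"))

# To persist the setting across sessions and projects, a line such as
#   TORCH_HOME=/path/to/libtorch
# could be added to the user's .Renviron file.
```

With {renv}, a user-level setting like this should let every project resolve to the same directory instead of reinstalling libtorch per project, though that is untested here.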

dfalbel avatar Dec 03 '20 01:12 dfalbel

I do wonder where it keeps getting installed. I vaguely recall (from a late night) that it was getting installed in the package repo, which thanks to {renv} was symlinked to a global cache, but I'm not positive about that at all.

As for install_torch(path = "~/libtorch"), it complained about the path not being a properly formatted URL. But, to be fair, I didn't call it directly so much as step through the functions in the debugger and change the URL to the path on disk. So that wasn't exactly a fair test either.

jaredlander avatar Dec 03 '20 01:12 jaredlander