Building lantern with CUDA and existing libtorch
I'll split this into three sections:
1. allowing an existing libtorch to be used when building lantern
2. a bug in the PyTorch cmake config
3. allowing an existing libtorch to be used when installing torch
- We could allow people to specify a `LIBTORCH_DIR` and pass that as `-DCMAKE_PREFIX_PATH` to cmake in `buildlantern.R`. Here I'm hardcoding it, but it could be an env var (see the sketch after this snippet):
withr::with_dir("lantern/build", {
system("cmake -DCMAKE_PREFIX_PATH=/home/key/libtorch ..")
# ...
})
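An untested sketch of the env-var version; `LIBTORCH_DIR` is just a placeholder name, not something `buildlantern.R` currently reads:

```r
# Sketch: read the libtorch location from an env var instead of hardcoding it.
# LIBTORCH_DIR is a hypothetical name; fall back to a plain configure otherwise.
libtorch_dir <- Sys.getenv("LIBTORCH_DIR", unset = "")

withr::with_dir("lantern/build", {
  if (nzchar(libtorch_dir)) {
    system(paste0("cmake -DCMAKE_PREFIX_PATH=", libtorch_dir, " .."))
  } else {
    system("cmake ..")
  }
  # ...
})
```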
- I'm running into the same PyTorch-level bug in the cmake config that I documented in https://github.com/mlverse/torch/issues/3 and https://github.com/mlverse/lantern/issues/11. Basically this only hits people whose CUDA installation is not in /usr/local/cuda, but it is still a bug that, it seems, was never resolved in PyTorch: see https://discuss.pytorch.org/t/libtorch-cmake-issues/28246/7.
My workaround is the two sed lines between the calls to cmake, here:
withr::with_dir("lantern/build", {
system("cmake -DCMAKE_PREFIX_PATH=/home/key/libtorch ..")
system("sed -i 's;/local/cuda;;g' CMakeFiles/lantern.dir/build.make")
system("sed -i 's;/local/cuda;;g' CMakeFiles/example-app.dir/link.txt")
system("cmake --build . --target lantern --config Release --parallel 8")
})
- With the above changes, `install_torch()` in `buildlantern.R` will run, but it will download the 800 MB zip for nothing ... I would suggest that if a user has `LIBTORCH_DIR` set, we directly copy the libs from there, e.g. as sketched below.
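A rough sketch of what that copy step could look like, assuming the same hypothetical `LIBTORCH_DIR` env var and that the shared libraries just need to land in `deps/` (paths are illustrative, not current behaviour):

```r
# Sketch: reuse an existing libtorch instead of downloading the zip.
# LIBTORCH_DIR and the deps/ layout are assumptions based on the steps above.
libtorch_dir <- Sys.getenv("LIBTORCH_DIR", unset = "")

if (nzchar(libtorch_dir)) {
  fs::dir_create("deps")
  libs <- fs::dir_ls(fs::path(libtorch_dir, "lib"))
  fs::file_copy(libs, fs::path("deps", fs::path_file(libs)), overwrite = TRUE)
} else {
  install_torch()  # fall back to the 800 MB download
}
```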
Note (in case I manage to somehow get rid of this local file ;-)): this custom `buildlantern.R` seems to make my build survive pulls ;-))
```r
# git pull

withr::with_dir("lantern/build", {
  system("cmake -DCMAKE_PREFIX_PATH=/home/key/libtorch ..")
  system("sed -i 's;/local/cuda;;g' CMakeFiles/lantern.dir/build.make")
  system("sed -i 's;/local/cuda;;g' CMakeFiles/example-app.dir/link.txt")
  system("cmake --build . --target lantern --config Release --parallel 8")
})

# copy lantern
source("R/lantern_sync.R")
lantern_sync(TRUE)

# copy deps to inst
if (fs::dir_exists("inst/deps")) fs::dir_delete("inst/deps/")
fs::dir_copy("deps/", new_path = "inst/deps/")

# restart session
devtools::load_all()
```
For 1) IMHO the download is fine since it's only done once (the first time you clone the repo and run `buildlantern.R`) and it's definitely what we want on the CI and on most setups...
For 2) I think we should open a bug report in the PyTorch repo and see if they are inclined to fix it before doing anything on our side.
For 3) I think it's already possible to use a pre-installed libtorch version using the `TORCH_HOME` env var, see: https://github.com/mlverse/torch/blob/master/R/install.R#L62-L86
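For reference, using that would boil down to something like this (the path is just an example; presumably best set in `~/.Renviron` so every session picks it up):

```r
# Point torch at a pre-installed libtorch via TORCH_HOME
# (presumably needs to be set before torch is loaded/installed).
Sys.setenv(TORCH_HOME = "/home/key/libtorch")
library(torch)
```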
- This would also solve the ARM / Jetson install issue for me: see https://github.com/znmeb/edgyR/issues/32
I might have a weird edge case, but using {renv} means every project that uses {torch} installs everything all over again. Sure, {torch} is kept in the cache, but the underlying libtorch keeps needing to be reinstalled.
Then, to make matters worse, my server happens to have an oddly slow internet connection, so the download takes longer than the 60 seconds before it times out. So a local option would be good, since I successfully downloaded the zip using wget. Or at least a way to change the timeout duration beyond setting `options(timeout = 600)`, which may not be obvious to users.
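In case it helps anyone else hitting the same timeout, a stopgap along these lines should work (assuming the download goes through R's usual `download.file()` machinery):

```r
# Raise R's download timeout (default 60 s) before installing,
# so the large libtorch zip has time to finish on a slow connection.
options(timeout = 600)
torch::install_torch()
```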
That makes sense! In theory you could do something like `install_torch(path = "~\libtorch")` and then use the `TORCH_HOME="~\libtorch"` env var to always use the same installation directory. Does that work for your use case?
I do wonder where it keeps getting installed. I vaguely recall (from a late night) that it was getting installed in the package repo, which thanks to {renv} was symlinked to a global cache, but I'm not positive about that at all.
As for `install_torch(path = "~\libtorch")`, it complained about the path not being a properly formatted URL. But, to be fair, I didn't actually call it that way; I debugged the functions and, inside the debugger, changed the URL to the path on disk. So that wasn't exactly a fair test either.