ml
greta is not working
I was trying to play with greta using this container but I'm getting an error. This is what I am doing:
nvidia-docker run -it rocker/ml-gpu:latest bash
root@7dc3309926d4:/# nvidia-smi
Fri Apr 19 12:25:12 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.116 Driver Version: 390.116 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 106... Off | 00000000:01:00.0 On | N/A |
| 45% 42C P0 27W / 120W | 1382MiB / 6076MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
+-----------------------------------------------------------------------------+
root@7dc3309926d4:/# R
R version 3.5.2 (2018-12-20) -- "Eggshell Igloo"
Copyright (C) 2018 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> x <- iris$Petal.Length
> y <- iris$Sepal.Length
> library(greta)
Attaching package: 'greta'
The following objects are masked from 'package:stats':
binomial, poisson
The following objects are masked from 'package:base':
%*%, backsolve, beta, colMeans, colSums, diag, forwardsolve, gamma,
rowMeans, rowSums, sweep
> int <- normal(0, 5)
> coef <- normal(0, 3)
> sd <- lognormal(0, 3)
> mean <- int + coef * x
> distribution(y) <- normal(mean, sd)
> m <- model(int, coef, sd)
> draws <- mcmc(m, n_samples = 1000)
/usr/local/lib/python3.5/dist-packages/numpy/lib/type_check.py:546: DeprecationWarning: np.asscalar(a) is deprecated since NumPy v1.16, use a.item() instead
'a.item() instead', DeprecationWarning, stacklevel=1)
/usr/local/lib/python3.5/dist-packages/numpy/lib/type_check.py:546: DeprecationWarning: np.asscalar(a) is deprecated since NumPy v1.16, use a.item() instead
'a.item() instead', DeprecationWarning, stacklevel=1)
/usr/local/lib/python3.5/dist-packages/numpy/lib/type_check.py:546: DeprecationWarning: np.asscalar(a) is deprecated since NumPy v1.16, use a.item() instead
'a.item() instead', DeprecationWarning, stacklevel=1)
/usr/local/lib/python3.5/dist-packages/numpy/lib/type_check.py:546: DeprecationWarning: np.asscalar(a) is deprecated since NumPy v1.16, use a.item() instead
'a.item() instead', DeprecationWarning, stacklevel=1)
/usr/local/lib/python3.5/dist-packages/numpy/lib/type_check.py:546: DeprecationWarning: np.asscalar(a) is deprecated since NumPy v1.16, use a.item() instead
'a.item() instead', DeprecationWarning, stacklevel=1)
/usr/local/lib/python3.5/dist-packages/numpy/lib/type_check.py:546: DeprecationWarning: np.asscalar(a) is deprecated since NumPy v1.16, use a.item() instead
'a.item() instead', DeprecationWarning, stacklevel=1)
/usr/local/lib/python3.5/dist-packages/numpy/lib/type_check.py:546: DeprecationWarning: np.asscalar(a) is deprecated since NumPy v1.16, use a.item() instead
'a.item() instead', DeprecationWarning, stacklevel=1)
/usr/local/lib/python3.5/dist-packages/numpy/lib/type_check.py:546: DeprecationWarning: np.asscalar(a) is deprecated since NumPy v1.16, use a.item() instead
'a.item() instead', DeprecationWarning, stacklevel=1)
/usr/local/lib/python3.5/dist-packages/numpy/lib/type_check.py:546: DeprecationWarning: np.asscalar(a) is deprecated since NumPy v1.16, use a.item() instead
'a.item() instead', DeprecationWarning, stacklevel=1)
/usr/local/lib/python3.5/dist-packages/numpy/lib/type_check.py:546: DeprecationWarning: np.asscalar(a) is deprecated since NumPy v1.16, use a.item() instead
'a.item() instead', DeprecationWarning, stacklevel=1)
/usr/local/lib/python3.5/dist-packages/numpy/lib/type_check.py:546: DeprecationWarning: np.asscalar(a) is deprecated since NumPy v1.16, use a.item() instead
'a.item() instead', DeprecationWarning, stacklevel=1)
/usr/local/lib/python3.5/dist-packages/numpy/lib/type_check.py:546: DeprecationWarning: np.asscalar(a) is deprecated since NumPy v1.16, use a.item() instead
'a.item() instead', DeprecationWarning, stacklevel=1)
Error: greta hit a tensorflow error:
Error in py_call_impl(callable, dots$args, dots$keywords): NotFoundError: ./libdevice.compute_30.10.bc not found
[[{{node cluster_0_1/xla_compile}} = _XlaCompile[Nresources=0, Targs=[DT_DOUBLE, DT_DOUBLE, DT_DOUBLE, DT_DOUBLE, DT_DOUBLE, DT_DOUBLE, DT_DOUBLE, DT_DOUBLE, DT_DOUBLE], Tconstants=[DT_INT32, DT_INT32, DT_INT32, DT_INT32, DT_INT32, DT_INT32], function=cluster_0[_XlaCompiledKernel=true, _XlaNumConstantArgs=6, _XlaNumResourceArgs=0], _device="/job:localhost/replica:0/task:0/device:GPU:0"](Const, Tile_3/multiples/1, Reshape/shape, strided_slice_3/stack, strided_slice_3/stack_1, Sum_1/reduction_indices, _arg_Placeholder_0_0/_3, _arg_Placeholder_1_0_1/_5, _arg_Placeholder_2_0_2/_7, _arg_Placeholder_3_0_3/_9, _arg_Placeholder_4_0_4/_11, _arg_Placeholder_5_0_5/_13, _arg_Placeholder_6_0_6/_15, _arg_Placeholder_7_0_7/_17, _arg_Placeholder_8_0_8/_19)]]
[[{{node cluster_0_1/xla_run/_1}} = _Recv[client_terminated=false, recv_device="/job:localh
thanks for the report, I'll take a look.
Hmm... we can solve errors such as NotFoundError: ./libdevice.compute_30.10.bc not found by copying /usr/local/cuda-9.0 from the rocker/cuda-dev image, but then I seem to be running up against https://github.com/tensorflow/tensorflow/blob/master/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc#L485-L489 instead.
It's not exactly clear to me how to cherry-pick ptxas 9.2.88, though.
Bumping all of CUDA to 9.2.88 seems to break tensorflow, as it looks like the binaries installed by pip (for 0.12.0) are built only for CUDA 9.0.
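For anyone wanting to check their own container for the two things mentioned above (whether any libdevice bitcode files are present, and which ptxas version is installed), here is a rough shell sketch. The helper names find_libdevice and ptxas_version are made up for illustration, and the parsing assumes ptxas --version ends with a line like "Cuda compilation tools, release 9.0, V9.0.176"; it may need adjusting for other toolkits.

```shell
#!/bin/sh
# Hypothetical helper: list every libdevice*.bc file below a CUDA root.
# It searches the whole tree, so no particular directory layout is assumed.
find_libdevice() {
    root="$1"
    [ -d "$root" ] || return 1
    find "$root" -type f -name 'libdevice*.bc' 2>/dev/null | sort
}

# Hypothetical helper: read `ptxas --version` output on stdin and print the
# full version number (assumes a line ending in ", V<major>.<minor>.<patch>").
ptxas_version() {
    sed -n 's/.*, V\([0-9.]*\).*/\1/p' | head -n 1
}

# Example usage inside the container (paths are assumptions):
#   find_libdevice /usr/local/cuda-9.0
#   ptxas --version | ptxas_version
```

An empty result from find_libdevice would be consistent with the NotFoundError above, though it does not by itself show where XLA is looking.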
A second error I encounter, e.g. via either the virtualenv install route or when building on tensorflow/tensorflow:1.13.1-gpu-py3, is ValueError: Tensor conversion requested dtype int64 for Tensor with dtype int32. Longer trace below.
Error in py_call_impl(callable, dots$args, dots$keywords) :
ValueError: Tensor conversion requested dtype int64 for Tensor with dtype int32: 'Tensor("Placeholder_13:0", dtype=int32)'
Detailed traceback:
File "/usr/local/lib/python3.5/dist-packages/tensorflow_probability/python/mcmc/sample.py", line 216, in sample_chain
name="num_steps_between_results")
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 1039, in convert_to_tensor
return convert_to_tensor_v2(value, dtype, preferred_dtype, name)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 1097, in convert_to_tensor_v2
as_ref=False)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 1175, in internal_convert_to_tensor
ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 977, in _TensorTensorConversionFunction
(dtype.name, t.dtype.name, str(t)))
still digging...
I think that last error just means you have the CRAN release of greta, but need the current GitHub version.
Something changed in the most recent Tensorflow Probability release, and the greta-side patch hasn't yet made its way to CRAN.
@goldingn thanks Nick, that's the ticket!
@ignacio82 Once rocker/tensorflow-gpu builds (probably by tomorrow, or just docker build locally), you should be able to do a remotes::install_github("greta-dev/greta") and then GPU-accelerated greta should work.
Thanks again for the bug report. I hadn't gotten around to testing greta; it's still somewhat early days for these ML images.
Thanks! A couple of questions:
- You said to use rocker/tensorflow-gpu, but I think I should use rocker/ml-gpu:latest. With the former I got a message saying that I needed to install TensorFlow Probability. Is that right, or should I use rocker/tensorflow-gpu?
- Although greta seems to be working, I am getting the following message:
/usr/local/lib/python3.5/dist-packages/numpy/lib/type_check.py:546: DeprecationWarning: np.asscalar(a) is deprecated since NumPy v1.16, use a.item() instead
'a.item() instead', DeprecationWarning, stacklevel=1)
Is this a problem that the greta developers need to fix?
@ignacio82 Right, I moved tensorflow-probability into the tensorflow image now since it seemed more logical to keep those together, but the latest rocker/tensorflow-gpu instance hasn't finished building. We're still figuring out the right organizational modularity.
Re the DeprecationWarning: yeah, I see that too. @goldingn can probably give us more insight on that, but I don't think it's much of a problem.
I'm not sure whether this ought to be a different error or not, but I get a strange error when trying greta with the ml-gpu container.
remotes::install_github("greta-dev/greta")
rm(list=ls())
library(reticulate)
py_discover_config()
use_python("/opt/virtualenvs/r-tensorflow/bin/python")
use_virtualenv("/opt/virtualenvs/r-tensorflow/", required=T)
library(greta)
library(DiagrammeR)
library(bayesplot)
library(tidyverse)
length_of_data <- 100
sd_eps <- pi^exp(1)
intercept <- -5.0
slope <- pi
x <- seq(-10*pi, 10*pi, length.out = length_of_data)
y <- intercept + slope*x + rnorm(n = length_of_data, mean = 0, sd = sd_eps)
data <- data_frame(y = y, x = x)
intercept_p <- uniform(-10, 10)
sd_eps_p <- uniform(0, 50)
slope_p <- uniform(0, 10)
mean_y <- intercept_p+slope_p*x
distribution(y) <- normal(mean_y, sd_eps_p)
our_model <- model(intercept_p, slope_p, sd_eps_p)
num_samples <- 1000
param_draws <- mcmc(our_model, n_samples = num_samples, warmup = num_samples / 10)
That gives the error:
Error in py_call_impl(callable, dots$args, dots$keywords) :
ValueError: Tensor conversion requested dtype int64 for Tensor with dtype int32:
'Tensor("Placeholder_13:0", dtype=int32)'
So greta requires pretty careful coordination between versions of CUDA, tensorflow, and greta itself. I think this particular error is due to using the most recent dev version of greta with an older tensorflow (see https://github.com/greta-dev/greta/issues/248).
We're still exploring the best way to help users triangulate these versions. (The current tensorflow-gpu image is, iirc, still on CUDA 9.0, which is too old for tensorflow > 1.13, which is required for greta > 0.3.0 or so? Don't quote me on those versions.)
Can you try testing on rocker/ml:cuda-10.0? (Note that it should already have greta installed.)
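To make it easier to report which versions are actually in play, something along these lines could be run inside the container and pasted into the issue. It is only a sketch: report_version is a hypothetical helper, and it assumes python3 and Rscript on the PATH are the interpreters greta actually uses, which may not hold if reticulate is pointed at a virtualenv such as /opt/virtualenvs/r-tensorflow.

```shell
#!/bin/sh
# Hypothetical helper: print "label: <first line of output>", "label: not found"
# when the command is missing, or "label: unknown" when it prints nothing.
report_version() {
    label="$1"
    shift
    if command -v "$1" >/dev/null 2>&1; then
        # stderr is dropped so import-time warnings do not hide the version
        out=$("$@" 2>/dev/null | head -n 1)
        printf '%s: %s\n' "$label" "${out:-unknown}"
    else
        printf '%s: not found\n' "$label"
    fi
}

report_version "nvidia driver" nvidia-smi --query-gpu=driver_version --format=csv,noheader
report_version "tensorflow" python3 -c 'import tensorflow as tf; print(tf.__version__)'
report_version "greta" Rscript -e 'cat(as.character(packageVersion("greta")))'
```

Each line prints either a version string, "not found", or "unknown", so the output stays readable even on an image where one of the tools is missing.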