Installation error: test_integrated_pos_enc (tests.coord_test.CoordTest) tests.coord_test.CoordTest.test_integrated_pos_enc

Open Paul45577 opened this issue 2 years ago • 0 comments

When I run the unit tests to confirm that they all pass, I get the following error:

(multinerf) mypc@pc0005:~/user/multinerf$ ./scripts/run_all_unit_tests.sh
.
----------------------------------------------------------------------
Ran 1 test in 2.074s

OK
....
----------------------------------------------------------------------
Ran 4 tests in 13.416s

OK
...............................Mean Error = 0.0803162083029747, Tolerance = 0.1
.Mean Error = 0.08638705313205719, Tolerance = 0.1
........................
----------------------------------------------------------------------
Ran 56 tests in 90.169s

OK
.........2022-12-27 13:00:39.973523: E external/org_tensorflow/tensorflow/compiler/xla/pjrt/pjrt_stream_executor_client.cc:2163] Execution of replica 0 failed: INTERNAL: Failed to execute XLA Runtime executable: run time error: custom call 'xla.gpu.cholesky' failed: cuSolver internal error.
E..PE of degree 5 has a maximum error of 2.5369226932525635e-06
.PE of degree 10 has a maximum error of 6.4849853515625e-05
.PE of degree 15 has a maximum error of 0.002378210425376892
.PE of degree 20 has a maximum error of 0.11622805148363113
.PE of degree 25 has a maximum error of 1.999955415725708
.PE of degree 30 has a maximum error of 1.9999704360961914
....
======================================================================
ERROR: test_integrated_pos_enc (tests.coord_test.CoordTest)
tests.coord_test.CoordTest.test_integrated_pos_enc
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/user/multinerf/tests/coord_test.py", line 254, in test_integrated_pos_enc
    samples = random.multivariate_normal(key, mean, cov, [num_samples])
  File "/home/mypc/anaconda3/envs/multinerf/lib/python3.9/site-packages/jax/_src/random.py", line 625, in multivariate_normal
    return _multivariate_normal(key, mean, cov, shape, dtype, method)  # type: ignore
  File "/home/mypc/anaconda3/envs/multinerf/lib/python3.9/site-packages/jax/_src/traceback_util.py", line 162, in reraise_with_filtered_traceback
    return fun(*args, **kwargs)
  File "/home/mypc/anaconda3/envs/multinerf/lib/python3.9/site-packages/jax/_src/api.py", line 623, in cache_miss
    out_flat = call_bind_continuation(execute(*args_flat))
  File "/home/mypc/anaconda3/envs/multinerf/lib/python3.9/site-packages/jax/_src/profiler.py", line 314, in wrapper
    return func(*args, **kwargs)
  File "/home/mypc/anaconda3/envs/multinerf/lib/python3.9/site-packages/jax/interpreters/pxla.py", line 2136, in __call__
    out_bufs = self.xla_executable.execute_sharded_on_local_devices(
jax._src.traceback_util.UnfilteredStackTrace: jaxlib.xla_extension.XlaRuntimeError: INTERNAL: Failed to execute XLA Runtime executable: run time error: custom call 'xla.gpu.cholesky' failed: cuSolver internal error.

The stack trace below excludes JAX-internal frames.
The preceding is the original exception that occurred, unmodified.

--------------------

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/user/multinerf/tests/coord_test.py", line 254, in test_integrated_pos_enc
    samples = random.multivariate_normal(key, mean, cov, [num_samples])
  File "/home/mypc/anaconda3/envs/multinerf/lib/python3.9/site-packages/jax/_src/random.py", line 625, in multivariate_normal
    return _multivariate_normal(key, mean, cov, shape, dtype, method)  # type: ignore
jaxlib.xla_extension.XlaRuntimeError: INTERNAL: Failed to execute XLA Runtime executable: run time error: custom call 'xla.gpu.cholesky' failed: cuSolver internal error.

----------------------------------------------------------------------
Ran 21 tests in 27.995s

FAILED (errors=1)
......
----------------------------------------------------------------------
Ran 6 tests in 5.412s

OK
..
----------------------------------------------------------------------
Ran 2 tests in 4.408s

OK
.
----------------------------------------------------------------------
Ran 1 test in 0.325s

OK
.
----------------------------------------------------------------------
Ran 1 test in 1.427s

OK
....../home/user/multinerf/internal/math.py:28: RuntimeWarning: overflow encountered in cast
  return fn(jnp.where(jnp.abs(x) < t, x, x % t))
/home/mypc/anaconda3/envs/multinerf/lib/python3.9/site-packages/jax/_src/numpy/lax_numpy.py:1090: RuntimeWarning: overflow encountered in cast
  return _where(condition, x, y)
.
----------------------------------------------------------------------
Ran 7 tests in 4.480s

OK
..........................................
----------------------------------------------------------------------
Ran 42 tests in 37.426s

OK

I've installed jaxlib with GPU support:

pip install -U jax[cuda11_cudnn82] -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html

My CUDA version is:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Mon_Oct_11_21:27:02_PDT_2021
Cuda compilation tools, release 11.4, V11.4.152
Build cuda_11.4.r11.4/compiler.30521435_0

My GPU is:

GeForce GTX 1080 Ti

JAX/jaxlib versions:

jax 0.4.1
jaxlib 0.4.1+cuda11.cudnn82

System Info:

Python version 3.9, Ubuntu 20.04
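
The failing call in the traceback is random.multivariate_normal, which by default factorizes the covariance with Cholesky (the 'xla.gpu.cholesky' custom call). A minimal sketch that isolates this call outside the test suite might look like the following. The helper name sample_mvn, the covariance matrix, and the sample count are illustrative assumptions, not the values used in tests/coord_test.py. The method argument is the one visible in the traceback's call to _multivariate_normal.

```python
import jax
import jax.numpy as jnp
from jax import random


def sample_mvn(method, num_samples=1000, dim=3, seed=0):
    """Draw samples with jax.random.multivariate_normal using the given factorization.

    Hypothetical helper for isolating the failure; not part of multinerf.
    """
    key = random.PRNGKey(seed)
    mean = jnp.zeros(dim)
    # Illustrative symmetric positive-definite covariance (not the test's values).
    a = jnp.arange(1.0, dim * dim + 1.0).reshape(dim, dim)
    cov = a @ a.T + dim * jnp.eye(dim)
    return random.multivariate_normal(key, mean, cov, [num_samples], method=method)


if __name__ == "__main__":
    # Shows which backend (GPU or CPU) JAX is actually using.
    print("devices:", jax.devices())
    for method in ("cholesky", "eigh", "svd"):
        try:
            samples = sample_mvn(method)
            samples.block_until_ready()  # force execution so errors surface here
            print(method, "ok", samples.shape)
        except Exception as e:  # XlaRuntimeError is raised on the cuSolver failure
            print(method, "failed:", type(e).__name__, e)
```

If only some of the methods fail on the GPU while the same script succeeds on the CPU backend, that would suggest the problem lies in the cuSolver/CUDA setup rather than in the multinerf test itself. This is a diagnostic sketch only and does not fix the underlying error.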

Paul45577, Dec 27 '22 12:12