Dr.Jit compiler failure: "Disk cache database error"
I'm getting this error:
Critical Dr.Jit compiler failure: jit_optix_check(): API error 7012 (OPTIX_ERROR_DISK_CACHE_DATABASE_ERROR): "Disk cache database error" in /project/ext/drjit-core/src/optix_core.cpp:71.
System configuration
System information:
OS: Ubuntu 22.04.3 LTS
CPU: Intel(R) Xeon(R) CPU E5-2698 v4 @ 2.20GHz
GPU: 8 × Tesla V100-SXM2-32GB
Python: 3.11.4 (main, Jul 5 2023, 13:45:01) [GCC 11.2.0]
NVIDIA driver: 535.104.05
LLVM: 14.0.6
Dr.Jit: 0.4.2
Mitsuba: 3.3.0
Is custom build? False
Compiled with: GNU 10.2.1
Variants: scalar_rgb, scalar_spectral, cuda_ad_rgb, llvm_ad_rgb
Description
I'm running multiple Mitsuba rendering tasks on the same machine, spread across multiple GPUs. The error above appears during initialization. Any ideas?
Steps to reproduce
- Pull the code from https://github.com/sagesimhon/totem_plus
- Update run_generation_machine_distributed.sh to specify the hostnames and the number of GPUs to test with, then run it.
Hi @sagesimhon
I can't reproduce this, but I only have a single GPU. I wouldn't be surprised if OptiX's cache mechanism were device-dependent and threw an error whenever two different devices tried to access the same cache (although the GPUs here are all identical :shrug:).
We've recently made it easier to increase the OptiX debug level. If you also raise the logging level, OptiX might give you more information about why it's failing.
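A minimal sketch of the logging half of that suggestion, assuming the Dr.Jit Python API exposes `set_log_level()` and `LogLevel.Debug` (as we understand the 0.4.x releases do; check against your installed version). The helper name `enable_drjit_debug_logging` is hypothetical, and the OptiX debug-level setting mentioned above is not shown because its exact API isn't given in this thread. Raising the level before any CUDA work starts may surface additional messages from the OptiX initialization.

```python
import importlib.util


def enable_drjit_debug_logging() -> bool:
    """Raise Dr.Jit's log level to Debug, if Dr.Jit is installed.

    Returns True if the level was changed, False if drjit is not importable.
    """
    if importlib.util.find_spec("drjit") is None:
        return False
    import drjit as dr

    # Assumption: drjit.set_log_level() and drjit.LogLevel.Debug exist in the
    # installed Dr.Jit version (they do in the 0.4.x Python API as far as we know).
    dr.set_log_level(dr.LogLevel.Debug)
    return True


if __name__ == "__main__":
    # Call this before the first CUDA/OptiX work, e.g. before
    # mi.set_variant("cuda_ad_rgb"), so initialization messages are logged.
    print("Dr.Jit debug logging enabled:", enable_drjit_debug_logging())
```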
Thanks, I'll try that.
I ran into this issue on a very similar multi-GPU setup and was able to work around it by explicitly setting the OPTIX_CACHE_PATH environment variable to /tmp. I don't think the issue is related to there being multiple GPUs; it might be that the environment on these systems is more locked down than a regular Linux install.
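A sketch of how that workaround might be applied when launching one rendering worker per GPU. It points OPTIX_CACHE_PATH at /tmp (the directory reported to work above) and checks that the directory is writable, since a locked-down environment is the suspected cause. Pinning each worker with CUDA_VISIBLE_DEVICES is our assumption about how the jobs are distributed, not something the repository is known to do, and `make_worker_env` and `render_worker.py` are hypothetical names.

```python
import os
from typing import Dict, Optional


def make_worker_env(gpu_index: int,
                    cache_dir: str = "/tmp",
                    base_env: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Build the environment for one rendering worker.

    OPTIX_CACHE_PATH points OptiX's disk cache at a writable directory, and
    CUDA_VISIBLE_DEVICES (an assumption about the setup) restricts the worker
    to a single GPU.
    """
    # Copy the parent environment (or a caller-supplied one) so PATH etc. survive.
    env = dict(os.environ if base_env is None else base_env)

    # Make sure the cache directory exists and that this user can write to it.
    os.makedirs(cache_dir, exist_ok=True)
    if not os.access(cache_dir, os.W_OK):
        raise PermissionError(f"OptiX cache directory is not writable: {cache_dir}")

    env["OPTIX_CACHE_PATH"] = cache_dir
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    return env


if __name__ == "__main__":
    import subprocess
    import sys

    # Hypothetical launcher: one worker process per GPU on an 8-GPU machine.
    procs = [
        subprocess.Popen([sys.executable, "render_worker.py"], env=make_worker_env(i))
        for i in range(8)
    ]
    for p in procs:
        p.wait()
```

If the workers are started some other way, setting `os.environ["OPTIX_CACHE_PATH"] = "/tmp"` at the top of the worker script, before `mitsuba` is imported and a CUDA variant is selected, should presumably have the same effect, since OptiX needs to see the variable before its cache is opened.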
Closing this issue due to inactivity.
If anyone has more to contribute to this discussion, please feel free to keep commenting on this issue.