Ceres
CUDA failed to initialize on Ubuntu
I have a working LC0 chess engine with the CUDA backend. Ceres starts, but fails to initialize the CUDA library.
| NVIDIA-SMI 495.29.05 Driver Version: 495.29.05 CUDA Version: 11.5 |
Could it just be a missing variable or setting?
CUDA 11.5 should work fine. Most testing is done on Windows under CUDA 11.5, and Linux under 11.4, but Linux under 11.5 should work. Please indicate the exact version of Ceres being used, the GPU model, and the specific error message.
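The GPU model and driver version requested above can be collected in one step. The following is only a sketch and not part of Ceres: it assumes `nvidia-smi` is on the PATH and uses its `--query-gpu` option; the helper names `parse_gpu_csv` and `query_gpus` are hypothetical.

```python
import subprocess

# Fields passed to `nvidia-smi --query-gpu`; all three are standard query fields.
QUERY_FIELDS = ["name", "driver_version", "memory.total"]


def parse_gpu_csv(text):
    """Parse `nvidia-smi --format=csv,noheader` output into one dict per GPU."""
    gpus = []
    for line in text.strip().splitlines():
        values = [v.strip() for v in line.split(",")]
        if len(values) != len(QUERY_FIELDS):
            continue  # skip lines that do not match the requested fields
        gpus.append(dict(zip(QUERY_FIELDS, values)))
    return gpus


def query_gpus():
    """Run nvidia-smi and return the parsed GPU list (needs the NVIDIA driver)."""
    out = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=" + ",".join(QUERY_FIELDS),
         "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_csv(out)


if __name__ == "__main__":
    for gpu in query_gpus():
        print(gpu)
```

The Ceres version itself is printed in the startup banner, so pasting that banner together with this output should cover what was asked for.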
Initially I tried 0.94 from the provided Releases, but I can reproduce the same error with the latest code, 0.95-rc8 at the time of writing. I am pulling the latest code and compiling it in Debug mode.
nvidia-smi gives me this information:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.39.01    Driver Version: 510.39.01    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:01:00.0  On |                  N/A |
| 50%   49C    P5    25W / 250W |    529MiB /  4096MiB |      5%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1263      G   /usr/lib/xorg/Xorg                173MiB |
|    0   N/A  N/A      1680      G   /usr/bin/gnome-shell               59MiB |
....
+-----------------------------------------------------------------------------+
(I have upgraded to CUDA 11.6 since first reporting the problem.)
Nibbler with LC0 works fine:
I ran ./Ceres SYSBENCH and got this output:
|=========================================================|
| Ceres - A Monte Carlo Tree Search Chess Engine          |
|                                                         |
| (c) 2020- David Elliott and the Ceres Authors           |
| With network backend code from Leela Chess Zero.        |
| Use help to list available commands.                    |
|                                                         |
| Version 0.95-RC8 with PGO: NA                           |
| Runtime .NET 5.0.13 and Cuda 11.60                      |
|=========================================================|
Ceres user settings loaded from file /home/alex/src/Ceres/artifacts/debug/net5.0/Ceres.json
CPU BENCHMARK
418,801 ops/second, 0 bytes alloc/op : MGPosition.FromPosition
94,264 ops/second, 1,127 bytes alloc/op : MGChessPositionFromFEN
15,142,286 ops/second, 0 bytes alloc/op : MGChessMoveToLZPositionMove
6,848,370 ops/second, 0 bytes alloc/op : ZobristHash
CERES CPU BENCHMARK SCORE: 14
GPU BENCHMARK (benchmark net: LC0:/home/alex/Programs/lc0/weights_611245)
ID Name Ver SMClk GPU% Mem% Temp Throttle Reasons NPS 1 NPS Batch
CUDA device 0: NVIDIA GeForce GTX 970 SMs: 13 Mem: 3gb
Error when initializing CUDA. Did you install NVidia's CUDA?
https://developer.nvidia.com/cuda-zone
ErrorInvalidPtx
   at ManagedCuda.CudaContext.LoadModulePTX(Byte[] moduleImage, CUJITOption[] options, Object[] values)
   at ManagedCuda.CudaContext.LoadKernelPTX(Stream moduleImage, String kernelName)
   at Ceres.Base.CUDA.CUDADevice.DoLoadKernel(Assembly assembly, CudaContext context, String resource, String kernelName) in /home/alex/src/Ceres/src/Ceres.Base/CUDA/CUDADevice.cs:line 115
   at Ceres.Base.CUDA.CUDADevice.GetKernel(Assembly assembly, String resource, String kernelName) in /home/alex/src/Ceres/src/Ceres.Base/CUDA/CUDADevice.cs:line 94
   at Ceres.Chess.NNBackends.CUDA.NNBackendCUDALayers.InitKernels(NNBackendExecContext context) in /home/alex/src/Ceres/src/Ceres.Chess/NNBackends/CUDA/NNBackendCUDALayers.cs:line 121
   at Ceres.Chess.NNBackends.CUDA.NNBackendCUDALayers..ctor(NNBackendExecContext context, Net net, LC0LegacyWeights weights, Boolean saveActivations, NNBackendCUDALayers referenceLayers) in /home/alex/src/Ceres/src/Ceres.Chess/NNBackends/CUDA/NNBackendCUDALayers.cs:line 111
   at Ceres.Chess.NNBackends.CUDA.NNBackendLC0_CUDA.InitNetwork(Net net) in /home/alex/src/Ceres/src/Ceres.Chess/NNBackends/CUDA/NNBackendLC0_CUDA.cs:line 352
   at Ceres.Chess.NNBackends.CUDA.NNBackendLC0_CUDA..ctor(Int32 gpuID, Net net, Boolean saveActivations, Int32 maxBatchSize, Boolean dumpTiming, Boolean enableCUDAGraphs, Int32 graphBatchSizeDivisor, NNBackendLC0_CUDA referenceBackend) in /home/alex/src/Ceres/src/Ceres.Chess/NNBackends/CUDA/NNBackendLC0_CUDA.cs:line 274
Unhandled exception. System.NullReferenceException: Object reference not set to an instance of an object.
   at Ceres.Chess.NNEvaluators.CUDA.NNEvaluatorCUDA.PrepareInputPositions(IEncodedPositionBatchFlat batch) in /home/alex/src/Ceres/src/Ceres.Chess/NNEvaluators/CUDA/NNEvaluatorCUDA.cs:line 309
   at Ceres.Chess.NNEvaluators.CUDA.NNEvaluatorCUDA.StartEvaluateIntoBuffers(IEncodedPositionBatchFlat positions, Int32 numPositions, Boolean retrieveSupplementalResults) in /home/alex/src/Ceres/src/Ceres.Chess/NNEvaluators/CUDA/NNEvaluatorCUDA.cs:line 214
   at Ceres.Chess.NNEvaluators.CUDA.NNEvaluatorCUDA.DoEvaluateIntoBuffers(IEncodedPositionBatchFlat positions, Boolean retrieveSupplementalResults) in /home/alex/src/Ceres/src/Ceres.Chess/NNEvaluators/CUDA/NNEvaluatorCUDA.cs:line 225
   at Ceres.Chess.NNEvaluators.NNEvaluator.EvaluateIntoBuffers(IEncodedPositionBatchFlat positions, Boolean retrieveSupplementalResults) in /home/alex/src/Ceres/src/Ceres.Chess/NNEvaluators/NNEvaluator.cs:line 149
   at Ceres.Chess.NNEvaluators.NNEvaluatorBenchmark.EstNPS(NNEvaluator evaluator, Boolean computeBreaks, Int32 bigBatchSize, Boolean estimateSingletons, Int32 numWarmups) in /home/alex/src/Ceres/src/Ceres.Chess/NNEvaluators/NNEvaluatorBenchmark.cs:line 120
   at Ceres.Commands.FeatureBenchmark.DumpGPUBenchmark() in /home/alex/src/Ceres/src/Ceres/Commands/FeatureBenchmark.cs:line 118
   at Ceres.Commands.FeatureBenchmark.DumpBenchmark() in /home/alex/src/Ceres/src/Ceres/Commands/FeatureBenchmark.cs:line 46
   at Ceres.Commands.DispatchCommands.ProcessCommand(String cmd) in /home/alex/src/Ceres/src/Ceres/Commands/DispatchCommands.cs:line 212
   at Ceres.Program.Main(String[] args) in /home/alex/src/Ceres/src/Ceres/Program.cs:line 105
Aborted (core dumped)
And here's the error log from Nibbler, running the same version:
Sincerely, Alex Tarra
Tried version 0.96 compiled from source code locally, same issue.
New versions of CUDA and Ceres, same old issue. I had to manually convert all projects to .NET 6.0 to get it running.
|=========================================================|
| Ceres - A Monte Carlo Tree Search Chess Engine          |
|                                                         |
| (c) 2020- David Elliott and the Ceres Authors           |
| With network backend code from Leela Chess Zero.        |
| Use help to list available commands.                    |
|                                                         |
| Version 0.97RC3 with PGO: NA                            |
| Runtime .NET 6.0.10 and Cuda 11.80                      |
|=========================================================|
Ceres user settings loaded from file /home/alex/temp/Ceres-0.97RC3/artifacts/release/net6.0/Ceres.json
Network evaluation configured to use: <NNEvaluatorDef Network=LC0:./Networks/weights_run2_703810.pb.gz Device=GPU:0 >
Entering UCI command processing mode.
go
Loaded network weights: 0 10x128 WDL MLH from ./Networks/weights_run2_703810.pb.gz
CUDA device 0: NVIDIA GeForce GTX 970 Compute: 5.2 SMs: 13 Mem: 3gb
Error when initializing CUDA. Did you install NVidia's CUDA?
https://developer.nvidia.com/cuda-zone
ErrorInvalidPtx
   at ManagedCuda.CudaContext.LoadModulePTX(Byte[] moduleImage, CUJITOption[] options, Object[] values)
   at ManagedCuda.CudaContext.LoadKernelPTX(Stream moduleImage, String kernelName)
   at Ceres.Base.CUDA.CUDADevice.DoLoadKernel(Assembly assembly, CudaContext context, String resource, String kernelName) in /home/alex/temp/Ceres-0.97RC3/src/Ceres.Base/CUDA/CUDADevice.cs:line 114
   at Ceres.Chess.NNBackends.CUDA.NNBackendCUDALayers.InitKernels(NNBackendExecContext context) in /home/alex/temp/Ceres-0.97RC3/src/Ceres.Chess/NNBackends/CUDA/NNBackendCUDALayers.cs:line 152
   at Ceres.Chess.NNBackends.CUDA.NNBackendCUDALayers..ctor(NNBackendExecContext context, Int32 deviceComputeCapabilityMajor, Net net, LC0LegacyWeights weights, Boolean saveActivations, NNBackendCUDALayers referenceLayers) in /home/alex/temp/Ceres-0.97RC3/src/Ceres.Chess/NNBackends/CUDA/NNBackendCUDALayers.cs:line 141
   at Ceres.Chess.NNBackends.CUDA.NNBackendLC0_CUDA.InitNetwork(Net net) in /home/alex/temp/Ceres-0.97RC3/src/Ceres.Chess/NNBackends/CUDA/NNBackendLC0_CUDA.cs:line 357
   at Ceres.Chess.NNBackends.CUDA.NNBackendLC0_CUDA..ctor(Int32 gpuID, Net net, Boolean saveActivations, Int32 maxBatchSize, Boolean dumpTiming, Boolean enableCUDAGraphs, Int32 graphBatchSizeDivisor, NNBackendLC0_CUDA referenceBackend) in /home/alex/temp/Ceres-0.97RC3/src/Ceres.Chess/NNBackends/CUDA/NNBackendLC0_CUDA.cs:line 276
CUDA device 0: NVIDIA GeForce GTX 970 Compute: 5.2 SMs: 13 Mem: 3gb
Error when initializing CUDA. Did you install NVidia's CUDA?
https://developer.nvidia.com/cuda-zone
ErrorInvalidPtx
   at ManagedCuda.CudaContext.LoadModulePTX(Byte[] moduleImage, CUJITOption[] options, Object[] values)
   at ManagedCuda.CudaContext.LoadKernelPTX(Stream moduleImage, String kernelName)
   at Ceres.Base.CUDA.CUDADevice.DoLoadKernel(Assembly assembly, CudaContext context, String resource, String kernelName) in /home/alex/temp/Ceres-0.97RC3/src/Ceres.Base/CUDA/CUDADevice.cs:line 114
   at Ceres.Chess.NNBackends.CUDA.NNBackendCUDALayers.InitKernels(NNBackendExecContext context) in /home/alex/temp/Ceres-0.97RC3/src/Ceres.Chess/NNBackends/CUDA/NNBackendCUDALayers.cs:line 152
   at Ceres.Chess.NNBackends.CUDA.NNBackendCUDALayers..ctor(NNBackendExecContext context, Int32 deviceComputeCapabilityMajor, Net net, LC0LegacyWeights weights, Boolean saveActivations, NNBackendCUDALayers referenceLayers) in /home/alex/temp/Ceres-0.97RC3/src/Ceres.Chess/NNBackends/CUDA/NNBackendCUDALayers.cs:line 141
   at Ceres.Chess.NNBackends.CUDA.NNBackendLC0_CUDA.InitNetwork(Net net) in /home/alex/temp/Ceres-0.97RC3/src/Ceres.Chess/NNBackends/CUDA/NNBackendLC0_CUDA.cs:line 357
   at Ceres.Chess.NNBackends.CUDA.NNBackendLC0_CUDA..ctor(Int32 gpuID, Net net, Boolean saveActivations, Int32 maxBatchSize, Boolean dumpTiming, Boolean enableCUDAGraphs, Int32 graphBatchSizeDivisor, NNBackendLC0_CUDA referenceBackend) in /home/alex/temp/Ceres-0.97RC3/src/Ceres.Chess/NNBackends/CUDA/NNBackendLC0_CUDA.cs:line 276
Unhandled exception. System.AggregateException: One or more errors occurred. (Object reference not set to an instance of an object.) (Object reference not set to an instance of an object.)
 ---> System.NullReferenceException: Object reference not set to an instance of an object.
   at Ceres.Chess.NNEvaluators.CUDA.NNEvaluatorCUDA.PrepareInputPositions(IEncodedPositionBatchFlat batch) in /home/alex/temp/Ceres-0.97RC3/src/Ceres.Chess/NNEvaluators/CUDA/NNEvaluatorCUDA.cs:line 308
   at Ceres.Chess.NNEvaluators.CUDA.NNEvaluatorCUDA.StartEvaluateIntoBuffers(IEncodedPositionBatchFlat positions, Int32 numPositions, Boolean retrieveSupplementalResults) in /home/alex/temp/Ceres-0.97RC3/src/Ceres.Chess/NNEvaluators/CUDA/NNEvaluatorCUDA.cs:line 214
   at Ceres.Chess.NNEvaluators.CUDA.NNEvaluatorCUDA.DoEvaluateIntoBuffers(IEncodedPositionBatchFlat positions, Boolean retrieveSupplementalResults) in /home/alex/temp/Ceres-0.97RC3/src/Ceres.Chess/NNEvaluators/CUDA/NNEvaluatorCUDA.cs:line 225
   at Ceres.Chess.NNEvaluators.NNEvaluator.EvaluateIntoBuffers(IEncodedPositionBatchFlat positions, Boolean retrieveSupplementalResults) in /home/alex/temp/Ceres-0.97RC3/src/Ceres.Chess/NNEvaluators/NNEvaluator.cs:line 149
   at Ceres.MCTS.Params.NNEvaluatorSet.<Warmup>b__18_3() in /home/alex/temp/Ceres-0.97RC3/src/Ceres.MCTS/Iteration/Params/NNEvaluatorSet.cs:line 145
   at System.Threading.Tasks.Task.InnerInvoke()
   at System.Threading.Tasks.Task.<>c.<.cctor>b__272_0(Object obj)
   at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)
--- End of stack trace from previous location ---
   at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)
   at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot, Thread threadPoolThread)
   --- End of inner exception stack trace ---
   at System.Threading.Tasks.Task.WaitAllCore(Task[] tasks, Int32 millisecondsTimeout, CancellationToken cancellationToken)
   at System.Threading.Tasks.Task.WaitAll(Task[] tasks)
   at System.Threading.Tasks.Parallel.Invoke(ParallelOptions parallelOptions, Action[] actions)
--- End of stack trace from previous location ---
   at System.Threading.Tasks.Parallel.ThrowSingleCancellationExceptionOrOtherException(ICollection exceptions, CancellationToken cancelToken, Exception otherException)
   at System.Threading.Tasks.Parallel.Invoke(ParallelOptions parallelOptions, Action[] actions)
   at Ceres.Features.UCI.UCIManager.InitializeEngineIfNeeded() in /home/alex/temp/Ceres-0.97RC3/src/Ceres.Features/UCI/UCIManager.cs:line 616
   at Ceres.Features.UCI.UCIManager.PlayUCI() in /home/alex/temp/Ceres-0.97RC3/src/Ceres.Features/UCI/UCIManager.cs:line 337
   at Ceres.Commands.DispatchCommands.ProcessCommand(String cmd) in /home/alex/temp/Ceres-0.97RC3/src/Ceres/Commands/DispatchCommands.cs:line 74
   at Ceres.Program.Main(String[] args) in /home/alex/temp/Ceres-0.97RC3/src/Ceres/Program.cs:line 103
 ---> (Inner Exception #1) System.NullReferenceException: Object reference not set to an instance of an object.
   at Ceres.Chess.NNEvaluators.CUDA.NNEvaluatorCUDA.PrepareInputPositions(IEncodedPositionBatchFlat batch) in /home/alex/temp/Ceres-0.97RC3/src/Ceres.Chess/NNEvaluators/CUDA/NNEvaluatorCUDA.cs:line 308
   at Ceres.Chess.NNEvaluators.CUDA.NNEvaluatorCUDA.StartEvaluateIntoBuffers(IEncodedPositionBatchFlat positions, Int32 numPositions, Boolean retrieveSupplementalResults) in /home/alex/temp/Ceres-0.97RC3/src/Ceres.Chess/NNEvaluators/CUDA/NNEvaluatorCUDA.cs:line 214
   at Ceres.Chess.NNEvaluators.CUDA.NNEvaluatorCUDA.DoEvaluateIntoBuffers(IEncodedPositionBatchFlat positions, Boolean retrieveSupplementalResults) in /home/alex/temp/Ceres-0.97RC3/src/Ceres.Chess/NNEvaluators/CUDA/NNEvaluatorCUDA.cs:line 225
   at Ceres.Chess.NNEvaluators.NNEvaluator.EvaluateIntoBuffers(IEncodedPositionBatchFlat positions, Boolean retrieveSupplementalResults) in /home/alex/temp/Ceres-0.97RC3/src/Ceres.Chess/NNEvaluators/NNEvaluator.cs:line 149
   at Ceres.MCTS.Params.NNEvaluatorSet.<Warmup>b__18_4() in /home/alex/temp/Ceres-0.97RC3/src/Ceres.MCTS/Iteration/Params/NNEvaluatorSet.cs:line 146
   at System.Threading.Tasks.Task.InnerInvoke()
   at System.Threading.Tasks.Task.<>c.<.cctor>b__272_0(Object obj)
   at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(Thread threadPoolThread, ExecutionContext executionContext, ContextCallback callback, Object state)
--- End of stack trace from previous location ---
   at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(Thread threadPoolThread, ExecutionContext executionContext, ContextCallback callback, Object state)
   at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot, Thread threadPoolThread)<---
Aborted
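In the traces above, the first failure is ErrorInvalidPtx raised from LoadModulePTX on a device reported as compute capability 5.2; the NullReferenceException comes afterwards. One possible cause, which nothing in this thread confirms, is that the PTX embedded in Ceres targets a newer architecture than the GTX 970. PTX built for a given target can only be JIT-compiled for devices of equal or greater compute capability. The sketch below reads the `.version` and `.target` directives from PTX text and applies that rule. The function names are hypothetical, and the `sm_70` value in the usage example is made up for illustration, since the actual target of the Ceres kernels is not stated here.

```python
import re


def ptx_header(ptx_text):
    """Return ((ptx_major, ptx_minor), target_sm) from a PTX module's header."""
    version = re.search(r"^\s*\.version\s+(\d+)\.(\d+)", ptx_text, re.MULTILINE)
    target = re.search(r"^\s*\.target\s+sm_(\d+)", ptx_text, re.MULTILINE)
    if not version or not target:
        raise ValueError("not a PTX module: missing .version or .target")
    return (int(version.group(1)), int(version.group(2))), int(target.group(1))


def device_can_jit(target_sm, compute_major, compute_minor):
    """True if a device of the given compute capability can JIT this target.

    PTX for sm_XY is only forward compatible: it needs capability >= X.Y.
    """
    return compute_major * 10 + compute_minor >= target_sm


if __name__ == "__main__":
    # Hypothetical header; the real target must be read from the actual .ptx file.
    sample = ".version 7.5\n.target sm_70\n.address_size 64\n"
    ptx_version, target_sm = ptx_header(sample)
    print("PTX version:", ptx_version, "target: sm_%d" % target_sm)
    print("loadable on compute 5.2:", device_can_jit(target_sm, 5, 2))
```

If the check fails for the real PTX files, rebuilding the kernels for a target no higher than the device's capability would be the thing to try, provided the kernels do not rely on instructions the older architecture lacks. A PTX `.version` newer than the installed driver supports is a separate possibility that this sketch does not check.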