uastar
Maybe bug(?)
I tried using your code, and found that it sometimes reports inconsistent results (in the puzzle domain).
My inputs were, for example (see the small sanity-check sketch after the list):
- 12 11 9 14 7 2 4 3 0 13 10 5 8 15 1 6
- 0 15 7 9 5 3 11 13 2 4 12 14 8 10 6 1
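For reference, here is a small standalone sketch (hypothetical, not taken from uastar's sources) that reads one such 16-number line, checks that it is a permutation of 0..15, and applies the usual inversion-parity solvability test for a 4x4 board, assuming the standard goal with the blank (0) in the bottom-right corner:

// Hypothetical instance checker, not part of uastar.
#include <cstdio>
#include <vector>

int main() {
    std::vector<int> t(16);
    std::vector<bool> seen(16, false);
    for (int i = 0; i < 16; ++i) {
        // Each instance line is 16 tile values, 0 denoting the blank.
        if (scanf("%d", &t[i]) != 1 || t[i] < 0 || t[i] > 15 || seen[t[i]]) {
            fprintf(stderr, "invalid instance\n");
            return 1;
        }
        seen[t[i]] = true;
    }
    int inversions = 0, blankRowFromBottom = 0;
    for (int i = 0; i < 16; ++i) {
        if (t[i] == 0) { blankRowFromBottom = 4 - i / 4; continue; }
        for (int j = i + 1; j < 16; ++j)
            if (t[j] != 0 && t[j] < t[i]) ++inversions;
    }
    // For an even board width, the instance is solvable iff
    // inversions + blank row (counted from the bottom, 1-based) is odd.
    bool solvable = (inversions + blankRowFromBottom) % 2 == 1;
    printf("inversions=%d blankRow=%d solvable=%s\n",
           inversions, blankRowFromBottom, solvable ? "yes" : "no");
    return 0;
}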
Also, I made a small fix to your code so that it builds and runs in an AWS environment (I used a g2.2xlarge instance with the NVIDIA CUDA 7.5 Toolkit on Amazon Linux-0ce7aca3-5b96-4ff4-8396-05245687380a-ami-52420645.3 (ami-35b36f54)).
If possible, could you look at my changes in this PR and point out any mistakes I made?
Log: When running the code (results are not reproducible)
uastar$ ./uastar --puzzle -H 4 -W 4 < ../slide_puzzle/benchmarks/uastar/prob002
[0000000] Generating input data ......
[0000000] Initializing CPU data structure ......
[0000511] Initializing GPU data structure ......
[0002168] Solving the problem on a pure CPU platform ......
Number of nodes deduplicated: 1772085
Number of nodes expanded: 3062731
[0018677] Solving the problem with GPU acceleration ......
Round 645: Found one solution
Number of nodes expanded: 9667894
[0019174] Checking the result ......
> Optimal steps from CPU: 60
> Optimal steps from GPU: 66
[0019175] ERROR: Output of the CPU and GPU is not consistent!
Log: When using a debug build, by adding the flags "-O0 -G -Xcompiler -g" to NVCC_FLAGS
uastar$ ./uastar --puzzle -H 4 -W 4 < ../slide_puzzle/benchmarks/uastar/prob001
[0000000] Generating input data ......
[0000000] Initializing CPU data structure ......
[0000508] Initializing GPU data structure ......
[0002214] Solving the problem on a pure CPU platform ......
Number of nodes deduplicated: 452516
Number of nodes expanded: 806666
[0006227] Solving the problem with GPU acceleration ......
CudaDeviceMem::ToHost copy error 77
uastar$ ./uastar --puzzle -H 4 -W 4 < ../slide_puzzle/benchmarks/uastar/prob000
[0000000] Generating input data ......
[0000000] Initializing CPU data structure ......
[0000511] Initializing GPU data structure ......
[0002232] Solving the problem on a pure CPU platform ......
Number of nodes deduplicated: 1054825
Number of nodes expanded: 1835432
[0011771] Solving the problem with GPU acceleration ......
Round 1010: Found one solution
terminate called after throwing an instance of 'std::length_error'
what(): vector::_M_fill_insert
Aborted
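As a side note, on the CUDA 7.5 runtime, error code 77 corresponds to cudaErrorIllegalAddress (an illegal memory access). A tiny standalone helper (not part of uastar; compile with nvcc) to translate such numeric codes:

// Prints the CUDA error name/description for a numeric code (default: 77).
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

int main(int argc, char **argv) {
    int code = (argc > 1) ? atoi(argv[1]) : 77;
    cudaError_t err = static_cast<cudaError_t>(code);
    printf("%d = %s: %s\n", code, cudaGetErrorName(err), cudaGetErrorString(err));
    return 0;
}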
I also found that some errors are reported when running your code under cuda-memcheck.
Log: Errors reported by cuda-memcheck (using the initcheck tool)
========= Uninitialized __global__ memory read of size 4
========= at 0x00005408 in /home/ec2-user/uastar/src/puzzle/GPU-kernel.cuh:352:void kExtractExpand<int=4, int=39, int=192, int=133, int=90000089>(unsigned char*, node_t<int=4>*, int*, heap_t*, int*, unsigned int*, unsigned int*, heap_t*, int*, heap_t*, int*, int*)
========= by thread (66,0,0) in block (15,0,0)
========= Address 0x72a26513c
========= Saved host backtrace up to driver entry point
========= Host Frame:/usr/lib64/libcuda.so.1 (cuLaunchKernel + 0x2cd) [0x15865d]
========= Host Frame:/usr/local/cuda/lib64/libcudart.so.7.5 [0x146ad]
========= Host Frame:/usr/local/cuda/lib64/libcudart.so.7.5 (cudaLaunch + 0x143) [0x2ece3]
========= Host Frame:./uastar [0xc828b]
========= Host Frame:./uastar [0xc7486]
========= Host Frame:./uastar [0xc751a]
========= Host Frame:./uastar (_Z14kExtractExpandILi4ELi39ELi192ELi133ELi90000089EEvPhP6node_tIXT_EEPiP6heap_tS4_PjS7_S6_S4_S6_S4_S4_ + 0x76) [0xd2fa0]
========= Host Frame:./uastar (_ZN9gpusolver15GPUPuzzleSolverILi4EE5solveEv + 0x2f9) [0xd1681]
========= Host Frame:./uastar (_ZN6Puzzle8gpuSolveEv + 0x60) [0xc63f8]
========= Host Frame:./uastar [0x90bdf]
========= Host Frame:./uastar (main + 0x5e4) [0x913c1]
========= Host Frame:/lib64/libc.so.6 (__libc_start_main + 0xf5) [0x21b15]
========= Host Frame:./uastar [0x90659]
...
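For context, the initcheck report above flags a kernel reading device memory that was never written. A minimal, hypothetical reduction of this class of bug (unrelated to uastar's actual kernels) is a buffer allocated with cudaMalloc but never initialized before the first kernel reads it; the usual fix is an explicit cudaMemset (or an initializing copy) before the launch:

// Hypothetical example of an uninitialized __global__ memory read; compile with nvcc.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void readCounts(const int *counts, int n, int *out) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = counts[i];  // initcheck reports this read if counts was never initialized
}

int main() {
    const int n = 256;
    int *counts, *out;
    cudaMalloc(&counts, n * sizeof(int));  // contents are indeterminate after cudaMalloc
    cudaMalloc(&out, n * sizeof(int));
    // Missing: cudaMemset(counts, 0, n * sizeof(int));  <- the usual fix
    readCounts<<<(n + 127) / 128, 128>>>(counts, n, out);
    cudaDeviceSynchronize();
    printf("last error: %s\n", cudaGetErrorString(cudaGetLastError()));
    cudaFree(counts);
    cudaFree(out);
    return 0;
}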
Thanks for your testing and input. I will look into the memory bugs. How often do you run into the memory problem?
Thank you for taking a look and for your comments.
With the default compilation flags, the output sometimes (about 2 or 3 times out of 5, even for the same problem) includes "ERROR: Output of the CPU and GPU is not consistent!".
When compiling with "-O0 -G -Xcompiler -g", other kinds of runtime errors are almost always reported.
When running under cuda-memcheck --tool initcheck:
- with "-O0 -G -Xcompiler -g", errors are always reported (far too many), at least in my observation.
- with the default flags, the frequency and number of reports decrease, but errors are still occasionally reported.
I'm sorry if you feel rushed, but I actually want to compare your tile solver with my GPU implementation of IDA* search in my master's thesis (due date: 1/12).
So if you have any progress or any plans/hints, I would greatly appreciate any rough ideas you can share.
Also, could you share the handmade puzzle instances used in this paper? http://www.aaai.org/ocs/index.php/AAAI/AAAI15/paper/viewFile/9620/9366