uastar
Maybe bug(?)
I tried using your code, and found that it sometimes reports inconsistent results (in the puzzle domain).
My inputs were, for example (see the small sanity-check sketch after the list):
- 12 11 9 14 7 2 4 3 0 13 10 5 8 15 1 6
- 0 15 7 9 5 3 11 13 2 4 12 14 8 10 6 1
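For reference, here is a small standalone sketch (hypothetical, not taken from uastar's sources) that reads one such 16-number line, checks that it is a permutation of 0..15, and applies the usual inversion-parity solvability test for a 4x4 board, assuming the standard goal with the blank (0) in the bottom-right corner:

// Hypothetical instance checker, not part of uastar.
#include <cstdio>
#include <vector>

int main() {
    std::vector<int> t(16);
    std::vector<bool> seen(16, false);
    for (int i = 0; i < 16; ++i) {
        // Each instance line is 16 tile values, 0 denoting the blank.
        if (scanf("%d", &t[i]) != 1 || t[i] < 0 || t[i] > 15 || seen[t[i]]) {
            fprintf(stderr, "invalid instance\n");
            return 1;
        }
        seen[t[i]] = true;
    }
    int inversions = 0, blankRowFromBottom = 0;
    for (int i = 0; i < 16; ++i) {
        if (t[i] == 0) { blankRowFromBottom = 4 - i / 4; continue; }
        for (int j = i + 1; j < 16; ++j)
            if (t[j] != 0 && t[j] < t[i]) ++inversions;
    }
    // For an even board width, the instance is solvable iff
    // inversions + blank row (counted from the bottom, 1-based) is odd.
    bool solvable = (inversions + blankRowFromBottom) % 2 == 1;
    printf("inversions=%d blankRow=%d solvable=%s\n",
           inversions, blankRowFromBottom, solvable ? "yes" : "no");
    return 0;
}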
Also, I made a small fix to your code so that it builds and runs in an AWS environment (I used a g2.2xlarge instance with the NVIDIA CUDA 7.5 Toolkit on Amazon Linux-0ce7aca3-5b96-4ff4-8396-05245687380a-ami-52420645.3 (ami-35b36f54)).
If possible, could you look at my changes in this PR and point out any mistakes I made?
Log: When running the code (results are not reproducible)
uastar$ ./uastar --puzzle -H 4 -W 4 < ../slide_puzzle/benchmarks/uastar/prob002
[0000000] Generating input data ......
[0000000] Initializing CPU data structure ......
[0000511] Initializing GPU data structure ......
[0002168] Solving the problem on a pure CPU platform ......
Number of nodes deduplicated: 1772085
Number of nodes expanded: 3062731
[0018677] Solving the problem with GPU acceleration ......
Round 645: Found one solution
Number of nodes expanded: 9667894
[0019174] Checking the result ......
> Optimal steps from CPU: 60
> Optimal steps from GPU: 66
[0019175] ERROR: Output of the CPU and GPU is not consistent!
Log: When using a debug build, by adding the flags "-O0 -G -Xcompiler -g" to NVCC_FLAGS
uastar$ ./uastar --puzzle -H 4 -W 4 < ../slide_puzzle/benchmarks/uastar/prob001
[0000000] Generating input data ......
[0000000] Initializing CPU data structure ......
[0000508] Initializing GPU data structure ......
[0002214] Solving the problem on a pure CPU platform ......
Number of nodes deduplicated: 452516
Number of nodes expanded: 806666
[0006227] Solving the problem with GPU acceleration ......
CudaDeviceMem::ToHost copy error 77
uastar$ ./uastar --puzzle -H 4 -W 4 < ../slide_puzzle/benchmarks/uastar/prob000
[0000000] Generating input data ......
[0000000] Initializing CPU data structure ......
[0000511] Initializing GPU data structure ......
[0002232] Solving the problem on a pure CPU platform ......
Number of nodes deduplicated: 1054825
Number of nodes expanded: 1835432
[0011771] Solving the problem with GPU acceleration ......
Round 1010: Found one solution
terminate called after throwing an instance of 'std::length_error'
what(): vector::_M_fill_insert
Aborted
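As a side note, on the CUDA 7.5 runtime, error code 77 corresponds to cudaErrorIllegalAddress (an illegal memory access). A tiny standalone helper (not part of uastar; compile with nvcc) to translate such numeric codes:

// Prints the CUDA error name/description for a numeric code (default: 77).
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

int main(int argc, char **argv) {
    int code = (argc > 1) ? atoi(argv[1]) : 77;
    cudaError_t err = static_cast<cudaError_t>(code);
    printf("%d = %s: %s\n", code, cudaGetErrorName(err), cudaGetErrorString(err));
    return 0;
}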
I also found that some errors are reported when running your code under cuda-memcheck.
Log: Errors reported by cuda-memcheck (using the initcheck tool)
========= Uninitialized __global__ memory read of size 4
========= at 0x00005408 in /home/ec2-user/uastar/src/puzzle/GPU-kernel.cuh:352:void kExtractExpand<int=4, int=39, int=192, int=133, int=90000089>(unsigned char*, node_t<int=4>*, int*, heap_t*, int*, unsigned int*, unsigned int*, heap_t*, int*, heap_t*, int*, int*)
========= by thread (66,0,0) in block (15,0,0)
========= Address 0x72a26513c
========= Saved host backtrace up to driver entry point
========= Host Frame:/usr/lib64/libcuda.so.1 (cuLaunchKernel + 0x2cd) [0x15865d]
========= Host Frame:/usr/local/cuda/lib64/libcudart.so.7.5 [0x146ad]
========= Host Frame:/usr/local/cuda/lib64/libcudart.so.7.5 (cudaLaunch + 0x143) [0x2ece3]
========= Host Frame:./uastar [0xc828b]
========= Host Frame:./uastar [0xc7486]
========= Host Frame:./uastar [0xc751a]
========= Host Frame:./uastar (_Z14kExtractExpandILi4ELi39ELi192ELi133ELi90000089EEvPhP6node_tIXT_EEPiP6heap_tS4_PjS7_S6_S4_S6_S4_S4_ + 0x76) [0xd2fa0]
========= Host Frame:./uastar (_ZN9gpusolver15GPUPuzzleSolverILi4EE5solveEv + 0x2f9) [0xd1681]
========= Host Frame:./uastar (_ZN6Puzzle8gpuSolveEv + 0x60) [0xc63f8]
========= Host Frame:./uastar [0x90bdf]
========= Host Frame:./uastar (main + 0x5e4) [0x913c1]
========= Host Frame:/lib64/libc.so.6 (__libc_start_main + 0xf5) [0x21b15]
========= Host Frame:./uastar [0x90659]
...
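For context, the initcheck report above flags a kernel reading device memory that was never written. A minimal, hypothetical reduction of this class of bug (unrelated to uastar's actual kernels) is a buffer allocated with cudaMalloc but never initialized before the first kernel reads it; the usual fix is an explicit cudaMemset (or an initializing copy) before the launch:

// Hypothetical example of an uninitialized __global__ memory read; compile with nvcc.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void readCounts(const int *counts, int n, int *out) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = counts[i];  // initcheck reports this read if counts was never initialized
}

int main() {
    const int n = 256;
    int *counts, *out;
    cudaMalloc(&counts, n * sizeof(int));  // contents are indeterminate after cudaMalloc
    cudaMalloc(&out, n * sizeof(int));
    // Missing: cudaMemset(counts, 0, n * sizeof(int));  <- the usual fix
    readCounts<<<(n + 127) / 128, 128>>>(counts, n, out);
    cudaDeviceSynchronize();
    printf("last error: %s\n", cudaGetErrorString(cudaGetLastError()));
    cudaFree(counts);
    cudaFree(out);
    return 0;
}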
Thanks for your testing and input. I will look into the memory bugs. How often do you run into the memory problem?
Thank you for taking a look and for your comments.
With the default compilation flags, the output sometimes (about 2 or 3 times out of 5, even for the same problem) includes "ERROR: Output of the CPU and GPU is not consistent!".
When compiling with "-O0 -G -Xcompiler -g", other kinds of runtime errors are almost always reported.
When running under cuda-memcheck --tool initcheck:
- with "-O0 -G -Xcompiler -g", errors are always reported (far too many), at least in my observation.
- with the default flags, the frequency and number of reports decrease, but errors are still occasionally reported.
I'm sorry if you feel rushed, but I actually want to compare your tile solver with my GPU implementation of IDA* search in my master's thesis (due date: 1/12).
So if you have any progress or any plans/hints, I would greatly appreciate any rough ideas you can share.
Also, could you share the handmade puzzle instances used in this paper? http://www.aaai.org/ocs/index.php/AAAI/AAAI15/paper/viewFile/9620/9366