cuda-tutorial
hello world doesn't actually say hello world
Why does the hello world example do nothing? This fact is mentioned in the tutorial, but no further explanation is given. This is extremely confusing; what is the point of even writing that code if it doesn't work?
Thanks for the comment.
The purpose of the hello world part was to quickly introduce the term "kernel" and show the reader how to compile a CUDA program without introducing too much information. Right now, that is the smallest code I could think of. Please suggest alternatives if you have ideas for the example.
BTW, the code actually works. After additional investigation, I found that the program exits before the GPU can send the printf message back. To avoid this problem, you could use the following code.
#include <stdio.h>
#include <unistd.h>

__global__ void cuda_hello(){
    printf("Hello World from GPU!\n");
}

int main() {
    cuda_hello<<<1,1>>>(in);
    sleep(10);
    return 0;
}
Hello, I am getting:
hello.cu(9): error: identifier "in" is undefined
hello.cu(9): error: too many arguments in function call
2 errors detected in the compilation of "/tmp/tmpxft_00001b05_00000000-8_hello.cpp1.ii".
Are you sure "in" is a good parameter to pass? Have you tested the code you shared?
Sorry, there is a typo in the code. There should not be an "in" argument in the kernel launch.
The correct code is
#include <stdio.h>
#include <unistd.h>

__global__ void cuda_hello(){
    printf("Hello World from GPU!\n");
}

int main() {
    cuda_hello<<<1,1>>>();
    sleep(10);
    return 0;
}
By the way, it still fails to produce any output.
At this point, the code won't show any output. I mentioned that in the Putting things in actions section. The purpose of the hello world code is to compare it with its C counterpart, which should be familiar to anyone who has started learning C.
Please suggest if you have a better example. I could update the code if it makes more sense.
I think technically the correct way to get the print function to work is to insert a call to cudaDeviceSynchronize() after the function call.
int main(){
    cuda_hello<<<1,1>>>();
    cudaDeviceSynchronize();
    return 0;
}
It's worth updating in the tutorial since it appears there are a lot of people posting this problem/question on various message boards.
@SpacelySpaceSprockets Nvidia has its own slightly more complex Hello World that does exactly that:
#include <stdio.h>

__global__ void helloCUDA(float f)
{
    printf("Hello thread %d, f=%f\n", threadIdx.x, f);
}

int main()
{
    helloCUDA<<<1, 5>>>(1.2345f);
    cudaDeviceSynchronize();
    return 0;
}
https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#examples
As @puttsk pointed out, this is needed because the GPU and CPU run asynchronously: the kernel launch returns immediately, so the host program exits before the CUDA runtime has delivered the printf() output from the GPU.
Also, #include <stdio.h> is needed in the first line. Please add this line or make it clear that the code is not a complete example.
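For reference, the points raised in this thread (the stdio.h include, the empty kernel argument list, and the cudaDeviceSynchronize() call) can be combined into one complete program. This is only a minimal sketch, not the tutorial's official code: the run_hello() helper and the error checks are additions for illustration, and it assumes a machine with a working CUDA device and the file compiled with nvcc.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Kernel: runs on the GPU and prints through the device-side printf buffer.
__global__ void cuda_hello() {
    printf("Hello World from GPU!\n");
}

// Hypothetical helper (not part of the tutorial): launches the kernel and
// waits for it to finish. Returns 0 on success, 1 on any CUDA error.
int run_hello(void) {
    // The launch is asynchronous: control returns to the host immediately.
    cuda_hello<<<1, 1>>>();

    // Catch launch-time errors such as an invalid configuration or no device.
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        fprintf(stderr, "Kernel launch failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Block until the kernel completes; this also flushes the printf buffer.
    err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "Kernel execution failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}

int main() {
    return run_hello();
}
```

Compared with the sleep(10) workaround, waiting on cudaDeviceSynchronize() returns as soon as the kernel is done, and the error checks report a missing driver or device instead of exiting silently.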