cuda-tutorial hello world doesn't actually say hello world

Why does the hello world example do nothing? This fact is mentioned in the tutorial, but no further explanation is given. This is extremely confusing; what is the point of even writing that code if it doesn't work?

Aug 30 '19 03:08 xdavidliu

Thanks for the comment.

The purpose of the hello world part was to quickly introduce the term "kernel" and show the reader how to compile a CUDA program without introducing too much information. Right now, that is the smallest code I could think of. Please suggest alternatives if you have ideas for the example.

BTW, the code actually works. After additional investigation, I found that the program exits before the GPU can send the printf message back. To avoid this problem, you could use the following code.

#include <stdio.h>
#include <unistd.h>

__global__ void cuda_hello(){
    printf("Hello World from GPU!\n");
}

int main() {
    cuda_hello<<<1,1>>>(in);
    sleep(10);
    return 0;
}

Aug 30 '19 03:08 puttsk

hello, I am getting hello.cu(9): error: identifier "in" is undefined

hello.cu(9): error: too many arguments in function call

2 errors detected in the compilation of "/tmp/tmpxft_00001b05_00000000-8_hello.cpp1.ii".

Are you sure "in" is a good parameter to pass? Have you tested the code you shared?

Nov 03 '19 18:11 LucaPaterlini

Sorry, there is a typo in the code. There should not be an "in" argument in the kernel launch.

The correct code is

#include <stdio.h>
#include <unistd.h>

__global__ void cuda_hello(){
    printf("Hello World from GPU!\n");
}

int main() {
    cuda_hello<<<1,1>>>();
    sleep(10);
    return 0;
}

Nov 04 '19 03:11 puttsk

By the way, this still fails to produce any output.

Mar 25 '22 09:03 127

At this point in the tutorial, the code won't show any output. I mentioned that in the "Putting things in actions" section. The purpose of the hello world code is to compare it with its C counterpart, which should be familiar to anyone who has started learning C.

Please suggest if you have a better example. I could update the code if it makes more sense.

Mar 25 '22 10:03 puttsk

I think technically the correct way to get the print function to work is to insert a call to cudaDeviceSynchronize() after the function call.

int main(){
    cuda_hello<<<1,1>>>();
    cudaDeviceSynchronize();
    return 0;
}

It's worth updating in the tutorial since it appears there are a lot of people posting this problem/question on various message boards.

Apr 03 '22 19:04 SpacelySpaceSprockets

@SpacelySpaceSprockets Nvidia has its own slightly more complex Hello World that does exactly that:

#include <stdio.h>

__global__ void helloCUDA(float f)
{
    printf("Hello thread %d, f=%f\n", threadIdx.x, f);
}

int main()
{
    helloCUDA<<<1, 5>>>(1.2345f);
    cudaDeviceSynchronize();
    return 0;
}

https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#examples

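As @puttsk pointed out, this is needed because the GPU and CPU run asynchronously. The kernel launch returns immediately on the CPU side, so without a synchronization call main() can return and the process can exit before the CUDA runtime has flushed the output of the GPU's printf().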
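For anyone landing here from a search, below is a minimal sketch of the same fix with error checking added, assuming a working CUDA toolkit and GPU. The check() helper is a hypothetical name introduced only for this example and is not part of the tutorial; cudaGetLastError(), cudaDeviceSynchronize() and cudaGetErrorString() are standard CUDA runtime calls. The idea is that cudaGetLastError() catches a failed launch, and cudaDeviceSynchronize() makes the host wait for the kernel, which is also a point where the device-side printf buffer is flushed.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Kernel: runs on the GPU. Device-side printf writes into a buffer
// that the host flushes later, not directly to the terminal.
__global__ void cuda_hello() {
    printf("Hello World from GPU!\n");
}

// Hypothetical helper (not from the tutorial): prints a message if a
// CUDA runtime call failed and returns true on success.
static bool check(cudaError_t err, const char *what) {
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        return false;
    }
    return true;
}

int main() {
    // The launch is asynchronous: control returns to the CPU immediately.
    cuda_hello<<<1, 1>>>();

    // Report launch errors (for example, an invalid launch configuration).
    if (!check(cudaGetLastError(), "kernel launch")) return 1;

    // Block until the kernel has finished. This is also a point where
    // the device printf buffer is flushed to stdout.
    if (!check(cudaDeviceSynchronize(), "cudaDeviceSynchronize")) return 1;

    return 0;
}
```

Compared with the sleep(10) workaround, this waits exactly as long as the kernel needs, and if nothing is printed the error message should at least hint at why (no device, driver mismatch, and so on).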

May 01 '22 20:05 ryao

Also, #include <stdio.h> is needed at the top of the file. Please add this line or make it clear that the code is not a complete example.

Oct 17 '23 12:10 wzm2256