
[bug] Low-priority tasks will not be blocked.

Open chaunceyjiang opened this issue 1 year ago • 1 comments

Two pods run on the same node: pod A is set to nvidia.com/priority: "0", and pod B is set to nvidia.com/priority: "1".

I then ran the following program in pod B and found that it still runs to completion. It is not blocked.

#include <stdio.h>
#include <unistd.h>
const int N = 16;
const int blocksize = 16;

__global__
void hello(char *a, int *b)
{
        a[threadIdx.x] += b[threadIdx.x];
}

int main()
{
        char a[N] = "Hello \0\0\0\0\0\0";
        int b[N] = {15, 10, 6, 0, -11, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0};

        char *ad;
        int *bd;
        const int csize = N*sizeof(char);
        const int isize = N*sizeof(int);

        printf("%s", a);

        cudaMalloc( (void**)&ad, csize );
        cudaMalloc( (void**)&bd, isize );
        cudaMemcpy( ad, a, csize, cudaMemcpyHostToDevice );
        cudaMemcpy( bd, b, isize, cudaMemcpyHostToDevice );

        dim3 dimBlock( blocksize, 1 );
        dim3 dimGrid( 1, 1 );
        hello<<<dimGrid, dimBlock>>>(ad, bd);
        cudaMemcpy( a, ad, csize, cudaMemcpyDeviceToHost );
        cudaFree( ad );
        cudaFree( bd );

        printf("%s\n", a);
        sleep(10);
        return 0;
}
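
The program above launches a single kernel, so any blocking would have to happen on that one launch. A minimal sketch of a longer-running variant that times each launch, so that a stall becomes visible, could look like the following. The kernel name `bump`, the helper `now_sec`, the iteration count, and the one-second pause are illustrative choices that are not part of the original report. The sketch also assumes that blocking, if it takes effect, would show up as a delayed launch or a delayed synchronize.

```cuda
#include <stdio.h>
#include <time.h>
#include <unistd.h>
#include <cuda_runtime.h>

// Trivial kernel (hypothetical name): each thread increments one element.
__global__ void bump(int *data)
{
        data[threadIdx.x] += 1;
}

// Wall-clock time in seconds, read from a monotonic clock.
static double now_sec(void)
{
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec + ts.tv_nsec / 1e9;
}

int main()
{
        const int n = 16;
        const int iterations = 20;  // illustrative value, adjust as needed
        int *d = NULL;

        if (cudaMalloc((void **)&d, n * sizeof(int)) != cudaSuccess) {
                fprintf(stderr, "cudaMalloc failed\n");
                return 1;
        }
        cudaMemset(d, 0, n * sizeof(int));

        for (int i = 0; i < iterations; i++) {
                double t0 = now_sec();
                bump<<<1, n>>>(d);
                // Wait for the kernel so the measured time covers the whole launch.
                cudaError_t err = cudaDeviceSynchronize();
                double t1 = now_sec();

                printf("launch %d: %.3f ms (%s)\n",
                       i, (t1 - t0) * 1e3, cudaGetErrorString(err));
                sleep(1);  // leave time to start or stop work in the other pod
        }

        cudaFree(d);
        return 0;
}
```

Running this in pod B while pod A is busy and comparing the per-launch times against an idle pod A would show whether any launch is being held back.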

I also printed the values in the shared cache file, which confirm that the pod should be blocked.

root@cuda-12-runtime-6d7cb75b56-7xs68:~# ./mmap_read --filename=/usr/local/vgpu/c11a0a04-ade9-461f-994e-e7f5a8e448b8.cache
cachestr= 
  initializedFlag 19920718
  smInitFlag 0
  ownerPid 0
  sem {[0 0 0 0 1 0 0 0 0 0 0 0 128 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]}
  num 1
  uuids [uuid=GPU-26a583dd-542e-09bb-5dd1-9cc5bd6eb552               ]
  limit [157286400 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
  sm_limit [10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10]
  procnum 0
  utilizationSwitch 1
  recentKernel -1
  priority 1
  procs [
    pid=292, hostpid=624912, used=[               ], monitorused=[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0], status=1
    pid=216, hostpid=0, used=[               ], monitorused=[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0], status=1
root@cuda-12-runtime-6d7cb75b56-7xs68:~# time ./hello
[4pdvGPU Msg(292:139722028023808:libvgpu.c:869)]: Initializing.....
[4pdvGPU Warn(292:139722028023808:utils.c:228)]: get default cuda 1 from (null)
[4pdvGPU Msg(292:139722028023808:libvgpu.c:902)]: Initialized
Hello Hello
[4pdvGPU Msg(292:139722028023808:multiprocess_memory_limit.c:477)]: Calling exit handler 292

real	0m11.668s
user	0m0.205s
sys	0m0.330s

chaunceyjiang • Jan 23 '24 03:01