
failed to allocate memory when using malloc_device

(Open) SimonWang9610 opened this issue 4 years ago · 8 comments

```cpp
#include <sycl/sycl.hpp>
using namespace sycl;

template<typename T>
class Linear {
private:
    T* weight;
    T* input;
    T* result;
    T* bias;
    T* dz;
    const int M;
    const int N;
    const int K;
public:
    Linear(T* x, T* r, int m, int n, int k, queue& Q) : input(x), result(r), M(m), N(n), K(k) {
        weight = malloc_device<T>(M * N, Q);
        bias = malloc_device<T>(M, Q);
        dz = malloc_device<T>(N * K, Q);
    }
    ...
};
```

The input and result buffers are allocated as `x = malloc_device<T>(N * K, Q);` and `r = malloc_device<T>(M * K, Q);`. In my code, when I create several `Linear` instances in sequence, all of them successfully allocate `weight` and `bias`. However, only the last `Linear` instance successfully allocates `dz`; the others fail and get 0 back (`dz == nullptr` is true). I use `dz` to store a temporary result in each `Linear`.
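Since `M`, `N`, and `K` are `int`, the product `N * K` is evaluated in `int` before it becomes the `size_t` count that `malloc_device` receives; a negative or overflowing product turns into a request the runtime cannot satisfy. One way to rule this out (an assumption worth checking, not a confirmed cause) is to compute the element count in 64-bit arithmetic and compare it against the device's per-allocation limit before calling `malloc_device`. The helper `checked_elem_count` is hypothetical; in SYCL 2020 the limit can be read with `Q.get_device().get_info<sycl::info::device::max_mem_alloc_size>()`.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper: compute rows * cols in 64-bit unsigned arithmetic and
// reject dimensions that are negative or whose total byte size would exceed
// the per-allocation limit supplied by the caller.
template <typename T>
std::size_t checked_elem_count(int rows, int cols, std::uint64_t max_alloc_bytes) {
    if (rows < 0 || cols < 0) {
        throw std::invalid_argument("negative dimension: " + std::to_string(rows) +
                                    " x " + std::to_string(cols));
    }
    const std::uint64_t r = static_cast<std::uint64_t>(rows);
    const std::uint64_t c = static_cast<std::uint64_t>(cols);
    // With 32-bit int, each factor is below 2^31, so the product is below
    // 2^62 and cannot overflow a 64-bit unsigned integer.
    const std::uint64_t count = r * c;
    if (count > max_alloc_bytes / sizeof(T)) {
        throw std::length_error("request of " + std::to_string(count) +
                                " elements exceeds max_alloc_bytes");
    }
    return static_cast<std::size_t>(count);
}
```

The constructor's allocation would then read `dz = malloc_device<T>(checked_elem_count<T>(N, K, limit), Q);` followed by a `dz == nullptr` check, so a bad request fails with the offending sizes in the message instead of a silent `nullptr`.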

Furthermore, if I change my code to allocate `dz` inside a member function of `Linear`, like below:

```cpp
T* update(T* diff, queue& Q) {
    T* dz = malloc_device<T>(N * K, Q); // also fails to allocate
    T* dw = malloc_device<T>(M * N, Q); // but dw always allocates successfully
    /* events here */
    ...
    free(dw, Q);
    return dz;
}
```

I call update in a for loop:

```cpp
T* diff = inputs.back(); // all elements in inputs are allocated by malloc_device
for (auto linear = layers.rbegin(); linear != layers.rend(); linear++) {
    diff = linear->update(diff, Q);
}
free(diff, Q);
```
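In the loop above, each call to `update` returns a newly allocated `dz`, and assigning it to `diff` overwrites the previous intermediate pointer without freeing it; only the final buffer is released by the `free(diff, Q)` after the loop. The sketch below shows the same reverse walk with explicit ownership, so each intermediate is released as soon as the next one exists. It is host-only: `HostLayer` and `backward` are hypothetical names, `std::unique_ptr<float[]>` stands in for a USM pointer, and the arithmetic is a placeholder for the real kernels. With SYCL, the same pattern would use a `std::unique_ptr` whose deleter calls `sycl::free(p, Q)`.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical host-side stand-in for a Linear layer's backward step.
struct HostLayer {
    std::size_t in_elems;   // plays the role of N * K
    std::size_t out_elems;  // plays the role of M * K (size of incoming diff)

    // Returns a freshly allocated gradient buffer owned by the caller.
    std::unique_ptr<float[]> update(const float* diff) const {
        auto dz = std::make_unique<float[]>(in_elems);
        for (std::size_t i = 0; i < in_elems; ++i) {
            // Placeholder arithmetic standing in for the real kernels.
            dz[i] = diff[i % out_elems] * 0.5f;
        }
        return dz;
    }
};

// Walk the layers in reverse. update() runs before the move-assignment, so
// it still reads the old buffer; the assignment then frees that buffer.
std::unique_ptr<float[]> backward(const std::vector<HostLayer>& layers,
                                  const float* input_diff) {
    const float* diff = input_diff;   // not owned: belongs to `inputs`
    std::unique_ptr<float[]> owned;   // owns the latest intermediate
    for (auto layer = layers.rbegin(); layer != layers.rend(); ++layer) {
        owned = layer->update(diff);  // releases the previous intermediate
        diff = owned.get();
    }
    return owned;  // caller owns the final gradient (empty if no layers)
}
```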

If I change the above code to:

```cpp
void update(T* diff, queue& Q) {
    T* dz = malloc_device<T>(N * K, Q); // also fails to allocate
    T* dw = malloc_device<T>(M * N, Q); // but dw always allocates successfully
    /* events here */
    ...
    free(dw, Q);
    // copy the result into the caller-owned diff buffer (freed by the caller after the loop)
    Q.memcpy(diff, dz, N * K * sizeof(T)).wait();
    free(dz, Q);
}
```
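This variant computes into a scratch `dz`, copies it into the caller's `diff`, and then releases `dz`. That ordering only works if `diff` is still allocated when the copy runs and holds at least `N * K` elements. A host-only sketch of the same sequence, with hypothetical names (`update_in_place`, `diff_elems`, `n_times_k`) and `std::make_unique`/`std::copy_n` standing in for `malloc_device`, `Q.memcpy(...).wait()`, and `free(dz, Q)`:

```cpp
#include <algorithm>
#include <cstddef>
#include <memory>
#include <stdexcept>

// Hypothetical host-only version of the void update(): compute into a
// scratch buffer, copy into the caller-owned `diff`, then release scratch.
void update_in_place(float* diff, std::size_t diff_elems, std::size_t n_times_k) {
    if (diff_elems < n_times_k) {
        throw std::length_error("diff is smaller than N * K");
    }
    auto dz = std::make_unique<float[]>(n_times_k);  // stands in for malloc_device
    for (std::size_t i = 0; i < n_times_k; ++i) {
        dz[i] = diff[i] + 1.0f;  // placeholder for the real kernels
    }
    // Copy while `diff` is still valid (stands in for Q.memcpy(...).wait()).
    std::copy_n(dz.get(), n_times_k, diff);
    // `dz` is released when it goes out of scope (stands in for free(dz, Q));
    // `diff` stays owned by the caller, who frees it once after the loop.
}
```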

Then I call it like this:

```cpp
T* diff = inputs.back(); // all elements in inputs are allocated by malloc_device
for (auto linear = layers.rbegin(); linear != layers.rend(); linear++) {
    linear->update(diff, Q);
}
free(diff, Q);
```

In all of these cases, only the last `Linear` successfully allocates `dz`; the others get 0 (`nullptr`).

This problem has driven me crazy! I hope someone can explain why it happens. Thanks! I tested this code on Windows 10 with an Intel CPU using the oneAPI toolkit, and on Ubuntu 18.04 with an Nvidia GPU. Both gave the same error.

SimonWang9610, Feb 15 '21 17:02