
Error when contracting many tensors using OpenMP

Open xichuang opened this issue 5 years ago • 2 comments

I have many tensors to contract, so I use OpenMP to parallelize the contraction procedure. When I do, I get the following error message:

src/memoryBroker.cpp:38: char* tcl::MemoryBroker::requestMemory(size_t): Assertion `this->currentOffset + size <= this->totalSize' failed.

This error can be reproduced by the following simple code:

```cpp
#include <stdlib.h>
#include <stdio.h>

#include <tcl.h>

int main(int argc, char** argv)
{
    tcl::sizeType m = 5;
    tcl::sizeType n = 4;
    tcl::sizeType k1 = 2;
    tcl::sizeType k2 = 3;
    tcl::sizeType l1 = 6;

    // Each iteration allocates its own data and runs an independent contraction
    #pragma omp parallel for
    for (int np = 0; np < 4; np++) {
        float *dataA, *dataB, *dataC;
        posix_memalign((void**) &dataA, 64, sizeof(float) * ((size_t)k2)*m*k1*l1);
        posix_memalign((void**) &dataB, 64, sizeof(float) * ((size_t)n)*k2*k1*l1);
        posix_memalign((void**) &dataC, 64, sizeof(float) * ((size_t)m)*n*l1);

        // Initialize tensors (data is not owned by the tensors)
        tcl::Tensor<float> A({k1, m, k2, l1}, dataA);
        tcl::Tensor<float> B({n, k2, k1, l1}, dataB);
        tcl::Tensor<float> C({m, n, l1}, dataC);

        // Data initialization
        for (int i = 0; i < A.getTotalSize(); ++i)
            dataA[i] = (i + 1) * 7 % 100;
        for (int i = 0; i < B.getTotalSize(); ++i)
            dataB[i] = (i + 1) * 13 % 100;
        for (int i = 0; i < C.getTotalSize(); ++i)
            dataC[i] = (i + 1) * 5 % 100;

        float alpha = 2;
        float beta = 4;

        // tensor contraction: C_{m,n,l1} = alpha * A_{k1,m,k2,l1} * B_{n,k2,k1,l1} + beta * C_{m,n,l1}
        auto err = tcl::tensorMult<float>(alpha, A["k1,m,k2,l1"], B["n,k2,k1,l1"], beta, C["m,n,l1"]);
        if (err != tcl::SUCCESS) {
            printf("ERROR: %s\n", tcl::getErrorString(err));
            exit(-1);
        }
    }

    return 0;
}
```

This code follows `contraction.cpp` in the example folder. The only change is that the body is wrapped in a `#pragma omp parallel for` loop, `for (int np = 0; np < 4; np++)`.

Any suggestions would be a great help.

xichuang, Aug 22 '19 11:08

Sorry, but I'm no longer maintaining this project. Please consider using the GPU version that I'm currently working on: https://developer.nvidia.com/cutensor

springer13, Aug 22 '19 14:08

Thank you for your reply. The CPU version's performance is very impressive all the same. The problem I reported is probably caused by the global declaration of memBroker. I can fix it by removing the global declaration in tcl.h and memoryBroker.cpp and declaring a local memBroker in the function contractTTGT in contract.cpp. However, I don't fully understand why memBroker is a global variable. Is there a performance benefit to using memoryBroker this way? It's OK if you don't have time to look into this.

xichuang, Aug 24 '19 04:08
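
The diagnosis in the last comment can be illustrated with a minimal, self-contained sketch. `BumpArena` is a hypothetical stand-in for a bump-style workspace allocator of the kind the assertion message suggests; it is not TCL's `MemoryBroker`, and the function names are made up for illustration. The sketch assumes the workspace is sized for one contraction at a time, which would explain why overlapping calls exceed `totalSize`, and it returns `nullptr` where the real code asserts.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical bump allocator: hands out consecutive slices of one buffer.
// This is an illustration only, not TCL's MemoryBroker.
class BumpArena {
public:
    explicit BumpArena(std::size_t totalSize) : buffer_(totalSize), offset_(0) {}

    // Returns nullptr when the request does not fit
    // (the real library asserts at this point instead).
    char* request(std::size_t size) {
        if (offset_ + size > buffer_.size()) return nullptr;
        char* p = buffer_.data() + offset_;
        offset_ += size;
        return p;
    }

    void reset() { offset_ = 0; }

private:
    std::vector<char> buffer_;
    std::size_t offset_;
};

// Shared arena: two jobs that overlap in time draw from one arena that is
// sized for a single job. The second request arrives before any reset.
bool shared_arena_fits_two_jobs(std::size_t jobBytes) {
    BumpArena shared(jobBytes);
    char* first = shared.request(jobBytes);   // job 1 takes the whole arena
    char* second = shared.request(jobBytes);  // job 2 finds it exhausted
    return first != nullptr && second != nullptr;
}

// Local arenas: every loop iteration owns its workspace, so iterations
// cannot exhaust each other's memory. Without OpenMP enabled, the pragma
// is ignored and the loop simply runs serially.
int run_with_local_arenas(int numJobs, std::size_t jobBytes) {
    assert(numJobs >= 0);
    std::vector<int> ok(numJobs, 0);
    #pragma omp parallel for
    for (int job = 0; job < numJobs; ++job) {
        BumpArena local(jobBytes);                  // analogous to a local memBroker
        char* workspace = local.request(jobBytes);
        ok[job] = (workspace != nullptr) ? 1 : 0;   // each job writes its own slot
    }
    int succeeded = 0;
    for (int v : ok) succeeded += v;
    return succeeded;
}
```

Calling `shared_arena_fits_two_jobs(1024)` from a `main` shows the shared case failing, while `run_with_local_arenas(4, 1024)` reports all four jobs succeeding. The likely trade-off is that a local arena allocates a fresh buffer on every call, whereas a global one is allocated once and reused. A `thread_local` arena would be one possible middle ground, though whether that fits TCL's design is not something this thread confirms.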