Error when contracting many tensors using OpenMP
I have many tensors to contract, so I use OpenMP to parallelize the contractions. However, I get the following error message:
```
src/memoryBroker.cpp:38: char* tcl::MemoryBroker::requestMemory(size_t): Assertion `this->currentOffset + size <= this->totalSize' failed.
```
This error can be reproduced by the following simple code:
```cpp
#include <stdlib.h>   // posix_memalign, free, exit
#include <stdio.h>    // printf
#include <tcl.h>

int main(int argc, char** argv)
{
   tcl::sizeType m = 5;
   tcl::sizeType n = 4;
   tcl::sizeType k1 = 2;
   tcl::sizeType k2 = 3;
   tcl::sizeType l1 = 6;

   // Run four independent contractions concurrently
#pragma omp parallel for
   for (int np = 0; np < 4; np++)
   {
      float *dataA, *dataB, *dataC;
      posix_memalign((void**) &dataA, 64, sizeof(float) * ((size_t)k2)*m*k1*l1);
      posix_memalign((void**) &dataB, 64, sizeof(float) * ((size_t)n)*k2*k1*l1);
      posix_memalign((void**) &dataC, 64, sizeof(float) * ((size_t)m)*n*l1);

      // Initialize tensors (data is not owned by the tensors)
      tcl::Tensor<float> A({k1, m, k2, l1}, dataA);
      tcl::Tensor<float> B({n, k2, k1, l1}, dataB);
      tcl::Tensor<float> C({m, n, l1}, dataC);

      // Data initialization
      for (int i = 0; i < A.getTotalSize(); ++i)
         dataA[i] = (i + 1) * 7 % 100;
      for (int i = 0; i < B.getTotalSize(); ++i)
         dataB[i] = (i + 1) * 13 % 100;
      for (int i = 0; i < C.getTotalSize(); ++i)
         dataC[i] = (i + 1) * 5 % 100;

      float alpha = 2;
      float beta = 4;

      // Tensor contraction: C_{m,n,l1} = alpha * A_{k1,m,k2,l1} * B_{n,k2,k1,l1} + beta * C_{m,n,l1}
      auto err = tcl::tensorMult<float>(alpha, A["k1,m,k2,l1"], B["n,k2,k1,l1"], beta, C["m,n,l1"]);
      if (err != tcl::SUCCESS) {
         printf("ERROR: %s\n", tcl::getErrorString(err));
         exit(-1);
      }

      // Release the buffers (the tensors do not own them)
      free(dataA);
      free(dataB);
      free(dataC);
   }
   return 0;
}
```
This code follows `contraction.cpp` in the examples folder. Nothing is changed except wrapping it in `#pragma omp parallel for` over `for(int np=0; np<4; np++) ...`.
Any suggestions would be greatly appreciated.
Sorry, but I'm no longer maintaining this project. Please consider using the GPU version that I'm currently working on: https://developer.nvidia.com/cutensor
Thank you for your reply. Anyway, the performance of the CPU version is very impressive. The problem I reported is probably caused by the global declaration of `memBroker`, which is then shared by all threads. I can fix it by removing the global declaration in `tcl.h` and `memoryBroker.cpp` and declaring a local `memBroker` in the function `contractTTGT` in `contract.cpp`. However, I don't fully understand why you made `memBroker` a global variable. Are there performance benefits to using `memoryBroker` this way? It's OK if you don't have time to fix this.