TensorComprehensions
A domain-specific language to express machine learning workloads.
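To give a flavour of the notation: in TC, a matrix multiplication can be written as `C(m, n) +=! A(m, r_k) * B(r_k, n)`. Here `+=!` initializes the output to the neutral element of `+` (zero) before accumulating, and an index that appears only on the right-hand side (`r_k`) is reduced over, with index ranges inferred from the tensor shapes. The plain-Python sketch below spells out those loop semantics. It only illustrates what the statement means; the function name is hypothetical and this is not how TC compiles the expression.

```python
def matmul_tc_semantics(A, B):
    """Loop-nest reading of the TC statement  C(m, n) +=! A(m, r_k) * B(r_k, n).

    A is an M x K matrix and B is a K x N matrix, both given as lists of lists.
    """
    M, K = len(A), len(A[0])
    K2, N = len(B), len(B[0])
    if K != K2:
        raise ValueError("inner dimensions must match")
    # The "!" in "+=!": start every output element at the neutral element of +, i.e. 0.
    C = [[0.0] * N for _ in range(M)]
    # m and n index the output; r_k appears only on the right-hand side, so it is summed.
    for m in range(M):
        for n in range(N):
            for r_k in range(K):
                C[m][n] += A[m][r_k] * B[r_k][n]
    return C
```

For example, `matmul_tc_semantics([[1, 2], [3, 4]], [[5, 6], [7, 8]])` returns `[[19.0, 22.0], [43.0, 50.0]]`.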
Hi, I'm currently trying to implement a custom convolution operation (forward and backward passes). I am starting from the code provided [here](https://facebookresearch.github.io/TensorComprehensions/_sources/framework/pytorch_integration/autograd_with_tc.rst.txt) I am getting an error with the backward...
First, let me say I love, love, love this project. It makes creating fast kernels a real breeze, unlike anything else I've seen. I am running into a bug (I...
Templated isl types keep track of the number and nature of tuples in the spaces of an isl object as well as (optionally) their internal structure. This allows the compiler...
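The idea can be illustrated without isl: tag each relation with the kind of its domain and range tuples, and refuse to compose relations whose tuple kinds do not line up. The Python sketch below is only an analogue. The tag classes `Statement`, `Tensor` and `Thread` and the `TypedMap` type are hypothetical; a templated C++ wrapper would reject a mismatch at compile time, whereas this sketch raises a `TypeError` at run time.

```python
from dataclasses import dataclass


class Statement:
    """Hypothetical tag for tuples that name statement instances."""


class Tensor:
    """Hypothetical tag for tuples that name tensor elements."""


class Thread:
    """Hypothetical tag for tuples that name GPU threads."""


@dataclass(frozen=True)
class TypedMap:
    """A finite relation that records the kind of its domain and range tuples."""

    domain_kind: type
    range_kind: type
    pairs: frozenset  # set of (domain_point, range_point) pairs

    def apply_range(self, other: "TypedMap") -> "TypedMap":
        """Compose self (D -> R) with other (R -> M) into D -> M, like isl's apply_range."""
        # The check a templated type would perform at compile time, done here at run time.
        if self.range_kind is not other.domain_kind:
            raise TypeError(
                f"cannot compose {self.domain_kind.__name__} -> {self.range_kind.__name__} "
                f"with {other.domain_kind.__name__} -> {other.range_kind.__name__}"
            )
        composed = frozenset(
            (a, c)
            for (a, b) in self.pairs
            for (b2, c) in other.pairs
            if b == b2
        )
        return TypedMap(self.domain_kind, other.range_kind, composed)
```

Composing a `Statement -> Tensor` access relation with a `Tensor -> Thread` mapping yields a `Statement -> Thread` relation, while composing the access relation with itself raises `TypeError` because `Tensor` does not match `Statement`.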
This PR isolates a small subset from @ttheodor's #453. I took the commits that look independent, fix clear bugs or implement simple new behavior and do not touch autotuner-inl.h (which...
This PR passes proper llvm::TargetMachine information to llvm_jit and codegen_llvm by introducing a proper TargetMachine at the LLVMJit level, which avoids introducing ad hoc objects. The TargetMachine is constructed either from...
Add CUPTI-based profiling functionality in `CudaRTCFun`. There are several performance metrics (listed [here](http://docs.nvidia.com/cuda/cupti/r_main.html#metrics-reference)). Each metric requires measuring (possibly multiple types of) hardware events. Since not all events can be measured...
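When a requested set of hardware events cannot all be collected in one run, the kernel has to be replayed several times, each replay ("pass") collecting a compatible subset. The sketch below is a simplified model, not CUPTI's API: it assumes each event belongs to a hardware domain offering a fixed number of counters per pass, and packs events first-fit into passes (first-fit is not guaranteed to find the minimum number of passes). The function `plan_passes` and the domain names are hypothetical; in practice CUPTI can compute such groupings itself, for example via `cuptiEventGroupSetsCreate`.

```python
from collections import defaultdict


def plan_passes(events, counters_per_domain):
    """Greedily pack events into replay passes (simplified, hypothetical model).

    events: list of (event_name, domain) pairs.
    counters_per_domain: dict mapping domain -> counters usable in a single pass.
    Assumption: two events conflict only when they compete for counters of the
    same domain. Returns a list of passes, each a list of event names.
    """
    passes = []  # event names collected in each pass
    usage = []   # per pass: domain -> counters already used
    for name, domain in events:
        cap = counters_per_domain[domain]
        if cap <= 0:
            raise ValueError(f"domain {domain!r} has no counters")
        # First fit: put the event into the earliest pass with a free counter.
        for p, used in enumerate(usage):
            if used[domain] < cap:
                used[domain] += 1
                passes[p].append(name)
                break
        else:
            # No existing pass has room, so this event forces an extra replay.
            passes.append([name])
            usage.append(defaultdict(int, {domain: 1}))
    return passes
```

With two counters in domain `d0` and one in `d1`, the events `a`, `b`, `c` (all in `d0`) and `x` (in `d1`) need two passes: `[["a", "b", "x"], ["c"]]`.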
Hi, I want to build TC from source on a cluster (https://www.macs.hw.ac.uk/~hv15/robotarium/about) and run some of the benchmarks. Here's the information regarding my working environment: - OS: Scientific Linux release...
cuda::MappedScop: introduce maxPrivateElements mapping option. This mapping option controls the maximum number of elements per thread that are promoted into the private memory (hopefully, registers, but we cannot guarantee this...
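Conceptually, such an option acts as a per-thread budget on promotion. The sketch below assumes a simple greedy policy over promotion candidates that are already sorted by priority; the function name, the candidate format and the greedy policy are illustrative assumptions, not TC's actual promotion heuristic. It also does not model whether promoted elements really stay in registers or are spilled to local memory.

```python
def select_private_promotions(candidates, max_private_elements):
    """Choose which per-thread footprints to promote into private memory.

    candidates: list of (name, elements_per_thread) pairs, highest priority first.
    max_private_elements: per-thread budget, counted in elements (hypothetical
    stand-in for a maxPrivateElements-style option).
    """
    if max_private_elements < 0:
        raise ValueError("budget must be non-negative")
    chosen, used = [], 0
    for name, elements in candidates:
        # Promote a candidate only if it still fits in the remaining budget.
        if used + elements <= max_private_elements:
            chosen.append(name)
            used += elements
    return chosen
```

With candidates `[("A", 4), ("B", 8), ("C", 2)]` and a budget of 6 elements, `A` and `C` are promoted while `B` is skipped because it would exceed the budget.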