Matthew Nicely
We should add support for `cmath` https://en.cppreference.com/w/cpp/header/cmath
```
$ nvcc -arch=sm_70 -c test.cu
test.cu(2): error: calling a constexpr __host__ function("sqrt") from a __global__ function("test") is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this....
```
Would it be possible to add batching functionality? It seems the code is highly optimized to process one image at a time, but not, say, 1M images. Let me know...
Would it be possible to convert _cudaMallocPitch_ calls to _cudaMalloc_? I understand why cudaMallocPitch was chosen, but those limitations are not as noticeable today with larger cache sizes. The main...
Is it possible to quantify improvements between compares? I know with `--benchmark-compare-fail=:%` I can see which test performed worse, but I would also like to know how much better my...
Would you be interested in integrating the latest version of nvCOMP (i.e., v2.0)?
Are there plans to integrate the cuTENSOR branch into main? If yes, do you have a timeline? If no, are there any blockers? + @v0i0 @springer13
Found a bug with correlate 1D + complex data types.
Hi @evanmayer, do you have a way to run your code without hardware? I've found a few places to make improvements. If not, I've added ideas below if you ever...
Hi, I came across this repo while wiring up Whisper support to my Bazaar. I just wanted to say amazing repo! Diving through the code I noticed there was a...