Ashish Gondimalla

2 issues by Ashish Gondimalla

The CUTLASS profiler has a great set of flags for performing shmoos (parameter sweeps) across different matrix shapes and sizes. While benchmarking GEMMs with the CUTLASS profiler, one can use cuBLAS as a...

I ran the LLAMA_70b model on two different versions (v0.5.0 and v0.7.0) with the following commands.

Building the engine:

```
export NUM_GPUS=8
python examples/llama/build.py --remove_input_padding --enable_context_fmha --parallel_build --output_dir /code/outputs/engines/llama70b_tp${NUM_GPUS}_fp8_bs32_isl8192_osl512 --dtype float16...
```
