flash_attention_inference
Benchmarks the performance of the C++ interfaces of FlashAttention and FlashAttention-2 in large language model (LLM) inference scenarios.
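As a rough illustration of how such a benchmark might time a C++ attention forward call, here is a minimal sketch using CUDA events. The function `attention_forward_stub` is a hypothetical placeholder, not the repo's actual entry point; in practice it would be replaced by the library's FlashAttention or FlashAttention-2 forward call.

```cpp
#include <cuda_runtime.h>
#include <cstdio>

// Hypothetical stand-in for the C++ forward interface under test
// (e.g. a FlashAttention / FlashAttention-2 forward call); replace
// with the real library invocation when benchmarking.
static void attention_forward_stub() {
    // the flash attention forward kernel would be launched here
}

// Time an average iteration of fn() with CUDA events, after warmup runs.
static float time_ms(void (*fn)(), int warmup, int iters) {
    for (int i = 0; i < warmup; ++i) fn();  // warm up before measuring

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    for (int i = 0; i < iters; ++i) fn();   // timed region
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms / iters;                      // average latency per call
}

int main() {
    float ms = time_ms(attention_forward_stub, /*warmup=*/10, /*iters=*/100);
    std::printf("avg latency: %.3f ms\n", ms);
    return 0;
}
```

Averaging over many iterations after a warmup phase is the usual way to get stable kernel timings; a real harness would also sweep batch size, sequence length, and head dimension to cover typical LLM inference shapes.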