Azure

Results: 7 issues of Azure

Megatron: integrate terapipe (first commit). Modifies the transformer to add cached attention computation. Correctness still needs to be verified.
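
One way to do the pending correctness check is to compare cached attention against a plain full-sequence reference. The following is a generic single-head NumPy sketch, not the code from this commit. The function names are hypothetical, and the `chunk_size` slicing only loosely mimics terapipe-style token slices. Each slice attends to the keys and values cached from earlier slices, and the result should match full causal attention to floating-point tolerance.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def full_causal_attention(q, k, v):
    """Reference: single-head causal attention over the whole sequence at once."""
    seq_len, d = q.shape
    scores = q @ k.T / np.sqrt(d)
    # True strictly above the diagonal, i.e. keys at future positions.
    future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)
    return softmax(scores) @ v

def chunked_attention_with_cache(q, k, v, chunk_size):
    """Hypothetical sketch: process token slices, attending to cached K/V of earlier slices."""
    seq_len, d = q.shape
    k_cache = np.zeros((0, d))
    v_cache = np.zeros((0, d))
    outputs = []
    for start in range(0, seq_len, chunk_size):
        end = min(start + chunk_size, seq_len)
        # Append this slice's keys/values to the cache before attending.
        k_cache = np.concatenate([k_cache, k[start:end]], axis=0)
        v_cache = np.concatenate([v_cache, v[start:end]], axis=0)
        scores = q[start:end] @ k_cache.T / np.sqrt(d)
        # A query at absolute position p may only see keys at positions <= p.
        q_pos = np.arange(start, end)[:, None]
        k_pos = np.arange(end)[None, :]
        scores = np.where(k_pos > q_pos, -np.inf, scores)
        outputs.append(softmax(scores) @ v_cache)
    return np.concatenate(outputs, axis=0)

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((10, 8)) for _ in range(3))
ref = full_causal_attention(q, k, v)
out = chunked_attention_with_cache(q, k, v, chunk_size=3)
print(np.allclose(ref, out))  # True
```

Running the same comparison with several slice sizes (including 1 and the full sequence length) exercises the boundary cases of the cache.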

Hi, I'm currently researching how different retrieval-augmented generation (RAG) techniques affect LLM performance. We are attempting to replicate the CrossCodeEval results from the "StarCoder 2 and The Stack...

It seems that if we remove the assert in `Layer.pack`, we can pack a bf16 linear layer? By the way, will marlin support int4 × bf16 as input?

Hello, I am currently implementing tensor parallelism and need some guidance on how to split AWQ weights properly. Here's the current state of the AWQ weights I'm...
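
A minimal sketch of one common way to shard AWQ tensors follows. It assumes the AutoAWQ-style GEMM layout, where 4-bit values are packed 8 per int32 along the output dimension. In that layout, `qweight` is `(in_features, out_features // 8)`, `qzeros` is `(in_features // group_size, out_features // 8)`, and `scales` is `(in_features // group_size, out_features)`. The helper names are hypothetical. Check the layout against the actual checkpoint before relying on this.

```python
import numpy as np

PACK = 8  # assumption: 4-bit weights, 32 // 4 = 8 values per int32

def split_awq_column_parallel(qweight, qzeros, scales, tp_size, rank):
    """Split along the output dimension (e.g. QKV or up projections)."""
    out_packed = qweight.shape[1]
    # Shards must start on int32 boundaries so no packed word is cut in half.
    assert out_packed % tp_size == 0, "packed output columns must divide evenly"
    step = out_packed // tp_size
    cols = slice(rank * step, (rank + 1) * step)
    # scales are not packed, so their column range is PACK times wider.
    scale_cols = slice(rank * step * PACK, (rank + 1) * step * PACK)
    return qweight[:, cols], qzeros[:, cols], scales[:, scale_cols]

def split_awq_row_parallel(qweight, qzeros, scales, tp_size, rank, group_size):
    """Split along the input dimension (e.g. output or down projections)."""
    in_features = qweight.shape[0]
    assert in_features % tp_size == 0, "input features must divide evenly"
    shard_in = in_features // tp_size
    # Each shard must hold whole quantization groups, or scales/zeros cannot be split.
    assert shard_in % group_size == 0, "shard must align with group_size"
    rows = slice(rank * shard_in, (rank + 1) * shard_in)
    groups = shard_in // group_size
    grp = slice(rank * groups, (rank + 1) * groups)
    return qweight[rows], qzeros[grp], scales[grp]

in_f, out_f, group_size, tp = 4096, 4096, 128, 2
qweight = np.zeros((in_f, out_f // PACK), dtype=np.int32)
qzeros = np.zeros((in_f // group_size, out_f // PACK), dtype=np.int32)
scales = np.zeros((in_f // group_size, out_f), dtype=np.float16)
qw, qz, sc = split_awq_column_parallel(qweight, qzeros, scales, tp, rank=0)
print(qw.shape, qz.shape, sc.shape)  # (4096, 256) (32, 256) (32, 2048)
```

The row-parallel case is the stricter one. Each shard's slice of the input dimension has to be a multiple of `group_size`, because every group shares one row of `scales` and `qzeros`.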

The newly released kimi-k2-0905 model is now supported by the ktransformers framework. 📌 We will update the supported models list to include kimi-k2-0905. 🔄 We are also working on manual...

Description: This PR fixes the IPv6 link-local address support issue reported in #1043. The problem was that IPv6 link-local addresses (e.g., `fe80::xxx%interface`) could not be properly parsed and used...
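
To illustrate the parsing side of the problem, here is a minimal sketch using Python's standard `ipaddress` module, which accepts a `%zone` suffix on IPv6 addresses from Python 3.9 onward. This is not the PR's implementation. Both helper names are hypothetical, and the bracketed `[addr%zone]:port` convention is assumed to be the input format. The sketch also shows why a naive `split(":")` fails on IPv6 literals.

```python
import ipaddress

def parse_link_local(addr: str):
    """Hypothetical helper: split 'fe80::1%eth0' into the address and its zone ID."""
    ip = ipaddress.ip_address(addr)  # Python 3.9+ keeps '%zone' as ip.scope_id
    if not (isinstance(ip, ipaddress.IPv6Address) and ip.is_link_local):
        raise ValueError(f"not an IPv6 link-local address: {addr}")
    if ip.scope_id is None:
        raise ValueError(f"link-local address needs a zone ID, e.g. fe80::1%eth0: {addr}")
    return ip, ip.scope_id

def split_host_port(hostport: str):
    """Hypothetical helper: parse '[fe80::1%eth0]:8080' without splitting on every ':'."""
    if hostport.startswith("["):
        host, sep, port = hostport[1:].partition("]:")
        if not sep:
            raise ValueError(f"expected '[host]:port', got {hostport!r}")
        return host, int(port)
    # IPv4 or hostname: only the last ':' separates the port.
    host, _, port = hostport.rpartition(":")
    return host, int(port)

host, port = split_host_port("[fe80::1%eth0]:8080")
ip, zone = parse_link_local(host)
print(ip, zone, port)  # fe80::1%eth0 eth0 8080
```

When the address is later passed to the `socket` module, the zone name generally has to be converted to a numeric interface index with `socket.if_nametoindex`. That index goes in the fourth element of the IPv6 sockaddr tuple `(host, port, flowinfo, scope_id)`.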

run-ci