A question while reading the paper

Open JasonJ2021 opened this issue 1 year ago • 4 comments

What does UGD refer to in Figure 4 of the paper?

JasonJ2021, Aug 27 '24 08:08

Up, Gate, Down: the MLP layers in LLaMA.

yzh119, Aug 27 '24 08:08
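For readers unfamiliar with the term, here is a minimal sketch of the gated MLP block used in LLaMA-style models, which is what "Up, Gate, Down" refers to. It is not taken from the NanoFlow code base: the function name `ugd_mlp`, the weight names, and the pure-Python matrix helper are illustrative assumptions, and a real implementation runs these as GPU GEMMs.

```python
import math


def silu(v):
    """SiLU (swish) activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def ugd_mlp(x, W_up, W_gate, W_down):
    """Gated MLP: down( silu(gate(x)) * up(x) ).

    W_up and W_gate map hidden -> intermediate; W_down maps back.
    All names here are hypothetical, chosen for illustration only.
    """
    up = matvec(W_up, x)      # "U": up projection
    gate = matvec(W_gate, x)  # "G": gate projection
    hidden = [silu(g) * u for g, u in zip(gate, up)]
    return matvec(W_down, hidden)  # "D": down projection
```

The up and gate projections read the same input, so they are independent of each other; only the down projection has to wait for both.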

@serendipity-zk @happierpig this term should be spelled out in the paper.

yzh119, Aug 27 '24 08:08

"Up Gate Down, the mlp layers in llama."

That makes sense, thank you!

JasonJ2021, Aug 27 '24 10:08

Another question that came up while reading the paper:

"Moving data between the NUMA-affinitive (directly attached) CPU and GPU can lead to 1.27× bandwidth gain compared to non-affinitive ones. NanoFlow ensures the KV-cache is copied to and from the affinitive NUMA node via thread binding."

Can you share the NUMA settings on the host machine? I expect they matter for reproducing the benchmarks.

wDevil, Sep 06 '24 05:09

We tested our framework on multiple host machines and got similar results. The key is tuning the binding of threads to CPUs (see /src/computeBound.cu#L100).

serendipity-zk, Oct 31 '24 19:10
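As a rough illustration of the thread binding discussed above, the sketch below pins the calling thread to the CPUs of one NUMA node on Linux. It is not the code at the referenced line of computeBound.cu. The helper names are hypothetical, and which node is directly attached to a given GPU is an assumption the reader has to check on their own host (for example with `nvidia-smi topo -m`).

```python
import os


def parse_cpulist(text):
    """Parse a Linux cpulist string such as '0-3,8' into a set of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def numa_node_cpus(node):
    """Read the CPUs of a NUMA node from sysfs (Linux only)."""
    with open(f"/sys/devices/system/node/node{node}/cpulist") as f:
        return parse_cpulist(f.read())


def pin_current_thread(cpus):
    """Restrict the caller to the given CPUs, limited to those already allowed.

    With pid 0, the underlying Linux call applies to the calling thread.
    """
    allowed = os.sched_getaffinity(0) & set(cpus)
    if not allowed:
        raise ValueError("none of the requested CPUs are available")
    os.sched_setaffinity(0, allowed)
    return allowed
```

A thread that issues host-to-device copies for a GPU attached to node 0 would call `pin_current_thread(numa_node_cpus(0))` before it starts copying; the node index here is only an example.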