Horace He
Yes. GPU power limit is an unfortunate limitation of the particular hardware setup I'm using - it's not required.
I would perhaps suggest this video giving an overview of TorchInductor: https://www.youtube.com/watch?v=p13HpZv2S3Q Another thing you can check out is `TORCH_LOGS="output_code"`, which'll show you the actual triton kernels that are generated....
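For concreteness, here's a minimal sketch of what that looks like (this example isn't from the thread; it assumes a CUDA build of PyTorch 2.x with Triton available, and the function and shapes are made up):
```
import torch

# Same effect as running the script with TORCH_LOGS="output_code".
torch._logging.set_logs(output_code=True)

@torch.compile
def fused_gelu_mul(x, y):
    # A small pointwise chain that Inductor should fuse into a single Triton kernel.
    return torch.nn.functional.gelu(x) * y

x = torch.randn(1024, 1024, device="cuda")
y = torch.randn(1024, 1024, device="cuda")
fused_gelu_mul(x, y)  # first call triggers compilation and logs the generated kernel source
```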
What command are you running to get this error?
`F.scaled_dot_product_attention` automatically makes a decision about what backend to dispatch to. For example, it can choose to dispatch to the FlashAttention2 kernel. Or, for example, on platforms where FlashAttention2 is...
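A rough sketch of that dispatch behavior (shapes and dtypes here are illustrative, not from the thread): by default PyTorch picks the backend itself, and the `torch.backends.cuda.sdp_kernel` context manager can restrict which backends it's allowed to choose.
```
import torch
import torch.nn.functional as F

q = torch.randn(1, 32, 128, 64, device="cuda", dtype=torch.float16)
k = torch.randn(1, 32, 1024, 64, device="cuda", dtype=torch.float16)
v = torch.randn(1, 32, 1024, 64, device="cuda", dtype=torch.float16)

# Default: PyTorch chooses among the FlashAttention2, memory-efficient, and math backends.
out = F.scaled_dot_product_attention(q, k, v)

# Restrict dispatch to FlashAttention only; this errors on setups where the
# FlashAttention kernel isn't available for these dtypes/shapes.
with torch.backends.cuda.sdp_kernel(
    enable_flash=True, enable_math=False, enable_mem_efficient=False
):
    out_flash = F.scaled_dot_product_attention(q, k, v)
```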
The big issue is the work partitioning structure. FlashAttention parallelizes across heads, BS, and `output_seq_len` (i.e. the query sequence length). In this case, BS and `output_seq_len` are both 1, so the only parallelism is...
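A back-of-the-envelope version of that argument (numbers are illustrative, not from the thread): FlashAttention launches roughly one thread block per (batch, head, query-tile) triple, so during single-sequence decoding the grid collapses to just `num_heads` blocks.
```
num_heads = 32           # e.g. a Llama-7B-sized model (assumption)
batch_size = 1           # single user
query_len_prefill = 2048
query_tile = 128         # query rows handled per thread block (illustrative)

prefill_blocks = batch_size * num_heads * (query_len_prefill // query_tile)
decode_blocks = batch_size * num_heads * 1   # output_seq_len == 1 during decoding

print(prefill_blocks)    # 512 blocks -> plenty of work to fill e.g. 108 SMs on an A100
print(decode_blocks)     # 32 blocks  -> most SMs sit idle without further splitting
```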
> The function decode_n_tokens, in which the torch.backends.cuda.sdp_kernel decorator is used, is not compiled. Does that mean the aforementioned behavior is not applied?

No, `decode_n_tokens` calls `decode_token`, which does have...
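To illustrate the structure being described (this is a minimal sketch, not the actual gpt-fast code): the `sdp_kernel` context manager sits in the uncompiled outer loop, while the per-token step it wraps is the function that gets compiled.
```
import torch
import torch.nn.functional as F

@torch.compile
def decode_token(q, k, v):
    # The compiled per-token step; SDPA dispatch happens in here.
    return F.scaled_dot_product_attention(q, k, v)

def decode_n_tokens(q, k, v, n):
    # Uncompiled driver loop, analogous to the function asked about:
    # the context manager is entered here, around calls into compiled code.
    outs = []
    with torch.backends.cuda.sdp_kernel(
        enable_flash=True, enable_math=False, enable_mem_efficient=False
    ):
        for _ in range(n):
            outs.append(decode_token(q, k, v))
    return outs
```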
What is a "normal implementation" of the model? To be clear, the metric reported here is also sometimes called "tokens per second per user" (i.e. the latency for a single...
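As a tiny worked example of that metric (numbers are made up): "tokens per second per user" is just the reciprocal of the per-token decode latency seen by one request.
```
per_token_latency_s = 0.010                     # 10 ms per generated token (assumption)
tokens_per_s_per_user = 1.0 / per_token_latency_s
print(tokens_per_s_per_user)                    # 100.0
```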
@yifuwang I think the right way to handle this is that we should compile once for all ranks, and then re-use the graph on all ranks.
Putting this inside `vim.otherModesKeyBindings` is a good temporary fix if this is a big issue for you:
```
{
  "before": ["u"],
  "after": [],
  "commands": [
    { "command": "undo" }
  ]
}
```
@petejkim You're correct! Thanks!