Andy Arditi
Having written out the alternative solution (a cached attention mask that grows as needed), I'm thinking maybe that's better. It does have the following drawback: if you run...
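For concreteness, here is a minimal sketch of what I mean by a cached mask that grows as needed (illustrative only, not the code written out above):

```python
# Minimal sketch (assumption: a plain causal mask, not the PR's actual code).
# Keep one lower-triangular mask cached and only rebuild it when a longer
# sequence comes in; otherwise slice the cached mask down to size.
import torch

class GrowingCausalMask:
    def __init__(self, initial_size: int = 128):
        # True = this key position may be attended to by the query position.
        self.mask = torch.tril(torch.ones(initial_size, initial_size, dtype=torch.bool))

    def get(self, seq_len: int) -> torch.Tensor:
        if seq_len > self.mask.shape[0]:
            # Grow the cache: rebuild the mask at the new, larger size.
            self.mask = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
        return self.mask[:seq_len, :seq_len]
```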
I ran the following benchmarks to measure the perf impact. The difference doesn't seem significant to me, so I think this simple implementation is ok. Let me know if...
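For reference, a rough sketch of the kind of timing harness one could use to measure the overhead of building the mask each call (hypothetical, not the exact benchmark referenced above):

```python
# Hypothetical micro-benchmark sketch: times repeated on-the-fly construction
# of a causal mask, which is the extra work the on-the-fly approach adds.
import time
import torch

def time_mask_construction(seq_len: int, n_iters: int = 1000) -> float:
    start = time.perf_counter()
    for _ in range(n_iters):
        torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
    return (time.perf_counter() - start) / n_iters

print(f"avg per-call: {time_mask_construction(1024) * 1e6:.1f} µs")
```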
Hi @bryce13950 - thanks for pinging on this. The currently implemented solution in this PR is to construct attention masks on-the-fly for each attention component (i.e. at each layer). This solution...
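For illustration, a minimal sketch of the on-the-fly approach (not the exact implementation in this PR): each attention component builds a causal mask from the score tensor's shape at forward time, on the same device as the scores.

```python
# Minimal sketch (assumed shapes: attn_scores is [batch, n_heads, q_pos, k_pos]).
import torch

def apply_causal_mask(attn_scores: torch.Tensor) -> torch.Tensor:
    q_len, k_len = attn_scores.shape[-2], attn_scores.shape[-1]
    # Build the mask fresh on every call; the diagonal offset handles the case
    # where cached keys make k_len longer than q_len.
    mask = torch.triu(
        torch.ones(q_len, k_len, dtype=torch.bool, device=attn_scores.device),
        diagonal=1 + (k_len - q_len),
    )
    return attn_scores.masked_fill(mask, torch.finfo(attn_scores.dtype).min)
```

The trade-off versus the cached mask is a small amount of repeated work per layer in exchange for not having to manage any growing state.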
Abandoning this PR.
Just noting here that Yi models (both [6B](https://huggingface.co/01-ai/Yi-6B/blob/7ed3ea6ea9c05020e2fd0cd1cc2916921a369d7c/config.json#L15) and [34B](https://huggingface.co/01-ai/Yi-34B/blob/48ef127f218826a38e0dc0aebea9505e8302a842/config.json#L15)) use grouped-query attention (`num_key_value_heads` < `num_attention_heads`). Grouped-query attention is implemented in #443, so this integration should be straightforward once that...
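For context, a minimal sketch of grouped-query attention (illustrative only, not the #443 implementation): each K/V head is shared by a group of query heads, e.g. by repeating K/V along the head dimension so the shapes line up.

```python
# Minimal sketch of grouped-query attention (no causal mask, for brevity).
# q: [batch, n_q_heads, seq, d_head]; k, v: [batch, n_kv_heads, seq, d_head],
# with n_kv_heads < n_q_heads (num_key_value_heads < num_attention_heads).
import torch

def grouped_query_attention(q, k, v):
    n_q_heads, n_kv_heads = q.shape[1], k.shape[1]
    group_size = n_q_heads // n_kv_heads
    # Repeat each K/V head so it pairs with its group of query heads.
    k = k.repeat_interleave(group_size, dim=1)
    v = v.repeat_interleave(group_size, dim=1)
    scores = (q @ k.transpose(-2, -1)) / (q.shape[-1] ** 0.5)
    return torch.softmax(scores, dim=-1) @ v
```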