Jay Zhuang

56 comments by Jay Zhuang

A slightly cleaned-up script:

```julia
using Gridap
using CairoMakie

"""Extract node coordinates as 1D arrays from 2D regular mesh"""
function get_node_coords_2d(model)
    node_coords = model.grid.node_coords
    node_x = map(coords -> coords[1], node_coords)
    ...
```

> add code that is general

Then I would probably wait until I figure out a way to plot higher-order Nedelec elements on more general meshes. Just leave the current...

For the `LlamaAttention._attn()` implementation: https://github.com/alipay/PainlessInferenceAcceleration/blob/6280cb2f097ba0bc6bc423ab910b9de7ddbe3bf2/pia/lookahead/models/llama/modeling_llama_batch.py#L299-L325

`self.norm_coef` is never defined, so the `else` branch is never entered. The code is therefore equivalent to a simpler form (I checked that the code below gives identical...
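The equivalent code in that comment is cut off above; as a rough reconstruction, the effective computation should reduce to something like the sketch below (the function name and the `(batch, heads, seq, head_dim)` layout are my assumptions, not the repo's actual code):

```python
import torch

def _attn_equivalent(query, key, value, attention_mask=None):
    # Raw QK^T scores. Note: no 1/sqrt(head_dim) scaling here, which is
    # exactly the point raised in the follow-up comment below.
    attn_weights = torch.matmul(query, key.transpose(-1, -2))
    if attention_mask is not None:
        # Additive mask, as in the usual LLaMA-style attention code.
        attn_weights = attn_weights + attention_mask
    attn_weights = torch.softmax(attn_weights, dim=-1)
    return torch.matmul(attn_weights, value)
```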

OK, I see: the original `_attn()` function is missing the scaling factor. Setting `scale=1.0` for `scaled_dot_product_attention` fixes the problem. Here's a simple test:

```python
import torch
import torch.nn.functional as F
...
```
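The test itself is truncated above; a minimal sketch of such an equivalence check, assuming the point is that SDPA with `scale=1.0` matches unscaled manual attention (the `scale` kwarg requires PyTorch >= 2.1), could look like:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
# Illustrative shapes: (batch, heads, seq_len, head_dim)
q, k, v = (torch.randn(1, 8, 16, 64) for _ in range(3))

# Manual attention WITHOUT the usual 1/sqrt(head_dim) factor,
# mirroring the original _attn() behavior.
weights = torch.softmax(q @ k.transpose(-1, -2), dim=-1)
manual = weights @ v

# SDPA matches only once its default scaling is disabled via scale=1.0.
sdpa = F.scaled_dot_product_attention(q, k, v, scale=1.0)

print(torch.allclose(manual, sdpa, atol=1e-6))  # expected: True
```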

Now with

```python
def _sdp_attn(self, query, key, value, attention_mask=None, head_mask=None):
    with torch.backends.cuda.sdp_kernel(enable_math=False):
        return F.scaled_dot_product_attention(query, key, value, attn_mask=attention_mask, scale=1.0), None
```

I can get correct results:

```
lookahead:False time:3.198s speed:36.9token/s response:["I'm...
```
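One aside (mine, not the original comment's): `torch.backends.cuda.sdp_kernel` is deprecated in newer PyTorch releases; on PyTorch >= 2.3 the same restriction to the fused kernels can be expressed with `torch.nn.attention.sdpa_kernel`, roughly:

```python
import torch.nn.functional as F
from torch.nn.attention import sdpa_kernel, SDPBackend

def _sdp_attn(self, query, key, value, attention_mask=None, head_mask=None):
    # Allow only the fused flash / memory-efficient kernels,
    # i.e. the equivalent of enable_math=False above.
    with sdpa_kernel([SDPBackend.FLASH_ATTENTION, SDPBackend.EFFICIENT_ATTENTION]):
        return F.scaled_dot_product_attention(
            query, key, value, attn_mask=attention_mask, scale=1.0
        ), None
```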

> However, the mamba2 models unfortunately are not transformers-compatible and don't work out of the box.

Hugging Face transformers now supports Mamba-2 (Codestral Mamba); support was added in https://github.com/huggingface/transformers/pull/32080
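For anyone trying it, a minimal loading sketch through `transformers` (the checkpoint id below is my assumption of the Hub name, not something from the original thread):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed Hub id for Codestral Mamba; substitute any Mamba-2 checkpoint.
model_id = "mistralai/Mamba-Codestral-7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("def fibonacci(n):", return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```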