pfeatherstone


https://ofir.io/train_short_test_long.pdf, the one you reference in your README. I have to admit I haven't read it in great detail, but they suggest ALiBi is great.

Basically I need a positional embedding that length-extrapolates well, works with memories, and is compatible with Flash Attention. Do you have any suggestions?

What do you mean by curriculum learning to longer sequence lengths? Sorry if my questions are dumb.

Presumably in `apply_rotary_pos_emb()` we need to add `scale = scale[-seq_len:, :]`?
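
Roughly what I have in mind, as a sketch (this is not the actual x-transformers implementation, just the usual GPT-NeoX-style shape of the function, with the proposed slice added):

```python
import torch

def rotate_half(x):
    # standard rotary helper (sketch, not the library's exact version)
    x1, x2 = x.chunk(2, dim = -1)
    return torch.cat((-x2, x1), dim = -1)

def apply_rotary_pos_emb(t, freqs, scale = 1.):
    # t:     (..., seq_len, dim)
    # freqs: (max_len, dim)   -- may be built for a longer (cached) length
    # scale: (max_len, dim) xpos scale, or a plain 1.
    seq_len = t.shape[-2]
    freqs = freqs[-seq_len:, :]
    if torch.is_tensor(scale):
        scale = scale[-seq_len:, :]   # proposed addition: keep scale aligned with the sliced freqs
    return (t * freqs.cos() * scale) + (rotate_half(t) * freqs.sin() * scale)
```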

As an aside, why is all of the RotaryEmbedding code decorated with `@torch.cuda.amp.autocast(enabled = False)`? You can remove it with just a couple of tweaks, and it then supports `torch.bfloat16`.
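
For example, something along these lines (a sketch of the kind of tweak I mean, not the library's actual class) should keep the frequency math in float32 under bfloat16 autocast without needing the decorator:

```python
import torch
from torch import nn

class SimpleRotaryEmbedding(nn.Module):
    # Sketch only; the real RotaryEmbedding in x-transformers has more features
    # (xpos scale, interpolation, ...).
    def __init__(self, dim, base = 10000):
        super().__init__()
        inv_freq = 1. / (base ** (torch.arange(0, dim, 2, dtype = torch.float32) / dim))
        self.register_buffer('inv_freq', inv_freq, persistent = False)

    def forward(self, seq_len, device = None):
        device = device if device is not None else self.inv_freq.device
        t = torch.arange(seq_len, device = device, dtype = torch.float32)
        # plain broadcasting multiply instead of einsum: elementwise ops are not
        # on autocast's low-precision cast list, so this should stay in float32
        # even when the surrounding forward runs under bfloat16 autocast
        freqs = t[:, None] * self.inv_freq[None, :]
        return torch.cat((freqs, freqs), dim = -1)
```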

Also, I think the `scale` calculation is incorrect when using mems, since the positions are off. You have to use the same trick of starting from a negative position.
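
Sketch of the trick I mean (function name is mine, not the library's):

```python
import torch

def rotary_positions(seq_len, mem_len, device = None):
    # When `mem_len` cached tokens precede the current chunk, put the memories at
    # negative positions so the current tokens keep the same positions (and hence
    # the same xpos-style `scale`) across segments, instead of restarting at 0.
    return torch.arange(-mem_len, seq_len, device = device)

print(rotary_positions(seq_len = 3, mem_len = 4))
# tensor([-4, -3, -2, -1,  0,  1,  2])
```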

https://github.com/lucidrains/x-transformers/pull/234 I believe this fixes it.

Other candidates are ALiBi or no positional embeddings at all. For the latter, in order for it to work, do you need to train with a range of sizes so...
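
By "train with a range of sizes" I mean something like this (hypothetical sketch, names made up):

```python
import random
import torch

def sample_batch(tokens, batch_size, min_len = 128, max_len = 2048):
    # Hypothetical sketch: `tokens` is a long 1-D tensor of token ids.
    # Crop each batch to a random length so training covers a range of sizes
    # rather than a single fixed sequence length.
    seq_len = random.randint(min_len, max_len)
    starts = torch.randint(0, tokens.numel() - seq_len, (batch_size,))
    return torch.stack([tokens[int(s) : int(s) + seq_len] for s in starts])
```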

Then if I change `*x.shape,` to `x.shape[0], x.shape[1]` I get another error:

```
x_transformers.py", line 1238, in forward
    rotary_pos_emb = self.rotary_pos_emb(max_rotary_emb_length)
    return _VF.einsum(equation, operands)  # type:...
```

It would seem that during normal inference `max_rotary_emb_length` is an `int`, whereas during JIT tracing or ONNX export it's a 0-dimensional tensor. EDIT: It looks like, generally, something like `x.shape[0]` is...
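
A sketch of the kind of normalisation that would be needed (helper name is mine):

```python
import torch

def as_int(length):
    # During eager inference `max_rotary_emb_length` shows up as a plain int,
    # but under torch.jit.trace / ONNX export shape accesses like x.shape[1]
    # can come back as 0-dimensional tensors. Normalise before treating it as a
    # Python int. Note: calling .item() during tracing bakes the value into the
    # graph, so this trades away dynamic lengths.
    if torch.is_tensor(length):
        return int(length.item())
    return length
```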