Eli Simhayev

31 comments by Eli Simhayev

Hi @kashif, I fixed the final attention output of ProbSparseAttention, and added the ProbMask. In more detail: # Major 1. Added calculation of the final `attn_output` using `v_aggregated`, meaning steps...

@kashif I fixed what I could from Sylvain's comments. The main issue is that some tests are breaking after this fix: https://github.com/huggingface/transformers/pull/21099/commits/b4cbddfa05e3bd739b79569cd3c3b89e316f2451

Thank you for reading the post and submitting this issue :) Maybe you can use `_partial_` instantiation, like the example [here](https://hydra.cc/docs/advanced/instantiate_objects/overview/#partial-instantiation)? Alternatively, consider implementing a new transformer class (e.g. MyLogTransformer)...
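
To illustrate the idea: with `_partial_: true`, Hydra's `instantiate` returns a `functools.partial` instead of a finished object, so the remaining arguments can be supplied later. Below is a minimal sketch of that behaviour using only `functools.partial`. The `LogTransformer` class and its `base`/`offset` parameters are hypothetical and not taken from the post.

```python
from functools import partial
import math


class LogTransformer:
    """Hypothetical transformer, used only for illustration."""

    def __init__(self, base, offset=0.0):
        self.base = base
        self.offset = offset

    def __call__(self, x):
        # Shift the input, then take the logarithm in the chosen base.
        return math.log(x + self.offset, self.base)


# Roughly what a `_partial_: true` config gives you: arguments known at
# config time are bound now, the rest are supplied by the caller later.
make_transformer = partial(LogTransformer, offset=1.0)

# The missing argument is filled in at the call site.
transformer = make_transformer(base=10)
value = transformer(99.0)
```

Whether this fits depends on which arguments are only known at runtime; a dedicated class may be simpler if the configuration is fixed.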

Still happening in the latest release. This issue is super important because wandb logs become useless when using tqdm @ramit-wandb

Hi @andrew-openai, thank you for posting the idea & clarifications 🙂 Can you share which evals are "complex multi-turn instructions"? I just submitted an eval here https://github.com/openai/evals/pull/634, and it...

Hi, how can I change the x-axis from step to epoch?

Any updates on this? Newer architectures could greatly benefit from it. @ezyang, if I understood you correctly, PT2 hasn't implemented the axis API yet.