Arthur
Will make it for the next release, I hope!
Failing test is unrelated 😉
Mistral is already covered! For LongT5, if it is like T5 and uses an attention bias, that might not be supported.
Not sure anyone is working on that, but BERT is already so small that I doubt it will have much impact on perf!
FYI, going forward we should rather use https://github.com/huggingface/transformers/blob/416711c3ea88109cf25a9c5f85b4aeee2cb831b5/src/transformers/models/llama/modeling_llama.py#L1058, as it is more self-contained and easier to debug and maintain than the many paths in the attn_mask utils.
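For illustration, here is a minimal sketch of what a self-contained causal mask construction looks like. This is not the actual transformers implementation; the function name, signature, and shape handling are assumptions for the example:

```python
import torch

def make_causal_mask(seq_len: int, dtype: torch.dtype = torch.float32) -> torch.Tensor:
    """Build an additive causal mask: 0 where attention is allowed,
    the most negative finite value where it is not."""
    min_value = torch.finfo(dtype).min
    mask = torch.full((seq_len, seq_len), min_value, dtype=dtype)
    # Keep only the strictly upper triangle masked; the rest becomes 0.
    return torch.triu(mask, diagonal=1)

# Usage: add the mask to the attention scores before the softmax, e.g.
# scores = scores + make_causal_mask(scores.size(-1), scores.dtype)
```

Keeping this logic inline in the modeling file means there is a single code path to read when debugging, instead of the many branches in the shared mask utilities.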
cc @muellerzr or @pacman100
Gently pinging @muellerzr, as you self-assigned this!
Sure, could you make sure the CIs are green?
You can probably ignore it with `# doctest: +SKIP`.
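For context, `# doctest: +SKIP` is a standard directive from Python's `doctest` module that tells the runner to skip that example. A minimal illustration (the function here is made up, chosen because its output is non-deterministic and would otherwise fail the doctest):

```python
import datetime

def current_time_message() -> str:
    """Return a greeting that includes the current time.

    >>> current_time_message()  # doctest: +SKIP
    'Hello! It is 12:34:56'
    """
    return f"Hello! It is {datetime.datetime.now():%H:%M:%S}"
```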
Also cc @Rocketknight1