torchscale

Foundation Architecture for (M)LLMs

Results 26 torchscale issues

Thank you for your great work! I've noticed that your decoder_retention_heads is set to 3 by default, and the mask is also expanded to three dimensions to match. Have you...

Bumps [pyarrow](https://github.com/apache/arrow) from 9.0.0 to 14.0.1. Commits ba53748 MINOR: [Release] Update versions for 14.0.1 529f376 MINOR: [Release] Update .deb/.rpm changelogs for 14.0.1 b84bbca MINOR: [Release] Update CHANGELOG.md for 14.0.1 f141709...

dependencies

Bumps [transformers](https://github.com/huggingface/transformers) from 4.8.1 to 4.36.0. Release notes Sourced from transformers's releases. v4.36: Mixtral, Llava/BakLlava, SeamlessM4T v2, AMD ROCm, F.sdpa wide-spread support New model additions Mixtral Mixtral is the new...

dependencies

This is a simple fix for the issue that PyTorch no longer has torch._six.

Hello, I followed the blog post https://zenn.dev/selllous/articles/retnet_tutorial shared in #52 in order to train RetNet, and it seems to work well for small models (< 3B). But I am unable...

The fix to the normalization of Rnm is totally wrong. The max value added in clamp is needed because of the wrong placement of the abs() operation. I put a more thorough explanation here: https://github.com/microsoft/torchscale/commit/fdd8838a756c7c435d7f8a1e4303e150dfac7442#commitcomment-134758047 Commented...

Bumps [scipy](https://github.com/scipy/scipy) from 1.6.3 to 1.10.0. Release notes Sourced from scipy's releases. SciPy 1.10.0 Release Notes SciPy 1.10.0 is the culmination of 6 months of hard work. It contains many...

dependencies

As opposed to the other architectures in this package, RetNet doesn't have support for padding as far as I'm aware. I was thinking the best place to introduce it was...

Hi, first of all thank you for the nice work. I was reading the paper and found the weight decay mentioned in the appendix is different from the one mentioned...

In the RetNet model, embed_tokens is not given, so I can't run the code. When I use this model, what should be passed as the token_embeddings parameter?...