torchscale
Foundation Architecture for (M)LLMs
Thank you for your great work! I've noticed that your decoder_retention_heads is set to 3 by default, and the mask is also expanded to three dimensions to match. Have you...
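The per-head decay mask the issue refers to can be sketched as follows. This is a minimal NumPy illustration of the RetNet decay mask from the paper (the variable names are assumptions, not torchscale's own identifiers): each of the h retention heads gets its own decay rate gamma, producing a mask of shape (num_heads, seq_len, seq_len), which is why a 2D mask must be expanded to three dimensions.

```python
import numpy as np

# Hedged sketch of RetNet's per-head causal decay mask.
# Per the RetNet paper, head i uses decay rate gamma_i = 1 - 2^(-5 - i).
num_heads, seq_len = 3, 4
gamma = 1 - 2.0 ** (-5 - np.arange(num_heads))   # one decay rate per head

n = np.arange(seq_len)
exponent = n[:, None] - n[None, :]               # n - m for positions (n, m)
causal = exponent >= 0                            # only attend to the past

# Broadcasting gamma over the (seq_len, seq_len) grid yields the 3D mask.
mask = np.where(causal, gamma[:, None, None] ** exponent, 0.0)
print(mask.shape)                                 # (3, 4, 4)
```

Positions above the diagonal are zeroed (causality), and entry (h, n, m) decays as gamma_h ** (n - m), so more distant tokens contribute less.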
Bumps [pyarrow](https://github.com/apache/arrow) from 9.0.0 to 14.0.1. Commits ba53748 MINOR: [Release] Update versions for 14.0.1 529f376 MINOR: [Release] Update .deb/.rpm changelogs for 14.0.1 b84bbca MINOR: [Release] Update CHANGELOG.md for 14.0.1 f141709...
Bumps [transformers](https://github.com/huggingface/transformers) from 4.8.1 to 4.36.0. Release notes Sourced from transformers's releases. v4.36: Mixtral, Llava/BakLlava, SeamlessM4T v2, AMD ROCm, F.sdpa wide-spread support New model additions Mixtral Mixtral is the new...
This is a simple fix for the issue that PyTorch no longer provides torch._six.
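The usual shape of such a fix is a small compatibility shim. A hedged sketch, assuming the affected code only imported `inf` from `torch._six` (which was removed in PyTorch 1.13):

```python
import math

# Compatibility shim for the torch._six removal (PyTorch >= 1.13).
# Older code did: from torch._six import inf
try:
    from torch import inf  # available in PyTorch >= 1.13
except ImportError:
    inf = math.inf         # fallback if torch is absent or older

print(inf > 1e308)  # behaves like an ordinary float infinity
```

Other names that used to live in `torch._six` (e.g. `string_classes`) need their own replacements; the pattern is the same, so check which symbols the failing import actually pulled in.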
Hello, I followed the blog post https://zenn.dev/selllous/articles/retnet_tutorial shared in #52 in order to train RetNet, and it seems to work well for small models (< 3B). But I am unable...
The fix of the normalization Rnm is completely wrong. The max value added in clamp is only needed because of the wrong placement of the abs() operation. I put a more thorough explanation here: https://github.com/microsoft/torchscale/commit/fdd8838a756c7c435d7f8a1e4303e150dfac7442#commitcomment-134758047 Commented...
Bumps [scipy](https://github.com/scipy/scipy) from 1.6.3 to 1.10.0. Release notes Sourced from scipy's releases. SciPy 1.10.0 Release Notes SciPy 1.10.0 is the culmination of 6 months of hard work. It contains many...
As opposed to the other architectures in this package, RetNet doesn't have support for padding as far as I'm aware. I was thinking the best place to introduce it was...
Hi, first of all thank you for the nice work. I was reading the paper and found the weight decay mentioned in the appendix is different from the one mentioned...
In the RetNet model, embed_tokens is not given, so I can't run the code. When I use this model, what should I pass for the token_embeddings parameter? ...