
Foundation Architecture for (M)LLMs

26 torchscale issues

Hello, I have followed the training configuration introduced here (https://github.com/microsoft/torchscale/issues/52) with the retnet_medium architecture. I have a few questions that I would appreciate anyone answering. The first is about...

I've rewritten the `torchscale.architecture.config` module to use inheritance and remove the redundant code. There are now 3 classes: `Config`, which holds all common options; `EncoderConfig`, which inherits `Config` and...
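
A minimal sketch of the hierarchy being described, assuming dataclass-style fields (the field names and the third class, `DecoderConfig`, are illustrative guesses, since the excerpt is truncated):

```python
# Hypothetical sketch of the inheritance layout described above;
# fields and defaults are illustrative, not torchscale's actual code.
from dataclasses import dataclass

@dataclass
class Config:
    # options shared by every architecture
    vocab_size: int = 64000
    dropout: float = 0.0
    activation_fn: str = "gelu"

@dataclass
class EncoderConfig(Config):
    # encoder-specific options layered on the shared ones
    encoder_embed_dim: int = 768
    encoder_layers: int = 12

@dataclass
class DecoderConfig(Config):
    # assumed third class; the excerpt above is cut off
    decoder_embed_dim: int = 768
    decoder_layers: int = 12
```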

5 classes in the codebase inherit from `object` for some reason. I am guessing it was some sort of oversight.
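
For context, an explicit `object` base is a Python 2 holdover; in Python 3 both forms below produce identical new-style classes:

```python
class WithBase(object):  # Python 2 idiom, redundant in Python 3
    pass

class WithoutBase:       # idiomatic Python 3, same behavior
    pass

# Both have the same method resolution order, ending in object.
assert WithBase.__mro__[-1] is object
assert WithoutBase.__mro__[-1] is object
```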

This is kind of a simple-minded question, but what do I do if I want to see for myself that I can process a huge attention window using torchscale? Ideally,...
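
One way to check is a simple smoke test that builds an encoder and pushes the sequence length up until memory runs out. The sketch below assumes the `EncoderConfig`/`Encoder` entry points shown in the torchscale README; the constructor and forward keywords may differ between versions, so treat it as a starting point rather than a recipe:

```python
import torch
import torch.nn as nn
from torchscale.architecture.config import EncoderConfig
from torchscale.architecture.encoder import Encoder

config = EncoderConfig(vocab_size=32000)
# Passing the token embedding explicitly; some versions build it for you.
embed = nn.Embedding(32000, config.encoder_embed_dim)
model = Encoder(config, embed_tokens=embed)

seq_len = 8192  # the "huge window" under test; raise it step by step
tokens = torch.randint(0, 32000, (1, seq_len))
with torch.no_grad():
    out = model(src_tokens=tokens)  # assumed keyword; check encoder.py
```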

This pull request adds support for the Flash Attention mechanism to the MultiheadAttention module. Flash Attention is a recently proposed, IO-aware implementation of conventional attention that computes the same result while reducing memory...
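
This is not the PR's actual diff, but the general shape of such a change can be sketched with PyTorch's fused kernel `torch.nn.functional.scaled_dot_product_attention` (PyTorch >= 2.0), which dispatches to a FlashAttention backend when one is available:

```python
import torch
import torch.nn.functional as F

def attention(q, k, v, dropout_p=0.0, causal=False):
    # q, k, v: (batch, num_heads, seq_len, head_dim)
    # Replaces the explicit softmax(q @ k.T / sqrt(d)) @ v computation;
    # the fused kernel never materializes the full attention matrix.
    return F.scaled_dot_product_attention(
        q, k, v, dropout_p=dropout_p, is_causal=causal
    )

q = k = v = torch.randn(2, 8, 1024, 64)
out = attention(q, k, v)  # (2, 8, 1024, 64), same result as naive attention
```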

Thanks for your excellent work! I have noticed that torchscale serially executes the operations mapping x to q, k, and v at lines 84-86 of torchscale/component/multihead_attention.py. Will this...
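
For reference, a common way to batch the three projections is a single fused linear layer, so q, k, and v come out of one matmul. A hypothetical sketch (not torchscale's actual code):

```python
import torch
import torch.nn as nn

class FusedQKV(nn.Module):
    def __init__(self, embed_dim):
        super().__init__()
        # One weight of shape (3 * embed_dim, embed_dim) replaces
        # three separate q_proj / k_proj / v_proj layers.
        self.qkv_proj = nn.Linear(embed_dim, 3 * embed_dim)

    def forward(self, x):
        # x: (batch, seq_len, embed_dim)
        q, k, v = self.qkv_proj(x).chunk(3, dim=-1)
        return q, k, v

x = torch.randn(2, 128, 512)
q, k, v = FusedQKV(512)(x)  # each (2, 128, 512)
```

Since the three projections are independent, fusing them mainly saves kernel-launch overhead and improves GPU utilization; the math is unchanged.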