CALM-pytorch
Implementation of CALM from the paper "LLM Augmented LLMs: Expanding Capabilities through Composition", from Google DeepMind
In the code snippet below, is it possible to load the Decoder/Encoder with pre-trained models from the Hugging Face hub?

```python
augment_llm = TransformerWrapper(
    num_tokens = 20000,
    max_seq_len = 1024,
    attn_layers = Decoder(
        dim...
```
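A hedged sketch of one possible answer: x-transformers' `TransformerWrapper` builds its own architecture and weights, so a Hugging Face checkpoint cannot be loaded into it directly. One option (an assumption, not from this repo) is an adapter that hides a pretrained model behind the interface CALM needs, i.e. token ids in, per-layer hidden states out. `HFAdapter`, the `TinyStub` demo model, and the `output_hidden_states` convention below are illustrative; only the last follows the real `transformers` API.

```python
import torch
import torch.nn as nn
from types import SimpleNamespace

class HFAdapter(nn.Module):
    """Hypothetical adapter: wraps a transformers-style model and
    returns its tuple of per-layer hidden states."""
    def __init__(self, hf_model):
        super().__init__()
        self.model = hf_model

    def forward(self, token_ids):
        # transformers models expose per-layer activations when asked
        out = self.model(input_ids=token_ids, output_hidden_states=True)
        return out.hidden_states  # tuple: one (b, n, d) tensor per layer

# Stand-in for a real pretrained model, so the sketch runs offline.
class TinyStub(nn.Module):
    def forward(self, input_ids, output_hidden_states=False):
        b, n = input_ids.shape
        h = torch.zeros(b, n, 4)
        return SimpleNamespace(hidden_states=(h, h))

# In practice one would instead do (downloads weights, so not run here):
# from transformers import AutoModel
# augment_llm = HFAdapter(AutoModel.from_pretrained("gpt2"))
augment_llm = HFAdapter(TinyStub())
hiddens = augment_llm(torch.zeros(1, 3, dtype=torch.long))
```
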
What is the model size (number of trainable parameters) of the following models used for the experiments in the paper?
1. PaLM2-XXS
2. PaLM2-XS
3. PaLM2-S
When training, memory usage continuously increases during the loss calculation in the following part:

```python
loss = F.cross_entropy(
    rearrange(logits, 'b n c -> b c n'),...
```