11 comments by Felix Mujkanovic

The forward pass through the LLM should actually be differentiable, right? However, differentiating through the LLM might, of course, yield gradients that are too noisy to be usable. If that is the...