David Quarel
**Describe the bug** Loading Llama-2 throws the error "RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!". **Code example**...
Any tokens following ` 私` fail to be tokenized when using the tokenizer meta-llama/Meta-LLama-3-8B. This doesn't match the behaviour of the Hugging Face tokenizer. ``` >>>...
**Describe the bug** `PosEmbed` uses the incorrect device when used with the `accelerate` library. **Code example** The following code is a minimal training loop that trains the `gpt2` model on randomly generated...