Sebastian Raschka

180 results: issues by Sebastian Raschka

Some models have the GGUF weights on the Hub: https://huggingface.co/QuantFactory/Meta-Llama-3-8B-GGUF/tree/main We would need to find those and map them to the respective models one by one, I think. Maybe via an...
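A hand-maintained mapping could start as a plain dictionary keyed by model ID. A minimal sketch, assuming a simple lookup helper; only the QuantFactory repo comes from the issue above, and the `gguf_repo_for` name is made up for illustration:

```python
# Sketch of a hand-maintained mapping from model IDs to GGUF mirror repos.
# Only the QuantFactory entry is from the linked issue; further entries
# would be added one by one after checking the Hub.

GGUF_REPOS = {
    "meta-llama/Meta-Llama-3-8B": "QuantFactory/Meta-Llama-3-8B-GGUF",
    # more entries to be verified and added manually
}


def gguf_repo_for(model_id: str):
    """Return the GGUF mirror repo for a model ID, or None if unmapped."""
    return GGUF_REPOS.get(model_id)
```

This keeps the mapping explicit and reviewable, at the cost of manual upkeep whenever new GGUF mirrors appear.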

enhancement

Hi there, I recently stumbled upon your paper, and Phudge looks great! I was wondering whether you had considered adding it to Ollama so that it can be used in an...

After the next Lightning release, we can raise the supported bitsandbytes version, since Lightning now supports it (see https://github.com/Lightning-AI/pytorch-lightning/pull/20313).

enhancement

Investigating the RoPE implementation. Fixes #1713, fixes #1699.

If we ever have the time, it might be nice to add this model checkpoint, [microsoft/Phi-3.5-MoE-instruct](https://huggingface.co/microsoft/Phi-3.5-MoE-instruct), since it would make good use of the MoE capabilities we added for Mixtral.

enhancement
model-weights

Right now we only support the Phi 3 version that handles up to 4k tokens; it would be nice to also support the 128k-token version: [microsoft/Phi-3-mini-128k-instruct](https://huggingface.co/microsoft/Phi-3-mini-128k-instruct)

enhancement
model-weights

### Bug description

When running the pretraining example:

```bash
mkdir -p custom_texts
curl https://www.gutenberg.org/cache/epub/24440/pg24440.txt --output custom_texts/book1.txt
curl https://www.gutenberg.org/cache/epub/26393/pg26393.txt --output custom_texts/book2.txt

# 1) Download a tokenizer
litgpt download EleutherAI/pythia-160m \
  --tokenizer_only...
```

bug

There should probably be an option to disable the KV cache in the Python API as part of the compute/memory trade-off story. (It could also be useful for debugging.)
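To make the trade-off concrete, here is a toy sketch (not litgpt's actual API; the `use_kv_cache` flag and counters are purely illustrative): with the cache on, memory grows by one key/value pair per decoded token while per-step compute stays flat; with it off, keys and values are recomputed for the whole prefix at every step.

```python
# Toy illustration of the compute/memory trade-off behind a KV cache.
# This is a sketch, not litgpt's API; `use_kv_cache` is a hypothetical flag.

def decode_steps(tokens, use_kv_cache=True):
    cache = []        # stored (key, value) pairs when caching is on
    recomputed = 0    # total number of k/v projections computed
    for t, tok in enumerate(tokens):
        if use_kv_cache:
            # one new pair per step; earlier pairs are reused from the cache
            cache.append((f"k{tok}", f"v{tok}"))
            recomputed += 1
        else:
            # recompute k/v for the entire prefix at every step
            recomputed += t + 1
    return len(cache), recomputed
```

For a 4-token sequence, caching stores 4 pairs and computes 4 projections (linear), while the uncached path stores nothing but computes 1+2+3+4 = 10 projections (quadratic), which is why a disable switch is a memory-versus-compute knob.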

enhancement

This PR bumps the version, since there have been a number of changes and fixes since the last release. It also makes it a bit easier to detect which version is currently installed...

package

Fixes the link to the GPT-2 paper.