jukofyork comments

Results: 57 comments of jukofyork

feat: add support for flash_attn

:+1: Hopefully this does get merged and not just left to die a painful "death-by-conflicts" like so many other PRs have already! :frowning_face: I'm getting double the context on some...

feat: add support for flash_attn

Working really well! :+1:

Mistral Instruct models prompt does not use <s> or </s>

> from https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1#instruction-format > > > [INST] Instruction [/INST] Model answer [INST] Follow-up instruction [/INST] > > I didn't see `` as part of the prompt when using `OLLAMA_DEBUG=1 ollama...

llama3-instruct models not stopping at stop token

https://github.com/ggerganov/llama.cpp/issues/6772 I edited my gguf to use the `` token but it still prints it out? Using `gguf-dump` I can confirm I have made the change from the reddit thread...

llama3-instruct models not stopping at stop token

``` > llama.cpp/gguf-py/scripts/gguf-dump.py --no-tensors llama3:70b-instruct-q8_0.gguf * Loading: llama3:70b-instruct-q8_0.gguf * File is LITTLE endian, script is running on a LITTLE endian host. * Dumping 24 key/value pair(s) 1: UINT32 | 1...

llama3-instruct models not stopping at stop token

> Something is not right with the 70B model IMHO - I'm using the q4_K_M and the q4_0 version with crewai, and it diverges so much from the cloud version...

Code view on codellama vs phi and dolphin-phi

You can try appending this to the SYSTEM message in the modelfile: **When providing code examples always use Markdown: use ``` to wrap code blocks and use ` to denote...

Add cli switch to show generation time and tokens/sec output time

You can use the `--verbose` command line option to do this: ``` > ollama run --help Run a model Usage: ollama run MODEL [PROMPT] [flags] Flags: --format string Response...

Please evaluate `Eurus-70b-nca-fixed` and `Eurus-70b-sft-fixed` (first public fine-tunes of base `codellama-70b`!!!)

# Eurus-70b-nca-fixed ### USER Write a simple in memory database using C++ and Boost multiindex. Make the primary key a datetime and use a composite key so we can search...

Please evaluate `Eurus-70b-nca-fixed` and `Eurus-70b-sft-fixed` (first public fine-tunes of base `codellama-70b`!!!)

# Eurus-70b-sft-fixed ### USER Write a simple in memory database using C++ and Boost multiindex. Make the primary key a datetime and use a composite key so we can search...