ExtReMLapin

Results: 239 comments of ExtReMLapin

PR #19084 fixes this issue. When working with contexts of 70k, with the model loaded plus the context it uses something like 30 GB of VRAM, but during inference it goes...

I really don't get why no one has opened a PR yet to replace the placeholders with the actual description/OCR

I wrote a script that removes some images from wandb:

```python
import wandb

api = wandb.Api()
entity = "your_wandb_username_or_organization"
projects = api.projects(entity)

def is_image_from_name(name):
    # Treat any PNG/JPEG file as an image.
    return name.endswith(".png") or name.endswith(".jpg") or name.endswith(".jpeg")
```
...
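The excerpt cuts off before the cleanup loop. A minimal sketch of how the rest might look, assuming only the public wandb client API (`api.runs()`, `run.files()`, `File.delete()`) and the `is_image_from_name` helper above:

```python
# Hypothetical continuation, not the original script: walk every run in
# every project and delete files whose names look like images.
for project in projects:
    for run in api.runs(f"{entity}/{project.name}"):
        for file in run.files():
            if is_image_from_name(file.name):
                print(f"deleting {file.name} from run {run.id}")
                file.delete()
```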

What is the performance when comparing a Python transformers run to a llama.cpp run?

Thanks for the answer. Unless there is a typo somewhere, I expected it to be faster on llama.cpp
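For reference, one rough way to time the transformers side of such a comparison (a minimal sketch; the model name and prompt are placeholders, and `device_map="auto"` assumes accelerate is installed):

```python
import time
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"  # placeholder model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

inputs = tokenizer("Hello, my name is", return_tensors="pt").to(model.device)

# Greedy decode a fixed number of tokens and report throughput.
start = time.perf_counter()
output = model.generate(**inputs, max_new_tokens=256, do_sample=False)
elapsed = time.perf_counter() - start

new_tokens = output.shape[1] - inputs["input_ids"].shape[1]
print(f"{new_tokens / elapsed:.1f} tokens/s")
```

On the llama.cpp side, llama-bench reports tokens/s directly, which makes the comparison straightforward.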

Any news on this? Also, what @viantirreau suggested would be top notch.

Average 'I can't code but I want to be in the contributors list' PR

Not stale. I fixed this issue in llama.cpp, but vllm has the same issue: https://github.com/vllm-project/vllm/blob/4fbd8bb597cf392b94def04a6955f22580356d76/vllm/entrypoints/openai/protocol.py#L712C9-L712C35 It's generating a JSON schema without allowing for the thinking tags. Llama.cpp issue: https://github.com/ggml-org/llama.cpp/issues/15247...
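To illustrate the constrained-decoding problem: if the schema-derived grammar applies from the very first token, a reasoning model can never emit its `<think>...</think>` prefix. A minimal sketch of the idea behind the fix, using regexes as stand-ins for the compiled grammar (not vllm's or llama.cpp's actual code):

```python
import re

# Simplified stand-in for the schema-derived constraint; real backends
# compile the JSON schema into a grammar, not a regex.
json_body = r"\{.*\}"

# Constrained from token 0: a thinking prefix is impossible.
strict = re.compile(json_body, re.DOTALL)

# Fixed form: allow an optional thinking block before the JSON payload.
with_thinking = re.compile(rf"(?:<think>.*?</think>\s*)?{json_body}", re.DOTALL)

sample = '<think>plan the answer</think> {"a": 1}'
print(bool(strict.fullmatch(sample)))             # False: prefix rejected
print(bool(with_thinking.fullmatch(sample)))      # True
print(bool(with_thinking.fullmatch('{"a": 1}')))  # True: thinking is optional
```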

It's ZLUDA's job to support CTranslate2.