Nicolas Patry

Results 978 comments of Nicolas Patry

You send request:

```
input_ids = [A, B, C, A]  # Those are tokens
new_token_D, past = forward(input_ids)
```

That's a prefill step. Then we continue generating new tokens in...
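A minimal sketch of the prefill-then-decode loop, using a toy `forward()` stand-in (the real model returns logits and a KV cache; here the "cache" is just the token ids seen so far, so this only illustrates the call pattern):

```python
def forward(input_ids, past=None):
    # Toy stand-in for a model forward pass: the "KV cache" is just
    # the list of all tokens processed so far, and the "next token"
    # is a deterministic function of that cache (instead of argmax
    # over real logits).
    cache = (past or []) + list(input_ids)
    next_token = sum(cache) % 100
    return next_token, cache

# Prefill: the whole prompt goes through the model in one call.
token, past = forward([1, 2, 3, 1])

# Decode: each subsequent step feeds only the newest token, plus the
# cache, so earlier tokens are never re-processed.
for _ in range(3):
    token, past = forward([token], past=past)
```

The point is the asymmetry: prefill sees the full prompt once, while every decode step passes a single new token and reuses `past`.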

> I understand that flash or sparse attention models won't use padding, but if the user generates a very long sequence, say thousands of tokens, how many of those tokens...

Try disabling it? It should still download the model, just a bit slower. `hf_transfer` is really barebones, and any flaky network might trigger issues for you (or because you're...

Isn't there a way for you to provide environment variables?

```
HF_HUB_ENABLE_HF_TRANSFER=0
```

is what you are looking for.
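For instance, in a shell this could look like the sketch below (the actual command you launch afterwards depends on your setup):

```shell
# Disable hf_transfer for every process started from this shell;
# downloads then fall back to the slower but more robust default path.
export HF_HUB_ENABLE_HF_TRANSFER=0
echo "$HF_HUB_ENABLE_HF_TRANSFER"
```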

Thanks for sharing the solution! Closing this then.

It should work, but you would need the `--trust-remote-code` flag for it to work. Can you provide a full stacktrace?

I think the first one is a very easy fix we could implement. Today there are 3 issues about this conversion, so maybe making it a bit more robust/effective in...

Hi @mayurtikundi12 You need to work with the latest version for this model to work. We're going to release 0.9 soon, which should work. @OlivierDehaene (For vis)

Try with `--auto-convert false`. This error happens when trying to convert to safetensors, but it shouldn't be required for non *core* models.