Rejnald Lleshi

Results: 74 comments by Rejnald Lleshi

@johnnv1 thanks for the quick answer. Unfortunately, that's only ~2 GB less GPU-intensive.

Hi @aarnphm this seems to be happening when you define the `docker.env` field. So if you have a different set of variables on `envs` and you define another set of...

That would be helpful, as I ended up wasting quite a bit of time on this.

Hi @mmathew23, thanks for your prompt response. Sorry about the missing info above. Here are more details. Gemma3 1B takes 1.8 GB of memory (via ollama) on the Jetson. I...
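As a back-of-the-envelope sanity check on numbers like that (a rough sketch, not specific to Unsloth or ollama), weight-only memory is roughly parameter count × bytes per parameter; KV cache, activations, and runtime overhead come on top:

```python
def weight_mem_gb(n_params: int, bits_per_param: int) -> float:
    """Rough weight-only memory estimate in decimal GB."""
    return n_params * bits_per_param / 8 / 1e9

# A 1B-parameter model, weights only:
print(weight_mem_gb(1_000_000_000, 16))  # fp16: 2.0 GB
print(weight_mem_gb(1_000_000_000, 4))   # 4-bit quantized: 0.5 GB
```

The ~1.8 GB observed via ollama is plausibly quantized weights plus KV cache and runtime overhead, but that split is an assumption, not something measured here.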

Well, before the above error message is thrown my RAM overflows, but you're right, it doesn't throw the conventional OOM error. So if I just do ``` model, tokenizer = FastModel.from_pretrained(...

The Jetson Orin Nano doesn't have dedicated GPU VRAM; it has a unified shared memory pool, so system RAM is used as VRAM and vice versa. Correct,...
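One stdlib-only way to see the size of that single pool (on a unified-memory board like the Orin Nano, the figure reported here is the same pool the GPU allocates from; on a discrete-GPU machine it would be CPU RAM only):

```python
import os

def total_system_mem_gb() -> float:
    """Total physical memory via POSIX sysconf (Linux/macOS)."""
    pages = os.sysconf("SC_PHYS_PAGES")
    page_size = os.sysconf("SC_PAGE_SIZE")
    return pages * page_size / 1e9

print(f"total memory: {total_system_mem_gb():.1f} GB")
```

If PyTorch with CUDA is available, `torch.cuda.mem_get_info()` reports free/total device memory; on the Orin Nano both views describe the same physical pool.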

@mmathew23 kind reminder

@mmathew23 `FastModel.from_pretrained()` is where it runs out of memory, although this only happens if I'm loading my own LoRA weights. Here's the verbose stack trace that you asked for: ```...
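To pinpoint where memory spikes without waiting for the crash, peak RSS can be sampled around the suspect call (a stdlib-only sketch for Unix; `resource` is unavailable on Windows, and `ru_maxrss` is in KB on Linux but bytes on macOS). The commented line marks where the real `FastModel.from_pretrained(...)` call would sit; the demo uses a stand-in allocation instead:

```python
import resource

def peak_rss_mb() -> float:
    """Peak resident set size of this process in MB (assumes Linux: KB units)."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024

before = peak_rss_mb()
# model, tokenizer = FastModel.from_pretrained(...)  # the suspect call goes here
blob = bytearray(50 * 1024 * 1024)  # stand-in 50 MB allocation for the demo
after = peak_rss_mb()
print(f"peak RSS grew by ~{after - before:.0f} MB")
```

Logging this before and after the load would show whether the overflow comes from the base model weights or from merging the LoRA adapter.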

@mmathew23 just wondering if you have any other suggestions before we move on from this and fully commit to ollama.

@mmathew23 We want to use Unsloth for training & inference, but if we cannot do inference with it then we're hoping to convert the models for Ollama inference (vLLM was...