paolovic

Results: 51 comments of paolovic

Somehow I don't see the recommendation to apply `model.eval` here @Parskatt, but thank you, I changed my implementation to:

```python
batch = {"im_A": query_images, "im_B": ref_batch_images}
roma_model.eval()
with torch.inference_mode():
    ...
```
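For context, the full pattern presumably looks something like this (a minimal sketch; `roma_model`, `query_images`, and `ref_batch_images` come from the surrounding script, and the exact forward call is an assumption on my side):

```python
import torch

# Minimal sketch of the inference pattern above. The forward call is an
# assumption -- adapt it to RoMa's actual API.
batch = {"im_A": query_images, "im_B": ref_batch_images}

roma_model.eval()              # disable dropout, use running batch-norm stats
with torch.inference_mode():   # skip autograd bookkeeping during inference
    output = roma_model(batch)
```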

> Yeah, GitHub bugged for me and showed my comment as duplicated; I removed one and both disappeared...

Alright, in any case thank you very much!

Same for me: it basically takes the whole GPU, almost 11 GB in my case. This is how I could reduce it:

```python
import tensorflow as tf
physical_devices = tf.config.list_physical_devices('GPU')
...
```
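For anyone landing here, the usual ways to stop TensorFlow from grabbing the whole card are on-demand memory growth or a hard memory cap; a minimal sketch (the 4096 MiB limit is just an example value):

```python
import tensorflow as tf

# Must run before any GPU is initialized (i.e. right after the imports).
physical_devices = tf.config.list_physical_devices('GPU')
for gpu in physical_devices:
    # Option 1: allocate GPU memory on demand instead of all upfront.
    tf.config.experimental.set_memory_growth(gpu, True)

# Option 2 (alternative; don't combine with memory growth on the same GPU):
# hard-cap the allocation, e.g. at 4 GiB.
# tf.config.set_logical_device_configuration(
#     physical_devices[0],
#     [tf.config.LogicalDeviceConfiguration(memory_limit=4096)])
```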

> @paolovic Hi, can you tell me in more detail how you solved it?

Hi, I inserted the snippet from my previous post after the imports in the omniglue_demo script, ...

Well... I know that's what I wrote; maybe I wasn't able to express my case clearly enough. But I cannot "open" my internet connection, I work in a restricted environment...

@youkaichao @ringos Thank you very much! I'll try it out and will come back to you! Best regards

@youkaichao @ringos Thank you very much for your support! In the end, ringos' approach did the trick for me: `GIT_REPOSITORY https://github.com/nvidia/cutlass.git` => `GIT_REPOSITORY `.

I have 2x L40s and cannot reproduce with `Meta-Llama-3.1-8B-Instruct-quantized.w8a16` (https://huggingface.co/RedHatAI/Meta-Llama-3.1-8B-Instruct-quantized.w8a16), invoked like this:

```bash
VLLM_USE_V1=1 vllm serve Meta-Llama-3.1-8B-Instruct-quantized.w8a16 \
  --host 0.0.0.0 \
  --served-model-name llama3.1-8B llama3.1-8B-Int8 \
  --port 8000 \
  --max-model-len 65536 \
  --enable-auto-tool-choice \
  --tool-call-parser llama3_json
```

...
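For completeness, a server started this way can be exercised through vLLM's OpenAI-compatible endpoint; a minimal sketch (the prompt and host are placeholders):

```python
from openai import OpenAI

# vLLM exposes an OpenAI-compatible API at /v1 on the port given above.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="llama3.1-8B",  # one of the --served-model-name aliases above
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)
```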

Hi @manitadayon, I am downloading https://huggingface.co/nvidia/Llama-3_3-Nemotron-Super-49B-v1 now. How did you quantize it? Using Hugging Face + AutoGPTQ? How many bits? Thank you and best regards

@manitadayon Mistral in half precision is larger than `nvidia/Llama-3_3-Nemotron-Super-49B-v1` in 4-bit. Anyway, since I suspected a memory leak, I was hoping it would lead to an OOM error...