
Why does nllb 3.3B still occupy 5.7GB of CPU memory when the model has been loaded to the GPU and occupies 13.17GB of GPU memory?

Open micronetboy opened this issue 1 year ago • 0 comments

Why does nllb 3.3B still occupy 5.7GB of CPU memory when the model has been loaded to the GPU and occupies 13.17GB of GPU memory? In my opinion, once the model has been loaded to the GPU, CPU memory usage should be very low.

GPU : Nvidia A100 80G PCIe

My code:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline

model_name = "facebook/nllb-200-3.3B"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name, trust_remote_code=True).cuda()
translator = pipeline(
    'translation',
    model=model,
    tokenizer=tokenizer,
    src_lang=source_lang,
    tgt_lang=target_lang,
    max_length=max_length,
    device=device,
)
```
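One possible explanation (an assumption, not confirmed here): `from_pretrained` by default materializes the full checkpoint in host RAM before `.cuda()` copies it to the device, and the CUDA runtime itself reserves additional host memory for its context that is never released. The host-side weights only go away when nothing references them anymore. Here is a minimal pure-Python sketch of that reference-lifetime behavior (no GPU needed; the "device copy" is just a second buffer, and `WeightBlob` is a hypothetical stand-in for a checkpoint tensor):

```python
import gc
import weakref

class WeightBlob:
    """Hypothetical stand-in for a checkpoint tensor living in host RAM."""
    def __init__(self, size: int):
        self.data = bytearray(size)

def load_then_free():
    host = WeightBlob(10 * 1024 * 1024)  # "checkpoint" loaded into CPU RAM
    alive = weakref.ref(host)            # observe the host copy's lifetime
    device = bytes(host.data)            # stand-in for the copy sent to the GPU
    assert alive() is not None           # host copy is still resident here
    del host                             # drop the last strong reference
    gc.collect()                         # CPython frees on del; collect() only
                                         # matters if reference cycles exist
    assert alive() is None               # host RAM for the weights is reclaimed
    return device

blob = load_then_free()
```

In the real `transformers` API, passing `low_cpu_mem_usage=True` to `from_pretrained` (with `accelerate` installed) may reduce the host-RAM peak by avoiding a full in-memory checkpoint copy, and `torch_dtype=torch.float16` roughly halves the footprint in both RAM and VRAM. Even so, a few GB of host memory held by the CUDA context and allocator is expected and cannot be freed while the process uses the GPU.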

micronetboy avatar Feb 04 '24 09:02 micronetboy