matmulfreellm
No reduction in VRAM usage
I tried running the following code, with just the `ridger/MMfreeLM-1.3B` model initialized:
```
root@r4-0:~/matmulfreellm# python
>>> import os
>>> os.environ["TOKENIZERS_PARALLELISM"] = "false"
>>> import mmfreelm
>>> from transformers import AutoModelForCausalLM, AutoTokenizer
>>> # Change here to our open-sourced model
>>> name = "ridger/MMfreeLM-1.3B"
>>> tokenizer = AutoTokenizer.from_pretrained(name)
>>> model = AutoModelForCausalLM.from_pretrained(name).cuda().half()
```
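One way to cross-check the rocm-smi number below is to ask PyTorch itself what it holds, from the same session (a sketch using PyTorch's standard memory introspection; note that the SMI figure also includes the caching allocator's reserved-but-unused pool and the HIP runtime context):

```
>>> import torch
>>> # Bytes held by live tensors vs. bytes reserved by the caching allocator
>>> torch.cuda.memory_allocated() / 1e9
>>> torch.cuda.memory_reserved() / 1e9
```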
In another terminal, `watch rocm-smi` shows 68% VRAM usage, i.e. about 5.5 GB:
```
Every 2.0s: rocm-smi                                       r4-0: Wed Jun 12 12:16:17 2024

======================================== ROCm System Management Interface ========================================
================================================== Concise Info ==================================================
Device  [Model : Revision]    Temp     Power     Partitions      SCLK    MCLK    Fan     Perf  PwrCap   VRAM%  GPU%
        Name (20 chars)       (Edge)   (Socket)  (Mem, Compute)
==================================================================================================================
0       [RX Vega64 : 0xc1]    30.0°C   11.0W     N/A, N/A        852Mhz  167Mhz  9.41%   auto  220.0W   68%    0%
        Vega 10 XL/XT [Radeo
==================================================================================================================
============================================== End of ROCm SMI Log ===============================================
```
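For reference, a back-of-envelope estimate of the fp16 weight footprint (assuming the advertised 1.3B parameter count; runtime context and allocator overhead come on top of this):

```
>>> # fp16 = 2 bytes per parameter
>>> 1.3e9 * 2 / 1e9  # GB for the weights alone
2.6
```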
Doesn't this contradict what was said in the paper?