matmulfreellm No reduction in VRAM usage

No reduction in VRAM usage

Open radna0 opened this issue 8 months ago • 5 comments

I tried running the following code, with just having the ```ridger/MMfreeLM-1.3B```` model initialized:

root@r4-0:~/matmulfreellm# python
>>> import os
>>> os.environ["TOKENIZERS_PARALLELISM"] = "false"
>>> import mmfreelm
>>> from transformers import AutoModelForCausalLM, AutoTokenizer
>>> # Change here to our open-sourced model
>>> name = "ridger/MMfreeLM-1.3B"
>>> tokenizer = AutoTokenizer.from_pretrained(name)
>>> model = AutoModelForCausalLM.from_pretrained(name).cuda().half()

Having another terminal opened with 'watch rocm-smi', showing 68% VRAM usage meaning about 5.5GB

Every 2.0s: rocm-smi                                                                                        r4-0: Wed Jun 12 12:16:17 2024



======================================== ROCm System Management Interface ========================================
================================================== Concise Info ==================================================
Device  [Model : Revision]    Temp    Power     Partitions      SCLK    MCLK    Fan    Perf  PwrCap  VRAM%  GPU%
        Name (20 chars)       (Edge)  (Socket)  (Mem, Compute)                                                    
==================================================================================================================
0       [RX Vega64 : 0xc1]    30.0°C  11.0W     N/A, N/A        852Mhz  167Mhz  9.41%  auto  220.0W   68%   0%
        Vega 10 XL/XT [Radeo
==================================================================================================================
============================================== End of ROCm SMI Log ===============================================

Contradicting what was said in the paper?

Jun 12 '24 12:06 radna0

matmulfreellm matmulfreellm copied to clipboard

No reduction in VRAM usage

matmulfreellm
matmulfreellm copied to clipboard