whisper.cpp

Memory allocation in Windows

Open vitacon opened this issue 2 years ago • 11 comments

I don't know how many people have noticed this, but the memory handling seems rather suspicious to me.

I have to use the large model to get usable results with Czech recordings, and I often hit the memory limit. I can analyze about 10 minutes of audio on 16 GB RAM and about 20 minutes on 32 GB RAM.

The strange thing is that Task Manager does not quite reflect it. The Main task always seems to be using about 5 GB of RAM, but the amount of available RAM keeps decreasing while Main is working, and the "disappearing" memory is not attributed to any task. When it hits the memory limit, Windows starts to swap hard and the memory allocated to Main decreases, but the whole memory is still unavailable. However, the memory becomes available again once the Main task is killed.

Isn't it a memory leak?

Start - 32 % of RAM (screenshot: mem1)

Almost full - 92 % of RAM (screenshot: mem1+)

Full - 99 % of RAM (screenshot: mem2)

vitacon avatar Dec 25 '22 15:12 vitacon

Weird - what main parameters are you using when this happens?

ggerganov avatar Dec 25 '22 16:12 ggerganov

Nothing too fancy. Just variations of this:

main -t 10 -l cs -m models\ggml-model-whisper-large.bin --output-srt -f temp.wav

I think it happens with smaller models too but it's just less apparent and you have to use longer input audio to hit the ceiling.

These screenshots are from a conversion of a 7-minute video using the Medium model:

Beginning (screenshot: whisper-1)

At 6:30 (screenshot: whisper-2)

55 % - 44 % = 11 percentage points of 16 GB, so about 1.76 GB became unavailable, while the memory allocated to Main increased by only about 100 MB.
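The estimate above can be written out as a small helper. This is only a sketch of the arithmetic; the function names are made up for illustration and are not part of whisper.cpp.

```cpp
// Converts a change in system-wide memory usage, given as Task Manager
// percentages, into gigabytes for a machine with `total_gb` of RAM.
double percent_points_to_gb(double before_pct, double after_pct, double total_gb) {
    return (after_pct - before_pct) / 100.0 * total_gb;
}

// Memory that went missing system-wide but is not attributed to the process
// (hypothetical helper: system-wide growth minus the process's own growth).
double unattributed_gb(double system_growth_gb, double process_growth_gb) {
    return system_growth_gb - process_growth_gb;
}
```

With the numbers from this run, `percent_points_to_gb(44, 55, 16)` gives about 1.76 GB, and subtracting the roughly 0.1 GB growth of Main leaves about 1.66 GB that no task accounts for.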

Edit: I used your latest build "whisper-bin-x64" from ggml : simplify the SIMD code (#324)

vitacon avatar Dec 26 '22 10:12 vitacon

Some additional information:

I suspected the amount of locked memory depends on the number of threads and it really does. The previous test with the Medium model and 11 threads raised the amount by 11 % points (44 % to 55 %). Today's tests: 4 threads -> 4 % points (42 % to 46 %) and 1 thread -> 1 % point (42 % to 43 %).

That's a bit disappointing because my theory was that the additional memory was locked by the parallel threads but it seems it happens even with a single thread - except the difference is really small.

vitacon avatar Dec 27 '22 18:12 vitacon

@vitacon Whenever you get the chance, please try the latest version of the code and see if the issue persists.

ggerganov avatar Dec 30 '22 11:12 ggerganov

Thanks for your investigation. I used the latest build whisper : avoid some memory allocations.

Last time I used a 7:30 long video, 11 threads and the Medium model, and the locked memory rose by 11 % points.

Today I used the same video, 11 threads and the Large model to get more visible differences. The rise was 15 % points (48 % in the beginning and 73 % in the end).

vitacon avatar Dec 30 '22 15:12 vitacon

Damn, I think I previously mixed up data from two different computers, so they were a bit misleading. After a few more tests on just one computer (Ryzen with 16 GB) it seems the differences between builds are hardly measurable. It also seems the numbers vary slightly each time.

Czech audio of 7:30

Build       Threads  Model   Mem % (start->end)  Change

gg20221224  11       Large   53->70              17 %
gg20221230  11       Large   -                   15 %
vc20221207  11       Large   57->71              14 %

gg20221224  11       Medium  41->52              11 %
gg20221230  11       Medium  42->53              11 %
vc20221207  11       Medium  -                   12 %

vitacon avatar Dec 30 '22 19:12 vitacon

Thanks for the information.

I don't have a Windows machine, so I cannot investigate in detail. I don't observe this behaviour on Mac OS and Ubuntu, so it seems to be Windows related.

I think it is some side effect of STL containers fragmenting the memory, but I'm not sure.

ggerganov avatar Dec 31 '22 08:12 ggerganov

> I don't have a Windows machine, so I cannot investigate in detail.

I understand that it is a big obstacle. I suppose it would be necessary to add some debug messages and check the amount of available physical memory (something like ComputerInfo().AvailablePhysicalMemory) during the run of the program to find the exact place where the memory "disappears". Unfortunately I can't experiment with it enough because of my other project with higher priority.
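A minimal sketch of that kind of debug message, assuming the Win32 call `GlobalMemoryStatusEx` as the native counterpart of `ComputerInfo().AvailablePhysicalMemory`. The helper names `available_physical_bytes` and `log_available_memory` are hypothetical and do not exist in whisper.cpp; the Linux branch is only there so the sketch can be tried off Windows.

```cpp
#include <cstdint>
#include <cstdio>

#ifdef _WIN32
#include <windows.h>
#elif defined(__linux__)
#include <unistd.h>
#endif

// Returns the physical memory currently available system-wide, in bytes,
// or 0 if the query fails or the platform is not covered by this sketch.
std::uint64_t available_physical_bytes() {
#ifdef _WIN32
    MEMORYSTATUSEX status;
    status.dwLength = sizeof(status);  // must be set before the call
    if (!GlobalMemoryStatusEx(&status)) {
        return 0;
    }
    return status.ullAvailPhys;
#elif defined(__linux__)
    // glibc extension: number of currently available physical pages.
    const long pages = sysconf(_SC_AVPHYS_PAGES);
    const long page_size = sysconf(_SC_PAGESIZE);
    if (pages < 0 || page_size < 0) {
        return 0;
    }
    return static_cast<std::uint64_t>(pages) * static_cast<std::uint64_t>(page_size);
#else
    return 0;
#endif
}

// Hypothetical helper: prints a labeled checkpoint to stderr.
void log_available_memory(const char * label) {
    const double mib = static_cast<double>(available_physical_bytes()) / (1024.0 * 1024.0);
    std::fprintf(stderr, "[mem] %-24s available: %.1f MiB\n", label, mib);
}
```

Calling `log_available_memory("after encode")` at a few points in the processing loop would show where the system-wide number drops, independently of what Task Manager attributes to the process.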

vitacon avatar Dec 31 '22 09:12 vitacon

I've written a C++/MFC app that uses Whisper. It performs real-time transcription using the tiny model. I don't see any memory leaking, whether by observing Resource Monitor, Task Manager, or my built-in call to ::GetProcessMemoryInfo(). Under Visual Studio 2022, upon shutdown, there are no reports of memory leaks. You might want to use Resource Monitor; it shows working and committed memory. Perhaps your memory leak is related to your particular usage of Whisper. My methods are similar to the stream sample.
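For reference, a per-process check along these lines might look like the sketch below. It assumes `GetProcessMemoryInfo` with `PROCESS_MEMORY_COUNTERS_EX` on Windows; the `ProcessMemory` struct and `query_process_memory` are hypothetical names, and the Linux branch (reading `/proc/self/statm`) is only a rough analogue so the code can be tried elsewhere.

```cpp
#include <cstdint>
#include <cstdio>

#ifdef _WIN32
#include <windows.h>
#include <psapi.h>  // may require linking against psapi.lib, depending on the SDK
#elif defined(__linux__)
#include <unistd.h>
#endif

struct ProcessMemory {
    std::uint64_t working_set_bytes = 0;  // resident physical memory of this process
    std::uint64_t private_bytes = 0;      // private commit (Windows) or total virtual size (Linux fallback)
    bool ok = false;                      // true if the query succeeded
};

ProcessMemory query_process_memory() {
    ProcessMemory out;
#ifdef _WIN32
    PROCESS_MEMORY_COUNTERS_EX pmc;
    if (GetProcessMemoryInfo(GetCurrentProcess(),
                             reinterpret_cast<PROCESS_MEMORY_COUNTERS *>(&pmc),
                             sizeof(pmc))) {
        out.working_set_bytes = pmc.WorkingSetSize;
        out.private_bytes = pmc.PrivateUsage;
        out.ok = true;
    }
#elif defined(__linux__)
    // The first two fields of /proc/self/statm are total and resident size, in pages.
    unsigned long long total_pages = 0;
    unsigned long long resident_pages = 0;
    if (std::FILE * f = std::fopen("/proc/self/statm", "r")) {
        if (std::fscanf(f, "%llu %llu", &total_pages, &resident_pages) == 2) {
            const std::uint64_t page = static_cast<std::uint64_t>(sysconf(_SC_PAGESIZE));
            out.working_set_bytes = resident_pages * page;
            out.private_bytes = total_pages * page;
            out.ok = true;
        }
        std::fclose(f);
    }
#endif
    return out;
}
```

Comparing these per-process numbers with the system-wide available memory over the same run is one way to tell a leak inside the process from memory that is held outside its working set.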

UPDATE: I tried using the medium English model. Still no memory leak.

What is your environment? What kind of app? Are you shelling out to command shell? Streaming live audio or WAV files?

I've been hammering on Whisper for a couple of weeks now, with an architecture similar to ggerganov's stream example, with no leaks at all. I think you are doing something different that is causing it. Please provide some more details on your environment and perhaps I can help.

RndyP avatar Dec 31 '22 18:12 RndyP

> I think it is some side effect of STL containers fragmenting the memory, but not sure.

I've written some pretty memory-intensive software with extremely large STL vectors, and my experience is that it will only start page swapping if you are REALLY out of memory. Whisper is using reserve() in a couple of places, so I'm sure you know about it.

RndyP avatar Dec 31 '22 18:12 RndyP

@RndyP I've seen the described behaviour by @vicalloy sometime ago occurring on a Linux machine.

It was a completely correct C++ project, without memory leaks, but it was allocating many small std::vector and other STL containers. My investigation back then showed that when these containers go out of scope, the memory is not actually released back to the operating system. It's some sort of optimisation in the default allocator for small objects (if I understood correctly).

For example, I could fix the issue of the growing memory usage by increasing glibc's M_MMAP_THRESHOLD malloc parameter (set through the MALLOC_MMAP_THRESHOLD_ environment variable) to be larger than the max size of the STL containers that we were allocating. Same code, no rebuild, just change the environment variable and the problem disappears. (edit: either increase or decrease the variable - forgot which way it was)
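As a sketch of the same knob set from code rather than from the environment: glibc exposes it through `mallopt(M_MMAP_THRESHOLD, ...)`. According to the mallopt(3) man page, blocks at or above the threshold are served by `mmap()` and handed back to the system when freed, which suggests that lowering the value is the direction that helps, but that is an inference and not something confirmed in this thread. The wrapper name `set_mmap_threshold` is hypothetical, and this applies to glibc only, not to the Windows allocator.

```cpp
#include <cstddef>
#include <vector>

#if defined(__GLIBC__)
#include <malloc.h>
#endif

// Sets glibc's mmap threshold so that allocations of at least
// `threshold_bytes` are served by mmap() instead of the heap.
// Returns true if the setting was applied, false on failure or
// on platforms that do not use glibc.
bool set_mmap_threshold(std::size_t threshold_bytes) {
#if defined(__GLIBC__)
    // mallopt() returns 1 on success and 0 on error.
    return mallopt(M_MMAP_THRESHOLD, static_cast<int>(threshold_bytes)) == 1;
#else
    (void) threshold_bytes;
    return false;
#endif
}
```

Calling `set_mmap_threshold(64 * 1024)` early in `main()` would make any container buffer of 64 KiB or more go through `mmap()`; the value is only an example and would need to be chosen relative to the container sizes in question.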

The described issue sounded similar, so that's why in 68daf6e487d3a61d55048069345d638aeacc8171 I avoided the std::vector allocations in whisper_sample_best(). Unfortunately, it does not seem to help, and I don't think we have any other STL containers that are aggressively created and destroyed in the code. So yeah - not sure what is causing the issue here.
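To illustrate the general idea of avoiding a per-call std::vector, here is a minimal sketch that reuses a caller-owned scratch buffer. It is not the code from the commit above; `SamplerScratch` and `sample_best_reuse` are hypothetical names, and the (probability, id) buffer is kept only to mirror samplers that need such pairs.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical sampler state: owns a scratch buffer that is reused across calls.
struct SamplerScratch {
    std::vector<std::pair<float, int>> probs_id;  // (probability, token id)
};

// Returns the id of the most probable token, or -1 if `n` is 0.
int sample_best_reuse(const float * probs, std::size_t n, SamplerScratch & scratch) {
    scratch.probs_id.clear();     // clear() keeps the capacity
    scratch.probs_id.reserve(n);  // allocates only if n exceeds the current capacity
    for (std::size_t i = 0; i < n; ++i) {
        scratch.probs_id.emplace_back(probs[i], static_cast<int>(i));
    }

    int best = -1;
    float best_p = 0.0f;
    for (const auto & p : scratch.probs_id) {
        if (best < 0 || p.first > best_p) {
            best_p = p.first;
            best = p.second;
        }
    }
    return best;
}
```

After the first call the buffer's capacity is retained, so repeated calls with the same vocabulary size do not go back to the allocator.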

> What is your environment? What kind of app? Are you shelling out to command shell? Streaming live audio or WAV files?

@vicalloy is using the pre-compiled binaries by the Windows Github action in this repo.

ggerganov avatar Jan 02 '23 08:01 ggerganov