Support memory swap
This issue proposes memory swap to reduce memory usage. Unused tensor data is swapped out to external storage and swapped back in when the data is needed. It does not impose a fixed cap on memory usage; instead, only the memory that is actually required is kept resident, based on pre-calculation.
Swap-out and swap-in points can be pre-defined using the execution order, which is already implemented, so the resources needed to choose a victim for swap-out are reduced (a common cache algorithm such as LRU or LFU is unnecessary). The original memory pool already optimizes memory usage based on the execution order, and the plan does not change while the training phase runs. A new cache pool is introduced, inherited from the memory pool, to reuse this optimized memory information. All allocated memory is linked and managed by the cache pool, and an allocated memory location can be reused for a new swap-in allocation.
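To illustrate the idea, here is a minimal, self-contained C++ sketch of a cache pool that inherits from a plain memory pool and backs each buffer with a slot in a swap file. All class and member names here (`MemoryPool`, `CachePool`, `swapOut`, `swapIn`, the swap-file path) are illustrative assumptions for this sketch, not the actual nntrainer API.

```cpp
#include <cstddef>
#include <cstdio>
#include <map>
#include <vector>

// Plain pool: hands out heap buffers identified by an integer id.
class MemoryPool {
public:
  virtual ~MemoryPool() = default;

  virtual void *allocate(unsigned id, size_t bytes) {
    auto &buf = buffers_[id];
    buf.resize(bytes);
    return buf.data();
  }

  virtual void deallocate(unsigned id) { buffers_.erase(id); }

protected:
  std::map<unsigned, std::vector<char>> buffers_;
};

// Cache pool: reuses the pool's bookkeeping, but every buffer also has a
// slot in a backing file, so inactive data can leave RAM and come back.
class CachePool : public MemoryPool {
public:
  explicit CachePool(const char *swap_path)
    : file_(std::fopen(swap_path, "w+b")) {}
  ~CachePool() override {
    if (file_)
      std::fclose(file_);
  }

  // Swap-out: write the buffer to its file slot and release the RAM.
  void swapOut(unsigned id) {
    auto it = buffers_.find(id);
    if (it == buffers_.end() || it->second.empty())
      return;
    Slot &slot = slots_[id];
    slot.size = it->second.size();
    if (slot.offset < 0) { // first eviction: append a new slot
      std::fseek(file_, 0, SEEK_END);
      slot.offset = std::ftell(file_);
    }
    std::fseek(file_, slot.offset, SEEK_SET);
    if (std::fwrite(it->second.data(), 1, slot.size, file_) != slot.size)
      return; // a real implementation would handle the short write
    std::vector<char>().swap(it->second); // actually free the memory
  }

  // Swap-in: make the buffer resident again; reload from the file if it
  // was evicted earlier, otherwise just hand back the live buffer.
  void *swapIn(unsigned id) {
    auto it = buffers_.find(id);
    if (it != buffers_.end() && !it->second.empty())
      return it->second.data(); // already resident
    Slot &slot = slots_[id];
    void *ptr = MemoryPool::allocate(id, slot.size);
    if (slot.offset >= 0) {
      std::fseek(file_, slot.offset, SEEK_SET);
      if (std::fread(ptr, 1, slot.size, file_) != slot.size) {
        // a real implementation would handle the short read
      }
    }
    return ptr;
  }

private:
  struct Slot {
    long offset = -1; // byte offset in the swap file, -1 = never evicted
    size_t size = 0;  // buffer size in bytes
  };
  std::FILE *file_;
  std::map<unsigned, Slot> slots_;
};
```

In the design described above, the cache pool would additionally reuse the execution-order-based plan from the memory pool; the simple file-slot layout here is just the most basic possible backing store.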
For the draft, swap management is applied only to tensors (#1965). The trial test results are as follows:
MODEL | MIN | AVG | MAX |
---|---|---|---|
MNIST(orig) | 17,112K | 17,112K | 17,112K |
MNIST(swap) | 16,114K | 16,201K | 16,308K |
- | - | -5.32% | - |
Resnet(orig) | 231,728K | 231,728K | 231,728K |
Resnet(swap) | 157,048K | 195,842K | 232,268K |
- | - | -15.48% | - |
The results show that:
- Peak memory usage is the same as or similar to the non-swap case.
- The original optimized memory plan touches almost all of the allocated memory at least once.
- Deallocating memory does not return all of the physical memory, due to the kernel's policy.
2nd Revision (#1965 is updated)
This revision uses exact execution orders to obtain the proper timing at which data has to be swapped out. In the 1st version, data was kept alive until its usage was over. Execution orders, however, allow finer-grained timing control: at every execution order, unnecessary data is evicted and only the necessary data is loaded.
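Continuing the sketch above, the per-execution-order policy could look like the loop below. The `usage` map (tensor id → set of execution orders that touch it) is an assumption standing in for the planner's information; it is not nntrainer's real data structure.

```cpp
#include <set>
#include <unordered_map>

// At each execution order: evict everything not used at this step and
// make sure everything that is used is resident. (Sketch only.)
void prepareStep(CachePool &pool,
                 const std::unordered_map<unsigned, std::set<unsigned>> &usage,
                 unsigned exec_order) {
  for (const auto &entry : usage) {
    unsigned tensor_id = entry.first;
    const std::set<unsigned> &orders = entry.second;
    if (orders.count(exec_order) > 0)
      pool.swapIn(tensor_id);  // needed now: load it
    else
      pool.swapOut(tensor_id); // not needed at this order: evict it
  }
  // ... run the layer scheduled at exec_order ...
}
```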
Compared to the 1st version, swap is now applied to both tensors and weights. Peak memory is reduced significantly, but the weights had no effect on peak memory; the detailed reason needs to be investigated. Detailed memory usage is presented below:
MODEL | MIN | AVG | MAX | AVG(tensor+weight) | MAX(tensor+weight) |
---|---|---|---|---|---|
MNIST(orig) | 16,060K | 16,060K | 16,060K | 1,609K | 1,609K |
MNIST(swap) | 13,828K | 14,107K | 15,372K | 214K | 920K |
- | -13.8% | -12.1% | -4.2% | -86.6% | -42.8% |
Resnet(orig) | 230,648K | 230,648K | 230,648K | 201,589K | 201,589K |
Resnet(swap) | 15,928K | 35,902K | 189,632K | 8,648K | 175,392K |
- | -93.0% | -84.4% | -17.7% | -95.7% | -12.9% |
3rd Revision (#1965, #1987 updated)
Applied some optimizations:
- flush initialized memory (see the sketch after these lists)
- optimize flush timing
- remove unnecessary logs
Fixed some bugs:
- Fix for supporting multiple training runs
- Fix for flush timing
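One possible reading of "flush initialized memory" (this is my interpretation, not taken from the patch): once a weight has been filled by its initializer, it can be written to the swap file immediately, so the initialized data does not stay resident until its first use. Reusing the hypothetical `CachePool` sketch from above:

```cpp
#include <algorithm>

// Hypothetical helper: initialize a weight buffer, then flush it to the
// swap file right away so the RAM is released before training starts.
void initializeAndFlush(CachePool &pool, unsigned weight_id, size_t bytes) {
  float *data = static_cast<float *>(pool.allocate(weight_id, bytes));
  std::fill(data, data + bytes / sizeof(float), 0.01f); // e.g. constant init
  pool.swapOut(weight_id); // evict immediately; swapIn() reloads it on first use
}
```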
MODEL | MIN | AVG | MAX | AVG(tensor+weight) | MAX(tensor+weight) |
---|---|---|---|---|---|
MNIST(orig) | 16,852K | 16,852K | 16,852K | 1,707K | 1,707K |
MNIST(swap) | 15,548K | 15,620K | 15,692K | 221K | 942K |
- | -7.7% | -7.3% | -6.8% | -87.0% | -44.8% |
Resnet(orig) | 231,800K | 231,800K | 231,800K | 206,427K | 206,427K |
Resnet(swap) | 20,512K | 30,055K | 69,372K | 3,103K | 38,576K |
- | -91.1% | -87.0% | -70.0% | -98.4% | -81.3% |
VGG16(orig) | 320,524K | 387,881K | 389,524K | - | - |
VGG16(swap) | 16,976K | 53,323K | 119,560K | - | - |
- | -94.7% | -86.2% | -69.3% | - | - |
This is great work! Using this swapping, we can save much more memory now. We can train a Resnet-like model under 100 MB of memory!!!
I'll revisit this issue later with preload (#2034) and disk I/O performance (issue not yet opened).