Progress bar is relatively slow
Hello,
I was recently trying to update MEM code to improve performance and realize that a significant amount of time was spent updating the progress bar. This is especially true for large data, for which bst_progress('inc', 1); would be call a large amount of time.
Here is a very small program showing the issue:
profile on -historysize 10000000000
main(1e4);
profile viewer
function main(n)
bst_progress('start', 'Test progress bar... ' , 'Test progress bar... ', 1, n);
for i = 1:n
M = randn(3,3);
bst_progress('inc', 1);
end
bst_progress('stop');
end
The program run in 51 seconds, with 50 seconds spent in bst_progress. In the MEM case, it was not that extrem as the lopp was doing something but still bst_progress represented ~25% of the runtime.
In theory, this can be solved by limiting the amount of time bst_progress is called :
function main2(n)
bst_progress('start', 'Test progress bar... ' , 'Test progress bar... ', 1, 100);
freq_update_progress_bar = round(n / 100);
for i = 1:n
M = randn(3,3);
if mod(i, freq_update_progress_bar) == 0
bst_progress('inc', 1);
end
end
bst_progress('stop');
end
In this case, main2 run in 0.644s. However, this trick does not work for par-for loops:
function main3(n)
bst_progress('start', 'Test progress bar... ' , 'Test progress bar... ', 1, n);
q = parallel.pool.DataQueue;
afterEach(q, @(x) bst_progress('inc', 1));
for i = 1:n
M = randn(3,3);
send(q, 1);
end
bst_progress('stop');
end
Main3 ran in 74 seconds :
Few observation:
- it seems that getting information from java is taking some times. Maybe we can save a copy in matlab so we don't have to call getValue and getMaximum from the jProgressBar
- setValue seems also costly so maybe we can update the bar only very time a new % is reach.
Edouard
The program run in 51 seconds, with 50 seconds spent in bst_progress. In the MEM case, it was not that extrem as the lopp was doing something but still bst_progress represented ~25% of the runtime.
This is strange. The same test runs over here in 5s, that is a huge difference with respect to 50s
About the parfor solution, while the iteration order is not deterministic, we can still count how many mod() == 0 have been executed as a rough estimate of the progress. Using global variables seems overkilling.
function main4(n)
bst_progress('start', 'Test progress bar... ' , 'Test progress bar... ', 1, n);
step = ceil(n/100);
q = parallel.pool.DataQueue;
afterEach(q, @(x) bst_progress('inc', step));
parfor i = 1:n
M = randn(3,3);
if mod(i, step) == 0
send(q, i); % Send data back to the client
end
end
bst_progress('stop');
end
I just tried on Linux and indeed it's 5 seconds. but much slower on my Mac. Although the test today shows more ~20 seconds than 50... weird.
About the parfor solution, while the iteration order is not deterministic, we can still count how many mod() == 0
I think it would be nicer to have this hidden in the code of the progress bar so we don't have to think about it for every progress bar.
On linux, I am using 'Java 1.8.0_202-b08 with Oracle Corporation Java HotSpot(TM) 64-Bit Server VM mixed mode' My Mac is using ' 'Java 1.8.0_392-b08 with Amazon.com Inc. OpenJDK 64-Bit Server VM mixed mode''
The program run in 51 seconds, with 50 seconds spent in bst_progress. In the MEM case, it was not that extrem as the lopp was doing something but still bst_progress represented ~25% of the runtime.
I also tried running this in my windows laptop and it ran in under 5s.
Closed in improve progress bar performance (#804)