fpm
Improve text file reading performance
Description
- [x] Use a smaller buffer size in getline;
- [x] Update read_lines to use binary reading;
- [x] Fix CRLF line-ending handling.
Use a smaller buffer size in getline
I'm trying to improve the efficiency of reading text files by:
- removing the number_of_rows routine;
- using a smaller buffer size;
- using advance='yes' reads.
Local measurements show that all three improve read efficiency to some extent, but none of them gives an order-of-magnitude improvement.
Of these, using a smaller buffer size requires the least change to the fpm code. I tested it on Windows and Ubuntu Linux, and the trends on both platforms are essentially the same. The timings for both environments are shown below:
Time taken to read a 177-line *.f90 file 1000 times:
Compared with the existing 32768-character buffer, a smaller line buffer such as 1024 (toml-f uses 4096) better matches the files fpm typically reads, and gives a 26% to 52% improvement in read performance.
(Win: Windows OS; GFortran: GCC Fortran; IFX: Intel oneAPI ifx)
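The buffer-size change can be illustrated with a minimal sketch of a chunked getline, assuming the common non-advancing-read pattern: each READ transfers at most BUFFER_SIZE characters, SIZE= reports how many actually arrived, and chunks are appended until end of record. The module name getline_sketch is hypothetical and this is not fpm's exact source.

```fortran
! Hypothetical module: a sketch of a chunked getline, not fpm's exact source.
module getline_sketch
    implicit none
    private
    public :: getline

    !> Characters transferred per non-advancing read. 1024 is the value
    !> proposed in this PR; longer lines still work, they take more chunks.
    integer, parameter :: BUFFER_SIZE = 1024

contains

    !> Read one complete line of any length from a formatted sequential unit.
    !> On return iostat is 0 for a full line, negative at end of file,
    !> positive on an I/O error.
    subroutine getline(unit, line, iostat)
        integer, intent(in) :: unit
        character(len=:), allocatable, intent(out) :: line
        integer, intent(out) :: iostat
        character(len=BUFFER_SIZE) :: buffer
        integer :: nread

        line = ''
        do
            ! advance='no' stops after BUFFER_SIZE characters or at end of
            ! record; size= gives the number of characters actually read.
            read (unit, '(a)', advance='no', iostat=iostat, size=nread) buffer
            if (iostat > 0) exit                 ! genuine I/O error
            line = line//buffer(:nread)          ! append this chunk
            if (iostat < 0) then
                ! End of record: the line is complete. End of file is
                ! passed back to the caller unchanged.
                if (is_iostat_eor(iostat)) iostat = 0
                exit
            end if
        end do
    end subroutine getline

end module getline_sketch
```

A smaller BUFFER_SIZE does not limit line length; it only changes how many characters each READ moves, so a line longer than 1024 characters simply takes a few more iterations of the loop.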
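Benchmark code
program bench_read_lines
    use iso_fortran_env, only: int64
    use fpm_filesystem, only: read_lines   ! called with an open unit, as in the original benchmark
    use fpm_strings, only: string_t
    implicit none
    type(string_t), allocatable :: lines(:)
    integer :: i, unit
    integer(int64) :: t0, t1, rate
    open (newunit=unit, file='src/readfile.f90', status='old', action='read')
    call system_clock(t0, rate)   ! system_clock stands in for the original tmr timer object
    do i = 1, 1000
        rewind (unit)
        lines = read_lines(unit)
    end do
    call system_clock(t1)
    print *, 'Elapsed time: ', real(t1 - t0) / real(rate), 's'
    close (unit)
end program bench_read_lines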
Also see this repo.
Related links
- https://github.com/fortran-lang/fpm/discussions/694
Update read_lines using binary reading
I tried reading text files in C and found it much faster than Fortran. Taking a cue from @Euler-37, I switched to reading text files in binary mode, which is close to an ideal reader; similar code can be found in fortran-lang/http-client.
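A minimal sketch of the whole-file binary read, modelled on the read_text_file function this PR adds but not its actual source (the module name read_text_file_sketch is hypothetical): the file is opened with access='stream', form='unformatted', its size is taken from INQUIRE, and all bytes land in one character scalar with a single READ, so no per-record formatted processing takes place.

```fortran
! Hypothetical module: a sketch of a whole-file binary read, not fpm's source.
module read_text_file_sketch
    implicit none
    private
    public :: read_text_file

contains

    !> Return the full contents of a file as one character scalar.
    !> Returns an empty string if the file cannot be opened or read.
    function read_text_file(filename) result(string)
        character(len=*), intent(in) :: filename
        character(len=:), allocatable :: string
        integer :: unit, file_size, stat

        ! Stream + unformatted: bytes are transferred as-is, with no
        ! record structure and no formatted conversion.
        open (newunit=unit, file=filename, access='stream', form='unformatted', &
              status='old', action='read', iostat=stat)
        if (stat /= 0) then
            string = ''
            return
        end if

        ! Size in file storage units (bytes with gfortran and ifx);
        ! INQUIRE returns -1 if the size cannot be determined.
        inquire (unit=unit, size=file_size)
        if (file_size > 0) then
            allocate (character(len=file_size) :: string)
            read (unit, iostat=stat) string     ! one read for the whole file
            if (stat /= 0) string = ''
        else
            string = ''
        end if
        close (unit)
    end function read_text_file

end module read_text_file_sketch
```

Splitting the buffer into lines is then a separate, purely in-memory pass over the string.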
Binary reading skips the formatted, record-by-record I/O processing: where fpm-0.9.0 took 0.7970 s to read the file, the new approach takes only 0.062 s, an order-of-magnitude improvement. Running the command time fpm build --show-model in my local fpm repository gives:
- fpm-0.9.0: 1.24 s elapsed;
- this PR: 0.86 s elapsed.
That is a 30.65% reduction in wall-clock time, which I think is worth celebrating.
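For reference, the figures above follow directly from the measured times: 30.65% is the relative reduction in wall-clock time for fpm build --show-model (equivalent to running about 1.44 times as fast), and the file-read benchmark runs about 12.9 times faster:

$$
\frac{1.24\,\mathrm{s} - 0.86\,\mathrm{s}}{1.24\,\mathrm{s}} = \frac{0.38}{1.24} \approx 0.3065 = 30.65\%,
\qquad
\frac{1.24\,\mathrm{s}}{0.86\,\mathrm{s}} \approx 1.44,
\qquad
\frac{0.7970\,\mathrm{s}}{0.062\,\mathrm{s}} \approx 12.9
$$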
Ensure thread safety
For thread safety, local allocatable arrays are used to record the start and end indices of each line. This costs a little performance, but may lay the groundwork for parallel binary reading later.
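A sketch of that line-splitting step, assuming the whole file is already in one character scalar (the name line_bounds and its interface are hypothetical, not fpm's API): it records the first and last index of every line in allocatable arrays owned by the caller, with no module-level state, and drops a CR before each LF, which also covers the CRLF fix listed in the description.

```fortran
! Hypothetical helper: a sketch of index-based line splitting, not fpm's code.
module split_lines_sketch
    implicit none
    private
    public :: line_bounds

contains

    !> Line k of string is string(first(k):last(k)), with the LF or CRLF
    !> terminator excluded, so LF and CRLF files yield the same lines.
    !> No module-level state is used, so concurrent calls on different
    !> buffers do not interfere.
    subroutine line_bounds(string, first, last)
        character(len=*), intent(in) :: string
        integer, allocatable, intent(out) :: first(:), last(:)
        character(len=*), parameter :: LF = achar(10), CR = achar(13)
        integer :: i, j, k, n, start

        n = len(string)
        ! One line per LF, plus one for trailing text without a final LF.
        k = count([(string(i:i) == LF, i = 1, n)])
        if (n > 0) then
            if (string(n:n) /= LF) k = k + 1
        end if
        allocate (first(k), last(k))

        k = 0
        start = 1
        do i = 1, n
            ! A line ends at each LF, or at the last character of the buffer.
            if (string(i:i) /= LF .and. i < n) cycle
            k = k + 1
            j = i
            if (string(i:i) == LF) j = i - 1        ! exclude the LF
            if (j >= start) then
                if (string(j:j) == CR) j = j - 1    ! exclude the CR of a CRLF
            end if
            first(k) = start
            last(k) = j                             ! last < first means an empty line
            start = i + 1
        end do
    end subroutine line_bounds

end module split_lines_sketch
```

Each line is then string(first(k):last(k)); an empty line simply has last(k) = first(k) - 1, which Fortran treats as a zero-length substring.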
On Windows, fpm build --show-model has an 18.81% performance improvement.
By the way, here is a hotspot profile of fpm-debug build --show-model, captured with Intel VTune on Windows:
This PR changes how fpm reads text files, from reading characters line by line to reading all bytes at once in binary mode. This can substantially reduce file-reading time without changing much of fpm's other behavior:
- Reduced the buffer length in getline to suit fpm's typical use;
- Added read_text_file, which reads the content of a text file in binary mode.
There is nothing left to update in this PR; if the change in how files are read is considered beneficial, this PR is ready to merge.
@zoziha Is this PR ready to merge? I have resolved the conflicts.
Thanks for reviewing, @henilp105. Okay, nothing more to add; let's merge it.