fpm Improve text file reading performance

Description

[x] use a smaller buffer size in getline;
[x] update read_lines to use binary reading;
[x] fix CRLF handling.

Use smaller buffer size in getline

I'm trying to improve the efficiency of reading text files:

removing number_of_rows routine;
using smaller buffer size;
using advance='yes' read.

Local measurements show that all three improve read efficiency to some extent, but none of them yields an order-of-magnitude improvement. Of the three, using a smaller buffer size requires the smallest change to the fpm code. I tested on Windows and on Ubuntu Linux, and the trends are basically the same on both. The timing charts for the two environments follow:

Time consumed to read a certain 177-line *.f90 file 1000 times. Compared with a line buffer length of 32768, a smaller one such as 1024 (toml-f uses 4096) better matches fpm's common file-reading scenarios and also gives a 26%~52% read performance improvement. (Win: Windows OS; GFortran: GCC Fortran; IFX: Intel oneAPI ifx)
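To make the buffer-size point concrete, here is a minimal sketch of a chunked line reader. It is not the fpm implementation: the name BUFFER_LEN, the scratch file, and the test lines are assumptions made for illustration. It reads each record with non-advancing I/O in pieces of at most BUFFER_LEN characters, so a short buffer handles typical source lines in one read while longer lines are still assembled correctly.

```fortran
program getline_demo
    implicit none
    ! Hypothetical buffer length; the text above compares 1024 with 32768.
    integer, parameter :: BUFFER_LEN = 1024
    character(len=:), allocatable :: line
    integer :: unit, stat, nlines, total

    ! Write a small scratch file so the example is self-contained.
    open (newunit=unit, status='scratch', action='readwrite')
    write (unit, '(a)') 'program hello'
    write (unit, '(a)') repeat('x', 3000)   ! longer than one buffer
    write (unit, '(a)') 'end program hello'
    rewind (unit)

    nlines = 0
    total = 0
    do
        call getline(unit, line, stat)
        if (stat /= 0) exit                 ! end of file or error
        nlines = nlines + 1
        total = total + len(line)
    end do
    close (unit)

    ! Self-check: three lines of 13, 3000 and 17 characters.
    if (nlines /= 3 .or. total /= 3030) error stop 'unexpected line data'
    print '(a,i0,a,i0,a)', 'read ', nlines, ' lines, ', total, ' characters'

contains

    ! Read one full line from unit, BUFFER_LEN characters at a time.
    subroutine getline(unit, line, iostat)
        integer, intent(in) :: unit
        character(len=:), allocatable, intent(out) :: line
        integer, intent(out) :: iostat
        character(len=BUFFER_LEN) :: buffer
        integer :: nread

        line = ''
        do
            read (unit, '(a)', advance='no', iostat=iostat, size=nread) buffer
            if (is_iostat_eor(iostat)) then
                ! End of record: append the final piece, the line is complete.
                line = line//buffer(:nread)
                iostat = 0
                exit
            else if (iostat /= 0) then
                exit                        ! end of file or a genuine error
            end if
            line = line//buffer(:nread)     ! buffer was filled, keep reading
        end do
    end subroutine getline

end program getline_demo
```

The fixed-length buffer is a local variable of the reader, so its size is paid for on every call; that is one plausible reason a 1024-character buffer does better than 32768 on files whose lines are short.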

Pseudocode

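use fpm_filesystem, only: read_lines
...
! tmr is a timer object with tic/toc methods; declarations are elided above
open (newunit=unit, file='src/readfile.f90', status='old', action='read')
call tmr%tic()
do i = 1, 1000
    rewind (unit)              ! start again from the top of the file
    lines = read_lines(unit)   ! read every line of the file
end do
print *, 'Elapsed time: ', tmr%toc(), 's'
close (unit)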

Also see this repo.

Update read_lines using binary reading

I tried reading text files in C and found it much faster than in Fortran. Taking a cue from @Euler-37, I now read text files in binary mode, which turns out to be the ideal approach; similar code can be found in fortran-lang/http-client.

Binary reading skips the formatted I/O processing. The original fpm-0.9.0 took 0.7970 s to read the file, while the current solution takes only 0.062 s, an order-of-magnitude improvement. When I run the command time fpm build --show-model in my local fpm repository:

fpm-0.9.0: time consumed 0:01.24 s;
this PR: time consumed 0:00.86 s.

That's a 30.65% reduction in run time, which I think is worth celebrating.
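The binary reading described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: the routine name read_whole_file and the file name demo_input.txt are hypothetical. The idea is to open the file with unformatted stream access, ask for its size, and read every byte into one character variable in a single transfer.

```fortran
program read_text_file_demo
    implicit none
    character(len=:), allocatable :: content
    integer :: unit, stat

    ! Create a small input file (hypothetical name) holding one CRLF
    ! and one LF line ending, written byte for byte.
    open (newunit=unit, file='demo_input.txt', access='stream', &
          form='unformatted', status='replace', action='write')
    write (unit) 'first line'//achar(13)//achar(10)//'second line'//achar(10)
    close (unit)

    call read_whole_file('demo_input.txt', content, stat)
    if (stat /= 0) error stop 'could not read demo_input.txt'

    ! Self-check: 10 + 2 + 11 + 1 bytes were written above.
    if (len(content) /= 24) error stop 'unexpected file size'
    print '(a,i0,a)', 'read ', len(content), ' bytes in one transfer'

    ! Remove the demo file again.
    open (newunit=unit, file='demo_input.txt', status='old')
    close (unit, status='delete')

contains

    ! Read the whole file into content with a single unformatted read.
    subroutine read_whole_file(filename, content, iostat)
        character(len=*), intent(in) :: filename
        character(len=:), allocatable, intent(out) :: content
        integer, intent(out) :: iostat
        integer :: unit, nbytes

        open (newunit=unit, file=filename, access='stream', &
              form='unformatted', status='old', action='read', iostat=iostat)
        if (iostat /= 0) return

        ! File size in file storage units (bytes on common platforms);
        ! a negative value means the size could not be determined.
        inquire (unit=unit, size=nbytes)
        allocate (character(len=max(nbytes, 0)) :: content)
        if (nbytes > 0) read (unit, iostat=iostat) content
        close (unit)
    end subroutine read_whole_file

end program read_text_file_demo
```

Because the bytes arrive untouched, line endings are not interpreted by the runtime; splitting the content into lines, including CRLF handling, becomes the caller's job.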

Sep 01 '23 12:09 zoziha

Ensure thread safety

For thread safety, local allocatable arrays are used to record the start and end indexes of the lines. This reduces performance a bit, but may lay the groundwork for subsequent parallel binary reads. On Windows, fpm build --show-model shows an 18.81% performance improvement.

By the way, I'm posting here a hotspot diagram of a run (fpm-debug build --show-model) profiled with Intel VTune on Windows:
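A minimal sketch of that idea, assuming the file content is already in memory as one string: the routine below (line_bounds, a hypothetical name, not taken from the PR) records where each line starts and ends in two local allocatable arrays and keeps no saved or module-level state, so separate calls do not share data. It also drops the CR of a CRLF pair.

```fortran
program split_lines_demo
    implicit none
    character(len=*), parameter :: CR = achar(13), LF = achar(10)
    character(len=:), allocatable :: content
    integer, allocatable :: first(:), last(:)
    integer :: i

    ! Mixed line endings: CRLF, LF, an empty line, and no final newline.
    content = 'alpha'//CR//LF//'beta'//LF//LF//'gamma'
    call line_bounds(content, first, last)

    if (size(first) /= 4) error stop 'expected four lines'
    if (content(first(1):last(1)) /= 'alpha') error stop 'CR was not stripped'
    if (last(3) - first(3) + 1 /= 0) error stop 'third line should be empty'
    if (content(first(4):last(4)) /= 'gamma') error stop 'last line was lost'

    do i = 1, size(first)
        print '(i0,a,a,a)', i, ': [', content(first(i):last(i)), ']'
    end do

contains

    ! Record the start and end index of every line of text.
    subroutine line_bounds(text, first, last)
        character(len=*), intent(in) :: text
        integer, allocatable, intent(out) :: first(:), last(:)
        integer :: n, pos, k, start

        ! Pass 1: count lines (one per LF, plus an unterminated last line).
        n = 0
        do pos = 1, len(text)
            if (text(pos:pos) == LF) n = n + 1
        end do
        if (len(text) > 0) then
            if (text(len(text):len(text)) /= LF) n = n + 1
        end if
        allocate (first(n), last(n))

        ! Pass 2: fill in the bounds, dropping the CR of a CRLF pair.
        k = 0
        start = 1
        do pos = 1, len(text)
            if (text(pos:pos) /= LF) cycle
            k = k + 1
            first(k) = start
            last(k) = pos - 1
            if (last(k) >= first(k)) then
                if (text(last(k):last(k)) == CR) last(k) = last(k) - 1
            end if
            start = pos + 1
        end do
        if (k < n) then          ! text did not end with LF
            first(n) = start
            last(n) = len(text)
        end if
    end subroutine line_bounds

end program split_lines_demo
```

The sample input yields four lines: alpha, beta, an empty line, and gamma. The counting pass is the extra cost of this approach, which is consistent with the small slowdown mentioned above; in exchange, each line's bounds are independent of the others, which is what a later parallel version would need.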

Sep 05 '23 09:09 zoziha

This PR changes how fpm reads text files, from reading characters line by line to reading all bytes at once in binary mode. This may reduce the time it takes to read files, and it leaves fpm's other behavior largely unchanged:

Reduce the buffer length in getline to suit fpm's use case;
Add read_text_file, which reads the content of a text file in binary mode.

There is nothing left to update in this PR. If the change in how files are read is considered beneficial, this PR is ready to merge.

Dec 19 '23 17:12 zoziha

@zoziha Is this PR ready to merge? I have resolved the conflicts.

Mar 29 '24 05:03 henilp105

Thanks for reviewing, @henilp105. Okay, nothing more to add, let's merge it.

Mar 29 '24 06:03 zoziha
