
Improve text file reading performance

Open zoziha opened this issue 2 years ago • 3 comments

Description

  • [x] use a smaller buffer size in getline;
  • [x] update read_lines to use binary reading;
  • [x] fix CRLF handling.

Use smaller buffer size in getline

I'm trying to improve the efficiency of reading text files:

  1. removing the number_of_rows routine;
  2. using a smaller buffer size;
  3. using an advance='yes' read.

Local measurements show that all three improve read performance to some extent, but none of them gives an order-of-magnitude improvement. Of the three, using a smaller buffer size requires the smallest change to the fpm code. I tested on Windows and on Ubuntu Linux, and the trends are basically the same in both environments; the timing results are shown below.

Time taken to read a 177-line *.f90 file 1000 times: compared to 32768, a smaller line-length buffer such as 1024 (toml-f uses 4096) better matches fpm's typical file-reading scenarios and also gives a 26%~52% read performance improvement. [image] (Win: Windows OS; GFortran: GCC Fortran; IFX: Intel oneAPI ifx)
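
To make the role of the buffer size concrete, here is a minimal sketch of a chunked line reader in the style discussed above. The module name getline_sketch and the constant BUFFER_SIZE are hypothetical and this is not fpm's actual source; it only illustrates that each non-advancing read fills a fixed-length buffer, so an oversized buffer costs time on every short line.

```fortran
module getline_sketch
    implicit none
    private
    public :: getline

    ! Chunk size for each non-advancing read (hypothetical name; the
    ! discussion above compares 1024 with the earlier 32768).
    integer, parameter :: BUFFER_SIZE = 1024

contains

    ! Read one complete line of arbitrary length from a formatted unit.
    subroutine getline(unit, line, iostat)
        integer, intent(in) :: unit
        character(len=:), allocatable, intent(out) :: line
        integer, intent(out) :: iostat

        character(len=BUFFER_SIZE) :: buffer
        integer :: nread

        line = ''
        do
            nread = 0
            ! Non-advancing read: take at most BUFFER_SIZE characters and
            ! stay on the same record until its end is reached.
            read (unit, '(a)', advance='no', iostat=iostat, size=nread) buffer
            if (iostat > 0) return            ! genuine I/O error
            line = line // buffer(:nread)
            if (iostat < 0) exit              ! end of record or end of file
        end do

        ! End of record only means the line is complete, so report success.
        ! End of file is passed back to the caller unchanged.
        if (is_iostat_eor(iostat)) iostat = 0
    end subroutine getline

end module getline_sketch
```

A caller would loop on getline until is_iostat_end(iostat) is true. Lines longer than BUFFER_SIZE are still read correctly; they just take several iterations of the inner loop.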

Pseudocode
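use fpm_filesystem, only: read_lines
...
! Declarations are elided above; tmr is a timer object with tic/toc methods.
! Open the file once, then time 1000 rewind-and-read passes over it.
open (1, file='src/readfile.f90', status='old', action='read')
call tmr%tic()
do i = 1, 1000
    rewind (1)
    lines = read_lines(1)
end do
print *, 'Elapsed time: ', tmr%toc(), 's'
close (1)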

Also see this repo.

Related links

  • https://github.com/fortran-lang/fpm/discussions/694

zoziha avatar Sep 01 '23 08:09 zoziha

Update read_lines using binary reading

I tried reading text files in C and found it much faster than in Fortran. Taking a cue from @Euler-37, I switched to reading text files in binary mode, which turned out to be the ideal reader; similar code can be seen in fortran-lang/http-client.

Binary reading skips the formatted-I/O processing: the original fpm-0.9.0 took 0.7970 s to read the file, while the current solution takes only 0.062 s, an order-of-magnitude improvement. Running the command time fpm build --show-model in my local fpm repository gives:

  • fpm-0.9.0: time consumed 0:01.24 s;
  • this PR: time consumed 0:00.86 s.

That's a 30.65% reduction in run time ((1.24 - 0.86) / 1.24), which I think is worth celebrating.
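
For readers unfamiliar with the technique, here is a minimal sketch of reading a whole text file in binary (stream) mode. The module and routine names (read_text_sketch, read_whole_file) are hypothetical and this is not the interface added by the PR; it only shows the general idea of one unformatted read instead of many formatted line reads.

```fortran
module read_text_sketch
    implicit none
    private
    public :: read_whole_file

contains

    ! Return the entire content of `filename` as one character string.
    ! Hypothetical helper for illustration, not fpm's actual routine.
    subroutine read_whole_file(filename, content, iostat)
        character(len=*), intent(in) :: filename
        character(len=:), allocatable, intent(out) :: content
        integer, intent(out) :: iostat

        integer :: unit, nbytes

        ! Stream access exposes the file as raw bytes, with no record
        ! structure and no formatted-I/O processing.
        open (newunit=unit, file=filename, status='old', action='read', &
              access='stream', form='unformatted', iostat=iostat)
        if (iostat /= 0) return

        ! Ask for the file size so the buffer can be allocated up front.
        inquire (unit=unit, size=nbytes)
        if (nbytes < 0) then
            ! The size could not be determined; report failure and leave
            ! `content` unallocated.
            iostat = 1
            close (unit)
            return
        end if
        allocate (character(len=nbytes) :: content)

        ! A single unformatted read transfers every byte at once.
        if (nbytes > 0) read (unit, iostat=iostat) content
        close (unit)
    end subroutine read_whole_file

end module read_text_sketch
```

The returned string still contains the raw line terminators (LF, or CR LF for files written on Windows), so splitting it into lines is a separate step.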

zoziha avatar Sep 01 '23 12:09 zoziha

Ensure thread safety

For thread safety, local allocatable arrays are now used to record the start and end indexes of the lines. This reduces performance a bit, but it may lay the groundwork for parallel binary reads later. On Windows, fpm build --show-model shows an 18.81% performance improvement.

By the way, here is a hotspot diagram (fpm-debug build --show-model) taken with Intel VTune on Windows: [image]
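
As a rough illustration of the index bookkeeping, the sketch below records the start and end of every line in caller-owned allocatable arrays, so the routine keeps no saved or module-level state between calls. The names line_index_sketch, index_lines, first and last are hypothetical; the PR's actual code may differ. A CR immediately before an LF is dropped, so CRLF files give the same lines as LF files.

```fortran
module line_index_sketch
    implicit none
    private
    public :: index_lines

contains

    ! Locate every line in `text`; line i is text(first(i):last(i)).
    ! All working storage is local or owned by the caller.
    subroutine index_lines(text, first, last)
        character(len=*), intent(in) :: text
        integer, allocatable, intent(out) :: first(:), last(:)

        character, parameter :: LF = achar(10), CR = achar(13)
        integer :: i, n, start

        ! Pass 1: count the lines so the arrays are allocated exactly once.
        n = 0
        do i = 1, len(text)
            if (text(i:i) == LF) n = n + 1
        end do
        if (len(text) > 0) then
            ! A final line without a trailing newline still counts.
            if (text(len(text):len(text)) /= LF) n = n + 1
        end if
        allocate (first(n), last(n))

        ! Pass 2: record where each line starts and ends.
        n = 0
        start = 1
        do i = 1, len(text)
            if (text(i:i) /= LF) cycle
            n = n + 1
            first(n) = start
            last(n) = i - 1
            ! Drop the CR of a CRLF terminator.
            if (last(n) >= first(n)) then
                if (text(last(n):last(n)) == CR) last(n) = last(n) - 1
            end if
            start = i + 1
        end do
        if (start <= len(text)) then
            n = n + 1
            first(n) = start
            last(n) = len(text)
        end if
    end subroutine index_lines

end module line_index_sketch
```

An empty line comes out with last(i) = first(i) - 1, which makes text(first(i):last(i)) a zero-length string.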

zoziha avatar Sep 05 '23 09:09 zoziha

This PR changes the way fpm reads text files, from reading characters line by line to reading all bytes at once in binary mode. This may reduce the time it takes to read files, and it leaves the rest of fpm's behavior largely unchanged:

  1. Reduce the buffer length in getline to suit fpm's typical usage;
  2. Add a binary-mode read_text_file to read the content of a text file.

There is nothing left to update in this PR. If the change in how files are read is considered beneficial, this PR can be merged.

zoziha avatar Dec 19 '23 17:12 zoziha

@zoziha Is this PR ready to merge? I have resolved the conflicts.

henilp105 avatar Mar 29 '24 05:03 henilp105

Thanks for reviewing, @henilp105. Okay, nothing more to add, let's merge it.

zoziha avatar Mar 29 '24 06:03 zoziha