
Performance issue when reading FLAC-format data over HTTP

Status: open. vitaldb opened this issue 3 years ago (0 comments).

I tested the performance of the wfdb Python library when reading waveforms over HTTP.

  • mimic3wdb
```python
import datetime

import wfdb

dtstart = datetime.datetime.now()
wfdb.rdsamp('3000003', pn_dir='mimic3wdb/1.0/30/3000003')
print(datetime.datetime.now() - dtstart)
# result: 0:00:21.143365
```
  • mitdb
```python
dtstart = datetime.datetime.now()
wfdb.rdsamp('100', pn_dir='mitdb/1.0.0')
print(datetime.datetime.now() - dtstart)
# result: 0:00:02.764091
```
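For repeated measurements like these, a small helper built on `time.perf_counter` may be convenient: unlike `datetime.datetime.now()`, it is a monotonic clock, so system clock adjustments during a long download cannot skew the result. This is a minimal sketch; `timed` is a hypothetical helper name and is not part of wfdb.

```python
import time
from typing import Any, Callable, Tuple


def timed(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float]:
    """Call fn(*args, **kwargs) and return (result, elapsed seconds)."""
    start = time.perf_counter()  # monotonic, high-resolution clock
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


if __name__ == "__main__":
    # With wfdb installed and network access to PhysioNet, one could write:
    #   import wfdb
    #   (signals, fields), secs = timed(wfdb.rdsamp, "100", pn_dir="mitdb/1.0.0")
    # Offline demonstration with a trivial function:
    total, secs = timed(sum, range(1000))
    print(total, f"{secs:.6f} s")
```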

These timings look reasonable. However, when I tried to read mimic4wdb, which stores its signals in FLAC format, performance dropped significantly.

  • mimic4wdb
```python
dtstart = datetime.datetime.now()
wfdb.rdsamp('81739927', pn_dir='mimic4wdb/0.1.0/waves/p100/p10014354/81739927')
print(datetime.datetime.now() - dtstart)
# result: 0:07:26.220685
```

The problem went away when I cached the files by passing buffering=-2 to the openurl function in _url.py: the same mimic4wdb read then took 0:00:37.388115.

After digging a bit more, I found that the slowdown comes from the _cdata_io function in soundfile.py, which calls the read function repeatedly, frame by frame. Every read call goes through session.request in _url.py, so each one becomes a separate HTTP request. This makes reading very slow and also puts unnecessary load on the web server.
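The cost difference can be illustrated with a self-contained simulation. This is a sketch, not wfdb's or soundfile's code: `RequestCounter.fetch_range` is a hypothetical stand-in for one HTTP range request (the role session.request plays in _url.py), `PerReadFile` models a file object that issues a request on every `read`, and `whole_file` models the buffering=-2 idea of downloading the file once. All names, the payload, and the 100-byte frame size are assumptions chosen for illustration.

```python
import io


class RequestCounter:
    """Hypothetical stand-in for a remote file; counts simulated HTTP requests."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.requests = 0

    def fetch_range(self, start: int, end: int) -> bytes:
        # Each call models one HTTP range request for bytes [start, end).
        self.requests += 1
        return self.data[start:end]


class PerReadFile:
    """File-like object that issues a new request on every read() call."""

    def __init__(self, server: RequestCounter) -> None:
        self.server = server
        self.pos = 0

    def read(self, size: int = -1) -> bytes:
        end = len(self.server.data) if size < 0 else self.pos + size
        chunk = self.server.fetch_range(self.pos, end)
        self.pos += len(chunk)
        return chunk


def whole_file(server: RequestCounter) -> io.BytesIO:
    # Models the buffering=-2 idea: one request for the whole file,
    # then every later read() is served from memory.
    return io.BytesIO(server.fetch_range(0, len(server.data)))


def read_frame_by_frame(f, frame_size: int) -> bytes:
    # Pulls small chunks until EOF, roughly the access pattern described above.
    parts = []
    while True:
        chunk = f.read(frame_size)
        if not chunk:
            break
        parts.append(chunk)
    return b"".join(parts)


if __name__ == "__main__":
    data = b"\x00" * 10_000  # dummy payload standing in for a .flac file

    per_read = RequestCounter(data)
    read_frame_by_frame(PerReadFile(per_read), 100)
    print(per_read.requests)  # 101

    cached = RequestCounter(data)
    read_frame_by_frame(whole_file(cached), 100)
    print(cached.requests)  # 1
```

With 100-byte frames over a 10,000-byte payload, the per-read version makes 101 requests (100 reads plus the final empty read at end of file) versus a single request for the cached version. Once each request carries real network latency, a ratio like this is a plausible explanation for the gap between the timings above.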

So it seems a good idea to make buffering=-2 the default until this is fixed. Reducing the number of requests both improves performance and reduces the load on the web server.
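If downloading whole files is undesirable (for example, for very large records), one possible middle ground is a read-ahead buffer that fetches fixed-size blocks, so many small reads share one request. This is a hypothetical alternative sketched under stated assumptions, not something wfdb is known to provide: `CountingServer`, `BlockReadAhead`, `read_all`, and the 4096-byte block size are illustrative names and values, and the file size is assumed to be known up front (e.g. from a Content-Length header).

```python
class CountingServer:
    """Hypothetical remote file; each fetch_range call models one HTTP request."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.requests = 0

    def fetch_range(self, start: int, end: int) -> bytes:
        self.requests += 1
        return self.data[start:end]


class BlockReadAhead:
    """Serve small reads from a cached block, refilling one block at a time."""

    def __init__(self, server: CountingServer, block_size: int) -> None:
        self.server = server
        self.block_size = block_size
        self.size = len(server.data)  # assumed known in advance
        self.pos = 0                  # logical read position
        self.buf = b""                # currently cached block
        self.buf_start = 0            # file offset of buf[0]

    def read(self, size: int = -1) -> bytes:
        if size < 0:
            size = self.size - self.pos  # read to end of file
        out = []
        while size > 0 and self.pos < self.size:
            offset = self.pos - self.buf_start
            if not 0 <= offset < len(self.buf):
                # Cache miss: fetch the next block with a single request.
                end = min(self.pos + self.block_size, self.size)
                self.buf = self.server.fetch_range(self.pos, end)
                self.buf_start = self.pos
                offset = 0
            chunk = self.buf[offset:offset + size]
            out.append(chunk)
            self.pos += len(chunk)
            size -= len(chunk)
        return b"".join(out)


def read_all(f, frame_size: int) -> bytes:
    # Small sequential reads, as a frame-by-frame decoder would issue.
    parts = []
    while True:
        chunk = f.read(frame_size)
        if not chunk:
            break
        parts.append(chunk)
    return b"".join(parts)


if __name__ == "__main__":
    srv = CountingServer(b"\x00" * 10_000)
    read_all(BlockReadAhead(srv, 4096), 100)
    print(srv.requests)  # 3
```

Compared with buffering=-2, memory use stays bounded by the block size, at the cost of roughly one request per block instead of one per file.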

vitaldb, Sep 29 '22 20:09