
[tsv] decrease tsv buffer size for slow input streams

Open SuRaMoN opened this issue 1 year ago • 3 comments

Right now, when a slow stream is piped into vd with the tsv loader, it takes a long time before any data appears. This is caused by https://github.com/saulpw/visidata/commit/08900503ce638461b6d380c4fe0ec7bad13bac6d

Example of slow stream: for p in $(seq 0 100); do echo -e "a\tb"; sleep 1; done | vd -f tsv

It takes about 100 seconds to show anything (that is, only once the stream has finished). So I modified the buffer size so that the buffer is flushed every second. This shows results quickly for slow streams, while fast streams still get the 10% speed increase of https://github.com/saulpw/visidata/commit/08900503ce638461b6d380c4fe0ec7bad13bac6d because the buffer size adapts itself to fast streams.
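As a rough illustration of the idea (this is not the code from the PR), the sketch below groups incoming lines into batches and resizes the batch so that each one takes roughly a second to fill. The name `adaptive_batches` and all of its parameters are hypothetical, and the time check only runs when a line arrives, so it is a simplification.

```python
import time


def adaptive_batches(lines, target_seconds=1.0, min_size=1, max_size=65536,
                     clock=time.monotonic):
    """Yield lists of lines, resizing batches so each takes about target_seconds to fill.

    Hypothetical helper for illustration; not part of visidata.
    """
    size = min_size
    batch = []
    started = clock()
    for line in lines:
        batch.append(line)
        elapsed = clock() - started
        # Flush when the batch is full or when the time budget is used up.
        if len(batch) >= size or elapsed >= target_seconds:
            yield batch
            if elapsed < target_seconds / 2:
                # Fast stream: the batch filled well within the budget, so grow it.
                size = min(size * 2, max_size)
            elif elapsed >= target_seconds:
                # Slow stream: shrink to the number of lines that actually arrived.
                size = max(len(batch), min_size)
            batch = []
            started = clock()
    if batch:
        # Flush whatever is left when the stream ends.
        yield batch
```

Fed from `sys.stdin`, a stream that produces one line per second would yield one-line batches right away, while a fast stream would reach large batches after a few doublings.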

Sep 02 '22 18:09 SuRaMoN

It's possible, but it will probably be more of a hassle.

  • os.read returns bytes, so the result will have to be converted with iterdecode
  • os.read completely ignores all the magic done in path.py:Path (I'm thinking about reading compressed files)
  • I'm afraid it won't work on Windows for e.g. stdin https://stackoverflow.com/questions/323829/how-to-find-out-if-there-is-data-to-be-read-from-stdin-on-windows-in-python

I'll try some stuff out and get back to you, but I don't think it will be an elegant solution.
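To make the first point concrete, here is a minimal sketch of what reading with os.read and decoding with codecs.iterdecode could look like. The helper names `read_chunks` and `decoded_chunks` are hypothetical, the encoding is assumed to be UTF-8, and a pipe stands in for the input stream; this is not how visidata reads files.

```python
import codecs
import os


def read_chunks(fd, bufsize=65536):
    """Yield raw byte chunks from a file descriptor as soon as they are available."""
    while True:
        chunk = os.read(fd, bufsize)  # returns at most bufsize bytes, possibly fewer
        if not chunk:                 # empty bytes means end of file
            return
        yield chunk


def decoded_chunks(fd, encoding="utf-8", bufsize=65536):
    """Decode incrementally, so a multi-byte character split across two reads survives."""
    return codecs.iterdecode(read_chunks(fd, bufsize), encoding)


# Demonstration on a pipe, which stands in for a slow input stream.
r, w = os.pipe()
os.write(w, "a\tb\n".encode("utf-8"))
os.close(w)
text = "".join(decoded_chunks(r))
os.close(r)
print(repr(text))
```

The decoding step is manageable; the other two points (bypassing path.py:Path and Windows support) are not addressed by this sketch.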

Sep 06 '22 12:09 SuRaMoN

It's as expected:

import gzip
o = open("t.tsv.gz", "rb")
g = gzip.open(o)
print(o.fileno(), g.fileno())

gives:

3 3

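So I can't use os.read for compressed files: both objects share the same file descriptor, so os.read on it would return the raw compressed bytes rather than the decompressed data. I don't see any feasible solution with os.read.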
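A self-contained way to see the problem, assuming a small gzip file created in a temporary directory (the file name and contents are only for illustration): reading through the shared descriptor with os.read yields the gzip header, not the TSV data.

```python
import gzip
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "t.tsv.gz")
    with gzip.open(path, "wb") as f:
        f.write(b"a\tb\n")

    with open(path, "rb") as raw, gzip.open(raw) as g:
        # The gzip wrapper reports the descriptor of the underlying file.
        same_fd = raw.fileno() == g.fileno()
        # Read straight from the descriptor, bypassing the gzip layer.
        os.lseek(raw.fileno(), 0, os.SEEK_SET)
        from_os_read = os.read(raw.fileno(), 2)

    with gzip.open(path, "rb") as g:
        # Reading through the gzip layer returns the decompressed data.
        from_gzip = g.read()

print(same_fd)       # True
print(from_os_read)  # b'\x1f\x8b', the gzip magic number
print(from_gzip)     # b'a\tb\n'
```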

Sep 08 '22 19:09 SuRaMoN

I fixed the tests and added documentation.

For performance tests I did the following. Does it suffice, or do you want something integrated into the automated tests?

$ git checkout tsv

$ cat open-and-quit.vdj 
#!vd -p
{"longname": "open-file", "input": "t2.tsv", "keystrokes": "o"}
{"sheet": "t2", "col": "", "row": "", "longname": "quit-all", "input": "", "keystrokes": "", "comment": ""}

$ >t.tsv; >t2.tsv; for P in $(seq 0 10000); do echo -e "a\tb\tc" >> t.tsv; done; for P in $(seq 0 2000); do cat t.tsv >> t2.tsv; done; du -hs t2.tsv
115M	t2.tsv

$ time vd --play open-and-quit.vdj t2.tsv 

real	3m15.688s
user	3m12.871s
sys	0m4.354s

$ git checkout stable

$ time vd --play open-and-quit.vdj t2.tsv 

real	3m17.077s
user	3m14.395s
sys	0m4.155s



Sep 11 '22 11:09 SuRaMoN