libconfini icon indicating copy to clipboard operation
libconfini copied to clipboard

Re: TOML performance

Open arp242 opened this issue 2 years ago • 2 comments

I'm not sure where else to report this, but re: https://github.com/madmurphy/libconfini/wiki/An-INI-critique-of-TOML#18-performance

This is really a tomlc99 issue, not a TOML issue. The problem is that it allocates memory in 1,000 byte chucks, which takes quite a while (see start of toml_parse_file()), and after this it spends most of its time in check_key(), which loops over an array of keys, and is called every time it parses a new key, so ... yeah,

But these are not issues with TOML the file format. Other parsers are much faster; for example:

% ls -lh ~/50M.toml
-rw-r--r-- 1 martin martin 49M Sep 30 20:23 /home/martin/50M.toml

With C++:

% time ./tomlv++ ~/50M.toml
./tomlv++ ~/50M.toml  2.19s user 0.15s system 99% cpu 2.348 total

Go:

% time tomlv-go ~/50M.toml
tomlv-go ~/50M.toml  6.31s user 0.51s system 154% cpu 4.421 total

And Python:

% time ./tomlv.py ~/50M.toml
./tomlv.py ~/50M.toml  11.95s user 0.42s system 99% cpu 12.413 total

None are spectacularly fast as such, but orders of magnitudes faster than tomlc99, which is the outlier here. I also don't know how this compares to the 50M test-file you used (the TOML file lacks comments and is more "dense" in syntax than the big_file.ini that your script generates, so it's probably not a fair comparison).

Most of the other points in that article are more subjective, but I felt this one was clearly wrong.

arp242 avatar Oct 01 '23 02:10 arp242

@arp242

Thank you for pointing that out. I can write a paragraph about it in the next days (but I am not sure I will have time before mid October).

I also don't know how this compares to the 50M test-file you used (the TOML file lacks comments and is more "dense" in syntax than the big_file.ini that your script generates, so it's probably not a fair comparison).

If I don't remember wrong, I used the INI file generated by dev/tests/performance/prepare.sh (you will need to clone the repository before launching that script). Maybe you could re-post the timings you get using this file?

--madmurphy

madmurphy avatar Oct 01 '23 02:10 madmurphy

I can write a paragraph about it in the next days (but I am not sure I will have time before mid October).

No hurries; I came across this almost 2 weeks ago. I also hope to fix the worst of the issue in tomlc99 when I have the time (not my code, so not entirely sure how hard it is).

Maybe you could re-post the timings you get using this file?

That file doesn't produce valid TOML. A few weeks ago I looked in to it and manually fixed some obvious things, but then you run in to:

% tomlv ./big_file.ini
Error in './big_file.ini': toml: line 35: Key 'section' has already been defined.

So this needs more work.

I wrote a little script to frob with TOML table and key names and generated my 50M TOML file with that, but you need a "real" parser for this (can't just s/../../ it), and this won't output comments, so hard to really reproduce that big_file.ini in a way that's valid TOML at the moment.

tomlc99 also can't parse it, which is why you got the error. I wouldn't be surprised if all of those 13 minutes you spent waiting was just for the allocation, and it didn't actually parse anything.

arp242 avatar Oct 01 '23 02:10 arp242