Add Parlot to JsonBench
Parlot is a new parser combinator library by @sebastienros. I added it to JsonBench for reference, bringing the JSON parser over from Parlot's repository.
```
BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19042
AMD Ryzen 7 2700X, 1 CPU, 16 logical and 8 physical cores
.NET Core SDK=5.0.102
  [Host]     : .NET Core 5.0.2 (CoreCLR 5.0.220.61120, CoreFX 5.0.220.61120), X64 RyuJIT
  DefaultJob : .NET Core 5.0.2 (CoreCLR 5.0.220.61120, CoreFX 5.0.220.61120), X64 RyuJIT
```
| Method | Mean | Error | StdDev | Ratio | RatioSD | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|
| BigJson_Pidgin | 430.3 μs | 1.12 μs | 1.05 μs | 1.00 | 0.00 | 24.9023 | 3.4180 | - | 101.7 KB |
| BigJson_Sprache | 3,438.3 μs | 15.24 μs | 12.72 μs | 7.99 | 0.04 | 1308.5938 | 50.7813 | - | 5349.63 KB |
| BigJson_Superpower | 1,793.6 μs | 7.88 μs | 7.37 μs | 4.17 | 0.02 | 222.6563 | 1.9531 | - | 913.43 KB |
| BigJson_FParsec | 461.9 μs | 2.86 μs | 2.54 μs | 1.07 | 0.01 | 83.9844 | 0.9766 | - | 344.68 KB |
| BigJson_Parlot | 256.1 μs | 0.43 μs | 0.38 μs | 0.60 | 0.00 | 24.9023 | 2.9297 | - | 101.8 KB |
| LongJson_Pidgin | 383.8 μs | 1.09 μs | 0.97 μs | 1.00 | 0.00 | 25.3906 | 2.9297 | - | 104.25 KB |
| LongJson_Sprache | 2,812.0 μs | 7.66 μs | 7.17 μs | 7.32 | 0.03 | 1054.6875 | 11.7188 | - | 4311.36 KB |
| LongJson_Superpower | 1,458.4 μs | 11.98 μs | 10.62 μs | 3.80 | 0.03 | 171.8750 | 3.9063 | - | 706.79 KB |
| LongJson_FParsec | 420.2 μs | 2.58 μs | 2.41 μs | 1.09 | 0.01 | 94.2383 | 1.4648 | - | 386.3 KB |
| LongJson_Parlot | 213.5 μs | 0.82 μs | 0.73 μs | 0.56 | 0.00 | 25.3906 | 0.7324 | - | 104.35 KB |
| DeepJson_Pidgin | 499.2 μs | 1.32 μs | 1.23 μs | 1.00 | 0.00 | 45.8984 | 0.9766 | - | 187.79 KB |
| DeepJson_Sprache | 2,947.6 μs | 8.96 μs | 7.48 μs | 5.91 | 0.02 | 554.6875 | 222.6563 | - | 2946.56 KB |
| DeepJson_FParsec | 473.1 μs | 1.24 μs | 1.03 μs | 0.95 | 0.00 | 84.4727 | 0.9766 | - | 346.43 KB |
| DeepJson_Parlot | 171.5 μs | 1.05 μs | 0.93 μs | 0.34 | 0.00 | 20.0195 | - | - | 82.34 KB |
| WideJson_Pidgin | 231.7 μs | 0.67 μs | 0.56 μs | 1.00 | 0.00 | 11.7188 | 0.2441 | - | 48.42 KB |
| WideJson_Sprache | 1,631.0 μs | 5.51 μs | 4.30 μs | 7.04 | 0.02 | 683.5938 | 11.7188 | - | 2797.28 KB |
| WideJson_Superpower | 899.7 μs | 0.44 μs | 0.41 μs | 3.88 | 0.01 | 112.3047 | 1.9531 | - | 459.74 KB |
| WideJson_FParsec | 190.4 μs | 1.91 μs | 1.69 μs | 0.82 | 0.01 | 31.4941 | 3.9063 | - | 129.02 KB |
| WideJson_Parlot | 155.9 μs | 0.33 μs | 0.30 μs | 0.67 | 0.00 | 11.7188 | 0.4883 | - | 48.52 KB |
Interesting. Looks like I can no longer claim to be the fastest in C#! 😉 I'm curious where Parlot gets its speed from. Is it purely down to the fact that Parlot does less thorough error reporting?
I have no clue where the difference could be, but it's easier to make something faster when you have a baseline. If you want to use this as an opportunity, I'd suggest checking why Pidgin allocates so much more in the DeepJson scenario; that's where the difference is biggest.
I am not sure what you mean by thorough error reporting; maybe I am not aware of a specific feature in Pidgin. In Parlot, errors are reported explicitly with a custom parser construct: if that parser is reached (or the previous one fails), the error is reported. The only limitation I am aware of right now is that there is a single error message, so I need to improve it to continue parsing and report more errors when possible.
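To make the idea concrete, here's a toy sketch of an explicit-error combinator. The names (`Parser`, `ElseError`) are illustrative, not Parlot's actual API: the point is that the error message is attached at one spot by the grammar author, rather than derived by tracking every expected token.

```csharp
// Toy sketch of an "explicit error parser": when the wrapped parser fails,
// record a single, author-supplied message. Names are hypothetical.
using System;

delegate bool Parser(string input, ref int pos, ref string? error);

static class Combinators
{
    public static Parser Char(char c) => (string s, ref int pos, ref string? err) =>
    {
        if (pos < s.Length && s[pos] == c) { pos++; return true; }
        return false;
    };

    // If the inner parser fails, report this message explicitly; no
    // expected-set bookkeeping happens on the success path.
    public static Parser ElseError(this Parser p, string message) =>
        (string s, ref int pos, ref string? err) =>
        {
            if (p(s, ref pos, ref err)) return true;
            err ??= $"{message} at position {pos}";
            return false;
        };
}

class Demo
{
    static void Main()
    {
        var closeBrace = Combinators.Char('}').ElseError("Expected '}'");
        int pos = 0;
        string? error = null;
        closeBrace("x", ref pos, ref error);
        Console.WriteLine(error); // Expected '}' at position 0
    }
}
```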
For perf, we paid attention to using ref structs, not creating results when unnecessary, removing interface dispatch, and keeping most things strongly typed. I think I saw a few boxing code paths in Pidgin at some point; that could be a difference. I had a hard time removing such code paths while maintaining a consistent API.
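For readers unfamiliar with the boxing/dispatch point: here's a hypothetical sketch (not Parlot's or Pidgin's actual code) of the general technique. Storing a struct parser behind an interface boxes it and makes every call a virtual dispatch; constraining a generic to the struct type lets the JIT specialize and devirtualize the call.

```csharp
// Illustrative only: interface dispatch + boxing vs. a specialized generic.
using System;

interface IParser
{
    bool TryParse(string input, ref int pos);
}

struct DigitParser : IParser
{
    public bool TryParse(string input, ref int pos)
    {
        if (pos < input.Length && char.IsDigit(input[pos])) { pos++; return true; }
        return false;
    }
}

static class Runner
{
    // Boxes the struct on the way in and dispatches through the interface.
    public static bool RunBoxed(IParser p, string input)
    {
        int pos = 0;
        return p.TryParse(input, ref pos);
    }

    // The JIT emits a specialized body per TParser: no box, direct (often
    // inlined) call instead of interface dispatch.
    public static bool RunSpecialized<TParser>(TParser p, string input)
        where TParser : struct, IParser
    {
        int pos = 0;
        return p.TryParse(input, ref pos);
    }
}

class Demo
{
    static void Main()
    {
        Console.WriteLine(Runner.RunBoxed(new DigitParser(), "7"));       // True
        Console.WriteLine(Runner.RunSpecialized(new DigitParser(), "x")); // False
    }
}
```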
Maybe the main thing is that @lahma seems to like making my dumb code faster ;) He knows all the tricks to gain a few ns here and there.
Re error reporting: Pidgin does quite a lot of work to keep track of what the parser was expecting to encounter, including across branches, so that I can give error messages like `Expected "class" or "struct"`.
There's also a certain amount of overhead associated with supporting different types of input (that is, not always parsing from a string). That's one of the reasons I have a separate function to enable backtracking (`Try`): I can't guarantee the data is in memory otherwise.
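A minimal sketch of the point about `Try`, using Pidgin's public API (assuming the Pidgin NuGet package): `Try` snapshots the input state so `Or` can rewind after a partial match, which is exactly what forces buffering when the input isn't already in memory.

```csharp
// Requires the Pidgin NuGet package.
using System;
using Pidgin;
using static Pidgin.Parser;

class Demo
{
    static void Main()
    {
        // On input "foo", String("food") matches 'f','o','o' and then fails,
        // having already consumed input. Try rewinds that consumption so Or
        // can attempt the next alternative.
        var parser = Try(String("food")).Or(String("foo"));
        Console.WriteLine(parser.ParseOrThrow("foo")); // foo
    }
}
```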
Beyond that, there might be some overhead in the implementations of individual parsers, rather than across-the-board costs (perhaps the loops themselves are not optimised). That seems quite tractable, if I can diagnose the worst performers!
If I were you, I'd keep this PR around and use it as a baseline for making Pidgin faster, if you're willing to and have the time for that.