Improve `ada::can_parse` performance
It should be possible to speed up `ada::can_parse`.
I would like to try this
@anonrig Do you have anything specific in mind, or should I just investigate and try to find something?
Ideally, we should avoid any string allocations whenever we call `can_parse`.
I looked briefly and I am not sure what can be done directly in `ada::can_parse`. Maybe:
- removing allocations in `parse_url` itself and in `url_aggregator`?
- changing the `parse` signature to take views directly, and changing `url_aggregator` in the process?
- parsing the base and the rest together? That should still fail early if the base is not OK, and it would avoid allocating a result for the base only to allocate again into the aggregator.
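To make the "no string allocations" goal measurable, here is a minimal, self-contained sketch that counts heap allocations around a call. It does not use ada at all: `has_valid_scheme_view` and `has_valid_scheme_copy` are hypothetical toy checks (scheme syntax only), and `allocations_during` is a hypothetical helper. The same counting trick could be wrapped around `ada::can_parse` in a test build.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>
#include <string>
#include <string_view>

// Global counter bumped by every call to the replaced operator new.
static std::size_t g_allocations = 0;

void* operator new(std::size_t size) {
  ++g_allocations;
  if (void* p = std::malloc(size == 0 ? 1 : size)) return p;
  throw std::bad_alloc{};
}
void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }

// Scheme characters: an ASCII letter first, then letters, digits, '+', '-', '.'.
inline bool is_scheme_char(char c, bool first) {
  bool alpha = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
  if (first) return alpha;
  return alpha || (c >= '0' && c <= '9') || c == '+' || c == '-' || c == '.';
}

// Hypothetical toy check that works directly on the view: no copies.
inline bool has_valid_scheme_view(std::string_view input) {
  std::size_t colon = input.find(':');
  if (colon == std::string_view::npos || colon == 0) return false;
  for (std::size_t i = 0; i < colon; ++i) {
    if (!is_scheme_char(input[i], i == 0)) return false;
  }
  return true;
}

// Same check, but it copies the input first, the way a parser that
// builds its result would. Long inputs force a heap allocation.
inline bool has_valid_scheme_copy(std::string_view input) {
  std::string owned(input);
  return has_valid_scheme_view(owned);
}

// Hypothetical helper: number of operator new calls made while f runs.
template <typename F>
std::size_t allocations_during(F&& f) {
  std::size_t before = g_allocations;
  f();
  return g_allocations - before;
}
```

Calling `allocations_during` on each variant with a URL longer than the small-string buffer should report zero allocations for the view version and at least one for the copying version, which is the property we would want a validation-only path to have.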
I am giving up on this; someone else might take over.
What do you guys use for profiling? I don't have much experience with C++, but I'd like to give it a shot here.
@CarlosEduR What do you run in your computer? Windows, macOS, Linux?
@lemire I currently have a dual-boot so I run Linux and Windows, but I usually go with Linux for coding.
@CarlosEduR Ok. So the first step is 'just' to add can_parse to our benchmarks:
https://github.com/ada-url/ada/blob/ccb7a2646f53a2aa9c65f3ec644f856d57d2b341/benchmarks/benchmark_template.cpp#L21
Basically copying this code, and replacing 'parse' by 'can_parse' ought to do.
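For anyone who wants to try the idea outside the repository first, here is a rough stand-in for the benchmark loop using only `std::chrono`. The real benchmark linked above uses Google Benchmark with performance counters, so this is just a sketch of its shape: `bench_validator` and `BenchResult` are hypothetical names, and the validator is any callable taking a `std::string_view` and returning `bool`.

```cpp
#include <chrono>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

struct BenchResult {
  std::size_t accepted;  // inputs accepted in the last round
  double ns_per_url;     // average wall-clock nanoseconds per call
};

// Runs `validate` over every URL for `rounds` rounds and reports the
// average time per call. Counting accepted inputs keeps the result of
// each call observable, so the loop cannot be optimized away entirely.
template <typename Validator>
BenchResult bench_validator(const std::vector<std::string>& urls,
                            Validator&& validate,
                            std::size_t rounds = 10) {
  std::size_t accepted = 0;
  auto start = std::chrono::steady_clock::now();
  for (std::size_t r = 0; r < rounds; ++r) {
    accepted = 0;
    for (const std::string& u : urls) {
      if (validate(std::string_view(u))) ++accepted;
    }
  }
  auto stop = std::chrono::steady_clock::now();
  double total_ns =
      std::chrono::duration<double, std::nano>(stop - start).count();
  double calls = static_cast<double>(rounds) * static_cast<double>(urls.size());
  return {accepted, calls > 0 ? total_ns / calls : 0.0};
}
```

With ada available, the callable would presumably be something like `[](std::string_view u) { return ada::can_parse(u); }`, compared against one that calls `ada::parse<ada::url_aggregator>(u).has_value()`.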
This would be an excellent start.
@lemire awesome, I'll do it right now.
@lemire can_parse benchmark added:
bytes/URL: 86.859205
curl : OMITTED
input bytes: 8688092
number of URLs: 100025
performance counters: Enabled
rust version : 1.76.0
zuri : OMITTED
--------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations UserCounters...
--------------------------------------------------------------------------------------------
BasicBench_AdaURL_href 30622042 ns 30620308 ns 23 GHz=4.24318 cycle/byte=14.8028 cycles/url=1.28576k instructions/byte=34.4159 instructions/cycle=2.32496 instructions/ns=9.86522 instructions/url=2.98934k ns/url=303.018 speed=283.736M/s time/byte=3.5244ns time/url=306.127ns url/s=3.26662M/s
BasicBench_AdaURL_aggregator_href 19617809 ns 19616697 ns 36 GHz=4.18634 cycle/byte=9.36962 cycles/url=813.838 instructions/byte=23.009 instructions/cycle=2.45571 instructions/ns=10.2804 instructions/url=1.99855k ns/url=194.403 speed=442.893M/s time/byte=2.25788ns time/url=196.118ns url/s=5.09897M/s
BasicBench_AdaURL_CanParse 19694538 ns 19692588 ns 36 GHz=4.27876 cycle/byte=9.55048 cycles/url=829.548 instructions/byte=23.5847 instructions/cycle=2.46948 instructions/ns=10.5663 instructions/url=2.04855k ns/url=193.876 speed=441.186M/s time/byte=2.26662ns time/url=196.877ns url/s=5.07932M/s
BasicBench_whatwg 53192626 ns 53182009 ns 13 GHz=4.24331 cycle/byte=25.8289 cycles/url=2.24348k instructions/byte=68.8033 instructions/cycle=2.66381 instructions/ns=11.3034 instructions/url=5.9762k ns/url=528.709 speed=163.365M/s time/byte=6.12125ns time/url=531.687ns url/s=1.88081M/s
BasicBench_ServoUrl 84392206 ns 84384304 ns 8 GHz=4.24188 cycle/byte=40.2331 cycles/url=3.49461k instructions/byte=108.216 instructions/cycle=2.68973 instructions/ns=11.4095 instructions/url=9.39958k ns/url=823.836 speed=102.959M/s time/byte=9.71264ns time/url=843.632ns url/s=1.18535M/s
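A note on reading these numbers: as far as I can tell, `time/url` and `speed` follow from the CPU time, the number of URLs, and the input bytes printed at the top. The small sketch below reproduces the `BasicBench_AdaURL_CanParse` row under that assumption (`derive` and `Derived` are hypothetical names). It also shows that `can_parse` currently takes about 0.4% more CPU time than `aggregator_href` (19692588 ns vs 19616697 ns), i.e. validation costs essentially as much as a full parse.

```cpp
// Derived per-URL and throughput metrics, assuming they are computed
// from CPU time, URL count, and total input bytes.
struct Derived {
  double ns_per_url;        // CPU nanoseconds per URL
  double bytes_per_second;  // input bytes processed per second
};

inline Derived derive(double cpu_ns, double url_count, double input_bytes) {
  return {cpu_ns / url_count, input_bytes / (cpu_ns * 1e-9)};
}

// Ratio of two CPU times, e.g. can_parse relative to aggregator_href.
inline double relative_cost(double cpu_ns_a, double cpu_ns_b) {
  return cpu_ns_a / cpu_ns_b;
}
```

Calling `derive(19692588, 100025, 8688092)` gives roughly 196.88 ns per URL and 441.19 MB/s, matching the reported `time/url=196.877ns` and `speed=441.186M/s`.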
@CarlosEduR Would you issue a PR? This is fantastic!!!
Once we have good metrics, we can more easily test out ideas. It seems that there are obvious things that we can drop... e.g., sections of code that always succeed and do not impact the rest of the processing. We should be able to go step-by-step.
@lemire sure, I'll open a PR soon! And it sounds interesting, I'd love to help more on that too.
@lemire @anonrig I'd like to keep the discussion going.
Yes. Absolutely.
Right now I don't know the easiest way to go forward.