ada icon indicating copy to clipboard operation
ada copied to clipboard

Improve `ada::can_parse` performance

Open anonrig opened this issue 2 years ago • 15 comments

It is possible to speed-up ada::can_parse performance.

anonrig avatar May 08 '23 15:05 anonrig

I would like to try this

darowny avatar Sep 13 '23 14:09 darowny

@anonrig do you have anything specific in mind ? or should I just investigate and try to find something ?

darowny avatar Sep 13 '23 16:09 darowny

ideally, we should try to avoid any string allocations whenever we are calling can_parse

anonrig avatar Sep 13 '23 16:09 anonrig

I looked briefly and I am not sure what can be done directly in ada:can_parse Maybe:

  1. removing allocations in the parse_url itself and url_aggregator?
  2. changing parse signature to maybe take directly some views and changing url_aggregator in the process ?
  3. maybe parsing base + rest together ? it should fail early anyway if base is not ok and we don't have to allocate result for base before we parse together anyway and have to allocate again into aggregator

darowny avatar Sep 13 '23 20:09 darowny

I give up on this someone else might take over

darowny avatar Nov 25 '23 19:11 darowny

what do you guys use for profiling? I don't have much experience with C++, but I'd like to give a shot here

CarlosEduR avatar May 10 '24 16:05 CarlosEduR

@CarlosEduR What do you run in your computer? Windows, macOS, Linux?

lemire avatar May 10 '24 16:05 lemire

@lemire I currently have a dual-boot so I run Linux and Windows, but I usually go with Linux for coding.

CarlosEduR avatar May 10 '24 17:05 CarlosEduR

@CarlosEduR Ok. So the first step is 'just' to add can_parse to our benchmarks:

https://github.com/ada-url/ada/blob/ccb7a2646f53a2aa9c65f3ec644f856d57d2b341/benchmarks/benchmark_template.cpp#L21

Basically copying this code, and replacing 'parse' by 'can_parse' ought to do.

This would be an excellent start.

lemire avatar May 10 '24 18:05 lemire

@lemire awesome, I'll do it right now.

CarlosEduR avatar May 10 '24 19:05 CarlosEduR

@lemire can_parse benchmark added:


bytes/URL: 86.859205
curl : OMITTED
input bytes: 8688092
number of URLs: 100025
performance counters: Enabled
rust version : 1.76.0
zuri : OMITTED
--------------------------------------------------------------------------------------------
Benchmark                                  Time             CPU   Iterations UserCounters...
--------------------------------------------------------------------------------------------
BasicBench_AdaURL_href              30622042 ns     30620308 ns           23 GHz=4.24318 cycle/byte=14.8028 cycles/url=1.28576k instructions/byte=34.4159 instructions/cycle=2.32496 instructions/ns=9.86522 instructions/url=2.98934k ns/url=303.018 speed=283.736M/s time/byte=3.5244ns time/url=306.127ns url/s=3.26662M/s
BasicBench_AdaURL_aggregator_href   19617809 ns     19616697 ns           36 GHz=4.18634 cycle/byte=9.36962 cycles/url=813.838 instructions/byte=23.009 instructions/cycle=2.45571 instructions/ns=10.2804 instructions/url=1.99855k ns/url=194.403 speed=442.893M/s time/byte=2.25788ns time/url=196.118ns url/s=5.09897M/s
BasicBench_AdaURL_CanParse          19694538 ns     19692588 ns           36 GHz=4.27876 cycle/byte=9.55048 cycles/url=829.548 instructions/byte=23.5847 instructions/cycle=2.46948 instructions/ns=10.5663 instructions/url=2.04855k ns/url=193.876 speed=441.186M/s time/byte=2.26662ns time/url=196.877ns url/s=5.07932M/s
BasicBench_whatwg                   53192626 ns     53182009 ns           13 GHz=4.24331 cycle/byte=25.8289 cycles/url=2.24348k instructions/byte=68.8033 instructions/cycle=2.66381 instructions/ns=11.3034 instructions/url=5.9762k ns/url=528.709 speed=163.365M/s time/byte=6.12125ns time/url=531.687ns url/s=1.88081M/s
BasicBench_ServoUrl                 84392206 ns     84384304 ns            8 GHz=4.24188 cycle/byte=40.2331 cycles/url=3.49461k instructions/byte=108.216 instructions/cycle=2.68973 instructions/ns=11.4095 instructions/url=9.39958k ns/url=823.836 speed=102.959M/s time/byte=9.71264ns time/url=843.632ns url/s=1.18535M/s

CarlosEduR avatar May 10 '24 22:05 CarlosEduR

@CarlosEduR Would you issue a PR? This is fantastic!!!

Once we have good metrics, we can more easily test out ideas. It seems that there are obvious things that we can drop... e.g., sections of code that always succeed and do not impact the rest of the processing. We should be able to go step-by-step.

lemire avatar May 12 '24 01:05 lemire

@lemire sure, I'll open a PR soon! And it sounds interesting, I'd love to help more on that too.

CarlosEduR avatar May 12 '24 01:05 CarlosEduR

@lemire @anonrig I'd like to keep the discussion

CarlosEduR avatar May 14 '24 23:05 CarlosEduR

Yes. Absolutely.

Right now I don't know the easiest way to go forward.

lemire avatar May 14 '24 23:05 lemire