Alexander Sibiryakov

124 comments by Alexander Sibiryakov

Anything else needed here, @dpkp and @jeffwidman?

@dpkp we can't. I have seen it (not its subclass) thrown in my application, and the Python source code also suggests that it can be thrown in certain situations. I'm afraid...

Yes, userinfo is the username and password. > Could you please give an example? What's wrong with e.g. dots in hostname? google.com. Obviously these are task-dependent issues, but there is no...
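To make the userinfo point concrete, here is a minimal sketch using Python's standard `urllib.parse`. The URL and credentials are made up for illustration and do not come from the original thread.

```python
from urllib.parse import urlsplit

# Hypothetical URL, used only to show where userinfo sits in the authority part.
parts = urlsplit("https://alice:s3cret@example.com:8080/path?q=1")

# userinfo is everything before the last "@" in the network location.
userinfo = parts.netloc.rpartition("@")[0]

print(userinfo)        # alice:s3cret
print(parts.username)  # alice
print(parts.password)  # s3cret
print(parts.hostname)  # example.com
print(parts.port)      # 8080
```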

Hi @ChiraMircea, can you guys write a test for this? It's extremely hard to reason about your code without tests.

I would +1 this. Is it still unsolved?

Hey @kmike, there is a Yandex internal URL parsing routine: https://github.com/yandex/balancer/tree/master/util/uri It is used and tested in their content system infrastructure. It depends on Ragel (https://en.wikipedia.org/wiki/Ragel) to generate the URL parser and encoder....

@Preetwinder This list can be useful for your test: https://github.com/sibiryakov/general-spider/blob/master/seeds_es_dmoz.txt. The problem with your test is that the code you're running will benefit from the CPU cache much more than a production spider would.

@Preetwinder nice job, really! I really do value your effort! Here are the problems in your testing code:
- use clock(), it's meant for benchmarking, and test it on Linux...

@Preetwinder Awesome! So we have about a 5x difference between the native urlparse and the Chrome URL parser wrapped with Cython? 5 vs 1.1 µs per URL, which isn't bad I...

Hey @Preetwinder here is what I've got using Yandex URL parser: `count 82257, avg 580.3806971 ns, median 507 ns, 90% 682 ns`. It's collected using Chromium 80K test set (chosen...
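For comparison, the same kind of summary (count, average, median, 90th percentile) could be collected for the standard library parser roughly as follows. This is only a sketch under assumptions: it is not the harness used in the thread, the helper name `benchmark_urlparse` and the synthetic URLs are hypothetical, and it uses `time.perf_counter_ns()` (Python 3.7+) as the timer. The numbers it prints depend entirely on the machine and interpreter.

```python
import statistics
import time
from urllib.parse import urlparse


def benchmark_urlparse(urls):
    """Time urlparse() on each URL; return per-call latency stats in ns."""
    samples = []
    for url in urls:
        start = time.perf_counter_ns()
        urlparse(url)
        samples.append(time.perf_counter_ns() - start)

    samples.sort()
    # Nearest-rank style 90th percentile, clamped to the last sample.
    p90_index = min(len(samples) - 1, int(len(samples) * 0.9))
    return {
        "count": len(samples),
        "avg": statistics.mean(samples),
        "median": statistics.median(samples),
        "p90": samples[p90_index],
    }


# Synthetic, distinct URLs (an assumption for illustration). Distinct inputs
# matter because urllib.parse caches results for URLs it has already split.
urls = [f"http://example.com/section/{i}/page.html?id={i}" for i in range(1000)]

stats = benchmark_urlparse(urls)
print(
    "count {count}, avg {avg:.1f} ns, median {median} ns, 90% {p90} ns".format(**stats)
)
```

Timing each call separately includes the timer's own overhead, so for sub-microsecond parsers it may be fairer to time batches and divide by the batch size.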