Mikhail Korobov

479 comments by Mikhail Korobov

I agree that using stdlib re would be better. But if, as you said, "the giant regex will be cut into 1/3 of the original size", stdlib re probably won't...

With re2 it doesn't matter how many rules are combined in a regex. I like re2 because it feels right - a DFA is the right algorithm / data structure for...
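
Combining many rules into one regex can be sketched with the stdlib `re` module. Each rule becomes a named alternative, and `Match.lastgroup` reports which one fired. This is a minimal sketch: the rule names and patterns are hypothetical, and it assumes rule patterns contain no named groups of their own. Stdlib `re` is a backtracking engine that tries the alternatives one after another at each position, which is roughly why the number of combined rules matters more there than with re2.

```python
from __future__ import annotations

import re
from typing import Mapping, Optional


def combine_rules(rules: Mapping[str, str]) -> re.Pattern:
    """Join rule patterns into one alternation, one named group per rule.

    Rule names must be valid Python identifiers (hypothetical rule format).
    """
    parts = [f"(?P<{name}>{pattern})" for name, pattern in rules.items()]
    return re.compile("|".join(parts))


def first_matching_rule(combined: re.Pattern, text: str) -> Optional[str]:
    """Return the name of the rule that matched first in `text`, or None."""
    m = combined.search(text)
    # The outer named group closes last, so lastgroup names the rule.
    return m.lastgroup if m else None


rules = {
    "ads": r"/ads?/",
    "tracker": r"track(?:ing)?\.js",
}
combined = combine_rules(rules)
print(first_matching_rule(combined, "https://example.com/ads/banner.png"))      # ads
print(first_matching_rule(combined, "https://example.com/static/tracking.js"))  # tracker
print(first_matching_rule(combined, "https://example.com/index.html"))          # None
```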

Hey @HanXHX, there are no plans to work on that soon; a PR is welcome! It seems @shawa has a fork which has some HTML rules support; I haven't checked...

A shameless plug: https://github.com/TeamHG-Memex/scrapy-crawl-once is a similar package, but the storage decision is not based on items - an explicit meta key is used (users can still set it based on...

FTR: I've recently created a middleware similar to deltafetch but more explicit: https://github.com/TeamHG-Memex/scrapy-crawl-once. It does a similar thing, but in a less automatic way - the user needs to...
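
The difference described in these comments (an explicit per-request flag instead of deciding from scraped items) can be sketched without Scrapy. This is a hypothetical, framework-free illustration, not scrapy-crawl-once's API: the `CrawlOnceFilter` class, the `crawl_once` meta key name, and the plain-dict `meta` argument are all assumptions. Only requests that opt in via the meta key are remembered and skipped later; everything else always passes through.

```python
from typing import Any, Dict, Set


class CrawlOnceFilter:
    """Hypothetical sketch of explicit, opt-in request deduplication."""

    def __init__(self, meta_key: str = "crawl_once") -> None:
        self.meta_key = meta_key       # assumed name of the opt-in flag
        self.seen: Set[str] = set()    # in-memory only; a real tool would persist it

    def should_skip(self, url: str, meta: Dict[str, Any]) -> bool:
        """Skip only opted-in requests whose URL was already crawled."""
        return bool(meta.get(self.meta_key)) and url in self.seen

    def mark_crawled(self, url: str, meta: Dict[str, Any]) -> None:
        """Remember an opted-in URL once its response has been handled."""
        if meta.get(self.meta_key):
            self.seen.add(url)


f = CrawlOnceFilter()
opt_in = {"crawl_once": True}
url = "https://example.com/page/1"
print(f.should_skip(url, opt_in))  # False: not crawled yet
f.mark_crawled(url, opt_in)
print(f.should_skip(url, opt_in))  # True: already crawled
print(f.should_skip(url, {}))      # False: request did not opt in
```

Unlike an item-based approach, nothing here looks at what the callback produced; the request itself carries the decision.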

Hey Ian, the wrapper uses a dict-like interface for that: `del trie['key']` should work. You're right that it is better to document that; it is indeed not documented. I'm glad to...

You're right, I should add a note about `del` to README. Hmm, sorting all data alphabetically before the insertion should make building as fast as possible. I'm not sure why...
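
With datrie itself, keys are removed dict-style with `del trie['key']`. The toy below is a hypothetical pure-Python stand-in (a dict-of-dicts trie, not datrie's double-array implementation) that shows the same dict-like operations and builds the trie from sorted keys. In this toy, sorting only makes iteration alphabetical; the build-speed benefit mentioned above applies to datrie, not to this sketch.

```python
from typing import Any, Dict, List


class ToyTrie:
    """Hypothetical dict-of-dicts trie with a dict-like interface (not datrie)."""

    _END = object()  # sentinel key marking "a stored key ends at this node"

    def __init__(self) -> None:
        self.root: Dict[Any, Any] = {}
        self.size = 0

    def _walk(self, key: str):
        """Return the node reached by following `key`, or None."""
        node = self.root
        for ch in key:
            node = node.get(ch)
            if node is None:
                return None
        return node

    def __setitem__(self, key: str, value: Any) -> None:
        node = self.root
        for ch in key:
            node = node.setdefault(ch, {})
        if self._END not in node:
            self.size += 1
        node[self._END] = value

    def __getitem__(self, key: str) -> Any:
        node = self._walk(key)
        if node is None or self._END not in node:
            raise KeyError(key)
        return node[self._END]

    def __contains__(self, key: object) -> bool:
        if not isinstance(key, str):
            return False
        node = self._walk(key)
        return node is not None and self._END in node

    def __len__(self) -> int:
        return self.size

    def __delitem__(self, key: str) -> None:
        path = []  # (parent, char) pairs, kept so empty branches can be pruned
        node = self.root
        for ch in key:
            child = node.get(ch)
            if child is None:
                raise KeyError(key)
            path.append((node, ch))
            node = child
        if self._END not in node:
            raise KeyError(key)
        del node[self._END]
        self.size -= 1
        # Remove nodes that no longer hold a value or any children.
        for parent, ch in reversed(path):
            if parent[ch]:
                break
            del parent[ch]

    def keys(self, prefix: str = "") -> List[str]:
        """Return stored keys starting with `prefix`, in child insertion order."""
        found: List[str] = []

        def collect(p: str, n: Dict[Any, Any]) -> None:
            if self._END in n:
                found.append(p)
            for ch, child in n.items():
                if ch is not self._END:
                    collect(p + ch, child)

        start = self._walk(prefix)
        if start is not None:
            collect(prefix, start)
        return found


trie = ToyTrie()
# Insert in sorted order; in this toy it only makes iteration alphabetical.
for word in sorted(["prize", "pool", "prepare", "preview"]):
    trie[word] = len(word)
del trie["prize"]  # same syntax as `del trie['key']` on a datrie.Trie
print(trie.keys("pr"))             # ['prepare', 'preview']
print("prize" in trie, len(trie))  # False 3
```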

I think the issue with datrie is not with memory allocation, but with copying of data, so preallocation won't help.

That's interesting, thanks for sharing. Unlike other implementations, libdatrie stores key "tails" in a separate data structure. So shared prefixes are stored in Double-Array Trie, but suffixes that are not...
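
The prefix/tail split can be illustrated on a sorted key list: each key's branching part is the shortest prefix that no other key shares, and everything after it is the "tail" that would be stored separately. This is a hypothetical sketch of the idea only. It does not reproduce libdatrie's double-array or tail-pool layout, and the `"\0"` terminator is an assumption, used so that a key that is a prefix of another key still gets its own branch.

```python
from typing import Dict, Iterable, Tuple


def common_prefix_len(a: str, b: str) -> int:
    """Length of the longest common prefix of a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def split_tails(keys: Iterable[str], term: str = "\0") -> Dict[str, Tuple[str, str]]:
    """Map each key to (branch_prefix, tail).

    `term` is an assumed terminator that must not occur inside keys.
    """
    terminated = sorted({k + term for k in keys})
    result: Dict[str, Tuple[str, str]] = {}
    for i, k in enumerate(terminated):
        # In sorted order, the longest prefix shared with any other key
        # is shared with one of the two neighbours.
        prev_len = common_prefix_len(terminated[i - 1], k) if i > 0 else 0
        next_len = common_prefix_len(k, terminated[i + 1]) if i + 1 < len(terminated) else 0
        cut = max(prev_len, next_len) + 1  # first position unique to this key
        original = k[:-1]
        result[original] = (original[:cut], original[cut:])
    return result


words = ["pool", "prize", "preview", "prepare", "produce", "progress"]
for key, (branch, tail) in split_tails(words).items():
    print(f"{key:10} branch={branch!r:8} tail={tail!r}")
```

For these words, `pool` splits into branch `po` and tail `ol`, and `progress` into `prog` and `ress`: only the branching prefixes would need trie nodes, while the tails are plain strings.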

Yeah, this is a problem. See also discussion at https://github.com/pytries/datrie/issues/12.