mwparserfromhell

C tokenizer: emit tokens simpler than the expensive PyObject* kind

Open earwig opened this issue 12 years ago • 3 comments

Allocating a PyObject* and filling its slots every time we create a token (even one that is later discarded) adds a lot of overhead; ideally, we would use custom C structs for each token type that have the appropriate attributes.
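To make the idea concrete, here is a minimal sketch of what struct-based tokens could look like. It is not the project's actual code: the names `TokenKind`, `Token`, `TokenBuffer`, and the `token_buffer_*` helpers are all hypothetical, and the three token kinds are placeholders for whatever set the tokenizer really emits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical token kinds; a real tokenizer would have many more. */
typedef enum {
    TOK_TEXT,
    TOK_TEMPLATE_OPEN,
    TOK_TEMPLATE_CLOSE
} TokenKind;

/* A plain C token: no reference count, no type pointer, no attribute dict. */
typedef struct {
    TokenKind kind;
    const char *text;   /* borrowed pointer into the input; meaningful for TOK_TEXT */
    size_t length;      /* number of bytes at `text` */
} Token;

/* Growable array that stores tokens by value. */
typedef struct {
    Token *items;
    size_t count;
    size_t capacity;
} TokenBuffer;

/* Append a token; returns 0 on success, -1 if memory could not be allocated. */
int token_buffer_push(TokenBuffer *buf, Token tok)
{
    if (buf->count == buf->capacity) {
        size_t new_cap = buf->capacity ? buf->capacity * 2 : 16;
        Token *grown = realloc(buf->items, new_cap * sizeof *grown);
        if (grown == NULL)
            return -1;
        buf->items = grown;
        buf->capacity = new_cap;
    }
    buf->items[buf->count++] = tok;
    return 0;
}

/* Discard every token emitted after `mark`, e.g. when a speculative
   parse is abandoned. Nothing is freed or deallocated per token. */
void token_buffer_rollback(TokenBuffer *buf, size_t mark)
{
    assert(mark <= buf->count);
    buf->count = mark;
}

/* Release the buffer's storage and reset it to the empty state. */
void token_buffer_free(TokenBuffer *buf)
{
    free(buf->items);
    buf->items = NULL;
    buf->count = 0;
    buf->capacity = 0;
}
```

With this layout, emitting a token is a struct copy into a preallocated array, and throwing away a run of tokens is a single assignment to `count`, so discarded tokens never cost an object allocation.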

The parser will then have to either (1) convert these tokens to PyObject*s at the end of parsing, (2) wrap them in some kind of capsule (maybe?), or (3) use a C port of the builder that works with these new tokens directly.

(3) is ultimately the fastest solution, but it's the most work and, since regular Python tokens are never generated, we will need a new way to run C tokenizer test cases.

earwig, Aug 19 '13 09:08

assign this to me my dude

ghost, Jun 23 '17 08:06

Can't.

earwig, Jun 23 '17 08:06

lame-o

ghost, Jun 23 '17 08:06