mwparserfromhell
C tokenizer: emit tokens simpler than the expensive PyObject* kind
Allocating a PyObject* and filling its slots every time we create a token (even one that is later discarded) is a large overhead; ideally, we would use a custom struct for each token that carries only the appropriate attributes.
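A rough sketch of what such struct-based tokens could look like, to make the idea concrete. Everything here is hypothetical (`TokenKind`, `Token`, `TokenList`, and the three helper functions are illustrative names, not existing tokenizer code), and the real token set is much larger:

```c
#include <stdlib.h>

/* Hypothetical token kinds; the real tokenizer would need many more. */
typedef enum {
    TOK_TEXT,
    TOK_TEMPLATE_OPEN,
    TOK_TEMPLATE_CLOSE
} TokenKind;

/* A plain C token: no refcount, no type pointer, no attribute dict. */
typedef struct {
    TokenKind kind;
    const char *text;  /* borrowed pointer into the input; meaningful for TOK_TEXT */
    size_t length;     /* length of text in bytes */
} Token;

/* Growable array of tokens stored by value, so emitting one is a plain copy. */
typedef struct {
    Token *items;
    size_t count;
    size_t capacity;
} TokenList;

/* Append a token, doubling the buffer when full. Returns 0 on success, -1 on OOM. */
static int TokenList_push(TokenList *list, TokenKind kind,
                          const char *text, size_t length)
{
    if (list->count == list->capacity) {
        size_t new_cap = list->capacity ? list->capacity * 2 : 16;
        Token *grown = realloc(list->items, new_cap * sizeof(Token));
        if (!grown)
            return -1;
        list->items = grown;
        list->capacity = new_cap;
    }
    list->items[list->count].kind = kind;
    list->items[list->count].text = text;
    list->items[list->count].length = length;
    list->count++;
    return 0;
}

/* Discard tokens from an abandoned parse attempt by rolling the count back. */
static void TokenList_truncate(TokenList *list, size_t count)
{
    if (count < list->count)
        list->count = count;
}

/* Release the buffer and reset the list to its empty state. */
static void TokenList_free(TokenList *list)
{
    free(list->items);
    list->items = NULL;
    list->count = 0;
    list->capacity = 0;
}
```

The point of the sketch is that discarding tokens costs nothing beyond resetting a counter, whereas discarded PyObject*s each pay for an allocation and a deallocation.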
We will then have to either (1) convert these tokens to PyObject*s at the end of parsing, (2) wrap them in some kind of capsule (not sure about this one), or (3) write a C port of the builder that consumes these new tokens directly.
(3) is ultimately the fastest solution, but it's the most work, and since regular Python tokens would never be generated, we would need a new way to run the C tokenizer test cases.
assign this to me my dude
Can't.
lame-o