mwparserfromhell C tokenizer: emit tokens simpler than the expensive PyObject* kind

Open earwig opened this issue 12 years ago • 3 comments

Allocating a PyObject* and filling its slots every time we create a token (even one that is later discarded) is a large overhead; ideally, we would use custom structs for each token that carry only the appropriate attributes.

The parser will then have to either (1) convert these tokens to PyObject*s at the end of parsing, (2) wrap them in some kind of capsule, or (3) write a C port of the builder that uses these new tokens directly.

(3) is ultimately the fastest solution, but it's the most work and, since regular Python tokens would never be generated, we would need a new way to run the C tokenizer test cases.
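
A rough sketch of what such a struct-based token might look like, to make the idea concrete. Every name here (`TokenKind`, `Token`, `TokenList`, the `tokenlist_*` helpers) is hypothetical and not taken from the existing tokenizer, and the set of token kinds is deliberately tiny. It only illustrates cheap creation and cheap discarding; it does not cover the conversion to PyObject*s in option (1).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical token kinds; a real tokenizer would need many more. */
typedef enum {
    TOK_TEXT,
    TOK_TEMPLATE_OPEN,
    TOK_TEMPLATE_CLOSE,
    TOK_HEADING_START
} TokenKind;

/* One plain-C token: no refcount, no attribute dict, no heap allocation. */
typedef struct {
    TokenKind kind;
    union {
        struct { size_t start; size_t length; } text; /* TOK_TEXT: slice of the input */
        struct { int level; } heading;                /* TOK_HEADING_START */
    } as;
} Token;

/* Growable array of tokens, stored by value. */
typedef struct {
    Token *items;
    size_t len;
    size_t cap;
} TokenList;

static void tokenlist_init(TokenList *list)
{
    list->items = NULL;
    list->len = 0;
    list->cap = 0;
}

/* Append a token; returns 0 on success, -1 if memory could not be allocated. */
static int tokenlist_push(TokenList *list, Token tok)
{
    assert(list != NULL);
    if (list->len == list->cap) {
        size_t new_cap = list->cap ? list->cap * 2 : 16;
        Token *grown = realloc(list->items, new_cap * sizeof(Token));
        if (grown == NULL)
            return -1;
        list->items = grown;
        list->cap = new_cap;
    }
    list->items[list->len++] = tok;
    return 0;
}

/* Discard tokens from an abandoned parse route by rewinding the length. */
static void tokenlist_truncate(TokenList *list, size_t len)
{
    assert(len <= list->len);
    list->len = len;
}

static void tokenlist_free(TokenList *list)
{
    free(list->items);
    tokenlist_init(list);
}
```

With this layout, throwing away the tokens of a failed route is a single assignment to `len`, whereas discarding PyObject* tokens means decrementing a reference count and possibly deallocating for each one.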

Aug 19 '13 09:08 earwig

assign this to me my dude

Jun 23 '17 08:06 ghost

Can't.

Jun 23 '17 08:06 earwig

lame-o

Jun 23 '17 08:06 ghost
