Performance of `prepare_regular_token`?
Continuing a conversation from #278 here for more visibility:
I am looking at `prepare_regular_token` because it is at the top of my profiles for some of our tests. I don't know how much effort has already been put in, so I'm not sure whether it's worth poking at yet.
The method looks like a hand-built, byte-oriented lexer over either StringIO or IO (though the constructor seems out of date on that). Has any effort been made to benchmark or optimize it? I was thinking that wrapping the IO in a StringScanner instance might allow faster lexing and fewer object allocations. For example, all those `"".dup`s might be able to be thrown away, because you'd just get the finished token from StringScanner in one fell swoop.
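To illustrate the idea (this is a hypothetical sketch, not the library's actual lexer — the method names and the space-delimited token rule are just for illustration), here's the byte-by-byte accumulation pattern next to the StringScanner equivalent:

```ruby
require "strscan"
require "stringio"

# Byte-oriented style: roughly how a hand-built lexer accumulates a token,
# allocating a fresh buffer and appending one byte at a time.
def scan_token_bytewise(io)
  token = +""                      # fresh mutable buffer per token
  while (byte = io.getbyte)
    break if byte == 0x20          # stop at a space (illustrative delimiter)
    token << byte
  end
  token
end

# StringScanner style: a single regexp match yields the finished token
# string in one call, with no per-byte buffer appends.
def scan_token_with_scanner(scanner)
  token = scanner.scan(/[^ ]+/)    # grab the whole token in one fell swoop
  scanner.skip(/ +/)               # consume the delimiter
  token
end

io      = StringIO.new("foo bar")
scanner = StringScanner.new("foo bar")

scan_token_bytewise(io)          # => "foo"
scan_token_with_scanner(scanner) # => "foo"
```

The trade-off is that StringScanner wants the input as a String up front, so a true streaming IO would need buffering — but when the whole document is already in memory, the regexp-per-token approach avoids the per-byte loop entirely.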
I can poke at this, but I'd love to know if others already have and it's been thoroughly explored and maxed out.
I definitely haven't given this method much attention, and I wouldn't be surprised if some noticeable gains are possible.
Git history says the last time I looked at the method in detail was 2012: 480dac299b5f8e841deb05d8b986495cc77014b1