SGFParser
Speed Increases
The SGF parser is slllloooow.
Specifically...
SGF::Parser#next_character:
!@stream.eof? && @stream.sysread(1) is really slow. The stream methods are poor choices because they aren’t buffered at all: sysread(1) makes a separate system call for every single byte. You’d see dramatic speed improvements by reading the whole thing into a buffer at once, and then just slicing it or iterating over it.
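A minimal sketch of the buffered approach, assuming the whole file fits comfortably in memory. BufferedSource is a hypothetical name, not part of SGFParser. It reads the input once and then walks it with Ruby's standard StringScanner, whose getch returns the next (possibly multibyte) character without any further I/O.

```ruby
require "strscan"
require "stringio"

# Hypothetical class (not part of SGFParser): one read of the whole input,
# then characters are handed out from memory instead of via sysread(1).
class BufferedSource
  def initialize(io_or_string)
    # A File or StringIO is read completely once; a String is used as-is.
    text = io_or_string.respond_to?(:read) ? io_or_string.read : io_or_string.to_s
    @scanner = StringScanner.new(text)
  end

  # Next (possibly multibyte) character, or nil once the buffer is exhausted.
  def next_character
    @scanner.getch
  end

  def eof?
    @scanner.eos?
  end
end

source = BufferedSource.new(StringIO.new("(;GM[1])"))
chars = []
while (char = source.next_character)
  chars << char
end
puts chars.join # prints (;GM[1])
```

StringScanner tracks its position as a byte offset, so stepping through a UTF-8 buffer stays cheap; indexing a non-ASCII String by character position may instead have to count characters from the start on every call.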
SGF::Parser#parse_property
This is really slow, due to the while loops in parse_comment, parse_multi_property and parse_generic_property. Please use built-in string methods instead; they are much faster than your pure-Ruby implementation. Since you should now be reading into a buffer, you don’t need to go over every character individually, so expect huge speed-ups.
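A sketch of what parse_property could look like when built on regular expressions over the buffer instead of per-character while loops. The method name mirrors the one above, but the code is hypothetical, not SGFParser's implementation, and it assumes FF[4]-style uppercase property identifiers. A single value regex covers plain values, multi-value properties such as AB[dd][pp], and comments that contain an escaped \].

```ruby
require "strscan"

# Property identifier, e.g. "B", "AB" or "C" (FF[4] uses uppercase letters only).
PROPERTY_ID = /\s*([A-Z]+)/
# One bracketed value: escaped characters (\x) or anything except \ and ].
PROPERTY_VALUE = /\s*\[((?:\\.|[^\\\]])*)\]/m

# Hypothetical replacement for the loop-based parse_property. Returns
# [identifier, raw_values], or nil if no property starts at the scan pointer.
# Values keep their escape sequences; unescaping is a separate step.
def parse_property(scanner)
  return nil unless scanner.scan(PROPERTY_ID)

  identifier = scanner[1]
  values = []
  values << scanner[1] while scanner.scan(PROPERTY_VALUE)
  [identifier, values]
end

scanner = StringScanner.new('AB[dd][pp] C[a \] b]')
p parse_property(scanner)
p parse_property(scanner)
```

Because the regex engine does the character-level work, the Ruby code only loops once per value rather than once per character.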
SGF::Parser#still_inside_node?
Stream methods AND a while loop. Luckily this doesn’t get called much.
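On a buffer, still_inside_node? reduces to a single look-ahead. A minimal sketch, again hypothetical and assuming FF[4] uppercase identifiers: after optional whitespace, a letter starts another property of the current node, while ;, (, ) or the end of input means the node is finished.

```ruby
require "strscan"

# Hypothetical buffered version of still_inside_node?. StringScanner#check
# matches at the scan pointer without moving it, so nothing is consumed.
def still_inside_node?(scanner)
  !scanner.check(/\s*[A-Z]/).nil?
end

scanner = StringScanner.new("  W[pp];B[dd]")
puts still_inside_node?(scanner) # true: W starts another property
puts scanner.pos                 # 0: the look-ahead consumed nothing
```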
Generally, reworking the parser to work with built-ins on a buffered read should do two things:
- immense speed ups
- no more issues with ]
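Once a raw value has been cut out of the buffer, the escapes can be undone in one pass. A sketch following the SGF FF[4] Text type: a backslash before a line break is a soft line break and is dropped together with the break, and a backslash before any other character keeps that character verbatim. unescape_text is a hypothetical helper, and it deliberately skips the spec's conversion of other whitespace to spaces.

```ruby
# Backslash followed by a line break (soft break) or by any other character.
ESCAPE = /\\(?:\r\n|\n\r|\r|\n|(.))/m

# Hypothetical helper: turns an escaped SGF Text value into plain text,
# so "\]" becomes "]" and an escaped backslash becomes a single backslash.
def unescape_text(raw)
  # Group 1 is nil for soft line breaks, so those are replaced with "".
  raw.gsub(ESCAPE) { Regexp.last_match(1).to_s }
end

puts unescape_text('a \] b')     # a ] b
puts unescape_text("one\\\ntwo") # onetwo
```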
Define "slow": compared to, say, every other implementation on the market? Also, while I truly appreciate the enthusiasm, all these issues are beginning to make me feel a little overwhelmed, because I'm getting the impression that you're not interested in contributing any code and I'm just going to do all the work. This parser has been out for over a year and has been a little pet project; I don't know of anyone who's been using it for anything, so this is quite a big and sudden change.
Now, you're right. I just tried to parse KJD (Kogo's Joseki Dictionary), and not only did it take five seconds, it also bugged out and didn't give me a tree with anything in it at all. Fun.
On Wed, Dec 7, 2011 at 03:45, Colin Noga wrote:
Reply to this email directly or view it on GitHub: https://github.com/Trevoke/SGFParser/issues/17
Well, I can work on the speed increases later; I’m still getting familiar with Ruby. I only started learning it about four days ago, to help a friend write a Rails site for his Go ladder. So, uh, have patience.
Neat. Alright, in that case, let me ask you a different question. Speed increases can come later. I'm worried that Kogo's Joseki Dictionary apparently didn't parse correctly, so I'll look into that. Which of the requested features do you think are absolutely necessary before I can make a 2.0 release? Say the release will be January 1st, 2012.
On Wed, Dec 7, 2011 at 22:36, Colin Noga wrote:
Reply to this email directly or view it on GitHub: https://github.com/Trevoke/SGFParser/issues/17#issuecomment-3057451
Updating the documentation, fixing the output-inside-an-iterator issue, and the overloaded initializers. Plus any fixes you need to parse KJD properly.
An easy way to make the API friendlier would be the property methods; it’s just busy work, after all.
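Since the property methods are mostly busy work, one way to cut the typing is to generate them from a table. A minimal sketch, assuming a node keeps its properties in a Hash keyed by SGF identifier; ExampleNode, PROPERTY_NAMES and the method names are hypothetical, while the identifiers themselves (B, W, C, AB, AW) are standard SGF properties.

```ruby
# Hypothetical sketch (names are illustrative, not SGFParser's API): a table
# maps SGF property identifiers to readable method names, and define_method
# generates one reader per entry instead of hand-writing each.
class ExampleNode
  PROPERTY_NAMES = {
    "B"  => :black_move,
    "W"  => :white_move,
    "C"  => :comment,
    "AB" => :add_black,
    "AW" => :add_white
  }.freeze

  attr_reader :properties

  def initialize(properties = {})
    @properties = properties
  end

  # Each generated reader returns the raw value, or nil if the node lacks it.
  PROPERTY_NAMES.each do |identifier, method_name|
    define_method(method_name) { @properties[identifier] }
  end
end

node = ExampleNode.new({ "B" => "pd", "C" => "Opening move" })
puts node.black_move # pd
puts node.comment    # Opening move
```

With this layout, supporting another property means adding one table entry.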
Of the updates that don’t necessarily need to go into 2.0, the speed increases are the most important.