
Improve performance of conversion

Open · kavitharaju opened this issue 3 years ago · 2 comments

Parsing with the tree-sitter module is quite fast, even for large and complex USFM files. We then traverse the resulting syntax tree sequentially to convert it to USX, JSON, etc., and this step is where performance degrades significantly. We need to look into alternative approaches, such as callbacks, to improve this.
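As a rough illustration of the callback idea, here is a minimal Python sketch. It assumes a hypothetical `Node` class standing in for a tree-sitter node (type name, text, children) and hypothetical handler tables keyed by node type. It is not usfm-grammar's actual API, and the element names are only USX-like. The tree is walked once with an explicit stack. Each node type dispatches to a registered enter/exit callback that appends to a shared list, and that list is joined once at the end instead of the output being built by repeated string concatenation.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple
from xml.sax.saxutils import escape


@dataclass
class Node:
    """Hypothetical stand-in for a tree-sitter node."""
    type: str
    text: str = ""
    children: List[Node] = field(default_factory=list)


# A handler receives the node and the shared output buffer.
Handler = Callable[[Node, List[str]], None]


def convert(root: Node, on_enter: Dict[str, Handler], on_exit: Dict[str, Handler]) -> str:
    """Walk the tree once, firing enter/exit callbacks registered per node type."""
    out: List[str] = []
    # Explicit stack of (node, exiting) pairs instead of recursion.
    stack: List[Tuple[Node, bool]] = [(root, False)]
    while stack:
        node, exiting = stack.pop()
        handlers = on_exit if exiting else on_enter
        handler = handlers.get(node.type)  # O(1) dispatch by node type
        if handler is not None:
            handler(node, out)
        if not exiting:
            # Schedule this node's exit event, then its children
            # (reversed so the leftmost child is popped first).
            stack.append((node, True))
            stack.extend((child, False) for child in reversed(node.children))
    return "".join(out)


# Hypothetical handler tables producing USX-like markup.
ON_ENTER: Dict[str, Handler] = {
    "chapter": lambda n, out: out.append(f'<chapter number="{n.text}"/>'),
    "paragraph": lambda n, out: out.append('<para style="p">'),
    "verse": lambda n, out: out.append(f'<verse number="{n.text}"/>'),
    "text": lambda n, out: out.append(escape(n.text)),
}
ON_EXIT: Dict[str, Handler] = {
    "paragraph": lambda n, out: out.append("</para>"),
}

tree = Node("book", children=[
    Node("chapter", "1"),
    Node("paragraph", children=[Node("verse", "1"), Node("text", "In the beginning")]),
])
result = convert(tree, ON_ENTER, ON_EXIT)
print(result)
# prints: <chapter number="1"/><para style="p"><verse number="1"/>In the beginning</para>
```

Whether this beats the current traversal would have to be confirmed by profiling. The design choices it illustrates are a single pass over the tree, per-node dictionary dispatch, and no recursion depth limits on large books.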

kavitharaju · Nov 07 '22 05:11

Yes. To give some real-world stats:

https://github.com/schierlm/BibleMultiConverter converts a whole Bible from USFM to USX in ~6 seconds.

usfm-grammar (Node) takes 3-60 seconds per book, so probably ~2,000 seconds for a whole Bible. I didn't run the full conversion, as it might have taken half an hour.

But there are different use cases, and this looks like it could be really useful as a more feature-rich converter. I'd be keen to hear about any performance improvements.

shadow-light · Nov 21 '24 03:11

Note: In Python, we could use the profilers described at https://docs.python.org/3/library/profile.html to find out where improvement is needed.
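Following up on that note, here is a minimal sketch of profiling a single conversion call with the standard-library `cProfile` and `pstats` modules, both documented on that page. `profile_call` is a hypothetical helper. The function you pass in stands in for whatever conversion entry point you want to measure. Nothing here is part of usfm-grammar itself.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=15, **kwargs):
    """Run func(*args, **kwargs) under cProfile.

    Returns (result, report), where report lists the `top` entries
    sorted by cumulative time, which surfaces the hottest call paths.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    stats.strip_dirs().sort_stats(pstats.SortKey.CUMULATIVE).print_stats(top)
    return result, buf.getvalue()
```

For example, `result, report = profile_call(my_converter, usfm_text)`, where `my_converter` and `usfm_text` are placeholders for your own conversion function and input. To profile a whole script without code changes, `python -m cProfile -s cumulative your_script.py` prints a similar report.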

kavitharaju · Nov 26 '24 15:11