Use of native XPath: faster calculations with large data

Open · mgogh opened this issue 2 years ago · 3 comments

If you create a form with a select_one_from_file question based on a large CSV file (more than 3,000 rows and 10 columns) and then add ten calculate fields, one per column, the form becomes very slow. (The select_one is filtered by another question.)

Example: France has 34,000 communes, which can be filtered by region and department.

This change uses the native XPath evaluator whenever possible, which speeds up value lookups when the model is large. If an expression contains custom OpenRosa functions, it falls back as expected to jsEvaluate, which is slower than the native method.
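A minimal TypeScript sketch of the routing decision described above. This is not the PR's code: it assumes a simple lexical scan is enough, treating any function name outside the XPath 1.0 core library (ignoring node-type tests such as node() and operator keywords such as `and`) as a custom OpenRosa function that must take the slower path. `hasCustomFunction`, `chooseEvaluator`, `XPATH1_CORE` and `NOT_FUNCTIONS` are hypothetical names, not part of enketo-core.

```typescript
// Hypothetical helper, not enketo-core's API: decide whether an expression can
// go to the browser's native XPath 1.0 evaluator or must take the slower
// jsEvaluate path because it calls non-native (e.g. OpenRosa) functions.

// The XPath 1.0 core function library.
const XPATH1_CORE: ReadonlyArray<string> = [
  "last", "position", "count", "id", "local-name", "namespace-uri", "name",
  "string", "concat", "starts-with", "contains", "substring-before",
  "substring-after", "substring", "string-length", "normalize-space",
  "translate", "boolean", "not", "true", "false", "lang", "number", "sum",
  "floor", "ceiling", "round",
];

// Names that may be followed by "(" without being function calls:
// node-type tests, and operators as in `/data/a and (/data/b or /data/c)`.
const NOT_FUNCTIONS: ReadonlyArray<string> = [
  "node", "text", "comment", "processing-instruction",
  "and", "or", "div", "mod",
];

function hasCustomFunction(expr: string): boolean {
  // Blank out string literals so text like 'foo(' is not mistaken for a call.
  const code = expr.replace(/'[^']*'|"[^"]*"/g, "''");
  // A name, optionally prefixed (jr:choice-name), immediately followed by "(".
  const call = /([A-Za-z_][\w.-]*(?::[A-Za-z_][\w.-]*)?)\s*\(/g;
  let match: RegExpExecArray | null;
  while ((match = call.exec(code)) !== null) {
    const fnName = match[1];
    if (XPATH1_CORE.indexOf(fnName) === -1 && NOT_FUNCTIONS.indexOf(fnName) === -1) {
      return true; // e.g. pulldata(, selected(, jr:choice-name(
    }
  }
  return false;
}

type EvaluatorKind = "native" | "jsEvaluate";

function chooseEvaluator(expr: string): EvaluatorKind {
  return hasCustomFunction(expr) ? "jsEvaluate" : "native";
}
```

The check works as a denylist: everything goes to the native evaluator unless a non-core function name shows up, which matches the "use native whenever possible" behaviour described here.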

mgogh, Jun 10 '22 13:06

Hi @mgogh, thanks for this PR. I arrived at a similar approach in a project I'm using for exploring/prototyping a variety of performance improvements, but hadn't gotten around to getting it into a PR. In my prototype, I also found a few additional optimizations.

  1. I found that pre-compiling expressions with document.createExpression performs better than calling document.evaluate directly. (Surprisingly, it performs better even if you discard the expression and recompile on each evaluation, at least in Chrome.)

  2. Wrapping both cases (native and fallback evaluation) in a class of the same shape performs better still, and catching the error on construction rather than on evaluation performs much better still. This is partly because classes with a consistent shape are good JIT optimization targets, and partly because the try/catch branching becomes minimal and predictable.
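The two points above can be sketched together in TypeScript. This is not the prototype's code: to stay runnable outside a browser, it accepts any object with a `createExpression(expression, resolver)` method (the shape of `document.createExpression`) instead of `document` itself, and it assumes, as point 2 implies, that the native compiler throws at creation time on expressions it cannot handle. `PreparedExpression`, `XPathCompiler`, `CompiledXPath` and the `fallback` callback are hypothetical names.

```typescript
// Structural stand-ins for the DOM types so the sketch does not need lib.dom.
// In a browser, `document` fits XPathCompiler and the object returned by
// document.createExpression() fits CompiledXPath.
interface CompiledXPath {
  evaluate(contextNode: unknown, type?: number, result?: unknown): unknown;
}

interface XPathCompiler {
  createExpression(expression: string, resolver?: unknown): CompiledXPath;
}

// Slower evaluator for expressions the native engine rejects
// (hypothetical signature standing in for the jsEvaluate path).
type FallbackEvaluate = (expression: string, contextNode: unknown, type: number) => unknown;

class PreparedExpression {
  // Every instance assigns the same fields in the same order, so native and
  // fallback instances share one shape.
  readonly expression: string;
  private readonly compiled: CompiledXPath | null;
  private readonly fallback: FallbackEvaluate;

  constructor(compiler: XPathCompiler, expression: string, resolver: unknown, fallback: FallbackEvaluate) {
    this.expression = expression;
    this.compiled = PreparedExpression.tryCompile(compiler, expression, resolver);
    this.fallback = fallback;
  }

  // The only try/catch, run once per expression at construction time.
  private static tryCompile(compiler: XPathCompiler, expression: string, resolver: unknown): CompiledXPath | null {
    try {
      return compiler.createExpression(expression, resolver);
    } catch {
      return null;
    }
  }

  // Evaluation is a single predictable branch with no exception handling.
  evaluate(contextNode: unknown, type: number): unknown {
    if (this.compiled !== null) {
      return this.compiled.evaluate(contextNode, type, null);
    }
    return this.fallback(this.expression, contextNode, type);
  }

  get isNative(): boolean {
    return this.compiled !== null;
  }
}
```

A caller would create one `PreparedExpression` per expression and call `evaluate` for each calculation. Per point 1, the comment reports that even recreating it on every evaluation beat calling `document.evaluate` directly, at least in Chrome.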

I'll take some time to bring the pertinent prototype code in for a PR. In the meantime, would you be able to share a form like the one you described? I'd like to add it to my collection of performance-related forms.

eyelidlessness, Jun 10 '22 17:06

Hi @eyelidlessness, here is a form with a list of about 40,000 rows (French municipalities) and 11 calculations: https://ee.kobotoolbox.org/Cijpflhc (the form takes a long time to load the data, so be patient).

The XLSForm and data: XLSFORM_big_list.xlsx, communes.csv, departement.csv, region.csv

A smaller list (~1,300 rows) to try: https://ee.kobotoolbox.org/1CmdNbQ0 with communes_light.csv and XLSFORM_small_list.xlsx (same departement/region files).

jdugh, Jun 11 '22 14:06

Thank you @jdugh! I meant to reply earlier, but wound up on a yak-shaving adventure trying to get the large CSV to load on my local enketo-express/ODK Central setup.

That aside, this is an awesome case to add to my growing collection of performance stress tests.

Earlier today @lognaturel and I discussed a safer, more limited approach to this. Instead of always deferring to the native evaluator, or doing more complex analysis of queries, we'll likely start with a more naive analysis to optimize queries which are obviously straightforward (nodeset references, basic operators with non-ambiguous operands). This isn’t the end of the line for optimization potential I’m exploring, but it will be a big perf boost for a lot of common cases and a lot of the groundwork is already laid.
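A TypeScript sketch of the kind of naive allowlist analysis described above. What counts as "obviously straightforward" is an assumption of this sketch: path references starting with `/` or `.` (no predicates or attributes), string and number literals, and comparison or `and`/`or` operators between them. Anything else (function calls, predicates, arithmetic) is rejected and would stay on the existing evaluator. `isStraightforward` is a hypothetical name.

```typescript
// Hypothetical allowlist check: true only for expressions built from simple
// operands joined by comparison/boolean operators, e.g.
//   /data/region = 'IDF' and /data/departement != ''
// Anything unrecognised (calls, predicates, arithmetic, attributes) returns
// false, so the caller keeps the existing evaluator for it.
function isStraightforward(expr: string): boolean {
  // One token per match, anchored at the start of the remaining input.
  // Group 1: operand (string literal, number, or path starting with / or .).
  // Group 2: operator (comparison, `and`, `or`).
  const token =
    /^\s*(?:('[^']*'|"[^"]*"|\d+(?:\.\d+)?|(?:\/|\.\.?)[\w\-.\/]*)|(!=|<=|>=|=|<|>|\band\b|\bor\b))\s*/;
  let rest = expr;
  let expectOperand = true;

  while (rest.length > 0) {
    const match = token.exec(rest);
    if (match === null) {
      return false; // unrecognised token
    }
    const isOperand = match[1] !== undefined;
    if (isOperand !== expectOperand) {
      return false; // two operands or two operators in a row
    }
    expectOperand = !expectOperand;
    rest = rest.slice(match[0].length);
  }
  // Non-empty input that ends with an operand.
  return !expectOperand;
}
```

Unlike a denylist of custom functions, this errs on the side of the existing evaluator: an expression is only sent to the native path when every token is recognised.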

eyelidlessness, Jun 15 '22 00:06