
More speed efficient

Open vaab opened this issue 12 years ago • 3 comments

The Python implementation is desperately slow. We might have to rewrite this in C if we want to go faster.

vaab avatar Apr 30 '13 21:04 vaab

Absolutely true. I recently discovered shyaml, but I cannot use it: it is way too slow.

nowox avatar Nov 19 '15 08:11 nowox

Versions 0.6.0 and later make sure to use the libyaml C bindings, which might help with speed. You might want to run shyaml --version (starting from 0.6.1) to double-check that you are using the libyaml-bound version.
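For anyone who wants to check the binding from Python rather than through shyaml, here is a minimal sketch. It relies on PyYAML's `__with_libyaml__` flag; the function name `libyaml_status` and its return strings are made up for this example and are not part of shyaml.

```python
# Sketch: report whether the installed PyYAML can use the libyaml C bindings.
try:
    import yaml
except ImportError:  # PyYAML is not installed at all
    yaml = None


def libyaml_status():
    """Return 'missing', 'pure-python', or 'libyaml' (hypothetical labels)."""
    if yaml is None:
        return "missing"
    # PyYAML sets __with_libyaml__ to True when its C extension was importable.
    if getattr(yaml, "__with_libyaml__", False):
        return "libyaml"
    return "pure-python"


if __name__ == "__main__":
    print(libyaml_status())
```

If this reports "pure-python", the C loaders (such as `yaml.CSafeLoader`) are not available and parsing falls back to the slower Python implementation.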

I would be happy to have an example YAML file (or a bunch of them) to benchmark against, so that we can set some actual metrics and know what we are talking about.

vaab avatar Dec 14 '18 16:12 vaab

There are numerous other ways to get more speed out of shyaml:

The code itself can be made a little quicker (although, with the libyaml binding, is there a lot of improvement left to be achieved?):

  • compilation to binary code directly from the existing Python via nuitka (this works out of the box and could be done for each release/platform).
  • write intermediary code to also get direct binary code via Cython.

Please note that a binary would have the valuable added benefit of not requiring Python (well, only libpython, but we could also go static...), and would not suffer from any sort of dependency-induced failure, to the point where we could completely forget the Python compatibility testing matrices (versions of Python, versions of dependencies, installation tests) and save a lot of time otherwise lost to Python distribution hell. On the other hand, we would get into another hell of managing dependencies across systems and architectures.

Even blazing fast code, because we use shyaml in shell, will face the costly spawning of a process... So:

  • change the API to allow more to be done in one call of shyaml (building more into shyaml: tests, or a small efficient language...). Every call (spawn) that we can avoid matters, especially in a tight loop. Some research might be necessary to see which common manipulations would benefit most from this.
    • This could go through a clever little language (but why introduce a new language?) which could borrow
      a lot of ideas here and there (like XPath). Of course, having a look at jq is mandatory.
    • But this could also leverage an existing language, for instance by evaling Python. (What would be the real performance cost?) It also seems to be way too big a language for such a simple task.
  • A solution based on a daemon and interprocess communication (thinking of sockets) would be much more difficult to grasp for most, but would entirely remove the cost of spawning. With some work, we could probably offer a bash function using only builtins that ensures shyaml is launched in daemon mode, sends it the queries in the proper way, returns the responses, and effectively offers the same interface as the normal CLI.
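To make the daemon idea a bit more concrete, here is a rough sketch of what the server side could look like. Everything in it is an assumption made for illustration: the line-based protocol, the JSON replies, the use of a localhost TCP socket rather than a Unix socket, and the names `lookup`, `make_server` and `query`. It works on an already-parsed mapping and does not reflect any existing shyaml feature.

```python
import json
import socket
import socketserver


def lookup(data, dotted_key):
    """Walk nested dicts following a dotted path such as 'a.b.c'."""
    node = data
    for part in dotted_key.split("."):
        node = node[part]
    return node


def make_server(data, host="127.0.0.1", port=0):
    """Build a TCP server answering one dotted-key query per line.

    `data` stands in for a YAML document parsed once at daemon start-up.
    """

    class Handler(socketserver.StreamRequestHandler):
        def handle(self):
            # One connection can carry many queries: no process spawn per query.
            for raw in self.rfile:
                key = raw.decode("utf-8").strip()
                try:
                    reply = {"ok": lookup(data, key)}
                except (KeyError, TypeError):
                    reply = {"error": "no such key: " + key}
                self.wfile.write((json.dumps(reply) + "\n").encode("utf-8"))

    server = socketserver.ThreadingTCPServer((host, port), Handler)
    server.daemon_threads = True  # do not block shutdown on open connections
    return server


def query(address, keys):
    """Send several queries over a single connection and return the replies."""
    with socket.create_connection(address) as conn:
        with conn.makefile("rwb") as stream:
            replies = []
            for key in keys:
                stream.write((key + "\n").encode("utf-8"))
                stream.flush()
                replies.append(json.loads(stream.readline()))
            return replies
```

A client would start the server once (for example with `serve_forever()` in a background thread or process) and then call `query(server.server_address, [...])` as often as needed. The parsing cost and the interpreter start-up are paid once rather than on every lookup.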

On the road toward better performance, we could think of adding a switch to shyaml that measures the time spent in its own code compared to the time spent in PyYAML. I'm not expecting a surprise here, and I do not think the Python code is that significant in itself.

The most important metrics to move forward are:

  • the actual cost of spawning
  • the Python interpreter loading time
  • the time in shyaml's code
  • the time in PyYAML's code (this one depends completely on the input YAML, of course)
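The first two metrics can be estimated without touching shyaml at all. The sketch below is only one possible way to do it, and the helper names `time_command` and `startup_costs` are invented for this example. It times a trivial native binary (`true`, when one is found on the PATH) as a stand-in for the bare spawning cost, then a bare interpreter start, then an interpreter start that also imports PyYAML.

```python
import shutil
import subprocess
import sys
import time


def time_command(argv, runs=5):
    """Return the best wall-clock time in seconds over `runs` executions."""
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(argv, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        best = min(best, time.perf_counter() - start)
    return best


def startup_costs(runs=5):
    """Estimate spawn cost, interpreter start-up, and start-up plus PyYAML."""
    results = {}
    # A trivial native binary approximates the pure cost of spawning.
    true_path = shutil.which("true")
    results["spawn"] = time_command([true_path], runs) if true_path else None
    # Spawn plus interpreter initialisation, with no user code.
    results["interpreter"] = time_command([sys.executable, "-c", "pass"], runs)
    # Same, plus importing PyYAML; None if PyYAML is not installed.
    try:
        results["interpreter_plus_yaml"] = time_command(
            [sys.executable, "-c", "import yaml"], runs)
    except subprocess.CalledProcessError:
        results["interpreter_plus_yaml"] = None
    return results


if __name__ == "__main__":
    for name, seconds in startup_costs().items():
        print(name, seconds)
```

The difference between the "interpreter" and "spawn" figures gives a rough idea of the interpreter loading time. The time spent in shyaml's own code and in PyYAML would still need to be measured from inside shyaml.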

vaab avatar Dec 17 '18 07:12 vaab