
PyYAML + LibYAML Bindings 15x faster?

Open nabadger opened this issue 4 years ago • 0 comments

Related to https://github.com/fluxcd/flux/issues/1857, I was looking into kubeyaml.

One thing I'm aware of is that Python YAML parsers can optionally use underlying C bindings (LibYAML).

From what I gather, the C extension is not available for the round-trip functionality of ruamel.yaml (round-trip is the default for the typ arg), so I don't think kubeyaml is making use of it.

In terms of timing, I ran all our manifests through both versions (360 manifests, merged into a single 1.8 MB file).

time cat input.yaml \
| python kubeyaml.py image \
  --image="registry.gitlab.com/<my-site>:test" \
  --container="main" --kind=Deployment \
  --name dev-site \
  --namespace=default > output.yaml

Existing

0.00s user 0.01s system 0% cpu 18.336 total

PyYAML + CSafeLoader / CSafeDumper

1.07s user 0.04s system 94% cpu 1.178 total
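As a quick check of the "15x" in the title, here is the ratio of the two wall-clock totals reported above. This is only arithmetic on those two numbers from a single run each; the variable names are illustrative.

```python
# Wall-clock totals (seconds) taken from the two `time` outputs above.
existing_total = 18.336   # current kubeyaml (ruamel.yaml round-trip)
libyaml_total = 1.178     # PyYAML with CSafeLoader / CSafeDumper

speedup = existing_total / libyaml_total
print(f"speedup: {speedup:.1f}x")
```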

The output from PyYAML has single quotes in place of double quotes, and seems to differ in how newlines are escaped (I think the end result is OK though, will check).

My code changes were:

import yaml
...
docs = yaml.load_all(infile, Loader=yaml.CSafeLoader)
yaml.dump_all(fn(docs), outfile, Dumper=yaml.CSafeDumper)
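The C classes only exist when PyYAML was built against LibYAML, so a slightly more defensive version of the change might fall back to the pure-Python classes. This is a minimal sketch, not kubeyaml's actual code: `rewrite` and the identity transform in the usage are hypothetical stand-ins for the existing update logic. Note that PyYAML does not keep comments or the original formatting.

```python
import io
import yaml

# Prefer the LibYAML-backed classes; fall back to the pure-Python ones
# when PyYAML was installed without the C extension.
Loader = getattr(yaml, "CSafeLoader", yaml.SafeLoader)
Dumper = getattr(yaml, "CSafeDumper", yaml.SafeDumper)


def rewrite(infile, outfile, fn):
    """Load every document from infile, apply fn, and dump to outfile.

    `fn` takes an iterable of documents and returns an iterable of
    documents (hypothetical stand-in for kubeyaml's update function).
    """
    docs = yaml.load_all(infile, Loader=Loader)
    yaml.dump_all(fn(docs), outfile, Dumper=Dumper, default_flow_style=False)


# Usage: round-trip a two-document stream without changing it.
src = "a: 1\n---\nb: [x, y]\n"
out = io.StringIO()
rewrite(io.StringIO(src), out, lambda docs: docs)
print(out.getvalue())
```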

For this experiment, I also removed the existing yaml(), but I think preserving single quotes is supported by PyYAML (just not via a nice boolean flag).
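On the quoting point: PyYAML's loaders return plain Python strings, so the quote style of the input is not remembered. What it does allow is forcing a style on output through a custom representer. The sketch below assumes that is acceptable; `DoubleQuoted` and `QuotingDumper` are hypothetical names, and the image value is a placeholder.

```python
import yaml

# Use the C dumper when available, otherwise the pure-Python one.
BaseDumper = getattr(yaml, "CSafeDumper", yaml.SafeDumper)


class QuotingDumper(BaseDumper):
    """Dumper subclass so the extra representer stays local to it."""


class DoubleQuoted(str):
    """Marker type: strings that should be emitted with double quotes."""


def represent_double_quoted(dumper, data):
    # style='"' asks the emitter for a double-quoted scalar.
    return dumper.represent_scalar("tag:yaml.org,2002:str", str(data), style='"')


QuotingDumper.add_representer(DoubleQuoted, represent_double_quoted)

doc = {"image": DoubleQuoted("registry.example.com/app:test"), "name": "dev-site"}
text = yaml.dump(doc, Dumper=QuotingDumper, default_flow_style=False)
print(text)
```

Only values wrapped in `DoubleQuoted` get quotes; everything else keeps PyYAML's default style.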

I may have missed something. I'll try to apply the result to our cluster and see. I just thought it was a crazy time difference worth getting your feedback on :)

Note, I've not tried this with fluxcd yet...

nabadger, Apr 26 '20 15:04