kapitan icon indicating copy to clipboard operation
kapitan copied to clipboard

replace jsonnet with go-jsonnet

Open ramaro opened this issue 6 years ago • 30 comments

ramaro avatar Sep 03 '18 10:09 ramaro

can you explain about this problem statement?

mithil2311 avatar Mar 07 '19 06:03 mithil2311

@mithil2311 We are currently using the direct Python library which uses the C++ version of jsonnet. To increase the speed of jsonnet compilation, we need to move to the Go version https://github.com/google/go-jsonnet and create Python bindings for it.

adrianchifor avatar Mar 07 '19 23:03 adrianchifor

(Jsonnet dev here)

FYI basic C bindings for go-jsonnet are probably going to be merged soon (PR: https://github.com/google/go-jsonnet/pull/257). The main missing part is the support for native functions. Other than that it's a drop-in replacement.

My understanding is that the Python bindings just use C bindings internally, so no changes outside of setup.py should be necessary.

sbarzowski avatar Mar 08 '19 00:03 sbarzowski

HI @sbarzowski,

Please excuse my lack of understanding. Could you kindly explain what you mean by

the main missing part is the support for native functions

yoshi-1224 avatar Mar 24 '19 09:03 yoshi-1224

Sure. I meant that this function from the libjsonnet.h is not implemented yet in the bindings:

/** Register a native extension.
 *
 * This will appear in Jsonnet as a function type and can be accessed from std.nativeExt("foo").
 *
 * DO NOT register native callbacks with side-effects!  Jsonnet is a lazy functional language and
 * will call your function when you least expect it, more times than you expect, or not at all.
 *
 * \param vm The vm.
 * \param name The name of the function as visible to Jsonnet code, e.g. "foo".
 * \param cb The PURE function that implements the behavior you want.
 * \param ctx User pointer, stash non-global state you need here.
 * \param params NULL-terminated array of the names of the params.  Must be valid identifiers.
 */
void jsonnet_native_callback(struct JsonnetVm *vm, const char *name, JsonnetNativeCallback *cb,
                             void *ctx, const char *const *params);

Native extensions are a feature which allows calling your own C function from Jsonnet - I can see that Kapitan uses it for a few functions (https://github.com/deepmind/kapitan/blob/master/kapitan/lib/kapitan.libjsonnet).

The jsonnet_json_* family of functions (which is only useful for native functions) is also not implemented in the bindings.

The other missing parts unrelated to native functions are evaluation variants *_stream and *_multi and import callbacks (jsonnet_import_callback function).

Just to be clear, we definitely intend to have all of these implemented on go-jsonnet side so that it becomes a full drop-in replacement for libjsonnet (that is for evaluation, not code reformatting). Any help is ofc welcome and will significantly speed things up.

sbarzowski avatar Mar 24 '19 13:03 sbarzowski

@sbarzowski Thanks for the detailed explanation! I now see what the challenges are. If I have time to work on this issue I'll try my best to help.

yoshi-1224 avatar Mar 25 '19 02:03 yoshi-1224

@yoshi-1224 Can I take this issue for SoC if you aren't planning to work on it full time? I have some knowledge of golang that might help in implemention on go-jsonnet.

vnzongzna avatar Apr 02 '19 13:04 vnzongzna

Hi @vaibhavk Of course! I'm not submitting a proposal for this, but even if I were there's no need to ask because it's not first come first served.

yoshi-1224 avatar Apr 02 '19 13:04 yoshi-1224

These have now been fixed https://github.com/google/go-jsonnet/issues/322 https://github.com/google/go-jsonnet/issues/323

ramaro avatar Oct 17 '19 16:10 ramaro

I wonder, how you are going to use go-jsonnet shared library:

And if you are going to use the second approach, should we wait for go-jsonnet to add this code for python or we are going to add it in this repository?

alldroll avatar Oct 21 '19 09:10 alldroll

@alldroll ctypes should work just fine - however the second approach is ideal, in which case, I agree that that it makes sense to add it to go-jsonnet

ramaro avatar Oct 21 '19 18:10 ramaro

Hello, is there any progress on this front? The optimisations of go-jsonnet are badly needed in some contexts.

joulaud avatar Apr 08 '20 16:04 joulaud

Progress is being made and some bugs and issues are being found along the way. Unfortunately at the moment I don't have a fully working version of it yet.

On Wed, Apr 8, 2020 at 5:18 PM francoisj [email protected] wrote:

Hello, is there any progress on this front? The optimisations of go-jsonnet are badly needed in some contexts.

— You are receiving this because you were assigned. Reply to this email directly, view it on GitHub https://github.com/deepmind/kapitan/issues/144#issuecomment-611053527, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAEFAADOUJOXLZSEW6AOAJLRLSPWRANCNFSM4FS5L3WQ .

ramaro avatar Apr 08 '20 16:04 ramaro

if needed we can help. If you have some work in progress somewhere we can take a look at it.

joulaud avatar Apr 08 '20 16:04 joulaud

@ramaro We are trying to integrate gojsonnet with Kapitan, as we need it to improve our compile time (our usage of Kapitan is growing every weeks). Our work is available here. After trying GoJsonnet bindings in python, I found no issues with it. But, when used with Kapitan, it seems to have some sort of "lock" (can be easily tested using the commit https://github.com/radiofrance/kapitan/commit/c718cd50828e88470425ad3afbfa9ad34f126328 with the files in DRAFT-GOJSONNET).

After some research, I deduce that these "deadlocks" are due to the way python handles multiprocessing by using fork. These "locks" disappear when I run multiprocessing with the spawn context, but introduces another problem; global variables are not copied (because the spawn mode creates an empty process).

Unfortunately, I didn't have time to do more investigation and I'm not sure if I'm on the right way (I'm begining in Python).

EDIT: I am a teammate of @joulaud

xunleii avatar Oct 21 '20 09:10 xunleii

Hey @xunleii great to hear you're using Kapitan!

After some research, I deduce that these "deadlocks" are due to the way python handles multiprocessing by using fork. These "locks" disappear when I run multiprocessing with the spawn context, but introduces another problem; global variables are not copied (because the spawn mode creates an empty process).

That pretty much sums my findings and I started working improving multiprocessing but ran into exactly the same issues, but unfortunately haven't finalised it yet. I will have a look at your work. Thanks

ramaro avatar Oct 21 '20 09:10 ramaro

Could it have something to do with this: https://github.com/google/jsonnet/pull/814?

(It's not yet ported to go-jsonnet.)

sbarzowski avatar Oct 21 '20 10:10 sbarzowski

GIL handling was ported to go-jsonnet in last release cf. https://github.com/google/go-jsonnet/pull/474 and https://github.com/google/jsonnet/releases/tag/v0.17.0

joulaud avatar Dec 11 '20 16:12 joulaud

Yes. Unfortunately, it seems we still have some elusive deadlock issues: https://github.com/google/go-jsonnet/issues/484. I wasn't able to dig deeply enough into it yet.

If anyone can share an idea about what that might be about, it will be really appreciated!

sbarzowski avatar Dec 11 '20 19:12 sbarzowski

When I took a stack-trace on the deadlock produced by @xunleii test-case above I find the exact same stack than in go-jsonnet ticket including Go runtime.gcStart, which obviously must wait for some other goroutine but there is only one thread in the affected process.

After digging a little I was arriving to the same conclusion as @githubaccount888 on https://github.com/google/go-jsonnet/issues/484#issuecomment-744046277. Go runtime is multithreaded by construction and have a concurrent mark-and-sweep GC. And trying to use multithread with python multiprocessing fork method instead of spawn being a pool full of shark I think that not-withstanding any improvement on go-jsonnet binding we must make kapitan work with the spawn method to have some chance to make it working reliably.

What are the real blockers to make this work? @xunleii you say there is some missing global variables in the spawn cases. Do you know what they are? Would it be difficult to add them to the "worker" closure created by "functools.partial" at https://github.com/deepmind/kapitan/blob/master/kapitan/targets.py#L101-L108 ?

EDIT: typo s/being of/being a/

joulaud avatar Dec 13 '20 21:12 joulaud

Unfortunately, I need to investigate another time to found them.

xunleii avatar Dec 14 '20 08:12 xunleii

I tried to advance on this once again. By using the "spawn" method of multiprocessing I stumble on problems with callbacks I did not investigate further.

I just bypassed the whole multiprocessing thing in order to test with

diff --git a/kapitan/targets.py b/kapitan/targets.py
index bb952b0..9d3adbd 100644
--- a/kapitan/targets.py
+++ b/kapitan/targets.py
@@ -123,7 +125,9 @@ def compile_targets(
 
         # compile_target() returns None on success
         # so p is only not None when raising an exception
-        [p.get() for p in pool.imap_unordered(worker, target_objs) if p]
+        for x in target_objs:
+            worker(x)
 
         os.makedirs(compile_path, exist_ok=True)
 

and it mostly works with a lot of performance improvements.

On my test case target compilation itself is around 10 to 25 times faster with go-jsonnet than with cpp-jsonnet (my typical target takes around 5s to 20s with cpp-jsonnet and arount 0.10s to 0.80s with go-jsonnet).

This means than even disabling multiprocessing when launching kapitan compile on my 55 targets, I am better off with go-jsonnet and without multiprocessing than before.

  • cpp-jsonnet and multiprocessing: 228,19s user 1,95s system 337% cpu 1:08,28 total
  • go-jsonnet no multiprocessing: 53,92s user 0,95s system 112% cpu 48,861 total

This is not the case on single target compile as there seems to be some heavier startup time.

Note: I have problem with the yaml_load callback and I just disabled everything relying on it on my tests, I still must investigate on those callback problems.

joulaud avatar Dec 17 '20 09:12 joulaud

Hey @joulaud happy new year!

I have very similar observations - disabling multiprocessing for compilation is actually faster and having it enabled hangs compilation when using go-jsonnet. I have yet to assess this better, but it feels to me that a future without multiprocessing (at least for compilation) will be much simpler to develop with.

I have committed my latest changes for go-jsonnet support at https://github.com/ramaro/kapitan/tree/gojsonnet It disables multiprocessing as you suggest but also adds caching for go-jsonnet import paths and a new component to simply test the yaml_load() callback. Tests pass on Linux running Python 3.8.

Let me know if this works for you!

ramaro avatar Jan 08 '21 23:01 ramaro

Hi @ramaro!

Thank you very much for this New Years gift. I have been using it for a week and it works pretty well. The compilation works much better than before (16s to 0.50s on average), without making any changes after compilation. ~Unfortunately on larger targets I didn't see huge differences but it's probably due to the Jsonnet code which can be optimized on my end (or a misconfiguration of which Kapitan I use)~.

EDIT: It's a misconfiguration on my side (I have both kapitan and for the largest one, I've used the "old" kapitan)

xunleii avatar Jan 19 '21 10:01 xunleii

From all the team and @joulaud, please accept our warmest thanks @ramaro !

uZer avatar Jan 19 '21 11:01 uZer

I repeat the thanks to @ramaro This change is a life-saver for our daily workflow and we just migrated fully to your branch.

What would be the next step to make this branch in state of being mergeable for next release of kapitan? Can we help in anything?

joulaud avatar Jan 25 '21 14:01 joulaud

@joulaud @uZer @xunleii hey apologies, I haven't been able to look at this further. I did spent sometime a few weeks ago on this and on my current infra (~140 jsonnet targets), kapitan with go-jsonnet seems to run out of memory. So while I'm not sure go-jsonnet is ready for primetime yet (more debugging pending, could be an issue with Kapitan), maybe what we can do is have kapitan load go-jsonnet instead of jsonnet if the go-jsonnet module is present. So you could use it and help figure out any issues. How does that sound?

ramaro avatar Feb 25 '21 21:02 ramaro

If you come across anything that can be reproduced with go-jsonnet (either the command or the Python bindings), please let me know!

sbarzowski avatar Feb 26 '21 10:02 sbarzowski

@joulaud @uZer @xunleii hey apologies, I haven't been able to look at this further. I did spent sometime a few weeks ago on this and on my current infra (~140 jsonnet targets), kapitan with go-jsonnet seems to run out of memory.

I did just try to compile 25 full kube-prometheus stacks and didn't notice any unbound increase in memory usage. The memory usage climb around 500MB in 4 or 5 runs and then remain at this level until end of compilation. This is with golang 1.16 and default gojsonnet release (pip install gojsonnet).

maybe what we can do is have kapitan load go-jsonnet instead of jsonnet if the go-jsonnet module is present. So you could use it and help figure out any issues. How does that sound?

That would be a great way to let more people experiment with it.

Jean-Daniel avatar Mar 05 '21 15:03 Jean-Daniel

@ramaro It would be great to get the option to use go-jsonnet. I'm currently in the progress of adding kube-prometheus to our stack to replace the old prometheus install, but I'm not to happy about the 2 minute extra compile time. I tested it and it solves my issue at least. Can I help you by forking your branch and make it automatically import gojsonnet over jsonnet if available so we can get it merged and release?

pvanderlinden avatar Mar 30 '21 15:03 pvanderlinden