Add `javascript` processor

Open kabukky opened this issue 2 years ago • 9 comments

You won't remember, but a few months ago I went on discord and asked if a javascript processor is something you guys would consider merging into main.

I got a very positive reaction, so here it is!

There are a few TODOs in here that are worth discussing I think, so feel free to comment.

Here are two examples:

With inline code:

config.yaml

input:
  http_server:
    path: /test
    allowed_verbs:
      - POST
    sync_response:
      status: 200

pipeline:
  processors:
  - javascript:
      code: |
          console.log(getMeta("http_server_request_path"));
          var root = getRoot();
          root.foo = "bar";
          setRoot(root);

output:
  sync_response: {}
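The inline script can be tried outside Benthos with a small harness. This is only a sketch: `createHarness` is a hypothetical helper, and its `getMeta`, `getRoot` and `setRoot` are stand-ins that assume the processor hands the script the parsed message body and a flat metadata map. The real functions are provided by the processor and may behave differently.

```javascript
// Hypothetical stand-ins for the functions the processor would provide.
// Assumption: getRoot returns the parsed body and setRoot replaces it.
function createHarness(body, meta) {
  let current = JSON.parse(JSON.stringify(body)); // work on a copy
  return {
    getMeta: (key) => meta[key],
    getRoot: () => current,
    setRoot: (value) => { current = value; },
    result: () => current,
  };
}

const h = createHarness({ id: 1 }, { http_server_request_path: "/test" });
const { getMeta, getRoot, setRoot } = h;

// Same logic as the inline script in config.yaml.
console.log(getMeta("http_server_request_path")); // prints "/test"
var root = getRoot();
root.foo = "bar";
setRoot(root);

console.log(JSON.stringify(h.result())); // prints {"id":1,"foo":"bar"}
```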

With .js files:

config.yaml

input:
  http_server:
    path: /test
    allowed_verbs:
      - POST
    sync_response:
      status: 200

pipeline:
  processors:
  - javascript:
      file: test.js

output:
  sync_response: {}
  processors:
    - bloblang: |
        root = "see console"

test.js

let foo = require("foo");

console.log(foo.bar());

foo.js

const bar = () => {
    return "foobar!"
}

exports.bar = bar;

I've tested this a bit and found JavaScript very helpful when complex logic is needed that is sometimes hard to read or difficult to implement in bloblang.

This is a first draft that works well for me. What do you think?

kabukky avatar Aug 22 '22 17:08 kabukky

Hey @kabukky, this is awesome, it might take me a little while to dig into the implementation properly but at a glance it looks great and this is going to be super powerful.

Jeffail avatar Aug 24 '22 15:08 Jeffail

@Jeffail that is good to hear. Please let me know if you have any questions or suggestions. I will get right to it.

Best

kabukky avatar Aug 25 '22 14:08 kabukky

Maybe add v8go (https://github.com/rogchap/v8go) support instead of goja? It's a binding for V8 and it's much faster than goja (which is used in the current implementation). From this comment: https://github.com/dop251/goja/issues/2#issuecomment-426429140

function factorial(n) {
    return n === 1 ? n : n * factorial(--n);
}

var i = 0;

while (i++ < 1e6) {
    factorial(10);
}

The execution times were roughly:

otto: 33.195s
goja: 3.937s
duktape: 1.545s
v8 (go binding): 0.309s
v8 native (d8): 0.187s

In that test, v8go was roughly 10 times faster than goja.
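To repeat the comparison on a particular engine, the loop can be wrapped in a timer. This is a rough sketch, not the original benchmark harness: `timeFactorialLoop` is a name made up here, the factorial is written slightly differently from the quoted one, and `Date.now()` only gives millisecond resolution.

```javascript
// Plain recursive factorial, equivalent in result to the quoted version.
function factorial(n) {
  return n <= 1 ? 1 : n * factorial(n - 1);
}

// Hypothetical helper: run factorial(10) `iterations` times and
// report the elapsed wall-clock time in milliseconds.
function timeFactorialLoop(iterations) {
  const start = Date.now();
  let last = 0;
  for (let i = 0; i < iterations; i++) {
    last = factorial(10);
  }
  return { elapsedMs: Date.now() - start, last };
}

const { elapsedMs, last } = timeFactorialLoop(1e6);
console.log(`factorial(10) = ${last}, 1e6 iterations took ${elapsedMs} ms`);
```

The absolute numbers depend heavily on the machine and engine version, so only the ratio between engines on the same machine is meaningful.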

packman80 avatar Aug 27 '22 09:08 packman80

@packman80 I'm not sure the performance benefit is worth the hassle of using v8go. v8go would add dependencies on cgo and the V8 binaries to Benthos. And as far as I can see, v8go only ships binaries for Linux and macOS.

A Go-native JavaScript VM is much easier to maintain in my opinion. But if the maintainers think otherwise, I will change this to another Go JS library, no problem.

kabukky avatar Aug 27 '22 10:08 kabukky

@kabukky Great feature. Can I directly read and write the cache, or directly access data from other components by label name? It would be even better if you could call an output directly. When I need mqtt.publish, the current output mechanism is too cumbersome. If you could use outputs.getCompentsByLabel(labelname).asMqtt.publish(json) directly, that would be great.

darcyg avatar Oct 25 '22 03:10 darcyg

@darcyg That's not possible at the moment. @Jeffail Is that something that fits into the design principles of Benthos?

kabukky avatar Oct 25 '22 10:10 kabukky

I think it's not a big problem if you use a scripting engine in your outputs. Using JavaScript for conditional branching can simplify configuration logic, especially switch/case configuration with nested branches. My ambition is to implement somewhat more complicated logic, especially conditional output or data reads and writes. The YAML is too complicated to configure, so I had to do it with multiple streams.

Of course, I use Benthos as a rule engine, which is not exactly the idea of a streaming engine.

darcyg avatar Oct 25 '22 10:10 darcyg

@kennyp I redid the javascript patch on the latest Benthos (4.10). It does not use the v4/public/service interface; the interfaces under v4/internal/ are used instead. This makes it convenient to later read caches and call other processors, outputs and components.

0001-process-javascript-new.patch.zip

My code is on my own gitlab server, so I submit it as a patch.

darcyg avatar Nov 02 '22 04:11 darcyg

@kabukky @Jeffail Based on the previous patch files, I added cache and output resource access.

cat test.yaml

input:
  http_server:
    path: /test
    allowed_verbs:
      - POST
    sync_response:
      status: 200

pipeline:
  processors:
  - javascript:
      cache_res:
        - testa
      code: |
          console.log(getMeta("http_server_request_path")+" 1234");
          var root = getRoot();
          root.foo = getMeta("http_server_request_path")+"_1234";
          setRoot(root);
          setCache("cc1","300s");
          root.foo1 = getCache("cc1");
output:
  label: js1
  broker:
    outputs:
      - javascript:
          cache_res:
            - testa
          output_res:
            - mqtt_ores
          code: |
            console.log(getMeta("http_server_request_path")+" 5678");
            var root = getRoot();
            root.foo = getMeta("http_server_request_path")+"_5678";
            root.topic = "abc"+root.foo;
            console.log(root.foo);
            if (root.abc == "1") {
              console.log("check yes");
              benthos_output("mqtt_ores",JSON.stringify(root))
            } else {
              console.log("check no");
            }
           
      - sync_response: {}
      
cache_resources:
  - label: testa
    memory:
      default_ttl: 600s
      compaction_interval: ""
      init_values:
        cc1: bar1
        cc2: bar2
        
output_resources:
  - label: mqtt_ores
    mqtt:
      urls:
        - tcp://192.168.36.1:1883
      topic: ${! json("topic")}

call

curl -X POST http://127.0.0.1:4195/test -d '{"abc":"1"}'  # sends MQTT data, prints "check yes"
curl -X POST http://127.0.0.1:4195/test -d '{"abc":"0"}'  # does not send MQTT data, prints "check no"
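The branch taken by the output script for these two requests can be checked without Benthos or an MQTT broker. This is a sketch under assumptions: `runOutputScript` is a hypothetical wrapper, and `getMeta`, `getRoot` and `benthos_output` are stubs standing in for the functions the patch provides. Here the stub only records what would have been published.

```javascript
// Hypothetical wrapper around the output script's logic.
// getMeta, getRoot and benthos_output are stubs, not the patch's code.
function runOutputScript(body, meta) {
  const published = [];
  const logs = [];
  const getMeta = (key) => meta[key];
  const getRoot = () => body;
  const benthos_output = (label, payload) => published.push({ label, payload });

  // Same branching logic as the output script in test.yaml.
  var root = getRoot();
  root.foo = getMeta("http_server_request_path") + "_5678";
  root.topic = "abc" + root.foo;
  if (root.abc == "1") {
    logs.push("check yes");
    benthos_output("mqtt_ores", JSON.stringify(root));
  } else {
    logs.push("check no");
  }
  return { published, logs };
}

const meta = { http_server_request_path: "/test" };
const yes = runOutputScript({ abc: "1" }, meta);
const no = runOutputScript({ abc: "0" }, meta);

console.log(yes.logs[0], yes.published.length); // prints: check yes 1
console.log(no.logs[0], no.published.length);   // prints: check no 0
```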

mqtt

0003-javascript-call-cache-and-output-components.zip

darcyg avatar Nov 16 '22 11:11 darcyg

@darcyg Good job. But can you please submit it as a pull request?

packman80 avatar Dec 06 '22 16:12 packman80

I will deal with it after the release of 4.11.0 in a few days. I'm busy these days. My GitHub account is too messy; I have forked too many projects.

darcyg avatar Dec 09 '22 04:12 darcyg

Did you get a chance to look at this? I still think this would be the minimally invasive way of adding a JS processor.

Best

kabukky avatar Jan 18 '23 08:01 kabukky

Hey @darcyg, any news?

packman80 avatar Feb 07 '23 12:02 packman80

https://github.com/darcyg/benthos https://github.com/darcyg/benthos/commit/0a4a6703a30a7d4f23f32b7b7d3e7df89d646d0c

Sorry, I rarely use GitHub to manage projects. It is unclear to me how to submit a commit to this #1406 through a pull request.

Maybe you can clone my project, modify it and submit it to the main project.

I didn't write test code. I only picked up Go for this and am not proficient in it.

darcyg avatar Feb 20 '23 05:02 darcyg

Thanks @kabukky, I finally got around to working on this, I've done a pretty hefty rewrite of some bits, especially the docs/functions, but the core bits are mostly the same.

Merged via https://github.com/benthosdev/benthos/commit/226de1bf83a46862cd29052b1022c17afcf0313f and https://github.com/benthosdev/benthos/commit/813a3427ed6248dff1bf78ad62f7c552508361b4

Jeffail avatar Apr 24 '23 15:04 Jeffail

Thanks so much, @Jeffail. I'm not sure about the VM re-use via the pool. Goja runtime re-use is tricky. See here for example: https://github.com/dop251/goja/issues/291

kabukky avatar Apr 26 '23 14:04 kabukky

Yeah, I had a play around with the different options, and I've updated the docs to explain how the current system works, with caveats. Basically, if I've understood things correctly, programs that use variables should be fine as long as they encapsulate their logic within anonymous functions. I've also added examples that show it in action (the structure mutation one here: https://www.benthos.dev/docs/components/processors/javascript#examples).
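The general concern with runtime re-use can be illustrated with Node's built-in `vm` module. This is a sketch only: a single `vm` context stands in for a pooled runtime, it is not goja, and the pooling in Benthos may behave differently. `runTwice` and the two script strings are made up for the illustration.

```javascript
const vm = require("vm");

// Run the same script twice in one context, the way a pooled
// runtime might be handed to two consecutive messages.
function runTwice(code) {
  const context = vm.createContext({ results: [] });
  const script = new vm.Script(code);
  script.runInContext(context);
  script.runInContext(context);
  return context.results;
}

// Top-level `var` lives on the shared global object, so the value
// from the first run is still there on the second run.
const leaky = `
  var seen = (typeof seen === "number") ? seen : 0;
  seen += 1;
  results.push(seen);
`;

// Inside an anonymous function the variable is created fresh each run.
const encapsulated = `
  (() => {
    let seen = 0;
    seen += 1;
    results.push(seen);
  })();
`;

const leakyRuns = runTwice(leaky);
const encapsulatedRuns = runTwice(encapsulated);
console.log(JSON.stringify(leakyRuns));        // prints [1,2]
console.log(JSON.stringify(encapsulatedRuns)); // prints [1,1]
```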

If things turn ugly and it trips up users we can do some tinkering to get it right, but for now I think for the performance benefits it's worth trying this approach.

Jeffail avatar Apr 27 '23 09:04 Jeffail