
[Feature] Support inputting a JSON string instead of k-v pairs when triggering a new process instance

Open WilliamChen-luckbob opened this issue 4 years ago • 5 comments

See issue #2561

So far, I can see how to input k-v pairs into a new process instance.

Inspired by Azkaban and Airflow, I'm thinking about another situation:

When starting a new process instance, the input parameters are sometimes more than simple k-v pairs, and there can also be a large number of them.

It would be better if I could input a single complex JSON string instead of multiple k-v pair parameters when starting a new process instance.

DolphinScheduler would then transform it and save it as a global map value. Other tasks could share this map using a pattern like ${aaa.bbb}.

e.g. the input is a complex JSON object, and each top-level branch may hold the initialization data for one task:

{
  "a1": { "a21": 1, "a22": 2 },
  "b1": {
    "b21": { "b31": "xxxxx", "b32": "yyyyy" },
    "b22": "cccccc"
  }
}

task1: uses init data ${a1} (which means the whole map containing a21 and a22 is required)
task2: uses init data ${b1.b21.b31} (which means only xxxxx is required)
...
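The lookup described above can be sketched in plain Java. This is only an illustration of the proposed ${a.b.c} behaviour, not DolphinScheduler code: the class NestedParamResolver and its methods are hypothetical names, and the JSON is assumed to have already been parsed into nested maps.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not part of DolphinScheduler): resolves ${a.b.c} paths against a nested map. */
class NestedParamResolver {

    // Matches ${...} and captures the dotted path inside the braces.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

    /** Walks the dotted path through nested maps; returns null if any segment is missing. */
    static Object lookup(Map<String, Object> params, String path) {
        Object current = params;
        for (String key : path.split("\\.")) {
            if (!(current instanceof Map)) {
                return null;
            }
            current = ((Map<?, ?>) current).get(key);
        }
        return current;
    }

    /** Replaces each ${path} in the template; unresolved placeholders are left untouched. */
    static String resolve(String template, Map<String, Object> params) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            Object value = lookup(params, m.group(1));
            String replacement = value == null ? m.group(0) : String.valueOf(value);
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    /** Builds the example input from this issue as nested maps. */
    static Map<String, Object> exampleParams() {
        Map<String, Object> a1 = new LinkedHashMap<>();
        a1.put("a21", 1);
        a1.put("a22", 2);
        Map<String, Object> b21 = new LinkedHashMap<>();
        b21.put("b31", "xxxxx");
        b21.put("b32", "yyyyy");
        Map<String, Object> b1 = new LinkedHashMap<>();
        b1.put("b21", b21);
        b1.put("b22", "cccccc");
        Map<String, Object> root = new LinkedHashMap<>();
        root.put("a1", a1);
        root.put("b1", b1);
        return root;
    }

    public static void main(String[] args) {
        Map<String, Object> params = exampleParams();
        System.out.println(resolve("task2 gets ${b1.b21.b31}", params)); // task2 gets xxxxx
        System.out.println(lookup(params, "a1"));                        // {a21=1, a22=2}
    }
}
```

Leaving unresolved placeholders untouched is one possible design choice; failing fast on a missing path would be equally reasonable.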

So far, the dev branch does not seem to support passing a complex JSON string when starting a new process instance.

I think this would be convenient when there are too many parameters, and also in cases where tasks support dynamic parameters.

On the other hand, inspired by Azkaban: when two tasks have an input-response relationship, e.g. taskB requires a specific response from taskA, there should be some way to get and set those parameters.

WilliamChen-luckbob avatar Feb 05 '21 09:02 WilliamChen-luckbob

Attempting this throws:

com.fasterxml.jackson.databind.exc.MismatchedInputException: Cannot deserialize instance of java.lang.String out of START_OBJECT token

There is currently no way to input a complex JSON string with more than one nested level, because Jackson tries to deserialize the whole structure directly. I think the type here should be Map<String,Object>, or there should be a way to input a JSON string and then pass it to the global variables.

WilliamChen-luckbob avatar Feb 05 '21 10:02 WilliamChen-luckbob

Do you mean that the input is given as a JSON string, and we then automatically parse it into K-V form?

CalvinKirs avatar Feb 06 '21 02:02 CalvinKirs

Do you mean that the input is given as a JSON string, and we then automatically parse it into K-V form?

@CalvinKirs Nope. Sorry, English is not my first language, which may have led to some misunderstanding.

Inputting a JSON string and having DolphinScheduler automatically parse it into K-V form is only a convenient way for us to set start parameters.

I can also generate the K-V pairs before I call the StartNewProcessInstance api.
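Generating the K-V pairs on the caller's side could look roughly like the sketch below, which flattens a nested map into dotted string keys before the API call. ParamFlattener is a hypothetical name, the dotted-key convention is an assumption, and non-map values (including lists) are simply stringified.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical client-side helper: flattens nested maps into dotted String-to-String pairs. */
class ParamFlattener {

    /** Returns a flat map such as {"b1.b21.b31": "xxxxx"} for the given nested input. */
    static Map<String, String> flatten(Map<String, Object> nested) {
        Map<String, String> flat = new LinkedHashMap<>();
        flattenInto("", nested, flat);
        return flat;
    }

    private static void flattenInto(String prefix, Map<?, ?> node, Map<String, String> out) {
        for (Map.Entry<?, ?> entry : node.entrySet()) {
            String key = prefix.isEmpty()
                    ? String.valueOf(entry.getKey())
                    : prefix + "." + entry.getKey();
            Object value = entry.getValue();
            if (value instanceof Map) {
                // Recurse into nested objects, extending the dotted prefix.
                flattenInto(key, (Map<?, ?>) value, out);
            } else {
                // Leaves (and lists, in this simple sketch) are stored as strings.
                out.put(key, String.valueOf(value));
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Object> b21 = new LinkedHashMap<>();
        b21.put("b31", "xxxxx");
        Map<String, Object> b1 = new LinkedHashMap<>();
        b1.put("b21", b21);
        b1.put("b22", "cccccc");
        Map<String, Object> root = new LinkedHashMap<>();
        root.put("b1", b1);
        System.out.println(flatten(root)); // {b1.b21.b31=xxxxx, b1.b22=cccccc}
    }
}
```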

The primary point is:

I think that in some cases tasks accept not only form-data as K-V pairs, but also form-data as K-JSON pairs. For example, the ProcessDefinition API (/dolphinscheduler/projects/{projectName}/process/save) accepts a form-data parameter whose value is a JSON string.

Tasks in the ProcessInstance may have the same situation.

But when I try to send a start parameter (K as string, V as JSON string), it leads to the exception.

Currently DolphinScheduler only accepts (K as string, V as string). When I input start parameters as (K as string, V as JSON string), Jackson reports an exception because the value cannot be converted to Map<String,String>.

If we instead read the parameters as Map<String,Object>, then as far as I can tell (and I have tried it), Jackson does not report this exception.

That is what I meant.

By the way, here are some new ideas I came up with this weekend while using DolphinScheduler:

  1. After starting a new ProcessInstance, it would be better to return the instance ID, so that we can quickly bind it to the options of our own system, which uses DolphinScheduler as its scheduling layer. Otherwise, it is quite confusing when we try to retrieve the instance log.

  2. I also think the way Azkaban shares parameters between tasks is quite a good idea. It works the same way as in DolphinScheduler, but supports more situations.

WilliamChen-luckbob avatar Feb 08 '21 01:02 WilliamChen-luckbob

I have worked around this by generating my own runId and using it as the key under which the complex JSON string parameters are stored in Redis. I then pass this runId as a request header, and each task fetches its parameters from Redis instead of receiving them through DolphinScheduler. Still, I think supporting a complex JSON string at the start of a ProcessInstance is worth considering in the future.
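A rough sketch of that workaround is shown below. To keep it self-contained, an in-memory ConcurrentHashMap stands in for Redis, and RunParamStore and the key prefix "ds:run-params:" are hypothetical names; a real setup would replace the map with calls to a Redis client.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch of the runId workaround; the map is a stand-in for Redis. */
class RunParamStore {

    // Stand-in for a Redis connection: key -> raw JSON string.
    private final Map<String, String> store = new ConcurrentHashMap<>();

    /** Caller side: saves the JSON parameters under a fresh runId and returns that runId. */
    String register(String jsonParams) {
        String runId = UUID.randomUUID().toString();
        store.put(key(runId), jsonParams);
        return runId;
    }

    /** Task side: loads the JSON parameters for the runId received with the request. */
    String load(String runId) {
        return store.get(key(runId));
    }

    // Assumed key layout; any unique prefix would do.
    private static String key(String runId) {
        return "ds:run-params:" + runId;
    }

    public static void main(String[] args) {
        RunParamStore redisStandIn = new RunParamStore();
        String runId = redisStandIn.register("{\"a1\":{\"a21\":1,\"a22\":2}}");
        // The runId alone travels with the process instance; tasks fetch the rest.
        System.out.println(redisStandIn.load(runId));
    }
}
```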

WilliamChen-luckbob avatar Feb 09 '21 03:02 WilliamChen-luckbob

This issue has been automatically marked as stale because it has not had recent activity for 30 days. It will be closed in next 7 days if no further activity occurs.

github-actions[bot] avatar May 07 '24 00:05 github-actions[bot]

This issue has been closed because it has not received response for too long time. You could reopen it if you encountered similar problems in the future.

github-actions[bot] avatar May 15 '24 00:05 github-actions[bot]