kun-scheduler [Feature]Flink operator and help wanted


Open jentle opened this issue 4 years ago • 0 comments

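Is your feature request related to a problem? Please describe. An operator for submitting Flink applications. However, workflow does not support continuous tasks yet, so a streaming application can only be run as a oneshot task that never stops. Code branch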

Describe the solution you'd like See README.md

Additional context During development I ran into the following problems, which are still unresolved.

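Classpath conflicts when workflow executes an operator

When workflow executes an operator, it explicitly adds the current Java classpath to the launch command. The relevant code is excerpted here: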

command.add(buildClassPath());
command.add("com.miotech.kun.workflow.worker.local.OperatorLauncher");
      

private String buildClassPath() {
    String classPath = System.getProperty("java.class.path");
    checkState(StringUtils.isNotEmpty(classPath), "launcher jar should exist.");
    return classPath;
}

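In practice, though, only the uber jar containing OperatorLauncher's dependencies is needed. Even after adding only OperatorLauncher's dependencies, they still conflict with the Flink operator's own dependencies (guava-3.0, required by hadoop-2.7.3). I found no way to resolve this, so during development I used some hardcoding to build the uber jar. For operators that users develop themselves, the chance of jar conflicts is fairly high.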
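One common way to reduce this kind of conflict is to load each operator's jars through a child-first class loader, so that the operator's own copy of a library (for example Guava) wins over whatever is already on the worker's classpath. The sketch below is not taken from kun-scheduler: `ChildFirstClassLoader` is a hypothetical name, and it assumes the operator jar can be loaded inside the launcher JVM instead of being appended to `java.class.path`.

```java
import java.net.URL;
import java.net.URLClassLoader;

// Hypothetical helper, not part of kun-scheduler: resolves classes from the
// operator's own jars first and only then falls back to the parent loader.
class ChildFirstClassLoader extends URLClassLoader {

    ChildFirstClassLoader(URL[] operatorJars, ClassLoader parent) {
        super(operatorJars, parent);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            Class<?> loaded = findLoadedClass(name);
            if (loaded == null) {
                if (name.startsWith("java.") || name.startsWith("javax.")) {
                    // JDK classes must always come from the parent chain.
                    loaded = super.loadClass(name, false);
                } else {
                    try {
                        // Child first: look in the operator's jars.
                        loaded = findClass(name);
                    } catch (ClassNotFoundException notInChild) {
                        // Not shipped with the operator: use the parent's copy.
                        loaded = super.loadClass(name, false);
                    }
                }
            }
            if (resolve) {
                resolveClass(loaded);
            }
            return loaded;
        }
    }
}
```

Interfaces shared between the launcher and the operator would still have to come from the parent loader, so a real implementation would probably need a configurable list of parent-first package prefixes.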

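Resource configuration files

At runtime the Flink operator depends on several Hadoop configuration files. For now their contents are simply passed in as plain text through the operator config, which is rather ugly.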

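name	type	note
hadoopConfYarn	string	Contents of the Hadoop yarn-site.xml configuration file, in XML format
hadoopConfCore	string	Contents of the Hadoop core-site.xml configuration file, in XML format
hadoopConfHdfs	string	Contents of the Hadoop hdfs-site.xml configuration file, in XML format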
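To make the table concrete, here is a minimal sketch of how the three plain-text values could be turned back into files before the Flink client starts. It is an illustration only: `HadoopConfDir` and `materialize` are hypothetical names, and the operator on the branch may handle these values differently.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: writes the XML strings from the operator config
// into a temporary directory laid out like a Hadoop conf directory.
final class HadoopConfDir {

    // Operator config key -> file name that Hadoop expects.
    static final Map<String, String> KEY_TO_FILE = new LinkedHashMap<>();

    static {
        KEY_TO_FILE.put("hadoopConfYarn", "yarn-site.xml");
        KEY_TO_FILE.put("hadoopConfCore", "core-site.xml");
        KEY_TO_FILE.put("hadoopConfHdfs", "hdfs-site.xml");
    }

    private HadoopConfDir() {
    }

    static Path materialize(Map<String, String> operatorConfig) throws IOException {
        // Validate everything first so no half-written directory is left behind.
        for (String key : KEY_TO_FILE.keySet()) {
            String xml = operatorConfig.get(key);
            if (xml == null || xml.trim().isEmpty()) {
                throw new IllegalArgumentException("missing operator config: " + key);
            }
        }
        Path dir = Files.createTempDirectory("hadoop-conf-");
        for (Map.Entry<String, String> entry : KEY_TO_FILE.entrySet()) {
            byte[] xml = operatorConfig.get(entry.getKey()).getBytes(StandardCharsets.UTF_8);
            Files.write(dir.resolve(entry.getValue()), xml);
        }
        return dir;
    }
}
```

The returned directory could then be exported as `HADOOP_CONF_DIR` for the process that submits the job, assuming the submission runs in a separate process.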

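I looked at the Spark operator, which has similar configuration. It is currently hard coded, which is not very safe; the file system connection information could be configured entirely through core-site.xml. As I recall, workflow used to have a concept of resources, so the configuration files could be fetched through the resource-related interfaces instead of being exposed directly in code.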
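As a rough illustration of the resource idea, the operator config could carry a reference to a file and resolve it at runtime, instead of carrying the file contents. The sketch below does not use workflow's actual resource interfaces: `ConfigResource` and `ConfigResources` are hypothetical names, and only `file:` URIs are handled.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical abstraction: something that can open a configuration file.
interface ConfigResource {
    InputStream open() throws IOException;
}

// Hypothetical factory: resolves a URI from the operator config to a resource.
final class ConfigResources {

    private ConfigResources() {
    }

    static ConfigResource of(String uri) {
        URI parsed = URI.create(uri);
        if (!"file".equals(parsed.getScheme())) {
            // Other schemes would need their own implementation.
            throw new IllegalArgumentException("unsupported resource scheme: " + uri);
        }
        return () -> Files.newInputStream(Paths.get(parsed));
    }

    static String readText(ConfigResource resource) throws IOException {
        try (InputStream in = resource.open()) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        }
    }
}
```

With something like this, `hadoopConfCore` would hold a reference to core-site.xml rather than its XML contents, and the operator code would never embed connection details.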

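Further Plan For now the Flink operator only submits and runs Flink applications; it is not integrated with workflow. One option is to define stream tables on top of the metadata and register them in Flink's metastore, then use Flink SQL directly. The result of a query is also a stream table, which can be written straight to downstream stores such as ES, databases, graph stores, or Kafka. This still depends on how workflow itself is defined and planned.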
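To give a feel for what defining a stream table from metadata might look like, here is a small sketch that renders Flink SQL statements from a column map and a connector option map. `StreamTableDdl` and its methods are hypothetical, the metadata model is reduced to plain maps, and nothing here reflects how workflow's metadata is actually structured.

```java
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: renders Flink SQL text from simple metadata maps.
final class StreamTableDdl {

    private StreamTableDdl() {
    }

    // columns: column name -> Flink SQL type; options: connector options.
    static String createTable(String name, Map<String, String> columns, Map<String, String> options) {
        StringJoiner cols = new StringJoiner(",\n  ", "  ", "");
        columns.forEach((column, type) -> cols.add("`" + column + "` " + type));

        StringJoiner opts = new StringJoiner(",\n  ", "  ", "");
        // Single quotes inside option values are escaped by doubling them.
        options.forEach((key, value) -> opts.add("'" + key + "' = '" + value.replace("'", "''") + "'"));

        return "CREATE TABLE `" + name + "` (\n" + cols + "\n) WITH (\n" + opts + "\n)";
    }

    // Writes the result of a query into a previously declared sink table.
    static String insertInto(String sinkTable, String selectSql) {
        return "INSERT INTO `" + sinkTable + "` " + selectSql;
    }
}
```

For a Kafka-backed table the option map would typically hold entries such as `connector`, `topic`, `properties.bootstrap.servers` and `format`, and the generated statements could be passed to `TableEnvironment.executeSql` in a Flink job. Pass maps with a stable iteration order (for example `LinkedHashMap`) so the column order is preserved.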

Jan 24 '21 03:01 jentle
