
[Bug] DataSource data source caching problem

Open jack-wqing opened this issue 1 year ago • 5 comments

Search before asking

  • [X] I had searched in the issues and found no similar issues.

What happened

Class: org.apache.dolphinscheduler.plugin.datasource.api.plugin.DataSourceClientProvider

The local DataSource cache is defined as:

    private static final Cache<String, DataSourceClient> uniqueId2dataSourceClientCache =
            CacheBuilder.newBuilder()
                    .expireAfterWrite(duration, TimeUnit.HOURS)
                    .removalListener((RemovalListener<String, DataSourceClient>) notification -> {
                        try (DataSourceClient closedClient = notification.getValue()) {
                            logger.info("Datasource: {} is removed from cache due to expire", notification.getKey());
                        }
                    })
                    .maximumSize(100)
                    .build();

What you expected to happen

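expireAfterWrite: once the configured duration has elapsed, the DataSource is forcibly closed no matter what, even if it is still in use. A better approach would be to use expireAfterAccess instead, so that a DataSource is, as far as possible, not closed while it is being used.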
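To make the difference between the two policies concrete, here is a minimal sketch using only the JDK. It is a toy model, not Guava's implementation: `TinyCache`, `FakeClient`, the key `hive-1`, and the tick-based clock are all hypothetical names introduced for illustration. An entry is written once and then read every 4 ticks with a time-to-live of 10 ticks.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of write-based vs access-based expiry; NOT Guava's implementation. */
public class ExpiryPolicyDemo {

    enum Mode { AFTER_WRITE, AFTER_ACCESS }

    /** Hypothetical stand-in for DataSourceClient. */
    static final class FakeClient implements AutoCloseable {
        boolean closed = false;

        @Override
        public void close() {
            closed = true;
        }
    }

    /** Single-threaded cache with a manually advanced clock (assumption: ticks, not hours). */
    static final class TinyCache {
        private final Mode mode;
        private final long ttl;
        private long now = 0;
        private final Map<String, FakeClient> values = new HashMap<>();
        private final Map<String, Long> stamps = new HashMap<>();

        TinyCache(Mode mode, long ttl) {
            this.mode = mode;
            this.ttl = ttl;
        }

        void advance(long ticks) {
            now += ticks;
        }

        void put(String key, FakeClient client) {
            values.put(key, client);
            stamps.put(key, now);
        }

        /** Returns the cached client, or null if absent or expired. Expired clients are closed. */
        FakeClient get(String key) {
            FakeClient client = values.get(key);
            if (client == null) {
                return null;
            }
            if (now - stamps.get(key) >= ttl) {
                values.remove(key);
                stamps.remove(key);
                client.close(); // mirrors the removal listener closing the client
                return null;
            }
            if (mode == Mode.AFTER_ACCESS) {
                stamps.put(key, now); // a read refreshes the timestamp only in access mode
            }
            return client;
        }
    }

    /** Writes once, then reads every `step` ticks; reports whether the client is still open. */
    static boolean survives(Mode mode, long ttl, long step, int reads) {
        TinyCache cache = new TinyCache(mode, ttl);
        FakeClient client = new FakeClient();
        cache.put("hive-1", client);
        for (int i = 0; i < reads; i++) {
            cache.advance(step);
            cache.get("hive-1");
        }
        return !client.closed;
    }

    public static void main(String[] args) {
        System.out.println("afterWrite  survives: " + survives(Mode.AFTER_WRITE, 10, 4, 5));
        System.out.println("afterAccess survives: " + survives(Mode.AFTER_ACCESS, 10, 4, 5));
    }
}
```

In this model the write-based entry is closed on the third read (tick 12), while the access-based entry stays open because every read refreshes it. In Guava terms the proposed change amounts to replacing `.expireAfterWrite(duration, TimeUnit.HOURS)` with `.expireAfterAccess(duration, TimeUnit.HOURS)`. One caveat: access-based expiry only resets when the cache itself is read or written, so a single statement that runs longer than the duration with no other cache lookups could presumably still lose its client.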

How to reproduce

Take Hive as an example: HQL statements usually take a long time to execute, and the DataSource gets closed while it is still in use. As a result, the socket to the Hive Thrift address is closed, but the internal status check keeps running.

Example:

    sql task error and appId:[]
    java.sql.SQLException: org.apache.thrift.transport.TTransportException: SASL authentication not complete
        at org.apache.hive.jdbc.HiveStatement.waitForOperationToComplete(HiveStatement.java:399)
        at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:254)
        at org.apache.hive.jdbc.HiveStatement.executeUpdate(HiveStatement.java:490)
        at org.apache.hive.jdbc.HivePreparedStatement.executeUpdate(HivePreparedStatement.java:122)
        at com.zaxxer.hikari.pool.ProxyPreparedStatement.executeUpdate(ProxyPreparedStatement.java:61)
        at com.zaxxer.hikari.pool.HikariProxyPreparedStatement.executeUpdate(HikariProxyPreparedStatement.java)
        at org.apache.dolphinscheduler.plugin.task.sql.SqlTask.executeUpdate(SqlTask.java:323)
        at org.apache.dolphinscheduler.plugin.task.sql.SqlTask.executeFuncAndSql(SqlTask.java:220)
        at org.apache.dolphinscheduler.plugin.task.sql.SqlTask.handle(SqlTask.java:167)
        at
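The failure mode can be modelled without Hive or HikariCP. The sketch below is a simplified illustration under assumptions: `FakeClient`, `poll`, and `runLongQuery` are hypothetical names, and the eviction is simulated by calling `close()` directly in the middle of the query. It shows a task that keeps polling a client after something else has closed it.

```java
/** Simplified model of a client being closed while a long-running task still holds it. */
public class ClosedWhileInUseDemo {

    /** Hypothetical stand-in for a DataSourceClient backed by a network transport. */
    static final class FakeClient implements AutoCloseable {
        private volatile boolean closed;

        /** Simulates one status poll of a running statement. */
        String poll() {
            if (closed) {
                throw new IllegalStateException("transport closed");
            }
            return "RUNNING";
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    /**
     * Polls once, lets `midQueryEvent` happen (for example a cache eviction),
     * then polls again, as a long statement would.
     */
    static String runLongQuery(FakeClient client, Runnable midQueryEvent) {
        try {
            client.poll();
            midQueryEvent.run();
            client.poll();
            return "FINISHED";
        } catch (IllegalStateException e) {
            return "FAILED: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Nothing happens mid-query: the task completes.
        System.out.println(runLongQuery(new FakeClient(), () -> { }));

        // The cache expires the entry mid-query and its removal listener closes the client.
        FakeClient evicted = new FakeClient();
        System.out.println(runLongQuery(evicted, evicted::close));
    }
}
```

The second call fails because the task's reference outlives the cache entry: closing on eviction does not know whether a caller is still using the client.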

Anything else

No response

Version

3.1.x

Are you willing to submit PR?

  • [X] Yes I am willing to submit a PR!

Code of Conduct

jack-wqing avatar Nov 26 '24 06:11 jack-wqing

Please use the Hive task type instead of the SQL task type to execute Hive SQL.

SbloodyS avatar Nov 26 '24 07:11 SbloodyS

Please use AdHocDataSourceClient, you can find the related code in dev.

ruanwenjun avatar Nov 27 '24 11:11 ruanwenjun

Please use the Hive task type instead of the SQL task type to execute Hive SQL.

The forced close in this provider is a problem for every SQL task that uses it, not only Hive SQL.

jack-wqing avatar Nov 29 '24 08:11 jack-wqing

AdHocDataSourceClient

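In fact, forcibly closing the data source affects any SQL task that is already executing. I am on version 3.1.x, and I can see that the code I pointed out still exists on master/dev. My suggestion is still to change how the cache expires its entries.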

jack-wqing avatar Nov 29 '24 08:11 jack-wqing