Configurable DataSource property keys for spring cloud task applications
Hi, I'm using SCDF version 2.9.2 and I have tested it both locally and on Kubernetes (including scheduling). What I found out is that the SCDF server passes information about the datasource and other properties via the `SPRING_APPLICATION_JSON` environment variable when using the boot `entryPointStyle`.
Is it possible to completely disable passing datasource properties into the task application, or at least control/filter them? We have some legacy task applications that use the primary datasource for business data and a secondary one for task handling. Another case is that a task application can use a different JDBC driver, which is then overridden by the value from the SCDF server. I have read something about prefixes, but I think that was just a suggestion and has not been implemented yet.
thanks
All tasks launched from Spring Cloud Data Flow must implement Spring Cloud Task (i.e. `@EnableTask`) and have a data source that connects to the SCDF database, so this can't be disabled.
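For reference, a minimal task application looks roughly like the sketch below (the class name and runner are illustrative):

```java
import org.springframework.boot.CommandLineRunner;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.task.configuration.EnableTask;
import org.springframework.context.annotation.Bean;

@SpringBootApplication
@EnableTask // records task executions in the Spring Cloud Task tables (the SCDF database)
public class MyTaskApplication {

    public static void main(String[] args) {
        SpringApplication.run(MyTaskApplication.class, args);
    }

    @Bean
    public CommandLineRunner businessLogic() {
        return args -> System.out.println("running business logic");
    }
}
```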
However, we do have a sample showing how you can support multiple data sources in a single task/boot application: https://github.com/spring-cloud/spring-cloud-task/tree/main/spring-cloud-task-samples/multiple-datasources. This lets you select which data source the task should use.
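The idea, roughly sketched below, is to define two `DataSource` beans bound to different property prefixes and hand the bookkeeping one to a `TaskConfigurer`; the `app.datasource.*` prefixes and bean names here are made up for illustration, see the linked sample for the actual code:

```java
import javax.sql.DataSource;

import org.springframework.boot.autoconfigure.jdbc.DataSourceProperties;
import org.springframework.boot.context.properties.ConfigurationProperties;
import org.springframework.cloud.task.configuration.DefaultTaskConfigurer;
import org.springframework.cloud.task.configuration.TaskConfigurer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.context.annotation.Primary;

@Configuration
public class DataSourceConfiguration {

    // Business data source: bound to app.datasource.business.*, primary so it is
    // what the application's own repositories get injected with by default.
    @Bean
    @Primary
    @ConfigurationProperties("app.datasource.business")
    public DataSourceProperties businessDataSourceProperties() {
        return new DataSourceProperties();
    }

    @Bean
    @Primary
    public DataSource businessDataSource() {
        return businessDataSourceProperties().initializeDataSourceBuilder().build();
    }

    // Task data source: bound to app.datasource.task.*, points at the SCDF database.
    @Bean
    @ConfigurationProperties("app.datasource.task")
    public DataSourceProperties taskDataSourceProperties() {
        return new DataSourceProperties();
    }

    @Bean
    public DataSource taskDataSource() {
        return taskDataSourceProperties().initializeDataSourceBuilder().build();
    }

    // Tell Spring Cloud Task to keep its bookkeeping tables in the task data source
    // rather than the primary (business) one.
    @Bean
    public TaskConfigurer taskConfigurer() {
        return new DefaultTaskConfigurer(taskDataSource());
    }
}
```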
I understand that I can set up a secondary datasource for task handling. That is not a problem.
But SCDF will always send arguments like `spring.datasource.*` into my cloud application and override the first datasource used for business logic, right?
If so, I would like to have full control over the datasource used for task handling, meaning the task applications would have a central config with the datasource for the SCDF database.
A user wants the ability to establish the keys for the data source properties for their task applications. Currently these keys are fixed: https://github.com/spring-cloud/spring-cloud-dataflow/blob/main/spring-cloud-dataflow-server-core/src/main/java/org/springframework/cloud/dataflow/server/service/impl/TaskServiceUtils.java#L133-L155
As I mentioned, it would be great to be able to disable (or configure) all datasource arguments passed to the task application. I think the task could have full control over how it connects to the SCDF database, for example via a config server. Furthermore, passing `spring.datasource.driverClassName` is very restrictive: when SCDF runs with `org.mariadb.jdbc.Driver` but the task uses `com.mysql.cj.jdbc.Driver`, the application has to include the MariaDB dependency as well.
There have been several requests along these lines. We are going to investigate how to better handle multiple 'business database' instances as first-class citizens. Thanks for the feedback.
I've re-read the issue and wanted to make some suggestions for a workaround that should be easier than revamping SCDF to read from multiple 'business databases' that also contain the Spring Cloud Task/Batch bookkeeping tables, which is a larger design change, but one that we may eventually get to.
The Spring Cloud Task auto-configuration can be disabled and replaced with configuration that creates a custom implementation of the `TaskConfigurer` interface, say `DataFlowTaskConfigurer`. This implementation would read properties such as:

- `spring.scdf.datasource.url`
- `spring.scdf.datasource.driverClassName`
- `spring.scdf.datasource.password`
- `spring.scdf.datasource.username`
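A minimal sketch of such a configurer, assuming these custom keys and using the standard Boot `DataSourceProperties` binding (class and bean names are illustrative):

```java
import javax.sql.DataSource;

import org.springframework.boot.autoconfigure.jdbc.DataSourceProperties;
import org.springframework.boot.context.properties.ConfigurationProperties;
import org.springframework.cloud.task.configuration.DefaultTaskConfigurer;
import org.springframework.cloud.task.configuration.TaskConfigurer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class DataFlowTaskConfiguration {

    // Binds spring.scdf.datasource.url / driverClassName / username / password.
    @Bean
    @ConfigurationProperties("spring.scdf.datasource")
    public DataSourceProperties scdfDataSourceProperties() {
        return new DataSourceProperties();
    }

    // Data source pointing at the SCDF database, kept separate from the primary
    // (business) data source the rest of the application uses.
    @Bean
    public DataSource scdfDataSource() {
        return scdfDataSourceProperties().initializeDataSourceBuilder().build();
    }

    // Custom TaskConfigurer so the task/batch bookkeeping goes to the SCDF database.
    @Bean
    public TaskConfigurer dataFlowTaskConfigurer() {
        return new DataFlowTaskConfigurer(scdfDataSource());
    }

    static class DataFlowTaskConfigurer extends DefaultTaskConfigurer {
        DataFlowTaskConfigurer(DataSource scdfDataSource) {
            super(scdfDataSource);
        }
    }
}
```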
A configurer along these lines would set up the task-related infrastructure. SCDF (as it stands now) would still be sending in the 'bookkeeping' database and not the 'business' database, but there are two ways to override that without a change to SCDF:
- Use a config server: values from the config server have a higher priority than those from `SPRING_APPLICATION_JSON`.
- Pass the standard Spring Boot datasource properties (e.g. `--spring.datasource.url=...`) as command-line arguments when launching the task, since command-line arguments have a higher priority than `SPRING_APPLICATION_JSON`.
One would have to pass in the `spring.scdf.datasource.*` properties as well, and since there are credentials involved, using a config server for those is also recommended.