[task-relaunch] Add ability to define default deployment properties for an App
Hi there,
I have been playing around with SCDF quite a bit lately and like it very much, so thanks for all your hard work!
There is one feature that I'm missing though: I'm using SCDF with Dockerized task apps. Some of those need persistent volume claims and other complicated properties like that. Afaik the only way to specify this is when launching the task app.
Having to pass this every time the app is launched is error-prone. Also, since these properties are pretty low level, I think it would make sense to keep them out of the user's/client's way as much as possible. My suggestion/wish would be to add some way to set such deployment properties when the app is registered.
Thanks for your time and kind regards, Philipp
Hi @philippn , Thanks for trying out SCDF and your input.
The feature you mention above is something we'd like to showcase as a recipe in our SCDF site: https://github.com/spring-io/dataflow.spring.io/issues/247.
We'll keep you posted on our attempt to set this up.
@philippn: It is indeed great to see this feedback; thank you for the support! Apart from what Ilaya pointed out, you could also define platform accounts as SCDF configuration. In the desired platform account, you could plug in the volume claims as global configuration that is applied at every launch.
So, when launching the tasks, you'd select the platform (with that extra configuration) via the optional --platformName property.
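For example, something along these lines in the server's application.yml (a rough sketch only: the account name pvc-tasks, the claim task-data-pvc, and the mount path are made up, and the exact volumes/volume-mounts format can vary by deployer version):

```yaml
spring:
  cloud:
    dataflow:
      task:
        platform:
          kubernetes:
            accounts:
              pvc-tasks:                 # select at launch with --platformName=pvc-tasks
                namespace: default
                volumes:
                  - name: task-data
                    persistentVolumeClaim:
                      claimName: task-data-pvc   # hypothetical PVC
                volumeMounts:
                  - name: task-data
                    mountPath: /data
```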
@ilayaperumalg I'm looking forward to your showcase, thanks in advance!
@sabbyanandan That is very interesting, thank you! It definitely comes in handy when deploying all of it on different cloud providers.
My specific use case is really geared towards specific apps though. For example, I have a task app that needs a certain PVC, while others do not, and so on. This is not limited to deployer properties; for plain application properties, as far as I understand, you can already utilize the metadata jar approach and then configure them per app/task. Something similar for deployer properties would be quite useful.
For my particular use case, it's not a big issue though, because the tasks are launched via the REST API from an external system. Basically, I just need to make sure that this system manages these properties for now.
So I just thought this might be useful for more people, too. Thanks to both of you for your time, and keep up the good work!
I recently ran into this issue as well. I deploy the Data Flow server into a k8s cluster using the Helm chart. There are some properties in the tasks that need to be overridden (e.g. an internal storage server address).
From what I understand, there are three ways to pass in configuration for a task:
- In the platform's default settings inside the Data Flow server's application.yml, which apply to all tasks.
- As deployer properties when launching a task. These apply to that task execution, but the parameters have to be provided each time the task is run.
- Create one platform per task.
But there are limitations in each approach.
- With approach 1, the configuration of tasks lives in the Data Flow server, and the server needs to be aware of all the tasks it runs. So far I haven't found a way to configure each task individually, because it is a platform-level configuration, which makes the configuration a bit messy. Also, the resources, volumes, and environment variables may differ from one task to another, making this approach less feasible.
- With approach 2, I need to pass in the configuration each time I launch a task, which transfers the responsibility for configuration to the user of the tasks; not ideal either.
- With approach 3, we get a compromise between 1 and 2. Whoever triggers the task needs to provide a platform name, as mentioned above. That's easier than providing a whole list of properties, but it still exposes some internal details to the caller.
Wondering if it's possible to have a more flexible way to configure tasks. Here are some thoughts:
- Provide default properties of an application when registering it with the Spring Cloud Data Flow server. This way, the configuration is done at registration/update time, and it separates Data Flow server configuration from task configuration.
- Be able to provide per-app configuration in the Data Flow server's configuration. It is less ideal compared to the first option, but maybe it's easier to implement in the current framework (I sketch a hypothetical example of this below).
The main idea is to separate the concerns between the task, the Data Flow server, and the downstream services that trigger a task. Each task takes care of its own configuration and registration, the Data Flow server takes care of platform settings, and the downstream caller only cares about the data passed into the task.
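As a purely hypothetical illustration of the second thought (none of the keys below exist in SCDF today; the task name and values are made up), per-app defaults in the server's configuration might look like:

```yaml
# Hypothetical, not an existing SCDF feature: per-app defaults kept in the
# server's configuration, so launches no longer need to carry them.
spring:
  cloud:
    dataflow:
      task:
        app-defaults:                    # made-up key
          my-import-task:
            deployer.kubernetes.limits.cpu: 500m                         # deployer default
            app.storage.server.address: "http://storage.internal:9000"   # app default
```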
Thanks!
Hello @guoyiang,
Thank you for your feedback.
Let's explore this topic a bit more.
When you say task, do you mean the Task App or the Task Definition?
For your first point, "From what I understand, there are three ways to pass in configuration for a task:"
- You can set application properties for a task at task definition creation time, i.e. `timestamp --timestamp.format=YYYY`.
- For deployer properties, I'll raise an issue during our next standup to see if we can add deployer properties (like app properties) to a task definition or at app registration.
For your second point, "As deployer properties when launching a task. These apply to that task execution, but the parameters have to be provided each time the task is run."
- When you are using properties (deployer or application) for a task definition, each subsequent launch of that task definition will reuse the properties from the previous execution. This is discussed as part of the CI/CD docs located here: https://dataflow.spring.io/docs/batch-developer-guides/continuous-deployment/cd-basics/ . So you should only have to apply these for the first launch of a task definition.
- We will be writing up an entry on the use of config maps as discussed here: spring-io/dataflow.spring.io#247.
- Also you can look at using a config server. https://cloud.spring.io/spring-cloud-config/reference/html/
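For instance (a sketch; the config server URL below is a placeholder), the server's global task application properties could point every launched task at a config server:

```yaml
# Applied to every task the server launches; the URL is a placeholder.
spring:
  cloud:
    dataflow:
      applicationProperties:
        task:
          spring.config.import: "configserver:http://my-config-server:8888"
```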
@cppwfs Thanks a lot for the info. Great hints, and I understand a bit more now.
I mixed up Task App and Task Definition, because we have a simple task definition with only one task at the moment. But the end goal is to pass some configuration when running a task app, which is triggered through a definition.
I played a bit with the approaches you mentioned:
- Set application properties at task definition creation time
I missed this point when reading the docs, but it actually meets the need of passing some default configuration to the app. That's one of the reasons I was considering using deployment properties: to pass in some environment variables to configure the app.
However, I see a minor drawback with this approach:
- Updating a task definition is not allowed, so in order to change a property you have to delete and re-create the task definition, which incurs a minor downtime. It's also not as flexible as using a YAML file.
- Launch a task definition with deployer properties so that subsequent runs use the same properties
This approach works for configuring the app as well. However, it has its own limitations:
- Launching the task is required to make the properties persist in Spring Cloud Data Flow, but some parameters are unknown at deployment time. E.g. with a task app that does a batch import, at the time of rolling out a new task version I do not yet have the file to be imported.
- It's tricky to remove a property. From my experiments, I need to set the value to empty to remove it, and there's no "overwrite" mode. So I need to query the last execution to get the applied deployer properties and compare them with the current ones to find the difference. Furthermore, deployer property names get changed to `spring.cloud.deployer.kubernetes.*` from the original setting `deployer.<app_name>.kubernetes.*`.
It would be great if there were a formal way to configure deployer properties of a task app or task definition, instead of implicit inheritance from the last run, so we can set these properties without needing to run the task or check the last run. I think this would be covered by what you already mentioned: allowing deployer properties to be set at task definition or app registration. If this can be added, ideally along with the capability to update a task definition, I think it will give a lot of flexibility in configuring task apps/definitions. Looking forward to the news!
Spring Cloud Config Server is also another approach we'll evaluate, but probably over the longer term because of the complexity of deploying one more component.
Thanks again!
Noticed there's a similar issue #2194
@guoyiang I had a discussion with the team on this topic, and we created the following issue to track what you have described: https://github.com/spring-cloud/spring-cloud-dataflow/issues/4423. This should address your request. Thank you for providing excellent feedback.
> We will be writing up an entry on the use of config maps as discussed here: spring-io/dataflow.spring.io#247.
So, do you mean that I can use configMaps for deployer properties as well, not only for application properties?
How can I do that?
It's possible to pass application properties to a task by launching it with the deployment property below:
`deployer.mytask.kubernetes.config-map-refs=myconfigmap`
But specifying the usage of a ConfigMap can (as far as I know) only be done through a deployer property passed to the task launcher... so how can I define a ConfigMap that contains the deployer properties for a specific task?
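My current understanding (an assumption on my part, I may well be wrong) is that a ConfigMap referenced this way only ends up as environment variables inside the task container, so it can carry application properties but not deployer properties, which the Data Flow server consumes before the pod even exists. Something like:

```yaml
# Sketch of a ConfigMap usable via config-map-refs (all names are made up).
# Its keys become environment variables in the task container, which is why
# it can supply application properties but not deployer properties.
apiVersion: v1
kind: ConfigMap
metadata:
  name: myconfigmap
data:
  SPRING_MAIN_BANNER_MODE: log
  STORAGE_SERVER_ADDRESS: "http://storage.internal:9000"   # hypothetical
```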
Thanks in advance
Hello!! We are trying out SCDF in our project, and were also surprised by these configuration nuances. I think that this sentence by @guoyiang synthesizes the problem:
> The main idea is to separate the concerns between the task, the Data Flow server, and the downstream services that trigger a task.
The current design of SCDF forces you to specify three different kinds of information at launch time:
- application properties (e.g. Spring Cloud Config URL)
- deployment properties (e.g. container CPU limit in Kubernetes platform)
- command line arguments (for Spring Batch)
The first two should be abstracted away from the team responsible for launching the tasks. In other words, this team should only be concerned with the what (the task to launch, plus its business parameters), not the how (technical details such as the entrypoint style used to start up the container in Kubernetes).
Moreover, successive task executions should not depend on each other; that dependence makes reasoning about the "current" deployment configuration quite cumbersome. Launches should be deterministic. I guess this behavior was implemented in order to avoid passing deployment parameters over and over (such as the entrypoint style in the example above).
Please let me give my two cents and share how we just solved it in our local SCDF instance. We patched DefaultTaskExecutionService as follows:
```java
public long executeTask(String taskName, Map<String, String> taskDeploymentProperties, List<String> commandLineArgs) {
    ...
    // Merge task-specific defaults (from configuration) with launch-time
    // properties; launch-time properties take precedence.
    Map<String, String> launchProperties = new HashMap<>();
    launchProperties.putAll(taskConfigurationProperties.getProperties());
    launchProperties.putAll(taskDeploymentProperties);
    taskExecutionInformation.setTaskDeploymentProperties(launchProperties);
    // Finally create App deployment request
    AppDeploymentRequest request = this.taskAppDeploymentRequestCreator.createRequest(taskExecution,
            taskExecutionInformation, commandLineArgs, platformName, launcher.getType());
    TaskManifest taskManifest = createTaskManifest(platformName, request, launchProperties);
    ...
}
```
We added a new "properties" attribute to the TaskConfigurationProperties class, which lets us specify task-specific properties (both "app" and "deployer"). If additional properties are provided at launch time (e.g. the "properties" param in /tasks/executions), these take precedence. The manifest is persisted with the effective configuration used to launch the task, but it is not used to calculate the configuration for successive launches.
Global properties and job-specific properties can be configured in any property source in the Spring environment, such as a ConfigMap, Cloud Config, or Zookeeper.
```yaml
# GLOBAL APPLICATION PROPERTIES
spring.cloud.dataflow.applicationProperties.task:
  spring.config.import: configserver:localhost:9393
  spring.main.banner-mode: log

# GLOBAL DEPLOYER PROPERTIES
spring.cloud.dataflow.task.platform.kubernetes.accounts.default:
  createJob: false
  entryPointStyle: shell

# TASK-SPECIFIC APPLICATION/DEPLOYER PROPERTIES (the new "properties" attribute)
spring.cloud.dataflow.task.properties:
  deployer.my-task.kubernetes.limits.cpu: 270m
  app.my-task.spring.main.banner-mode: "off"   # quoted so YAML keeps the string, not a boolean
```
All in all, task configuration and launch are now separated, and, as a result, can be addressed by different teams.
I need to use a single Docker image, containing a Java Spring Boot application, to launch different business-level tasks by feeding in different Java properties at runtime. Say I would like to define 40 Applications in SCDF, all of them using the same Docker image. These 40 Applications would also be the artifacts involved in composed task definitions.
As a result, I need to define application properties at the time I create the Applications. Will this requirement be addressed soon?