Add proposal for flexible process types
Maintainers,
As you review this RFC please queue up issues to be created using the following commands:
/queue-issue <repo> "<title>" [labels]...
/unqueue-issue <uid>
Issues
(none)
I don't see anything in here about chaining transformations. Would we want to support that? If so, perhaps we could consider changing the names of the variables to something like $SOURCE_CMD instead of $ORIGINAL_CMD or something to that effect.
What do you think about adding an optional reason field to the transform section? Since we're going to be logging the transformation, it might be good for user experience.
I don't see anything in here about chaining transformations. Would we want to support that? If so, perhaps we could consider changing the names of the variables to something like
$SOURCE_CMDinstead of$ORIGINAL_CMDor something to that effect.
Not sure I follow, do you have an example of what you mean by chaining?
What do you think about adding an optional
reasonfield to the transform section? Since we're going to be logging the transformation, it might be good for user experience.
Sounds good to me. I'll add it.
I don't see anything in here about chaining transformations. Would we want to support that? If so, perhaps we could consider changing the names of the variables to something like
$SOURCE_CMDinstead of$ORIGINAL_CMDor something to that effect.Not sure I follow, do you have an example of what you mean by chaining?
Kind of like what @jabrown85 mentioned in the slack thread. "time wraps secretmanager wraps some-cmd".
I could see multiple buildpacks that do this transforming, so if you're the 3rd transformation buildpack, $ORIGINAL_CMD might not be accurate because it might not be the original per se, because it's already transformed. I think calling it something like $source helps to make it more apparent that we're transforming the process that may not be in its original state.
I don't see anything in here about chaining transformations. Would we want to support that? If so, perhaps we could consider changing the names of the variables to something like
$SOURCE_CMDinstead of$ORIGINAL_CMDor something to that effect.Not sure I follow, do you have an example of what you mean by chaining?
Kind of like what @jabrown85 mentioned in the slack thread. "time wraps secretmanager wraps some-cmd".
I could see multiple buildpacks that do this transforming, so if you're the 3rd transformation buildpack,
$ORIGINAL_CMDmight not be accurate because it might not be the original per se, because it's already transformed. I think calling it something like $source helps to make it more apparent that we're transforming the process that may not be in its original state.
Ohh, ok. I understand. Yes, I can see how that might get confusing, and a rename sounds reasonable. How about PREVIOUS_ or CURRENT_? I'm not necessarily opposed to SOURCE_ but it is kind of a synonym to ORIGINAL. Or we could just take off the prefix? or use a generic prefix like LAUNCH_? Thoughts?
Yeah I think dropping the prefix altogether is the right approach. It should be clear in context what those placeholders are.
Yeah I think dropping the prefix altogether is the right approach. It should be clear in context what those placeholders are.
Done.
@joeybrown-sf Any thoughts on the alternatives section? or the format of the toml in the spec changes section? Even if it's just to say you prefer it the way it is now. Lot of ways to do this, just hoping to get a little feedback on that. thanks!
sure thing!
I do prefer the single transform block, but that's just my personal preference. I like it because it's a bit more terse, less nested, and self-explanatory. That being said, I like "transformations" better than "transforms" because the former is always a noun where the latter could be a verb.
In my opinion, the following format would is my favorite.
[[transformations]]
type = "task"
command = ["time", $CMD]
args = ["more", "args"]
working-dir = "/somewhere-else"
@dmikusa do you want to take a round of edits given the feedback and get this up for vote soon?
@dmikusa do you want to take a round of edits given the feedback and get this up for vote soon?
@jabrown85 I will try, but things are a little busy over at Paketo at the moment and it's eating up my time. I will try.
Sorry for the delay. I've updated this to use the transformations block instead. I think this is ready for another review.
Following the discussion on the #buildpack-authors Slack channel, I have studied the 'Flexible Process Types' proposal regarding the use case:
Is there a way for an executable included in a CNB buildpack to start automatically during the application startup and rebind the
PORTenvironment variable that's visible to web processes?Background: I'm porting a Heroku legacy buildpack to CNB (https://github.com/wwwhisper-auth/wwwhisper-heroku-buildpack). The buildpack provides an authorization proxy that runs in front of a web app. Currently, the proxy starts automatically from a
.profile.dscript, binds the publicPORTpassed by Heroku runtime, and remaps thePORTto a local port. This allows users to simply run Herokubuildpack:add ...to enable the proxy without modifying theirProcfile.
It looks like the proposal addresses this use case. The authorization web proxy buildpack would provide the following launch.toml:
[[transformations]]
type = "web"
command = ["authproxy", "'$CMD_STRING'"]
reason = "Starting authorization proxy in front of the web app"
The authproxy process would then start listening on externally accessible PORT and change PORT to a local port before executing CMD_STRING as a shell script. For this use case, it is important for the CMD_STRING to still contain the original command (for example, fastapi run --port $PORT --host ::), not the expanded command (fastapi run --port 14000 --host ::), so PORT rebinding is visible in the executed command. My understanding is that this is what the RFC proposes.
Short: I am :+1: for adding a way to do this in a non-trivial case (it's possible for a trivial case already). I'm :-1: on adding a TOML-based DSL. I have an alternative proposal involving exposing resolved launch.toml information to the buildpack.
Longer:
What is possible today
For the simplest case where there's only one launch.toml that defines processes, this is already possible:
$ $ pack build wrap_time_demo --path . && pack inspect wrap_time_demo | grep web -A5 -B5
...
me/time-wrapper 0.0.0 -
Processes:
TYPE SHELL COMMAND ARGS WORK DIR
web (default) time bash -c bin/rails server --binding "[::]" --port "${PORT:?Error: PORT env var is not set!}" --environment "$RAILS_ENV" /workspace
This is because:
- launch.toml is already readable by future buildpacks
- Duplicate entries are spec-d such that the last running buildpack wins
However, it's not possible in the more complex case where there are multiple entries, or even multiple buildpacks that want to wrap commands i.e. time mcp other bin/rails. That's because the buildpack doesn't know the order of execution of previous buildpacks, but the lifecycle does.
This proposal
I'm +1 on the use case. I don't love adding a TOML-based DSL, though. This opens up the door for people to suggest new syntax options and pitch new use cases that need to be evaluated. There are also a lot of edge cases that upstream (lifecycle) will be tasked with handling. Such as: In the above example where someone is using bash -c (https://github.com/heroku/buildpacks/discussions/15) perhaps we want to special case that and instead modify the contents of the argument such that it is time bin/rails rather than time with bash -c bin/rails etc. If we handed the buildpack the TOML, it could make this distinction, but if we provide them a DSL, the DSL would need to add affordances for each and every edge case, or otherwise provide some level of Turing completeness, which I don't think we want.
Resolved launch.toml proposal
When faced with a specific problem, I ask if there's a general-purpose solution that could help me solve it if it existed. If we could provide the buildpack desiring to modify the resolved and finalized launch.toml as the lifecycle would see it, then it could read it in and make any modifications it wanted to its own launch.toml, which will take precedence in the event of name conflict.
I propose we inject this information into the bin/build process via a CNB_PRIOR_LAUNCH_TOML environment variable. The output of this would be TOML resolved from prior buildpacks, and then it's up to the buildpack to do what they want with it. They can choose to read it in modify it as they see fit, and then write it to their own launch.toml. When they pick conflicting/overlapping type names. That would allow them to effectively get the features needed to wrap commands, but we don't need to introduce or maintain a new DSL.
I think from the platform implementer side, the logic for doing this resolution is already present; it's just a matter of refactoring internals to expose it and standardize on how we want to expose the information. They wouldn't have to maintain N different DSLs for N different spec versions. While bash-based buildpacks reading and modifying TOML is more cumbersome than Go or Rust-based buildpacks, it's still possible. For the buildpack maintainer, it's slightly more effort than appending static toml, but it's one fewer concept to learn and supports use cases we've not yet realized we have.
I too am in favor of adding this functionality in general.
I really like the idea of having something like CNB_PRIOR_LAUNCH_TOML available to buildpack authors for potential modification. It would be a perfect use case for inline buildpacks and would potentially eliminate the need to have custom buildpacks for a somewhat simple change.
One challenge with inline buildpacks, at least from my experience, is that because they always pass detection you cannot create a build plan to require things like python, go, java, etc. This would mean you are limited to tools installed on the build image only. If this is a problem then inline buildpack won’t work and you would then need a custom buildpack so you can have a proper detect script, or you can keep the inline buildpack and use something like the paketo-community/buildp-plan buildpack to specify a build plan and any requirements to the accompanying plan.toml file. Either way this leads to extra files/directories being created.
One thing that would be great with any approach is that it can be configured through environment variables or project.toml entirely without the need for extra files and directories.
I know this is beyond the scope of this rfc, but if inline buildpacks did support a build plan (while still always passing detection), then you would have a very effective way to modify prior launch toml. If anyone else likes this idea and thinks it’s worth a separate rfc, please thumbs up.
@schneems - The intent of this proposal is to only cover basic use cases. It is out of scope to cover every edge case and possibility that someone might dream up. The belief of this proposal is that having a basic, limited way to address this user need is better than a.) nothing viable (present situation) and b.) a complicated way of doing everything a user might cook up.
It is not trying to be future-proof either. If additional legitimate use cases surface that are not covered by this proposal, then it would be up to the CNB team to evaluate those in the future.
Resolved launch.toml proposal
Hmm, that's interesting. The context behind the present proposal is on this thread in Slack: https://cloud-native.slack.com/archives/C0331B5QS02/p1734193643763589
From that, I think the three largest design considerations were:
- Not producing garbage process types (i.e. having a subsequent buildpack break a prior buildpack)
- Knowing who changed what, so that buildpacks cannot covertly change another buildpack's commands and so that the process is easily debuggable.
- A buildpack should not require knowledge about what other buildpacks (before or after) are doing. i.e. they should operate in isolation.
I think I'm following what you're suggesting, and it sounds like that proposal could address these considerations as well. Item 3.) is debatable because this new proposal would be giving a buildpack knowledge of what happened previously, but I think the spirit of 3.) is that buildpacks don't need to reach into the files of other buildpacks (i.e. not reading the launch.toml of other buildpacks from disk). If the lifecycle is coordinating things, IMO, that meets 3.).
Anyway. I'm not tied to the particular implementation, but feel strongly that we need this feature. I'd be curious to know from those working on the lifecycle which approach sounds like it'd be easier to implement. That might be a good data point in this decision as well.
Regarding wanting focus. A specific desire for Heroku is to pattern match. For example by transforming all types that start with “mcp.*”
I talked with @jabrown85 on implementation limitations of my “prior launch time” proposal is the need to handle multiple launch.toml versions.
To address that we could say that launch.toml changes need to be compatible with the prior version (changes are infrequent but have happened).
The transform DSL also introduces restrictions to future launch.toml changes. (Future updates need to not break the transform promises and be backwards compatible).