rfcs icon indicating copy to clipboard operation
rfcs copied to clipboard

Add proposal for flexible process types

Open dmikusa opened this issue 1 year ago • 20 comments

Readable

dmikusa avatar Jan 07 '25 17:01 dmikusa

Maintainers,

As you review this RFC please queue up issues to be created using the following commands:

/queue-issue <repo> "<title>" [labels]...
/unqueue-issue <uid>

Issues

(none)

buildpack-bot avatar Jan 07 '25 17:01 buildpack-bot

I don't see anything in here about chaining transformations. Would we want to support that? If so, perhaps we could consider changing the names of the variables to something like $SOURCE_CMD instead of $ORIGINAL_CMD or something to that effect.

joeybrown-sf avatar Jan 10 '25 16:01 joeybrown-sf

What do you think about adding an optional reason field to the transform section? Since we're going to be logging the transformation, it might be good for user experience.

joeybrown-sf avatar Jan 10 '25 16:01 joeybrown-sf

I don't see anything in here about chaining transformations. Would we want to support that? If so, perhaps we could consider changing the names of the variables to something like $SOURCE_CMD instead of $ORIGINAL_CMD or something to that effect.

Not sure I follow, do you have an example of what you mean by chaining?

dmikusa avatar Jan 10 '25 16:01 dmikusa

What do you think about adding an optional reason field to the transform section? Since we're going to be logging the transformation, it might be good for user experience.

Sounds good to me. I'll add it.

dmikusa avatar Jan 10 '25 16:01 dmikusa

I don't see anything in here about chaining transformations. Would we want to support that? If so, perhaps we could consider changing the names of the variables to something like $SOURCE_CMD instead of $ORIGINAL_CMD or something to that effect.

Not sure I follow, do you have an example of what you mean by chaining?

Kind of like what @jabrown85 mentioned in the slack thread. "time wraps secretmanager wraps some-cmd".

I could see multiple buildpacks that do this transforming, so if you're the 3rd transformation buildpack, $ORIGINAL_CMD might not be accurate because it might not be the original per se, because it's already transformed. I think calling it something like $source helps to make it more apparent that we're transforming the process that may not be in its original state.

joeybrown-sf avatar Jan 10 '25 18:01 joeybrown-sf

I don't see anything in here about chaining transformations. Would we want to support that? If so, perhaps we could consider changing the names of the variables to something like $SOURCE_CMD instead of $ORIGINAL_CMD or something to that effect.

Not sure I follow, do you have an example of what you mean by chaining?

Kind of like what @jabrown85 mentioned in the slack thread. "time wraps secretmanager wraps some-cmd".

I could see multiple buildpacks that do this transforming, so if you're the 3rd transformation buildpack, $ORIGINAL_CMD might not be accurate because it might not be the original per se, because it's already transformed. I think calling it something like $source helps to make it more apparent that we're transforming the process that may not be in its original state.

Ohh, ok. I understand. Yes, I can see how that might get confusing, and a rename sounds reasonable. How about PREVIOUS_ or CURRENT_? I'm not necessarily opposed to SOURCE_ but it is kind of a synonym to ORIGINAL. Or we could just take off the prefix? or use a generic prefix like LAUNCH_? Thoughts?

dmikusa avatar Jan 10 '25 18:01 dmikusa

Yeah I think dropping the prefix altogether is the right approach. It should be clear in context what those placeholders are.

joeybrown-sf avatar Jan 10 '25 18:01 joeybrown-sf

Yeah I think dropping the prefix altogether is the right approach. It should be clear in context what those placeholders are.

Done.

dmikusa avatar Jan 10 '25 18:01 dmikusa

@joeybrown-sf Any thoughts on the alternatives section? or the format of the toml in the spec changes section? Even if it's just to say you prefer it the way it is now. Lot of ways to do this, just hoping to get a little feedback on that. thanks!

dmikusa avatar Jan 10 '25 18:01 dmikusa

sure thing!

I do prefer the single transform block, but that's just my personal preference. I like it because it's a bit more terse, less nested, and self-explanatory. That being said, I like "transformations" better than "transforms" because the former is always a noun where the latter could be a verb.

In my opinion, the following format would is my favorite.

[[transformations]]
type = "task"
command = ["time", $CMD]
args = ["more", "args"]
working-dir = "/somewhere-else"

joeybrown-sf avatar Jan 10 '25 18:01 joeybrown-sf

@dmikusa do you want to take a round of edits given the feedback and get this up for vote soon?

jabrown85 avatar Mar 27 '25 14:03 jabrown85

@dmikusa do you want to take a round of edits given the feedback and get this up for vote soon?

@jabrown85 I will try, but things are a little busy over at Paketo at the moment and it's eating up my time. I will try.

dmikusa avatar Mar 30 '25 20:03 dmikusa

Sorry for the delay. I've updated this to use the transformations block instead. I think this is ready for another review.

dmikusa avatar May 21 '25 20:05 dmikusa

Following the discussion on the #buildpack-authors Slack channel, I have studied the 'Flexible Process Types' proposal regarding the use case:

Is there a way for an executable included in a CNB buildpack to start automatically during the application startup and rebind the PORT environment variable that's visible to web processes?

Background: I'm porting a Heroku legacy buildpack to CNB (https://github.com/wwwhisper-auth/wwwhisper-heroku-buildpack). The buildpack provides an authorization proxy that runs in front of a web app. Currently, the proxy starts automatically from a .profile.d script, binds the public PORT passed by Heroku runtime, and remaps the PORT to a local port. This allows users to simply run Heroku buildpack:add ... to enable the proxy without modifying their Procfile.

It looks like the proposal addresses this use case. The authorization web proxy buildpack would provide the following launch.toml:

[[transformations]]
type = "web"
command = ["authproxy", "'$CMD_STRING'"]
reason = "Starting authorization proxy in front of the web app"

The authproxy process would then start listening on externally accessible PORT and change PORT to a local port before executing CMD_STRING as a shell script. For this use case, it is important for the CMD_STRING to still contain the original command (for example, fastapi run --port $PORT --host ::), not the expanded command (fastapi run --port 14000 --host ::), so PORT rebinding is visible in the executed command. My understanding is that this is what the RFC proposes.

wrr avatar Jun 11 '25 09:06 wrr

Short: I am :+1: for adding a way to do this in a non-trivial case (it's possible for a trivial case already). I'm :-1: on adding a TOML-based DSL. I have an alternative proposal involving exposing resolved launch.toml information to the buildpack.

Longer:

What is possible today

For the simplest case where there's only one launch.toml that defines processes, this is already possible:

$ $ pack build wrap_time_demo --path . && pack inspect wrap_time_demo | grep web -A5 -B5
...
  me/time-wrapper             0.0.0          -

Processes:
  TYPE                 SHELL        COMMAND        ARGS                                                                                                                          WORK DIR
  web (default)                     time           bash -c bin/rails server --binding "[::]" --port "${PORT:?Error: PORT env var is not set!}" --environment "$RAILS_ENV"        /workspace

This is because:

  • launch.toml is already readable by future buildpacks
  • Duplicate entries are spec-d such that the last running buildpack wins

However, it's not possible in the more complex case where there are multiple entries, or even multiple buildpacks that want to wrap commands i.e. time mcp other bin/rails. That's because the buildpack doesn't know the order of execution of previous buildpacks, but the lifecycle does.

This proposal

I'm +1 on the use case. I don't love adding a TOML-based DSL, though. This opens up the door for people to suggest new syntax options and pitch new use cases that need to be evaluated. There are also a lot of edge cases that upstream (lifecycle) will be tasked with handling. Such as: In the above example where someone is using bash -c (https://github.com/heroku/buildpacks/discussions/15) perhaps we want to special case that and instead modify the contents of the argument such that it is time bin/rails rather than time with bash -c bin/rails etc. If we handed the buildpack the TOML, it could make this distinction, but if we provide them a DSL, the DSL would need to add affordances for each and every edge case, or otherwise provide some level of Turing completeness, which I don't think we want.

Resolved launch.toml proposal

When faced with a specific problem, I ask if there's a general-purpose solution that could help me solve it if it existed. If we could provide the buildpack desiring to modify the resolved and finalized launch.toml as the lifecycle would see it, then it could read it in and make any modifications it wanted to its own launch.toml, which will take precedence in the event of name conflict.

I propose we inject this information into the bin/build process via a CNB_PRIOR_LAUNCH_TOML environment variable. The output of this would be TOML resolved from prior buildpacks, and then it's up to the buildpack to do what they want with it. They can choose to read it in modify it as they see fit, and then write it to their own launch.toml. When they pick conflicting/overlapping type names. That would allow them to effectively get the features needed to wrap commands, but we don't need to introduce or maintain a new DSL.

I think from the platform implementer side, the logic for doing this resolution is already present; it's just a matter of refactoring internals to expose it and standardize on how we want to expose the information. They wouldn't have to maintain N different DSLs for N different spec versions. While bash-based buildpacks reading and modifying TOML is more cumbersome than Go or Rust-based buildpacks, it's still possible. For the buildpack maintainer, it's slightly more effort than appending static toml, but it's one fewer concept to learn and supports use cases we've not yet realized we have.

schneems avatar Jul 31 '25 22:07 schneems

I too am in favor of adding this functionality in general.

I really like the idea of having something like CNB_PRIOR_LAUNCH_TOML available to buildpack authors for potential modification. It would be a perfect use case for inline buildpacks and would potentially eliminate the need to have custom buildpacks for a somewhat simple change.

One challenge with inline buildpacks, at least from my experience, is that because they always pass detection you cannot create a build plan to require things like python, go, java, etc. This would mean you are limited to tools installed on the build image only. If this is a problem then inline buildpack won’t work and you would then need a custom buildpack so you can have a proper detect script, or you can keep the inline buildpack and use something like the paketo-community/buildp-plan buildpack to specify a build plan and any requirements to the accompanying plan.toml file. Either way this leads to extra files/directories being created.

One thing that would be great with any approach is that it can be configured through environment variables or project.toml entirely without the need for extra files and directories.

I know this is beyond the scope of this rfc, but if inline buildpacks did support a build plan (while still always passing detection), then you would have a very effective way to modify prior launch toml. If anyone else likes this idea and thinks it’s worth a separate rfc, please thumbs up.

jericop avatar Aug 01 '25 03:08 jericop

@schneems - The intent of this proposal is to only cover basic use cases. It is out of scope to cover every edge case and possibility that someone might dream up. The belief of this proposal is that having a basic, limited way to address this user need is better than a.) nothing viable (present situation) and b.) a complicated way of doing everything a user might cook up.

It is not trying to be future-proof either. If additional legitimate use cases surface that are not covered by this proposal, then it would be up to the CNB team to evaluate those in the future.

dmikusa avatar Aug 06 '25 13:08 dmikusa

Resolved launch.toml proposal

Hmm, that's interesting. The context behind the present proposal is on this thread in Slack: https://cloud-native.slack.com/archives/C0331B5QS02/p1734193643763589

From that, I think the three largest design considerations were:

  1. Not producing garbage process types (i.e. having a subsequent buildpack break a prior buildpack)
  2. Knowing who changed what, so that buildpacks cannot covertly change another buildpack's commands and so that the process is easily debuggable.
  3. A buildpack should not require knowledge about what other buildpacks (before or after) are doing. i.e. they should operate in isolation.

I think I'm following what you're suggesting, and it sounds like that proposal could address these considerations as well. Item 3.) is debatable because this new proposal would be giving a buildpack knowledge of what happened previously, but I think the spirit of 3.) is that buildpacks don't need to reach into the files of other buildpacks (i.e. not reading the launch.toml of other buildpacks from disk). If the lifecycle is coordinating things, IMO, that meets 3.).

Anyway. I'm not tied to the particular implementation, but feel strongly that we need this feature. I'd be curious to know from those working on the lifecycle which approach sounds like it'd be easier to implement. That might be a good data point in this decision as well.

dmikusa avatar Aug 06 '25 13:08 dmikusa

Regarding wanting focus. A specific desire for Heroku is to pattern match. For example by transforming all types that start with “mcp.*”

I talked with @jabrown85 on implementation limitations of my “prior launch time” proposal is the need to handle multiple launch.toml versions.

To address that we could say that launch.toml changes need to be compatible with the prior version (changes are infrequent but have happened).

The transform DSL also introduces restrictions to future launch.toml changes. (Future updates need to not break the transform promises and be backwards compatible).

schneems avatar Aug 07 '25 02:08 schneems