Summary

astro:env - Improve DX and security around environment variables in Astro

Links

Apr 15 '24 15:04 florian-lefebvre

Hey @florian-lefebvre I love the general idea! Tbh I don't know much about env at the moment so I can't really comment on all of this.

I like the idea of the zod-lite validation approach and creating an envSchema. I also really like how integrations could enforce certain environment variables too! That's really great.

I just wonder: with the separation between static and dynamic, are devs supposed to know and keep track of which env variables will be static or dynamic? Or can all env variables be accessed by both astro:env/static and astro:env/dynamic and basically it's just under the hood works differently?

Also just while we're on the topic of environment variables. Is there a reason astro uses the PUBLIC_ prefix for public variables, rather than the VITE_ prefix from Vite itself? Is it just to abstract Vite away a little bit more?

Apr 16 '24 09:04 jdtjenkins

I just wonder: with the separation between static and dynamic, are devs supposed to know and keep track of which env variables will be static or dynamic? Or can all env variables be accessed by both astro:env/static and astro:env/dynamic and basically it's just under the hood works differently?

No variables won't be available as both static and dynamic, it's either one or another. I think it's great this way for technical reasons and also so that devs are aware where their variables end up (eg. static public can end up on the client). Do you see a possible downside to such approach?

Is there a reason astro uses the PUBLIC_ prefix for public variables, rather than the VITE_ prefix from Vite itself? Is it just to abstract Vite away a little bit more?

I don't know the reason behind this, I just reused the existing convention. IMO it's more explicit that it can end up on the client than having VITE_ as a prefix

Apr 16 '24 10:04 florian-lefebvre

I don't think this idea of static/dynamic and public/private is quite right. Any "static" variable, meaning one that is set at build time, is always "public", in the sense that anyone with access to the source code can see the value. I would recommend a new distinction here:

public is any variable that is replaced at build-time into the code.
private is any variable that is retrieved at run-time through the process.env[name] etc. API.
static is any variable that is defined in the config schema.
dynamic is variables that are not, ie getEnv(foo + bar) is dynamic.

With these definitions, I don't think the .dynamic() and .static() helpers are needed at all.

Additionally, I don't think the /static and /dynamic subpaths are needed at all. They can all be part of astro:env, but the exports of astro:env will differ in the client where only the public items will be available.

If we want to restrict some public environment variables to only be available in the server or client I would recommend .client() and .server() helpers. If we went with this, .client().private() should be a type (and runtime) error.

Apr 17 '24 17:04 matthewp

I think that makes sense, let me summarize how the api looks and what it means:

.server().private():
- process.env
- Supports static (ie. key and value typed) and dynamic access (ie. any key, returns string | undefined)
.server().public():
- import.meta.env, any key except built-in ones (eg. import.meta.env.SSR)
- Only supports static access
.client().private():
- Not supported because it would require passing some possibly sensible data to the client
.client().public():
- import.meta.env, key prefixed by PUBLIC except built-in ones (eg. import.meta.env.SSR)
- Only supports static access

Now let's say we have such config:

import { defineConfig, envField } from "astro/config"

export default defineConfig({
	env: {
		schema: {
			API_PORT: envField.server().private().number(),
			API_URL: envField.server().public().string(),
			PUBLIC_FLAG: envField.client().public().boolean()
		}
	}
})

If we have a single module like below, there's no issue on the server but on the client because we have no way to update types based on where it's called:

import { API_URL, PUBLIC_FLAG, getSecret } from "astro:env"

// On the server, no issues
API_URL // string
PUBLIC_FLAG // boolean
getSecret("API_PORT") // number
getSecret("UNKNOWN") // string | undefined

// But on the client...
API_URL // typed as string, returns undefined
PUBLIC_FLAG // boolean
getSecret("API_PORT") // typed as number, returns undefined
getSecret("UNKNOWN") // typed as string | undefined, returns undefined

I think it will cause too much confusion for users, because you can't guess it's going to be undefined if not correctly typed. Instead it could make sense to at least have 2 imports like astro:env/server and astro:env/client:

import { API_URL, getSecret } from "astro:env/server"
import { PUBLIC_FLAG } from "astro:env/client"

Let me know what you think, and if I summarized correctly at the top!

Apr 18 '24 05:04 florian-lefebvre

I think that's also an interesting way to look at it. I would prefer chaining .public()/.private() first, and then chain .client()/.server(), as the bonus with this is that we can say .client()/.server() is only available after chaining .public(). .private() variables are always to be used in servers only. Personally I quite like this API too.

About combining astro:env, I also think we need a better way to warn misused env var rather than undefined (I also touched at here). I think what Florian suggested with two imports seems like the best way to warn in that case. (EDITed away some outdated comment)

Apr 18 '24 06:04 bluwy

I agree with Blu on the ordering! I also have a question on API design. Astro DB uses a base function with an object parameter (column.number({ optional: true })), while Astro Env is using a chain (envField.private().static()). Ideally, Astro's custom validators would be aligned. Astro DB is pre-1.0, so that API choice is not set in stone. I'm curious what reasons y'all have for preferring one format over another?

Apr 18 '24 11:04 bholmesdev

My initial proposal was to have chaining first for the important stuff (so server/client and public/private) and then object syntax per type to match astro db:

envField.private().static().string({ optional: true })

I think Blu suggested to match zod api with full chaining

envField.private().static().string().optional()

I'm in favor to keep at least server/client and public/private as chained calls and then the type as an object

Apr 18 '24 12:04 florian-lefebvre

I agree with @bholmesdev that we should align the APIs. I think I tend to prefer making public/private/client/server properties of the options object rather than chaining. This mostly comes down to the discussion above about what order chaining should happen.

In a fluent API (another name for chaining) the expectation is that they can be chained in any order as they are operating on a common this value. If that's not the case and ordering matters I would find using chaining to be unintuitive.

If we go with an options object we can make the types work by making them be a discriminated union where private and client can't be used together.

Apr 18 '24 13:04 matthewp

Some questions I have on the client and server options are:

Should it be required that you provide at least one of these?
If not required then what is the default? That it be applied to both?

I think I lean towards making at least one of these be required rather than picking a default, because accidentally bringing in values into the client, even if "public" would be a footgun.

Apr 18 '24 13:04 matthewp

On the topic of getSecret() in the client, I think it should throw if you call that function. The only reason for it existing at all is because of TypeScript. There's no use-case for calling it on the client, so we should throw.

Apr 18 '24 13:04 matthewp

In a fluent API (another name for chaining) the expectation is that they can be chained in any order as they are operating on a common this value. If that's not the case and ordering matters I would find using chaining to be unintuitive.

I don't think we should strictly follow this. Instead I think we should follow a zod-like API so it feels unified. Zod also has certain cases where certain chains can only be used after another, for example z.string().datetime(). If we opt for an pure-options based config, then it doesn't feel as unified (to me).

Also, Zod supports .optional() and .default() instead of { default/optional: boolean }.

EDIT: The fluent API wiki also doesn't explicitly mention that methods need to / can be in any order.

Apr 18 '24 13:04 bluwy

No comments from me, looks neat. I like the builder API for the .public() and stuff, always looks cute. I agree with bluwy regarding fluent vs options

Apr 18 '24 14:04 Princesseuh

I think we should have the fluent vs. options API discussion externally. It will probably block this RFC from being fully accepted but shouldn't affect implementation very much. I'll start a thread elsewhere.

Apr 18 '24 15:04 matthewp

Hmmm... one thing that is not quite clear to me is how you would define an env variable for both client and server.

Something like PUBLIC_SOME_THIRD_PARTY_API="https://example.com/api" that gets imported into custom_fetch.ts that then gets used by both server and client

May 10 '24 15:05 Tc-001

I agree it's not super clear but a client variable can be imported on both client and server

May 11 '24 16:05 florian-lefebvre

Oh I see! Then you probably need to change this: https://github.com/withastro/roadmap/pull/894/files#diff-3c58a592f2b3662c3cda9ddfdd25a71de2a989fff0a5fcebfcd805dee36be9baR68

Otherwise looks really nice!

May 11 '24 17:05 Tc-001

I've updated the RFC to match the current implementation PR state and make things clearer

May 12 '24 06:05 florian-lefebvre

@florian-lefebvre thank you for this RFC. I think you mostly covered everything. However, something important that is missing is how an adapter gets to provide their "getter" function: process.env VS Deno.env.

May 13 '24 09:05 ematipico

Hello fellow Astronauts!

We over @dmno-dev have been very deep in this specific problem space for the last few months, building out a platform-agnostic configuration tool meant to be a unified system for your entire stack. You can read more at dmno.dev and specifically check out our Astro integration. While our solution may be overkill for some folks, we hope it becomes the go-to solution for anyone working in a monorepo or wanting some tools with a bit more than what is built-in.

Regardless, we are super happy to see the progress on this RFC and for Astro to have amazing first-class env var handling.

Apologies for being a little late to the party, as I know some excellent implementation work by @florian-lefebvre is already underway, but I wanted to share a few thoughts here (that I’ve shared with him already) from what we've learned while building out DMNO.

Please understand this is all meant to be constructive, simply to provoke discussion, and get the best possible outcome for Astro users.

Challenges of hybrid rendering

In any SSR app (output server or hybrid) it can be very difficult to reason about where/when certain code will be run and/or bundled. In fact it often changes because the parent page/component or the site settings rather than what is happening in the component itself.

This presents challenges on two fronts - preventing leaks, and reasoning about if config will be actually dynamic (vary at boot time) or not.

There just is no easy way to enforce that the script part of a component has access to secrets, but only when being run on the server... Or to ensure that a config item you want to vary dynamically at boot time was not used during a pre-render of a page, effectively making it static even though it was not "bundled" into the code.

When the line between client and server can be so blurry, I think it will lead to confusion to say an item is a “server” or “client” item, or vary behaviour depending on how it is accessed.

Instead I think we should rely on a clear config schema to express the user's intent, build the system to do the right thing whenever possible, and to protect the user from anything unexpected.

Config Schema

When we think about how to classify our config items, to me it really boils down to 2 concepts:

Is this data "sensitive" (vs "public") (alt terms "secret","private") meaning do we need to handle it sensitively and make sure it is never leaked This obviously implies you can never access the item from the client. But considering we are in a world of hybrid server and client rendering and items may be used in various code that could be run in both, it’s just not that simple, and it is often not obvious while authoring code whether it will be running in the client or server. The important thing is, we handle the data with care and never leak the value in ANYTHING sent over the wire.
Is the item "dynamic" (vs "static") (alt related terms “build” vs “boot” time, "runtime", “bundled”) meaning is the value is always loaded at boot time and never bundled into built or pre-rendered code Obviously if you are building a static site, everything will be replaced at build time… But if not, you may have items that you want to replace, and some which you may want to dynamically change using env vars at boot time. I do not believe this is as tied to whether something is sensitive as the current proposal and most frameworks (ex: PUBLIC_/NEXT_PUBLIC_) assume, likely because our tooling mostly originated in a SSG, totally static era.

The important part here is that these 2 concepts are fairly easy to reason about and totally decoupled

In DMNO this is accomplished by marking items with sensitive: true/false and dynamic: true/false along with project-level settings to define the default behaviour (ex: default_static vs default_dynamic).

It should be noted that most users probably won't care about this dynamic/static stuff, and the default behaviour can still be set to public_static, which matches what most are used to (sensitive items are dynamic and public items are static).

The Edge cases

sensitive+static - it is less common, but I see no reason to not let a user have a sensitive config item affect a build and tree-shaking Re: security - yes if someone has access to the source code, they could see a secret that has been "baked" at build time, but in the case of deploying code to a server you control, or publishing a docker container to a private registry, those secrets would not be compromised.
public+dynamic - this presents some challenges as they would need to be loaded by the client at runtime, but it certainly possible Re: performance - these can be fetched on demand and users can decide to use them sparingly, not on page load

You can read more about DMNO's handling of this in our dynamic config guide.

Access patterns

The current proposal has multiple methods of accessing certain config items which is supposed to help achieve safety. But this means you must think about which access pattern to use, and changing the schema of an item may necessitate changing how it is accessed throughout your code.

By using a combination of proxy objects and build-time rewrites, the user gets to access everything exactly the same way, and it all just works as intended. In DMNO, we inject 2 globals (which could of course be imported from virtual modules instead)

DMNO_CONFIG (could be ALL_CONFIG) - includes every config item
DMNO_PUBLIC_CONFIG (could be PUBLIC_CONFIG) - includes only non-sensitive items

NOTE - while the extra PUBLIC object is not strictly necessary, I do think it's helpful when browsing what items can be safely used on the client, and it's a bit closer to what folks are used to with the PUBLIC_ prefix.

The injected proxy objects mostly let us give the user more helpful error messages (ex: “that item doesn’t exist” vs “that item exists but it is sensitive”), and it handles loading of any dynamic-public items. Loading these dynamic config items in the client uses a blocking http request so that we can maintain the same DX, rather than needing to do something like await getConfigItem(‘DYNAMIC_THING’).

Protection from footguns

Once we have a full schema to express the user's intent, we can effectively protect the user from doing the wrong thing, without having to think too much:

we only replace "static" items at build time, both on the client and the server
ALL_CONFIG is not available on the client, although we do inject a proxy object that gives helpful error messages
any dynamic+public items accessed on the client load from a blocking http request, on demand via that proxy object
We inject a middleware to detect any leaked secrets before it goes out over the wire
We can inject a vite build hook to detect any leaked secrets in any built assets
We can check if any dynamic config items were used during a pre-render and throw or warn the user

`PUBLIC_` prefix

One more note on the current proposal. Once we've introduced a proper schema, I don't think it is helpful to enforce a naming convention. I'd recommend keeping one or the other as it just adds another source of truth and opportunity for confusion. Personally with the rest of the system keeping you safe, I'd recommend relying on the schema, as it means you only update the schema, not where the config is accessed.

In DMNO, our type system allows extendable types to be marked as sensitive - for example any instance of a StripeSecretTokenDataType is always going to be sensitive - so we must use the schema

Hopefully this is helpful, at the very least to kick off some discussion. And please if this sounds interesting, head over to our Astro integration guide and give it a try. We're happy to get deep into the weeds with folks and to help in any way we can!

Also please don't hesitate to reach out by email or on the Astro Discord.

Cheers!

May 23 '24 05:05 theoephraim

In doing a larger review of other tools, it looks like sveltkit is thinking along very similar lines to what I have proposed above - see https://kit.svelte.dev/docs/modules#$env-dynamic-private They let you import from 4 different virtual modules for the combos of public/private and static/dynamic

They use the prefix only as the marker of public/private, and the import method to help control static/dynamic behaviour.

May 27 '24 16:05 theoephraim

Yep we looked at it during the stage 2 rfc but thanks for bringing it again!

May 28 '24 05:05 florian-lefebvre

Maybe if this is something y'all have already thought through, and maybe I'm overestimating it's importance, but I think you want to make sure users can still benefit from build-time injection and the constant folding / tree-shaking that it enables.

I think anything that looks like import { SOME_VAR } from 'astro:env' is going to make this very difficult.

On the other hand, using anything like import AstroEnv from 'astro:env' and then using AstroEnv.SOME_VAR makes it fairly trivial to add extra entries in the vite's define configuration. Of course you probably won't want to automatically enable this for all env items have some rules about which ones can be replaced, based on the schema, or (less optimally) on how it was imported/used.

Jun 09 '24 05:06 theoephraim

Just to make sure I understand correctly, if I have this:

import { FOO } from 'astro:env/client'

if (FOO) {
	// do something
}

and FOO is false, you expect the whole thing to be removed on build.

I have to admit I'm not familiar enough with vite to know if it currently works, we're doing some pretty simple stuff for public variables so maybe it works already? I think @bluwy may know

Jun 09 '24 08:06 florian-lefebvre

Using import { FOO }, import * as AstroEnv, or import AstroEnv (if we support this) will all work with Rollup's treeshaking, so I don't think we have to change the syntax to accommodate it.

Jun 10 '24 10:06 bluwy

Ah - right of course because it's not rewritten by vite/rollups define... However this does also mean it needs to be an actual constant and not some proxy or function call. Thanks for confirming!

Jun 10 '24 16:06 theoephraim

Very glad to hear about this, as I think it will allow me to unwind all the complexity made when Astro 4 summarily removed ability for import.met.env in astro.config, for a package adding Sanity live Presentation editing on Astro.

I have no idea why the discussion I raised a the time didn't inform me about this -- you 'd have gotten early feedback, and I would have had a great deal less trouble.

The trouble came because of the ferocious, non-traceable deployment-only errors which occur if you allow the client to one way or another access any code which had touched what looked innocent, the server-side variables when produced by the recommended method. This confuses Vite, in very bad ways.

Jun 18 '24 04:06 narration-sd

I'd like to ask -- when do you foresee this becoming a more-than-experimental feature (months, etc.)??

Jun 18 '24 04:06 narration-sd

This feature should go out of experimental in a few minors, hard to tell at this time

Jun 18 '24 06:06 florian-lefebvre

New related RFC https://github.com/withastro/roadmap/discussions/956

Jun 18 '24 14:06 florian-lefebvre

New related RFC #957

Jun 18 '24 15:06 florian-lefebvre