RFC: Parser Utility for TypeScript
Is this related to an existing feature request or issue?
No response
Which AWS Lambda Powertools utility does this relate to?
Other
Summary
Parser Utility for TypeScript
Powertools for Python has a parser utility that uses Pydantic as the underlying library. There is a similar need on the TypeScript side.
Zod would be a great fit for the parser utility in TypeScript. It has a lot of similarities with Pydantic and aligns well with Powertools.
- A parser TypeScript class with overloadable methods that can parse models or envelopes
- There will be a Zod model package for all the Lambda event sources that are listed here
- There will also be an envelope package that will house all the built-in envelopes for TypeScript, like here
- Also the ability to create built-in envelopes in Zod using generics
Use case
Parsers for Powertools TypeScript
Data model parsing is one of the most widely used utilities when building services / Lambdas. When it comes to TypeScript, there are very few libraries that do this job really well. Zod is definitely at the top of this list.
Proposal
Parser Utility
import { z } from 'zod';
export class Parser {
  public parse<T extends z.ZodType<object>>(model: T, event: string, safeParse?: boolean): object;
  public parse<T extends z.ZodType<object>>(model: T, event: object, safeParse?: boolean): object;
  public parse<T extends z.ZodType<object>>(model: T, envelope: BaseEnvelope<T>, event: string | object, safeParse?: boolean): object;
  public parse<T extends z.ZodType<object>>(model: T, ...args: unknown[]): object {
    // model.parse(event) (default)
    // model.safeParse(event) (safeParse = true)
    // if an envelope is provided, parse the model within the envelope,
    // e.g. for EventBridge: envelope(model).parse(event)
    return parsedBody; // parsed zod model
  }
}
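A hypothetical usage sketch of the proposed class above (the API shape and orderSchema are illustrative, not final):
import { z } from 'zod';
// Illustrative schema for the examples below
const orderSchema = z.object({
  id: z.number(),
  description: z.string(),
});
const parser = new Parser();
// Default: throws a ZodError if the event does not match the schema
const order = parser.parse(orderSchema, { id: 1, description: 'My order' });
// With safeParse = true the call would return a result object instead of throwing
const result = parser.parse(orderSchema, '{"id": 1, "description": "My order"}', true);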
Built-In Zod Schema
Sample EventBridgeSchema (Zod)
import { z } from 'zod';
// Event Bridge Base Event Schema
// Refer: https://github.com/DefinitelyTyped/DefinitelyTyped/blob/b1fe16547af8f9a4e786f57961d3b57d809aa7a5/types/aws-lambda/trigger/eventbridge.d.ts#L8
const eventBridgeEventBaseSchema = z.object({
  id: z.string(),
  version: z.string(),
  account: z.string(),
  time: z.string(),
  region: z.string(),
  resources: z.array(z.string()),
  source: z.string(),
  'detail-type': z.string(),
  'replay-name': z.string().optional(),
});
// EventBridge Event Wrapper Schema
/*
Extends the base EventBridge schema with the custom event zod schema
Example:
const orderEventModelSchema = eventBridgeEventSchema(order);
*/
const eventBridgeEventSchema = <Type extends z.ZodTypeAny>(schema: Type) => {
  return eventBridgeEventBaseSchema
    .extend({
      detail: schema,
    })
    .transform((v: any) => {
      v['detailType'] = v['detail-type'];
      v['replayName'] = v['replay-name'];
      delete v['detail-type'];
      delete v['replay-name'];
      return v;
    });
};
EventBridge custom detail implementation with a Zod model
npm install zod
// Sample schema for the event Detail
export const orderItem = z.object({
  id: z.number(),
  quantity: z.number(),
  description: z.string(),
});

export const order = z.object({
  id: z.number(),
  description: z.string(),
  items: z.array(orderItem),
});
// Wrap the event Detail schema in the EventBridge base schema
const orderEventModelSchema = eventBridgeEventSchema(order);
// Sample event parsing
const orderdata = {
  version: '0',
  id: '6a7e8feb-b491-4cf7-a9f1-bf3703467718',
  'detail-type': 'OrderPurchased',
  'replay-name': 'test-replay-name',
  source: 'OrderService',
  account: '111122223333',
  time: '2020-10-22T18:43:48Z',
  region: 'us-west-1',
  resources: ['some_additional'],
  detail: {
    id: 10876546789,
    description: 'My order',
    items: [
      {
        id: 1015938732,
        quantity: 1,
        description: 'item xpto',
      },
    ],
  },
};
const parsedData = orderEventModelSchema.parse(orderdata);
console.log(JSON.stringify(parsedData));
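The transform above renames the hyphenated fields, so the parsed value exposes them in camelCase (continuing the example above):
console.log(parsedData.detailType); // 'OrderPurchased'
console.log(parsedData.replayName); // 'test-replay-name'
console.log(parsedData['detail-type']); // undefined - removed by the transform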
Lambda Handler - parser Decorator function
import { z } from 'zod';
import type { Context } from 'aws-lambda';
import { Logger } from '@aws-lambda-powertools/logger';
import { LambdaInterface } from '@aws-lambda-powertools/commons';
import { parser } from '@aws-lambda-powertools/parser';
import { EventBridgeEnvelope } from '@aws-lambda-powertools/parser/envelopes';

// Instantiate the Logger outside the handler so it's reused across invocations
const logger = new Logger();

// Order schema, with orderItem as defined above
const order = z.object({
  id: z.number(),
  description: z.string(),
  items: z.array(orderItem),
});

type OrderModel = z.infer<typeof order>;

class Lambda implements LambdaInterface {
  // Decorate your handler class method
  @logger.injectLambdaContext()
  @parser(order, EventBridgeEnvelope)
  public async handler(
    _event: OrderModel,
    _context: Context
  ): Promise<void> {
    logger.info(`Received order model ${JSON.stringify(_event)}`);
  }
}

const myFunction = new Lambda();
export const handler = myFunction.handler.bind(myFunction);
Lambda Handler - parser Middy middleware
import { z } from 'zod';
import middy from '@middy/core';
import { Logger } from '@aws-lambda-powertools/logger';
import { parser } from '@aws-lambda-powertools/parser/middleware';
import { EventBridgeEnvelope } from '@aws-lambda-powertools/parser/envelopes';

const logger = new Logger();

// Order schema, with orderItem as defined above
const order = z.object({
  id: z.number(),
  description: z.string(),
  items: z.array(orderItem),
});

const lambdaHandler = async (_event: any, _context: any) => {
  logger.info('This is an INFO log with some context');
};

export const handler = middy(lambdaHandler).use(parser(order, EventBridgeEnvelope));
Out of scope
Potential challenges
- Zod is just one of the libraries out there; the rest of the community might have a different opinion
- Feedback should be requested on whether Zod is a viable option for the community for the parser, or whether something else would be preferred
- Understand more about generics and transforms in Zod, as they will be much needed for the final implementation
Dependencies and Integrations
No response
Alternative solutions
No response
Acknowledgment
- [X] This feature request meets Lambda Powertools Tenets
- [ ] Should this be considered in other Lambda Powertools languages? i.e. Python, Java
Hi @Muthuveerappanv, thank you so much for taking the time to write this RFC.
Parser utility is definitely something we want to look into as it would help us advance towards our goal of feature parity with Powertools for Python, so this RFC is more than welcome.
I have to admit that I am not very familiar with Zod as a library aside from having read about it in the past. I've read good things about it, especially when it comes to TypeScript support, so at least on the surface it seems like a sensible suggestion.
Before committing to it I would like to have more info about it both from the technical standpoint but also in terms of project health and adoption.
On the technical side, I'd like to understand:
- what's the problem it actually solves vs implementing the same behavior without a library (it might be obvious but as mentioned I don't know much about it)
- what's the impact of using the library at runtime in a Lambda environment
- what's the compatibility/requirement story around Node and TypeScript versions
- whether or not this is something that can be used / makes sense also in JavaScript-only codebases
On the project/governance side:
- what's the maturity of the project?
- is it actively maintained and expected to be for the foreseeable future?
- what's their release cadence?
- what's their history with patch releases?
In terms of the content of the RFC, I also have a couple of followup questions:
- I see that there's a second package (zod-to-json-schema) mentioned but never used, is this on purpose?
- What would be the UX/DX of using this parser as a user? Would it be possible to have an example of usage with a function handler?
- I see that decorators have been mentioned as out of scope. I'd be curious to understand the reasoning. From my perspective this utility seems like a perfect fit for decorator & middleware-based usage.
Thank you again for the RFC, looking forward to fleshing it out!
technical side
what's the problem it actually solves vs implementing the same behavior without a library (it might be obvious but as mentioned I don't know much about it)
Zod uses a schema-first approach, making it strictly typed. It covers all of the internal Lambda trigger models and helps a great deal with custom models too, making it easier for developers to stick with a fully typed implementation of Lambda-based services. It also takes care of validation and transformation in one library.
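To illustrate the schema-first point, a minimal sketch: a single Zod schema yields both the runtime check and the static type (via z.infer), which a hand-rolled validator would not give you:
import { z } from 'zod';
// One schema gives both runtime validation and a compile-time type
const order = z.object({
  id: z.number(),
  description: z.string(),
});
type Order = z.infer<typeof order>; // { id: number; description: string }
const rawBody = '{"id": 1, "description": "My order"}';
const parsed: Order = order.parse(JSON.parse(rawBody)); // throws a ZodError on mismatch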
what's the impact of using the library at runtime in a Lambda environment
Zod is very lightweight: from a size perspective it's 567 kB unzipped and has 0 dependencies. Other than that, it doesn't have any impact on the Lambda function itself in terms of additional dependencies. (There is a whole ecosystem of libraries built around Zod, but none of that will make its way into Powertools for this RFC.)
what's the compatibility/requirement story around Node and TypeScript versions
Please refer to the Requirements section:
TypeScript 4.5+ and the compiler option strict: true in tsconfig.json
whether or not this is something that can be used / makes sense also in JavaScript-only codebases
It will not add much value in JavaScript-only codebases.
project/governance side
what's the maturity of the project?
It's a very well documented and maintained project, with proper release cycles, clear release notes, and bug fixes.
is it actively maintained and expected to be for the foreseeable future?
yes, there are multiple maintainers as well
what's their release cadence? what's their history with patch releases?
The release history should give you a good overview - https://github.com/colinhacks/zod/releases
terms of the content of the RFC
I see that there's a second package (zod-to-json-schema) mentioned but never used, is this on purpose?
It was just to print the JSON schema after parsing; it's not part of the RFC, will remove.
What would be the UX/DX of using this parser as a user? Would it be possible to have an example of usage with a function handler?
Added in the main comment - https://github.com/awslabs/aws-lambda-powertools-typescript/issues/1334#issue-1599934094
I see that decorators have been mentioned as out of scope. I'd be curious to understand the reasoning. From my perspective this utility seems like a perfect fit for decorator & middleware-based usage.
I was thinking we would work on the different Zod models and then the parser utility, and address decorators and middlewares separately. But we can wrap them up as well if we agree on the UX, etc. I'm fine with that approach too. Updated - https://github.com/awslabs/aws-lambda-powertools-typescript/issues/1334#issue-1599934094
Thank you for clarifying and for updating the RFC to address my points, Muthu. I appreciate it.
I guess for me the next step would be to familiarize myself with Zod and dive a bit deeper into how it works.
In the meanwhile I'd like to encourage other readers to read the RFC and weigh in.
This issue has not received a response in 2 weeks. If you still think there is a problem, please leave a comment to avoid the issue from automatically closing.
Is it intentional that this library has two different RFCs for Validation (#508) and Parsing (this issue)? They seem very similar to me. I found this resource which defines the differences between validation and parsing but it seems to me like they should be one feature for this library. Appreciate the clarification in advance and am curious of other perspectives on this.
Hi @bestickley, yes this is intentional.
It's true that there's an overlap between parsing and validating, however we have two RFCs and we intend to offer two separate utilities because we expect different types of customers (or workloads) to lean into one or the other.
Based on the experience of Powertools for AWS (Python), we have seen that there's a good amount of customers who have invested in developing JSON schemas or simply are used to work with those. These customers, and by extension, workloads that are migrating to Lambda without major rearchitecting, might want to reach for a Validation utility that is able to process the schemas they already have.
On the other hand, newer or greenfield workloads, might want to go directly with a Parser utility and get both validation and parsing in one utility. Parsing however doesn't just provide a two-in-one experience, but also allows a degree of expressivity that the JSON schema spec simply doesn't support, as well as allowing transformation and advanced type-casting.
Additionally, depending on the validation and parsing modules that we end up using, there's a chance that choosing between the two utilities will involve some level of performance tradeoff. At this stage it's too early to speak of this, but I wouldn't be surprised if it becomes another deciding factor.
Ultimately, one of our tenets is to allow for progressive adoption & enhancement. Offering two separate utilities in this context allows us to serve customers at different stages of their Serverless journey.
Hey all,
quick update on the RFC. The proposal looks good and we will start breaking it into tasks and move forward with implementation. The key features are:
- built-in schemas for popular events from other AWS services
- decorator, function wrapper, middy middleware
- envelopes: use a schema to extract payloads from SQS, EventBridge, and other event envelopes
I have started scoping the work and listing the issues/tasks to create and I have a few points/questions that I would like to discuss. None of these is a blocker against a Parser utility based on Zod, however I think it'll be useful down the line to have these points recorded in the RFC.
The points are not in any specific order:
1. Models vs Envelopes
Python Parser has Models that you can extend, at the same time it has the concept of envelopes which at least on the surface seems to have some overlap. In both sections of the docs they show similar events (EventBridge) and both appear to be two ways of defining the model/schema of the internal body/details field.
What’s the actual difference? The proposal in the RFC seems to conflate the two entities in one. Is this a result/construct of how Pydantic works or is it something that is functionally different? If so, how does this translate to Zod? And how does one bring their own envelope like in Python Parser?
2. Model vs Schema naming
Python uses the Model wording, as far as I can tell this is also how Pydantic calls them. In Zod the equivalent entity is called schema.
Should we use model and align with Python, or instead use schema to align with the Zod ecosystem? Our tenet of “They follow language idioms and their community’s common practices.” would suggest the latter but I think it’s important to agree on this since the beginning.
3. Re-exporting Zod
Python Parser re-exports Pydantic so that customers can do from aws_lambda_powertools.utilities.parser.pydantic import xyz. I don’t know why they made this choice but I wonder if we should do the same or not.
Given that we are planning on including Zod as dependency there’s an argument to be made in favor of following a similar strategy. At the same time, as far as I know this is not a common practice in the JS/TS ecosystem and I cannot think of any benefit of doing so, while there’s a non-zero chance that doing so will have impact on bundling and tree-shaking.
Thoughts?
4. Data model validation
Pydantic has a notion of validation (link) which is also explicitly called out in the Powertools Parser docs and that seems to be treated as a separate feature from the actual parsing.
From what I can see this is a choice made by Pydantic rather than Powertools Parser (from here):
Although validation is not the main purpose of Pydantic, you can use this library for custom validation. Pydantic is primarily a parsing and transformation library, not a validation library. Validation is a means to an end: building a model which conforms to the types and constraints provided. In other words, Pydantic guarantees the types and constraints of the output model, not the input data. This might sound like an esoteric distinction, but it is not. If you're unsure what this means or how it might affect your usage you should read the section about Data Conversion below.
Does Zod make a similar distinction? If so, what’s the equivalent of this in Zod? Does it make sense to have this distinction (in Pydantic this is done via class decorators which is for sure not compatible with how Zod works)?
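For reference, a minimal sketch of how custom validation is expressed in Zod today: checks are attached to the schema itself via .refine, so they run as part of the same parse call rather than through class decorators:
import { z } from 'zod';
// Custom validation lives on the schema and runs during .parse()
const order = z
  .object({
    id: z.number(),
    items: z.array(z.object({ id: z.number(), quantity: z.number() })),
  })
  .refine((o) => o.items.length > 0, { message: 'Order must contain at least one item' });
order.parse({ id: 1, items: [] }); // throws a ZodError with the custom message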
5. Naming & Implementation
The proposal in the RFC uses parser both as name of the decorator and Middy middleware. In the name of consistency with Python Parser I would instead consider using eventParser (camelCase version of event_parser from Python Parser) for both of these.
Likewise, the RFC seems to suggest implementing this as a class. I would consider instead using an architecture similar to the one we used in the Idempotency utility, in which we have a parse function that has the bulk of the logic and then separately expose a Middy middleware, a decorator, etc.
6. Function wrapper
Generally speaking we try to have our utilities cover three types of usages: 1/ class method decorators, 2/ middy middleware, 3/ manual usage (aka classic function-based usage).
For this specific utility, which is intended to primarily target parsing the event received by a function handler, I’m confident that 1 and 3 from above make total sense. Number 2 on the other hand, I’m not entirely sure.
Based on a first assessment of the implementation it looks like we are going to have a parse function that is going to be used under the hood by decorator, middleware, etc. Both decorator and middleware will be an extremely thin wrapper around parse and will mostly call ZodSchema.parse(event).
With this in mind, does it really add value to have a function wrapper versus just having customers call that at the top of their function, i.e.
export const handler = (rawEvent: unknown) => {
  const event = ZodSchema.parse(rawEvent);
  // ... rest of the code
};
As it stands decorator, middleware, and wrapper function are just half (or less) of the value of the Parser utility, and a lot of the value is in the schemas/models that we offer. If this is true then having a wrapper function doesn't really add much.
If instead we make these APIs enhance Zod's experience with things like: 1/ handling JSON strings (which Zod doesn't do natively/in a straightforward way), 2/ enhancing error handling/extraction (which can be boilerplate-y in Zod), and other things, then having a function wrapper does make sense.
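As a strawman for that kind of value-add, a minimal sketch (the name and signature are illustrative, not a proposed API) of a parse helper that accepts either a JSON string or an object and flattens Zod errors:
import { z, ZodError } from 'zod';
// Illustrative only: accept a raw JSON string or an already-deserialized object,
// parse it against the given schema, and flatten Zod issues into a single error message.
const parse = <T extends z.ZodTypeAny>(schema: T, event: string | object): z.infer<T> => {
  const input = typeof event === 'string' ? JSON.parse(event) : event;
  try {
    return schema.parse(input);
  } catch (error) {
    if (error instanceof ZodError) {
      const details = error.issues
        .map((issue) => `${issue.path.join('.')}: ${issue.message}`)
        .join('; ');
      throw new Error(`Failed to parse event - ${details}`);
    }
    throw error;
  }
};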
7. Testing strategy
For other utilities we have strived to reach unit test 100% coverage and have integration tests in each utility. Given that this utility relies on a certain set of inputs (the schemas and maybe envelopes) and that there isn't any AWS API interaction we should discuss the testing strategy.
For unit tests, I think that unless Zod makes this impossible, we should continue having 100% test coverage for our code. This however implies that we have examples for AWS events that we want to support.
How do we plan to acquire these events? And do we want to make any effort to programmatically keep them up to date?
For integration tests, does it make any sense at all to have integration tests for this utility? For Batch Processing, which works similarly, we have opted for not having them so there's an argument in favor of doing the same here.
Having integration tests in which we simply load the utility in a Lambda and we send artificial events as part of the test wouldn't test/prove any additional behavior that we are already not covering with the unit tests. At the same time, deploying all the kind of resources needed to generate real events (and their failure modes) would require a significant effort which I'm not sure it's justified by the value add.
Thoughts?
Great points to foster the direction of this feature and resolve additional unknowns.
- Models vs Envelopes
Python Parser has Models that you can extend, at the same time it has the concept of envelopes which at least on the surface seems to have some overlap. In both sections of the docs they show similar events (EventBridge) and both appear to be two ways of defining the model/schema of the internal body/details field.
What’s the actual difference? The proposal in the RFC seems to conflate the two entities in one. Is this a result/construct of how Pydantic works or is it something that is functionally different? If so, how does this translate to Zod? And how does one bring their own envelope like in Python Parser?
I agree that this is somewhat confusing. The built-in models are necessary so we can extend them with custom models of the payload. They also bring additional functionality to parse the payload based on the event source, e.g. an SQS message inside a Kinesis event. For instance, the detail type of the EventBridge envelope is RawDictOrModel while the body is Union[str, Type[BaseModel], BaseModel].
As for the envelopes, there is a strong argument that in many situations we are only interested in the payload of the event. But it takes several steps to get there: 1/ understand the event structure, 2/ get the right field (was it body, message, Records, records?), 3/ call Zod. We are fighting the inconsistency by making it transparent and simplifying the experience. You tell us the event envelope (which we already implemented) and give us your payload schema, and we do the rest.
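To make this concrete, a rough sketch (assuming an SQS envelope; the helper name is illustrative) of how an envelope could extract and parse each record body so the handler only sees the payload:
import { z } from 'zod';
import type { SQSEvent } from 'aws-lambda';
// Illustrative SQS envelope: the utility knows the event shape and where the payload
// lives (Records[].body, a JSON string); the user only supplies the payload schema.
const sqsEnvelope = <T extends z.ZodTypeAny>(schema: T, event: SQSEvent): z.infer<T>[] =>
  event.Records.map((record) => schema.parse(JSON.parse(record.body)));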
- Model vs Schema naming
Python uses the Model wording, as far as I can tell this is also how Pydantic calls them. In Zod the equivalent entity is called schema. Should we use model and align with Python, or instead use schema to align with the Zod ecosystem? Our tenet of “They follow language idioms and their community’s common practices.” would suggest the latter but I think it’s important to agree on this since the beginning.
As you mentioned, following the tenet, we should stick 100% to the language domain of our ecosystem and keep it consistent. The trade-off I see is that it'd be more difficult for developers who build applications in both Python and TypeScript AND use the same Powertools features.
- Re-exporting Zod
Python Parser re-exports Pydantic so that customers can do from aws_lambda_powertools.utilities.parser.pydantic import xyz. I don’t know why they made this choice but I wonder if we should do the same or not. Given that we are planning on including Zod as dependency there’s an argument to be made in favor of following a similar strategy. At the same time, as far as I know this is not a common practice in the JS/TS ecosystem and I cannot think of any benefit of doing so, while there’s a non-zero chance that doing so will have impact on bundling and tree-shaking.
We had similar discussions on the SDK re-export for parameters and decided to not re-export. I agree with your argument to not include it.
- Naming & Implementation
The proposal in the RFC uses parser both as name of the decorator and Middy middleware. In the name of consistency with Python Parser I would instead consider using eventParser (camelCase version of event_parser from Python Parser) for both of these.
Likewise, the RFC seems to suggest implementing this as a class. I would consider instead using an architecture similar to the one we used in the Idempotency utility, in which we have a parse function that has the bulk of the logic and then separately expose a Middy middleware, a decorator, etc.
Yes, having a thin layer with the core logic in a base function is the best approach; we had similar learnings from Idempotency.
- Function wrapper If instead we make these APIs enhance Zod's experience with things like: 1/ handling JSON strings (which Zod doesn't do natively/in a straightforward way), 2/ enhancing error handling/extraction (which can be boilerplate-y in Zod), and other things, then having a function wrapper does make sense.
I think this is the direction we can aim for. But I don't have any specifics yet on the configuration or the context we can pass to a wrapper to provide more functionality. I'd suggest focusing on the decorator and Middy middleware first.
- Testing strategy For unit tests, I think that unless Zod makes this impossible, we should continue having 100% test coverage for our code. This however implies that we have examples for AWS events that we want to support.
How do we plan to acquire these events? And do we want to make any effort to programmatically keep them up to date?
We can keep the event structure similar to data_classes in Python and use just a few examples for our tests (not load JSON files).
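For illustration, a unit test along these lines could look like the following (Jest-style; the import path and schema names are hypothetical):
import { eventBridgeEventSchema, order } from '../src/schemas'; // hypothetical module
describe('EventBridge schema', () => {
  it('parses a wrapped order event and renames detail-type', () => {
    const schema = eventBridgeEventSchema(order);
    const event = {
      version: '0',
      id: '6a7e8feb-b491-4cf7-a9f1-bf3703467718',
      'detail-type': 'OrderPurchased',
      source: 'OrderService',
      account: '111122223333',
      time: '2020-10-22T18:43:48Z',
      region: 'us-west-1',
      resources: [],
      detail: { id: 1, description: 'My order', items: [{ id: 2, quantity: 1, description: 'item' }] },
    };

    expect(schema.parse(event).detailType).toBe('OrderPurchased');
  });

  it('rejects an event without a detail payload', () => {
    const schema = eventBridgeEventSchema(order);

    expect(() => schema.parse({})).toThrow();
  });
});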
Having integration tests in which we simply load the utility in a Lambda and we send artificial events as part of the test wouldn't test/prove any additional behavior that we are already not covering with the unit tests. At the same time, deploying all the kind of resources needed to generate real events (and their failure modes) would require a significant effort which I'm not sure it's justified by the value add.
I agree it'd be too much overhead to have all the required services to send real events. For this case we need to collect real examples, and it's OK to start with a few. It will grow over time as we find more edge cases.
Hey Alex, thanks for the exhaustive answers, I agree on all points.
I think we can start diving deeper into the implementation details of the utility. I have changed the status of the issue to status/confirmed and moved it to its own milestone & category on the board.
Early next week I'll open a first set of issues to start tracking the work. After that, and once we get the next release (last of v1.x) out, you can start working on this.
Hi all, zod is a CPU-heavy library which can be a performance bottleneck. They're tracking some improvement tickets but I haven't seen much progress. I suggest you explore alternative lightweight libs such as myzod (that's my choice for lambdas, otherwise I use zod in non-lambda code).
Hi @byF, any chance that you could point to some benchmarks ran on one of the current managed Lambda runtimes?
We'd like to take a look so that we can better understand the impact.
@dreamorosi sorry, I don't have any particular Lambda runtime-based benchmarks at my disposal. There is a general benchmark available: https://moltar.github.io/typescript-runtime-type-benchmarks/. I can also point you to a general zod issue regarding perf https://github.com/colinhacks/zod/issues/205. Anecdotally, I saw a noticeable jump in CPU usage and bundle size after switching to zod.
Hey @byF, thanks for raising this point. I have looked into the issue you have mentioned and also the benchmarks. There are a lot of validation libraries with various performance benchmarks. For the parser utility we needed to decide on one based on a combination of open source health, security, popularity, feature set, and adoption rate. We think zod fits the criteria, but we might be wrong. With so many different choices there will always be a situation where a more performant library pops up on the radar and people want other libraries to support it (e.g. valibot).
This is not an attempt to defend zod. I think there are other great projects like valibot or typia that we might support in the future. In an ideal world we would support most of the popular validation libraries, where you could bring your own parser and schema.
As a next step I will run performance tests (#1955) to understand the impact of the parser utility, so we can be transparent and add this information to our documentation.
For those who are looking for performance, I suggest typia; the library is really fast because it basically generates the most optimized code at build time instead of generating/parsing at runtime.
You can use the types defined in @types/aws-lambda and then generate the validation/assertion/parsing/etc.
You don't even need to ship typia itself; you can generate those functions at build time and ship only the generated code.
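For context, typia usage roughly looks like this (a sketch assuming typia's standard assert API and its compile-time transform plugin configured in the build; the types are illustrative):
import typia from 'typia';
import type { EventBridgeEvent } from 'aws-lambda';
type OrderDetail = { id: number; description: string };
// typia.assert<T>() throws if the input does not match T; the check is generated
// at compile time by typia's transform, so no schema object ships at runtime.
export const handler = async (rawEvent: unknown): Promise<void> => {
  const event = typia.assert<EventBridgeEvent<'OrderPurchased', OrderDetail>>(rawEvent);
  console.log(event.detail.description);
};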
We just launched the first beta version of the utility based on Zod.
It's available starting from version v2.1.0 and we are looking at gathering feedback over the next few weeks to correct any issues and remove any sharp edges.
We encourage you to give it a try and provide feedback.
⚠️ COMMENT VISIBILITY WARNING ⚠️
This issue is now closed. Please be mindful that future comments are hard for our team to see.
If you need more assistance, please either tag a team member or open a new issue that references this one.
If you wish to keep having a conversation with other community members under this issue feel free to do so.