
Feature: Trigger GraphQL execution immediately on published events (without writing to a store)

Open andyrichardson opened this issue 3 years ago • 12 comments

About

Hey there, first off - thanks for the awesome lib!

I'm working with a team who are currently using a self-made implementation of serverless subscriptions and we'd really like to use this library instead.

One thing that is holding us back right now is the use of polling for events.

Current functionality

So if I'm not mistaken, in the case of a new event being published, the following happens:

  1. A publish event is triggered: `pubsub.publish('SOME_EVENT')`
  2. The event is written to some kind of persistence layer (e.g. `MemoryEventStore`)
  3. The persistence layer is then polled
  4. On each poll, any new events are sent to an event handler (e.g. `MemoryEventProcessor`)
  5. The event is subsequently handed to resolvers and the results are pushed to subscribed clients

Expected functionality

If we're working with push/event based systems, I'm confused as to why events would need to be persisted and polled.

My expectation was that an event publish (1.) would immediately trigger some kind of event handler (4.) without the need for polling or persistence.

andyrichardson avatar Jan 28 '21 16:01 andyrichardson

The file you linked to is a test fixture.

Events are picked up based on which type of managers you use. For example, if you're using DynamoDB, when something is published it's added to the table, which triggers a call to the lambda letting it know there is a new item in the table: https://github.com/michalkvasnicak/aws-lambda-graphql/blob/master/docs/serverless.yml#L58-L65
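For context, that kind of trigger is a DynamoDB stream subscription in the Serverless config. A rough, illustrative sketch (not the repo's exact file; the table and handler names are hypothetical):

```yaml
# Illustrative sketch: AWS invokes the handler whenever new records
# appear in the events table's stream -- no application-level polling.
functions:
  eventProcessor:
    handler: src/event_processor.handler
    events:
      - stream:
          type: dynamodb
          arn:
            Fn::GetAtt: [EventsTable, StreamArn]
```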

cranberyxl avatar Jan 28 '21 17:01 cranberyxl

Thanks for the response!

The file you linked to is a test fixture

My bad! How does an event handler get called when an event is written to the event store in memory?

it's added to the table which triggers a call to the lambda

Totally, but it looks like this still involves polling under the hood.

Is there a particular reason we use a store for events as opposed to triggering an event handler immediately in the case of a push?

From what I can tell, this is what the project currently does:

New event -> Write to event store -> Poll event store (e.g. DynamoDB stream) -> Trigger handler

But with a push based workflow, I don't understand the need to write events to a store of any kind if we can instead trigger the handler immediately.

New event -> Trigger handler

The implementation I'm currently using doesn't have the same abstractions as this project, but it is able to dispatch to handlers immediately on a new event without the need for an event store.

andyrichardson avatar Jan 29 '21 10:01 andyrichardson

There is no polling. DynamoDB streams are serverless, see https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Streams.Lambda.html

A lambda is invoked when something is added to a dynamo table.

cranberyxl avatar Jan 29 '21 14:01 cranberyxl

That's the link I shared - see quote below

AWS Lambda polls the stream and invokes your Lambda function synchronously when it detects new stream records

andyrichardson avatar Jan 29 '21 14:01 andyrichardson

That all happens inside the AWS black box; this library doesn't write the code for it.

cranberyxl avatar Jan 29 '21 14:01 cranberyxl

I'll rename the issue because this is less about the polling and more about triggering events without needing to write to a store.

So ignoring the polling, I'm wondering - when an event is published, why do we write to a store, which subsequently causes a read, which subsequently calls the same lambda that triggered the write?

andyrichardson avatar Jan 29 '21 14:01 andyrichardson

@andyrichardson in that case you need an event store that performs execution on publish() call.

https://github.com/michalkvasnicak/aws-lambda-graphql/blob/master/packages/aws-lambda-graphql/src/MemoryEventStore.ts#L10

https://github.com/michalkvasnicak/aws-lambda-graphql/blob/master/packages/aws-lambda-graphql/src/DynamoDBEventStore.ts#L72

Both event stores just store the event, but in your case you need to combine the store with `MemoryEventProcessor`. So you need a new event store that contains the logic from the memory event processor and triggers that logic on the `publish()` call, so you can await the execution.
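A minimal sketch of what such a combined store could look like, with illustrative names (the library's real `Events` and processor interfaces differ): the processing logic lives directly in `publish()`, so execution can be awaited without any persistence or polling.

```typescript
// Hypothetical sketch: an event "store" whose publish() runs the
// handler logic immediately instead of persisting the event.
// Names are illustrative, not the library's actual API.
interface PubSubEvent {
  event: string;
  payload: string;
}

type EventHandler = (event: PubSubEvent) => Promise<void> | void;

class ImmediateEventStore {
  private handlers: EventHandler[] = [];

  // Processing logic (what MemoryEventProcessor would do) is
  // registered up front...
  onEvent(handler: EventHandler): void {
    this.handlers.push(handler);
  }

  // ...and publish() awaits it directly: no write, no poll.
  async publish(event: PubSubEvent): Promise<void> {
    await Promise.all(this.handlers.map((h) => h(event)));
  }
}
```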

michalkvasnicak avatar Feb 01 '21 07:02 michalkvasnicak

Thanks for the response @michalkvasnicak 🙏

So funnily enough, I've been doing exactly that:

  • Use MemoryEventStore + MemoryEventProcessor
  • Use DynamoDB variants of everything else
  • Call dispatch on pubsub

I found that calling dispatch didn't have any effect and acted like a no-op.

There didn't seem to be any attempts to get subscribers from dynamodb following a dispatch.

Once the event is written to the MemoryEventStore, what is the sequence of cascading events that would lead to the memory event processor being called?

The lack of callbacks and push to the event store is what led me to suspect there was a need for polling 🤔

andyrichardson avatar Feb 01 '21 15:02 andyrichardson

So you need new event store, that contains the logic from memory event processor and triggers that logic on publish()

Sorry I misread this - so the built in memory event store is working as intended (no dispatch)?

I might be missing something but I'm curious, how come there is a pattern of writing events to a store as opposed to solely consuming events and forwarding them on to the event processor?

I can see why this might be useful for much smaller projects where all published events are exclusive to the service that is consuming them, but for the majority(?) of use cases, messages are likely to be dispatched from external services (AWS SNS/SQS, Kafka, etc)

andyrichardson avatar Feb 01 '21 15:02 andyrichardson

Memory* parts are not intended to be used in AWS; they're used only in local dev mode (so yes, they're working as intended). For your use case you need to write a new event store by implementing https://github.com/michalkvasnicak/aws-lambda-graphql/blob/d20ed3cc81323617a1235765d43968f49c7b8521/packages/aws-lambda-graphql/src/types/events.ts#L4 and also copy the logic from `MemoryEventProcessor` into the `publish` method of your new event store.

I might be missing something but I'm curious, how come there is a pattern of writing events to a store as opposed to solely consuming events and forwarding them on to the event processor?

I'm not sure whether I understand your question. You can publish your messages from any source that is able to invoke your lambda event processor handler. For example you can use AWS Kinesis, SQS, SNS as the source of your events or you can invoke your lambda directly. So for example you can have some external application that publishes events and your event processor handles them and publishes them to subscribers.

If your question is mainly about why the event is first stored (for example in DynamoDB) and then asynchronously processed from the DynamoDB stream, it's because you can have hundreds of subscribers for an event, and you don't want to send messages to them directly because that can cause your lambda to time out.
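The store-then-fan-out flow described above can be sketched as two decoupled steps (hypothetical names; an array stands in for DynamoDB and its stream):

```typescript
// Sketch of the store-then-fan-out pattern with illustrative names.
type StoredEvent = { event: string; payload: string };

// Stands in for the DynamoDB events table.
const eventTable: StoredEvent[] = [];

// Step 1: the publishing lambda only writes the event and returns
// quickly (in AWS this would be a single PutItem call).
function publish(event: StoredEvent): void {
  eventTable.push(event);
}

// Step 2: a separate "stream" invocation receives a batch of new
// records and pushes them to every subscriber, so a slow fan-out
// to hundreds of connections can't time out the publishing lambda.
function processStreamBatch(
  batch: StoredEvent[],
  subscribers: Array<(e: StoredEvent) => void>,
): void {
  for (const record of batch) {
    for (const send of subscribers) send(record);
  }
}
```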

michalkvasnicak avatar Feb 01 '21 15:02 michalkvasnicak

I suspect that @michalkvasnicak has answered your question already, but it sounds like the question you're asking is: "Why bother using a datastore for storing and triggering events, when you could just immediately publish them instead?" And the answer is scalability.

  • As the app you're building grows, perhaps you'll want to publish events from other sources, not just your GraphQL lambda. You'd then need to trigger the events lambda manually from each source rather than simply writing a row to a DB.
  • As michalkvasnicak indicated, doing it in a single lambda execution could cause timeouts, whereas spreading it out across multiple lambdas prevents you from running into this limitation.
  • If your lambda crashes halfway through execution and your in-memory store is erased, it's much harder to trace which messages were sent or unsent and to re-execute them.

alaycock avatar Feb 01 '21 18:02 alaycock

Hi All!

I know this is diverging off topic a bit, but I'm interested in a similar setup where DynamoDB is only required to store the subscriptions, not the events.

What do you think about asynchronously invoking another lambda if you're worried about timeouts/reliability but still want an immediate response?

Then other systems could just do the same, rather than writing to a DB store. (I figure calling a lambda is about the same complexity as writing a DynamoDB record.)
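A sketch of that idea, with the AWS call abstracted behind a hypothetical `invokeAsync` function (in AWS it would wrap `lambda.invoke({ FunctionName, InvocationType: 'Event', Payload })`, whose 'Event' invocation type returns immediately without waiting for the processor to finish):

```typescript
// Sketch: skip the event store and fire-and-forget invoke the
// event-processing lambda directly. `invokeAsync` is a hypothetical
// abstraction over the async AWS Lambda invoke call.
type InvokeFn = (payload: string) => Promise<void>;

async function publishDirect(
  invokeAsync: InvokeFn,
  event: { event: string; payload: string },
): Promise<void> {
  // No persistence step: the processor lambda is triggered right away,
  // and the caller gets an immediate acknowledgement.
  await invokeAsync(JSON.stringify(event));
}
```

Note this trades away what the store gives you: replayability after a crash and ordering, which is the out-of-order concern mentioned above.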

Another potential issue (depending on the use case) may be out of order messages.

RyanHow avatar Feb 02 '21 01:02 RyanHow