safe-transaction-service
Refactor webhooks to use a queue system
What is needed?
Don't send webhooks from the transaction service. Send them to a queue, so services can subscribe and get the needed webhooks.
Background
- The current webhook implementation requires the tx service to POST to the configured servers.
- Some servers can take a long time to answer, blocking the tx service (a quick workaround could be reducing the timeout).
- Some third-party services would like to integrate with our webhooks, but that could impact the performance of the tx service.
Proposed solution
- Send webhooks to a queue (RabbitMQ, Kafka...)
- A new service (webhook service) reads the webhooks and sends them to 3rd-party clients
- From that queue, CGW can also read the webhooks
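To make the queue contract concrete, here is a minimal sketch of what a webhook message on the queue could look like. The field names (`event_type`, `address`, `payload`) are assumptions for illustration, not the service's actual payload:

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class WebhookEvent:
    """Hypothetical envelope for a webhook message published to the queue."""
    event_type: str  # e.g. "EXECUTED_MULTISIG_TRANSACTION" (assumed name)
    address: str     # Safe address the event refers to
    payload: dict    # event-specific data


def serialize(event: WebhookEvent) -> bytes:
    # Consumers (CGW, the new webhook service) would deserialize this JSON body
    return json.dumps(asdict(event)).encode()


event = WebhookEvent(
    event_type="EXECUTED_MULTISIG_TRANSACTION",
    address="0x...",
    payload={"txHash": "0x..."},
)
body = serialize(event)
```

Keeping the envelope a plain JSON document means it works the same whether the broker ends up being RabbitMQ or Kafka.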
I found this pretty interesting. I'm not an expert on it, but the Transaction Service feels a bit overloaded with synchronous work, so moving the webhook execution responsibilities to another service seems like a good idea. I'm not very familiar with RabbitMQ, but with Kafka you can filter/redirect/(de)multiplex any messages. It's an option if we want a more flexible solution, in order to pipe some webhook data to CGW and some to a new service which executes "external" webhooks. From my experience, having a self-managed Kafka cluster is pretty ops-demanding; if we go with Kafka, maybe we could use the AWS-managed one.
But for a first iteration maybe we could just use RabbitMQ with two separate queues; it's less flexible, but it should fit well, and I think it's simpler to implement.
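The two-queue idea maps to fanout semantics: the tx service publishes once, and every bound queue (one for CGW, one for the external webhook service) gets its own copy. A stdlib-only sketch of that behavior — in RabbitMQ this would be a fanout exchange with two bound queues; the queue names here are illustrative assumptions:

```python
import queue

# One queue per consumer. In RabbitMQ these would be two queues
# bound to a fanout exchange; names are illustrative.
cgw_queue = queue.Queue()
external_webhooks_queue = queue.Queue()
bound_queues = [cgw_queue, external_webhooks_queue]


def publish(message: dict) -> None:
    # Fanout semantics: every bound queue receives its own copy,
    # so CGW and the webhook service consume independently.
    for q in bound_queues:
        q.put(message)


publish({"type": "PENDING_MULTISIG_TRANSACTION"})
```

The point of the sketch is that the publisher doesn't know about consumers: adding a third subscriber later is just binding one more queue.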
Sounds good @hectorgomezv . Just one clarification:
From my experience, having a self-managed Kafka cluster is pretty ops demanding, if we go with Kafka maybe we could use the AWS-managed one.
I didn't know about that, thanks for the info
feels a bit overloaded with synchronous work
Webhooks are handled on a worker dedicated to webhooks and notifications, so they are asynchronous 😉
I'd like to advocate for this improvement.
Many apps built on top of Safe need to index transactions from the Transaction Service API. Because they don't have webhook access, they're putting stress on the API by constantly polling for new data.
Advocating for this as well. We send several hundred (and growing) requests per minute to check whether there are new transactions for the Safes our app monitors, since we perform real-time actions when new transactions are queued/executed. We spend a lot of time working around rate limits.
With a webhook, our query volume would drop to single or double digits per minute, with all other data coming via the webhook.
We should decide which library we are going to use: https://www.rabbitmq.com/devtools.html#python-dev I think the most used one is pika. @Uxio0 does that work for you?
Pika sounds good
I'm thinking about the reconnection strategy for when the channel is closed or the connection to RabbitMQ is closed.
As we know, the idea is to keep one instance and share the connection and channel between all events, which means every published event shares the same channel and connection.
We could hit connection errors:
- at startup, when the first event creates the instance and can't connect to RabbitMQ
- when publishing an event, if the channel or the connection has been closed
During startup we could:
- retry the connection, waiting for RabbitMQ to come back
- raise the exception and discard the event
When publishing a message, we could:
- try to reconnect once; if we can create a new connection, publish the message, otherwise discard it
- keep retrying until the connection comes back, with a limited number of retries and a jitter strategy
In my opinion we could start with a first version of this publisher that discards messages, and then, if necessary because there are a lot of disconnections, add a more complex strategy to handle the reconnections. What do you think @Uxio0 ?
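The "limited number of retries with jitter" could be sketched like this in pure Python. The retry count and delay bounds are assumed parameters, and `attempt_publish` stands in for whatever actually publishes to RabbitMQ:

```python
import random
import time


def backoff_delays(max_retries: int = 5, base: float = 0.5, cap: float = 10.0):
    """Exponential backoff with full jitter: delay in [0, min(cap, base * 2**n)]."""
    for attempt in range(max_retries):
        yield random.uniform(0, min(cap, base * 2 ** attempt))


def publish_with_retries(attempt_publish, max_retries: int = 5) -> bool:
    """Try to publish; on failure, retry with jittered backoff.

    Returns True on success, or False when the message is discarded
    after exhausting the retries (the simple first-version behavior).
    """
    if attempt_publish():
        return True
    for delay in backoff_delays(max_retries):
        time.sleep(delay)
        if attempt_publish():
            return True
    return False  # give up and discard the message
```

Full jitter spreads reconnect attempts out so that many publishers don't all hammer RabbitMQ at the same instant when it comes back.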
Did you check if there's a connection manager? For example for node.js and amqplib there's https://www.npmjs.com/package/amqp-connection-manager
Yes, I didn't find anything similar that manages the connection by itself. I'm taking a look at the GeventConnection adapter of Pika, which allows configuring callbacks for events like open_connection or close_connection; maybe we can handle the reconnection from those callbacks.
Ok, then we will have to implement it ourselves (something similar to what we did with Firebase).
The GeventConnection adapter of pika works with gevent and lets us have a greenlet running a loop that receives events like disconnection or channel closed asynchronously in callbacks, where we can handle connection retries. I think this is the best approach; this way we don't need to try to publish an event just to check that the connection is ok and the channel is open.
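A rough, library-agnostic sketch of that callback-driven approach. It mimics the shape of on-open/on-close connection callbacks, but the class and parameter names are assumptions, and `connect` is injected so the sketch stays plain Python — with pika's GeventConnection, opening the connection and registering these handlers would happen through the adapter instead:

```python
class ReconnectingPublisher:
    """Sketch of callback-driven reconnection handling (names are assumed)."""

    def __init__(self, connect, max_retries: int = 5):
        self._connect = connect          # injected: returns True if connected
        self._max_retries = max_retries  # give up after this many attempts
        self.attempts = 0
        self.connected = False

    def start(self) -> None:
        self._try_connect()

    def _try_connect(self) -> None:
        self.attempts += 1
        if self._connect():
            self.on_connection_open()
        else:
            self.on_connection_closed()

    def on_connection_open(self) -> None:
        # Connection is up: reset state. A real implementation would
        # open a channel here and resume publishing.
        self.connected = True
        self.attempts = 0

    def on_connection_closed(self) -> None:
        # Invoked from the adapter's loop on disconnection: retry until
        # the limit is hit, then give up (the simple first-version behavior).
        self.connected = False
        if self.attempts < self._max_retries:
            self._try_connect()
```

The key property matches the comment above: reconnection is driven by the close callback, so there's no need to attempt a publish just to discover the connection dropped.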