safe-transaction-service icon indicating copy to clipboard operation
safe-transaction-service copied to clipboard

Refactor webhooks to use a queue system

Open Uxio0 opened this issue 2 years ago • 4 comments

What is needed?

Don't send webhooks from the transaction service. Send them to a queue, so services can subscribe and get the needed webhooks.

Background

  • Current webhook implementation requires the tx service to POST to the configured servers.
  • Some servers can take some time to answer, blocking the tx service (a quick solution could be reducing the timeout).
  • Some third party services would like to integrate with our webhooks, but that could impact the performance of the tx service.

Proposed solution

  • Send webhooks to a queue (rabbitmq, kafka...)
  • A new service (webhook service) reads the webhooks and sends them to 3rd party clients
  • From that queue, CGW can also read the webhooks

Uxio0 avatar Aug 05 '22 09:08 Uxio0

I found this pretty interesting, i'm not an expert on it, but Transaction Service feels a bit overloaded with synchronous work, so moving the webhooks execution responsibilities to another service seems a good idea. I'm not very familiar with RabbitMQ, but using Kafka you can filter/redirect/(de)multiplex any messages. It's an option if we want to have a more flexible solution in order to pipe some webhooks data to CGW and some to a new service which executes "external" webhooks. From my experience, having a self-managed Kafka cluster is pretty ops demanding, if we go with Kafka maybe we could use the AWS-managed one.

But for a first iteration maybe we could just use Rabbit with two separated queues, it's less flexible but it should fit well, and I think it's simpler to implement.

hectorgomezv avatar Aug 05 '22 10:08 hectorgomezv

Sounds good @hectorgomezv . Just one clarification

From my experience, having a self-managed Kafka cluster is pretty ops demanding, if we go with Kafka maybe we could use the AWS-managed one.

I didn't know about that, thanks for the info

feels a bit overloaded with synchronous work

Webhooks are handled on a worker specific for webhooks and notifications, so they are asynchronous 😉

Uxio0 avatar Aug 05 '22 10:08 Uxio0

I'd like to advocate for this improvement.

Many apps built on top of Safe need to index transactions from the Transaction Service API. Because they don't have webhook access, they're putting stress on the API by constantly polling for new data.

isvidler avatar Aug 15 '22 21:08 isvidler

Advocating for this as well. We send several hundred (and growing) requests per minute to see if there are new transactions for the safes our app monitors as we perform real time actions when there are new transactions queued/executed. We spend a lot of time working on not hitting rate limits.

With a webhook, our query volume would be cut down to single / double digit per minute queries with all other data coming via the webhook.

jierlich avatar Aug 24 '22 15:08 jierlich

We should decide what library we are going to use https://www.rabbitmq.com/devtools.html#python-dev I think the more used is pika. @Uxio0 works for you?

moisses89 avatar Apr 26 '23 15:04 moisses89

Pika sounds good

Uxio0 avatar Apr 26 '23 16:04 Uxio0

I'm thinking about the strategy of reconnection if the channel is closed or if the connection is closed with RabbitMq. As we know the idea is to keep one instance and share connection and channel between all events, that means that every event published is sharing channel and connection. We could find connection errors on:

  • startup, first event create the instance and can't connect with RabbitMq
  • publishing an event, we could find that channel is closed or the connection was closed.

During startup we could:

  • try to connect waiting for rabbitMq coming back
  • raise the exception and discard the event

Publishing the message:

  • try to reconnect, if we can create a new connection publish the message if not discard the message
  • keep trying until connection come back a limited number of retries and with jitter strategy

In my opinion we could start the first version of this publisher discarding messages and then if were necessary because there are a lot of disconnections try to add an strategy more complex to handle the reconnections. What do you think @Uxio0 ?

moisses89 avatar May 11 '23 13:05 moisses89

Did you check if there's a connection manager? For example for node.js and amqplib there's https://www.npmjs.com/package/amqp-connection-manager

Uxio0 avatar May 11 '23 14:05 Uxio0

Did you check if there's a connection manager? For example for node.js and amqplib there's https://www.npmjs.com/package/amqp-connection-manager

Yes I didn't find anything similar that manage the connection by itself, I'm taking a look to geventconnection of Pika, that allow to configure callbacks to some events like open_connection or close_connection, maybe we can handle the reconnection from this callbacks.

moisses89 avatar May 12 '23 09:05 moisses89

Yes I didn't find anything similar that manage the connection by itself, I'm taking a look to geventconnection of Pika, that allow to configure callbacks to some events like open_connection or close_connection, maybe we can handle the reconnection from this callbacks.

Ok, then we will have to implement it ourselves (something similar to what we did with Firebase)

Uxio0 avatar May 12 '23 09:05 Uxio0

Yes I didn't find anything similar that manage the connection by itself, I'm taking a look to geventconnection of Pika, that allow to configure callbacks to some events like open_connection or close_connection, maybe we can handle the reconnection from this callbacks.

Ok, then we will have to implement it ourselves (something similar to what we did with Firebase)

GeventConnection adapter of pika works with gevent and let us to have a greenlet that is running a loop to receive in the callbacks some events asynchronously as disconnection and channel closed, where we can handle connection retries. I think this is the best approach, at this way we don't need to try to publish an event to see if the connection is ok and the channel is opened.

moisses89 avatar May 15 '23 11:05 moisses89