sentry-php icon indicating copy to clipboard operation
sentry-php copied to clipboard

Relay spawns automatically

Open smeubank opened this issue 2 years ago • 5 comments

Before we figure out how our PHP SDK can communicate with relay running as a sidecar, or what we want that instance of relay to do exactly, we need to find a way to have it spawn automatically.

The general discussion around this should be to answer:

How does the user get to it?

  • getting the sidecar and starting it up
    • in PHP it could download the code and spawn a process
      • but isn't that like a malware?
    • most/many on the market follow a similar pattern
      • a daemon running, which the user must install on the server and a PHP
        • extension needed to get deeper hooks into the language
        • is there an understanding of the lifetime of the extension?
        • What is the capability it has to hold onto things?

Further questions

  1. how do we daemonize relay?
    • might need relay per upstream, not per DSN but per sentry instance
    • Is this a process or is this an extension?
    • likely both: process as daemon to keep state, and extension to hook into language internals
  2. how does it know which binary to get?
    • should we ship the binary with the SDK?
      • how is this being solved elsewhere on the market?
    • should relay be standard ambient, or an extension to fetch?

smeubank avatar Apr 21 '22 13:04 smeubank

How does the user get to it?

* getting the sidecar and starting it up
  
  * in PHP it could download the code and spawn a process
    
    * but isn't that like a malware?

Yes it is. And this exact "feature" is what created the fork inside Swoole (into OpenSwoole), exactly because it's a security ~nightmare~ issue. See https://github.com/swoole/swoole-src/issues/4434

    * a daemon running, which the user must install on the server and a PHP
      
      * extension needed to get deeper hooks into the language
      * _is there an understanding of the lifetime of the extension?_
      * _What is the capability it has to hold onto things?_

An extension is the way that APM like NewRelic use, and it spawns another process, so it has basically open capabilities.

Further questions

1. how do we daemonize relay?
   
   * might need relay per upstream, not per DSN but per sentry instance
   * _Is this a process or is this an extension?_
   * likely both: process as daemon to keep state, and extension to hook into language internals

Why per-upstream? Agree that the extension gives you the possibility to hook into language internals.

2. how does it know which binary to get?
   
   * should we ship the binary with the SDK?

I would avoid that. Also: if you just need Relay, can't you just say that you need it deployed and that's all? As a sidecar i.e. in a k8s environment, I would need a container to put in the pod, alongside PHP-FPM.

     * how is this being solved elsewhere on the market?
   * should relay be standard ambient, or an extension to fetch?

An extension is another possible way to proceed, but it should not be to heavyweight. Maybe we could leverage FFIs too?

Jean85 avatar Apr 21 '22 13:04 Jean85

in PHP it could download the code and spawn a process

Please, no! Whatever thing downloads and installs things in the background without the user consent should be treated as a security issue and an unwanted behaviour. It's not important who is the owner of what, it's simply wrong. I wouldn't trust ever something that I find out it downloaded something on my behalf, without telling me it.

a daemon running, which the user must install on the server and a PHP

I don't have a full picture of what you are trying to achieve, but I don't understand why we need Relay. IIRC, the main point that led to thinking that an extension may be useful was that PHP is not asyncronous and so it cannot send events in the background like other languages do (Go, Java, etc). If that's the main driver for creating such extension, then I would expect it to simply implement the sending of events so that it can be done in the background, no matter what is the server at the other side

ste93cry avatar Apr 21 '22 19:04 ste93cry

HI @Jean85 thanks for the feedback!

Yes it is. And this exact "feature" is what created the fork inside Swoole (into OpenSwoole), exactly because it's a security nightmare issue. See https://github.com/swoole/swoole-src/issues/4434

i put there more to call out something we WON'T do, for exactly these reasons

An extension is the way that APM like NewRelic use, and it spawns another process, so it has basically open capabilities.

this was referenced as well, i think it provides a good example of how we might want to do it

Why per-upstream?

this was a point brought up by @jan-auer but actually I would take this out for now, as now part of the POC

if you just need Relay, can't you just say that you need it deployed and that's all?

An extension is another possible way to proceed, but it should not be to heavyweight. Maybe we could leverage FFIs too?

for both of these points: we want to reduce the need users would have to run it to a bare minimum, and it more closely resemble the solution we have for AWS Lambda Extensions https://github.com/getsentry/sentry-python/milestone/4 where we run relay in proxy mode

@ste93cry

Please, no!

agreed :) was writing up a summary of a meeting we had on this, and wanted to write that with the malware point that we won't do it, I'll edit the original to be more explicit

I don't have a full picture of what you are trying to achieve,

so we want to have sidecars available for server-side high throughput environments to relieve some stress from the applications themselves. Then aggregation logic and sending to sentry can be abstracted away. For a language like PHP due to it memory constrains there's already more obvious value. Then there are a number of other features like auto session tracking, 429 rate limiting, and client reports which would also become possible in PHP world. We don't want to solve all those problems yet though. For now we just focus on relay as a sidecar, and that it should be lightweight in size and difficulty to setup, which then for the 429 issue linked would also cut down those round trips.

Then the larger goal, is to start with PHP (well...we technically already started with AWS Lambda) since there are clear problems we might be able to tackle. Then we would have a solution that we could scale to other platforms with high throughput requirements. However we want to tackle first with this issue just how best to get this sidecar running for PHP applications

If y'all are interested I would be happy to include you in a call together with some folks from Sentry and @stayallive

smeubank avatar Apr 21 '22 22:04 smeubank

If the end users will still have the option to deploy Relay manually, that would be a great strategy IMHO. In that way, I can make the conscious trade-off between deploying it locally (like inside the same pod in k8s) or "globally", once for all my infra, depending on the network topology and distribution.

Jean85 avatar Apr 22 '22 06:04 Jean85

How does the user get to it?

  • getting the sidecar and starting it up
    • in PHP it could download the code and spawn a process
      • but isn't that like a malware?

I think we all agree, this might not be the best idea to pursue, since that very much borders malware but also has technical challenges with race conditions and every request needing to check if a process is live and otherwise spawn it might also have performance downsides.

  • most/many on the market follow a similar pattern
    • a daemon running, which the user must install on the server and a PHP
      • extension needed to get deeper hooks into the language

Yes this is indeed how Datadog, New Relic & Tideways function (which I believe are the more popular options available for PHP). They have a daemon/proxy application running, usually one instance per server that accepts reports from the PHP extension. You can compare the daemon/proxy process with how Relay functions at the moment and it's used to offload potentially slow HTTP traffic to a service like Sentry

  - _is there an understanding of the lifetime of the extension?_

There is documentation where the lifetime of a PHP extension is described: https://wiki.php.net/internals/extensions#extensions_lifetime.

image

  - _What is the capability it has to hold onto things?_

As far as I can tell from the document linked above the extension has a similar lifetime of a PHP request (it can do things before and after) but because of the single threaded nature of PHP there is not much use of keeping much state inside the extension since it's at most shared with a single PHP process which potentially only lives for a single request.

Further questions

  1. how do we daemonize relay?

In PHP land if we run daemons (like queue workers) it's usually left up to the user how to do it but supervisord is a very common tool for this.

Another option which other APM tools employ is to have (linux) packages available that install register the daemon as a service so it starts up with the host machine.

  • might need relay per upstream, not per DSN but per sentry instance

This is an interesting problem, but that might be up to the user to have a Relay instance per Sentry instance, so far people usually don't use multiple Sentry instances inside a single application so the use case for having multiple relays on the same host is probably only required for a few people or people that run multiple applications on a single host.

  • Is this a process or is this an extension?

Considering the information above, this is a process and not an extension.

  • likely both: process as daemon to keep state, and extension to hook into language internals

Also yes, but for the scope we are looking into it's not needed to hook into language internals (although to improve the APM experience this will certainly be needed in the future).

  1. how does it know which binary to get?
    • should we ship the binary with the SDK?

I don't think that is a good idea considering how PHP ships it's SDK, we could however ship an easy tool or CLI helper that pulls down the binary if needed which might be convienient, but that could also be a install script hosted by Sentry instead of bundled by the SDK.

 - how is this being solved elsewhere on the market? 

As mentioned above, using install scripts or with a (linux) package. If wanted I could compile a list with installation instructions (or links to them) for the other APM solutions.

  • should relay be standard ambient, or an extension to fetch?

That one I don't know 😅

I hope this helps @smeubank, let me know if I should clarify or expand on some items here!

stayallive avatar Apr 26 '22 14:04 stayallive

Closing this for now, we might revisit this in the future.

cleptric avatar Oct 20 '22 15:10 cleptric