roadrunner icon indicating copy to clipboard operation
roadrunner copied to clipboard

[🐛 BUG]: Request field ordering is random

Open ITernovtsii opened this issue 1 year ago • 21 comments

No duplicates 🥲.

  • [X] I have searched for a similar issue in our bug tracker and didn't find any solutions.

What happened?

Sending the same HTTP request (multipart/form-data) to Roadrunner is sending randomly ordered parameters to PHP.

It's causing issues with GraphQL specs validation, like in this bundle - https://github.com/overblog/GraphQLBundle/blob/master/src/Request/UploadParserTrait.php#L68

Version (rr --version)

2023.1.4 / 2023.1.5 / 2023.2.0-beta.3

How to reproduce the issue?

  1. Send Request to PHP application:
curl 'http://localhost/' -H 'Accept: application/json' -H 'Content-Type: multipart/form-data; boundary=----BND' \
  --data-raw $'------BND\r\nContent-Disposition: form-data; name="var1"\r\n\r\nfirst\r\n------BND\r\nContent-Disposition: form-data; name="var2"\r\n\r\nsecond\r\n------BND--\r\n'
  1. Observe parameters are randomly displaying var1 or var2 as the first parameter in $buffer variable here: https://github.com/roadrunner-php/goridge/blob/4.x/src/StreamRelay.php#L88

.rr.yaml file content:

version: "3"

rpc:
    listen: tcp://127.0.0.1:6001
server:
    command: "php public/index.php"
    env:
        - APP_RUNTIME: App\Runtime\Runtime
http:
    enable:     true
    address:    0.0.0.0:8080
    max_request_size: 20
    middleware: [ "headers" ]
    uploads:
        forbid: [".php", ".exe", ".bat"]
    headers:
        cors:
            allowed_origin: "*"
            allowed_headers: "*"
            allowed_methods: "GET,POST,PUT,DELETE"
            allow_credentials: true
            exposed_headers: "Cache-Control,Content-Language,Content-Type,Expires,Last-Modified,Pragma"
            max_age: 600
    pool:
        num_workers: 24
        max_jobs: 0
        allocate_timeout: 60s
        destroy_timeout: 60s
        supervisor:
            watch_tick: 1s
            ttl: 1800s
            idle_ttl: 10s
            max_worker_memory: 128
            exec_ttl: 300s
logs:
    mode: production
    level: error
    encoding: json
    file_logger_options:
        log_output: "var/log/rr.log"
        max_size: 50
        max_age: 24
        max_backups: 10
        compress: true

Relevant log output

No response

ITernovtsii avatar Jul 03 '23 21:07 ITernovtsii

Hey @ITernovtsii 👋🏻

  1. There is no enable option for the HTTP plugin.
  2. Have you tried raw_body http plugin option?

rustatian avatar Jul 03 '23 22:07 rustatian

Hi @rustatian

  1. Thanks, I removed this option. Looks like it was not cleared after migration from an older version.
  2. raw_body option didn't help. I enabled it and fields not sent to PHP at all: Screenshot from 2023-07-04 11-10-24 At the same time, if I'm sending a file together with parameters - I see it included in uploads, but still no other fields. Screenshot from 2023-07-04 11-09-21

ITernovtsii avatar Jul 04 '23 08:07 ITernovtsii

I guess it should be a method named getRawBody or something like this.

rustatian avatar Jul 04 '23 08:07 rustatian

To be clear, I know why they're random. To preprocess all the form-data fields, I use a hashmap. In the hashmap the order is not guaranteed.

I double checked the RFC, and yes, the bug is valid and you're right, we should not reorder the results if we treat the http plugin as an intermediary between the user and the actual script. Will be fixed in the first 2023.2.x bugfix release. Thanks for the report 👍🏻

rustatian avatar Jul 04 '23 08:07 rustatian

Ok guys, I was wrong, intermediary in the HTTP context means proxy, gateways or tunnels. RR can't be treated as an intermediary. Also, Go does not keep order of the PostForm or MultipartForm values. As you can see, the Go stdlib also uses hashmap, so, it's not possible to have initially ordered read. I have read RFC's and have not found a single paragraph that says anything about order (except intermediaries) 😢

rustatian avatar Jul 13 '23 14:07 rustatian

Sorry for commenting on an old closed issue, but wanted to include issues I've run into as well (in case others come across I like myself from google).

I was hoping to use roadrunner with my slack bot app. But all the requests started to 401. When i dug into it, its because the mutation of the request body meant I could not longer verify the request against the request signature provided by slack.

I tried the http.raw_body option but that did nothing but cause request body json's keys to be randomly ordered. one request they're in one order, the next in a completely different order.

So for me, roadrunner is a non-starter, which really is unfortunate.

M-Porter avatar Aug 23 '23 07:08 M-Porter

Hey @M-Porter 👋🏻 No problems at all on commenting on a old issue 😃 Yeah, I know this problem. Let me think what can I do. I'll reopen it. It's wrong when a user wants to use a tool and I don't fix the bug ❤️ I'll add it to 2023.3 milestone.

rustatian avatar Aug 23 '23 09:08 rustatian

@M-Porter Could you please provide a sample of the Slack request?

rustatian avatar Aug 23 '23 10:08 rustatian

@rustatian absolutely! Currently away on a business trip so I may not get it in the next couple days so i'll make a git repo with repro steps this weekend

M-Porter avatar Aug 23 '23 16:08 M-Porter

@M-Porter Sure, thank you 👍🏻 P.S.: I think I found a way how to provide constant values order of the multipart requsts 😃 I have plans to release the next 2023.3.0-beta.2 release with this fix next Thursday.

rustatian avatar Aug 23 '23 17:08 rustatian

@M-Porter Friendly ping 😃

rustatian avatar Aug 31 '23 00:08 rustatian

hey @rustatian

Here's the repro

  • For the PHP worker, you can use the example on the rr website, just add echo $request->getBody()->getContents(); above the line $psr7->respond(new Response(200, [], 'Hello RoadRunner!'));.
  • Request
    curl --location --request POST 'http://localhost:8080' \
    --header 'Content-Type: application/x-www-form-urlencoded' \
    --data-urlencode 'token= gIkuvaNzQIHg97ATvDxqgjtO' \
    --data-urlencode 'team_id= T0001' \
    --data-urlencode 'team_domain= example' \
    --data-urlencode 'enterprise_id= E0001' \
    --data-urlencode 'enterprise_name= Globular%20Construct%20Inc' \
    --data-urlencode 'channel_id= C2147483705' \
    --data-urlencode 'channel_name= test' \
    --data-urlencode 'user_id= U2147483697' \
    --data-urlencode 'user_name= Steve' \
    --data-urlencode 'command= /weather' \
    --data-urlencode 'text= 94070' \
    --data-urlencode 'response_url= https://hooks.slack.com/commands/1234/5678' \
    --data-urlencode 'trigger_id= 13345224609.738474920.8088930838d88f008e0' \
    --data-urlencode 'api_app_id= A123456'
    

Running the curl multiple times will lead to various ordering of the data when received by the server. Also to note, i know the timings of the requests are close together, but slowing them down or changing intervals doesn't change the randomness of the data.

2023-08-31T18:34:19+0000	INFO	server      	{"token":" gIkuvaNzQIHg97ATvDxqgjtO","channel_id":" C2147483705","user_name":" Steve","user_id":" U2147483697","team_id":" T0001","enterprise_name":" Globular%20Construct%20Inc","channel_name":" test","enterprise_id":" E0001","response_url":" https://hooks.slack.com/commands/1234/5678","trigger_id":" 13345224609.738474920.8088930838d88f008e0","command":" /weather","api_app_id":" A123456","team_domain":" example","text":" 94070"}
2023-08-31T18:34:19+0000	INFO	http        	http log	{"status": 200, "method": "POST", "URI": "/", "remote_address": "127.0.0.1:50087", "read_bytes": 392, "write_bytes": 17, "start": "2023-08-31T18:34:19+0000", "elapsed": "2.711665ms"}
2023-08-31T18:34:20+0000	INFO	server      	{"team_domain":" example","team_id":" T0001","api_app_id":" A123456","text":" 94070","user_id":" U2147483697","enterprise_id":" E0001","command":" /weather","trigger_id":" 13345224609.738474920.8088930838d88f008e0","token":" gIkuvaNzQIHg97ATvDxqgjtO","enterprise_name":" Globular%20Construct%20Inc","user_name":" Steve","response_url":" https://hooks.slack.com/commands/1234/5678","channel_id":" C2147483705","channel_name":" test"}
2023-08-31T18:34:20+0000	INFO	http        	http log	{"status": 200, "method": "POST", "URI": "/", "remote_address": "127.0.0.1:50088", "read_bytes": 392, "write_bytes": 17, "start": "2023-08-31T18:34:20+0000", "elapsed": "2.801471ms"}
2023-08-31T18:34:21+0000	INFO	server      	{"team_domain":" example","user_name":" Steve","api_app_id":" A123456","token":" gIkuvaNzQIHg97ATvDxqgjtO","channel_name":" test","text":" 94070","channel_id":" C2147483705","trigger_id":" 13345224609.738474920.8088930838d88f008e0","user_id":" U2147483697","enterprise_name":" Globular%20Construct%20Inc","command":" /weather","enterprise_id":" E0001","team_id":" T0001","response_url":" https://hooks.slack.com/commands/1234/5678"}
2023-08-31T18:34:21+0000	INFO	http        	http log	{"status": 200, "method": "POST", "URI": "/", "remote_address": "127.0.0.1:50089", "read_bytes": 392, "write_bytes": 17, "start": "2023-08-31T18:34:21+0000", "elapsed": "2.815061ms"}
2023-08-31T18:34:21+0000	INFO	server      	{"response_url":" https://hooks.slack.com/commands/1234/5678","user_id":" U2147483697","team_domain":" example","enterprise_id":" E0001","command":" /weather","text":" 94070","token":" gIkuvaNzQIHg97ATvDxqgjtO","channel_id":" C2147483705","api_app_id":" A123456","channel_name":" test","team_id":" T0001","trigger_id":" 13345224609.738474920.8088930838d88f008e0","enterprise_name":" Globular%20Construct%20Inc","user_name":" Steve"}
2023-08-31T18:34:21+0000	INFO	http        	http log	{"status": 200, "method": "POST", "URI": "/", "remote_address": "127.0.0.1:50090", "read_bytes": 392, "write_bytes": 17, "start": "2023-08-31T18:34:21+0000", "elapsed": "2.334316ms"}

but one would expect it to always appear as this:

{
  token: "gIkuvaNzQIHg97ATvDxqgjtO"
  team_id: "T0001"
  team_domain: "example"
  enterprise_id: "E0001"
  enterprise_name: "Globular%20Construct%20Inc"
  channel_id: "C2147483705"
  channel_name: "test"
  user_id: "U2147483697"
  user_name: "Steve"
  command: "/weather"
  text: "94070"
  response_url: "https://hooks.slack.com/commands/1234/5678"
  trigger_id: "13345224609.738474920.8088930838d88f008e0"
  api_app_id: "A123456"
}

M-Porter avatar Aug 31 '23 18:08 M-Porter

@M-Porter Thank you 👍🏻

I'm just wondering (and since there is no mentions about order (urlencoded, multipart) in the HTTP standard) why it should be ordered? Basically, I can read these key-value pairs in any (even random) order. Why should it matter? Just curious 😃 If I have the original order, why not resemble (or verify) the original order (order in url)?

rustatian avatar Aug 31 '23 19:08 rustatian

@M-Porter

Ok guys, I was wrong, intermediary in the HTTP context means proxy, gateways or tunnels. RR can't be treated as an intermediary. Also, Go does not keep order of the PostForm or MultipartForm values. As you can see, the Go stdlib also uses hashmap, so, it's not possible to have initially ordered read. I have read RFC's and have not found a single paragraph that says anything about order (except intermediaries) 😢

I also mentioned this here. Even within the standard Go library, it is not possible to read the multipart/urlencoded args in some 'original' order. Because Go stdlib uses (on purpose) hashmaps (associated array) under the hood. And this is not possible to read the data from the hashmap in any order (unless you have an array with order and read keys from this array and values from the hashmap).

EDIT: I'm just trying to understand this case.

rustatian avatar Aug 31 '23 19:08 rustatian

I understand that from Slack:

On each HTTP request that Slack sends, Slack adds an X-Slack-Signature HTTP header. The signature is created by combining the signing secret with the body of the request sent using a standard HMAC-SHA256 keyed hash.

But how is it possible to make this feature without the standard?

However, I have a solution for you. Simple and elegant. Thanks to @roxblnfk ❤️ You can create a super simple Go middleware: sample and then use velox to build your RR binary (in docker or locally). Or you can clone RR and build it yourself: https://roadrunner.dev/docs/customization-build/current/en

So the basic logic is: check the X-Slack-Signature header value, compute that hash from the RAW body, compare, decide what to do. You'll even save a lot more time for the valid requests, since the middleware works on the RR side without touching the workers.

rustatian avatar Aug 31 '23 19:08 rustatian

But how is it possible to make this feature without the standard?

I'm just wondering (and since there is no mentions about order (urlencoded, multipart) in the HTTP standard) why it should be ordered?

I'm not asking you to reinvent the wheel, I'm asking you to not rotate the tires on my car when getting my car washed. Does the standard state that you HAVE to mutate the request? I don't think it would. That would be strange for it to.

I'm asking that roadrunner does not mutate the request body and just pass it along as it was received. Roadrunner is acting as a man-in-the-middle attacker.

So the issue isn't that the map or array is in a different order, it's that the raw request body bytes are different than what they were when roadrunner received them.

If the http request being sent is this:

POST / HTTP/1.1
Host: localhost:8080
Content-Type: application/x-www-form-urlencoded
Content-Length: 24

hello=world&whats=up&a=b

The php application should receive that exactly

POST / HTTP/1.1
Host: localhost:8080
Content-Type: application/x-www-form-urlencoded
Content-Length: 24

hello=world&whats=up&a=b

M-Porter avatar Aug 31 '23 20:08 M-Porter

I've written A LOT of slack applications in go/rust/elixir/php/ruby. Roadrunner is the only application runner which mutates the underlying raw request body.

			body, err := io.ReadAll(r.Body)
			if err != nil {
				log.Error().Err(err).Msg("error reading request body")
				http.Error(w, http.StatusText(http.StatusInternalServerError), http.StatusInternalServerError)
				return
			}
			r.Body = io.NopCloser(bytes.NewBuffer(body))

			reqSig, err := hex.DecodeString(strings.TrimPrefix(r.Header.Get(slackRequestSignature), "v0="))
			if err != nil {
				log.Error().Err(err).Msg("error decoding signature")
				http.Error(w, http.StatusText(http.StatusInternalServerError), http.StatusInternalServerError)
				return
			}
			sigBaseString := fmt.Sprintf("v0:%d:%s", rts.Unix(), string(body))

			hash := hmac.New(sha256.New, []byte(signingSecret))
			hash.Write([]byte(sigBaseString))

			if !hmac.Equal(hash.Sum(nil), reqSig) {
				log.Info().Msg("invalid hash")
				http.Error(w, http.StatusText(http.StatusUnauthorized), http.StatusUnauthorized)
				return
			}
    body = conn.assigns[:raw_body]
    sig_basestring = "v0:#{timestamp}:#{body}"

    generated_signature =
      :crypto.mac(
        :hmac,
        :sha256,
        Application.get_env(:up_next, :slack_signing_secret),
        sig_basestring
      )
      |> Base.encode16(case: :lower)

M-Porter avatar Aug 31 '23 20:08 M-Porter

@M-Porter

Roadrunner is the only application runner which mutates the underlying raw request body.

Mutates - means that the original request would not be the same as what RR provides to the PHP worker. But RR doesn't mutate the request. You have an access to all fields that are present in the original request. So, the keys are the same. Next - values. RR doesn't change the values as well, it just unpacks them and sends them to the worker with the associated keys (performing some JSON escaping, but this is not a RR feature, but JSON standard). So your thesis about 'mutation' is not valid.

Next, about the order. As I mentioned, there is no standard for preserving order. Golang stdlib does not preserve order because --- there is no standard about it.

You can read the values preserving the order using the multipart package (also Go stdlib), but this case should not be solved that way. Instead of blaming RR, I would kindly suggest to understand the problem.

The problem and solution: You should check the body before it is actually parsed. Because Slack hashes the body. As I suggested to you (and you have Golang expertise) you can put the snipped you sent into the middleware and... problem solved.

In the next releases we will move away from JSON to Protobuf and provide the original body without escaping.

rustatian avatar Aug 31 '23 21:08 rustatian

Hi! Also waiting for the fix. We use a library for handling "GraphQL multipart requests" and the spec says that fields are ordered https://github.com/jaydenseric/graphql-multipart-request-spec#multipart-form-field-structure . Because RR reorders multipart/form-data fields Parser module of this GraphQL lib chooses an incorrect execution path and in production one of 7 or 10 file upload requests fails.

maksim-pankov avatar Feb 12 '24 12:02 maksim-pankov

Hey @maksim-pankov 👋 The show-stopper here is that I need to rewrite our current multipart parsing algorithm. This feature unblocks the protobuf payloads (instead of JSON) between RR and PHP. I have plans to dig into this week.

rustatian avatar Feb 12 '24 13:02 rustatian

Thanks for your fast response. I've patched GQL lib, so our users can upload files now, so there is no hurry. We'll wait for the new release with protobuf for IPC.

maksim-pankov avatar Feb 13 '24 12:02 maksim-pankov

Hey everyone 👋 In RR v2024.1.0 (week or two from now), and raw_body: true http option, you'll receive pure and untouched body with any content as it arrived to RR.

rustatian avatar Mar 28 '24 20:03 rustatian