[BUG] Har files recorded using webkit / chromium are missing postData information

Open jcaracciolo opened this issue 2 years ago • 15 comments

System info

  • Playwright Version: [v1.38.1 and older]
  • Operating System: [Windows, MacOS]
  • Browser: [Chromium, WebKit]
  • Other info:

Source code

  • [X] I provided exact source code that allows reproducing the issue locally.

Code Example

      context.routeFromHAR(this.harPath, {
          update: true,
          updateContent: 'embed'
      });

Steps

  • Set up a context with routeFromHAR called with the update flag
  • Run a test with one or more POST requests that have a body, using Chromium or WebKit

Expected

HAR files should include the postData value for recorded POST requests, as they do when using Firefox. This makes it possible to differentiate between POST requests with the same URL when replaying the HAR.

Resulting HAR generated using Firefox

        {
          "method": "POST",
          "url": ----?market=US&language=en-US&,
          "httpVersion": "HTTP/2.0",
          "cookies": [],
          "headers": [
            { "name": ":authority", "value": ----},
            { "name": ":method", "value": "POST" },
            { "name": ":path", "value": ----?market=US&language=en-US& },
            { "name": ":scheme", "value": "https" },
            { "name": "accept", "value": "*/*" },
            { "name": "accept-encoding", "value": "gzip, deflate, br" },
            { "name": "accept-language", "value": "en-US" },
            { "name": "content-length", "value": "224" },
            { "name": "content-type", "value": "application/json" },
          ],
          "queryString": [
            {
              "name": "market",
              "value": "US"
            },
            {
              "name": "language",
              "value": "en-US"
            },
          ],
          "headersSize": -1,
          "bodySize": -1,
          "postData":  {
            "mimeType": "application/json",
            "text": xxxx,
            "params": []
          }
        },
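
To illustrate why the recorded postData matters during replay, here is a minimal sketch of body-aware entry matching. The helper name `findHarEntry`, the `body` field on the request object, and the `example.test` URL are assumptions made for this example; this is not Playwright's actual matching algorithm.

```javascript
// Hypothetical helper: choose the recorded HAR entry for an outgoing request.
// Illustration only, not how Playwright itself matches entries.
function findHarEntry(entries, request) {
  const candidates = entries.filter(
    (e) => e.request.method === request.method && e.request.url === request.url
  );
  // With postData recorded, entries that share a URL can be told apart by body.
  const exact = candidates.find(
    (e) => e.request.postData !== undefined && e.request.postData.text === request.body
  );
  // Without postData, the only option left is the first URL match.
  return exact ?? candidates[0];
}

const recordedEntries = [
  {
    request: {
      method: 'POST',
      url: 'https://example.test/api',
      postData: { mimeType: 'application/json', text: '{"id":1}' },
    },
    response: { content: { text: 'first' } },
  },
  {
    request: {
      method: 'POST',
      url: 'https://example.test/api',
      postData: { mimeType: 'application/json', text: '{"id":2}' },
    },
    response: { content: { text: 'second' } },
  },
];

const matched = findHarEntry(recordedEntries, {
  method: 'POST',
  url: 'https://example.test/api',
  body: '{"id":2}',
});
console.log(matched.response.content.text); // second
```

If the postData fields are removed from both entries, the same call falls back to the first entry, which is the behavior described under "Actual".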

Actual

The HAR file is missing the postData value. If there are multiple POST requests to the same resource, replaying the HAR only ever resolves to the first one.

Resulting HAR generated using Chromium or WebKit

        {
          "method": "POST",
          "url": ----?market=US&language=en-US&,
          "httpVersion": "HTTP/2.0",
          "cookies": [],
          "headers": [
            { "name": ":authority", "value": ----},
            { "name": ":method", "value": "POST" },
            { "name": ":path", "value": ----?market=US&language=en-US& },
            { "name": ":scheme", "value": "https" },
            { "name": "accept", "value": "*/*" },
            { "name": "accept-encoding", "value": "gzip, deflate, br" },
            { "name": "accept-language", "value": "en-US" },
            { "name": "content-length", "value": "224" },
            { "name": "content-type", "value": "application/json" },
          ],
          "queryString": [
            {
              "name": "market",
              "value": "US"
            },
            {
              "name": "language",
              "value": "en-US"
            },
          ],
          "headersSize": -1,
          "bodySize": -1
        },
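
A quick way to check whether a recorded HAR is affected is to look for POST entries that declare a body through content-length but carry no postData. The sketch below assumes the standard HAR layout (`log.entries[].request`); the function name `findPostsWithoutBody` and the sample data are made up for this example.

```javascript
// Hypothetical check: list URLs of POST requests that declare a body
// (content-length > 0) but were recorded without postData.
function findPostsWithoutBody(har) {
  return har.log.entries
    .map((entry) => entry.request)
    .filter((req) => {
      const length = req.headers.find((h) => h.name.toLowerCase() === 'content-length');
      const declaresBody = length !== undefined && Number(length.value) > 0;
      return req.method === 'POST' && declaresBody && req.postData === undefined;
    })
    .map((req) => req.url);
}

// Sample data shaped like the two recordings above (URLs are placeholders).
const sampleHar = {
  log: {
    entries: [
      {
        request: {
          method: 'POST',
          url: 'https://example.test/a',
          headers: [{ name: 'content-length', value: '224' }],
          postData: { mimeType: 'application/json', text: '{}' },
        },
      },
      {
        request: {
          method: 'POST',
          url: 'https://example.test/b',
          headers: [{ name: 'content-length', value: '224' }],
        },
      },
    ],
  },
};

// Logs an array containing only the URL of the second entry.
console.log(findPostsWithoutBody(sampleHar));
```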

Related Issues

Most likely this is related to the issues below, although I see it happening with ordinary 'application/json' payloads:

  • request.postData() resolves to null when page uses fetch of FormData with a Blob #24077
  • Request object does not contain postData for file/blob #6479

jcaracciolo avatar Oct 27 '23 22:10 jcaracciolo

To triager: this looks like an internal customer in case of follow-up questions.

pavelfeldman avatar Oct 27 '23 22:10 pavelfeldman

https://bugs.chromium.org/p/chromium/issues/detail?id=1058404 Might be related to this issue as well

jcaracciolo avatar Oct 27 '23 23:10 jcaracciolo

Can we get something we could use to reproduce this locally?

pavelfeldman avatar Oct 28 '23 17:10 pavelfeldman

Might be a dupe of https://github.com/microsoft/playwright/issues/6479?

mxschmitt avatar Oct 29 '23 09:10 mxschmitt

This functionality is covered with the tests here: https://github.com/microsoft/playwright/blob/7bab19d807e1999e521b8e10fccfccf34cf2a741/tests/library/har.spec.ts#L150. So we are probably hitting some edge-case. Would be nice to be able to reproduce it.

pavelfeldman avatar Oct 30 '23 21:10 pavelfeldman

I set up a new repo and was not able to repro until I tested what was described in the issue https://bugs.chromium.org/p/chromium/issues/detail?id=1058404

Here is the repro: https://github.com/jcaracciolo/playwright-postdata-issue/blob/main

Whenever a Request calls .clone(), everything in the website works as expected; however, when recording a HAR file, postData is missing. This may be the issue. Our codebase is pretty big, so figuring out if and where this is happening may be troublesome, especially if it is happening in some dependency.

It seems to be unrelated to Playwright then, but maybe it serves as a reference for people looking into this issue. It would be nice to have a workaround.
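
For readers unfamiliar with the pattern, here is a minimal sketch of what cloning a Request looks like from the page's point of view. It assumes a runtime with a global `Request` (a browser, or Node 18 and later); the URL and payload are placeholders, and this is not a copy of the linked repro.

```javascript
// Build a POST request and clone it, as page code or a dependency might do
// before handing one of the two copies to fetch().
async function readBodies() {
  const original = new Request('https://example.test/api', {
    method: 'POST',
    headers: { 'content-type': 'application/json' },
    body: JSON.stringify({ id: 1 }),
  });
  const copy = original.clone();
  // Both copies expose the same body to page code, so nothing looks wrong here.
  return { original: await original.text(), clone: await copy.text() };
}

readBodies().then((bodies) => {
  console.log(bodies.original === bodies.clone); // true
});
```

Because the page sees identical bodies on both objects, the missing postData only shows up in the recorded HAR, which makes the cause hard to spot from application code.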

jcaracciolo avatar Oct 31 '23 03:10 jcaracciolo

I'm not sure there is a good workaround, but thanks for the confirmation and for sharing the repro. We'll ping Chromium folks on the upstream issue.

pavelfeldman avatar Oct 31 '23 22:10 pavelfeldman

I found a workaround for the clone issue and uploaded it to the example. An init script with this code gets around the issue:

  await context.addInitScript({
    path: path.resolve(__dirname, 'initScript.js'),
  });

initScript.js

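// Keep a reference to the native constructor before overriding it.
var oldRequest = Request;
Request = function(url, config) {
    const req = new oldRequest(url, config);
    // Rebuild the request from its original arguments instead of using the native clone().
    req.clone = () => {
        return new Request(url, config);
    };
    return req;
};
// Reuse the native prototype so instanceof checks keep working.
Request.prototype = oldRequest.prototype;
Request.prototype.constructor = Request;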

This is not ideal because the data is saved twice, but it could be useful until the underlying bug is fixed.

I am still seeing the issue in my internal repository, but that is probably due to another type of request modification that I am still trying to pin down.

jcaracciolo avatar Nov 20 '23 18:11 jcaracciolo

The Chromium issue was just updated and closed - apparently it's working as expected!

https://bugs.chromium.org/p/chromium/issues/detail?id=1058404#c21

It looks to me it is working as expected (at least from the CDP perspective). The postData gets encoded as a kEncodedBlob after it is cloned so it is not available as the string at the time of the event. Decoding a blob of data is potentially an expensive operation that requires asynchronous processing, therefore, the flag is set that the request has post data but the post data attribute is omitted. In this case, the client should call to https://chromedevtools.github.io/devtools-protocol/tot/Network/#method-getRequestPostData to fetch and decode post data. The expectation is encoded in a test here https://source.chromium.org/chromium/chromium/src/+/main:third_party/blink/web_tests/http/tests/inspector-protocol/network/get-request-blob-data.js;l=4;drc=77578ccb4082ae20a9326d9e673225f1189ebb63 I am not sure why the data is converted into an encoded blob on cloning but I assume there are reasons for this (at least it is not done by CDP but seems to happen in the browser).

I just tested it, and indeed postData is missing, but hasPostData is true. Calling Network.getRequestPostData for this request properly returns the data.

I'm not quite sure, however, where in Playwright would be the appropriate place to fetch the decoded postData.
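
As a rough sketch of the client-side behavior the Chromium comment describes: use postData when the event carries it, and otherwise call Network.getRequestPostData when hasPostData is set. The function name `resolvePostData` and the fake session are assumptions for illustration; `session` only needs a `send(method, params)` method, which is the shape of a CDP session.

```javascript
// Hypothetical helper: resolve the body for a Network.requestWillBeSent event.
async function resolvePostData(session, event) {
  const { request, requestId } = event;
  // Fast path: the body was delivered inline with the event.
  if (typeof request.postData === 'string') return request.postData;
  // No body at all.
  if (!request.hasPostData) return null;
  // Body exists but was omitted (for example an encoded blob): ask the browser for it.
  const result = await session.send('Network.getRequestPostData', { requestId });
  return result.postData;
}

// Fake session standing in for a real CDP connection, for demonstration only.
const storedBodies = new Map([['req-1', '{"id":1}']]);
const fakeSession = {
  async send(method, params) {
    if (method !== 'Network.getRequestPostData') throw new Error(`unexpected method ${method}`);
    return { postData: storedBodies.get(params.requestId) };
  },
};

resolvePostData(fakeSession, {
  requestId: 'req-1',
  request: { method: 'POST', hasPostData: true },
}).then((body) => console.log(body)); // {"id":1}
```

With Playwright on Chromium, a real session could be obtained from context.newCDPSession(page), but where such a call belongs inside Playwright's own HAR recorder is exactly the open question above.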

olivierbeaulieu avatar Dec 27 '23 19:12 olivierbeaulieu

This PR into Puppeteer looks relevant: https://github.com/puppeteer/puppeteer/pull/11598/commits/2125e312b3196df248cbf810e30ad6153e4c4f2c

Would this make sense to be ported over to Playwright?

ElliotChong-MS avatar Jan 08 '24 19:01 ElliotChong-MS

I managed to create a reproduction of the issue here: https://github.com/microsoft/playwright/compare/main...olivierbeaulieu:playwright:postdata-clone

I've also tried to implement the solution, which almost works, but there's a bug to resolve.

Basically, we need to call Network.getRequestPostData, which is async. However, if Fetch.enable hasn't been called and Playwright is not currently intercepting requests, Network.responseReceived can arrive before Network.getRequestPostData completes, in which case things can break.
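
One possible way to handle that ordering, sketched under assumptions: start the body lookup when the request event arrives, keep the promise per requestId, and await it when the response event is processed. The class name `PostDataTracker` and the injected `fetchPostData` callback are hypothetical, and this is not the approach from the linked branch.

```javascript
// Hypothetical tracker: the response handler waits for the pending body lookup,
// so a response that arrives early cannot finalize the entry without postData.
class PostDataTracker {
  constructor(fetchPostData) {
    this._fetchPostData = fetchPostData; // async (requestId) => string
    this._pending = new Map();
  }

  onRequestWillBeSent(event) {
    const { requestId, request } = event;
    let promise;
    if (typeof request.postData === 'string') promise = Promise.resolve(request.postData);
    else if (request.hasPostData) promise = this._fetchPostData(requestId);
    else promise = Promise.resolve(null);
    this._pending.set(requestId, promise);
  }

  async onResponseReceived(event) {
    const promise = this._pending.get(event.requestId) ?? Promise.resolve(null);
    this._pending.delete(event.requestId);
    const postData = await promise; // blocks until the lookup has finished
    return { requestId: event.requestId, status: event.response.status, postData };
  }
}

// Simulate a slow lookup and a response that arrives before it completes.
const tracker = new PostDataTracker(
  (requestId) => new Promise((resolve) => setTimeout(() => resolve(`body-of-${requestId}`), 20))
);
tracker.onRequestWillBeSent({ requestId: 'r1', request: { method: 'POST', hasPostData: true } });
tracker
  .onResponseReceived({ requestId: 'r1', response: { status: 200 } })
  .then((entry) => console.log(entry.postData)); // body-of-r1
```

The trade-off is that every entry is delayed by the lookup, and a lookup that rejects would need its own error handling, which this sketch leaves out.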

Any guidance on what the right approach would be here @pavelfeldman?

olivierbeaulieu avatar Apr 24 '24 03:04 olivierbeaulieu

FYI, this issue seems to have been solved in Chrome 125.0.6422.26? I cannot find any release notes from Chrome on that subject, but the Chrome version shipped in Playwright 1.44.0 now contains postData in recorded HARs.

olivierbeaulieu avatar May 29 '24 16:05 olivierbeaulieu

Indeed, it now contains postData but not the whole payload. It seems it contains only the first chunk or something like that.

ppath avatar Jun 04 '24 13:06 ppath

Weird, I've been getting the whole payload (both JSON bodies and form/multipart bodies).

olivierbeaulieu avatar Jun 05 '24 14:06 olivierbeaulieu

@jcaracciolo

I observed this issue when sending a POST request with this header: { "name": "Content-Type", "value": "application/json" },

The issue was resolved when I used the lowercase header name: { "name": "content-type", "value": "application/json" }
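
HTTP header names are case-insensitive, so code that inspects HAR headers can avoid depending on one spelling by comparing lowercased names. The helper below is a hypothetical sketch of that lookup; it does not explain why the casing changed the recording in this case.

```javascript
// Hypothetical helper: look up a header value in a HAR-style header list,
// ignoring the case of the header name.
function getHeader(headers, name) {
  const wanted = name.toLowerCase();
  const match = headers.find((h) => h.name.toLowerCase() === wanted);
  return match ? match.value : undefined;
}

const mixedCaseHeaders = [{ name: 'Content-Type', value: 'application/json' }];
console.log(getHeader(mixedCaseHeaders, 'content-type')); // application/json
```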

pm4617 avatar Aug 20 '24 18:08 pm4617