chrome-har

Include content text in response

Open richard1 opened this issue 7 years ago • 21 comments

entry.response.content.text is missing, so the HAR is missing the response body.

It looks like the Chrome DevTools Protocol allows querying for the response body by requestId, so this could be an approach.
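For illustration, here is a minimal sketch of that lookup. `Network.getResponseBody` and its result fields (`body`, `base64Encoded`) come from the Chrome DevTools Protocol; the `CdpSender` interface and the `fetchResponseBody` helper are hypothetical names standing in for whatever client actually talks to the browser.

```typescript
// Result shape of Network.getResponseBody in the Chrome DevTools Protocol.
interface ResponseBodyResult {
  body: string;
  base64Encoded: boolean;
}

// Hypothetical minimal client interface: anything that can send a CDP command.
interface CdpSender {
  send(method: string, params: Record<string, unknown>): Promise<unknown>;
}

// Ask the browser for the body of one request, identified by its requestId.
async function fetchResponseBody(
  client: CdpSender,
  requestId: string
): Promise<ResponseBodyResult> {
  const result = await client.send("Network.getResponseBody", { requestId });
  return result as ResponseBodyResult;
}
```

The `requestId` is the one reported in the network events for that request, so the lookup would typically run once the request has finished loading.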

richard1 avatar Mar 13 '18 23:03 richard1

Hi @richard1, I like the idea of getting the content, but it needs to be done in Browsertime. We can now have multiple connections to the DevTools protocol, but I think (if I remember correctly) that we need to upgrade to Selenium 4 to get it in our NodeJS Selenium.

What's your use case, @richard1? For me, I want the plain HTML and don't care about the rest of the content, so maybe we should make it configurable what to get/keep?

Best Peter

soulgalore avatar Mar 14 '18 13:03 soulgalore

Hi @soulgalore - thanks for the quick response!

Yes, my use case is mainly the plain HTML as well. Ideally I'd also like to have external JavaScript and JSON resources visible, so it'd be great to make it configurable which responses to keep data for.

richard1 avatar Mar 14 '18 20:03 richard1

Ok, cool. For Firefox we have either all or none, but let me change that next week in Browsertime and then aim to do the same for Chrome when we get the opportunity.

soulgalore avatar Mar 14 '18 20:03 soulgalore

@soulgalore - Would love to see this feature as well. Is it still on the roadmap?

joshuabuildsthings avatar Jun 01 '18 23:06 joshuabuildsthings

@joshuabuildsthings well, I still think it would be great, but we haven't jacked the functionality into Browsertime yet. If I understand correctly, we will not be able to do it standalone in chrome-har; we will need to get it through the Chrome API, so we need to create an issue in Browsertime (I haven't done that yet).

soulgalore avatar Jun 02 '18 20:06 soulgalore

Getting the response.content.text optionally would be very helpful. Is there any progress here?

rvbyron avatar Oct 16 '18 19:10 rvbyron

@rvbyron (still) waiting on Selenium 4 to be released as stable to be able to get the info from Chrome using Browsertime (using sendDevToolsCommand to the driver).

soulgalore avatar Oct 16 '18 20:10 soulgalore

This would be extremely useful. However, it is already possible to capture these events when you use puppeteer (my current workflow). What can be done to move the implementation forward? Should somebody start a PR?

Fohlen avatar Oct 22 '18 09:10 Fohlen

As far as I can tell, one would need to include the response data here: https://github.com/sitespeedio/chrome-har/blob/master/index.js#L462. Should I open a PR to get it rolling?

Fohlen avatar Oct 22 '18 10:10 Fohlen

@Fohlen yes, please do! Also, make a test case with an attached trace file. We also need to add a property so we can switch the functionality on/off, since this can make the HAR file huge.

soulgalore avatar Oct 22 '18 13:10 soulgalore

@soulgalore I actually realised that response.body is not part of the actual Chrome DevTools Protocol specification (see https://chromedevtools.github.io/devtools-protocol/tot/Network#type-Response). However, it can be accessed via the API (https://chromedevtools.github.io/devtools-protocol/1-3/Network#method-getResponseBody). What I will do now is add a response.body property that arbitrarily maps to entry.response.content.text in the HAR. Is that OK?
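A rough sketch of what such a mapping could look like, not the code from any PR. The `text` and `encoding` fields are from the HAR 1.2 `content` object; `withBody` and the `includeBody` option are hypothetical names, with the option off by default so the HAR does not grow unless asked for.

```typescript
// Result shape of Network.getResponseBody in the Chrome DevTools Protocol.
interface ResponseBodyResult {
  body: string;
  base64Encoded: boolean;
}

// The subset of a HAR 1.2 "content" object used in this sketch.
interface HarContent {
  size: number;
  mimeType: string;
  text?: string;
  encoding?: string;
}

// Hypothetical option name; defaults to off to keep the old behavior.
interface BodyOptions {
  includeBody?: boolean;
}

// Return a copy of the HAR content with the response body attached, if enabled.
function withBody(
  content: HarContent,
  result: ResponseBodyResult | undefined,
  options: BodyOptions = {}
): HarContent {
  if (!options.includeBody || result === undefined) {
    return content;
  }
  const updated: HarContent = { ...content, text: result.body };
  // HAR marks base64 text with encoding "base64"; plain text omits the field.
  if (result.base64Encoded) {
    updated.encoding = "base64";
  }
  return updated;
}
```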

Fohlen avatar Oct 22 '18 17:10 Fohlen

Hi @Fohlen, sorry, I missed answering. So you add a mapping, and then in puppeteer you will add those fields, so if it's there, we will use it? Yep, works fine, just make sure you add a CLI parameter so getting the body is turned off by default (keeping the old behavior), ok?

soulgalore avatar Oct 26 '18 09:10 soulgalore

If you want to get fancy, returning text based on a comma-separated list of mimeTypes would allow us to retrieve text for html, css, json and javascript files, while eliminating jpeg and png files. However, I'll be ecstatic to get text regardless, and I can trim the binary-converted-to-text data I don't need.

rvbyron avatar Oct 26 '18 17:10 rvbyron

Being able to choose mimeType would be great. I wouldn't necessarily want to totally exclude images though, as I could also see situations where getting the binary data would be useful; for example, auto minification pipelines.

joshuabuildsthings avatar Oct 26 '18 18:10 joshuabuildsthings

@joshuabuildsthings I'm not in any way suggesting to always exclude binary data. I was saying an include parameter containing a list of mimeTypes would allow finer control to save bandwidth/memory. It would default to all mimeTypes if you don't specify one. An exclude parameter could make things easy too. Then, to top it all off, those could be regular expressions in case you want to include/exclude all images (e.g. exclude='image/.*').

As for binary data, (you might already be aware) if you go to the debug->network tab and save a har file with content there, you will see that it indeed saves all content including binary data to the text field. So, there is precedent there.
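The include/exclude idea could be sketched roughly like this. Everything here is hypothetical (the `MimeFilter` shape, the `shouldKeepBody` name, and the choice to let exclude win over include); it only illustrates the proposal, not an existing option.

```typescript
// Hypothetical filter: each entry is a regular expression matched against the
// whole MIME type, e.g. "image/.*" or "text/html".
interface MimeFilter {
  include?: string[];
  exclude?: string[];
}

// Decide whether the body of a response with this MIME type should be kept.
function shouldKeepBody(mimeType: string, filter: MimeFilter = {}): boolean {
  // Drop parameters such as "; charset=utf-8" before matching.
  const type = mimeType.replace(/;.*$/, "").trim();
  const matches = (pattern: string): boolean =>
    new RegExp(`^(?:${pattern})$`, "i").test(type);

  // Exclude takes precedence over include in this sketch.
  if (filter.exclude !== undefined && filter.exclude.some(matches)) {
    return false;
  }
  // No include list means all MIME types are kept.
  if (filter.include === undefined || filter.include.length === 0) {
    return true;
  }
  return filter.include.some(matches);
}
```

With `{ exclude: ["image/.*"] }`, a `text/html` response would keep its body while `image/png` would not.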

rvbyron avatar Oct 27 '18 15:10 rvbyron

@rvbyron - Sorry for confusion. What you're proposing sounds ideal all around.

joshuabuildsthings avatar Oct 28 '18 04:10 joshuabuildsthings

@soulgalore So, it looks like there was talk above about how to implement this feature. Has it been implemented? If so, can you give an example of how to access it? It would be a very handy feature in certain cases.

rvbyron avatar Nov 16 '18 19:11 rvbyron

Hey @rvbyron, @Fohlen said he could maybe implement it. For me to implement it, I'm still waiting on Selenium 4; see https://github.com/sitespeedio/chrome-har/issues/8#issuecomment-430381519

soulgalore avatar Nov 16 '18 20:11 soulgalore

Hey @soulgalore, sorry for not checking back with you in a while. Implementing it is quite trivial, but our stack actually moved away from HAR, so I don't have the time or the urge to implement and maintain these changes. @rvbyron if you want to implement it, you can have a look at my puppeteer code, which should give you a fair example of how it works.

Fohlen avatar Dec 21 '18 16:12 Fohlen

Aren't #41 and #42 fixing it?

AgainPsychoX avatar May 30 '19 18:05 AgainPsychoX

Maybe? I've not used it with puppeteer. I've added support in Browsertime from 5.0, but there the content is added to the HAR from the outside, using CDP to get it.

soulgalore avatar May 30 '19 18:05 soulgalore