don't show base64 data to user

Open kmike opened this issue 7 years ago • 12 comments

Hey,

I started to use base64-encoded HAR content recently: otherwise it is not possible to guarantee that content can be passed in JSON, even for content with HTML or JSON MIME types. HTML can use an encoding other than UTF-8, and even data sent with an application/json content type can be binary if the server wants.
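To make the "base64 by default" approach concrete, here is a minimal sketch of how a HAR producer might build the `content` object. The field names (`size`, `mimeType`, `text`, `encoding`) come from the HAR 1.2 spec; the helper name `make_har_content` and the sample page are hypothetical and are not taken from Splash.

```python
import base64
import json


def make_har_content(body: bytes, mime_type: str) -> dict:
    """Build a HAR 1.2 `content` object that always base64-encodes the body.

    Hypothetical helper for illustration only.
    """
    return {
        "size": len(body),
        "mimeType": mime_type,
        # base64 output is pure ASCII, so it is always safe to embed in JSON,
        # whatever charset (or binary format) the original body used.
        "text": base64.b64encode(body).decode("ascii"),
        "encoding": "base64",
    }


# Made-up example: an HTML page served as windows-1251 rather than UTF-8.
body = "<title>Привет</title>".encode("cp1251")
content = make_har_content(body, "text/html; charset=windows-1251")
print(json.dumps(content, indent=2))
```

The original bytes survive the round trip unchanged, which is the point of encoding everything: the HAR producer never has to guess the charset.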

But this switch to 'base64 by default' makes it less easy for harviewer: e.g. for HTML both 'Response' and 'HTML' tabs display base64-encoded data. 'Highlighted' gets a decoded version, but for large HTML pages it is very slow. There is a similar issue for JSON files: 'Response' tab displays confusing base64-encoded data. 'Response' tab for images also shows base64 version of the binary data.

I think it is better either to remove the tabs with base64-encoded data or to try decoding it more aggressively. I'm not sure what the use case is for showing base64 to the user; the user may think it is a bug (which I think has already happened for Splash). Also, there is no visual distinction between base64-encoded and non-base64-encoded data, so e.g. a true base64 response will look the same as an HTML response which the HAR-generating software encoded to base64 in order to store it without data loss.

kmike avatar Aug 15 '16 17:08 kmike

Hi, do you have an example HAR?

gitgrimbo avatar Aug 15 '16 19:08 gitgrimbo

Yep! habr.ru.har.zip

kmike avatar Aug 15 '16 19:08 kmike

Thanks. How was this HAR generated? I don't recognise the following browser ...

    "browser": {
        "comment": "PyQt 5.5.1, Qt 5.5.1",
        "name": "QWebKit",
        "version": "538.1"
    },

or User Agent header:

    Mozilla/5.0 (Macintosh; Intel Mac OS X) AppleWebKit/538.1 (KHTML, like Gecko) server.py Safari/538.1

gitgrimbo avatar Aug 16 '16 07:08 gitgrimbo

@gitgrimbo it was generated using Splash. Splash uses HAR as a data export format; it also embeds harviewer in a script debugging page.

kmike avatar Aug 16 '16 07:08 kmike

ui

kmike avatar Aug 16 '16 07:08 kmike

Thanks. I see what you mean. Pasting image here for reference.

First row shows a base64-encoded HTML response in the Response tab. Second row shows a similar HTML response, but decoded in the Highlighted tab.

harviewer-92-response-and-highlighted-tabs


To discuss a couple of your points:

  • The Response tab currently shows raw content. I'm not sure this default behaviour should be changed in case the user wants to see this raw content, but perhaps a Decoded Response tab could be added, or a Decode button placed in the Response tab.
  • Yes, the Syntax Highlighting can be slow using the current implementation (Syntax Highlighter 3.0.83). When I upgraded to 3.0.83 I considered using a different implementation, but I played safe. If I get time I'll take a look at some alternatives. On the plus side, HAR Viewer didn't originally have a separate Highlighted tab, and so the user had to pay the cost of slow highlighting all the time. Now at least you have the option not to click on the Highlighted tab to avoid the slowdown.

gitgrimbo avatar Aug 16 '16 07:08 gitgrimbo

Thanks for looking at it!

Regarding the Response tab: currently it doesn't show the raw content of the webpage or the raw response content; it shows the data stored in the HAR JSON as-is. This is not the same as the raw response content because the 'encoding' field of the HAR content object is not handled (see http://www.softwareishard.com/blog/har-12-spec/#content). This is useful for debugging HAR files, but not for debugging received responses. The base64 encoding is a technical detail of how the data is stored in HAR, not something specific to a website. That's why I think the Response tab should show the response content; currently it doesn't.

kmike avatar Aug 16 '16 08:08 kmike

Yeah I think you're right. So is it true that whenever the encoding field of content is present in a HAR, it should always be decoded (as the encoding was only added by the HAR-creator, and had nothing to do with the original response)?

I'm trying to think of any exception to that rule.

gitgrimbo avatar Aug 16 '16 08:08 gitgrimbo

Yeah, I think it is good to always decode text if encoding is present. The exception could be an unknown encoding (only base64 is mentioned in the standard). Another tricky case is binary (or any non-UTF-8) data; it is not clear how to show it in a decoded form.
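The rule and its two exceptions could be sketched roughly as follows. This is an illustration only, not harviewer's code: the function name `decode_har_content`, the status labels, and the UTF-8 default are all assumptions.

```python
import base64
from typing import Optional, Tuple


def decode_har_content(content: dict, charset: str = "utf-8") -> Tuple[Optional[str], str]:
    """Return (text, status) for a HAR `content` object.

    status is one of "plain", "decoded", "binary", "unknown-encoding"
    (labels invented for this sketch).
    """
    text = content.get("text", "")
    encoding = content.get("encoding")
    if not encoding:
        # No `encoding` field: the text is stored as-is.
        return text, "plain"
    if encoding != "base64":
        # The spec only mentions base64; anything else is left alone.
        return None, "unknown-encoding"
    raw = base64.b64decode(text)
    try:
        return raw.decode(charset), "decoded"
    except UnicodeDecodeError:
        # Binary or non-UTF-8 data: there is no obvious text form to show.
        return None, "binary"
```

A viewer could use the status to decide what to display, for example falling back to the stored text or a hex dump for the "binary" and "unknown-encoding" cases.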

kmike avatar Aug 16 '16 08:08 kmike

Hi @kmike, I've uploaded this branch for you to try, http://gitgrimbo.github.io/harviewer/issue-92/.

It simply tries to decode every HAR entry for the Response tab.

It does the right thing for the first two HTML entries in your example HAR. But the third seems to have charset issues; the title displays as follows:

harviewer-92-char-encoding-title

And now images and other binary files are also shown in their decoded raw state. I'm not sure if this is a good or bad thing to be honest.

If you could take a look I'd appreciate it, and maybe think of any reasons why every entry should not be decoded in this way as I'm not sure I've thought about all the possibilities here.

gitgrimbo avatar Aug 21 '16 20:08 gitgrimbo

Using the tips from here, https://developer.mozilla.org/en/docs/Web/API/WindowBase64/Base64_encoding_and_decoding, I think the issue is a UTF8/UTF16 thing.

Following the tips the text now renders correctly.

harviewer-92-char-encoding-title-2
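For readers unfamiliar with this class of bug, here is a Python analogue. It assumes the garbled title came from treating each decoded byte as one character, which is what the browser's `atob` returns, instead of interpreting the bytes as UTF-8. The sample string and function names are made up for illustration.

```python
import base64


def decode_bytewise(b64_text: str) -> str:
    """Map every decoded byte to one character (mimics a 'binary string')."""
    return base64.b64decode(b64_text).decode("latin-1")


def decode_utf8(b64_text: str) -> str:
    """Interpret the decoded bytes as UTF-8 text."""
    return base64.b64decode(b64_text).decode("utf-8")


# Hypothetical Cyrillic title, stored as base64 of its UTF-8 bytes.
stored = base64.b64encode("Хабрахабр".encode("utf-8")).decode("ascii")

print(decode_bytewise(stored))  # garbled: two characters per Cyrillic letter
print(decode_utf8(stored))      # the original title
```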

gitgrimbo avatar Aug 21 '16 21:08 gitgrimbo

Images should probably be encoded in base64 too. Is there a way to view the decoded base64 with the appropriate type? i.e. if it's a base64-ed image, let the user view the image, if it's base64-ed HTML, let the user view the resulting (decoded) page.
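One way this could work is to choose the preview from `mimeType`. The sketch below is only an illustration of the idea: `preview_for` and its return labels are hypothetical, and text is assumed to be UTF-8. Base64 images become a `data:` URI that a viewer could place in an image element; other base64 content is decoded to text.

```python
import base64
from typing import Tuple


def preview_for(content: dict) -> Tuple[str, str]:
    """Return (kind, payload) where kind is "image" or "text" (sketch only)."""
    # Drop parameters such as "; charset=utf-8" from the MIME type.
    mime = content.get("mimeType", "").split(";")[0].strip().lower()
    text = content.get("text", "")
    is_b64 = content.get("encoding") == "base64"
    if is_b64 and mime.startswith("image/"):
        # The stored base64 can be reused directly in a data URI.
        return "image", f"data:{mime};base64,{text}"
    if is_b64:
        # Assumes UTF-8; undecodable bytes are replaced, not raised.
        return "text", base64.b64decode(text).decode("utf-8", errors="replace")
    return "text", text
```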

hydrargyrum avatar Aug 29 '21 08:08 hydrargyrum