Update browser logs in crawl results
@JannisBush raised an idea to run privacy analysis based on the browser audits and console logs.
We are trying to check if the data is available in console_log and audit_issues in payload.
Console Domain has been deprecated, we used messageAdded event.
The uptodate data will be available in Log and Runtime domains:
- Runtime events
executionContextCreatedandexecutionContextDestroyed. The Runtime.consoleAPICalled is currently missing. - Log events are currently being processed.
Suggestions:
- add
Runtime.consoleAPICalledevents toconsole_log - deprecate
Consoleevents
Analysts then will be able to use it with custom_metrics or directly with SQL.
@pmeenan does this make sense? Do you have any concerns?
Does Runtime.consoleAPICalled log something that Log.enable doesn't currently catch? All of the logging events should already be captured by the existing Log.entryAdded events (Console is still there just for backwards compatibility)
Log.enable collects the browser-caused messages, e.g., messages about invalid headers such as (Error with Permissions-Policy header: Unrecognized feature: 'abc').
Runtime.consoleAPICalled collects the developer-caused messages from the console API, e.g., console.log("Log!").
The deprecated Console.messageAdded also collects the messages from the console API, however, with less information. For example, Console.messageAdded always reports source=console-api whereas Runtime.consoleAPICalled distinguishes type=log, debug, info, error, warning, dir, ..., also Runtime.consoleAPICalled contains a stackTrace and a timestamp not contained in Console.messageAdded
(according to the documentation it should also collect the browser-caused messages, but it seems like it does not collect them anymore at least if Log.enable is activated).
I agree with the suggestion to replace Console.messageAdded with Runtime.consoleAPICalled.
I would also suggest to separate the two types of messages into two separate arrays console_api and browser_logs instead of using one shared console_log array. While it is possible to separate the two during processing as the entries in the array have different formats, I would assume that most people are only interested in either the browser-caused messages or the developer-caused messages at a time and would not need to filter for one subset each time when they use the logs.
So, if we do this then I probably want to keep both because the format of the consoleAPICalled data will be quite a bit harder to process.
Breaking the browser logs out into a separate browser_logs makes total sense and I can do that irregardless.
Is there significant value in having the stack trace and timestamp because that will balloon the data pretty significantly? for example:
"console_api": [
{
"type": "log",
"args": [
{
"type": "string",
"value": "Console.log message"
}
],
"executionContextId": 1,
"timestamp": 1748958767619.001,
"stackTrace": {
"callFrames": [
{
"functionName": "",
"scriptId": "7",
"url": "https:\/\/test.patrickmeenan.com\/console\/",
"lineNumber": 6,
"columnNumber": 20
}
]
}
},
{
"type": "table",
"args": [
{
"type": "object",
"className": "Object",
"description": "Object",
"objectId": "3737340583851650503.1.1",
"preview": {
"type": "object",
"description": "Object",
"overflow": false,
"properties": [
{
"name": "Time Stamp",
"type": "number",
"value": "1748958767619"
},
{
"name": "OS",
"type": "string",
"value": "Linux x86_64"
},
{
"name": "Browser",
"type": "string",
"value": "Mozilla"
},
{
"name": "Language",
"type": "string",
"value": "en-US"
}
]
}
}
],
"executionContextId": 1,
"timestamp": 1748958767619.162,
"stackTrace": {
"callFrames": [
{
"functionName": "",
"scriptId": "7",
"url": "https:\/\/test.patrickmeenan.com\/console\/",
"lineNumber": 7,
"columnNumber": 20
}
]
}
},
{
"type": "startGroup",
"args": [
{
"type": "string",
"value": "URL Info"
}
],
"executionContextId": 1,
"timestamp": 1748958767619.284,
"stackTrace": {
"callFrames": [
{
"functionName": "",
"scriptId": "7",
"url": "https:\/\/test.patrickmeenan.com\/console\/",
"lineNumber": 13,
"columnNumber": 20
}
]
}
},
{
"type": "log",
"args": [
{
"type": "string",
"value": "Protocol"
},
{
"type": "string",
"value": "https:"
}
],
"executionContextId": 1,
"timestamp": 1748958767619.345,
"stackTrace": {
"callFrames": [
{
"functionName": "",
"scriptId": "7",
"url": "https:\/\/test.patrickmeenan.com\/console\/",
"lineNumber": 14,
"columnNumber": 20
}
]
}
},
{
"type": "log",
"args": [
{
"type": "string",
"value": "Host"
},
{
"type": "string",
"value": "https:\/\/test.patrickmeenan.com"
}
],
"executionContextId": 1,
"timestamp": 1748958767619.371,
"stackTrace": {
"callFrames": [
{
"functionName": "",
"scriptId": "7",
"url": "https:\/\/test.patrickmeenan.com\/console\/",
"lineNumber": 15,
"columnNumber": 20
}
]
}
},
{
"type": "log",
"args": [
{
"type": "string",
"value": "Path"
},
{
"type": "string",
"value": "\/console\/"
}
],
"executionContextId": 1,
"timestamp": 1748958767619.388,
"stackTrace": {
"callFrames": [
{
"functionName": "",
"scriptId": "7",
"url": "https:\/\/test.patrickmeenan.com\/console\/",
"lineNumber": 16,
"columnNumber": 20
}
]
}
},
{
"type": "startGroupCollapsed",
"args": [
{
"type": "string",
"value": "Meta Info"
}
],
"executionContextId": 1,
"timestamp": 1748958767619.399,
"stackTrace": {
"callFrames": [
{
"functionName": "",
"scriptId": "7",
"url": "https:\/\/test.patrickmeenan.com\/console\/",
"lineNumber": 17,
"columnNumber": 20
}
]
}
},
{
"type": "log",
"args": [
{
"type": "string",
"value": "Date Fetched"
},
{
"type": "number",
"value": 1748958767619,
"description": "1748958767619"
}
],
"executionContextId": 1,
"timestamp": 1748958767619.409,
"stackTrace": {
"callFrames": [
{
"functionName": "",
"scriptId": "7",
"url": "https:\/\/test.patrickmeenan.com\/console\/",
"lineNumber": 18,
"columnNumber": 24
}
]
}
},
{
"type": "log",
"args": [
{
"type": "string",
"value": "OS"
},
{
"type": "string",
"value": "Linux x86_64"
}
],
"executionContextId": 1,
"timestamp": 1748958767619.426,
"stackTrace": {
"callFrames": [
{
"functionName": "",
"scriptId": "7",
"url": "https:\/\/test.patrickmeenan.com\/console\/",
"lineNumber": 19,
"columnNumber": 24
}
]
}
},
{
"type": "log",
"args": [
{
"type": "string",
"value": "Browser"
},
{
"type": "string",
"value": "Mozilla"
}
],
"executionContextId": 1,
"timestamp": 1748958767619.454,
"stackTrace": {
"callFrames": [
{
"functionName": "",
"scriptId": "7",
"url": "https:\/\/test.patrickmeenan.com\/console\/",
"lineNumber": 20,
"columnNumber": 24
}
]
}
},
{
"type": "log",
"args": [
{
"type": "string",
"value": "Language"
},
{
"type": "string",
"value": "en-US"
}
],
"executionContextId": 1,
"timestamp": 1748958767619.472,
"stackTrace": {
"callFrames": [
{
"functionName": "",
"scriptId": "7",
"url": "https:\/\/test.patrickmeenan.com\/console\/",
"lineNumber": 21,
"columnNumber": 24
}
]
}
},
{
"type": "endGroup",
"args": [
{
"type": "string",
"value": "console.groupEnd"
}
],
"executionContextId": 1,
"timestamp": 1748958767619.491,
"stackTrace": {
"callFrames": [
{
"functionName": "",
"scriptId": "7",
"url": "https:\/\/test.patrickmeenan.com\/console\/",
"lineNumber": 22,
"columnNumber": 20
}
]
}
},
{
"type": "endGroup",
"args": [
{
"type": "string",
"value": "console.groupEnd"
}
],
"executionContextId": 1,
"timestamp": 1748958767619.507,
"stackTrace": {
"callFrames": [
{
"functionName": "",
"scriptId": "7",
"url": "https:\/\/test.patrickmeenan.com\/console\/",
"lineNumber": 23,
"columnNumber": 20
}
]
}
},
{
"type": "log",
"args": [
{
"type": "string",
"value": "%cHello World!"
},
{
"type": "string",
"value": "color: #f709bb; font-family: sans-serif; text-decoration: underline;"
}
],
"executionContextId": 1,
"timestamp": 1748958767619.521,
"stackTrace": {
"callFrames": [
{
"functionName": "",
"scriptId": "7",
"url": "https:\/\/test.patrickmeenan.com\/console\/",
"lineNumber": 24,
"columnNumber": 20
}
]
}
},
{
"type": "timeEnd",
"args": [
{
"type": "string",
"value": "time: 72.410888671875 ms"
}
],
"executionContextId": 1,
"timestamp": 1748958767691.975,
"stackTrace": {
"callFrames": [
{
"functionName": "",
"scriptId": "7",
"url": "https:\/\/test.patrickmeenan.com\/console\/",
"lineNumber": 33,
"columnNumber": 20
}
]
}
},
{
"type": "assert",
"args": [
{
"type": "object",
"className": "Object",
"description": "Object",
"objectId": "3737340583851650503.1.2",
"preview": {
"type": "object",
"description": "Object",
"overflow": false,
"properties": [
{
"name": "num",
"type": "number",
"value": "3"
}
]
}
},
{
"type": "string",
"value": "the # is not even"
}
],
"executionContextId": 1,
"timestamp": 1748958767692.142,
"stackTrace": {
"callFrames": [
{
"functionName": "",
"scriptId": "7",
"url": "https:\/\/test.patrickmeenan.com\/console\/",
"lineNumber": 36,
"columnNumber": 24
}
]
}
},
{
"type": "assert",
"args": [
{
"type": "object",
"className": "Object",
"description": "Object",
"objectId": "3737340583851650503.1.3",
"preview": {
"type": "object",
"description": "Object",
"overflow": false,
"properties": [
{
"name": "num",
"type": "number",
"value": "5"
}
]
}
},
{
"type": "string",
"value": "the # is not even"
}
],
"executionContextId": 1,
"timestamp": 1748958767692.235,
"stackTrace": {
"callFrames": [
{
"functionName": "",
"scriptId": "7",
"url": "https:\/\/test.patrickmeenan.com\/console\/",
"lineNumber": 36,
"columnNumber": 24
}
]
}
},
{
"type": "count",
"args": [
{
"type": "string",
"value": "default: 1"
}
],
"executionContextId": 1,
"timestamp": 1748958767692.38,
"stackTrace": {
"callFrames": [
{
"functionName": "",
"scriptId": "7",
"url": "https:\/\/test.patrickmeenan.com\/console\/",
"lineNumber": 40,
"columnNumber": 24
},
{
"functionName": "",
"scriptId": "7",
"url": "https:\/\/test.patrickmeenan.com\/console\/",
"lineNumber": 39,
"columnNumber": 20
}
]
}
},
{
"type": "count",
"args": [
{
"type": "string",
"value": "default: 2"
}
],
"executionContextId": 1,
"timestamp": 1748958767692.414,
"stackTrace": {
"callFrames": [
{
"functionName": "",
"scriptId": "7",
"url": "https:\/\/test.patrickmeenan.com\/console\/",
"lineNumber": 40,
"columnNumber": 24
},
{
"functionName": "",
"scriptId": "7",
"url": "https:\/\/test.patrickmeenan.com\/console\/",
"lineNumber": 39,
"columnNumber": 20
}
]
}
},
{
"type": "count",
"args": [
{
"type": "string",
"value": "default: 3"
}
],
"executionContextId": 1,
"timestamp": 1748958767692.429,
"stackTrace": {
"callFrames": [
{
"functionName": "",
"scriptId": "7",
"url": "https:\/\/test.patrickmeenan.com\/console\/",
"lineNumber": 40,
"columnNumber": 24
},
{
"functionName": "",
"scriptId": "7",
"url": "https:\/\/test.patrickmeenan.com\/console\/",
"lineNumber": 39,
"columnNumber": 20
}
]
}
},
{
"type": "count",
"args": [
{
"type": "string",
"value": "default: 4"
}
],
"executionContextId": 1,
"timestamp": 1748958767692.45,
"stackTrace": {
"callFrames": [
{
"functionName": "",
"scriptId": "7",
"url": "https:\/\/test.patrickmeenan.com\/console\/",
"lineNumber": 40,
"columnNumber": 24
},
{
"functionName": "",
"scriptId": "7",
"url": "https:\/\/test.patrickmeenan.com\/console\/",
"lineNumber": 39,
"columnNumber": 20
}
]
}
},
{
"type": "count",
"args": [
{
"type": "string",
"value": "default: 5"
}
],
"executionContextId": 1,
"timestamp": 1748958767692.463,
"stackTrace": {
"callFrames": [
{
"functionName": "",
"scriptId": "7",
"url": "https:\/\/test.patrickmeenan.com\/console\/",
"lineNumber": 40,
"columnNumber": 24
},
{
"functionName": "",
"scriptId": "7",
"url": "https:\/\/test.patrickmeenan.com\/console\/",
"lineNumber": 39,
"columnNumber": 20
}
]
}
},
{
"type": "count",
"args": [
{
"type": "string",
"value": "default: 6"
}
],
"executionContextId": 1,
"timestamp": 1748958767692.48,
"stackTrace": {
"callFrames": [
{
"functionName": "",
"scriptId": "7",
"url": "https:\/\/test.patrickmeenan.com\/console\/",
"lineNumber": 40,
"columnNumber": 24
},
{
"functionName": "",
"scriptId": "7",
"url": "https:\/\/test.patrickmeenan.com\/console\/",
"lineNumber": 39,
"columnNumber": 20
}
]
}
},
{
"type": "count",
"args": [
{
"type": "string",
"value": "default: 7"
}
],
"executionContextId": 1,
"timestamp": 1748958767692.496,
"stackTrace": {
"callFrames": [
{
"functionName": "",
"scriptId": "7",
"url": "https:\/\/test.patrickmeenan.com\/console\/",
"lineNumber": 40,
"columnNumber": 24
},
{
"functionName": "",
"scriptId": "7",
"url": "https:\/\/test.patrickmeenan.com\/console\/",
"lineNumber": 39,
"columnNumber": 20
}
]
}
},
{
"type": "count",
"args": [
{
"type": "string",
"value": "default: 8"
}
],
"executionContextId": 1,
"timestamp": 1748958767692.509,
"stackTrace": {
"callFrames": [
{
"functionName": "",
"scriptId": "7",
"url": "https:\/\/test.patrickmeenan.com\/console\/",
"lineNumber": 40,
"columnNumber": 24
},
{
"functionName": "",
"scriptId": "7",
"url": "https:\/\/test.patrickmeenan.com\/console\/",
"lineNumber": 39,
"columnNumber": 20
}
]
}
},
{
"type": "trace",
"args": [
{
"type": "string",
"value": "console.trace"
}
],
"executionContextId": 1,
"timestamp": 1748958767692.525,
"stackTrace": {
"callFrames": [
{
"functionName": "",
"scriptId": "7",
"url": "https:\/\/test.patrickmeenan.com\/console\/",
"lineNumber": 42,
"columnNumber": 20
}
]
}
}
],
Here is a comparison of the old Console.messageAdded and Runtime.consoleAPICalled message for console.assert(5 % 2 === 0, { 5 }, 'the # is not even');
Yes, the old format is much shorter but also contains some issues:
- Not clear that it came from an assert (
level=errorvstype=assert) [object Object]is not really helpful- Line and column numbers are not the same as in the other event. However, this might be a off-by-one bug in
Runtime.consoleAPICalled? view-source:https://test.patrickmeenan.com/console/ looks like line 37 is correct?
I agree that saving the full Runtime.consoleAPICalled events might get out of hand both for storage and for processing:
- Not sure if the stacktrace is helpful. For most analysis it is probably not necessary, but it could enable answering some advance questions.
- The timestamp is only a single field, not sure if helpful, but should not hurt to keep.
- The
textis probably often more helpful thanargs. - If we do not have storage issues how about saving the full messages in the following format:
type, text, executionContextID, timestamp, (context), args, (stackTrace). This would enable even advanced analyses but most simple queries can only usetypeandtextand simple ignoreargsandstackTraceand the other entries. - For generating
textwe could either use the oldConsole.messageAddedevents (but we would need to match them to their correspondingRuntime.consoleAPICalledevents) or create a useful text entry from the args directly. Probably something likeJSON.stringify(args). - Here is the C++ code creating the
textfrom the args forConsole.messageAdded: https://source.chromium.org/chromium/chromium/src/+/main:v8/src/inspector/v8-console-message.cc;l=455;drc=aaceb0ff7dcbc44c661bae12fba36279b3dee7c4;bpv=1;bpt=1 - Here is how Playwright converts the args to a more useful text representation: https://github.com/microsoft/playwright/blob/9cfbf4f7912341142c749302046defabaa5e5149/packages/playwright-core/src/server/electron/electron.ts#L117 + https://github.com/microsoft/playwright/blob/990fb68f73adc481319fba419e5fd3131416657a/packages/playwright-core/src/server/console.ts#L46
{
"source": "console-api",
"level": "error",
"text": "[object Object] the # is not even",
"url": "https://test.patrickmeenan.com/console/",
"line": 37,
"column": 25
}
{
"type": "assert",
"args": [
{
"type": "object",
"className": "Object",
"description": "Object",
"objectId": "3737340583851650503.1.3",
"preview": {
"type": "object",
"description": "Object",
"overflow": false,
"properties": [
{
"name": "num",
"type": "number",
"value": "5"
}
]
}
},
{
"type": "string",
"value": "the # is not even"
}
],
"executionContextId": 1,
"timestamp": 1748958767692.235,
"stackTrace": {
"callFrames": [
{
"functionName": "",
"scriptId": "7",
"url": "https:\/\/test.patrickmeenan.com\/console\/",
"lineNumber": 36,
"columnNumber": 24
}
]
}
},
The timestamps aren't relative to anything else in the test data so I'm hesitant to waste space with them but I can add them if necessary. Order is maintained so they shouldn't be needed for ordering.
How about something like this:
"console_api": [
{
"type": "log",
"text": "Console.log message",
"stack": {
"url": "https:\/\/test.patrickmeenan.com\/console\/",
"line": "6:20"
}
},
{
"type": "table",
"text": "[Object]",
"data": [
{
"Time Stamp": "1748977603947",
"OS": "Linux x86_64",
"Browser": "Mozilla",
"Language": "en-US"
}
],
"stack": {
"url": "https:\/\/test.patrickmeenan.com\/console\/",
"line": "7:20"
}
},
{
"type": "startGroup",
"text": "URL Info",
"stack": {
"url": "https:\/\/test.patrickmeenan.com\/console\/",
"line": "13:20"
}
},
{
"type": "log",
"text": "Protocol https:",
"stack": {
"url": "https:\/\/test.patrickmeenan.com\/console\/",
"line": "14:20"
}
},
{
"type": "log",
"text": "Host https:\/\/test.patrickmeenan.com",
"stack": {
"url": "https:\/\/test.patrickmeenan.com\/console\/",
"line": "15:20"
}
},
{
"type": "log",
"text": "Path \/console\/",
"stack": {
"url": "https:\/\/test.patrickmeenan.com\/console\/",
"line": "16:20"
}
},
{
"type": "startGroupCollapsed",
"text": "Meta Info",
"stack": {
"url": "https:\/\/test.patrickmeenan.com\/console\/",
"line": "17:20"
}
},
{
"type": "log",
"text": "OS Linux x86_64",
"stack": {
"url": "https:\/\/test.patrickmeenan.com\/console\/",
"line": "19:24"
}
},
{
"type": "log",
"text": "Browser Mozilla",
"stack": {
"url": "https:\/\/test.patrickmeenan.com\/console\/",
"line": "20:24"
}
},
{
"type": "log",
"text": "Language en-US",
"stack": {
"url": "https:\/\/test.patrickmeenan.com\/console\/",
"line": "21:24"
}
},
{
"type": "endGroup",
"text": "console.groupEnd",
"stack": {
"url": "https:\/\/test.patrickmeenan.com\/console\/",
"line": "22:20"
}
},
{
"type": "endGroup",
"text": "console.groupEnd",
"stack": {
"url": "https:\/\/test.patrickmeenan.com\/console\/",
"line": "23:20"
}
},
{
"type": "log",
"text": "%cHello World! color: #f709bb; font-family: sans-serif; text-decoration: underline;",
"stack": {
"url": "https:\/\/test.patrickmeenan.com\/console\/",
"line": "24:20"
}
},
{
"type": "timeEnd",
"text": "time: 68.0859375 ms",
"stack": {
"url": "https:\/\/test.patrickmeenan.com\/console\/",
"line": "33:20"
}
},
{
"type": "assert",
"text": "[Object] the # is not even",
"data": [
{
"num": "3"
}
],
"stack": {
"url": "https:\/\/test.patrickmeenan.com\/console\/",
"line": "36:24"
}
},
{
"type": "assert",
"text": "[Object] the # is not even",
"data": [
{
"num": "5"
}
],
"stack": {
"url": "https:\/\/test.patrickmeenan.com\/console\/",
"line": "36:24"
}
},
{
"type": "count",
"text": "default: 1",
"stack": {
"url": "https:\/\/test.patrickmeenan.com\/console\/",
"line": "40:24"
}
},
{
"type": "count",
"text": "default: 2",
"stack": {
"url": "https:\/\/test.patrickmeenan.com\/console\/",
"line": "40:24"
}
},
{
"type": "count",
"text": "default: 3",
"stack": {
"url": "https:\/\/test.patrickmeenan.com\/console\/",
"line": "40:24"
}
},
{
"type": "count",
"text": "default: 4",
"stack": {
"url": "https:\/\/test.patrickmeenan.com\/console\/",
"line": "40:24"
}
},
{
"type": "count",
"text": "default: 5",
"stack": {
"url": "https:\/\/test.patrickmeenan.com\/console\/",
"line": "40:24"
}
},
{
"type": "count",
"text": "default: 6",
"stack": {
"url": "https:\/\/test.patrickmeenan.com\/console\/",
"line": "40:24"
}
},
{
"type": "count",
"text": "default: 7",
"stack": {
"url": "https:\/\/test.patrickmeenan.com\/console\/",
"line": "40:24"
}
},
{
"type": "count",
"text": "default: 8",
"stack": {
"url": "https:\/\/test.patrickmeenan.com\/console\/",
"line": "40:24"
}
},
{
"type": "trace",
"text": "console.trace",
"stack": {
"url": "https:\/\/test.patrickmeenan.com\/console\/",
"line": "42:20"
}
}
],
About the timestamps, yes they are probably not useful as only the relative order should be important and that is preserved.
How exactly is the text, data, and stack created from the original message?
- What about messages with multiple objects?
- What about messages with a deeper callstack?
- What about messages without a URL (e.g.,
eval("console.log('eval')")
I would also keep the executionContextId as this enables to distinguish messages caused by the top-level (inline + external scripts) from messages caused by iframes and similar.
As long as the format and the conversions from the original CDP event are described, any format that enables easy analysis is probably fine. Maybe we should get feedback from some more people?
What kind of analysis could be interesting and would our format enable these?
- Stats on the amount and types of messages: works
- Count of the most common messages: works (on the stringified output)
- Find pages that have specific messages: works (using the stringified output)
- ...
The objects are serialized into the data array and each object gets a separate entry. The actual code that creates the text/data entries is here.
Basically, text is created by appending entries that have a value present, separated by spaces. Entries that don't have a value are collected into data entries, reporting the data from the preview and inserting the description into the text version.
The call stack is just grabbing the call site - is there a need for the full stack? I'll have to check, but I assume an eval will get the URL of the main document or where the eval was called from but if there is no URL at all then there will be no stack.
I'll add the executionContextId as well (probably contextID to keep it short).
Trying to walk the line between having everything that would be useful for analysis at-scale without bloating the HAR too badly since it impacts query costs for every query (and storage costs).