
Fingerprint Resistance

Open JanosKovacs opened this issue 1 year ago • 5 comments

Existing browsers apply bandage patches to achieve fingerprint resistance. Developing a new browser engine from scratch is an opportunity to make it fingerprint resistant without relying on hacks. However, to build a fingerprint-resistant browser, developers need to be aware of the methods that browser fingerprinting scripts use. A while ago, it was suggested on Ladybird's Discord server to open a new issue about fingerprint resistance to give the developers feedback on this topic, so that is exactly what this issue is. To start the discussion, I will cover some basic ways browser fingerprinting can be done. If others have more information on this topic, I encourage them to comment on this issue, unless the Ladybird developers prefer that issues not be used this way and can suggest a better way to give such feedback.

What is browser fingerprinting?

Browser fingerprinting is the act of identifying a browser running on a machine without the use of cookies, local storage, or other intended methods for storing persistent data inside the browser that can be cleared by the user. Various characteristics can be detected using JavaScript (or in some cases even CSS), and they can differ based on the hardware and software configuration. The larger the set of characteristics gathered, the less likely it is that the same browser running on another computer, or another browser running on the same computer, will match. This enables creating a unique identifier for the browser, the browser fingerprint, that doesn't change when the history or saved data is cleared. A browser fingerprint can change when settings within the browser are changed, if those settings affect characteristics that the fingerprinting script uses. However, if only a few characteristics change, the rest are likely still enough to uniquely identify a single browser or a small set of browsers. Browser fingerprinting is a large issue for online privacy, as it can be used to track the activities of individual users across browsing sessions, even when they are not logged in and they clear the history and saved data on each browser exit. It can also be used by services to identify that two accounts belong to the same individual user.
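As a rough illustration of how a set of characteristics turns into a single identifier, here is a minimal Python sketch. It assumes a script simply hashes whatever it observes; real fingerprinting scripts are more elaborate, and the characteristic names and values below are made up for the example.

```python
import hashlib
import json

def fingerprint(characteristics: dict) -> str:
    """Hash a set of observed characteristics into one stable identifier."""
    # Sort keys so the same observations always serialize identically.
    canonical = json.dumps(characteristics, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Hypothetical observations; none of these are real measurements.
machine_a = {"screen": [1920, 1080], "timezone": "Europe/Budapest", "fonts": ["Arial", "Noto Sans"]}
machine_b = {"screen": [1920, 1080], "timezone": "Europe/Budapest", "fonts": ["Arial", "Ubuntu"]}

id_a = fingerprint(machine_a)
id_a_again = fingerprint(dict(machine_a))  # same observations, e.g. after clearing cookies
id_b = fingerprint(machine_b)

print(id_a == id_a_again)  # True: the identifier survives clearing stored data
print(id_a == id_b)        # False: one differing characteristic changes the identifier
```

Nothing in this sketch depends on data stored in the browser, which is why clearing history or cookies does not help.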

Two methods to combat browser fingerprinting

  1. Reduce the number of possible fingerprints by making characteristics the same across browsers on different hardware and software configurations. This should be preferred when possible.
  2. Introduce a random value that affects the fingerprint, so that the fingerprint differs on each attempt to fingerprint the browser.

Screen size

The screen size can be used as a characteristic for fingerprinting. JavaScript APIs allow getting the sizes of multiple areas. A fingerprint-resistant browser should ensure that no information is revealed about areas outside of the viewport, including the size of the whole screen and the size of the browser window with its toolbar, status bar, sidebar, or other elements. The only area whose size JavaScript should be able to know is the actual viewport that renders the HTML page. The other JavaScript APIs should probably also report the viewport size, as if the viewport were full screen. However, unless the browser is in windowed mode with a "random" size, this still leaves the viewport size usable as a characteristic for fingerprinting. This can only be prevented by letterboxing, which limits the viewport size to a set of common sizes or to a random size that changes frequently. To prevent the size of the OS theme's taskbar/panel or window decoration from affecting the viewport size, Ladybird could also slightly adjust the height of the toolbar to compensate, so that the viewport size is the same regardless of the taskbar/panel or window decoration size.
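The letterboxing idea can be sketched in a few lines of Python. This is only an illustration: the function name, the 200x100 step, and the 1400x900 cap are assumptions chosen for the example, not values taken from Ladybird or any other browser.

```python
def letterbox(width: int, height: int,
              step_w: int = 200, step_h: int = 100,
              max_w: int = 1400, max_h: int = 900) -> tuple[int, int]:
    """Round a viewport down to a coarse bucket so many windows report the same size."""
    # Round each dimension down to a multiple of its step.
    w = (width // step_w) * step_w
    h = (height // step_h) * step_h
    # Cap the result so unusually large monitors do not stand out.
    return min(w, max_w), min(h, max_h)

print(letterbox(1366, 657))   # (1200, 600)
print(letterbox(1280, 699))   # (1200, 600)
print(letterbox(2560, 1300))  # (1400, 900)
```

The first two windows differ in real size but report the same viewport, and the unused area would be filled with blank margins. Very small windows are not handled in this sketch.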

User-agent

In fingerprint-resistant browsers, the user-agent should not reveal much information about the user's system. It isn't a problem if it reveals the browser engine. However, the browser version should only be incremented when there are massive additions or changes to the browser engine that websites need to be able to detect. It is debatable whether the browser should reveal information about the OS it's running on, since some websites use this to automatically offer downloads for the right OS. However, the browser should at least not reveal the revision of the OS (e.g. Windows 10, Linux Ubuntu 24.04 with X11).

UI drawing

Current browsers like Firefox and Chrome draw HTML buttons and other form elements using the native style of the operating system, unless styled by CSS. This can reveal the OS revision/distro. Fingerprint-resistant browsers should draw their own UI that doesn't differ based on the OS.

Fonts

In fingerprint-resistant browsers, it is important that the browser uses its own set of fonts it ships with, and does not use fonts available on the OS. Otherwise, using JavaScript it would be possible to probe for specific fonts existing on the system, which can be used as a characteristic for fingerprinting. The set of installed fonts differs based on OS and OS revision/distro, and users can install custom fonts.
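A minimal Python sketch of the idea follows, assuming the browser resolves every requested font family against a fixed bundled set. The font names and the `resolve_font` helper are hypothetical and only serve the example.

```python
# Hypothetical bundled font set; a real browser would ship actual font files.
BUNDLED_FONTS = {"bundled sans", "bundled serif", "bundled mono"}
DEFAULT_FONT = "bundled sans"

def resolve_font(requested_families: list[str]) -> str:
    """Pick the first requested family that is bundled, never consulting the OS."""
    for family in requested_families:
        name = family.strip().strip("'\"").lower()
        if name in BUNDLED_FONTS:
            return name
    # Installed-but-unbundled and nonexistent fonts are indistinguishable here.
    return DEFAULT_FONT

print(resolve_font(["Calibri", "Bundled Serif"]))  # bundled serif
print(resolve_font(["Calibri"]))                   # bundled sans
print(resolve_font(["NoSuchFont"]))                # bundled sans
```

Because a font that is installed on the OS resolves exactly like one that does not exist, a probing script learns nothing about the system's font list.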

HTML Canvas

When drawing things like text to the HTML canvas using JavaScript, there can be slight differences in things like anti-aliasing, depending on the OS and the graphics card/driver. The only way to truly deal with this is to run a deterministic drawing algorithm on the CPU. However, this would result in bad performance, so an alternative is to slightly randomize the canvas contents when JavaScript tries to read them out, in a way that is not noticeable to the human eye.
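The randomization alternative could look roughly like the following Python sketch. It assumes the pixel data is a flat RGBA byte buffer, and the function name and the noise amplitude of 1 are made up for the example.

```python
import random

def add_read_noise(pixels: bytes, rng: random.Random, max_delta: int = 1) -> bytes:
    """Return a copy of RGBA pixel data with each colour channel nudged by at most max_delta."""
    out = bytearray(pixels)
    for i in range(len(out)):
        if i % 4 == 3:
            continue  # leave the alpha channel untouched
        delta = rng.randint(-max_delta, max_delta)
        # Clamp so the result stays a valid 8-bit channel value.
        out[i] = min(255, max(0, out[i] + delta))
    return bytes(out)

original = bytes([200, 100, 50, 255] * 4)  # four identical RGBA pixels
noisy = add_read_noise(original, random.Random(1234))
```

A change of one step per colour channel is hard to see, but it is enough to change a hash computed over the read-back pixels.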

Device properties

There are JavaScript APIs that can check specific device information, such as the name of the graphics card. In fingerprint-resistant browsers, these should not reveal real information about the hardware, and should instead return the same, or randomized strings or data. Also, using JavaScript, the bit depth and sample size of playback or input audio devices can be determined. To avoid this, the same bit depth and sample size should be presented to websites, and the browser should resample the audio stream to match that of the OS. If multiple input/output devices are present, they should be combined into a single input/output stream and show up as a single device to the JavaScript APIs. The single audio device should show up as Stereo, and it should show up even if there is no audio device available on the OS, so it doesn't reveal the presence of one.

Language / time-zone information

Fingerprint-resistant browsers should always use US English and the UTC0 timezone. This can be inconvenient for some international users, so it could be locked behind a toggle. An alternative approach is to use the language and timezone of the country associated with the user's IP address. However, some countries have multiple languages (e.g. Switzerland) or time-zones (e.g. US).

JanosKovacs avatar Jan 12 '25 21:01 JanosKovacs

Don't track me fuck off ads, fuck off fingerprints

trazyn avatar Apr 15 '25 14:04 trazyn

@JanosKovacs I don't think that the format you've chosen for this issue is helpful for the project going forward in this forum. It is verbose and written like an overview of common fingerprinting methods.

If there are specific tasks and/or code changes that would make the browser more fingerprint resistant, listing those items in a task list would be much more useful for contributors to reference as we develop the browser and web engine than re-reading this blog post.

ADKaster avatar Apr 15 '25 23:04 ADKaster

@ADKaster I understand your point but @JanosKovacs is right.

This is not a "change that piece of code" issue but a design choice that needs to be part of the developers' mindset. It needs to be communicated and agreed on.

To make it shorter than the wall of text at the beginning:

Whatever information you return to JS needs a little randomization. For canvas it is easy to explain: just add some noise on top of the render.

The same goes for audio. Those are simple tasks. The problem comes from font enumeration, screen size, etc.

Please look at this page: https://browserleaks.com

It is a good subset of what can be used, but far from everything. Every piece of information that you return about the system is a small chunk that is used to calculate a value that identifies you uniquely. Even if it is a boolean value, it is still 1 bit of information that contributes to the end result.
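To put rough numbers on this, here is a small Python sketch. The shares of users are invented for illustration, and adding the bits together assumes the characteristics are independent, which real ones often are not.

```python
import math

def surprisal_bits(share_of_users: float) -> float:
    """Bits of identifying information revealed by a value that this share of users has."""
    return -math.log2(share_of_users)

def bits_to_single_out(population: int) -> float:
    """Bits needed to distinguish one member of a population from all the others."""
    return math.log2(population)

# Hypothetical shares of users reporting the same value as you.
shares = {"boolean_flag": 1 / 2, "timezone": 1 / 16, "font_set": 1 / 1024}
total_bits = sum(surprisal_bits(share) for share in shares.values())

print(total_bits)                                    # 15.0
print(round(bits_to_single_out(8_000_000_000), 1))   # 32.9
```

Under these assumptions, three characteristics already supply 15 of the roughly 33 bits needed to single out one person among 8 billion.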

Try to add a bit of randomness to all the calls that return static information about the system, without returning something that would break functionality on the caller's side. You have to agree on this at the developer level. Later it is going to be very hard to add it on top, and/or even to identify where the changes should be made.

Another small idea: something like an application firewall, but for JS calls. In learning mode (off by default), you show the user what will be returned from a JS call (flagging those that obtain information about the environment) and allow the result to be changed before it is returned to JS. Then the classic "Always/Once/..." selection.

You have a unique opportunity here. Don't waste it.

stiray avatar May 04 '25 09:05 stiray

> Whatever information you are returning from js, needs to have a little randomization, for canvas it is easy to explain, just add some noise on top of the render.

Correct me if I'm wrong, but isn't added noise in things like canvas rendering defeated by just taking the measurement multiple times and then fingerprinting based on the average?
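A small Python simulation of that concern is sketched below. It is not Ladybird code: the noise amplitude, the session key, and both helper functions are assumptions made for the example. It compares noise drawn fresh on every read with noise derived from a per-session key.

```python
import hashlib
import hmac
import random
from statistics import mean

TRUE_VALUE = 128  # the channel value the page is trying to recover
MAX_DELTA = 2     # assumed noise amplitude

def fresh_noise_read(rng: random.Random) -> int:
    """Noise drawn independently on every read."""
    return TRUE_VALUE + rng.randint(-MAX_DELTA, MAX_DELTA)

def keyed_noise_read(session_key: bytes, pixel_index: int) -> int:
    """Noise derived from a per-session key, so repeated reads agree."""
    digest = hmac.new(session_key, pixel_index.to_bytes(4, "big"), hashlib.sha256).digest()
    # Map the first digest byte into the range -MAX_DELTA..MAX_DELTA.
    return TRUE_VALUE + digest[0] % (2 * MAX_DELTA + 1) - MAX_DELTA

rng = random.Random(42)
fresh_reads = [fresh_noise_read(rng) for _ in range(10_000)]
keyed_reads = [keyed_noise_read(b"hypothetical-session-key", 0) for _ in range(10_000)]

print(round(mean(fresh_reads), 2))  # close to 128: averaging removes fresh noise
print(len(set(keyed_reads)))        # 1: every read returns the same perturbed value
```

In this sketch, averaging does recover the true value when the noise is fresh on each read. When the noise is fixed for the session, averaging gains nothing, at the cost of the fingerprint being stable within that one session.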

> Later it is going to be very hard to add it on top and or to even identify where the changes should be made.

I doubt that it's any harder to figure that out afterwards than it is before or while implementing the feature. What I think would be hard are cases where randomization is at odds with spec requirements. At that point, hardening the API also means changing the spec and updating WPT test cases to match. The same goes for APIs that indirectly give away system information, such as measuring script execution speed, canvas rendering performance or image load times to figure out what sort of hardware setup a user has.

> Another small idea, something like application firewalls but for js calls. In learning mode (off by default) you show user, what will be returned from js call (flag those that are obtaining information about environment) and allow result to be changed before returned to js. Than the classic "Always/Once/..." selection.

APIs that expose information about your system already have this in the form of the permission system. (The popup you get when a page tries to access your camera, microphone, location, bluetooth, magnetometer...)

If there are JS calls that directly expose information about your system without one of those, that is probably a spec issue. If there are JS calls that indirectly expose this sort of information, you'd be asking users to disable or modify CSS or JS features that they probably don't know or understand, so it should really be mitigated by default through randomization, discretization or similar.
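As an example of what discretization could mean for timing measurements, here is a minimal Python sketch. The `coarsen` helper and the 100 ms resolution are assumptions picked for illustration.

```python
def coarsen(value_ms: float, resolution_ms: float = 100.0) -> float:
    """Round a measurement down to a fixed grid so small hardware differences disappear."""
    return (value_ms // resolution_ms) * resolution_ms

# Two machines that differ slightly in speed report the same coarse value.
print(coarsen(1234.567))  # 1200.0
print(coarsen(1299.9))    # 1200.0
```

The coarser the grid, the less a page learns from a single measurement, but the less useful the API becomes for legitimate timing.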

Psychpsyo avatar May 04 '25 12:05 Psychpsyo

> This is not a "change that piece of code" but a design choice that needs to be in developers mindset. And needs to be communicated and agreed on.

In the same way that an issue on the issue tracker with 500+ words on why 'Secure by Design' best practices should be followed in the project is a waste of space, the path of this issue and related comments don't match the venue.

If you want to work on some sort of guidelines document, come chat with us on discord about adding something to Documentation/.

If you have specific patterns or mitigations that you have looked in the code to verify we are missing or mis-implementing, then we can talk about it in a GitHub issue. With details, not vague goals.

The GitHub issue tracker is not the proper venue for brainstorming or design discussion.

ADKaster avatar May 04 '25 14:05 ADKaster