canvas: revamp
clean up canvas
- [x] remove `mozGetAsFile`
- [x] remove `getContext`: 2d supported/not-supported
- [x] remove `winding`, `fillText`, `strokeText` supported/not-supported
- [x] catch `0`, `null`, `undefined`, `false`, `[]`, and `""` (and of course errors) - record + display blocked, record specific block type in methods
- [x] spoofing: record untrustworthy + lie, display real result, record spoof type in methods
- [x] don't forget bypass possible with toDataURL vs toBlob
- [x] add `Object.keys(CanvasRenderingContext2D.prototype)`
- [x] what to do with `winding` - A: we don't need it, entropy manifests in other ways
- [x] remove keys: doesn't add to entropy: we already have isVer
- [ ] harden `RFP random` notation: see RFP characteristics (length, strings)
- [ ] add toDataURL spoof fingerprint, also to be used in `RFP random` notation hardening
- [ ] add `convertToBlob`?
- [ ] add OffscreenCanvas?
- ~~make canvas more sophisticated~~ NOT DOING, see make canvas smaller/faster
  - don't increase the size of the canvas but increase the size of objects, as this defines pixels more precisely
  - use color gradients
  - mathematical curves and shadows
  - multiple text snippets, each rotated, twisted, curved etc
  - example: see https://plaperdr.github.io/morellian-canvas/Preliminary%20analysis/webpage/canvas.html
- [x] make canvas tests smaller and faster
  - as long as there is some entropy, that's fine
  - the real test is checking the metric is protected and the fingerprinting characteristics
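The catch-falsy-returns item above could look something like this - an illustrative sketch, not TZP's actual code (`classifyResult` is a hypothetical helper name):

```javascript
// Hypothetical sketch: classify a tampered canvas method's return value,
// so the display can record the specific block type rather than just "blocked"
function classifyResult(value) {
  if (value === 0) return 'blocked: 0'
  if (value === null) return 'blocked: null'
  if (value === undefined) return 'blocked: undefined'
  if (value === false) return 'blocked: false'
  if (Array.isArray(value) && value.length === 0) return 'blocked: empty array'
  if (value === '') return 'blocked: empty string'
  return 'data' // a real result: proceed to hash it
}

console.log(classifyResult(null))             // "blocked: null"
console.log(classifyResult('data:image/png')) // "data"
```

Each block type then gets its own label under methods, instead of all falsy results collapsing into one bucket.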
original post
When I cleaned up the canvas section to color up lies, bypasses, methods, and add a toDataURL FP (in methods) etc, I noticed with cydec the other day that on the noise test the toDataURL image is broken.
So I added the full data for toBlob and toDataURL on the spoof test
- with cydec (this is the known test used in TZP), it is always `2be88c...`; on the TZP test (in both FF and Chrome, at least on Windows) it is always `984e23748f49e7aff3f1dab5....`
- cydec is returning `null`

Anyway: cydec is not "noise" being added, it's returning null
- no error caught
- 2 passes: they match, so not a lie yet
- known test does not match result, therefore it's a lie, so we assume noise added

This is what confuses me
```js
// temp debug
let testdata = getFilledContext().canvas.toDataURL()
if (testdata === "") {console.debug(runNo, "empty string")}
if (testdata === null) {console.debug(runNo, "null")}
if (testdata === undefined) {console.debug(runNo, "undefined")}
```

```js
// temp debug
let testdata = getKnown().canvas.toDataURL()
if (testdata === "") {console.debug("k, empty string")}
if (testdata === null) {console.debug("k, null")}
if (testdata === undefined) {console.debug("k, undefined")}
```
Why is the hash different?
- `2be88ca4242c76e8253ac62474851065032d6833` for known
- `984e23748f49e7aff3f1dab58ad7c26a649433cc....` for toDataURL
- they're both `null` - ???
https://github.com/arkenfox/TZP/commit/84b1448ccd62887cd5c5c0ab19cc6333c6ff543b
- will expand to cover them all when I clean up the rest
- still intrigued as to why the hashes differed
- do we need to handle `undefined` and empty strings? <- @abrahamjuliot
do we need to handle undefined and empty strings
That could be useful to highlight in the output, but the hash will be unique in any case. For example, `0`, `null`, `undefined`, `false`, `[]`, and `""` should not return the same hash.
I still don't see why the hashes differed between the toDataURL known and toDataURL tests. Anyhow, the hashes collected for known only tell me that the canvas is being tampered with, not how: and returning a consistent error even across browser sessions is not "noise added", but rather more entropy to be added under methods
`0`, `null`, `undefined`, `false`, `[]`, and `""`
oooh, a list 👍
currently

```
getContext      | 2d: supported
toDataURL       | hash or blocked (nulls, errors, timeouts etc)
toBlob          | hash/blocked
mozGetAsFile    | not supported or hash/blocked
getImageData    | hash/blocked
winding         | supported
isPointInPath   | hash/blocked
isPointInStroke | hash/blocked
fillText        | supported
strokeText      | supported
```
I'm going to strip out the mozGetAsFile (makes things simpler for bypasses etc), and I don't need the getContext 2d (which used to combine with webgl support): if it's not supported that will show in the hash results - right?
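That reasoning can be sketched: if every failure path feeds a distinct token into the hash input, an unsupported or blocked `getContext` shows up in the hash without needing its own supported/not-supported check. A minimal illustration with stand-in canvas objects (not TZP's code - `canvasHashInput` is a hypothetical helper):

```javascript
// Any failure mode yields a distinct string, so the resulting hash differs
// from a real canvas without an explicit support check
function canvasHashInput(canvas) {
  try {
    const ctx = canvas.getContext('2d') // blocked/unsupported -> null or throw
    if (!ctx) return 'no-2d-context'
    return canvas.toDataURL()
  } catch (e) {
    return 'error: ' + e.name
  }
}

// stand-in canvases simulating blocked configs
const noContext = { getContext: () => null }
const blocked = { getContext: () => { throw new TypeError('blocked') } }
console.log(canvasHashInput(noContext)) // "no-2d-context"
console.log(canvasHashInput(blocked))   // "error: TypeError"
```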
So my question would be: do I need `winding`, `fillText`, `strokeText` in my results? CB has options for the two Text functions, and an extension could block those inputs - but that still results in the canvas results showing that up
CB set to block only inputs fillText and strokeText, but you also have to tick getContext for anything to apply
If you do the same but untick getContext but tick say toDataURL, then everything is "supported" but toDataURL returns undefined
So my gut reaction is that `winding`, `fillText`, `strokeText` are not required for any entropy? @abrahamjuliot would you agree? Then I can just focus on five results, not ten .. does this sound like the right plan?
pretty sure I don't lose any possible entropy by reducing to five results
getContext 2d: if it's not supported that will show in the hash results - right?
Right, the hash result can detect support.
winding, fillText, strokeText
It appears these have long-standing support. Unless there is a preference that alters these, checking support might be unnecessary. If there is a preference that alters supported features, we could alternatively fingerprint functions and properties on `CanvasRenderingContext2D`.

`Object.keys(CanvasRenderingContext2D.prototype)`
^^ That sounds better .. have some 🍰
if canvas keys are blocked, it would still affect the FP of the image (if image didn't error)
Technically, `Object.keys` can be altered so that it blocks the argument `CanvasRenderingContext2D.prototype`, and this would not affect the image or cause errors. Instead, we can use a `for...in` loop to get the keys:

```js
const keys = []
for (const key in CanvasRenderingContext2D.prototype) { keys.push(key) }
```

If `CanvasRenderingContext2D.prototype` is blocked, that would affect the image and cause errors.
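To make the bypass concrete, here's a minimal sketch using a stand-in prototype (there's no `CanvasRenderingContext2D` outside the browser, and the spoof wrapper is hypothetical): a tampered `Object.keys` can lie about one specific object, while `for...in` still walks the enumerable properties directly.

```javascript
// Stand-in for CanvasRenderingContext2D.prototype (whose methods are enumerable)
const FakeProto = { fillText() {}, strokeText() {}, getImageData() {} }

// Hypothetical spoof: wrap Object.keys to hide just our target object
const realKeys = Object.keys
Object.keys = obj => (obj === FakeProto ? [] : realKeys(obj))
console.log(Object.keys(FakeProto).length) // 0 - the tampered result

// for...in dodges the wrapper and still sees the keys
const keys = []
for (const key in FakeProto) { keys.push(key) }
console.log(keys) // ["fillText", "strokeText", "getImageData"]

Object.keys = realKeys // restore
```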
So are we trying to bypass the keys tampering, or get its entropy .. or both? Will come back to it, after I tidy up the current five items with logic for blocks (including nulls, errors etc), bypass (toDataURL vs toBlob), lies, methods (for spoofing and "blocks") .. and coloring stuff up
Some observations (super unlikely settings, but then who knows)
CB settings: one way to trigger the effect (basically blocking some inputs)
- General: `block everything` (or `block input`)
- APIs: uncheck getContext, toDataURL, toBlob, getImageData
My real canvas values are
toDataURL 1684d35fa0fe81e6f4091bcf624742e99fe01e2e6841d7c7ec98534a47d1402a
toBlob 1684d35fa0fe81e6f4091bcf624742e99fe01e2e6841d7c7ec98534a47d1402a
getImageData 8ef46cfc0049d834564c9d21b8d6c60e3c9733458404de50524710253e75962d
isPointInPath be524a87ffe08cdaa4512fef9e3595783f568823225593b406c9b3c62f81807c
isPointInStroke 942591bcf2f6bd7c4c49a061a3f1ad5c5d7de9c6ee2a9fc86f79a1b665a0acf7
here's what happens
The known tests pass, and we do not pick up the lies for toBlob, toDataURL or getImageData. Those values are consistent on my PC. So let's look at CB tests
So the `fillText` is being blocked from being input (note my canvas test does not use `strokeText`). The known tests do not use text, and in fact do not do much at all, whereas the actual canvas test does a bunch of colors and shapes (with math: not sure if that math is FF entropy-worthy), and text, and winding. I assume the winding is not being applied either.
So is this a LIE or not? Initially I was like, WTF, why am I not picking this up. But now I realize it's not a lie. It's not being spoofed (it's consistent across reloads, sessions etc) and is actually revealing the correct values: it's just that some things are blocked (text, winding).
Now we might be able to pick up on the winding (not supported), but not sure on the fillText, strokeText (they are supported); but to be fair, the image already holds entropy. Will be interested to see what the canvas keys add when I'm ready.
@abrahamjuliot .. would you agree with my assessment?
^ of course what I can't detect is if someone blocks `fillText` but then randomizes with persistent noise (not sure I can get CB to behave like that): the problem is the known test .. I guess that's just a super unlikely edge case?
keys: added (unsorted for now) looking at FF nightly vs chrome stable
FF has (and chrome doesn't)
- createConicGradient, mozCurrentTransform, mozCurrentTransformInverse, mozImageSmoothingEnabled, mozTextStyle
- FYI: createConicGradient is not in stable
chrome has (and FF doesn't)
- direction, getContextAttributes, imageSmoothingQuality
Other than that, they are the same, but the order is different
thoughts
- diffs between FF releases don't really matter (we have get version), but some might be behind a pref
  - voila ... see `canvas.createConicGradient.enabled` .. but no-one would really flip these - so I think this really only enforces browser version per engine
- since it has almost zero perf cost, might as well keep it

I'm going to check older versions of FF to see how much volatility it had. And it'll be interesting to see if it changes with various extension spoofing configs (or RFP, Brave shields)
Also: winding was in the old test, and it does become unavailable in some configs: but I don't know how important that is to collect, or what blocks winding. A spoofed canvas won't reflect lack of winding, but a non-spoofed one would.
test away bro
edit: seems stable AF: RFP, brave shields, extensions (tried lots of configs) etc aren't affecting it
FYI: tested vanilla FFs (windows)
- FF60-69: `e61df00e8aa70a5cbb5533dee374b5aff116cfc5` [62 keys]
- FF70-89: `b7d5621a3a1a1b9e6fe366187e562426bea769f4` [63 keys] - added: getTransform
- FF90+: `d516bde278df1947c1d099d77b3d8ca8bbb4a4a3` [64 keys] - added: createConicGradient
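The per-browser diffs above fall out of a simple set difference; a toy sketch with just the engine-specific key names quoted in this thread (truncated lists, not a live capture of all 60+ keys):

```javascript
// Toy key lists: the engine-specific names mentioned above plus one shared key
const ffKeys = ['fillText', 'createConicGradient', 'mozCurrentTransform',
  'mozCurrentTransformInverse', 'mozImageSmoothingEnabled', 'mozTextStyle']
const chromeKeys = ['fillText', 'direction', 'getContextAttributes',
  'imageSmoothingQuality']

// keys in a but not in b
const diff = (a, b) => a.filter(k => !new Set(b).has(k))

console.log(diff(ffKeys, chromeKeys)) // FF-only: createConicGradient + the moz* props
console.log(diff(chromeKeys, ffKeys)) // chrome-only: direction, getContextAttributes, imageSmoothingQuality
```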
@abrahamjuliot (and @kkapsner : you still there buddy? is everything OK?)
In order to be more consistent with layout and alignment of columns, I changed the few items that were using SHA-256, to SHA-1: namely audio and canvas
like here in audio https://github.com/arkenfox/TZP/blob/09fdc4cb467cee746dc14af34cd112e4eeea9f2b/js/audio.js#L259-L265
I'm concerned about using SHA-1 with `crypto.subtle.digest` as it may get deprecated, or indeed not even work in some browsers (IDK about Safari: but Edge doesn't seem to: that would be the `edgeHTML` engine I assume, not the one based on Blink)
So I have two options AFAICT
- just use my own `sha1()` and rehash the sha-256 value
- work out how to convert the arrayBuffer into an array before hashing - I'm sure it's something simple
@abrahamjuliot don't you do something like my first option, by minifying the hash?
you still there buddy? is everything OK?
Yes - I'm still here. But the last months were tough. Was not ill but the whole situation was very taxing.
Yes - I'm still here
Hang in there buddy. I'm rooting for ya ❤️ . Meanwhile, have some 🍰 and 🍻
convert the arrayBuffer into an array

```js
Array.from(buffer)
// or
[...buffer]
```
minifying the hash
For section hashes and heavy arrays, I use sha-256 and slice the first 8 characters for the HTML output. I've seen some sites shorten sha-256 strings with an ellipsis separator. Some show the full hash in a pop-up title on mouse hover.
```js
const sha256hash = '89455ebb9765644fb98068ec68fbad7fcaaf2768b2cb6e1bd062eee5790c00e8'
const getDisplayHash = (sha256hash, limiter = 8, separator = '...') => {
  return sha256hash.slice(0, limiter) + separator + sha256hash.slice(-limiter)
}
getDisplayHash(sha256hash) // "89455ebb...790c00e8"
```
For sub item display hashes and a few light size metrics, I use a mini hash function.
```js
// https://stackoverflow.com/a/22429679
const hashMini = str => {
  const json = `${JSON.stringify(str)}`
  let i, len, hash = 0x811c9dc5
  for (i = 0, len = json.length; i < len; i++) {
    hash = Math.imul(31, hash) + json.charCodeAt(i) | 0
  }
  return ('0000000' + (hash >>> 0).toString(16)).substr(-8)
}
```
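A quick sanity check of the mini hash's properties (repeating the function so the snippet runs standalone): it's deterministic, always 8 hex chars, and - relevant to the falsy-value list earlier - `0`, `null`, `undefined`, `false`, `[]`, and `""` each stringify differently, so they hash differently.

```javascript
// https://stackoverflow.com/a/22429679 (same mini hash as above)
const hashMini = str => {
  const json = `${JSON.stringify(str)}`
  let i, len, hash = 0x811c9dc5
  for (i = 0, len = json.length; i < len; i++) {
    hash = Math.imul(31, hash) + json.charCodeAt(i) | 0
  }
  return ('0000000' + (hash >>> 0).toString(16)).substr(-8)
}

// deterministic, fixed width
console.log(hashMini('toDataURL') === hashMini('toDataURL')) // true
console.log(hashMini('toDataURL').length) // 8

// the falsy values stringify to "0", "null", "undefined", "false", "[]", '""'
const falsies = [0, null, undefined, false, [], '']
console.log(new Set(falsies.map(hashMini)).size) // 6 distinct hashes
```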
hmmm, getDisplayHash .. I used to do that on canvas
- e.g. `[1] 829a659daf5413... [2] 29fa351ec8e334a......`
- e.g. `noise detected [both] 4829fa829a659daf541351ec8...`
  - ^^ both totalling 64 chars to match a sha-256 length
After doing that I wasn't a fan of hiding the full hash, although the hash is meaningless in these cases
I guess my two issues are: SHA-1 doesn't work in all browsers and can get dropped at any time, and SHA-1 can have collisions
- not that I care about edgeHTML, but breakage is not professional
- so therefore, I must use SHA-256 and display a short form
- or use my internal sha1() function
Canvas and webgl should really use sha-256 given they have high entropy
Hmmm ... decisions .. decisions ... ¯\_(ツ)_/¯
I need to drink on it for a while :beers:
Hmmm .. `RFP random` characteristics
hah - https://bugzilla.mozilla.org/buglist.cgi?bug_id=1737038,1724331
@abrahamjuliot thanks for that mini function 👍
I've been on a little bit of a perf mission. Things like using map. I should use sets more too. Using `else if` where I can. Adding `break` in for loops. Using inline code instead of calling a global function. One thing I'm not sure on with perf is how async can speed things up.
Got any tips?
FYI
I think I need to be careful on what chars can be in the string passed to mini - because it uses charCodeAt?
I was calling `sha1()` 122 times (or 123 if not nightly for component shims) taking 95ms
- the fastest code is when no code at all runs: so I eliminated calculating a hash
  - e.g. a bypass one vs the fake one unless needed
- not hashing empty arrays or arrays with 'undefined'
  - e.g. when a textmetric wasn't supported
- I switched canvas to SHA-1 instead of rehashing it
  - the entropy is not that important, it's to detect protection, so if I get a collision so be it
  - if SHA-1 is ever dropped from crypto, I can switch it back via the var `isSHA`
- I also switched canvas known tests to mini (except nonFF toDataURL, toBlob still uses sha1 because I can't test for all mini values)
- I added the `mini()` which I use for simple compares (if needed)
- I moved some functions to post FP: i.e. items not in the fingerprint
  - all the iframe UA tests (a hash for each plus the summary)
  - worker and iframe tests are going to be treated as a different FP to the top level doc
currently down to 92 (8 mini, 84 sha1) taking ~80ms, but I can do more
- the sixteen domrects can be switched to minis, and only a single sha1 generated if no lies (or 4 in non-FF)
- the three computed styles suck a lot of ms - we can do the same as domrect: if no lies we only need a single sha1
- math will be replaced with the monsta test PoC: all set to go: two minis for 2-pass compares, and a single sha1 for display
out of interest, I ran the entire thing with mini and it topped out at 14ms :)
click me for details
HASH STATS: [92 times | 78 ms]
- 1 : mini : _global isError
- 1 : sha1 : _global isEngine
- 0 : mini : _prereq navA
- 0 : mini : _prereq navB
- 1 : sha1 : feature errors
- 1 : sha1 : feature widgets
- 0 : sha1 : feature math m1hash
- 0 : sha1 : feature math m6hash
- 0 : sha1 : feature math mchash
- 0 : sha1 : feature math m1hash
- 0 : sha1 : feature math m6hash
- 0 : sha1 : feature math mchash
- 0 : sha1 : feature section result
- 0 : sha1 : ua
- 0 : sha1 : ua section result
- 0 : sha1 : screen section result
- 1 : sha1 : devices speech engines
- 1 : sha1 : headers section result
- 0 : sha1 : storage section result
- 1 : sha1 : domrect
- 2 : sha1 : domrect
- 1 : sha1 : domrect
- 0 : sha1 : domrect
- 1 : sha1 : domrect
- 1 : sha1 : domrect
- 0 : sha1 : domrect
- 0 : sha1 : domrect
- 1 : sha1 : domrect
- 0 : sha1 : domrect
- 1 : sha1 : domrect
- 0 : sha1 : domrect
- 1 : sha1 : domrect
- 0 : sha1 : domrect
- 0 : sha1 : domrect
- 0 : sha1 : domrect
- 0 : sha1 : domrect section result
- 0 : sha1 : media canplay
- 0 : sha1 : media istype
- 1 : sha1 : media canplay
- 0 : sha1 : media istype
- 0 : sha1 : media section result
- 0 : sha1 : languages collation
- 0 : sha1 : languages timezone offsets
- 0 : sha1 : languages language & locale
- 0 : sha1 : languages timezone
- 2 : sha1 : languages date/time & format
- 0 : sha1 : languages geo
- 0 : sha1 : language section result
- 1 : sha1 : css colors
- 1 : sha1 : css colors
- 0 : sha1 : css colors
- 1 : sha1 : css colors
- 0 : sha1 : css system fonts
- 16 : sha1 : css computed style 0
- 7 : sha1 : css computed style 1
- 6 : sha1 : css computed style 2
- 1 : sha1 : css section result
- 0 : sha1 : misc component shims
- 9 : sha1 : misc iframe props
- 0 : sha1 : misc nav keys
- 0 : sha1 : misc section result
- 1 : sha1 : elements keys
- 0 : sha1 : elements mathml
- 0 : sha1 : elements lineheight
- 0 : sha1 : elements section result
- 0 : sha1 : devices media devices
- 0 : sha1 : devices section result
- 2 : sha1 : fonts textmetrics width
- 1 : sha1 : fonts textmetrics actualBoundingBoxAscent
- 0 : sha1 : fonts textmetrics actualBoundingBoxDescent
- 1 : sha1 : fonts textmetrics actualBoundingBoxLeft
- 1 : sha1 : fonts textmetrics actualBoundingBoxRight
- 2 : sha1 : fonts gylphs offset
- 2 : sha1 : fonts gylphs bounding
- 2 : sha1 : fonts gylphs client
- 1 : sha1 : fonts fontsScroll
- 1 : sha1 : fonts fontsOffset
- 0 : sha1 : fonts fontsClient
- 0 : sha1 : fonts fontsPixel
- 1 : sha1 : fonts fontsPixelSize
- 1 : sha1 : fonts fontsPerspective
- 0 : sha1 : fonts fontsTransform
- 0 : sha1 : fonts section result
- 0 : mini : canvas [k] todataurl
- 1 : mini : canvas [k] getimagedata
- 0 : mini : canvas [k] ispointinpath
- 0 : mini : canvas [k] ispointinstroke
- 0 : mini : canvas [k] toblob
- 0 : sha1 : canvas section result
- 0 : sha1 : audio get
- 1 : sha1 : audio copy
- 0 : sha1 : audio section result
async
There are performance gains if the functions contain asynchronous operations like setTimeout, promise-based APIs, fetch requests, etc. The async syntax is mostly a cleaner way of writing promises.
tips
I think you're ahead of this tip by moving non fingerprinting functions to post FP.
I run the `SubtleCrypto.digest()` hashing functions in one `Promise.all`, separate from the fingerprint functions, and then I perform all HTML template modifications in a final patch operation to reduce blocking code during other operations.
In short, there are 5 operations I run with their own performance time.
- Get iframe and prototype lies, then pass iframe contentWindow and lie results to fingerprinting functions
- Fingerprint (includes async operations like worker, webrtc, and voices)
- Continue final fingerprinting (compare navigator with worker results)
- Then, perform hashing
- Finally, patch the HTML template with the results. I mostly use the mini hash function here to compress sub results.
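The five operations above might be sketched like this (function names are illustrative, not the actual code): fingerprint first, run every digest concurrently in one `Promise.all`, then patch output in a single final pass.

```javascript
// Staged pipeline sketch: collect -> hash concurrently -> patch once
async function runPipeline(sections) {
  // stages 1-3: collect raw section data (sync stand-in for brevity)
  const results = sections.map(fn => fn())

  // stage 4: all digests resolve together in one Promise.all
  const hashes = await Promise.all(results.map(hashData))

  // stage 5: a single final patch step keeps the DOM writes together
  return results.map((data, i) => ({ data, hash: hashes[i] }))
}

async function hashData(data) {
  // stand-in for SubtleCrypto.digest-based hashing
  return 'hash:' + JSON.stringify(data).length
}

runPipeline([() => ({ canvas: 'ok' })]).then(console.log)
```

Keeping the digests out of the collection stage means no section blocks on hashing, and keeping the template writes in one batch avoids interleaving DOM work with measurement.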
Thanks
I'm down to 72 calls and ~50ms (I can still remove 5 math and 11 domrect, which might shave off 5+ ms - new math coming soon) - as long as you're not lying or blocking multiple methods (7 x fonts, 3 x css styles - in which case I have to compute each lie and maybe a bypass)
I'm toying with the idea of using minihashes for some sections, but it just looks weird - but it can make sense - I need to drink on it
by moving non fingerprinting functions to post FP
Yup, that made some perf more consistent as well - it's really just the perf.now test (11 x 13ms) and the iframe + workers
- I replaced the woff function and it's now in the FP - check it out, speedy AF - bonus, it picks up on a pref change when you rerun

If I was designing this from scratch again, it would be a bit different - removing the iframes and workers to run post doc makes sense when I expand them - i.e. treat the top level doc separately
timing
The only one that bugs me is that the device speech engines test gets held up somewhere
- RFP off - run outputDevices last and the perf is like 8ms; run outputDevices first and it ends up down the stack at 100ms (rerun is 8ms)
- RFP on - run outputDevices last (or pretty much not early) and the entire TZP perf goes out the window, double/triple
Useless app
updated: https://arkenfox.github.io/TZP/tests/canvasrfp.html - much faster and you can now set a size via console (see details at the top where it says click me)
You can also bypass the run buttons by using `run_checks(number)`. If there are no known matching patterns for a size (I've only bothered with 16x8 and 16x8), the summary will basically provide you with the guts of the RFP rules for that size (if you use a large enough number).
For example, before I added the rules for 16x8, I ran `run_checks(1000000)` - 1 million - and got back all 12 possible combos from the 4 toDataURL lengths and last 10 chars (the middle slice was stable) - ... BUT ... some of those were as little as 1 result from a million. Also, don't do that unless you have a grunty machine. On this 11-year-old machine, it took about a minute to run, but then basically made FF unusable for about 4 minutes.
So basically, all combos are possible: lengths x middlechars x last 10chars
Anyway, to cap a few things off: Canvas (TZP refactor) is now no longer checking for entropy - we already know entropy exists and we already protect against it, so it is pointless to try and exploit this in TB or RFP. TZP only checks for protection, and records the unprotected (known) hash or untrustworthy. If untrustworthy, it also records if RFP, and it also records persistent vs not persistent.
I might add some degree of randomization: such as channels and % of pixels changed: low, high, medium - but this is nasty when trying to draw lines if you want consistency
FWIW: I think subpixel collection is far more dangerous: font, glyph, mathml, lineheight and other objects - transformed etc, is about the worst we can get (maybe outside webgl)