
Illegal instruction on Skylake CPUs on Linux kernel with GDS/downfall mitigation enabled

Open hardfalcon opened this issue 1 year ago • 9 comments

Steps to reproduce

On an Archlinux machine with an Intel Skylake CPU and a kernel configured to mitigate the "Downfall"/"Gather Data Sampling" (GDS) CPU vulnerability (booted with gather_data_sampling=force), element-desktop 1.11.51 run with electron 27.1.3 crashes with an illegal instruction a few seconds after being started. That Electron version is necessary because older Electron releases contain 1-2 bugs that themselves trigger illegal instruction crashes caused by flawed AVX/AVX2 detection routines.

AFAIK this only affects Skylake CPUs because they are the only CPU generation that is both new enough to be affected by GDS/Downfall and old enough that Intel didn't publish a microcode update mitigating the vulnerability without breaking AVX/AVX2 support.

Outcome

Excerpt from a backtrace:

Thread 1 "electron" received signal SIGILL, Illegal instruction.
sha2::sha512::x86::sha512_compress_x86_64_avx (state=0x7fffffff7e80, block=0x7fffffff7e00 b"rspwczbifweef~dwes", '6' <repeats 110 times>) at src/sha512/x86.rs:64
Downloading source file /build/.cargo/registry/src/index.crates.io-6f17d22bba15001f/sha2-0.9.9/src/sha512/x86.rs
64      src/sha512/x86.rs: Directory not empty.

I suspect this crash to be caused by seshat-node 3.0.1, which appears to use version 0.2.6 of the cpufeatures crate, which contains a flawed detection routine for AVX/AVX2 support. Version 0.2.8 of the cpufeatures crate contains a fix for this.

I've verified that outdated versions of the cpufeatures crate can cause this type of crash by building a test program that calculates a SHA512 hash using the sha2 crate with both cpufeatures 0.2.7 and cpufeatures 0.2.8. Running the resulting binaries on the affected machine yields the expected results: the binary built with cpufeatures 0.2.7 crashes with an illegal instruction, whilst the binary built with cpufeatures 0.2.8 works fine and does not crash.
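To illustrate the kind of check involved, here is a minimal sketch that asks the Rust standard library whether AVX and AVX2 are usable on the running machine. It is not the test program described above and does not use the sha2 or cpufeatures crates; the function name `detected_avx` is made up for this example. The standard library's runtime detection takes operating system support for the AVX register state into account, so on an affected machine it would be expected to report `false` even though the CPU itself advertises AVX.

```rust
/// Returns (avx, avx2) as reported by the standard library's runtime
/// feature detection. `detected_avx` is a hypothetical helper name.
fn detected_avx() -> (bool, bool) {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        (
            std::arch::is_x86_feature_detected!("avx"),
            std::arch::is_x86_feature_detected!("avx2"),
        )
    }
    // On non-x86 targets there is no AVX at all.
    #[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
    {
        (false, false)
    }
}

fn main() {
    let (avx, avx2) = detected_avx();
    println!("avx usable: {avx}, avx2 usable: {avx2}");
}
```

Running this next to a binary that crashes with SIGILL in an AVX code path gives a quick hint whether the crash comes from a detection routine that disagrees with what the kernel actually allows.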

I routinely build my own packages of element-desktop, and I'd like to verify my suspicion by building a version of element-desktop where the seshat dependency is built using an updated version of the cpufeatures crate, but I gotta admit that I'm not sure how to do that, because I'm not really familiar with the Electron/Node and Rust ecosystems.

Operating system

Archlinux testing (kernel: linux-hardened 6.6.7.hardened1-1.1)

Application version

element-desktop 1.11.51, run with electron 27.1.3

How did you install the app?

custom-built package derived from Archlinux's official element-desktop 1.11.51 package, but updated to use electron 27

Homeserver

not relevant

Will you send logs?

Stack traces: Yes Rage shake logs: No

hardfalcon avatar Dec 14 '23 23:12 hardfalcon

I suggest opening an issue with Seshat

t3chguy avatar Dec 15 '23 10:12 t3chguy

AFAICT there are multiple dependencies of element-desktop and/or element-web that use the Rust sha2 crate and cpufeatures<0.2.8, and I'd prefer to find out which of these are actually causing the illegal instruction (for example, it appears that curve25519-dalek is another potential candidate for causing this issue), before having to file a plethora of issues across the whole chain of nested dependencies.

What would be the correct way to test this? Forking the dependency repo in question, applying the fix there, and then running something like

yarn add 'matrix-seshat@https://github.com/hardfalcon/seshat#f2d629a2c605d6aa38254ff83980dd90c6080829'

and build element-desktop from that? I'm asking because that's what I've tried, but since that didn't make the crash go away, I'm wondering if I'm doing something wrong, or if the cause of the crashes is hiding somewhere else (for example within curve25519-dalek). The fact that the sha2 crate is used all over the place by an endless list of dependencies and subdependencies makes it quite difficult for me to figure out where the crashes are coming from.
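One way to narrow this down might be to look at the lock file of each Rust dependency and list which cpufeatures versions it pins. The sketch below assumes the dependency ships a `Cargo.lock` in the usual format, where a `version` line directly follows the `name` line of each package entry; `locked_versions` and the default path are hypothetical names chosen for this example.

```rust
/// Collect every version of `krate` listed in the text of a Cargo.lock file.
/// `locked_versions` is a hypothetical helper name.
fn locked_versions(lock_text: &str, krate: &str) -> Vec<String> {
    let wanted = format!("name = \"{krate}\"");
    let mut versions = Vec::new();
    let mut lines = lock_text.lines();
    while let Some(line) = lines.next() {
        if line.trim() == wanted {
            // Assumption: the `version` line directly follows the `name` line.
            if let Some(next) = lines.next() {
                if let Some(v) = next.trim().strip_prefix("version = ") {
                    versions.push(v.trim_matches('"').to_string());
                }
            }
        }
    }
    versions
}

fn main() {
    // The path is an assumption: pass the lock file of the dependency to inspect.
    let path = std::env::args()
        .nth(1)
        .unwrap_or_else(|| "Cargo.lock".to_string());
    match std::fs::read_to_string(&path) {
        Ok(text) => {
            for v in locked_versions(&text, "cpufeatures") {
                println!("cpufeatures {v}");
            }
        }
        Err(e) => eprintln!("cannot read {path}: {e}"),
    }
}
```

Any dependency that lists a cpufeatures version below 0.2.8 would be a candidate. Inside a checkout of such a dependency, `cargo tree -i cpufeatures` should additionally show which crates pull it in.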

hardfalcon avatar Dec 15 '23 10:12 hardfalcon

@hardfalcon you would need to add it to hakDependencies, rather than dependencies

https://github.com/element-hq/element-desktop/blob/develop/package.json#L124

t3chguy avatar Dec 15 '23 10:12 t3chguy

Thanks for the hint. Is there a dedicated (yarn) command for this, or should I just manually replace that line in package.json with something like this?

"matrix-seshat": "https://github.com/hardfalcon/seshat#f2d629a2c605d6aa38254ff83980dd90c6080829",

hardfalcon avatar Dec 15 '23 10:12 hardfalcon

@hardfalcon it's a project-specific thing so yarn does not understand it. Replacing that line should work.

t3chguy avatar Dec 15 '23 10:12 t3chguy

Thanks for the help! :)

hardfalcon avatar Dec 15 '23 10:12 hardfalcon

It appears this doesn't work after all (the stdout timestamp prefix and the last 3 lines are from the way I run the PKGBUILD to build the package for Archlinux):

stdout 2023-12-15_12:07:19 $ yarn run hak
stdout 2023-12-15_12:07:20 $ ts-node scripts/hak/index.ts
stdout 2023-12-15_12:07:21   • loaded configuration  file=package.json ("build" field)
stdout 2023-12-15_12:07:21   • loaded configuration  file=package.json ("build" field)
stdout 2023-12-15_12:07:21 hak check: matrix-seshat
stdout 2023-12-15_12:07:21 hak check: keytar
stdout 2023-12-15_12:07:21 hak fetch: matrix-seshat
stdout 2023-12-15_12:07:21 Fetching matrix-seshat@https://github.com/hardfalcon/seshat#f2d629a2c605d6aa38254ff83980dd90c6080829
stdout 2023-12-15_12:07:22 Error: ENOENT: no such file or directory, open '/build/.npm/_cacache/tmp/git-cloneF9XP9x/package.json'
stdout 2023-12-15_12:07:22     at async open (node:internal/fs/promises:633:25)
stdout 2023-12-15_12:07:22     at async readFile (node:internal/fs/promises:1242:14)
stdout 2023-12-15_12:07:22     at async withTempDir (/build/element.io/src/element-desktop-1.11.51/node_modules/@npmcli/fs/lib/with-temp-dir.js:21:14) {
stdout 2023-12-15_12:07:22   errno: -2,
stdout 2023-12-15_12:07:22   code: 'ENOENT',
stdout 2023-12-15_12:07:22   syscall: 'open',
stdout 2023-12-15_12:07:22   path: '/build/.npm/_cacache/tmp/git-cloneF9XP9x/package.json'
stdout 2023-12-15_12:07:22 }
stdout 2023-12-15_12:07:22 error Command failed with exit code 1.
stdout 2023-12-15_12:07:22 info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.
stdout 2023-12-15_12:07:22 error Command failed with exit code 1.
stdout 2023-12-15_12:07:22 info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.
stdout 2023-12-15_12:07:22 ==> ERROR: A failure occurred in build().
stdout 2023-12-15_12:07:22     Aborting...
stderr 2023-12-15_12:07:22 ==> ERROR: Build failed, check /mnt/archbuild/ernstprr-testing-x86_64/hardfalcon/build

hardfalcon avatar Dec 15 '23 11:12 hardfalcon

@hardfalcon it worked fine for me

[image]

Seems like the error is from withTempDir so maybe related to some isolation/jailing?

t3chguy avatar Dec 15 '23 11:12 t3chguy

No matter what I try, I can't seem to get this to work, not even by patching scripts/hak/fetch.ts such that the substring serhat:https://github.com/ is replaced on the fly with github:. Note that I'm trying to build element using the 1.11.51 release tarball and not the raw element-hq/element-desktop git repo.

hardfalcon avatar Dec 15 '23 17:12 hardfalcon