ring icon indicating copy to clipboard operation
ring copied to clipboard

Initial ppc64le Support

Open erichte-ibm opened this issue 5 years ago • 12 comments

Revive PR #819 (by q66) in reduced-scope by implementing powerpc64le support in ring. This effort supports only little endian machines using POWER8 and later processors for now. The goal is to enable ring to target modern powerpc64le machines.

Only the bare minimum needed to build ring and pass ring tests is included; there is no new optimized code for algorithms already implemented in C/Rust. This has been tested on a POWER9 Blackbird, and on a POWER8 machine.

erichte-ibm avatar Sep 18 '20 21:09 erichte-ibm

Codecov Report

Merging #1057 (5c496c5) into main (4b87b67) will decrease coverage by 0.01%. The diff coverage is n/a.

@@            Coverage Diff             @@
##             main    #1057      +/-   ##
==========================================
- Coverage   96.02%   96.01%   -0.01%     
==========================================
  Files         133      133              
  Lines       19913    19913              
  Branches      199      199              
==========================================
- Hits        19121    19120       -1     
- Misses        754      756       +2     
+ Partials       38       37       -1     
Files Coverage Δ
src/cpu/arm.rs 92.39% <ø> (ø)

... and 2 files with indirect coverage changes

:mega: We’re building smart automated test selection to slash your CI/CD build times. Learn more

codecov[bot] avatar Jan 22 '21 20:01 codecov[bot]

Updated the branch on top of current main, fixed some issues reported by CI.

erichte-ibm avatar Jan 29 '21 20:01 erichte-ibm

Updated the branch on top of current main. Removed algorithm-specific assembly to match commit 501fc4eeaa , only the minimum required BN assembly from OpenSSL is included.

erichte-ibm avatar Apr 02 '21 20:04 erichte-ibm

Rebased branch on top of current main.

Please let us know if there is anything you need from us to aid you in moving this forward.

erichte-ibm avatar Feb 04 '22 20:02 erichte-ibm

Did you ever hear from @briansmith?

ghost avatar Apr 29 '22 08:04 ghost

@erichte-ibm @briansmith is there anything I could help to get this PR wrap up?

runlevel5 avatar Apr 29 '22 11:04 runlevel5

Did you ever hear from @briansmith?

We are still waiting to hear back, though we have be actively rebasing every week or so to try to keep this PR up to date.

@erichte-ibm @briansmith is there anything I could help to get this PR wrap up?

Reviews and testing would be wonderful! So far, we have tested on little endian P8/P9/P10. We'd love to know your use case and platform so that we can work together on any issues you might find.

erichte-ibm avatar May 06 '22 19:05 erichte-ibm

We'd love to know your use case

I've added support for powerpc64le to nixpkgs.

Nixpkgs is sort of like Gentoo except that it isn't a distribution, and technically isn't even Linux-specific.
  • It can run without root or needing to take over the whole machine, on top of any other Linux distribution, many of the BSDs, and even (with a lesser degree of functionality) MacOS.
  • It natively supports installing multiple different versions of a package or library, with zero additional effort or complexity.
  • It uses a very simple and elegant lazy functional language instead of Python.
  • It keeps all generated files (compiled binaries, etc) in an immutable-once-written, garbage collected "heap in the filesystem" (/nix/store).

Gentoo can do the first two things, sort of, sometimes, with Gentoo Prefixes, but those only work for certain packages and are still sort of an experimental feature.

Credit where credit is due, about half of the necessary effort was already in-tree as unfinished work; I did the other half to push it across the finish line. It successfully bootstraps now (if you apply all of my PRs, a few are not yet merged), building a complete Linux userspace from scratch. Nix does an outstanding job of bringing out the parallelism in this build process, which really shows off the throughput on my high-core-count Talos2 machines (those slick eDRAM cells you use don't hurt either; too bad GF never offered those to the rest of us).

Unfortunately Rust's recursive hashing scheme (Cargo.lock) makes it really awkward to carry out-of-tree patches to a widely-depended-upon library like Ring.

You need to use the cargo [patch] stanza, but unfortunately this requires recalculating the entire Cargo.lock during builds, which requires network access to https://crates.io. Network access during builds is a total non-starter for us. Recalculating the Cargo.lock of every package in nixpkgs that uses ring is theoretically possible, but unlikely to be acceptable... if that sort of thing were allowed it would quickly snowball into an exponential number of package updates for each small change to widely-used libraries.

I would really like to see this merged, even if it's hidden behind an off-by-default "yes I am crazy and I know what I'm doing" feature flag. We would expose it via an off-by-default flag with a similarly-scary name to our own users.

ghost avatar May 06 '22 22:05 ghost

You need to use the cargo [patch] stanza, but unfortunately this requires recalculating the entire Cargo.lock during builds, which requires network access to https://crates.io.

Does adding a feature still require a [patch] to enable that feature for ring somewhere in the downstream dependency chain? (e.g. projects that depend on something like rustls and not ring directly)

I would really like to see this merged, even if it's hidden behind an off-by-default "yes I am crazy and I know what I'm doing" feature flag. We would expose it via an off-by-default flag with a similarly-scary name to our own users.

A feature would be somewhat redundant since we check the target_arch in most places, however it would make it more explicit that we wish to use a "less supported" platform.

@briansmith Would adding an explicit feature for POWER support help get this reviewed/merged?

erichte-ibm avatar May 13 '22 19:05 erichte-ibm

You need to use the cargo [patch] stanza, but unfortunately this requires recalculating the entire Cargo.lock during builds, which requires network access to https://crates.io.

Does adding a feature still require a [patch] to enable that feature for ring somewhere in the downstream dependency chain? (e.g. projects that depend on something like rustls and not ring directly)

I don't think features are covered by the Cargo.lock checksum field, but even if they are, a [patch] stanza will not be required, because feature flags are "unioned up the dependency tree". Cargo will only build one copy of ring per binary, even if different parts of the binary express dependencies upon ring with different feature sets. Cargo will union all of those feature sets and build ring only once.

This means that it is enough to add an extra (possibly redundant) ring dependency at the top level with the scarily-named feature flag enabled. That is enough for every package throughout the entire dependency tree of that particular binary to be built with the feature enabled. The Cargo.lock file contains an entry for the top-level package, but that entry is special: it has no checksum field. This makes sense, because Cargo.lock is distributed alongside the top-level package's source code. Including a checksum would be redundant. So we can add feature flags there without breaking any checksums.

I've had an extraordinarily hard time finding a formal specification of that file format (precisely how the checksum is computed, an exhaustive list of the fields, grammar for the source quasi-urls, etc...). Do you know of one?

ghost avatar May 14 '22 00:05 ghost

Cargo will union all of those feature sets and build ring only once.

Good to know!

This means that it is enough to add an extra (possibly redundant) ring dependency at the top level with the scarily-named feature flag enabled

With that knowledge, this makes sense to us at least. Hopefully we can hear back from the maintainer at some point if they agree this is an acceptable approach.

erichte-ibm avatar Jun 03 '22 19:06 erichte-ibm

I would like to have a rust desk build for ppc64le; it would be appreciated if someone reviewed this pull request if there is nothing wrong with it.

bkeys avatar Aug 13 '22 15:08 bkeys

I'm thankful for this work, it's enabled me to listen to Spotify with Psst on my Talos II. So far, so good! I've tested it with some other projects as well.

rjzak avatar Sep 19 '22 21:09 rjzak

Confirmed that this project builds with hid-io/hid-io-core@5394d79da0b9fbc4a56bc104ca468e992be1241e using cargo patch stanza pointed to IBM/ring@3f8f04a1057e286f3b8c5b9e39c9f795b15743cc (IBM/[email protected]), but required reverting f06811a150fcded1555911678bbca6dcb5440cda due to API usage in rcgen 0.9. Additionally, PR branch proposed builds with multiple other rust packages depending on ring using latest ring HEAD. Thank you to @erichte-ibm and the rest of the maintainers of the ppc64le ring port for unblocking so many packages with rcgen, webpki, and rustls dependencies!

jaesharp avatar Oct 13 '22 05:10 jaesharp

Note for other users requiring ppc64le support: IBM/ring also implements a patchset for ring-0.16.20 ( https://github.com/IBM/ring/tree/ppc-0.16.20 ) and generally tracks main on ( https://github.com/IBM/ring/tree/main-ppc ) but the branch with which this PR is concerned ( https://github.com/IBM/ring/tree/initial-ppc64le-support-pr ) is kept fully up to date and thus also tracks IBM/ring@main-ppc .

jaesharp avatar Oct 31 '22 02:10 jaesharp

Wondering if the core team could help review this PR. It would mean a world for the POWER community

runlevel5 avatar Nov 10 '22 03:11 runlevel5

Apparently @briansmith replies to Hacker News comments about ring within minutes, but waits multiple years before acknowledging pull requests here on github.

Perhaps Hacker News would be a better place to get his attention?

Seriously, I've run out of ideas here; that's all I can come up with.

ghost avatar Nov 11 '22 20:11 ghost

First, I think that with the changes to base.h, PR #1558 will get PowerPC working.

In order to have PowerPC-specific code paths (optimizations), at a minimum we need to add PoewrPC to the coverage jobs in GitHub Actions to ensure that all the new code paths are actually being tested. This probably requires modifying the Rust toolchain to add support for the profiler builtins to the Linux PowerPC targets, like https://github.com/rust-lang/rust/pull/104304 does for s390x. Also, we should get some assurance that somebody (Google?) is regularly fuzzing the PowerPC code. We also need to verify that this PowerPC code matches exactly what's in the latest BoringSSL.

We also should have some performance justification; i.e. we should have the ability to benchmark the before/after. I can help somebody with this. In this case, we should add benchmarks for bn_mul_mont.

Given all that, I think we should take the baby step of an initial PR that just changes base.h so that things will work after #1558 is merged.

briansmith avatar Nov 12 '22 00:11 briansmith

In order to have PowerPC-specific code paths (optimizations)

We also should have some performance justification

The justification is side channel leakage, not performance.

Cryptographic primitives are written in assembler to prevent a compiler from optimizing your constant-time primitives into faster variable-time instruction sequences.

ghost avatar Nov 12 '22 08:11 ghost

The justification is side channel leakage, not performance.

We need to find solutions to these challenges that don't rely on PerlAsm. I do think there is a role for assembly language to play, and I'm experimenting with using the Rust compiler's inline assembly mechanism. I highly recommend people do the work to get POWER/PowerPC inline assembly working and in Stable Rust so we can use it, as PerlAsm isn't a viable long-term solution for any target.

briansmith avatar Nov 14 '22 19:11 briansmith

I highly recommend people do the work to get POWER/PowerPC inline assembly working and in Stable Rust

What's required for this to happen?

rjzak avatar Nov 14 '22 19:11 rjzak

What's required for this to happen?

I don't know. Maybe it is already done? It would be great if somebody could check with the rust-lang/rust project and report back. (Also for std::simd for POWER/PowerPC.)

briansmith avatar Nov 14 '22 20:11 briansmith

What's required for this to happen?

I don't know. Maybe it is already done? It would be great if somebody could check with the rust-lang/rust project and report back. (Also for std::simd for POWER/PowerPC.)

std::simd works fine on PowerPC iirc, no special rustc support is required because it just generates generic llvm ir, no arch-specific ir.

programmerjake avatar Nov 14 '22 20:11 programmerjake

Maybe it is already done?

Just tested, inline assembly appears to work with a trivial test in nightly with #![feature(asm_experimental_arch)], and there is a tracking issue here for the experimental archs .

erichte-ibm avatar Nov 14 '22 20:11 erichte-ibm

Maybe it is already done?

Just tested, inline assembly appears to work with a trivial test in nightly with #![feature(asm_experimental_arch)], and there is a tracking issue here for the experimental archs .

last I checked PowerPC inline assembly mostly works but is missing support for some clobbers and has no vmx/vsx support. it may also have the wrong set of default clobbers.

programmerjake avatar Nov 14 '22 20:11 programmerjake

We also need to verify that this PowerPC code matches exactly what's in the latest BoringSSL.

Currently, this PR pulled code from OpenSSL, as BoringSSL does not appear to have support for POWER. I'm double checking now, but I think it was only the bn_mul_mont that was actually needed to support the build, as there wasn't a fallback before to use.

Given all that, I think we should take the baby step of an initial PR that just changes base.h so that things will work after https://github.com/briansmith/ring/pull/1558 is merged.

Agreed, and since it appears to have just been merged, I will work on rebasing onto current upstream and see if we can drop the perlasm.

In order to have PowerPC-specific code paths (optimizations), at a minimum we need to add PoewrPC to the coverage jobs in GitHub Actions to ensure that all the new code paths are actually being tested.

Is this needed for the initial baseline support, or specifically for any future optimizations?

erichte-ibm avatar Nov 14 '22 20:11 erichte-ibm

Currently, this PR pulled code from OpenSSL, as BoringSSL does not appear to have support for POWER.

Last I checked, BoringSSL definitely supported ppc64le, although not ppc64 or ppc.

pkubaj avatar Nov 14 '22 23:11 pkubaj

Currently, this PR pulled code from OpenSSL, as BoringSSL does not appear to have support for POWER.

Last I checked, BoringSSL definitely supported ppc64le, although not ppc64 or ppc.

Ah apologies, got ahead of myself there with the wording. It builds on POWER, but it appears to only include optimized assembly for x86 and ARM. Doing a quick grep, BoringSSL does has POWER assembly for AES/GHASH, but not for ECC or chacha/poly1305. For previous versions of this PR, those were sourced from OpenSSL.

erichte-ibm avatar Nov 14 '22 23:11 erichte-ibm

@erichte-ibm what's the progress of this PR please? AFAIK there is also another branch https://github.com/IBM/ring/commits/ppc-0.16.20 downstream.

runlevel5 avatar Sep 16 '23 02:09 runlevel5

To be honest, I don't know. Last I checked, ppc64le support still relies on some perlasm to work in ring, albeit less than the original pull request. If that is a blocker for initial support, then we can't make much progress here. There have been some changes though since the last time I really looked at this PR though, so when I get the chance in the next few days, I'll dive into refreshing this PR branch.

The other note was CI -- I've coincidentally be doing work on testing ppc64le code in Actions via docker/qemu, so if running tests / getting coverage is an issue as well, I can also look into adapting that here as well.

As for the IBM/ring ppc-0.16.20 branch, that exists for other projects to patch support so that they can compile on ppc64le. Right now, it's actually a cobbled mess of newer commits that causes some build issues. I will be pushing a revised version of the branch soon that is based directly on 0.16.20 that should fix it.

erichte-ibm avatar Sep 18 '23 14:09 erichte-ibm