xous-core
bcrypt is too slow
bcrypt with the OWASP-recommended cost=10 parameter takes 5.7 seconds to run.
10 is the current recommendation; however, we are using cost=7 to keep the delay on password entry reasonable (0.8 s).
This is a reminder to figure out a way to hardware-accelerate this function so that we can increase the strength of the password hash function.
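As a rough illustration of the trade-off, here is a minimal sketch that extrapolates the run time from a single measurement, assuming the run time doubles with each cost increment (bcrypt performs 2^cost key-expansion rounds). The function names are hypothetical and not part of xous-core; the 4..=31 range is the cost range bcrypt accepts.

```rust
/// Estimate the bcrypt run time at `cost`, extrapolated from one measurement.
/// Assumption: run time scales as 2^cost, with no fixed overhead.
fn estimated_seconds(measured_cost: u32, measured_seconds: f64, cost: u32) -> f64 {
    let delta = cost as i32 - measured_cost as i32;
    measured_seconds * 2f64.powi(delta)
}

/// Largest cost in bcrypt's valid range (4..=31) whose estimated run time
/// fits within `budget_seconds`, or `None` if even cost 4 is too slow.
fn max_cost_within(
    measured_cost: u32,
    measured_seconds: f64,
    budget_seconds: f64,
) -> Option<u32> {
    (4..=31u32)
        .rev()
        .find(|&c| estimated_seconds(measured_cost, measured_seconds, c) <= budget_seconds)
}
```

With the 5.7 s measurement at cost 10, `estimated_seconds(10, 5.7, 7)` gives about 0.71 s, and `max_cost_within(10, 5.7, 1.0)` gives `Some(7)`. The measured 0.8 s at cost 7 is a little higher than the estimate, which may indicate some fixed overhead that this simple model ignores.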
Some resources to help understand the background behind this:
https://security.stackexchange.com/questions/139721/estimate-the-time-to-crack-passwords-using-bcrypt
In particular, for a "world class" DIY system in 2015 (an RTX2080 from 2021 is about 6x the speed):
| Cost | Hashes/sec ([email protected]) | Hashes/sec (8xGTX TitanX) |
|------|-----------------|--------------|
| 5 | 384.04 | 115,642.00 |
| 6 | 192.02 | 57,821.00 |
| 7 | 96.01 | 28,910.50 |
| 8 | 48.00 | 14,455.25 |
| 9 | 24.00 | 7,227.63 |
| 10 | 12.00 | 3,613.81 |
| 11 | 6.00 | 1,806.91 |
| 12 | 3.00 | 903.45 |
| 13 | 1.50 | 451.73 |
| 14 | 0.75 | 225.86 |
| 15 | 0.38 | 112.93 |
| 16 | 0.19 | 56.47 |
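The table is consistent with the hash rate halving for every cost increment. A minimal sketch of that relationship, assuming pure 2^cost scaling from a known rate at some base cost (the function name is hypothetical):

```rust
/// Hash rate at `cost`, extrapolated from a known rate at `base_cost`.
/// Assumption: each cost increment halves the achievable hashes per second.
fn hashes_per_sec(rate_at_base: f64, base_cost: u32, cost: u32) -> f64 {
    rate_at_base * 2f64.powi(base_cost as i32 - cost as i32)
}
```

For example, `hashes_per_sec(115_642.0, 5, 7)` reproduces the 28,910.50 figure in the cost 7 row of the GPU column.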
Worked examples:
- PCI-compliant: Tr0ub4dour&3 (e.g. a search space of about 10^11 candidates)
- Good password: correct horse battery staple (e.g. a search space of about π^40.89316, aka 10^20.33003, candidates)
| Cost | Time to crack, PCI-compliant password (8-way GTX TitanX) | Time to crack, good password (8-way GTX TitanX) |
|------|------------------------|-----------------|
| 5 | 5 days | 29M years |
| 6 | 10 days | 59M years |
| 7 | 20 days | 117M years |
| 8 | 40 days | 234M years |
| 9 | 80 days | 469M years |
| 10 | 160 days | 937M years |
| 11 | 320 days | 1,875M years |
| 12 | 641 days | 3,750M years |
| 13 | 1,281 days | 7,499M years |
| 14 | 2,562 days | 14,999M years |
| 15 | 5,124 days | 29,998M years |
| 16 | 10,249 days | 59,996M years |
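The crack times in the table can be reproduced with a short calculation. This is a sketch under two assumptions: the parenthesised figures in the worked examples are search-space sizes, and the attacker finds the password after searching half the space on average. The names and the 365.25-day year are illustrative choices.

```rust
const SECONDS_PER_DAY: f64 = 86_400.0;
const DAYS_PER_YEAR: f64 = 365.25;

/// Expected time to find a password, assuming on average half of the
/// search space must be tried at the given hash rate.
fn expected_crack_seconds(search_space: f64, hashes_per_sec: f64) -> f64 {
    search_space / 2.0 / hashes_per_sec
}

/// Same estimate expressed in days.
fn expected_crack_days(search_space: f64, hashes_per_sec: f64) -> f64 {
    expected_crack_seconds(search_space, hashes_per_sec) / SECONDS_PER_DAY
}

/// Same estimate expressed in years.
fn expected_crack_years(search_space: f64, hashes_per_sec: f64) -> f64 {
    expected_crack_days(search_space, hashes_per_sec) / DAYS_PER_YEAR
}
```

With the cost 7 rate of 28,910.5 hashes/sec, `expected_crack_days(1e11, 28_910.5)` comes out at about 20 days, and `expected_crack_years(10f64.powf(20.33003), 115_642.0)` at roughly 29 million years for cost 5, matching the table.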
Significantly, going from cost 7 to cost 10 buys only a factor of 8 (2^3) in extra protection; adding an extra word to your password dwarfs that.
Mainly, the cost is about delaying the cracking of "short" passwords, and the recommendation, as usual, is to not use a single word with crazy characters, but a phrase of some sort for strong secrets.
That being said: a cost of 7 is not entirely broken, but it would be nice to be in line with modern recommendations. In any case, the front line against password guessing is a strong password.
So, I was able to find what I believe is the correct source file for fiddling with this; how would you recommend trying to set about profiling it? For all I know, this runs at the speed it does because of the way the code is called, not the actual implementation itself.
Also, is there somewhere I can go to look at how the SHA hw acceleration works? (edit, this looks promising)
that is the correct file.
I'm not sure how to profile it on RISC-V - we don't have good performance profiling tools on xous yet. But most likely, speeding things up sufficiently will require some custom FPGA hardware: either a separate accelerator or, more likely, an instruction set extension to the CPU itself to speed up a few key ops in the bcrypt key scheduler. The CPU instruction extension also has the advantage that it is process-local and does not require message passing overhead to invoke.
is bcrypt the derivation for any password stored on the device?
bcrypt is the KDF used for all the passwords on the device. This is mainly because the more "modern" alternatives are "memory hard" (to foil GPUs), and we don't have the requisite amount of RAM to do the computation ourselves. Of the more classic KDFs, bcrypt is the worst performing on GPUs by an order of magnitude or so. But of course, because it's a work function, our slower CPU puts a bound on how much work we can do in the forward derivation.
@eau-u4f with the release a few weeks ago, there now seems to be more user-friendly USB debugging support. Perhaps it is time to prepare for some very basic profiling?
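For a very basic first pass, wall-clock timing may be enough to tell whether the time goes into the bcrypt implementation itself or into the way it is called. The following is a minimal sketch, not a xous-specific tool: `mean_duration` is a hypothetical helper, `work` stands in for whatever call is being measured, and it assumes `std::time::Instant` is available with usable resolution on the target.

```rust
use std::time::{Duration, Instant};

/// Run `work` `iterations` times and return the mean wall-clock time per call.
/// `work` is a placeholder for the function under test (e.g. the hashing call).
fn mean_duration<F: FnMut()>(iterations: u32, mut work: F) -> Duration {
    assert!(iterations > 0, "need at least one iteration");
    let start = Instant::now();
    for _ in 0..iterations {
        work();
    }
    // Duration implements Div<u32>, giving the average per iteration.
    start.elapsed() / iterations
}
```

Timing the hash function directly and then timing it through its normal call path, with the same inputs, would give a rough idea of how much of the delay is call overhead rather than the computation.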