Update: PBKDF2 work factors increased
Previously
- https://github.com/OWASP/CheatSheetSeries/pull/1055
- https://github.com/OWASP/CheatSheetSeries/issues/1043
What is missing or needs to be updated?
The current PBKDF2 recommendation is based on information from early 2023, and as such it might no longer reflect the current risks. Since PBKDF2 is still part of the NIST standard, it is important to keep these recommendations updated.
NIST is working on a revised standard though, so this problem will not last forever.
How should this be resolved?
Unsure. Should we wait for actual tests with the NVIDIA RTX 50 series before we update these numbers? Can we make a rough estimate?
cc @Sc00bz
TL;DR PBKDF2 might increase by the smallest amount. bcrypt, Argon2, and scrypt will remain unchanged. But we should wait for benchmarks. Also there could be architecture changes that increase or decrease cracking speed.
The RTX 4090 was weird because it had a better cost/performance than all other GPUs (for computationally hard password hashing algorithms), but it was huge, power hungry, and cost 60% more. So I called it "1.5 GPUs". With the 50 series we return to normality: the RTX 5080 will be a "GPU", which should be a little faster (~2.3%) than "RTX 4090 as 1.5 GPUs". Also, the RTX 5080 will be the GPU for computationally hard, cache hard, and memory hard password hashing algorithms, which simplifies things. Well, unless AMD's new GPUs are better.
bcrypt might change but that depends on details for the 50 series compute capability "10.1". In this chart the "Maximum amount of shared memory per thread block", if it's again 96-99 KiB then no changes. If it's 163-227 KiB (like older data center cards) then bcrypt will likely be bumped from cost 9 to 10... Just remembered it's already cost 10 (it was a compromise).
I wonder if ~2.3% will change any... Oh maybe, but not by much. I round up to the next two-significant-digit number of iterations needed to get 10 kH/s.
Current (RTX 4090 as 1.5 GPUs):
- PBKDF2-HMAC-SHA1: 1,300,000 iterations ("1,274,960")
- PBKDF2-HMAC-SHA256: 600,000 iterations ("591,047")
- PBKDF2-HMAC-SHA512: 210,000 iterations ("208,060")
According to RTX 4090 benchmark and adjusted to RTX 5080 specs:
- PBKDF2-HMAC-SHA1: 1,400,000 iterations ("1,304,842")
- PBKDF2-HMAC-SHA256: 610,000 iterations ("604,899")
- PBKDF2-HMAC-SHA512: 220,000 iterations ("212,936")
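For reference, the rounding step described above can be sketched in a few lines of Python. This is only an illustration: the helper name `round_up_sig` is made up here, and it assumes a positive integer iteration count that is rounded up to two significant digits.

```python
def round_up_sig(n: int, sig: int = 2) -> int:
    """Round a positive integer up to `sig` significant digits."""
    if n <= 0:
        raise ValueError("n must be positive")
    digits = len(str(n))
    if digits <= sig:
        return n  # already has at most `sig` digits
    factor = 10 ** (digits - sig)
    # Ceiling division, then scale back up.
    return -(-n // factor) * factor


# Raw figures quoted in the two lists above.
raw = {
    "PBKDF2-HMAC-SHA1 (current)": 1_274_960,
    "PBKDF2-HMAC-SHA256 (current)": 591_047,
    "PBKDF2-HMAC-SHA512 (current)": 208_060,
    "PBKDF2-HMAC-SHA1 (5080 estimate)": 1_304_842,
    "PBKDF2-HMAC-SHA256 (5080 estimate)": 604_899,
    "PBKDF2-HMAC-SHA512 (5080 estimate)": 212_936,
}

for name, value in raw.items():
    print(f"{name}: {value:,} -> {round_up_sig(value):,}")
```

Running it reproduces the rounded values in both lists, which is why a ~2.3% speedup moves SHA1 and SHA512 up by one step of the second digit.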
Note these are best guesses without benchmarks. These should not be updated until confirmed. There could be architecture changes that increase or decrease cracking speed. We won't know until probably a few weeks or a month after release.
Sounds good. We should approach this problem with empirical evidence. It's good to have this discussion on the agenda, and perhaps we're all lucky and the impact of the 50 series is limited.
Metrics!
We have benchmarks for SHA-512 hashing with Hashcat! https://www.phoronix.com/review/nvidia-geforce-rtx5080-linux/3
- 5090: 8878450000 h/s
- 4090: 6266200000 h/s
- 4080: 3940250000 h/s
- 5080: 3822050000 h/s
This means that the 5090 is roughly 40% faster than the 4090 in pure hashing (8878450000 / 6266200000 ≈ 1.42).
Judgement
Not sure how we should approach this. Increasing the iteration count by 40% sounds rather harsh, and it's also of diminishing use.
A 4090 was 1.5 GPUs because of price, size, and power. A 5090 would be 2 GPUs. We're back to normal with the 5080 being 1 GPU.
They are using an old version of Hashcat. Also SHA-512 isn't always a good benchmark for PBKDF2 with HMAC SHA-512. There's early exit and more passwords generated when doing SHA-512.
Oh this 4090 got 7483.4 MH/s vs 6266.2 MH/s for SHA-512. So those benchmarks might not be tuned correctly.
Just my $.02. Originally, I was going to suggest that maybe it's prudent to wait on NIST if they have a revision of SP 800-132 in the works, but then I read through https://csrc.nist.gov/news/2023/proposal-to-revise-nist-sp-800-132-pbkdf and saw that the comment period ended on May 1, 2023 and almost 2 years later, still no revision published that I can find. And if one believes the comments collected on this at https://csrc.nist.gov/csrc/media/Projects/crypto-publication-review-project/documents/initial-comments/sp800-132-initial-public-comments-2023.pdf, it's probably best that someone try to gather some empirical evidence at least until NIST gets a revision of SP 800-132 out the door. I mean the original version published in 2010 is way out of date, and the way that federal jobs are being cut recently who knows how much longer NIST will even be around to do this. So let's not let the perfect become the enemy of the code. Either gather some empirical data or generate some. (I'm not volunteering. I don't have the hardware to do an adequate test.) Then if NIST ever does publish a revision of SP 800-132, we can come back and revise those figures.
We're stuck in between government compliance and common sense. It would be much better to abandon PBKDF2 completely, but we can't.
Just wondering, how does RHEL 10 use PBKDF2 and on what numbers do they base their implementation? The de-facto Linux standard would have the same internal discussion.
Hi! I'd like to work on this issue. Is it still open for contribution?
Hi! I'd like to work on this issue. Based on the available benchmarks for RTX 5080, I plan to update the PBKDF2 iteration values accordingly. Before proceeding, do we have any new empirical data, or should I use the estimates discussed above? Please let me know how I can contribute effectively!
Hi! While waiting for a response, I analysed PBKDF2 iteration adjustments based on the latest RTX 5080 & RTX 5090 benchmarks.
Benchmark Reference: https://www.phoronix.com/review/nvidia-geforce-rtx5080-linux/3
Updated PBKDF2 Iterations (Based on GPU Performance Scaling):
- PBKDF2-HMAC-SHA1: ~792K (RTX 5080), ~1.83M (RTX 5090)
- PBKDF2-HMAC-SHA256: ~366K (RTX 5080), ~849K (RTX 5090)
- PBKDF2-HMAC-SHA512: ~128K (RTX 5080), ~297K (RTX 5090)
These values were calculated using the formula:
New Iterations = Old Iterations × (New GPU Speed / Old GPU Speed)
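A quick Python sketch of that formula, using the SHA-512 hash rates quoted earlier in this thread from the Phoronix review and the current cheat sheet values as the "old" iterations. The function name `scale_iterations` is hypothetical, and this is plain proportional scaling with the RTX 4090 as the baseline; it makes no adjustment for GPU price, size, or power.

```python
def scale_iterations(old_iterations: int, new_speed: float, old_speed: float) -> float:
    """New Iterations = Old Iterations * (New GPU Speed / Old GPU Speed)."""
    if old_speed <= 0:
        raise ValueError("old_speed must be positive")
    return old_iterations * (new_speed / old_speed)


# SHA-512 hash rates (h/s) as quoted from the Phoronix review.
RTX_4090 = 6_266_200_000
RTX_5080 = 3_822_050_000

# Current cheat sheet values used as the starting point.
current = {
    "PBKDF2-HMAC-SHA1": 1_300_000,
    "PBKDF2-HMAC-SHA256": 600_000,
    "PBKDF2-HMAC-SHA512": 210_000,
}

for name, iterations in current.items():
    scaled = scale_iterations(iterations, RTX_5080, RTX_4090)
    print(f"{name}: ~{scaled / 1000:.0f}K (RTX 5080)")
```

Because the 5080 is slower than the 4090 in that benchmark, the ratio is below 1 and every scaled value comes out lower than the current setting.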
Supporting References:
- NIST PBKDF2 Standard (SP 800-132): https://csrc.nist.gov/publications/detail/sp/800-132/final
- OWASP Password Storage Cheat Sheet: https://cheatsheetseries.owasp.org/cheatsheets/Password_Storage_Cheat_Sheet.html
Let me know if this approach looks good or if there are any other considerations we should factor in!
I just ask that @Sc00bz approve of any changes before we commit a PR. @Sc00bz has been driving these metrics and I would like his approval before any changes are made.
Thank you @aakarshgopishetty :)
Thanks for the response! I’ll wait for @Sc00bz’s approval before proceeding with any changes. Let me know if I need to provide additional benchmarks or clarifications.
I already looked at and replied why https://www.phoronix.com/review/nvidia-geforce-rtx5080-linux/3 is not a valid benchmark.
They are using an old version of Hashcat. Also SHA-512 isn't always a good benchmark for PBKDF2 with HMAC SHA-512. There's early exit and more passwords generated when doing SHA-512.
Oh this 4090 got 7483.4 MH/s vs 6266.2 MH/s for SHA-512. So those benchmarks might not be tuned correctly.
Also, the RTX 5090 should be considered as 2 GPUs and the RTX 5080 as 1 GPU. Those estimates decrease the current settings, when the RTX 5080 should increase them by just a little according to specs.
If we were to go with anything right now, it should be this, which is based on specs and this 4090 benchmark:
- PBKDF2-HMAC-SHA1: 1,400,000 iterations
- PBKDF2-HMAC-SHA256: 610,000 iterations
- PBKDF2-HMAC-SHA512: 220,000 iterations
But since this is a minor update, there is no rush. So if we can get good benchmarks (i.e. PBKDF2-[hash] instead of [hash]) with proper tuning, that would be preferred.
I was just informed that this exists:
PBKDF2-HMAC-SHA1: 26175.4 kH/s
PBKDF2-HMAC-SHA256: 11157.2 kH/s
PBKDF2-HMAC-SHA512: 4245.4 kH/s
bcrypt: 304.8 kH/s
Thus:
PBKDF2-HMAC-SHA1: 1,400,000 iterations ("1,308,770")
PBKDF2-HMAC-SHA256: 600,000 iterations ("557,860" from the RTX 5090, but using the RTX 4090 figure instead, since "RTX 5090 / 2" is slower than "RTX 4090 / 1.5")
PBKDF2-HMAC-SHA512: 220,000 iterations ("212,200")
bcrypt cost 9 ("8.97")
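The arithmetic behind the quoted PBKDF2 figures appears to be the following. This is a sketch inferred from the numbers, not a confirmed method: it assumes the benchmark rates were measured at 1,000 PBKDF2 iterations (an assumption that reproduces the SHA1 and SHA256 figures exactly), counts the RTX 5090 as 2 GPUs, and targets 10 kH/s per GPU. The function names are hypothetical.

```python
import math


def raw_iterations(bench_khs: float, gpu_equivalents: float,
                   bench_iterations: int = 1000, target_khs: float = 10.0) -> float:
    """Iterations at which one "GPU" of attacker hardware drops to target_khs.

    bench_iterations=1000 is an assumption about the benchmark settings.
    """
    per_gpu_khs = bench_khs / gpu_equivalents
    return per_gpu_khs * bench_iterations / target_khs


def round_up_sig(n: int, sig: int = 2) -> int:
    """Round a positive integer up to `sig` significant digits."""
    digits = len(str(n))
    if digits <= sig:
        return n
    factor = 10 ** (digits - sig)
    return -(-n // factor) * factor


# RTX 5090 rates from the list above, treated as 2 GPUs.
sha1_5090 = raw_iterations(26175.4, gpu_equivalents=2)
sha256_5090 = raw_iterations(11157.2, gpu_equivalents=2)

# "RTX 4090 as 1.5 GPUs" figure quoted earlier in the thread.
sha256_4090 = 591_047

# Keep whichever card is the stronger attacker per "GPU".
sha256 = max(sha256_5090, sha256_4090)

print("SHA1 minimum:  ", round_up_sig(math.ceil(sha1_5090)))
print("SHA256 minimum:", round_up_sig(math.ceil(sha256)))
```

Taking the maximum over both cards is what keeps PBKDF2-HMAC-SHA256 at 600,000 even though the RTX 5090 figure alone would round to a lower value.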
My official minimums are:
PBKDF2-HMAC-SHA1: 1,400,000 iterations
PBKDF2-HMAC-SHA256: 600,000 iterations
PBKDF2-HMAC-SHA512: 220,000 iterations
bcrypt cost 9
I'd like to investigate why PBKDF2-HMAC-SHA256 on current cards is slower. But that shouldn't impede updating this.
Oops, I pasted the wrong link. The correct link is to an RTX 5090 benchmark, https://gist.github.com/Chick3nman/09bac0775e6393468c2925c1e1363d5c, not the old RTX 4090 benchmark.