libxcrypt icon indicating copy to clipboard operation
libxcrypt copied to clipboard

Issues from high-level review of yescrypt core code

Open zackw opened this issue 5 years ago • 1 comments

I looked over the yescrypt core code, mostly for integration problems with the libxcrypt environment. I didn’t do a thorough review; this code is already mature, and I’m not qualified to review cryptographic primitives anyway. I have some immediate concerns regarding the code in yescrypt-platform.c, but nothing else urgent. There are also a number of long-term changes that we might want to make, mostly in the direction of reducing code duplication between this hashing method and others.

I would like to ask @solardiz a number of questions in his role as upstream yescrypt maintainer; all direct questions below are aimed at him. I’m filing this as a bug on libxcrypt so I don’t lose it, but I’m happy to take the conversation to some upstream forum if you tell me where to go.

General commentary

I see _OPENMP ifdefs in yescrypt-opt.c. Does that actually work to the point where it would be worth the trouble for us to add a --with-openmp switch to the configure script and wiring it all up? Does this provide a useful performance gain for authentication uses, or is it just good for cracking?

I appreciate the detailed documentation comments on all the API functions in yescrypt.h. It would be nice to have complete documentation of all the flag bits as well (it says “refer to the description of yescrypt_kdf,” but that only covers YESCRYPT_WORM vs YESCRYPT_RW).

Down the road, I think it would make sense for libxcrypt to expose more of the tunability that is available via the yescrypt_* functions, by adding a new version of the crypt_gensalt interface that can take more parameters. It might even be desirable to bring back ROMs and other persistent state beyond what struct crypt_data can handle. I haven’t thought much at all about what that would entail API-wise, but because of that, I am fine with keeping around code that is currently unreachable.

Immediate concerns

These all relate to yescrypt-platform.c.

Nothing is arranging for HAVE_POSIX_MEMALIGN to be defined or not defined as appropriate. This is fixable on our end, since it’s the standard macro that autoconf would define if asked to check for posix_memalign, but I would like to ask where it came from in the first place—I don’t see any machinery in Openwall’s yescrypt repo to set it.

sys/mman.h is not guaranteed to exist just because __unix__ is defined. Normally I wouldn’t worry about that, but this library may be wanted in embedded no-MMU contexts. I would like to replace that with another autoconf test (using AC_CHECK_HEADERS to define HAVE_SYS_MMAN_H; AC_FUNC_MMAP does both more and less than we would actually need, having been written for the way GNU grep wants to use mmap), but how should we go about that such that it is not a problem to merge back?

I think it would be a good idea to restructure the logic in alloc_region so that it attempts to use posix_memalign and malloc if mmap fails. Would that be a change you would accept?

It’s weird that this file is included into yescrypt-opt.c. I see how it is desirable for init_region to be inline, but what would you think of creating a yescrypt-internal.h that defined that function and declared the other two (with yescrypt_ name prefixes) and then compiling this file separately?

It’s possible to determine the supported huge page size(s) at runtime by parsing /proc/meminfo and/or /sys/kernel/mm/hugepages. Down the road you may want to consider that instead of a pile of CPU-specific ifdefs.

Longer-term refactoring

The yescrypt code includes implementations of base64 encoding and decoding, fixed-endian-to-CPU conversions, SHA-2-256, Salsa20, HMAC, and PBKDF. With the exception of Salsa20, all of these have at least one other implementation in libxcrypt. I would like to move in the direction of giving each of these their own canonical implementation. This will both reduce code size and facilitate performance tuning—for instance, I can see people wanting ARM-specific optimizations down the road.

I have no problem with the canonical implementation being the one from yescrypt, but we would need them to be in their own files and have public headers. If we took the lead on factoring out these duplicated implementations, would you be open to the possibility of merging them upstream?

The most complicated thing to deduplicate will actually be the base64 coding, because bcrypt uses a slightly different alphabet from all the other hashes. And if we ever add Argon2, that has yet another one, because they decided to match MIME. The most bikesheddy thing to deduplicate will, alas, be the endian-to-CPU conversions. yescrypt is currently using names that collide with BSD <sys/endian.h> and working around that with macros, and I think that’s an argument for switching away from those names.

zackw avatar Aug 30 '18 14:08 zackw