shctx.c could use less assembler
It seems that shctx.c uses several inline assembler sections to get atomic operations. GCC has a relatively rich set of built-ins for these; possibly not everything the file needs, but at least the cmpxchg() function could probably be replaced by __sync_val_compare_and_swap().
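As a rough illustration of that suggestion, here is a minimal sketch of a compare-and-swap wrapper built on the GCC built-in. The name cmpxchg_builtin and its signature are assumptions made for this example; the actual cmpxchg() in shctx.c may take different types.

```c
/* Hypothetical wrapper, not the real shctx.c function.
 * Atomically performs: if (*ptr == old) *ptr = new_val;
 * and returns the value *ptr held before the operation. */
static inline unsigned int cmpxchg_builtin(unsigned int *ptr,
                                           unsigned int old,
                                           unsigned int new_val)
{
    /* GCC built-in; acts as a full memory barrier. */
    return __sync_val_compare_and_swap(ptr, old, new_val);
}
```

A caller can tell whether the swap happened by comparing the return value with the `old` argument it passed in.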
The __sync_val_compare_and_swap GCC built-in first appeared in GCC 4.1. My embedded product, like a lot of embedded solutions, uses a GCC 3 based toolchain.
If the assembler code is a problem, using pthread_mutex is more portable.
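For comparison, a pthread_mutex based lock could look roughly like the sketch below. It assumes the lock lives in memory shared between processes, hence the PTHREAD_PROCESS_SHARED attribute; the struct and function names are hypothetical and not taken from shctx.c.

```c
#include <pthread.h>

/* Hypothetical lock object, meant to be placed in a shared memory segment. */
struct shared_lock {
    pthread_mutex_t mutex;
};

/* Initialise the mutex so that several processes mapping the same
 * memory can use it. Returns 0 on success, an error number otherwise. */
static int shared_lock_init(struct shared_lock *l)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);

    if (rc != 0)
        return rc;
    rc = pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    if (rc == 0)
        rc = pthread_mutex_init(&l->mutex, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}

/* Thin wrappers; both return 0 on success. */
static int shared_lock_acquire(struct shared_lock *l)
{
    return pthread_mutex_lock(&l->mutex);
}

static int shared_lock_release(struct shared_lock *l)
{
    return pthread_mutex_unlock(&l->mutex);
}
```

The trade-off is the one discussed in this thread: this needs no assembler and no GCC 4.1 built-ins, but every critical section pays for a mutex lock and unlock.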
#109 uses the __sync atomic built-ins as a fallback when the current architecture does not support the x86 asm (the built-in atomic operations are less optimized).
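The kind of compile-time selection described there might be sketched as follows. This is only an illustration under assumptions, not the code from #109: the function name cas_uint and the exact preprocessor test are invented for the example.

```c
/* Hypothetical compare-and-swap: x86 inline asm where available,
 * GCC __sync built-in on every other architecture. */
#if defined(__i386__) || defined(__x86_64__)
static inline unsigned int cas_uint(unsigned int *ptr,
                                    unsigned int old,
                                    unsigned int new_val)
{
    unsigned int prev;

    /* cmpxchgl compares EAX (old) with *ptr; on a match it stores
     * new_val into *ptr, otherwise it loads *ptr into EAX. */
    __asm__ __volatile__("lock; cmpxchgl %2, %1"
                         : "=a"(prev), "+m"(*ptr)
                         : "r"(new_val), "0"(old)
                         : "memory", "cc");
    return prev;
}
#else
static inline unsigned int cas_uint(unsigned int *ptr,
                                    unsigned int old,
                                    unsigned int new_val)
{
    /* Fallback path: requires GCC 4.1 or later. */
    return __sync_val_compare_and_swap(ptr, old, new_val);
}
#endif
```

Both branches return the previous value of `*ptr`, so callers do not need to know which implementation was compiled in.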