
OpenCL support for eCryptfs and other formats with a similar inner loop

Open • kholia opened this issue 12 years ago • 3 comments

The algorithm is 65536 iterations of SHA-512.

@ukasz, you recently worked with SHA-512. Are you interested in this one too?

kholia, Jul 28 '13 18:07

Sure, that's an easy one. But I don't think we need CUDA support for this.

ukasz, Jul 28 '13 22:07

Sounds good. Hoping it lands in bleeding-jumbo soon.

kholia, Jul 29 '13 06:07

The main loop for ecryptfs is exactly the same as for several other formats, so perhaps we could share OpenCL code (a shared kernel, or just a shared function used by several kernels) and even a shared FPGA design/bitstream. Right now we have this loop in OpenCL only for bitcoin-opencl, not for the other formats that use the exact same loop.

Exact same loop in:

bitcoin_fmt_plug.c:		SIMDSHA512body(key_iv, key_iv, &rounds, SSEi_HALF_IN|SSEi_LOOP);
blackberry_ES10_fmt_plug.c:		SIMDSHA512body(keys, keys64, &rounds, SSEi_HALF_IN|SSEi_LOOP);
ecryptfs_fmt_plug.c:		SIMDSHA512body(keys, keys64, &rounds, SSEi_HALF_IN|SSEi_LOOP);
pkcs12_plug.c:		SIMDSHA512body(sse_buf, (uint64_t*)sse_buf, &rounds, SSEi_HALF_IN|SSEi_LOOP);
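To make "the exact same loop" concrete: as I read the `SSEi_HALF_IN|SSEi_LOOP` flags, each iteration hashes the previous 64-byte SHA-512 digest (half of a 128-byte block) and feeds the output back in, `rounds` times. Below is a plain-C sketch of what one shared helper could compute for all four formats. The helper name `sha512_loop64` is hypothetical, not an existing John the Ripper function, and the format-specific initial digest is assumed to be computed beforehand.

```c
#include <stdint.h>
#include <string.h>

/* SHA-512 round constants (FIPS 180-4, section 4.2.3). */
static const uint64_t K[80] = {
    0x428a2f98d728ae22ULL, 0x7137449123ef65cdULL, 0xb5c0fbcfec4d3b2fULL, 0xe9b5dba58189dbbcULL,
    0x3956c25bf348b538ULL, 0x59f111f1b605d019ULL, 0x923f82a4af194f9bULL, 0xab1c5ed5da6d8118ULL,
    0xd807aa98a3030242ULL, 0x12835b0145706fbeULL, 0x243185be4ee4b28cULL, 0x550c7dc3d5ffb4e2ULL,
    0x72be5d74f27b896fULL, 0x80deb1fe3b1696b1ULL, 0x9bdc06a725c71235ULL, 0xc19bf174cf692694ULL,
    0xe49b69c19ef14ad2ULL, 0xefbe4786384f25e3ULL, 0x0fc19dc68b8cd5b5ULL, 0x240ca1cc77ac9c65ULL,
    0x2de92c6f592b0275ULL, 0x4a7484aa6ea6e483ULL, 0x5cb0a9dcbd41fbd4ULL, 0x76f988da831153b5ULL,
    0x983e5152ee66dfabULL, 0xa831c66d2db43210ULL, 0xb00327c898fb213fULL, 0xbf597fc7beef0ee4ULL,
    0xc6e00bf33da88fc2ULL, 0xd5a79147930aa725ULL, 0x06ca6351e003826fULL, 0x142929670a0e6e70ULL,
    0x27b70a8546d22ffcULL, 0x2e1b21385c26c926ULL, 0x4d2c6dfc5ac42aedULL, 0x53380d139d95b3dfULL,
    0x650a73548baf63deULL, 0x766a0abb3c77b2a8ULL, 0x81c2c92e47edaee6ULL, 0x92722c851482353bULL,
    0xa2bfe8a14cf10364ULL, 0xa81a664bbc423001ULL, 0xc24b8b70d0f89791ULL, 0xc76c51a30654be30ULL,
    0xd192e819d6ef5218ULL, 0xd69906245565a910ULL, 0xf40e35855771202aULL, 0x106aa07032bbd1b8ULL,
    0x19a4c116b8d2d0c8ULL, 0x1e376c085141ab53ULL, 0x2748774cdf8eeb99ULL, 0x34b0bcb5e19b48a8ULL,
    0x391c0cb3c5c95a63ULL, 0x4ed8aa4ae3418acbULL, 0x5b9cca4f7763e373ULL, 0x682e6ff3d6b2b8a3ULL,
    0x748f82ee5defb2fcULL, 0x78a5636f43172f60ULL, 0x84c87814a1f0ab72ULL, 0x8cc702081a6439ecULL,
    0x90befffa23631e28ULL, 0xa4506cebde82bde9ULL, 0xbef9a3f7b2c67915ULL, 0xc67178f2e372532bULL,
    0xca273eceea26619cULL, 0xd186b8c721c0c207ULL, 0xeada7dd6cde0eb1eULL, 0xf57d4f7fee6ed178ULL,
    0x06f067aa72176fbaULL, 0x0a637dc5a2c898a6ULL, 0x113f9804bef90daeULL, 0x1b710b35131c471bULL,
    0x28db77f523047d84ULL, 0x32caab7b40c72493ULL, 0x3c9ebe0a15c9bebcULL, 0x431d67c49c100d4cULL,
    0x4cc5d4becb3e42b6ULL, 0x597f299cfc657e2aULL, 0x5fcb6fab3ad6faecULL, 0x6c44198c4a475817ULL
};

/* SHA-512 initial hash value (FIPS 180-4, section 5.3.5). */
static const uint64_t IV[8] = {
    0x6a09e667f3bcc908ULL, 0xbb67ae8584caa73bULL, 0x3c6ef372fe94f82bULL, 0xa54ff53a5f1d36f1ULL,
    0x510e527fade682d1ULL, 0x9b05688c2b3e6c1fULL, 0x1f83d9abfb41bd6bULL, 0x5be0cd19137e2179ULL
};

static uint64_t rotr(uint64_t x, unsigned n) { return (x >> n) | (x << (64 - n)); }

/* One SHA-512 compression: fold a 128-byte block (16 big-endian words) into st. */
static void sha512_compress(uint64_t st[8], const uint64_t blk[16])
{
    uint64_t w[80];
    uint64_t a = st[0], b = st[1], c = st[2], d = st[3];
    uint64_t e = st[4], f = st[5], g = st[6], h = st[7];

    for (int i = 0; i < 16; i++)
        w[i] = blk[i];
    for (int i = 16; i < 80; i++) {   /* message schedule */
        uint64_t s0 = rotr(w[i - 15], 1) ^ rotr(w[i - 15], 8) ^ (w[i - 15] >> 7);
        uint64_t s1 = rotr(w[i - 2], 19) ^ rotr(w[i - 2], 61) ^ (w[i - 2] >> 6);
        w[i] = w[i - 16] + s0 + w[i - 7] + s1;
    }
    for (int i = 0; i < 80; i++) {    /* 80 rounds */
        uint64_t t1 = h + (rotr(e, 14) ^ rotr(e, 18) ^ rotr(e, 41))
                        + ((e & f) ^ (~e & g)) + K[i] + w[i];
        uint64_t t2 = (rotr(a, 28) ^ rotr(a, 34) ^ rotr(a, 39))
                        + ((a & b) ^ (a & c) ^ (b & c));
        h = g; g = f; f = e; e = d + t1;
        d = c; c = b; b = a; a = t1 + t2;
    }
    st[0] += a; st[1] += b; st[2] += c; st[3] += d;
    st[4] += e; st[5] += f; st[6] += g; st[7] += h;
}

/* Hypothetical shared helper: replace the 64-byte digest in dgst with
   SHA-512(dgst), `rounds` times. The digest stays as eight 64-bit words,
   so no byte swapping is needed between iterations. */
void sha512_loop64(uint64_t dgst[8], uint32_t rounds)
{
    uint64_t blk[16];

    /* The second half of the block (padding + length) never changes. */
    blk[8] = 0x8000000000000000ULL;          /* 0x80 byte right after the message */
    memset(&blk[9], 0, 6 * sizeof blk[0]);   /* zero fill, incl. high length word */
    blk[15] = 512;                           /* message length: 64 bytes = 512 bits */

    for (uint32_t r = 0; r < rounds; r++) {
        memcpy(blk, dgst, 64);               /* first half: previous digest */
        memcpy(dgst, IV, 64);
        sha512_compress(dgst, blk);
    }
}
```

Because the input is always exactly 64 bytes, each iteration is a single compression with a constant second half, which is what makes one helper (or one kernel, or one FPGA program) reusable across these formats; only the initial digest and `rounds` would differ per format.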

Also similar in:

armory_fmt_plug.c:		SIMDSHA512body(x, lut[1][0].u64, lut[n][0].u64, SSEi_HALF_IN|SSEi_LOOP|SSEi_FLAT_OUT);
drupal7_fmt_plug.c:		SIMDSHA512body(keys, keys64, &Lcount, SSEi_MIXED_IN|SSEi_LOOP|SSEi_OUTPUT_AS_INP_FMT);

Drupal7's loop is very similar and could share code too. We don't have it in OpenCL, but we do have it on FPGA (ZTEX). A slight revision of the existing FPGA design (perhaps just a third program for its soft CPUs) should make the design usable for the first four formats listed above.
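One way to see how close Drupal7's loop is: Drupal 7's password hashing computes `hash = sha512(hash . password)` on each iteration, so the message is 64 + password-length bytes, versus exactly 64 bytes in the four formats above. Assuming standard SHA-512 padding (FIPS 180-4: one 0x80 byte, zero fill, and a 16-byte length field), the sketch below counts compressions per iteration. The function name is illustrative only.

```c
#include <stddef.h>

/* Number of 128-byte SHA-512 blocks (compressions) for a `len`-byte message:
   the message, one 0x80 byte, zero fill, and a 16-byte big-endian length. */
size_t sha512_blocks(size_t len)
{
    return (len + 1 + 16 + 127) / 128;
}
```

With this, `sha512_blocks(64)` is 1 for the shared loop, and `sha512_blocks(64 + pwlen)` stays at 1 for Drupal7 as long as the password is at most 47 bytes; from 48 bytes on, each iteration needs a second compression, which a shared design would have to allow for.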

Armory's loop differs in two ways: it needs to save the output of every iteration, and it is only part of the total processing, while the other major part is inefficient on GPU. Still, SHA-512 may be slow enough to make a split GPU+CPU design worthwhile, in which the CPU part for the current batch of candidates overlaps with the GPU part (and transfer to host) for the next batch. This could perhaps double or triple the speed.
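A rough timing model of the overlap idea, as a sketch with hypothetical per-batch stage times (not measurements): run back to back, each batch costs GPU time plus CPU time; overlapped as a two-stage pipeline, only the first batch pays both, and every later batch costs just the slower stage.

```c
/* Hypothetical per-batch stage times, in arbitrary units. */
typedef struct {
    double gpu;   /* SHA-512 loop on GPU plus transfer to host */
    double cpu;   /* remaining processing on CPU */
} stage_times;

/* Back to back: each batch waits for its GPU stage, then its CPU stage. */
double serial_time(stage_times t, unsigned batches)
{
    return batches * (t.gpu + t.cpu);
}

/* Two-stage pipeline: CPU work on batch i overlaps GPU work on batch i+1,
   so after the first batch, each batch costs max(gpu, cpu). */
double pipelined_time(stage_times t, unsigned batches)
{
    if (batches == 0)
        return 0.0;
    double slower = t.gpu > t.cpu ? t.gpu : t.cpu;
    return t.gpu + t.cpu + (batches - 1) * slower;
}
```

In this model, overlap by itself at most halves the per-batch time relative to running the two stages back to back (the best case being equal stage times); any further gain would have to come from the GPU running the SHA-512 part faster than the CPU does.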

solardiz, May 25 '24 14:05