fpga-wpa-psk-bruteforcer
Performance + Scalability?
How do the Cairnsmore1 boards compare to graphics cards? I'm wondering why more people don't switch to FPGAs.
Another thing - is it possible to scale a single .cap file over multiple boards?
This is nothing more than curiosity, since I don't have a use for it at all.
Well, I wrote a white paper about this; I should post it at some point. Every instance can produce a hash every 8192 clock cycles (2^13 cycles); on devices like the Cairnsmore1, which have 4 FPGAs per board, that's a hash every 2048 cycles per board. You can rather easily synthesize the Verilog to run at 180 MHz and you might be able to push it to 200 MHz (I managed about 195 MHz a bunch of times). For reference, at 200 MHz that's 24 khash/sec per device, ~100 khash/sec per board. I just found a bug in my math :O So according to this (https://gist.github.com/epixoip/a83d38f412b4737e99bbef804a270c40) an NVIDIA GTX 1080 is capable of 300 kh/sec, which is only three times faster than one board, but at 8 times the power and maybe a similar cost? Well, if you can grab some old FPGAs here and there you should be able to get it much cheaper. The only issue is that the Verilog is quite big, so you will need a biggish FPGA to fit it.
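To make the arithmetic above easy to check, here is a small Python sketch. It only restates the figures from this post (8192 cycles per hash, 4 FPGAs per board, 300 kh/sec for the GTX 1080 from the linked gist) and assumes one hashing instance per FPGA; the function and constant names are made up for illustration.

```python
# Back-of-the-envelope throughput, using only the numbers quoted in the post.
CYCLES_PER_HASH = 8192      # 2**13 clock cycles per hash, per instance
FPGAS_PER_BOARD = 4         # Cairnsmore1
GPU_HASHES_PER_SEC = 300_000  # GTX 1080 figure from the linked gist


def hashes_per_second(clock_hz, fpgas=1):
    """Hash rate assuming one instance per FPGA (an assumption, see above)."""
    return clock_hz * fpgas / CYCLES_PER_HASH


per_device = hashes_per_second(200e6)
per_board = hashes_per_second(200e6, fpgas=FPGAS_PER_BOARD)
gpu_ratio = GPU_HASHES_PER_SEC / per_board

print(f"per device: {per_device / 1e3:.1f} khash/sec")
print(f"per board:  {per_board / 1e3:.1f} khash/sec")
print(f"GPU / board ratio: {gpu_ratio:.2f}x")
```

At the easier 180 MHz target the same function gives roughly 22 khash/sec per device, so the GPU-to-board ratio gets a bit worse than 3x.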
For scalability: the driver supports several devices and will shard data across them. Of course it assumes all FPGAs have the same processing power, which makes it more complicated to shard across different devices. The changes shouldn't be hard though. It balances job size so it never issues a super short job (short jobs would be inefficient, since it can take 1-2s to send more work to the board) but also never issues a very long one, so you can kill it and continue another time.
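The idea of sharding with bounded job sizes can be sketched roughly like this in Python. This is not the driver's actual code: `plan_jobs`, its parameters, and the 10s/60s bounds are all hypothetical, and it assumes identical devices, as described above.

```python
# Hypothetical sketch: split a candidate list into jobs for identical devices.
# Jobs are capped at max_job_s of work, and a too-short tail is folded into
# the last job instead of being sent on its own.

def plan_jobs(total, num_devices, rate_hz, min_job_s=10.0, max_job_s=60.0):
    """Return a list of (device_index, start, count) tuples covering [0, total)."""
    max_count = max(1, int(rate_hz * max_job_s))
    min_count = max(1, int(rate_hz * min_job_s))
    jobs = []
    start = 0
    while start < total:
        count = min(max_count, total - start)
        leftover = total - start - count
        # A tiny final job would waste the 1-2s transfer overhead,
        # so merge it into this one (this job may then exceed max_count).
        if 0 < leftover < min_count:
            count += leftover
        jobs.append((len(jobs) % num_devices, start, count))  # round-robin
        start += count
    return jobs


# Example: 4 FPGAs at ~24.4 khash/sec each, 10 million candidate passphrases.
for device, start, count in plan_jobs(10_000_000, 4, 24_414):
    print(f"device {device}: candidates [{start}, {start + count})")
```

Round-robin assignment only balances well because every device is assumed to run at the same rate; supporting mixed devices would mean sizing each job by that device's own rate.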
Hope it answered your questions.
Thanks! I'm both a software and hardware engineer but I've never touched an FPGA (been meaning to though).
That explains why GPU rigs are more common then. I guess the power cost isn't that bad since they aren't running 24/7, and graphics cards can be sold to a large market of gamers at the end.
Yeah, they are potentially cheaper, and faster to program and optimize. You've probably seen how tricky it is to get this working properly and efficiently. It requires a good architecture design to make sure you use as much area as possible while making sure the synthesis will be optimal and no resources are unused.
Makes sense - thanks for clearing up my confusion!
Not to necro an old thread -- but could this be flashed to a Digilent Basys 2 (Xilinx Spartan3E-100)?