FPGA Based UNIX Crypt Hardware Password Cracker

The goal is to get a ~100 Euro unit to do 10 million key guesses per second. This device is built for the fun of building it and to see what's possible with current hardware.

UNIX Crypt requires 25 passes of a modified DES algorithm with each DES pass requiring 16 rounds to complete. This gives a total of 400 clock cycles to complete a single encryption, if each round is completed within one clock cycle. Doing this is easy since DES is very hardware friendly. A round only consists of permutations and a single lookup and xor operation.

To reduce the time needed below 400 cycles, multiple crypt cores are used on a single device. The cores are fed by a password candidate generator in a round robin fashion. While a pipeline approach might produce less overhead on huge devices, using multiple cores will scale better to the available amount of logic on smaller devices.
The current design supports up to 200 crypt cores, which is sufficient for devices currently available to end users.

Target Device

A Xilinx XC3S1000-4 is being used. It is available on a handy micro module from Trenz Electronic. The micro module is mounted on a carrier board for easy access to the FPGA's pins.

The onboard 1.2V core power supply only delivers up to 600mA, but calculations show an expected consumption of 1.6A. Thus an external power supply must be added.

Add Ons

A LM317 produces 1.24V for the core and a MAX232 provides a serial port for data exchange with a PC.

In addition to the big heatsink for the LM317 one is added to the FPGA. The FPGA will still heat up to about 60C. The LM317 reaches a temperature of 55C.

This is not surprising as most of the logic is used in crypt cores and due to the nature of cipher algorithms every bit has a 50% chance to toggle. The result is a worst case power consumption and heat production.

Performance Data

Device Comparison

Hardware MKeys/sec Crypt Cores Clock (MHz) Cost (Euro) Notes
XC3S5000-4 45.0 150 - (1)
XC3S1000-4 9.2 35 105 78
XC3S200-4 2.1 7 120 28
AMD64 3000+ 0.4 - 2000 120 (2)

(1) = estimated values as device is not easily obtainable
(2) = running 'john' password cracker with heavily optimized crypt implementation

Times for a Single Unit in Hours (XC3S1000-4)

Password Set 5 Chars 6 Chars 7 Chars 8 Chars Notes
a-z 0 0 0.2 6.3
a-z, 0-9 0 0.1 2.4 85.2
a-z, 0-9, (A-Z) 0 0.1 3.4 123
a-z, A-Z 0 0.6 31 1614.1
a-z, A-Z, 0-9 0 1.7 106.3 6592.4

Cost for Cracking Within One Hour in Euro (XC3S1000-4)

Password Set 5 Chars 6 Chars 7 Chars 8 Chars Notes
a-z 78 78 78 546
a-z, 0-9 78 78 234 5332
a-z, 0-9, (A-Z) 78 78 312 6448
a-z, A-Z 78 78 1984 67830
a-z, A-Z, 0-9 78 156 5564 276906

The prices listed are private end user prices in Germany. Companies can likely get the chips much cheaper in large quantities. Discounts used: 1-25: 78 Euro, 26-100: 62 Euro, 101-250: 52 Euro, >250: 42 Euro

License

Files found in the downloadable archives below are released under the GNU GPL.

Downloads

On September 20, 2006 a new computer crime law was proposed by the German Bundestag. It will prohibit to publish computer programs suitable to crack passwords.

As I do not have time to stay up to date on the issue, the full VHDL code will not be available for download anymore. You can still download the VHDL crypt implementation. (includes DES) It basically performs the same operation as the UN*X function, except that comparison with the target hash is integrated and not done externally.

Crypt Implementation (2006-09-09)

Comments
James (Sat, 13 Sep 2008):
Quite annoying about that idiotic german law. Found it all quite interesting anyway. Good luck.
Jack (Sat, 26 Sep 2009):
VHDL is not a \"computer\" program once implemented in hardware ;-)

Imagine your experiment, spread across many 19\" racks completely full of the lastest, biggest, fastest FPGA in 2009... Are you jealous or scared ?
achim (Mon, 18 Jan 2010):
have you heard of the COPACABANA project? http://www.copacobana.org/
Jeff (Tue, 2 Feb 2010):
How do you know when you have found the correct key?

Are you taking some known data, encrypting it, then brute forcing the encrypted data until you get the known data back?
Tony (Fri, 9 Apr 2010):
Jeff, I have the same question.
Rick (Sat, 10 Apr 2010):
It\'s not about finding the key. The hardware generates a password (think aaa, aab, aac, aad, etc.), encrypts it using the one-way hash, and then compares that to the value from the passwd file. There is no going backwards...
Jeff (Wed, 14 Apr 2010):
Yeah, I emailed Michael and he replied within a day with the same explanation given by Rick. Makes sense. I am doing a project based on DES, not UNIX Crypt, and was confused.

Right now we are able to fit 3 DES pipelines on the XC3S1000-4 running at 100MHz good for 300 million keys/sec. So for UNIX Crypt, one may want to consider a basic DES pipeline that iterates 25 times to put out a key every 25 cycles. That would work out to about 12 million keys/sec with UNIX Crypt.

We\'re currently trying to see how we can squeeze more DES pipelines onto the XC3S1000-4...the DES S-boxes are taking up most of the space :(
Alexander (Mon, 14 Jun 2010):
It\'s great to see this done, finally.

The performance number given for AMD64 3000+ at 2.0 GHz with John the Ripper - listed as 0.4 MKeys/sec - is plain wrong, though. Perhaps a wrong (non-optimal) make target was used. That CPU achieves around 1M c/s with JtR built as linux-x86-64 (or the like). And indeed much faster yet inexpensive CPUs are available now, e.g. a Q6700 does around 2.5M c/s per core, 10M c/s per chip with JtR 1.7.6 (and likely a bit more than that with future versions - there\'s faster code being developed/tested). Here are some benchmarks:

http://openwall.info/wiki/john/benchmarks
Alexander (Mon, 14 Jun 2010):
Core i7 920 2.66 GHz does 11.5M c/s (total for 8 simultaneous processes, or 10.5M c/s for 4 processes) with JtR 1.7.6. Core i7 priced at around $250 (retail) might have become more cost-effective than Q6600/Q6700 (which got to $200 a year ago, but are becoming a bit harder to find now). Q9300 at $160+ and Q9550 are also good - should do 10M c/s to 11M c/s with 4 processes. Considering these numbers, Q9300 might be the current winner for c/s per $ (speaking of CPUs only), but things keep changing.

In des56salt.vhd, the final permutation can probably be dropped to free up some space (and maybe fit more cores per chip). Instead, it can be \"undone\" in software when loading the password hashes to be cracked (this is what JtR does).
braka (Sun, 17 Oct 2010):
Hi, could you estimate what the speed would be on new Xilinx chips? Like on spartan-6 family, or virtex-4?
What speed would be on this board: http://shop.trenz-electronic.de/catalog/productinfo.php?cPath=147&productsid=631

Thanks!
Simon (Thu, 21 Jul 2011):
Hi, seem to be evolving. There is a post of a firm named SciEngines how continue the development of COPACOBANA, they aim to break DES in day. And they have 128 Spartan-6 in their RIVYERA. Have U thought of porting your code to their platform ?