The Crypto Gardening Guide and Planting Tips
Peter Gutmann, email@example.com
"Looking at all of the security protocols deployed in the last 10 years, you'd be forgiven for thinking that the only developments in crypto during that time (beyond basic algorithms) were HMAC and SPEKE"
There has been a great deal of difficulty experienced in getting research performed by cryptographers in the last decade or so (beyond basic algorithms such as SHA and AES) applied in practice. The reason for this is that cryptographers don't work on things that implementors need because it's not cool, and implementors don't use what cryptographers design because it's not useful or sufficiently aligned with real-world considerations to be practical. As a result, security standards are being created with mechanisms that have had little or no security analysis, often homebrew mechanisms or the standards editor's pet scheme. The problem is a lack of communication: Cryptographers often don't seem aware of the real-world constraints that their design will need to work within in order to be successfully deployed. The intent of this document is to cover some of those real-world constraints for cryptographers, to point out problems that their designs will run into when attempts are made to deploy them. Also included is a motivational list of extremely uncool problems that implementors have been building ad-hoc solutions for since no formal ones exist.
Question A: Does your design require more than one pass over the data?
Discussion: Designs with features such as a MAC or signature placed at the start of the data or a key identifier or IV at the end of the data require two-pass processing. Buffering the whole message is impractical in the absence of infinite amounts of infinitely cheap memory - some crypto hardware only has a kB or two of memory available for each encrypted channel, sometimes even less than that.
Resolution: Place all information needed before processing data at the start of the message, and all information available after processing data at the end of the message. As a corollary, order fields in the order in which they are used (there is nothing more awkward than an implementation that has to jump around in a message to gather data scattered at random). If in doubt, consider whether your design can be implemented on a system with a total of 1kB of memory, or alternatively whether it can process a 1GB data block on a machine with 128MB of memory.
See also: Question B.
Question B: Does your design force sequential processing?
Discussion: Most hardware performs crypto operations such as encryption and MAC'ing in parallel. A design that prevents MAC'ing of the message from commencing before the crypto operation has completed, or vice versa, more than halves throughput.
Resolution: Ensure that your design allows crypto operations such as encryption and MAC'ing to be performed in parallel.
See also: Question A, Question C.
Question C: Does your design require more than one continuous crypto operation per message?
Discussion: Crypto implementations are designed to stream data as quickly as possible, but typically have a high latency for initialising transactions. Starting a new transaction can be one to two orders of magnitude more expensive than continuing an existing operation, even worse for some older crypto hardware. In other words, an IV pre-processing step that requires a separate crypto operation can result in a 100-fold slowdown in processing.
Resolution: Ensure that your design performs only a single continuous crypto operation for each message block.
See also: Question D.
Question D: Does your design require out-of-band processing of additional data alongside the payload?
Discussion: Consider an operation that streams encrypted data through a system. If additional out-of-band data needs to be processed separately, this requires two distinct crypto operations. If the data is processed in-band, it leaves a gap between the end of the previous payload and the start of the current payload once the out-of-band data has been removed. Combining the disjoint segments without manual copying requires hardware scatter/gather support, which is unlikely to be present.
Resolution: Placing out-of-band data at the end of the payload allows it to be trivially skipped by overwriting it with the next message segment that arrives, with no extra overhead incurred when copying the data around.
See also: Question C.
Question E: Does your design require reloading keys in the middle of an operation?
Discussion: Reloading keys works for key-agile algorithms and software implementations, but in some implementations the cryptlogic can only hold one key at a time. Requiring a key reload/schedule in the middle of an operation will result in a severe performance hit.
Resolution: Avoid designs that require keys to be reloaded in the middle of an operation.
See also: Question C, Question F.
Question F: Does your design require reloading IVs in the middle of an operation?
Discssion: Some crypto APIs, examples being JCE and PKCS #11, don't allow IVs to be changed in the middle of an operation, requiring a complete reinitialisation of the cipher to change the IV.
Resolution: Avoid designs that require IVs to be reloaded in the middle of an operation.
See also: Question E, Question C.
Question G: How concrete is the description of your design?
Discussion: Crypto designs are often described as mathematical abstractions that, while easy to work with mathematically, require a significant amount of work to translate into an actual implementation. For example replying to a request for a PRF with "Use HMAC" is of no use to an implementor because it tells them almost nothing about how to solve the problem. What do they do if they want more than the HMAC blocksize worth of output? How is the salt handled? How is iteration of the PRF to defeat dictionary attacks handled? To illustrate some of the more obscure problems that can arise, what about diversifiers to ensure that the same input for two different encryption algorithms doesn't yield a key for the short-keylength algorithm which is a substring of the key for the long-keylength algorithm (yes, there are protocols that derive keys for different algorithms from the same keying material)?
Resolution: Ensure that your design, alongside the mathematical abstraction, contains enough concrete details that an implementation can be built from it. By extension, if your description is too abstract, implementors won't even read it, let alone try to figure out how to do something with it.
See also: Question H.
Question H: How flexible is your design?
Discussion: That question isn't what you think. A better way of phrasing it would be "How ambiguous is your design?". Flexibility in an abstract crypto design is a Good Thing. Ambiguity in a specification is a Bad Thing. Unfortunately, a cryptographer's flexibility is an implementor's ambiguity, or more bluntly an implementor's nightmare. An example of this is IPsec's IKE, which is so flexible/ambiguous that no two people can agree on what it should look like. As a result, even after years of work, there are still implementations that can't (or barely) interoperate, and even when they interoperate it's often only because implementors figured out what the other side was doing and adapted their code to match it.
Resolution: Once you've impressed everyone with the power and flexibility of your design, provide a sketch of a simple, straightforward, easy-to-get-right profile that implementors can work with. This is a standard feature of protocol specifications, either done explicitly (MUST/SHOULD/MAY) or implicitly when everyone ignores all but the most simple, straightforward part of the specification. Another way of looking at this is that if implementors are going to ignore much of your design in order to make implementation practical, you want to be the one deciding which bits get used and which don't.
See also: Question G.
Question I: How big a problem are you really solving?
Discussion: Many problems pointed out in crypto papers are relatively insignificant to non-cryptographers, or can be fixed with a trivial update of existing code rather than by changing the crypto design. For example, the "correct" solution to various attacks (real and theoretical) on PKCS #1 v1.5 padding is for implementors to switch to something better such as OAEP, Simple RSA, PSS, or whatever they're wearing in Santa Barbara this year. However, since the problem can also be resolved with "Don't do that, then", it's easier to stick with an existing solution rather than re-engineering everything to use a new protocol (see the Final Thoughts for a longer discussion on this).
Resolution: Unlike cryptographers, implementors probably won't appreciate the advantages of a design secure in the IND-CCAn+1 model where the previous was only IND-CCAn if it requires a complete redeployment of all of their products. Don't expect to see a new design widely adopted any time soon unless (a) it's being deployed in a greenfields development or (b) you've found a hole exploitable in O(1) time by an army of script kiddies.
See also: Question J, Final Thoughts
Question J: How big a change to existing systems does your design require? Discussion: A design requiring heavy amounts of re-engineering of deployed hardware and software is unlikely to be adopted in a hurry unless it fixes a major security problem being actively exploited, with an accompany media circus to get management buy-in. Even if it's reasonably important, it can take a significant amount of time to get it deployed. A typical timeline to deployment might be:
Resolution: Don't expect your design to take over the world. Even if it's a significant contribution, it will take years to see widespread deployment.
See also: Question I, Final Thoughts
Question K: How many different cryptographic algorithms/mechanisms does your design require?
Discussion: Many implementations only have a very limited number of algorithms and types of mechanism available for use. If you can create a design that works when the only crypto mechanism present is DES or 3DES/CBC (even the most limited smart card does DES/CBC), you have a truly universal design that can be implemented with anything. If your design requires a 128-bit block cipher combined with an HMAC implementation, you have a crypto conference paper.
Resolution: Create your design so that it can work with the least number of lowest-common-denominator algorithms. An IND-CPA design will *always* beat an IND-CCA2 design if the latter requires several uncommon algorithms or mechanisms in order to work, so that it can't be used with most of the implementations out there.
Question L: Does your design include test vectors?
Discussion: Most crypto designs are, by their very nature, quite complex, and getting a single bit wrong at any stage completely changes the outcome of the protocol. Test vectors for the various stages of the process, and for the final result, are essential in order to allow implementors to create a conforming implementation of your design. There is nothing more awkward to debug than a design calling for a 15-stage handshake requiring 2,500 lines of code, at the end of which the protocol merely reports "Yes" or "No".
In the absence of test vectors, it's up to implementors to fight it out over whose version is correct. In some cases, bugs in widely-deployed implementations have had to be reverse-engineered back into the design because it was easier to change the design than to get new implementations out.
Resolution: Include test vectors for the various stages of the process, and the final result (if nothing else, it proves that you've managed to implement the design yourself). Then get someone else to do an implementation from the design document and see if they come up with the same results.
A Smith and Wesson beats four aces
No matter how cool/interesting/useful/mandated in standards a new design is, it won't be used if it requires redeployment of all existing hardware and software for little apparent gain. Two illustrative examples of this are X9.42 DH and OAEP with AES.
An Internet standard RFC required that implementors support X9.42 DH key agreement, and provided RSA as an option (in IETF terms, "MUST X9.42, MAY RSA"). However, no existing software supported X9.42, no CAs would issue certificates for it, even if they did no-one wanted to renew all of their certificates ($$$) for an algorithm that provided no advantages over RSA, and no hardware (either crypto accelerators or smart cards) supported it (there was some token support after a few years, although even now there are problems being found with the X9.42 test vectors which indicate that no-one has really looked at them). As a result, even though the standard mandated use of X9.42, everyone treated it as if it said "MUST RSA, SHOULD NOT X9.42", pretending to do X9.42 while running business as usual with RSA.
OAEP is in a similar situation. In order to get it usefully deployed, it would be necessary to issue certificates that would only work with OAEP otherwise everyone would just keep using them with PKCS #1 v1.5. This would have exactly the same effect as X9.42.
An attempt was made to force OAEP through by tying it to AES. In other words, in order to use AES you also had to use OAEP. Apart from the list of issues already mentioned with X9.42 above, this was also going to be deployed in an area where it was necessary to use crypto hardware for performance reasons (this area is almost always glossed over in crypto designs). Most crypto hardware does RSA in hardware and either leaves the symmetric crypto for software to do (cheaper hardware) or leaves further key management to software and provides hardware acceleration for bulk data encryption once all further key processing has been performed (more expensive hardware). In addition a few rather specialised crypto engines do everything in hardware.
In all cases except the most primitive accelerators that consist of nothing but a bignum engine, moving to OAEP would require replacing the crypto hardware. Crypto hardware boxes typically cost $10,000-$20,000 each, assuming you can find one that supports OAEP. A server farm needs a great many of these $20,000 boxes. Smaller devices like smart cards may only cost $10-$20 each, but then you've got 100,000 of them to replace. Some devices can upload new firmware, but will zeroise their keys when this occurs for security reasons, requiring a complete redeployment from scratch of all keys and certificates ($$$).
As a result, a vendor would have two options: Supply a non-compliant solution that uses AES with PKCS #1 v1.5, or supply a compliant solution that uses AES with OAEP and requires rebuilding the entire infrastructure (hardware, software, and certificates) from the ground up.
The standards group abandoned the idea of tying OAEP to AES ("It's X9.42 all over again").
Nathan Bedford Forrest's motto for success was "Get there first with the most men". In crypto it's "Get there first with something adequate", because once you've got that in place, nothing will be able to displace it.
(Note: If you're in the media or telecoms industry this becomes "Get there first with something patented, proprietary, and broken, then send lawyers after anyone who points out problems", but this is a special case).
Problems that Need Solving * A universal PRF (before you say "Use HMAC", see Question G). Currently every single security protocol in active use has its own incompatible PRF, often an ad hoc design with no real security analysis. Implementors really, really, really hate having to solve the exact same problem ten times in a row, with a huge amount of interoperability testing required to ensure that all of the implementations are performing as required. There is a skunkworks project currently working on a black-box plug-in PRF suitable for use in any situation.
* A symmetric key wrap function capable of wrapping a key for a symmetric encryption algorithm using another symmetric algorithm. There are various ad hoc solutions, many of which only work for a single algorithm or a small class of algorithms, but no standard solution.
* A key wrap function where the wrapping key is derived from a password. The requirements for this are subtly different from a straight symmetric key wrap in that the threat model is rather different. For example a symmetric key wrap may use HMAC to ensure non-malleability, but for password-based key wrap this makes a dictionary attack rather easier (throw passwords at the HMAC, sidestepping the encrypted key altogether). There exists a (ad-hoc) design that has rather limited non-malleability in order to avoid potential dictionary attacks.
* A combined MAC+encryption mode for block ciphers. The cool, efficient modes are tangled in patents and therefore unusable. The unpatented CCM requires that the message length be known beforehand (see Question A) which severely limits it usefulness.
The author would like to thank the members of the ietf-tls mailing list, Steve Henson, Trevor Perrin, and jetmarc for their input in the preparation of this document.