Format Preserving Encryption (FPE) – Usage on GCP

Online data security has always been important, but never more so than now. With more and more of our data being stored on the cloud, users need to look for the best security solutions to ensure their confidential information is secure. While all parts of online data security are necessary to secure data, arguably the most important portion is data encryption.

This is why more and more cloud services are using a type of encryption called Format-Preserving Encryption.

What is Format-Preserving Encryption?

Suppose your company stores 16-digit credit card numbers in a database, and the encrypted values must also be 16 digits long. This is where Format-Preserving Encryption (FPE) comes in. FPE encrypts a plaintext of a given length and produces a ciphertext that has the same length and uses the same set of characters as the plaintext. Using the example of a 16-digit credit card number with a plaintext of 1483920193402918, FPE could produce a ciphertext of 1483666666662918.

As the example shows, the ciphertext and plaintext have the same length and both consist only of digits. One cloud provider that lets users apply FPE to their data is Google Cloud.

Format-Preserving Encryption in Google Cloud

Google Cloud gives users access to a de-identification technique called pseudonymization. Pseudonymization is a technique that replaces sensitive data with cryptographically generated tokens. Google Cloud supports three different pseudonymization techniques:

Deterministic encryption using AES-SIV
Format-Preserving Encryption
Cryptographic hashing

All three techniques use cryptographic keys for data transformation, but we will focus on Format-Preserving Encryption.

Google Cloud uses a type of FPE called FPE-FFX. FFX covers two FPE methods, FF1 and FF3. At the time of writing, FF1 is the only method Google Cloud supports for encryption. A third method, FF2, did not make it to publication alongside FF1 and FF3. Derivations of FF2 and FF3 have been resubmitted, but after a cryptanalytic attack in 2017, the original FF3 was considered too insecure to use.

FFX applies multiple rounds of a Feistel function to the plaintext, together with a key, to create the ciphertext. A Feistel construction splits the plaintext into two halves. In each round, one half is fed through a keyed round function, the result is combined with the other half, and the two halves then swap places. Because every round can be undone with the same key, the whole process is reversible. The FF1 method uses 10 rounds of the Feistel function, and FF3 uses 8 rounds.

FPE-FFX requires several steps to encrypt data. To begin, the alphabet used to de-identify the data must be specified in one of three ways:
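The Feistel idea can be illustrated with a small Python sketch. This is a toy construction for digit strings, not FF1 and not what Google Cloud runs: the function names are hypothetical, the round function is an HMAC-SHA256 chosen purely for illustration, and the code should not be used to protect real data. It only shows how splitting, keyed mixing, and swapping keep the output in the same format and remain reversible.

```python
import hashlib
import hmac

DIGITS = "0123456789"


def _round_value(key: bytes, round_no: int, half: str, modulus: int) -> int:
    """Keyed round function (illustrative): HMAC-SHA256 reduced mod `modulus`."""
    mac = hmac.new(key, f"{round_no}:{half}".encode(), hashlib.sha256).digest()
    return int.from_bytes(mac, "big") % modulus


def _check(digits: str) -> None:
    if len(digits) < 2 or any(ch not in DIGITS for ch in digits):
        raise ValueError("expected a string of at least two decimal digits")


def toy_feistel_encrypt(key: bytes, digits: str, rounds: int = 10) -> str:
    """Encrypt a digit string into a digit string of the same length."""
    _check(digits)
    split = len(digits) // 2
    a, b = digits[:split], digits[split:]
    for i in range(rounds):
        m = len(a)
        # Mix the keyed output of one half into the other, then swap halves.
        c = (int(a) + _round_value(key, i, b, 10 ** m)) % 10 ** m
        a, b = b, str(c).zfill(m)
    return a + b


def toy_feistel_decrypt(key: bytes, digits: str, rounds: int = 10) -> str:
    """Undo toy_feistel_encrypt by running the rounds in reverse order."""
    _check(digits)
    split = len(digits) // 2
    a, b = digits[:split], digits[split:]
    for i in reversed(range(rounds)):
        m = len(b)
        prev_a = (int(b) - _round_value(key, i, a, 10 ** m)) % 10 ** m
        a, b = str(prev_a).zfill(m), a
    return a + b


if __name__ == "__main__":
    key = b"demo-key-not-for-production"
    plaintext = "1483920193402918"
    ciphertext = toy_feistel_encrypt(key, plaintext)
    print(len(ciphertext) == len(plaintext), ciphertext.isdigit())
    print(toy_feistel_decrypt(key, ciphertext) == plaintext)
```

The default of 10 rounds mirrors the FF1 round count mentioned above. The real FF1 round function is built on AES and also accepts a tweak, which this sketch leaves out.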

Using one of four values that represent the most common character sets/alphabets
Using a radix value specifying the size of the alphabet. Specifying 2 gives an alphabet consisting of the numbers 0 and 1, while specifying 95 gives an alphabet with all numeric, upper-case alpha, lower-case alpha, and symbol characters
By building an alphabet containing the exact characters to be used
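The three alphabet options above can be sketched as plain Python dictionaries shaped like the FPE-FFX transformation config that the Cloud DLP API accepts. The helper name `ffx_fpe_config` is hypothetical, and the transient key name is a placeholder. The field names (`common_alphabet`, `radix`, `custom_alphabet`, `crypto_key`, `surrogate_info_type`) follow the API as I understand it and should be checked against the current reference before use. No API call is made here.

```python
# The four common alphabets, as named in the DLP API enum (assumed current).
COMMON_ALPHABETS = {
    "NUMERIC",
    "HEXADECIMAL",
    "UPPER_CASE_ALPHA_NUMERIC",
    "ALPHA_NUMERIC",
}


def ffx_fpe_config(crypto_key, *, common_alphabet=None, radix=None,
                   custom_alphabet=None, surrogate_info_type=None):
    """Build an FPE-FFX transformation dict with exactly one alphabet option."""
    options = {
        "common_alphabet": common_alphabet,
        "radix": radix,
        "custom_alphabet": custom_alphabet,
    }
    chosen = {name: value for name, value in options.items() if value is not None}
    if len(chosen) != 1:
        raise ValueError("specify exactly one of common_alphabet, radix, custom_alphabet")
    if common_alphabet is not None and common_alphabet not in COMMON_ALPHABETS:
        raise ValueError(f"unknown common alphabet: {common_alphabet}")
    if radix is not None and not 2 <= radix <= 95:
        raise ValueError("radix must be between 2 and 95")

    config = {"crypto_key": crypto_key, **chosen}
    if surrogate_info_type is not None:
        # Optional: asks for a surrogate annotation in front of each ciphertext.
        config["surrogate_info_type"] = {"name": surrogate_info_type}
    return {"crypto_replace_ffx_fpe_config": config}


# Placeholder key for illustration only.
demo_key = {"transient": {"name": "demo-transient-key"}}

by_name = ffx_fpe_config(demo_key, common_alphabet="NUMERIC")
by_radix = ffx_fpe_config(demo_key, radix=10)
by_chars = ffx_fpe_config(demo_key, custom_alphabet="0123456789")
```

All three calls describe a ten-character decimal alphabet. The exact character ordering that a given radix implies is defined by the service and is not reproduced here.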

When using FPE-FFX in Google Cloud, the data is encrypted as previously described, but the ciphertext can also be prepended with a surrogate annotation, resulting in a final token. When a surrogate annotation is included, the token takes the form surrogate_infotype(surrogate_length):surrogate_value, where surrogate_infotype(surrogate_length) is the surrogate annotation. The infotype is defined by the user, the surrogate length is the length of the surrogate value, and the surrogate value is the resulting ciphertext. If no surrogate annotation is specified, the final token is just the surrogate value. To re-identify unstructured data, the full token, including the surrogate annotation, is necessary, while structured data only needs the surrogate value.
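A short Python sketch shows why the annotation matters for unstructured data: the infotype and length let a scanner find where each surrogate value starts and ends inside free text. The function names and the `CARD_TOKEN` infotype are hypothetical, and the pattern assumes infotype names made of letters, digits, and underscores. This only illustrates the token layout and does not reproduce how Google Cloud parses tokens internally.

```python
import re

# Matches the annotation part of a token: INFOTYPE(LENGTH):
_ANNOTATION = re.compile(r"(?P<info_type>[A-Za-z0-9_]+)\((?P<length>\d+)\):")


def make_token(info_type: str, surrogate_value: str) -> str:
    """Prepend a surrogate annotation to a surrogate value (the ciphertext)."""
    return f"{info_type}({len(surrogate_value)}):{surrogate_value}"


def find_tokens(text: str) -> list[tuple[str, str]]:
    """Return (info_type, surrogate_value) pairs found in free text."""
    found = []
    for match in _ANNOTATION.finditer(text):
        length = int(match["length"])
        value = text[match.end():match.end() + length]
        if len(value) == length:  # skip annotations cut off by the end of the text
            found.append((match["info_type"], value))
    return found


token = make_token("CARD_TOKEN", "1483666666662918")
print(token)  # CARD_TOKEN(16):1483666666662918
print(find_tokens(f"Card on file: {token}, thanks."))
# [('CARD_TOKEN', '1483666666662918')]
```

In a structured source such as a database column, every cell is already known to hold a surrogate value, so no annotation is needed to locate it.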

Conclusion

Format-Preserving Encryption is extremely important for users who need the ciphertext to have the same length and character set as the plaintext. Of the FFX methods, FF1 is the one Google Cloud supports, and it runs more rounds of the Feistel function than FF3 (10 versus 8).

Unstructured data requires a surrogate annotation to be prepended to the ciphertext so that the data can be re-identified, while structured data needs only the surrogate value. Google Cloud has a strong implementation of FPE in place for customer use. For Google Cloud users who need plaintext and ciphertext of the same length, FPE-FFX is a good fit.