AWS News Blog

Additional At-Rest and In-Transit Encryption Options for Amazon EMR

Our customers use Amazon EMR (including Apache Hadoop and the full range of tools that make up the Apache Spark ecosystem) to handle many types of mission-critical big data use cases. For example:

  • Yelp processes over a terabyte of log files and photos every day.
  • Expedia processes streams of clickstream, user interaction, and supply data.
  • FINRA analyzes billions of brokerage transaction records daily.
  • DataXu evaluates 30 trillion ad opportunities monthly.

Because customers like these (see our big data use cases for many others) are processing data that is mission-critical and often sensitive, they need to keep it safe and sound.

We already offer several data encryption options for EMR including server and client side encryption for Amazon S3 with EMRFS and Transparent Data Encryption for HDFS. While these solutions do a good job of protecting data at rest, they do not address data stored in temporary files or data that is in flight, moving between job steps. Each of these encryption options must be individually enabled and configured, making the process of implementing encryption more tedious that it need be.

It is time to change this!

New Encryption Support
Today we are launch a new, comprehensive encryption solution for EMR. You can now easily enable at-rest and in-transit encryption for Apache Spark, Apache Tez, and Hadoop MapReduce on EMR.

The at-rest encryption addresses the following types of storage:

  • Data stored in S3 via EMRFS.
  • Data stored in the local file system of each node.
  • Data stored on the cluster using HDFS.

The in-transit encryption makes use of the open-source encryption features native to the following frameworks:

  • Apache Spark
  • Apache Tez
  • Apache Hadoop MapReduce

This new feature can be configured using an Amazon EMR security configuration.  You can create a configuration from the EMR Console, the EMR CLI, or via the EMR API.

The EMR Console now includes a list of security configurations:

Click on Create to make a new one:

Enter a name, and then choose the desired mode and type for each aspect of this new feature. Based on the mode or the type, the console will prompt you for additional information.

S3 Encryption:

Local disk encryption:

In-transit encryption:

If you choose PEM as the certificate provider type, you will need to enter the S3 location of a ZIP file that contains the PEM file(s) that you want to use for encryption. If you choose Custom, you will need to enter the S3 location of a JAR file and the class name of the custom certificate provider.

After you make all of your choices and click on Create, your security configuration will appear in the console:

You can then specify the configuration when you create a new EMR Cluster. This feature is available for clusters that are running Amazon EMR release 4.8.0 or 5.0.0. To learn more, read about Amazon EMR Encryption with Security Configurations.

Jeff;

Jeff Barr

Jeff Barr

Jeff Barr is Chief Evangelist for AWS. He started this blog in 2004 and has been writing posts just about non-stop ever since.