Securing the Cloud: Migrating to IMDSv2 for AWS Compute Workloads

Ayush Priya
Published in CRED Engineering
8 min read · Jan 5, 2024

CRED has evolved to be a multi-product platform, and that necessitates the use of multiple microservices.

Within our ecosystem, these microservices run predominantly on AWS ECS (Elastic Container Service). Additionally, we use EC2 instances and EKS for specific workloads that demand enhanced resource management capabilities. All of these resources have one thing in common: the Instance Metadata Service (IMDS) endpoint.

In this blog, we will explain what IMDS is and how to secure workloads running on these compute services with the correct IMDS configuration.

The Instance Metadata Service (IMDS)

The Instance Metadata Service (IMDS) is an AWS service that exposes information about an Amazon Elastic Compute Cloud (EC2) instance at the link-local address 169.254.169.254. It allows EC2 instances to access metadata about themselves, such as instance ID, IP address, security group information, and, most notoriously, the IAM credentials provided to the instance via the EC2 Instance Profile Role.

Fetching IAM Instance Profile Credentials from the IMDS endpoint
IMDSv1 Endpoint Request-Response Flow

The IMDS endpoint is a nifty way of getting information about the instance quickly and reliably. Applications running within the instance can use it directly, which removes the engineering overhead of fetching the same information from external sources. For example, suppose you want to add the host IP to an application's logs. The host IP can be fetched from the IMDS endpoint at runtime instead of being hardcoded or looked up elsewhere.
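As a rough illustration of the host IP example, the sketch below builds an unauthenticated, IMDSv1-style request for the `local-ipv4` metadata path using only the Python standard library. The function names are hypothetical, and the call only succeeds from inside an EC2 instance that still allows IMDSv1.

```python
import urllib.request

# Link-local IMDS address and the metadata prefix.
IMDS_BASE = "http://169.254.169.254/latest/meta-data/"


def build_imdsv1_request(path: str) -> urllib.request.Request:
    """Build a plain GET request for a metadata path (no token, IMDSv1 style)."""
    return urllib.request.Request(IMDS_BASE + path.lstrip("/"), method="GET")


def fetch_metadata_v1(path: str, timeout: float = 2.0) -> str:
    """Send the request; this only works from inside an EC2 instance."""
    with urllib.request.urlopen(build_imdsv1_request(path), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Usage on an instance (hypothetical logging call):
#   host_ip = fetch_metadata_v1("local-ipv4")
#   logger.info("request served by %s", host_ip)
```

Note that nothing in this request proves who is asking, which is exactly the property the next section is concerned with.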

Why did we decide to migrate to IMDSv2?

SSRF (Server-Side Request Forgery) attacks against IMDSv1 (Instance Metadata Service version 1) in AWS are a major attack vector that has led to several breaches across various organizations.

If any application running on an EC2 instance is vulnerable to SSRF, attackers can use the IMDSv1 endpoint to exfiltrate the IAM credentials granted to the instance via its IAM Instance Profile Role. These credentials are meant to give the instance and its applications permission to perform actions on AWS resources.

With the stolen temporary security credentials, attackers could elevate their privileges and gain unauthorized access to other AWS resources, such as S3 buckets and databases, or even launch further attacks within the victim’s environment.

This breach of security can have severe consequences, as unauthorized access to sensitive data or control over critical infrastructure components could lead to data theft, service disruption, or further compromise of the entire AWS infrastructure.

SSRF Attack Flow

In response to these vulnerabilities, AWS strongly recommended that customers migrate to the newer IMDSv2, which provides enhanced security features, including better protection against SSRF attacks.

IMDSv2 introduces an extra layer of authentication and validation for metadata requests, making it significantly more challenging for attackers to exploit SSRF vulnerabilities.

AWS designed IMDSv2 to secure the IMDS endpoint with a small change in how it is accessed: instead of allowing unauthenticated access, IMDSv2 requires the caller to first fetch a session token with a PUT request and then present that token on every request made to the IMDS endpoint.

IMDSv2 Endpoint Session-based Flow

This seemingly small change has a huge impact, so we decided to migrate all our workloads to IMDSv2. A few properties of the session token are worth noting:

  1. The token itself is not stored in the IMDS endpoint so there is no way to retrieve it once generated for a particular session.
  2. The token itself only works when used from within the instance it was generated in, thus mitigating its exfiltration and abuse.
  3. AWS SDKs and CLI are configured to fetch this token automatically and keep it stored in memory for the session’s validity, before refreshing it once it expires.
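For code that talks to IMDS directly rather than through an SDK, the session-based flow can be sketched as below: a PUT to `/latest/api/token` with a TTL header, followed by a GET that carries the returned token. The endpoint paths and header names are the ones AWS documents; the function names and default values are illustrative assumptions.

```python
import urllib.request

IMDS_HOST = "http://169.254.169.254"
TOKEN_TTL_HEADER = "X-aws-ec2-metadata-token-ttl-seconds"
TOKEN_HEADER = "X-aws-ec2-metadata-token"
MAX_TTL_SECONDS = 21600  # six hours, the maximum session length


def build_token_request(ttl_seconds: int = MAX_TTL_SECONDS) -> urllib.request.Request:
    """Step 1: a PUT request that asks IMDS for a session token."""
    if not 1 <= ttl_seconds <= MAX_TTL_SECONDS:
        raise ValueError("ttl_seconds must be between 1 and 21600")
    return urllib.request.Request(
        IMDS_HOST + "/latest/api/token",
        method="PUT",
        headers={TOKEN_TTL_HEADER: str(ttl_seconds)},
    )


def build_metadata_request(path: str, token: str) -> urllib.request.Request:
    """Step 2: a GET request that presents the session token."""
    return urllib.request.Request(
        IMDS_HOST + "/latest/meta-data/" + path.lstrip("/"),
        method="GET",
        headers={TOKEN_HEADER: token},
    )


def fetch_metadata_v2(path: str, timeout: float = 2.0) -> str:
    """Run both steps; this only works from inside an EC2 instance."""
    with urllib.request.urlopen(build_token_request(), timeout=timeout) as resp:
        token = resp.read().decode("utf-8")
    with urllib.request.urlopen(build_metadata_request(path, token), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

A typical SSRF bug lets an attacker control the URL of a GET request but not the HTTP method or custom headers, which is why requiring the PUT step makes the credentials much harder to reach.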

Note: Discussing the issues present with IMDSv1 and how IMDSv2 mitigated them goes beyond the scope of this blog. If you would like to read more details about the same, you can check out this blog from AWS Security.

How did we migrate?

We wanted to ensure that no service disruptions happen because of the migration, so we:

  1. Did extensive research to identify all issues that could surprise us
  2. Solved edge cases that came up during our migration
  3. Developed a tool that served as a one-stop solution for everything required as part of the migration process

We’ll take a look at the details of each step we took in the section below.

Planning, Researching, and Working with Caveats

The 5 Phases of Migration

Phased Rollout for the Migration

The first (most obvious) decision we took was to roll out the migration in a phased manner to our various AWS accounts i.e. environments.

We first targeted environments that do not have complex deployments or critical services. To identify which accounts to start with, we listed all our AWS accounts and sorted them by their criticality and the nature of the services housed within them.

Gaining Visibility on IMDSv1 Usage

Next, we wanted to verify which services were using the IMDSv1 endpoint, because if no instance was calling it, we could migrate all our workloads right away without any service disruptions.

AWS provides a CloudWatch metric named MetadataNoToken, which counts the calls an instance makes to IMDS without a token (that is, IMDSv1 calls). It gives deeper visibility into the current usage of the IMDSv1 endpoint and facilitates the migration to IMDSv2.

Visualising IMDSv1 Usage with `MetadataNoToken` CloudWatch Metric

AWS has also recently released the IMDS Packet Analyzer, which can be used to identify exactly which process within an EC2 instance is making calls to the IMDSv1 endpoint. More details on the same can be found in this article.

Attaining 100% Coverage

The process of migrating an EC2 instance is extremely straightforward and can be achieved with a single AWS CLI command:

~$ aws ec2 modify-instance-metadata-options \
--instance-id i-1234567898abcdef0 \
--http-tokens required \
--http-endpoint enabled

Seems simple, but therein lies the first catch we found: what happens if the instances are part of an Auto Scaling Group? In this situation, any new additions to the fleet would not be launched with IMDSv2 enforced, because they take their metadata options from the group's launch template or launch configuration rather than from the instances we modified. This issue can easily be solved by updating the underlying launch template, but the observation led us to identify other AWS services with similar caveats.
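For the launch template case, one possible fix is to create a new template version whose metadata options require a token. The sketch below builds the arguments for the EC2 `CreateLaunchTemplateVersion` API; the helper name, the description text, and the template ID in the usage comment are hypothetical.

```python
def imdsv2_launch_template_version(launch_template_id: str, source_version: str) -> dict:
    """Build CreateLaunchTemplateVersion arguments that enforce IMDSv2.

    Settings not listed under LaunchTemplateData are carried over from
    source_version, so only the metadata options change.
    """
    return {
        "LaunchTemplateId": launch_template_id,
        "SourceVersion": source_version,
        "VersionDescription": "Enforce IMDSv2",
        "LaunchTemplateData": {
            "MetadataOptions": {
                "HttpTokens": "required",
                "HttpEndpoint": "enabled",
            }
        },
    }


# Usage with boto3 (assumed to be installed and configured with credentials):
#   import boto3
#   ec2 = boto3.client("ec2")
#   ec2.create_launch_template_version(
#       **imdsv2_launch_template_version("lt-0123456789abcdef0", "1")
#   )
```

Depending on how the Auto Scaling Group references the template, the new version may also need to be set as the default, or the group pointed at it, before new instances pick it up.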

We put all services into 3 groups:

  1. EC2-based Services - EC2, ECS & EKS with EC2 Networking, etc.
  2. Non-EC2 Services - Lightsail, SageMaker
  3. EC2-based Services with caveats - Instances launched via Auto Scaling Groups & Launch Templates/Configurations

Services and their Groupings

So to migrate all relevant resources we needed to:

  1. Define the updated configuration for the specific resource keeping in mind any associated caveats
  2. Iterate over all resources and regions to update the configuration to use IMDSv2
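As a rough illustration of these two steps for plain EC2 instances (not the tool described in this post), the sketch below filters a `DescribeInstances` response for instances that do not yet require tokens and applies the same options as the CLI command above. The EC2 client is passed in so the logic can be exercised without AWS access; all function names are hypothetical.

```python
def instances_needing_imdsv2(describe_instances_page: dict) -> list[str]:
    """Return IDs of instances whose metadata options do not require a token."""
    instance_ids = []
    for reservation in describe_instances_page.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            options = instance.get("MetadataOptions", {})
            if options.get("HttpTokens") != "required":
                instance_ids.append(instance["InstanceId"])
    return instance_ids


def imdsv2_options(instance_id: str) -> dict:
    """Arguments for ModifyInstanceMetadataOptions, mirroring the CLI flags."""
    return {
        "InstanceId": instance_id,
        "HttpTokens": "required",
        "HttpEndpoint": "enabled",
    }


def migrate_region(ec2_client, dry_run: bool = True) -> list[str]:
    """Find (and, unless dry_run, update) every non-IMDSv2 instance in one region."""
    affected = []
    paginator = ec2_client.get_paginator("describe_instances")
    for page in paginator.paginate():
        for instance_id in instances_needing_imdsv2(page):
            if not dry_run:
                ec2_client.modify_instance_metadata_options(**imdsv2_options(instance_id))
            affected.append(instance_id)
    return affected


# Usage with boto3 (assumed to be installed and configured with credentials):
#   import boto3
#   regions = boto3.client("ec2").describe_regions()["Regions"]
#   for region in regions:
#       client = boto3.client("ec2", region_name=region["RegionName"])
#       print(region["RegionName"], migrate_region(client, dry_run=True))
```

Running with `dry_run=True` first gives an inventory of what would change, which fits the phased rollout described earlier.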

Breaking Containers and Fixing them

During a test rollout of IMDSv2, we observed certain workloads broke because they satisfied at least one of the following criteria:

  1. The EC2 instance had an app deployed as a Docker container (for example, ECS + EC2 Networking)
  2. Third-party software that did not support IMDSv2 was running on the instance

For third-party software, in most cases, patches have already been released by vendors so we simply upgraded to the version that supported IMDSv2.

You can refer to this list to see which vendors/integrations still do not support IMDSv2.

For container workloads, we realised that the disruption was due to the Response Hop Limit (the maximum number of network hops the response to the token PUT request is allowed to travel), which is set to 1 by default. A container that reaches the network through a bridge on the host sits one extra hop away from IMDS, so the response to its PUT request was dropped before it could reach the process that initiated the request.

Issues with Default Response Hop-Limit set to 1
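The effect of the hop limit can be illustrated with a deliberately simplified model: the PUT response leaves IMDS with a TTL equal to the hop limit, and every forwarding hop (such as a container bridge on the host) decrements it. This is a toy simulation with hypothetical names, not a description of how the EC2 networking stack is implemented.

```python
def response_reaches_caller(hop_limit: int, forwarding_hops: int) -> bool:
    """Simplified TTL model for the IMDSv2 token response.

    hop_limit:       value of the PUT response hop limit (valid range 1 to 64)
    forwarding_hops: how many times the response is forwarded before it
                     reaches the caller (0 for a process on the host itself,
                     1 for a container behind a bridge network)
    """
    if not 1 <= hop_limit <= 64:
        raise ValueError("hop_limit must be between 1 and 64")
    ttl = hop_limit
    for _ in range(forwarding_hops):
        ttl -= 1  # each forwarding hop decrements the TTL
        if ttl <= 0:
            return False  # the packet is dropped before delivery
    return True


# A process running directly on the instance works with the default limit.
assert response_reaches_caller(hop_limit=1, forwarding_hops=0)
# A bridged container is one hop further away, so the default limit fails.
assert not response_reaches_caller(hop_limit=1, forwarding_hops=1)
# Raising the limit to 2 lets the response through.
assert response_reaches_caller(hop_limit=2, forwarding_hops=1)
```

Keeping the limit as low as the workload allows preserves the protection, since a larger value lets the token response travel further from the instance.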

The hop limit can be raised as part of the same modify-instance-metadata-options command by adding another option, as shown below:

~$ aws ec2 modify-instance-metadata-options \
--instance-id i-1234567898abcdef0 \
--http-tokens required \
--http-put-response-hop-limit 2 \
--http-endpoint enabled
Hop-limit/TTL Update for PUT Request in IMDS Configuration

And with that, we were good to go ahead with the migration, covering all potential issues and caveats! Well, all but one, which we’ll talk about next.

Remediating Post-Migration Blues!

After the migration, a single instance launched with IMDSv1 still enabled would undermine the whole effort. So we put preventive controls in place to make sure we did not have to face any post-migration blues.

Some ways to implement the required security control could be as follows:

  1. Deploy some automation that would trigger when a resource was launched, check the IMDS configuration, and update it to make use of IMDSv2, if not configured already
  2. Use an IAM Policy, permission boundary or SCP with a condition block that denies the creation of resources that do not have IMDSv2 configured
{
  "Version": "2012-10-17",
  "Statement": {
    "Sid": "RequireImdsV2",
    "Effect": "Deny",
    "Action": "ec2:RunInstances",
    "Resource": "arn:aws:ec2:*:*:instance/*",
    "Condition": {
      "StringNotEquals": {
        "ec2:MetadataHttpTokens": "required"
      }
    }
  }
}
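The first option, automation that reacts to new instances, could look roughly like the handler below. It assumes an EventBridge rule that forwards "EC2 Instance State-change Notification" events to it, and it takes the EC2 client as a parameter so it can be tested without AWS access. The handler name and the wiring are assumptions, not a description of what we deployed.

```python
def enforce_imdsv2_on_launch(event: dict, ec2_client) -> bool:
    """Switch a newly running instance to IMDSv2 if it is not already enforced.

    Returns True when the instance was modified, False otherwise.
    """
    detail = event.get("detail", {})
    instance_id = detail.get("instance-id")
    # Metadata options can only be changed once the instance is running.
    if not instance_id or detail.get("state") != "running":
        return False

    response = ec2_client.describe_instances(InstanceIds=[instance_id])
    instance = response["Reservations"][0]["Instances"][0]
    if instance.get("MetadataOptions", {}).get("HttpTokens") == "required":
        return False  # already compliant, nothing to do

    ec2_client.modify_instance_metadata_options(
        InstanceId=instance_id,
        HttpTokens="required",
        HttpEndpoint="enabled",
    )
    return True


# Usage inside an AWS Lambda function (boto3 is assumed to be available):
#   import boto3
#   def lambda_handler(event, context):
#       return enforce_imdsv2_on_launch(event, boto3.client("ec2"))
```

Unlike the deny policy above, this approach is corrective rather than preventive: an instance can run with IMDSv1 for a short window before the handler fixes it.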

We recommend using SCPs, as they allow policies to be implemented and managed centrally for all AWS accounts.

Additional examples of SCPs which can be used as part of post-migration controls can be found here.

Impact of the migration

  1. Safeguard against financial or reputational loss due to data breaches
  2. Protection against SSRF attacks
  3. Extensive reduction of the blast radius in case of a compromise
  4. Safeguard against abuse of EC2 instance profile’s IAM credentials
  5. Protection against credential theft through misconfigured, open Web Application Firewalls
  6. Protection against credential theft through misconfigured, open Layer 3 firewalls
  7. Protection against credential theft through open reverse proxies

Wrap-up insights

Migrating to IMDSv2 is not a very complex activity from an implementation perspective, but the impact is huge. AWS has several support mechanisms to facilitate an easy migration process, such as SDKs that are IMDSv2-compatible out of the box and an IMDSv2 default configuration for Amazon Linux 2023-based EC2 instances.

At CRED, we believe in a security-first culture. The IMDS migration is one example of how we constantly work towards strengthening the security of our infrastructure and apps through tools and processes, and strive to ensure the safety and privacy of our users’ data.

We would love to hear from you. Do share your feedback and questions, if any.
