Notes

Using a provisioning script in S3 with Terraform

June 12, 2020

Lately I’ve been trying to learn Terraform so I can create and manage cloud resources in a more repeatable and less error-prone way than futzing around manually in the AWS console. Terraform focuses on creating infrastructure according to a description of the desired end state, and though it has facilities (provisioners) to run scripts on your compute instances after they’re created, their use doesn’t seem to be encouraged. Instead, you can pass a shell script (or other cloud-init directives) as user data, which is executed when the compute instance boots for the first time.

Because there are typically some limitations on what can be passed in a cloud-init script (most notably EC2’s 16 KB limit on user data), I wanted to make my cloud-init script as minimal as possible and put the bulk of the configuration/bootstrapping code in a script fetched from S3. This turned out to be a relatively straightforward process once I figured out the plethora of AWS objects that needed to be created.

In broad strokes, I needed to:

  • create a private S3 bucket to store the provisioning script
  • actually place the provisioning script in the bucket
  • create an IAM policy granting read access to the bucket, an IAM role with that policy attached, and an instance profile associated with the role
  • assign that instance profile to the EC2 instance so it’s allowed to fetch the script from the bucket
  • have the instance execute a minimal bootstrapping script at first launch that installs the AWS command-line tools, fetches the script from S3, and executes it

All of this stuff can be expressed using regular Terraform resources!

Putting a provisioning script in an S3 bucket

This is pretty straightforward — first I created a bucket:

resource "aws_s3_bucket" "config_bucket" {
    bucket = "stacklight-config-20200611"
    server_side_encryption_configuration {
        rule {
            apply_server_side_encryption_by_default {
                sse_algorithm = "AES256"
            }
        }
    }
}

output "config_bucket_id" {
    value = aws_s3_bucket.config_bucket.id
}

Then placed a file in it:

resource "aws_s3_bucket_object" "swarm_init_script" {
    key = "stage/swarm_init.sh"
    bucket = data.terraform_remote_state.common.outputs.config_bucket_id
    source = "swarm_init.sh"
    etag = filemd5("swarm_init.sh")
}

The etag argument ensures that the object gets re-uploaded whenever the script’s contents change. The bucket argument here could have just been aws_s3_bucket.config_bucket.id if I’d created the bucket and the script in the same module, but I didn’t: I figure I’ll probably reuse the config bucket for other files in the future, while this init script is specific to the module I’m working in, so the bucket’s ID is pulled in from the other module’s state via a terraform_remote_state data source (sketched below).
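For reference, the terraform_remote_state data source that the snippets here lean on isn’t shown elsewhere in this post. It looks roughly like this, assuming the shared module’s state lives in an S3 backend; the state bucket, key, and region below are placeholders, not my real values:

data "terraform_remote_state" "common" {
    backend = "s3"
    config = {
        # Placeholders -- point these at wherever the shared module's
        # state actually lives.
        bucket = "my-terraform-state"
        key    = "common/terraform.tfstate"
        region = "us-east-1"
    }
}

With that in place, anything the shared module exports as an output (like config_bucket_id above) is available as data.terraform_remote_state.common.outputs.<name>.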

Allowing access to the bucket

To do this, I needed a few different objects, starting with an IAM policy allowing read access to the bucket. I defined this policy in the same module as the bucket itself and then exported the policy’s ARN as an output:

resource "aws_iam_policy" "read_config_bucket" {
    name = "ReadConfigBucket"
    policy = <<-EOF
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ListObjectsInBucket",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": ["${aws_s3_bucket.config_bucket.arn}"]
            },
            {
                "Sid": "AllObjectActions",
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": ["${aws_s3_bucket.config_bucket.arn}/*"]
            }
        ]
    }
    EOF
}

output "read_config_bucket_policy_arn" {
    value = aws_iam_policy.read_config_bucket.arn
}

Then I needed to create an IAM role to attach the policy to:

resource "aws_iam_role" "swarm_host_role" {
    name = "SwarmHostRole"
    assume_role_policy = <<-EOF
    {
        "Version": "2012-10-17",
        "Statement": [
            {
            "Action": "sts:AssumeRole",
            "Principal": {
                "Service": "ec2.amazonaws.com"
            },
            "Effect": "Allow",
            "Sid": ""
            }
        ]
    }
    EOF
}

It’s essential to specify, via the assume_role_policy (the role’s trust policy), that EC2 instances are allowed to assume the role; this wasn’t obvious to me at first.

Then, I needed to attach the policy to the role, via an IAM role policy attachment:

resource "aws_iam_role_policy_attachment" "swarm_host_read_config_bucket" {
    role = aws_iam_role.swarm_host_role.name
    policy_arn = data.terraform_remote_state.common.outputs.read_config_bucket_policy_arn
}

Finally, I needed to create an instance profile associated with the role:

resource "aws_iam_instance_profile" "swarm_host_profile" {
    name = "swarm_host_profile"
    role = aws_iam_role.swarm_host_role.name
}

With that out of the way, I was able to start working on creating the instance itself.

Fetching and executing the script

I wrote a minimal bootstrap script that installs the AWS CLI, retrieves the full init script from S3, and executes it, then passed that script as the user_data argument when creating the instance.

resource "aws_instance" "swarm_host" {
    instance_type = "t2.micro"
    ami = "ami-013de1b045799b282" # Ubuntu Server 20.04

[...]

    iam_instance_profile = aws_iam_instance_profile.swarm_host_profile.id

    user_data = <<-EOF
        #!/bin/bash
        set -eux

        # unzip is needed to unpack the AWS CLI installer
        sudo apt-get update
        sudo apt-get install -y unzip

        # install the AWS CLI v2
        curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
        unzip awscliv2.zip
        sudo ./aws/install

        # fetch the full provisioning script from S3 and run it
        aws s3 cp "s3://${data.terraform_remote_state.common.outputs.config_bucket_id}/stage/swarm_init.sh" .
        chmod +x ./swarm_init.sh
        sudo ./swarm_init.sh
        EOF
}

And it all actually worked, after some debugging!

Postscript

  • As with most other things on this blog, I don’t really have any idea what I’m doing, so be sure to get a second opinion before running any of this in production yourself.
  • Man, that was a lot of nouns. It brought to mind Steve Yegge’s classic “Execution in the Kingdom of Nouns,” especially this paragraph:

The main charm is that the architecture is there for all to see. Architecture is held in exceptionally high esteem by King Java, because architecture consists entirely of nouns. As we know, nouns are things, and things are prized beyond all actions in the Kingdom of Java. Architecture is made of things you can see and touch, things that tower over you imposingly, things that emit a satisfying clunk when you whack them with a stick. King Java dearly loves clunking noises; he draws immense satisfaction from kicking the wheels when he’s trying out a new horse-drawn coach. Whatever its flaws may be, the tale above does not want for things.

I guess piles and piles of nouns are kind of inevitable if you want to describe the state of something complex, rather than describing a series of steps taken to get to that state. At least for infrastructure, it seems like a good tradeoff to make; I certainly prefer the learning curve of figuring out the names of AWS nouns over dealing with a long string of imperative scripts mutating opaque state.