Part 3. How to Deploy Many Apps: Orchestration, VMs, Containers, and Serverless

Update, June 25, 2024: This blog post series is now also available as a book called Fundamentals of DevOps and Software Delivery: A hands-on guide to deploying and managing production software, published by O’Reilly Media!

This is Part 3 of the Fundamentals of DevOps and Software Delivery series. In Part 1 and Part 2, you deployed an app on a single server. This is a great way to learn and to get started, and for smaller and simpler apps, a single server may be all you ever need. However, for many production use cases, where you’re building a business that depends on that app, you may run into the following problems with just a single server:

Outages due to hardware issues: If the server has a hardware problem, such as the power supply dying, your users will experience an outage until you replace the server.
Outages due to software issues: If the app crashes due to a software problem, such as a bug in the code, your users will experience an outage until you manually restart it.
Outages due to load: If your app becomes popular enough, the load may exceed what a single server can handle, and your users will experience degraded performance, and potentially outages as well.
Outages due to deployments: If you want to roll out a new version of your app, it’s hard to do so with just a single server without at least a brief outage while you shut down the old version and replace it with the new version.

In short, a single copy of your app is a single point of failure. To run applications in production, you typically want multiple copies, called replicas, of your app. Moreover, you also want a way to manage those replicas: something that can automatically handle hardware issues, software issues, load issues, deployments, and so on. Although you could build your own solutions for deploying and managing replicas, it’s a tremendous amount of work, and there are tools out there that do it for you: these are called orchestration tools.

If you search around, you’ll quickly find that there are many orchestration tools out there, including Kubernetes, EKS, GKE, AKS, OpenShift, EC2, ECS, Marathon / Mesos, Nomad, AWS Lambda, Google Cloud Functions, Azure Serverless, Capistrano, Ansible, and many others. It seems like there’s a new, hot orchestration tool nearly every day. How do you keep track of them all? Which one should you use? How do these tools compare?

This Part will help you navigate the orchestration space by introducing you to the most common types of orchestration tools, which, broadly speaking, fall into the following four categories:

Server orchestration: e.g., use Ansible to deploy code onto a cluster of servers.
VM orchestration: e.g., deploy VMs into an Auto Scaling Group.
Container orchestration: e.g., deploy containers into a Kubernetes cluster.
Serverless orchestration: e.g., deploy functions using AWS Lambda.

You’ll work through examples where you deploy the same app using each of these approaches, which will let you see how different orchestration approaches perform across a variety of dimensions (e.g., rolling out updates, load balancing, auto scaling, auto healing, and so on), so that you can pick the right tool for the job.

Here’s what we’ll cover in this post:

An introduction to orchestration
Server orchestration
VM orchestration
Container orchestration
Serverless orchestration
Comparison of orchestration options

Let’s get started by understanding exactly what orchestration is, and why it’s important.

An Introduction to Orchestration

In the world of classical music, a conductor is responsible for orchestration: that is, they direct the orchestra, coordinating all the individual members to start or stop playing, to increase or decrease the tempo, to play quieter or louder, and so on. In the world of software, an orchestration tool plays the same role: it directs software clusters, coordinating all the individual apps to start or stop, to increase or decrease the hardware resources available to them, to increase or decrease the number of replicas, and so on.

These days, for many people, the term "orchestration" is associated with Kubernetes, but the underlying needs have been around since the first programmer ran the first app for others to use. Anyone running an app in production needs to solve most or all of the following core orchestration problems:

Deployment: You need a way to initially deploy one or more replicas of your app onto your servers.
Update strategies: After the initial deployment, you need a way to periodically roll out updates to all replicas of your app, and in most cases, you want a way to roll out those updates without your users experiencing downtime (known as a zero-downtime deployment).
Scheduling: For each deployment, you need to decide which apps should run on which servers, ensuring that each app gets the resources (CPU, memory, disk space) it needs. This is known as scheduling. With some orchestration tools, you do the scheduling yourself, manually; other orchestration tools provide a scheduler that can do it automatically, and this scheduler usually implements some sort of bin packing algorithm to try to use the resources available as efficiently as possible.
Rollback: If there is a problem when rolling out an update, you need a way to roll back all replicas to a previous version.
Auto scaling: As load goes up and down, you need a way to automatically scale your app up and down in response. This may include vertical scaling, where you scale the resources available to your existing servers up or down, such as getting faster CPUs, more memory, or bigger hard drives, as well as horizontal scaling, where you deploy more servers and/or more replicas of your app across your servers.
Auto healing: You need something to monitor your apps, detect if they are not healthy (i.e., the app is not responding correctly or at all), and automatically respond to problems by restarting or replacing the app or server.
Configuration: If you have multiple environments (e.g., dev, stage, and prod), you need a way to be able to configure the app differently in each environment: e.g., use different domain names or different memory settings in each environment.
Secrets management: You may need a way to securely pass sensitive configuration data to your apps (e.g., passwords, API keys).
Load balancing: If you are running multiple replicas of your app, you may need a way to distribute traffic across all those replicas.
Service communication: If you are running multiple apps, you may need to give them a way to communicate with each other, including a way to find out how to connect to other apps (service discovery), and ways to control and monitor that communication, including authentication, authorization, encryption, error handling, observability, and so on (service mesh).
Disk management: If your app stores data on a local hard drive, then as you deploy replicas of your app to various servers, you need to find a way to ensure that the right hard drive is connected to the right servers.
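
To make the scheduling item above a bit more concrete, here is a minimal sketch of a bin packing scheduler in Node.js. It uses a first-fit-decreasing heuristic, which is just one of many possible approaches and not necessarily what any particular orchestration tool implements. The server names, app names, and the single memoryMb resource are all hypothetical, chosen only for illustration.

```javascript
// Hypothetical sketch of a scheduler that uses first-fit-decreasing bin packing.
// Real schedulers consider many resources (CPU, memory, disk); this one only
// looks at memory to keep the idea easy to follow.
function schedule(apps, servers) {
  // Track the remaining capacity of each server without mutating the input.
  const free = servers.map((s) => ({ name: s.name, freeMb: s.memoryMb }));
  const placements = {};
  const unscheduled = [];

  // Place the largest apps first, as they are the hardest to fit later.
  const bySizeDesc = [...apps].sort((a, b) => b.memoryMb - a.memoryMb);

  for (const app of bySizeDesc) {
    // First fit: pick the first server with enough free memory.
    const target = free.find((s) => s.freeMb >= app.memoryMb);
    if (target === undefined) {
      unscheduled.push(app.name);
      continue;
    }
    target.freeMb -= app.memoryMb;
    placements[app.name] = target.name;
  }

  return { placements, unscheduled };
}

// Illustrative inputs (assumptions, not from any real cluster).
const servers = [
  { name: 'server-1', memoryMb: 4096 },
  { name: 'server-2', memoryMb: 2048 },
];
const apps = [
  { name: 'frontend', memoryMb: 1024 },
  { name: 'backend', memoryMb: 3072 },
  { name: 'worker', memoryMb: 2048 },
  { name: 'batch', memoryMb: 2048 },
];

const result = schedule(apps, servers);
console.log(result);
```

In this run, the cluster has 6 GB of memory but the apps ask for 8 GB, so one app ends up in the unscheduled list. A real orchestration tool would typically respond to that by reporting an error or adding more servers.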

Over the years, there have been dozens of different approaches to solving each of these problems. In the pre-cloud era, since every on-prem deployment was different, most companies wrote their own bespoke solutions, typically consisting of gluing together various scripts and tools to solve each problem. Nowadays, the industry is starting to standardize around four broad types of solutions:

Server orchestration: You have a pool of servers that you manage.
VM orchestration: Instead of managing servers directly, you manage VM images.
Container orchestration: Instead of managing servers directly, you manage containers.
Serverless orchestration: You no longer think about servers at all, and just focus on managing apps, or even individual functions.

We’ll discuss each of these in turn in the next several sections, starting with server orchestration.

Server Orchestration

The original approach used in the pre-cloud era, and one that, for better or worse, is still fairly common today, is to do the following:

Set up a bunch of servers.
Deploy your apps across the servers.
When you need to roll out changes, update the servers in place.

I’ve seen companies use a variety of tools for implementing this approach, including configuration management tools (e.g., Ansible, Chef, Puppet), specialized deployment scripts (e.g., Capistrano, Deployer, Mina, Fabric, Shipit), and, perhaps the most common approach, thousands and thousands of ad hoc scripts.

Because this approach predates the cloud era, it also predates most attempts at creating standardized tooling for it (which is why there are so many different tools for it), and I’m not aware of any single, commonly accepted name for it. Most people would just refer to it as "deployment tooling," as deployment was the primary focus (as opposed to auto scaling, auto healing, service discovery, etc.). For the purposes of this blog post series, I’ll refer to it as server orchestration, to disambiguate it from the newer orchestration approaches you’ll see later, such as VM and container orchestration.

Key takeaway #1

Server orchestration is an older, mutable infrastructure approach where you have a fixed set of servers that you maintain and update in place.

To get a feel for server orchestration, let’s use Ansible. In Part 2, you saw how to deploy a single EC2 instance using Ansible. In this post, you’ll first use Ansible to deploy multiple servers, and once you have several servers to work with, you’ll be able to see what server orchestration looks like in practice.

Example: Deploy Multiple Servers in AWS Using Ansible

Example Code

As a reminder, you can find all the code examples in the blog post series’s sample code repo in GitHub.

The first thing you need for server orchestration is a bunch of servers. If you have existing servers you can use—e.g., several physical servers on-prem or several virtual servers in the cloud—and you have SSH access to those servers, you can skip this section, and go to the next one.

If you don’t have servers you can use, this section will show you how to deploy several EC2 instances using Ansible. As mentioned in Part 2, deploying and managing servers (hardware) is not really what configuration management tools were designed to do, but for learning and testing, Ansible is good enough. Note that the way you’ll use Ansible to deploy multiple EC2 instances in this section is meant to showcase server orchestration in its canonical form, with a fixed set of servers, and not the idiomatic approach for running multiple servers in the cloud; you’ll see the more idiomatic approach later in this blog post, in the VM orchestration section.

The blog post series’s sample code repo in GitHub contains an Ansible playbook called create_ec2_instances_playbook.yml (note the "s" in "instances," implying multiple instances, unlike the playbook from Part 2) in the ch3/ansible folder that can do the following:

Prompt you for several input variables:
- num_instances: How many EC2 instances to create.
- base_name: What to name all the resources created by this playbook.
- http_port: What port the instances should listen on for HTTP requests.
Create multiple EC2 instances, each with the Ansible tag set to base_name.
Create a security group for the instances which opens up port 22 (for SSH access) and http_port (for HTTP access).
Create an EC2 Key Pair you can use to connect to those instances via SSH.

To use this playbook, git clone the sample code repo, if you haven’t already (if you are new to Git, check out the Git tutorial in Part 4):

$ git clone https://github.com/brikis98/devops-book.git

This will check out the sample code into the devops-book folder. Next, head into the fundamentals-of-devops folder you created in Part 1 to work through the examples in this blog post series, and create a new ch3/ansible subfolder:

$ cd fundamentals-of-devops
$ mkdir -p ch3/ansible
$ cd ch3/ansible

Copy create_ec2_instances_playbook.yml from the devops-book folder into ch3/ansible:

$ cp -r ../../devops-book/ch3/ansible/create_ec2_instances_playbook.yml .

To run this playbook, make sure Ansible is installed, authenticate to AWS as described in Authenticating to AWS on the command line, and run ansible-playbook as before. Ansible will start to interactively prompt you for input variables:

$ ansible-playbook -v create_ec2_instances_playbook.yml
How many instances to create?: 3
What to use as the base name for resources?: sample_app_instances
What port to use for HTTP requests?: 8080

You can enter the values interactively and hit Enter, or, alternatively, you can define the variables in a YAML file, such as the sample-app-vars.yml file shown in Example 26:

Example 26. Variables file to create EC2 instances for the sample app (ch3/ansible/sample-app-vars.yml)

num_instances: 3
base_name: sample_app_instances
http_port: 8080

You can then use the --extra-vars flag to pass this variables file to the ansible-playbook command:

$ ansible-playbook \
  -v create_ec2_instances_playbook.yml \
  --extra-vars "@sample-app-vars.yml"

This will create three empty servers that you can configure and manage as you wish. It’s a great playground to get a sense for server orchestration. As a first step, let’s improve the security and reliability of your app deployments, as discussed next.

Example: Deploy an App Securely and Reliably Using Ansible

As explained in Watch out for snakes: these examples have several problems, the code used to deploy apps in the previous blog posts had a number of security and reliability problems: e.g., running the app as a root user, listening on port 80, no automatic app restart in case of crashes, and so on. It’s time to fix these issues and get this code a bit closer to something you could use in production.

First, just as in Section 2.3.2, you need to tell Ansible what servers you want to configure. You do this using either an inventory file or, if you deployed servers in the cloud, such as the EC2 instances in the previous section, an inventory plugin that discovers your servers automatically, as shown in Example 27.

Example 27. Inventory plugin to discover EC2 instances automatically (ch3/ansible/inventory.aws_ec2.yml)

plugin: amazon.aws.aws_ec2
regions:
  - us-east-2
keyed_groups:
  - key: tags.Ansible
leading_separator: ''

Just as in the previous blog post, this inventory file will create groups based on the Ansible tag. If you used the playbook in the previous section, that tag will be set to the value you entered for the base_name variable. In the preceding section, I used "sample_app_instances" as the base_name, so that’s what the group will be called. You’ll need to configure group variables for this group by creating a YAML file with the name of the group in the group_vars folder. So that will be group_vars/sample_app_instances.yml, as shown in Example 28:

Example 28. Configure group variables for your sample app servers (ch3/ansible/group_vars/sample_app_instances.yml)

ansible_user: ec2-user
ansible_ssh_private_key_file: ansible-ch3.key
ansible_host_key_checking: false

This file configures the user, private key, and host key checking settings for the sample_app_instances group. Now you can use a playbook to configure the servers in this group to run the Node.js sample app. Create a new playbook called configure_sample_app_playbook.yml, with the contents shown in Example 29:

Example 29. A playbook for configuring the servers to run the Node.js sample app (ch3/ansible/configure_sample_app_playbook.yml)

- name: Configure servers to run the sample-app
  hosts: sample_app_instances # (1)
  gather_facts: true
  become: true
  roles:
    - role: nodejs-app        # (2)
    - role: sample-app        # (3)
      become_user: app-user   # (4)

Here’s what this playbook does:

1	Target the `sample_app_instances` group you just configured in your inventory.
2	Instead of a single `sample-app` role that does everything, as you saw in Part 2, the code in this blog post uses two roles. The first role, called `nodejs-app`, is responsible for configuring a server to run Node.js apps. You’ll see the code for this role shortly.
3	The second role is called `sample-app`, and it’s responsible for running the sample app. You’ll see the code for this role shortly as well.
4	The `sample-app` role will be executed as the OS user `app-user`, which is a user that the `nodejs-app` role creates, rather than as the root user.

Create just a single file and folder for the nodejs-app role, roles/nodejs-app/tasks/main.yml:

roles
  └── nodejs-app
      └── tasks
          └── main.yml

Put the code shown in Example 30 into tasks/main.yml:

Example 30. The tasks of the nodejs-app role (ch3/ansible/roles/nodejs-app/tasks/main.yml)

- name: Add Node packages to yum # (1)
  shell: curl -fsSL https://rpm.nodesource.com/setup_21.x | bash -

- name: Install Node.js
  yum:
    name: nodejs

- name: Create app user          # (2)
  user:
    name: app-user

- name: Install pm2              # (3)
  npm:
    name: pm2
    version: latest
    global: true

- name: Configure pm2 to run at startup as the app user
  shell: eval "$(sudo su app-user bash -c 'pm2 startup' | tail -n1)"

Here’s what this role does:

1	Install Node.js, just as you’ve seen before.
2	Create a new OS user called `app-user`. This allows you to run your apps with a user with more limited permissions than root.
3	Install PM2 and configure it to run on boot. You’ll see what PM2 is and why it’s installed shortly.

As you can see, the nodejs-app role is fairly generic: it’s designed so you can use it with any Node.js app, which makes this a highly reusable piece of code.

The sample-app role, on the other hand, is specifically designed to run the sample app. Create two subfolders for this role, files and tasks:

roles
  ├── nodejs-app
  └── sample-app
      ├── files
      │   ├── app.config.js
      │   └── app.js
      └── tasks
          └── main.yml

app.js is the exact same "Hello, World" Node.js sample app you saw in Part 1. Copy it into the files folder:

$ cp ../../ch1/sample-app/app.js roles/sample-app/files/

app.config.js is a new file that is used to configure PM2. So, what is PM2? PM2 is a process supervisor, which is a tool you can use to run your apps, monitor them, restart them after a reboot or a crash, manage their logging, and so on. Process supervisors provide one layer of auto healing for long-running apps. You’ll see other types of auto healing later in this post.
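
To give you a rough feel for what a process supervisor does under the hood, here is a deliberately tiny sketch in Node.js. It is not how PM2 works internally: it only restarts a command that exits with a non-zero code, gives up after a fixed number of restarts, and ignores logging, boot-time startup, and everything else a real supervisor handles. The supervise function and its parameters are hypothetical names I made up for this illustration.

```javascript
const { spawnSync } = require('node:child_process');

// Run a command and restart it each time it exits with a non-zero code,
// up to maxRestarts times. Returns how many restarts were needed and
// whether the last run exited cleanly.
function supervise(command, args, maxRestarts) {
  let restarts = 0;
  for (;;) {
    const { status } = spawnSync(command, args, { stdio: 'inherit' });
    if (status === 0) {
      return { restarts, healthy: true };
    }
    if (restarts === maxRestarts) {
      return { restarts, healthy: false };
    }
    restarts += 1;
  }
}

// process.execPath is the path to the Node.js binary running this script.
// The child below simulates an app that crashes immediately every time.
const crashing = supervise(process.execPath, ['-e', 'process.exit(1)'], 2);
console.log(crashing);

// This child simulates an app that exits cleanly on the first run.
const clean = supervise(process.execPath, ['-e', 'process.exit(0)'], 2);
console.log(clean);
```

A real supervisor such as PM2 or systemd keeps long-running apps alive indefinitely and usually adds a delay between restarts, so treat this purely as an illustration of the restart loop.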

There are many process supervisors out there, including supervisord, runit, and systemd, with systemd as the one you’re likely to use in most situations, as it’s built into most Linux distributions these days. You’ll see an example of how to use systemd later in this blog post. For this example, I picked PM2 because it has features designed specifically for Node.js apps. To use these features, create a configuration file called app.config.js, as shown in Example 31:

Example 31. PM2 configuration file (ch3/ansible/roles/sample-app/files/app.config.js)

module.exports = {
  apps : [{
    name   : "sample-app",
    script : "./app.js",       // (1)
    exec_mode: "cluster",      // (2)
    instances: "max",          // (3)
    env: {
      "NODE_ENV": "production" // (4)
    }
  }]
}

This file configures PM2 to do the following:

1	Run app.js to start the app.
2	Run in cluster mode, so that instead of a single Node.js process, you get one process per CPU, ensuring your app is able to take advantage of all the CPUs on your server.
3	Configure cluster mode to use all available CPUs.
4	Set the `NODE_ENV` environment variable to "production," which is the convention many Node.js apps and libraries use to decide whether to run in production mode rather than development mode.

Finally, create tasks/main.yml with the contents shown in Example 32:

Example 32. The sample-app role’s tasks (ch3/ansible/roles/sample-app/tasks/main.yml)

- name: Copy sample app                          # (1)
  copy:
    src: ./
    dest: /home/app-user/

- name: Start sample app using pm2               # (2)
  shell: pm2 start app.config.js
  args:
    chdir: /home/app-user/

- name: Save pm2 app list so it survives reboot  # (3)
  shell: pm2 save

1	Copy the sample app code (app.js and app.config.js) from the files folder to the server.
2	Use PM2 to start the app in the background and start monitoring it.
3	Save the list of apps PM2 is running so that if the server reboots, PM2 will automatically restart those apps.

These changes address most of the concerns in Watch out for snakes: these examples have several problems, improving your security posture (no more root user) and the reliability and performance of your app (process supervisor, cluster mode).

To try this code out, make sure you have Ansible installed, authenticate to AWS as described in Authenticating to AWS on the command line, and run the following command:

$ ansible-playbook -v -i inventory.aws_ec2.yml configure_sample_app_playbook.yml

Ansible will discover your servers, configure each one with all the dependencies it needs, and run the app on each one. At the end, you should see the IP addresses of your servers, as shown in the following log output (truncated for readability):

PLAY RECAP ************************************
13.58.56.201               : ok=9    changed=8
3.135.188.118              : ok=9    changed=8
3.21.44.253                : ok=9    changed=8
localhost                  : ok=6    changed=4

Copy the IP of one of the three servers, open http://<IP>:8080 in your web browser, and you should see the familiar "Hello, World!" text once again.

While three servers is great for redundancy, it’s not so great for usability, as your users typically want just a single endpoint to hit. This requires deploying a load balancer, as described in the next section.

Example: Deploy a Load Balancer Using Ansible and Nginx

A load balancer is a piece of software that can distribute load across multiple servers or apps. You give your users a single endpoint to hit, which is the load balancer, and under the hood, the load balancer forwards the requests it receives to a number of different endpoints, using various algorithms (e.g., round-robin, hash-based, least-response-time) to process requests as efficiently as possible. There are many popular load balancer options out there, such as Apache, Nginx, and HAProxy, as well as cloud-specific load balancing services, such as AWS Elastic Load Balancer, GCP Cloud Load Balancer, and Azure Load Balancer.
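
As a rough illustration of the simplest of these algorithms, here is a minimal round-robin sketch in Node.js. It only shows how a balancer might pick the next backend for each request; it doesn't forward any traffic, and the function name and backend addresses are hypothetical.

```javascript
// Hypothetical sketch of round-robin backend selection: each call returns
// the next backend in the list, wrapping around to the start at the end.
function createRoundRobinBalancer(backends) {
  if (backends.length === 0) {
    throw new Error('at least one backend is required');
  }
  let next = 0;
  return function pickBackend() {
    const backend = backends[next];
    next = (next + 1) % backends.length;
    return backend;
  };
}

// Illustrative backend addresses (assumptions, not real servers).
const pickBackend = createRoundRobinBalancer([
  '10.0.0.1:8080',
  '10.0.0.2:8080',
  '10.0.0.3:8080',
]);

// Simulate four incoming requests.
const picks = [];
for (let i = 0; i < 4; i++) {
  picks.push(pickBackend());
}
console.log(picks);
```

With three backends, the fourth request wraps around to the first backend again. Real load balancers layer health checks, weights, and connection tracking on top of this basic rotation.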

In the cloud, you’d most likely use a cloud load balancer, as you’ll see later in this blog post. However, for the purposes of server orchestration, I decided to show you a simplified example of how to run your own load balancer, as server orchestration techniques should work on-prem as well. Therefore, you’ll be deploying Nginx.

To do that, you need one more server. If you have one already with SSH access, you can use it, and skip forward a few paragraphs. If not, you can deploy one more EC2 instance using the same create_ec2_instances_playbook.yml, but with a new variables file, nginx-vars.yml, with the contents shown in Example 33:

Example 33. Variables file to create an EC2 instance for nginx (ch3/ansible/nginx-vars.yml)

num_instances: 1
base_name: nginx_instances
http_port: 80

This will create a single EC2 instance, with the base_name "nginx_instances," and it will allow requests on port 80, which is the default port for HTTP. Run the playbook with this vars file as follows:

$ ansible-playbook \
  -v create_ec2_instances_playbook.yml \
  --extra-vars "@nginx-vars.yml"

This should create one more EC2 instance you can use for Nginx. Since the base_name for that instance is nginx_instances, that will also be the group name in the inventory, so configure the variables for this group by creating group_vars/nginx_instances.yml with the contents shown in Example 34:

Example 34. Configure group variables for your Nginx servers (ch3/ansible/group_vars/nginx_instances.yml)

ansible_user: ec2-user
ansible_ssh_private_key_file: ansible-ch3.key
ansible_host_key_checking: false

Now you can create a new playbook to configure these servers with Nginx. Create a new file called configure_nginx_playbook.yml with the contents shown in Example 35:

Example 35. Use a role to configure the EC2 instance with Nginx (ch3/ansible/configure_nginx_playbook.yml)

- name: Configure servers to run nginx
  hosts: nginx_instances # (1)
  gather_facts: true
  become: true
  roles:
    - role: nginx        # (2)

This playbook does the following:

1	Target the `nginx_instances` group you just configured in your inventory.
2	Configure the servers in that group using a new role called `nginx`, which is described next.

Create a new folder for the nginx role with tasks and templates subfolders:

roles
  ├── nginx
  │   ├── tasks
  │   │   └── main.yml
  │   └── templates
  │       └── nginx.conf.j2
  ├── nodejs-app
  └── sample-app

Inside of nginx/templates/nginx.conf.j2, create an Nginx configuration file template, as shown in Example 36:

Example 36. Nginx configuration file template (ch3/ansible/roles/nginx/templates/nginx.conf.j2)

user nginx;
worker_processes auto;
error_log /var/log/nginx/error.log notice;
pid /run/nginx.pid;

events {
    worker_connections 1024;
}

http {
    log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
                      '$status $body_bytes_sent "$http_referer" '
                      '"$http_user_agent" "$http_x_forwarded_for"';

    access_log  /var/log/nginx/access.log  main;

    include             /etc/nginx/mime.types;
    default_type        application/octet-stream;

    upstream backend {                                       # (1)
        {% for host in groups['sample_app_instances'] %}     # (2)
        server {{ hostvars[host]['public_dns_name'] }}:8080; # (3)
        {% endfor %}
    }

    server {
        listen       80;                                     # (4)
        listen       [::]:80;

        location / {                                         # (5)
                proxy_pass http://backend;
        }
    }
}

Most of this file is standard (boilerplate) Nginx configuration—check out the Nginx documentation if you’re curious to understand what it does—so I’ll just point out a few items to focus on:

1	Use the upstream keyword to define a group of servers that can be referenced elsewhere in this file by the name `backend`. You’ll see where this is used shortly.
2	Use Jinja templating syntax to loop over the servers in the `sample_app_instances` group.
3	Use Jinja templating syntax to configure the `backend` upstream to route traffic to the public address and port 8080 of each server in the `sample_app_instances` group.
4	Configure Nginx to listen on port 80.
5	Configure Nginx as a load balancer, forwarding requests to the / URL to the `backend` upstream.

In short, the preceding configuration file will configure Nginx to load balance traffic across the servers you deployed to run the sample app.
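
If the Jinja loop in the upstream block feels abstract, the following Node.js sketch mimics what the rendering step produces for a given list of hosts. This is only an approximation for illustration: Ansible does the real rendering with Jinja, and the renderUpstream function and the example.com host names here are hypothetical.

```javascript
// Hypothetical sketch: build the same kind of upstream block that the Jinja
// for-loop in the template emits, with one "server" line per host.
function renderUpstream(name, hosts, port) {
  const servers = hosts.map((host) => `    server ${host}:${port};`);
  return [`upstream ${name} {`, ...servers, '}'].join('\n');
}

// Illustrative host names (assumptions, standing in for the public DNS
// names that the inventory plugin would discover).
const rendered = renderUpstream(
  'backend',
  ['ec2-a.example.com', 'ec2-b.example.com'],
  8080
);
console.log(rendered);
```

The point to notice is that the list of backends is computed at render time, so if you add or remove app servers, you need to re-run the playbook for Nginx to pick up the change.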

Create nginx/tasks/main.yml with the contents shown in Example 37:

Example 37. nginx role tasks (ch3/ansible/roles/nginx/tasks/main.yml)

- name: Install Nginx              # (1)
  yum:
    name: nginx

- name: Copy Nginx config          # (2)
  template:
    src: nginx.conf.j2
    dest: /etc/nginx/nginx.conf

- name: Start Nginx                # (3)
  systemd_service:
    state: started
    enabled: true
    name: nginx

This file defines the tasks for the nginx role, which are the following:

1	Install Nginx.
2	Render the Nginx configuration file and copy it to the server.
3	Start Nginx. Note the use of `systemd` as a process supervisor to restart Nginx in case it crashes or the server reboots.

Run this playbook to configure Nginx:

$ ansible-playbook -v -i inventory.aws_ec2.yml configure_nginx_playbook.yml

Wait a few minutes for everything to deploy, and in the end, you should see log output that looks like this:

PLAY RECAP
xxx.us-east-2.compute.amazonaws.com : ok=4    changed=2    failed=0

The value on the left, "xxx.us-east-2.compute.amazonaws.com," is a domain name you can use to access the Nginx server. If you open http://xxx.us-east-2.compute.amazonaws.com (this time with no port number, as Nginx is listening on port 80, the default port for HTTP) in your browser, you should see "Hello, World!" yet again. Each time you refresh the page, Nginx will send that request to a different EC2 instance. Congrats, you now have a single endpoint you can give your users, and that endpoint will automatically balance the load across multiple servers!

Example: Roll Out Updates with Ansible

So you’ve now seen how to deploy using a server orchestration tool, but what about doing an update? Some configuration management tools support various deployment strategies (a topic you’ll learn more about in Part 5), such as a rolling deployment, where you update your servers in batches, so some servers are always running and serving traffic, while others are being updated. With Ansible, the easiest way to have it do a rolling update is to add the serial parameter to configure_sample_app_playbook.yml, as shown in Example 38:

Example 38. Use the serial parameter to enable rolling deployment (ch3/ansible/configure_sample_app_playbook.yml)

- name: Configure servers to run the sample-app

  # ... (other params omitted for clarity) ...

  serial: 1               # (1)
  max_fail_percentage: 30 # (2)

1	Setting `serial` to 1 tells Ansible to apply changes to one server at a time. Since you have three servers total, this ensures that two servers are always available to serve traffic, while one goes down briefly for an update.
2	The `max_fail_percentage` parameter tells Ansible to abort a deployment if more than this percentage of the servers in a batch hit an error during the update. With `serial` set to 1, each batch is a single server, so if a single server fails to update, Ansible will not try to deploy the changes to any other servers, and you never lose more than one server to a broken update.
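
To see how these two settings interact, here is a simplified model of a rolling update in Node.js. It is a sketch under stated assumptions, not Ansible's actual implementation: it assumes the failure percentage is evaluated per batch and that the threshold must be exceeded (not just equaled) to abort. The rollingUpdate function, the server names, and the updateHost callback are all hypothetical.

```javascript
// Simplified model of a rolling update: process hosts in batches, and stop
// as soon as the failure percentage within a batch exceeds the threshold.
function rollingUpdate(hosts, batchSize, maxFailPercentage, updateHost) {
  const updated = [];
  const failed = [];

  for (let i = 0; i < hosts.length; i += batchSize) {
    const batch = hosts.slice(i, i + batchSize);
    let batchFailures = 0;

    for (const host of batch) {
      // updateHost returns true on success and false on failure.
      if (updateHost(host)) {
        updated.push(host);
      } else {
        failed.push(host);
        batchFailures += 1;
      }
    }

    const failPercentage = (batchFailures / batch.length) * 100;
    if (failPercentage > maxFailPercentage) {
      // Abort: the remaining hosts keep running the old version.
      const untouched = hosts.slice(i + batch.length);
      return { updated, failed, untouched, aborted: true };
    }
  }

  return { updated, failed, untouched: [], aborted: false };
}

// Illustrative run: three servers, one at a time, where the update on the
// second server fails.
const hosts = ['server-1', 'server-2', 'server-3'];
const outcome = rollingUpdate(hosts, 1, 30, (host) => host !== 'server-2');
console.log(outcome);
```

In this run, the update stops after the second server fails, so the third server is never touched and keeps serving traffic with the old version.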

Let’s give the rolling deployment a shot. Update the text that the app responds with in app.js, as shown in Example 39:

Example 39. Update the app to respond with the text "Fundamentals of DevOps!" (ch3/ansible/roles/sample-app/files/app.js)

  res.end('Fundamentals of DevOps!\n');

And re-run the playbook:

$ ansible-playbook -v -i inventory.aws_ec2.yml configure_sample_app_playbook.yml

You should see Ansible rolling out the change to one server at a time. When it’s done, if you refresh the Nginx URL in your browser, you should see the text "Fundamentals of DevOps!"

Get your hands dirty

Here are a few exercises you can try at home to get a better feel for using Ansible for server orchestration:

Figure out how to scale the number of instances running the sample app from three to four.
Try restarting one of the instances using the AWS Console. How does Nginx handle it while the instance is rebooting? Does the sample app still work after the reboot? How does this compare to the behavior you saw in Part 1?
Try terminating one of the instances using the AWS Console. How does Nginx handle it? How can you restore the instance?

When you’re done experimenting with Ansible, you should manually undeploy the EC2 instances by finding each one in the EC2 Console (look for the instance IDs the playbook writes to the log), clicking "Instance state," and choosing "Terminate instance" in the drop-down, as shown in Figure 18. This ensures that your account doesn’t start accumulating any unwanted charges.

VM Orchestration

The idea with VM orchestration is to do the following:

Create VM images that have your apps and all their dependencies fully installed and configured.
Deploy the VM images across a cluster of servers.
Scale the number of servers up or down depending on your needs.
When you need to deploy an update, create new VM images, deploy those onto new servers, and then undeploy the old servers.

This is a slightly more modern approach that works best with cloud providers such as AWS, GCP, and Azure, where the servers are all virtual servers, so you can spin up new ones and tear down old ones in minutes. That said, you can also use virtualization on-prem with tools from VMware, Citrix, Microsoft (Hyper-V), and so on. We’ll take a look at a VM orchestration example using AWS, but be aware that the basic techniques here apply to most VM orchestration tools, whether in the cloud or on-prem.

Key takeaway #2

VM orchestration is an immutable infrastructure approach where you deploy and manage VM images across virtualized servers.

To get a feel for VM orchestration let’s go through an example. This requires the following three things:

A tool for building VM images: Just as in Part 2, you’ll use Packer to create VM images for AWS.
A tool for orchestrating VMs: This blog post series primarily uses AWS, so you’ll use AWS Auto Scaling Groups.
A tool for managing your infrastructure as code: Instead of setting up Auto Scaling Groups by manually clicking around, you’ll use the same IaC tool as in Part 2, OpenTofu, to manage your infrastructure as code.

We’ll start with the first item, building VM images.

Example: Build a VM Image Using Packer

Head into the fundamentals-of-devops folder you created in Part 1 to work through the examples in this blog post series, and create a new subfolder for the Packer code:

$ cd fundamentals-of-devops

$ mkdir -p ch3/packer

$ cd ch3/packer

Copy the Packer template you created in Part 2 into the new ch3/packer folder:

$ cp ../../ch2/packer/sample-app.pkr.hcl .

You should also copy app.js (the sample app) and app.config.js (the PM2 configuration file) from the server orchestration section of this blog post into the ch3/packer folder:

$ cp ../ansible/roles/sample-app/files/app*.js .

Example 40 shows the updates to make to the Packer template:

Example 40. Update the Packer template to use PM2 as a process supervisor and create app-user (ch3/packer/sample-app.pkr.hcl)

build {

  sources = [

    "source.amazon-ebs.amazon_linux"

  ]



  provisioner "file" {                                                   (1)

    sources     = ["app.js", "app.config.js"]

    destination = "/tmp/"

  }



  provisioner "shell" {

    inline = [

      "curl -fsSL https://rpm.nodesource.com/setup_21.x | sudo bash -",

      "sudo yum install -y nodejs",

      "sudo adduser app-user",                                           (2)

      "sudo mv /tmp/app.js /tmp/app.config.js /home/app-user/",          (3)

      "sudo npm install pm2@latest -g",                                  (4)

      "eval \"$(sudo su app-user bash -c 'pm2 startup' | tail -n1)\""    (5)

    ]

    pause_before = "30s"

  }

}

The main changes are security and reliability improvements similar to the ones you made in the server orchestration section: that is, use PM2 as a process supervisor and create app-user to run the app (instead of using the root user).

1	Copy two files, app.js and app.config.js, onto the server (into the /tmp folder, as the final destination, the home folder of `app-user`, doesn’t exist until a later step).
2	Create `app-user`. This will also automatically create a home folder for `app-user`.
3	Move app.js and app.config.js from the /tmp folder to the home folder of `app-user`.
4	Install PM2.
5	Configure PM2 to run on boot (as `app-user`) so if your server ever restarts, PM2 will restart your app.

To build the AMI, make sure Packer is installed, authenticate to AWS as described in Authenticating to AWS on the command line, and run the following commands:

$ packer init sample-app.pkr.hcl

$ packer build sample-app.pkr.hcl

When the build is done, Packer will output the ID of the newly created AMI. Make sure to jot this ID down somewhere, as you’ll need it shortly.

Example: Deploy a VM Image in an Auto Scaling Group Using OpenTofu

The next step is to deploy the AMI. In Part 2, you used OpenTofu to deploy an AMI on a single EC2 instance. The goal now is to see VM orchestration at play, which means deploying multiple servers, or what’s sometimes called a cluster. Most cloud providers offer a native way to run VMs across a cluster: for example, AWS offers Auto Scaling Groups (ASG), GCP offers Managed Instance Groups, and Azure offers Virtual Machine Scale Sets. For this example, you’ll be using an AWS ASG, as that offers a number of nice features:

Cluster management: ASGs make it easy to launch multiple instances and manually resize the cluster.
Auto scaling: You can also configure the ASG to resize the cluster automatically in response to load.
Auto healing: The ASG monitors all the instances in the cluster and automatically replaces any instance that crashes.

Let’s use a reusable OpenTofu module called asg from this blog post series’s sample code repo to deploy an ASG. You can find the module in the ch3/tofu/modules/asg folder. This is a simple module that creates three main resources:

A launch template, which is a bit like a blueprint that specifies the configuration to use for each EC2 instance.
An ASG which uses the configuration in the launch template to stamp out EC2 instances. The ASG will deploy these instances into the Default VPC. See the note on Default VPCs in A Note on Default Virtual Private Clouds.
A security group which controls the traffic that can go in and out of each EC2 instance.

A Note on Default Virtual Private Clouds

All the AWS examples in the early parts of this blog post series use the Default VPC in your AWS account. A VPC, or virtual private cloud, is an isolated area of your AWS account that has its own virtual network and IP address space. Just about every AWS resource deploys into a VPC. If you don’t explicitly specify a VPC, the resource will be deployed into the Default VPC, which is part of every AWS account created after 2013 (if you deleted your Default VPC, you can create a new Default VPC using the VPC Console). It’s not a good idea to use the Default VPC for production apps, but it’s OK to use it for learning and testing. In Part 7 [coming soon], you’ll learn more about VPCs, including how to create a custom VPC that you can use for production apps instead of the Default VPC.

To use the asg module, create a live/asg-sample folder to act as a root module:

$ cd fundamentals-of-devops

$ mkdir -p ch3/tofu/live/asg-sample

$ cd ch3/tofu/live/asg-sample

Inside the asg-sample folder, create a main.tf file with the initial contents shown in Example 41:

Example 41. Configure the asg module (ch3/tofu/live/asg-sample/main.tf)

provider "aws" {

  region = "us-east-2"

}



module "asg" {

  source = "github.com/brikis98/devops-book//ch3/tofu/modules/asg"



  name = "sample-app-asg"                                   (1)



  # TODO: fill in with your own AMI ID!

  ami_id        = "ami-0f5b3d9c244e6026d"                   (2)

  user_data     = filebase64("${path.module}/user-data.sh") (3)

  app_http_port = 8080                                      (4)



  instance_type    = "t2.micro"                             (5)

  min_size         = 1                                      (6)

  max_size         = 10                                     (7)

  desired_capacity = 3                                      (8)

}

The preceding code sets the following parameters:

1	`name`: The name to use for the launch template, ASG, security group, and all other resources created by the module.
2	`ami_id`: The AMI to use for each EC2 instance. You’ll need to set this to the ID of the AMI you built from the Packer template in the previous section.
3	`user_data`: The user data script to run on each instance during boot. The contents of user-data.sh are shown in Example 42.
4	`app_http_port`: The port to open in the security group to allow the app to receive HTTP requests.
5	`instance_type`: The type of instances to run in the ASG.
6	`min_size`: The minimum number of instances to run in the ASG.
7	`max_size`: The maximum number of instances to run in the ASG.
8	`desired_capacity`: The initial number of instances to run in the ASG.
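
To see how the three size parameters relate, here is a small JavaScript sketch. It is a simplified model, not AWS's actual implementation: the function name `boundCapacity` is hypothetical, and it only illustrates the idea that the number of instances stays between `min_size` and `max_size`, with `desired_capacity` as the starting point.

```javascript
// Simplified model: bound a requested instance count by min_size and max_size.
function boundCapacity(requested, minSize, maxSize) {
  return Math.min(Math.max(requested, minSize), maxSize);
}

// The values from Example 41
const settings = { minSize: 1, maxSize: 10, desiredCapacity: 3 };

console.log(boundCapacity(settings.desiredCapacity, settings.minSize, settings.maxSize)); // 3
console.log(boundCapacity(25, settings.minSize, settings.maxSize)); // 10: capped at max_size
console.log(boundCapacity(0, settings.minSize, settings.maxSize)); // 1: never below min_size
```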

Create a file called user-data.sh with the contents shown in Example 42:

Example 42. The user data script for each EC2 instance, which uses PM2 to start the sample app (ch3/tofu/live/asg-sample/user-data.sh)

#!/usr/bin/env bash



set -e



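sudo su app-user <<'EOF'  (1)

cd /home/app-user         (2)

pm2 start app.config.js   (3)

pm2 save                  (4)

EOF

This user data script does the following:

1	Switch to `app-user`. The commands are passed to `su` via a heredoc because user data scripts run non-interactively, so a bare `su` would not apply to the lines that follow it.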
2	Go into the home folder for `app-user`, which is where the Packer template copied the sample app code.
3	Use PM2 to run the sample app.
4	Save the running app status so that if the server reboots, PM2 will restart the sample app.

If you were to run apply right now, you’d get an ASG with three EC2 instances running your sample app. While this is great for redundancy, as discussed in the server orchestration section, you typically want to give your users just a single endpoint to hit. This requires deploying a load balancer, as described in the next section.

Example: Deploy an Application Load Balancer Using OpenTofu

In the server orchestration section, you deployed your own load balancer using Nginx. This is a simplified deployment that works fine for an example, but has a number of drawbacks if you try to use it for production apps:

Availability: You are running only a single instance for your load balancer. If it crashes, your users experience an outage.
Scalability: If load exceeds what a single server can handle, users will see degraded performance or an outage.
Maintenance: Keeping the load balancer up to date is entirely up to you. Moreover, when you need to update the load balancer itself (e.g., update to a new version of Nginx), it’s tricky to do so without downtime.
Security: The load balancer server is not especially hardened against attacks.
Encryption: If you want to encrypt data in transit (e.g., use HTTPS and TLS)—which you should for just about all production use cases—you’ll have to set it all up manually (you’ll learn more about encryption in Part 8 [coming soon]).

To be clear, there’s nothing wrong with Nginx: if you put the work in, there are ways to address all of these issues with Nginx. However, it’s a considerable amount of work. One of the big benefits of the cloud is that most cloud providers offer managed services that can do this work for you. Load balancing is a very common problem, and as I mentioned before, almost every cloud provider offers a managed service for load balancing, such as AWS Elastic Load Balancer, GCP Cloud Load Balancer, and Azure Load Balancer. All of these provide a number of powerful features out-of-the-box. For example, the AWS Elastic Load Balancer (ELB) gives you the following:

Availability: Under the hood, AWS automatically deploys multiple servers for an ELB so you don’t get an outage if one server crashes.
Scalability: AWS monitors load on the ELB, and if it is starting to exceed capacity, AWS automatically deploys more servers.
Maintenance: AWS automatically keeps the load balancer up to date, with zero downtime.
Security: AWS load balancers are hardened against a variety of attacks, including meeting the requirements of a variety of security standards (e.g., SOC 2, ISO 27001, HIPAA, PCI, FedRAMP) out-of-the-box.^[15]
Encryption: AWS has out-of-the-box support for HTTPS, Mutual TLS, TLS Offloading, auto-rotated TLS certs, and more.

Using a managed service for load balancing can be a huge time saver, so let’s use an AWS load balancer. There are actually several types of AWS load balancers to choose from; the one that’ll be the best fit for the simple sample app is the Application Load Balancer (ALB). The ALB consists of several parts, as shown in Figure 24:

Figure 24. An ALB consists of listeners, listener rules, and target groups.

Listener: Listen for requests on a specific port (e.g., 80) and protocol (e.g., HTTP).
Listener rule: Specify which requests that come into a listener to route to which target group based on rules that match on request parameters such as the path (e.g., /foo and /bar) and hostname (e.g., foo.example.com and bar.example.com).
Target groups: One or more servers that receive requests from the load balancer. The target group also performs health checks on these servers by sending each server a request on a configurable interval (e.g., every 30 seconds), and only considering the server as healthy if it returns an expected response (e.g., a 200 OK) within a configurable time period (e.g., within 2 seconds). The target group will only send requests to servers that pass its health checks.
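
If it helps to see how these three parts fit together, here is a rough JavaScript model. Everything in it is made up for illustration (the addresses, the group names, the `route` function), and it simplifies heavily: a real ALB always has a default rule and has its own behavior when no targets are healthy.

```javascript
// Illustrative model of an ALB: listener rules pick a target group, and the
// target group only routes to targets that passed their last health check.
const targetGroups = {
  foo: [
    { address: '10.0.0.1:8080', healthy: true },
    { address: '10.0.0.2:8080', healthy: false }, // failed its health check
    { address: '10.0.0.3:8080', healthy: true },
  ],
  bar: [{ address: '10.0.1.1:8080', healthy: true }],
};

// Listener rules are evaluated in order; the first match wins.
const listenerRules = [
  { pathPrefix: '/foo', targetGroup: 'foo' },
  { pathPrefix: '/bar', targetGroup: 'bar' },
];

let requestCount = 0;

function route(path) {
  const rule = listenerRules.find((r) => path.startsWith(r.pathPrefix));
  if (!rule) return null; // simplification: no default rule in this model
  const healthy = targetGroups[rule.targetGroup].filter((t) => t.healthy);
  if (healthy.length === 0) return null; // simplification: nothing to route to
  // Simple round robin across the healthy targets
  const target = healthy[requestCount % healthy.length];
  requestCount += 1;
  return target.address;
}

console.log(route('/foo')); // 10.0.0.1:8080
console.log(route('/foo')); // 10.0.0.3:8080
console.log(route('/bar')); // 10.0.1.1:8080
console.log(route('/baz')); // null
```

Notice that 10.0.0.2 never receives a request: it failed its health check, so the target group skips it.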

The blog post series’s sample code repo includes a module called alb in the ch3/tofu/modules/alb folder that you can use to deploy an ALB. Note that this is a very simple ALB—it deploys into the Default VPC (see the note in A Note on Default Virtual Private Clouds) and only has a single listener rule where it forwards all requests to a single target group—but it should suffice for our purposes in this blog post.

Example 43 shows how to update the asg-sample module to use the alb module:

Example 43. Configure the alb module (ch3/tofu/live/asg-sample/main.tf)

module "asg" {

  source = "github.com/brikis98/devops-book//ch3/tofu/modules/asg"



  # ... (other params omitted) ...



}



module "alb" {

  source = "github.com/brikis98/devops-book//ch3/tofu/modules/alb"



  name                  = "sample-app-alb" (1)

  alb_http_port         = 80               (2)

  app_http_port         = 8080             (3)

  app_health_check_path = "/"              (4)

}

The preceding code sets the following parameters on the alb module:

1	`name`: The name to use for the ALB, target group, security group, and all other resources created by the module.
2	`alb_http_port`: The port the ALB (the listener) listens on for HTTP requests.
3	`app_http_port`: The port the app listens on for HTTP requests. The ALB target group will send traffic and health checks to this port.
4	`app_health_check_path`: The path to use when sending health check requests to the app.

The one missing piece is the connection between the ASG and the ALB: that is, how does the ALB know which EC2 instances to send traffic to (which instances to put in its target group)? To tie these pieces together, go back to your usage of the asg module, and update it with one parameter, as shown in Example 44:

Example 44. Configure the asg module (ch3/tofu/live/asg-sample/main.tf)

module "asg" {

  source = "github.com/brikis98/devops-book//ch3/tofu/modules/asg"



  # ... (other params omitted) ...



  target_group_arns = [module.alb.target_group_arn]

}

The preceding code sets the target_group_arns parameter, which will change the ASG behavior in two ways:

First, it’ll configure the ASG to register all of its instances in the specified target group, including the initial set of instances when you first launch the ASG, as well as any new instances that launch later, either as a result of a deployment or auto healing or auto scaling.
Second, it’ll configure the ASG to use the ALB for health checks and auto healing. By default, the auto healing feature in the ASG is simple: it replaces any instance that has crashed (a hardware issue). However, if the instance is still running, and it’s the app that crashed or stopped responding (a software issue), the ASG won’t know to replace it. Configuring the ASG to use the ALB for health checks tells the ASG to replace an instance if it fails the load balancer’s health check, which gives you more robust auto healing, as the load balancer health check will detect both hardware and software issues.
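
The difference between the two auto healing modes can be sketched in a few lines of JavaScript. This is a simplified model with hypothetical names (`shouldReplace`, `passesLbHealthCheck`); the `'EC2'` and `'ELB'` labels mirror the two ASG health check types, but the logic is an illustration, not AWS's implementation.

```javascript
// Simplified model of the two auto healing modes described above.
// 'EC2' checks only whether the instance itself is running; 'ELB' also
// requires the instance to pass the load balancer's health check.
function shouldReplace(instance, healthCheckType) {
  if (!instance.running) return true; // the instance itself is down
  if (healthCheckType === 'ELB') return !instance.passesLbHealthCheck;
  return false;
}

// The instance is up, but the app on it has crashed
const crashedApp = { running: true, passesLbHealthCheck: false };

console.log(shouldReplace(crashedApp, 'EC2')); // false: the ASG sees no problem
console.log(shouldReplace(crashedApp, 'ELB')); // true: the app is not responding
```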

The final change to the asg-sample module is to add the load balancer’s domain name as an output variable in outputs.tf, as shown in Example 45:

Example 45. Output the ALB domain name (ch3/tofu/live/asg-sample/outputs.tf)

output "alb_dns_name" {

  value = module.alb.alb_dns_name

}

To deploy the module, make sure OpenTofu is installed, authenticate to AWS as described in Authenticating to AWS on the command line, and run the following commands:

$ tofu init

$ tofu apply

After a few minutes, everything should be deployed, and you should see the ALB domain name as an output:

Apply complete! Resources: 10 added, 0 changed, 0 destroyed.



Outputs:



alb_dns_name = "sample-app-tofu-656918683.us-east-2.elb.amazonaws.com"

Open this domain name up in your web browser, and you should see "Hello, World!" once again. Congrats, you now have a single endpoint, the load balancer domain name, that you can give your users, and when users hit it, the load balancer will distribute their requests across all the instances in your ASG!

Example: Roll Out Updates with OpenTofu and Auto Scaling Groups

You’ve seen the initial deployment with VM orchestration, but what about rolling out updates? Most of the VM orchestration tools have support for zero-downtime deployments and various deployment strategies. For example, the ASGs in AWS have native support for a feature called instance refresh, which can update your instances automatically by doing a rolling deployment. Example 46 shows how to enable instance refresh in the asg module:

Example 46. Enable instance refresh for the ASG (ch3/tofu/live/asg-sample/main.tf)

module "asg" {

  source = "github.com/brikis98/devops-book//ch3/tofu/modules/asg"



  # ... (other params omitted) ...



  instance_refresh = {

    min_healthy_percentage = 100  (1)

    max_healthy_percentage = 200  (2)

    auto_rollback          = true (3)

  }

}

The preceding code sets the following parameters:

1	`min_healthy_percentage`: Setting this to 100% means that the cluster will never have fewer than the desired number of instances (initially, three), even during deployment. Whereas with server orchestration, you updated instances in place, with VM orchestration, you’ll deploy new instances, as per the next parameter.
2	`max_healthy_percentage`: Setting this to 200% means that to deploy updates, the cluster will deploy totally new instances, up to twice the original size of the cluster, wait for the new instances to pass health checks, and then undeploy the old instances. So if you started with three instances, then during deployment, you’ll go up to six instances, with three new and three old, and when the new instances pass health checks, you’ll go back to three instances by undeploying the three old ones.
3	`auto_rollback`: If something goes wrong during deployment, and the new instances fail to pass health checks, this setting will automatically initiate a rollback, putting your cluster back to its previous working condition.
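
The two percentages translate into instance counts as in the following JavaScript sketch. The function name `refreshBounds` is hypothetical, and the rounding (up for the minimum, down for the maximum) is an assumption made for the example; with the values used here, no rounding is needed.

```javascript
// Compute the instance-count bounds during a rolling deployment from the
// desired capacity and the two percentage settings.
function refreshBounds(desiredCapacity, minHealthyPercentage, maxHealthyPercentage) {
  return {
    minInstances: Math.ceil((desiredCapacity * minHealthyPercentage) / 100),
    maxInstances: Math.floor((desiredCapacity * maxHealthyPercentage) / 100),
  };
}

console.log(refreshBounds(3, 100, 200)); // { minInstances: 3, maxInstances: 6 }
```

So with three instances, the cluster never drops below three and can temporarily grow to six while old and new instances run side by side.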

Run apply one more time to enable the instance refresh setting. Once that’s done, you can try rolling out a change. For example, update app.js in the packer folder to respond with "Fundamentals of DevOps!", as shown in Example 47:

Example 47. Update the app to respond with the text "Fundamentals of DevOps!" (ch3/packer/app.js)

res.end('Fundamentals of DevOps!\n');

Next, build a new AMI using Packer:

$ packer build sample-app.pkr.hcl

This will give you a new AMI ID. Update the ami_id in the asg-sample module to this new ID and run apply one more time. You should then see a plan output that looks something like this (truncated for readability):

$ tofu apply



OpenTofu will perform the following actions:



  # aws_autoscaling_group.sample_app will be updated in-place

  ~ resource "aws_autoscaling_group" "sample_app" {

        # (27 unchanged attributes hidden)



      ~ launch_template {

            id      = "lt-0bc25ef067814e3c0"

            name    = "sample-app-tofu20240414163932598800000001"

          ~ version = "1" -> (known after apply)

        }



        # (3 unchanged blocks hidden)

    }



  # aws_launch_template.sample_app will be updated in-place

  ~ resource "aws_launch_template" "sample_app" {

      ~ image_id       = "ami-0f5b3d9c244e6026d" -> "ami-0d68b7b6546331281"

      ~ latest_version = 1 -> (known after apply)

        # (10 unchanged attributes hidden)

    }

This plan output shows that the launch template has changed, due to the new AMI ID, and as a result, the version of the launch template used in the ASG has changed. This will result in an instance refresh. Type in yes, hit Enter, and AWS will kick off the instance refresh process in the background. If you go to the EC2 Console, click Auto Scaling Groups in the left nav, find your ASG, and click the "Instance refresh" tab, you should be able to see the instance refresh in progress, as shown in Figure 25.

Figure 25. Using the EC2 console to see an ASG instance refresh in progress

During this process, the ASG will launch three new EC2 instances, and the ALB will start performing health checks. Once the new instances start to pass health checks, the ASG will undeploy the old instances, leaving you with just the three new instances running the new code. The whole process should take around five minutes.

During this deployment, the load balancer URL should always return a successful response, as this is a zero-downtime deployment. You can even check this by opening a new terminal tab, and running the following Bash one-liner:

$ while true; do curl http://<load_balancer_url>; sleep 1; done

This code runs curl, an HTTP client, in a loop, hitting your ALB once per second and allowing you to see the zero-downtime deployment in action. For the first couple minutes, you should see only responses from the old instances: "Hello, World!" Then, as new instances start to pass health checks, the ALB will begin sending traffic to them, and you should see the response from the ALB alternate between "Hello, World!" and "Fundamentals of DevOps!" After another couple minutes, the "Hello, World!" message will disappear, and you’ll see only "Fundamentals of DevOps!", which means all the old instances have been shut down. The output will look something like this:

Hello, World!

Hello, World!

Hello, World!

Hello, World!

Hello, World!

Hello, World!

Fundamentals of DevOps!

Hello, World!

Fundamentals of DevOps!

Hello, World!

Fundamentals of DevOps!

Hello, World!

Fundamentals of DevOps!

Hello, World!

Fundamentals of DevOps!

Hello, World!

Fundamentals of DevOps!

Fundamentals of DevOps!

Fundamentals of DevOps!

Fundamentals of DevOps!

Fundamentals of DevOps!

Fundamentals of DevOps!
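
If you save the responses rather than just watching them scroll by, a few lines of JavaScript can summarize them. This is a minimal sketch: the function name `tallyResponses` and the sample data are made up for illustration.

```javascript
// Count how many times each response body appears in a list of responses.
function tallyResponses(responses) {
  const counts = {};
  for (const body of responses) {
    counts[body] = (counts[body] || 0) + 1;
  }
  return counts;
}

// A short, made-up sample from the middle of a rolling deployment
const sample = [
  'Hello, World!',
  'Hello, World!',
  'Fundamentals of DevOps!',
  'Hello, World!',
  'Fundamentals of DevOps!',
  'Fundamentals of DevOps!',
];

console.log(tallyResponses(sample));
// { 'Hello, World!': 3, 'Fundamentals of DevOps!': 3 }
```

In a zero-downtime deployment, the tally should contain only the old and new responses, with no errors in between.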

Congrats, you’ve now seen VM orchestration in action, including rolling out changes following immutable infrastructure practices!

Get your hands dirty

Here are a few exercises you can try at home to get a better feel for using OpenTofu for VM orchestration:

Figure out how to scale the number of instances in the ASG running the sample app from three to four. How does this compare to adding a fourth instance to the Ansible code?
Try restarting one of the instances using the AWS Console. How does the ALB handle it while the instance is rebooting? Does the sample app still work after the reboot? How does this compare to the behavior you saw when restarting an instance with Ansible?
Try terminating one of the instances using the AWS Console. How does the ALB handle it? Do you need to do anything to restore the instance?

When you’re done experimenting with the ASG, run tofu destroy to undeploy all your infrastructure. This ensures that your account doesn’t start accumulating any unwanted charges.

Container Orchestration

The idea with container orchestration is to do the following:

Create container images that have your apps and all their dependencies fully installed and configured.
Deploy the container images across a cluster of servers, with potentially multiple containers per server, packed in as efficiently as possible (bin packing).
Automatically scale the number of servers or the number of containers up or down depending on load.
When you need to deploy an update, create new container images, deploy them into the cluster, and then undeploy the old containers.

Although containers have been around for decades^[16], container orchestration started to explode in popularity around 2013, with the emergence of Docker, a tool for building, running, and sharing containers, and Kubernetes, a container orchestration tool. The reason for this popularity is that containers and container orchestration offer a number of advantages over VMs and VM orchestration:

Speed: Containers typically build faster than VMs, especially with caching. Moreover, container orchestration tools typically deploy containers faster than VMs. So the build & deploy cycle with containers can be considerably faster: you can expect 10-20 minutes for VMs, but just 1-5 minutes for containers.
Efficiency: Most container orchestration tools have a built-in scheduler to decide which servers in your cluster should run which containers, using bin packing algorithms to use the available resources as efficiently as possible.^[17]
Portability: You can run containers and container orchestration tools everywhere, including on-prem and in all the major cloud providers. Moreover, the most popular container tool, Docker, and container orchestration tool, Kubernetes, are both open source. All of this means that using containers reduces lock-in to any single vendor.
Local development: You can also run containers and container orchestration tools in your own local dev environment, as containers typically have reasonable file sizes, boot quickly, and have little CPU/memory overhead. This is a huge boon to local development, as you could now realistically run your entire tech stack—e.g., Kubernetes and Docker containers for multiple services—completely locally. While it’s possible to run VMs locally too, VM images tend to be considerably bigger, slower to boot, and have more CPU and memory overhead, so using them for local development is relatively uncommon; moreover, there is no practical way to run most VM orchestration tools locally: e.g., there’s no way to deploy an AWS ASG on your own computer.
Functionality: Container orchestration tools solve more orchestration problems out-of-the-box than VM orchestration tools. For example, Kubernetes has built-in solutions for deployment, updates, auto scaling, auto healing, configuration, secrets management, service discovery, and disk management.
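
To give a feel for what bin packing means in the efficiency point above, here is a minimal JavaScript sketch of one simple heuristic, first-fit decreasing. The container names, memory sizes, and the `schedule` function are all made up for illustration, and this is not the algorithm any particular orchestration tool uses; real schedulers weigh CPU, memory, and many other constraints.

```javascript
// First-fit decreasing: sort containers from largest to smallest, then place
// each one on the first server that still has enough free memory.
function schedule(containers, serverCount, serverMemoryMb) {
  const servers = Array.from({ length: serverCount }, () => ({
    freeMb: serverMemoryMb,
    containers: [],
  }));
  const sorted = [...containers].sort((a, b) => b.memoryMb - a.memoryMb);
  const unscheduled = [];

  for (const container of sorted) {
    const server = servers.find((s) => s.freeMb >= container.memoryMb);
    if (server) {
      server.freeMb -= container.memoryMb;
      server.containers.push(container.name);
    } else {
      unscheduled.push(container.name); // no server has room
    }
  }
  return { servers, unscheduled };
}

const result = schedule(
  [
    { name: 'api', memoryMb: 600 },
    { name: 'web', memoryMb: 300 },
    { name: 'worker', memoryMb: 700 },
    { name: 'cache', memoryMb: 400 },
  ],
  2,
  1024
);

console.log(result.servers.map((s) => s.containers));
// [ [ 'worker', 'web' ], [ 'api', 'cache' ] ]
```

Four containers that together need 2,000 MB fit onto two 1,024 MB servers, which is the kind of dense packing a scheduler aims for.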

Key takeaway #3

Container orchestration is an immutable infrastructure approach where you deploy and manage container images across a cluster of servers.

There are many container tools out there, including Docker, Moby, CRI-O, Podman, runc, and buildkit. Likewise, there are many container orchestration tools out there, including Kubernetes, Nomad, Docker Swarm, Amazon ECS, Marathon / Mesos, and OpenShift. The most popular, by far, are Docker and Kubernetes (so much so that their names are nearly synonymous with containers and container orchestration, respectively), so that’s what we’ll focus on in this blog post series.

In the next several sections, you’ll learn to use Docker, followed by Kubernetes, and finally, you’ll learn to use Docker and Kubernetes in AWS. Let’s get into it!

Example: A Crash Course on Docker

As you may remember from Section 2.4, Docker images are like self-contained "snapshots" of the operating system (OS), the software, the files, and all other relevant details. Let’s now see Docker in action.

First, if you don’t have Docker installed already, follow the instructions on the Docker website to install Docker Desktop for your operating system. Once it’s installed, you should have the docker command available on your command line. You can use the docker run command to run Docker images locally:

$ docker run <IMAGE> [COMMAND]

where IMAGE is the Docker image to run and COMMAND is an optional command to execute. For example, here’s how you can run a Bash shell in an Ubuntu 24.04 Docker image (note that the following command includes the -it flags so you get an interactive shell where you can type):

$ docker run -it ubuntu:24.04 bash



Unable to find image 'ubuntu:24.04' locally

24.04: Pulling from library/ubuntu

Digest: sha256:3f85b7caad41a95462cf5b787d8a04604c

Status: Downloaded newer image for ubuntu:24.04



root@d96ad3779966:/#

And voilà, you’re now in Ubuntu! If you’ve never used Docker before, this can seem fairly magical. Try running some commands. For example, you can look at the contents of /etc/os-release to verify you really are in Ubuntu:

root@d96ad3779966:/# cat /etc/os-release

PRETTY_NAME="Ubuntu 24.04 LTS"

NAME="Ubuntu"

VERSION_ID="24.04"

(...)

How did this happen? Well, first, Docker searches your local filesystem for the ubuntu:24.04 image. If you don’t have that image downloaded already, Docker downloads it automatically from Docker Hub, which is a Docker Registry that contains shared Docker images. The ubuntu:24.04 image happens to be a public Docker image—an official one maintained by the Docker team—so you’re able to download it without any authentication. It’s also possible to create private Docker images that only certain authenticated users can use, as you’ll see later in this blog post.

Once the image is downloaded, Docker runs the image, executing the bash command, which starts an interactive Bash prompt, where you can type. Try running the ls command to see the list of files:

root@d96ad3779966:/# ls -al

total 56

drwxr-xr-x   1 root root 4096 Feb 22 14:22 .

drwxr-xr-x   1 root root 4096 Feb 22 14:22 ..

lrwxrwxrwx   1 root root    7 Jan 13 16:59 bin -> usr/bin

drwxr-xr-x   2 root root 4096 Apr 15  2020 boot

drwxr-xr-x   5 root root  360 Feb 22 14:22 dev

drwxr-xr-x   1 root root 4096 Feb 22 14:22 etc

drwxr-xr-x   2 root root 4096 Apr 15  2020 home

lrwxrwxrwx   1 root root    7 Jan 13 16:59 lib -> usr/lib

drwxr-xr-x   2 root root 4096 Jan 13 16:59 media

(...)

You might notice that’s not your filesystem. That’s because Docker images run in containers that are isolated at the userspace level: when you’re in a container, you can only see the filesystem, memory, networking, etc., in that container. Any data in other containers, or on the underlying host operating system, is not accessible to you, and any data in your container is not visible to those other containers or the underlying host operating system. This is one of the things that makes Docker useful for running applications: the image format is self-contained, so Docker images run the same way no matter where you run them, and no matter what else is running there.

To see this in action, write some text to a test.txt file as follows:

root@d96ad3779966:/# echo "Hello, World!" > test.txt

Next, exit the container by hitting Ctrl-D, and you should be back in your original command prompt on your underlying host OS. If you try to look for the test.txt file you just wrote, you’ll see that it doesn’t exist: the container’s filesystem is totally isolated from your host OS.

Now, try running the same Docker image again:

$ docker run -it ubuntu:24.04 bash

root@3e0081565a5d:/#

Notice that this time, since the ubuntu:24.04 image is already downloaded, the container starts almost instantly. This is another reason Docker is useful for running applications: unlike virtual machines, containers are lightweight, boot up quickly, and incur little CPU or memory overhead.

You may also notice that the second time you fired up the container, the command prompt looked different. That’s because you’re now in a totally new container; any data you wrote in the previous one is no longer accessible to you. Run ls -al and you’ll see that the test.txt file does not exist. Containers are isolated not only from the host OS but also from each other.

Hit Ctrl-D again to exit the container, and back on your host OS, run the docker ps -a command:

$ docker ps -a

CONTAINER ID   IMAGE          COMMAND   CREATED      STATUS

3e0081565a5d   ubuntu:24.04   "bash"    5 min ago    Exited (0) 16 sec ago

d96ad3779966   ubuntu:24.04   "bash"    14 min ago   Exited (0) 5 min ago

This will show you all the containers on your system, including the stopped ones (the ones you exited). You can start a stopped container again by using the docker start <ID> command, setting ID to an ID from the CONTAINER ID column of the docker ps output. For example, here is how you can start the first container up again (and attach an interactive prompt to it via the -ia flags):

$ docker start -ia d96ad3779966

root@d96ad3779966:/#

You can confirm this is really the first container by outputting the contents of test.txt:

root@d96ad3779966:/# cat test.txt

Hello, World!

Hit Ctrl-D once more to exit the container and get back to your host OS.

Now that you’ve seen the basics of Docker, let’s look at what it takes to create your own Docker images, and use them to run web apps.

Example: Create a Docker Image for a Node.js app

Let’s see how a container can be used to run a web app: in particular, the Node.js sample app you’ve been using throughout this blog post series. Create a new folder called docker:

$ cd fundamentals-of-devops

$ mkdir -p ch3/docker

$ cd ch3/docker

Copy app.js from the server orchestration section into the docker folder (note: you do not need to copy app.config.js this time):

$ cp ../ansible/roles/sample-app/files/app.js .

Next, create a file called Dockerfile, with the contents shown in Example 48:

Example 48. Dockerfile for the Node.js sample-app (ch3/docker/Dockerfile)

FROM node:21.7          (1)

WORKDIR /home/node/app  (2)

COPY app.js .           (3)

EXPOSE 8080             (4)

USER node               (5)

CMD ["node", "app.js"]  (6)

Just as you used a Packer template to define how to build a VM image for your sample app, this Dockerfile is a template that defines how to build a Docker image for your sample app. This Dockerfile does the following:

1	It starts with the official Node.js Docker image from Docker Hub as the base. One of the advantages of Docker is that it’s easy to share Docker images, so instead of having to figure out how to install Node.js yourself, you can use the official image, which is maintained by the Node.js team.
2	Set the working directory for the rest of the build.
3	Copy app.js into the Docker image.
4	This tells the Docker image to advertise that the app within it will listen on port 8080. When someone uses your Docker image, they can use the information from `EXPOSE` to figure out which ports they wish to expose. You’ll see an example of this shortly.
5	Use the `node` user (created as part of the official Node.js Docker image) instead of the `root` user when running this app.
6	When you run the Docker image, this will be the default command that it executes. Note that you typically do not need to use a process supervisor for Docker images, as Docker orchestration tools take care of process supervision, resource usage (e.g., CPU, memory), and so on, automatically. Also note that just about all container orchestration tools expect your containers to run apps in the "foreground," blocking until they exit, and logging directly to `stdout` and `stderr`.

To build a Docker image from this Dockerfile, use the docker build command:

$ docker build -t sample-app:v1 .

The -t flag is the tag (name) to use for the Docker image: the preceding code sets the image name to "sample-app" and the version to "v1." Later on, if you make changes to the sample app, you’ll be able to build a new Docker image and give it a new version, such as "v2." The dot (.) at the end tells docker build to run the build in the current directory (which should be the folder that contains your Dockerfile). When the build finishes, you can use the docker run command to run your new image:

$ docker run --init sample-app:v1

Listening on port 8080

Note the use of --init: when Node.js runs as the first process in a container (PID 1), it doesn’t handle kernel signals (such as the one sent by Ctrl-C) properly. The --init flag runs a small init process as PID 1 that forwards signals to your app, which ensures that the app will exit correctly if you hit Ctrl-C. See Docker and Node.js best practices for more information, including other practices that you should use with Node.js Docker images.
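A complementary practice is to have the app handle termination signals itself. The following is a minimal sketch, not the sample app’s actual code: installShutdownHandlers and its proc parameter are hypothetical names introduced here for illustration. It listens for SIGINT (the signal Ctrl-C sends) and SIGTERM, closes the HTTP server, and then exits:

```javascript
// Hypothetical helper: wire SIGINT/SIGTERM to a graceful shutdown.
// `proc` defaults to the real process object; it is a parameter only so the
// function can be exercised without terminating the current process.
function installShutdownHandlers(server, proc = process) {
  for (const signal of ['SIGINT', 'SIGTERM']) {
    proc.on(signal, () => {
      console.log(`Received ${signal}, shutting down`);
      // Stop accepting new connections, then exit once open ones finish.
      server.close(() => proc.exit(0));
    });
  }
}
```

In app.js, you would call installShutdownHandlers(server) right after creating the server. Even with handlers like these in place, --init remains a reasonable default.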

Your app is now listening on port 8080! However, if you open a new terminal on your host operating system and try to access the sample app, it won’t work:

$ curl localhost:8080

curl: (7) Failed to connect to localhost port 8080: Connection refused

What’s the problem? Actually, it’s not a problem but a feature! Docker containers are isolated from the host operating system and other containers, not only at the filesystem level but also in terms of networking. So while the container really is listening on port 8080, that is only on a port inside the container, which isn’t accessible on the host OS. If you want to expose a port from the container on the host OS, you have to do it via the -p flag.

First, hit Ctrl-C to shut down the sample-app container: note that it’s Ctrl-C this time, not Ctrl-D, as you’re shutting down a process, rather than exiting an interactive prompt. Now rerun the container but this time with the -p flag as follows:

$ docker run -p 8080:8080 --init sample-app:v1

Listening on port 8080

Adding -p 8080:8080 to the command tells Docker to expose port 8080 inside the container on port 8080 of the host OS. You know to use port 8080 here, as you built this Docker image yourself, but if this was someone else’s image, you could use docker inspect on the image, and that will tell you about any ports that image labeled with EXPOSE. In another terminal on your host OS, you should now be able to see the sample app working:

$ curl localhost:8080

Hello, World!
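Because both sides of 8080:8080 are the same number, it’s easy to forget which is which: the host port comes first, the container port second. Here is a small sketch that spells this out. The parsePublishSpec function is a hypothetical helper, and it handles only the simple two-number form (Docker also accepts optional IP and protocol parts, which this sketch ignores):

```javascript
// Parse a simplified `-p HOST_PORT:CONTAINER_PORT` value.
// Assumption: only the two-number form is supported in this sketch.
function parsePublishSpec(spec) {
  const match = /^(\d+):(\d+)$/.exec(spec);
  if (!match) {
    throw new Error(`Unsupported publish spec: ${spec}`);
  }
  return { hostPort: Number(match[1]), containerPort: Number(match[2]) };
}

console.log(parsePublishSpec('8080:8080')); // same port on both sides
console.log(parsePublishSpec('3000:8080')); // host 3000 -> container 8080
```

With the second form, you’d run curl localhost:3000 on the host, while the app would still be listening on port 8080 inside the container.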

Congrats, you now know how to run a web app locally using Docker! However, while using docker run directly is fine for local testing and learning, it’s not the way you’d run Dockerized apps in production. For that, you typically want to use a container orchestration tool such as Kubernetes, which is the topic of the next section.

Cleaning Up Containers

Every time you run docker run and exit, you are leaving behind containers, which take up disk space. You may wish to clean them up with the docker rm <CONTAINER_ID> command, where CONTAINER_ID is the ID of the container from the docker ps output. Alternatively, you could include the --rm flag in your docker run command to have Docker automatically clean up when you exit the container.

Example: Deploy a Dockerized App with Kubernetes

Kubernetes is a container orchestration tool, which means it’s a platform for running and managing containers on your servers, including scheduling, auto healing, auto scaling, load balancing, and much more. Under the hood, Kubernetes consists of two main pieces, as shown in Figure 26:

Figure 26. The Kubernetes architecture consists of a control plane and worker nodes

Control plane: The control plane is responsible for managing the Kubernetes cluster. It is the "brains" of the operation, responsible for storing the state of the cluster, monitoring containers, and coordinating actions across the cluster. It also runs the API server, which provides an API you can use from command-line tools (e.g., kubectl), web UIs (e.g., the Kubernetes Dashboard), and IaC tools (e.g., OpenTofu) to control what’s happening in the cluster.
Worker nodes: The worker nodes are the servers used to actually run your containers. The worker nodes are entirely managed by the control plane, which tells each worker node what containers it should run.

Kubernetes is open source, and one of its strengths is that you can run it anywhere: in any public cloud (e.g., AWS, Azure, GCP), in your own datacenter, and even on your personal computer. A little later in this blog post, I’ll show you how you can run Kubernetes in the cloud (in AWS), but for now, let’s start small and run it locally. This is easy to do if you installed a relatively recent version of Docker Desktop, as it has the ability to fire up a Kubernetes cluster locally with just a few clicks.

If you open Docker Desktop’s preferences on your computer, you should see Kubernetes in the nav, as shown in Figure 27.

Figure 27. Enable Kubernetes on Docker Desktop.

If it’s not enabled already, check the Enable Kubernetes checkbox, click Apply & Restart, and wait a few minutes for that to complete. In the meantime, follow the instructions on the Kubernetes website to install kubectl, which is the command-line tool for interacting with Kubernetes.

To use kubectl, you must first update its configuration file, which lives in $HOME/.kube/config (that is, the .kube folder of your home directory), to tell it what Kubernetes cluster to connect to. Conveniently, when you enable Kubernetes in Docker Desktop, it updates this config file for you, adding a docker-desktop entry to it, so all you need to do is tell kubectl to use this configuration as follows:

$ kubectl config use-context docker-desktop

Switched to context "docker-desktop".

Now you can check if your Kubernetes cluster is working with the get nodes command:

$ kubectl get nodes

NAME             STATUS   ROLES           AGE     VERSION

docker-desktop   Ready    control-plane   2m31s   v1.29.1

The get nodes command shows you information about all the nodes in your cluster. Since you’re running Kubernetes locally, your computer is the only node, and it’s running both the control plane and acting as a worker node. You’re now ready to run some Docker containers!

To deploy something in Kubernetes, you create Kubernetes objects, which are persistent entities you write to the Kubernetes cluster (via the API server) that record your intent: e.g., your intent to have specific Docker images running. The cluster runs a reconciliation loop, which continuously checks the objects you stored in it and works to make the state of the cluster match your intent.
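To build some intuition for the reconciliation loop, here is a toy sketch of a single pass through such a loop. It is not how Kubernetes is implemented: the reconcile function and the shape of its inputs (an object mapping an image name to a replica count) are assumptions made purely for illustration.

```javascript
// One pass of a toy reconciliation loop: compare the desired state with the
// actual state and return the actions needed to make them match.
// `desired` and `actual` map an image name to a replica count.
function reconcile(desired, actual) {
  const actions = [];
  const images = new Set([...Object.keys(desired), ...Object.keys(actual)]);
  for (const image of images) {
    const want = desired[image] ?? 0;
    const have = actual[image] ?? 0;
    if (have < want) {
      actions.push({ action: 'start', image, count: want - have });
    } else if (have > want) {
      actions.push({ action: 'stop', image, count: have - want });
    }
  }
  return actions;
}

// Intent: three replicas of sample-app:v1. Reality: one of them crashed.
console.log(reconcile({ 'sample-app:v1': 3 }, { 'sample-app:v1': 2 }));
```

The key idea is that you never tell the cluster "start one more container"; you record the intent, and the loop works out the difference on every pass.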

There are many different types of Kubernetes objects available. The one we’ll use to deploy your sample app is a Kubernetes Deployment, which is a declarative way to manage an application in Kubernetes. The Deployment allows you to declare what Docker images to run, how many copies of them to run (replicas), a variety of settings for those images (e.g., CPU, memory, port numbers, environment variables), and so on, and the Deployment will then work to ensure that the requirements you declared are always met.

One way to interact with Kubernetes is to create YAML files to define your Kubernetes objects, and to use the kubectl apply command to submit those objects to the cluster. Create a new folder called kubernetes to store these YAML files:

$ cd fundamentals-of-devops

$ mkdir -p ch3/kubernetes

$ cd ch3/kubernetes

Within the kubernetes folder, create a file called sample-app-deployment.yml with the contents shown in Example 49:

Example 49. The YAML for a Kubernetes Deployment (ch3/kubernetes/sample-app-deployment.yml)

apiVersion: apps/v1

kind: Deployment                  (1)

metadata:                         (2)

  name: sample-app-deployment

spec:

  replicas: 3                     (3)

  template:                       (4)

    metadata:                     (5)

      labels:

        app: sample-app-pods

    spec:

      containers:                 (6)

        - name: sample-app        (7)

          image: sample-app:v1    (8)

          ports:

            - containerPort: 8080 (9)

          env:                    (10)

            - name: NODE_ENV

              value: production

  selector:                       (11)

    matchLabels:

      app: sample-app-pods

This YAML file gives you a lot of functionality for just ~20 lines of code:

1	The `kind` keyword specifies that this Kubernetes object is a Deployment.
2	Every Kubernetes object includes metadata that can be used to identify and target that object in API calls. Kubernetes makes heavy use of metadata and labels to keep the system highly flexible and loosely coupled. The preceding code sets the name of the Deployment to "sample-app-deployment."
3	The Deployment will run 3 replicas.
4	This is the pod template—the blueprint—that defines what this Deployment will deploy and manage. It’s similar to the launch template you saw with AWS ASGs. In Kubernetes, instead of deploying one container at a time, you deploy pods, which are groups of containers that are meant to be deployed together. For example, you could have a pod with one container to run a web app (e.g., the sample app) and another container that gathers metrics on the web app and sends them to a central service (e.g., Datadog). So this `template` block allows you to configure your pods, specifying what container(s) to run, the ports to use, environment variables to set, and so on.
5	Templates can be used separately from Deployments, so they have separate metadata which allows you to identify and target that template in API calls (this is another example of Kubernetes trying to be highly flexible and decoupled). The preceding code sets the "app" label to "sample-app-pods." You’ll see how this is used shortly.
6	Inside the pod template, you define one or more containers to run in that pod.
7	This simple example configures just a single container to run, giving it the name "sample-app."
8	The Docker image to run for this container. This is the Docker image you built earlier in the post.
9	This tells Kubernetes that the Docker image listens for requests on port 8080.
10	The `env` configuration lets you set environment variables for the container. The preceding code sets the `NODE_ENV` environment variable to "production" to tell the Node.js app and all its dependencies to run in production mode.
11	The `selector` block tells the Kubernetes Deployment what to target: that is, which pod template to deploy and manage. Why doesn’t the Deployment just assume that the pod defined within that Deployment is the one you want to target? Because Deployments and templates can be defined completely separately, so you always need to specify a `selector` to tell the Deployment what to target (this is yet another example of Kubernetes trying to be flexible and decoupled).
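The labels in (5) and the selector in (11) are tied together by label matching: a pod is selected when its labels include every key/value pair listed under matchLabels. The following sketch shows that rule in isolation. The matchesSelector function and the pod names are hypothetical, and the sketch ignores everything about selectors other than matchLabels.

```javascript
// Returns true when every key/value pair in matchLabels is present in labels.
function matchesSelector(labels, matchLabels) {
  return Object.entries(matchLabels).every(
    ([key, value]) => labels[key] === value
  );
}

// Hypothetical pods; only the labels matter for selection.
const pods = [
  { name: 'pod-a', labels: { app: 'sample-app-pods', 'pod-template-hash': '64f97797fb' } },
  { name: 'pod-b', labels: { app: 'some-other-app' } },
];

const selected = pods.filter((pod) =>
  matchesSelector(pod.labels, { app: 'sample-app-pods' })
);
console.log(selected.map((pod) => pod.name));
```

Extra labels on a pod don’t prevent a match; only the pairs named in the selector are checked.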

You can use the kubectl apply command to apply your Deployment configuration:

$ kubectl apply -f sample-app-deployment.yml

deployment.apps/sample-app-deployment created

This command should complete very quickly. How do you know if it actually worked? To answer that question, you can use kubectl to explore your cluster. First, run the get deployments command, and you should see your Deployment:

$ kubectl get deployments

NAME                    READY   UP-TO-DATE   AVAILABLE   AGE

sample-app-deployment   3/3     3            3           1m

Here, you can see how Kubernetes uses metadata, as the name of the Deployment (sample-app-deployment) comes from your metadata block. You can use that metadata in API calls yourself. For example, to get more details about a specific Deployment, you can run describe deployment <NAME>, where <NAME> is the name from the metadata:

$ kubectl describe deployment sample-app-deployment

Name:                   sample-app-deployment

CreationTimestamp:      Mon, 15 Apr 2024 12:28:19 -0400

Selector:               app=sample-app-pods

Replicas:               3 desired | 3 updated | 3 total | 3 available

StrategyType:           RollingUpdate

MinReadySeconds:        0

RollingUpdateStrategy:  0 max unavailable, 3 max surge

(... truncated for readability ...)

This Deployment is reporting that all 3 replicas are available. To see those replicas, run the get pods command:

$ kubectl get pods

NAME                                     READY   STATUS    RESTARTS   AGE

sample-app-deployment-64f97797fb-hcskq   1/1     Running   0          4m23s

sample-app-deployment-64f97797fb-p7zjk   1/1     Running   0          4m23s

sample-app-deployment-64f97797fb-qtkl8   1/1     Running   0          4m23s

And to get the details about a specific pod, copy its name, and run describe pod:

$ kubectl describe pod sample-app-deployment-64f97797fb-hcskq

Name:             sample-app-deployment-64f97797fb-hcskq

Node:             docker-desktop/192.168.65.3

Start Time:       Mon, 15 Apr 2024 14:08:04 -0400

Labels:           app=sample-app-pods

                  pod-template-hash=64f97797fb

Status:           Running

IP:               10.1.0.31

Controlled By:  ReplicaSet/sample-app-deployment-64f97797fb

Containers:

  sample-app:

    Image:          sample-app:v1

    Port:           8080/TCP

    Host Port:      0/TCP

(... truncated for readability ...)

From this output, you can see the containers that are running in each pod, which, in this case, is just one container per pod, running the sample-app:v1 Docker image you built earlier.

You can also see the logs for a single pod by using the logs command, which is useful for understanding what’s going on and debugging:

$ kubectl logs sample-app-deployment-64f97797fb-hcskq

Listening on port 8080

Ah, there’s that familiar log output. You now have three replicas of your sample app running. But, just as you saw with server and VM orchestration, users will want just one endpoint to hit, so now it’s time to figure out how to do load balancing with Kubernetes.

Example: Deploy a Load Balancer with Kubernetes

Kubernetes has built-in support for load balancing. The typical way to set it up is to make use of another Kubernetes object, called a Kubernetes Service, which is a way to expose an app running in Kubernetes as a service you can talk to over the network. Example 50 shows the YAML code for a Kubernetes service, which you should put in a file called sample-app-service.yml:

Example 50. The YAML for a Kubernetes Service (ch3/kubernetes/sample-app-service.yml)

apiVersion: v1

kind: Service                    (1)

metadata:                        (2)

  name: sample-app-loadbalancer

spec:

  type: LoadBalancer             (3)

  selector:

    app: sample-app-pods         (4)

  ports:

    - protocol: TCP

      port: 80                   (5)

      targetPort: 8080           (6)

Here’s what this code does:

1	This Kubernetes object is a Service.
2	You have to configure metadata for every Kubernetes object. The preceding code sets the name of the Service to "sample-app-loadbalancer."
3	Configure the Service to be a load balancer.^[18] Under the hood, depending on what sort of Kubernetes cluster you’re running, and how you configure that cluster, the actual type of load balancer you get will be different: for example, if you run this code in AWS, you’ll get an AWS ELB; if you run it in GCP, you’ll get a Cloud Load Balancer; and if you run it locally, as you will shortly, you’ll get a simple load balancer that is built into the Kubernetes distribution in Docker Desktop.
4	Distribute traffic across the pods you defined in the Deployment.
5	The Service will receive requests on port 80, the default HTTP port.
6	The Service will forward requests to port 8080 of the pods.
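The relationship between port, targetPort, and the selected pods can be sketched as a toy router. This is only a mental model: createService is a hypothetical function, the pod IPs are made up in the style of the describe pod output, and the round-robin order is an assumption for illustration (how a real cluster spreads traffic depends on how the cluster is configured).

```javascript
// Toy model of a Service: accept requests on `port` and forward each one to
// `targetPort` on one of the selected pods.
// Assumption: round-robin ordering, chosen only to keep the sketch simple.
function createService({ port, targetPort, podIps }) {
  let next = 0;
  return function route(requestPort) {
    if (requestPort !== port) {
      throw new Error(`Service does not listen on port ${requestPort}`);
    }
    const ip = podIps[next % podIps.length];
    next += 1;
    return `${ip}:${targetPort}`;
  };
}

// Hypothetical pod IPs for the three replicas.
const route = createService({
  port: 80,
  targetPort: 8080,
  podIps: ['10.1.0.31', '10.1.0.32', '10.1.0.33'],
});
for (let i = 0; i < 4; i += 1) {
  console.log(route(80));
}
```

Clients only ever see port 80 on the Service; the pods and their port 8080 stay an implementation detail behind it.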

You apply the Service the same way, using kubectl apply:

$ kubectl apply -f sample-app-service.yml

service/sample-app-loadbalancer created

To see if your service worked, use the get services command:

$ kubectl get services

NAME                     TYPE          CLUSTER-IP     EXTERNAL-IP  PORT(S)

kubernetes               ClusterIP     10.96.0.1      <none>       443/TCP

sample-app-loadbalancer  LoadBalancer  10.111.250.21  localhost    80:30910/TCP

The first service in the list is Kubernetes itself, which you can ignore. The second is the Service you created, with the name sample-app-loadbalancer (based on its own metadata block). You can get more details about your service by using the describe service command:

$ kubectl describe service sample-app-loadbalancer

Name:                     sample-app-loadbalancer

Selector:                 app=sample-app-pods

Type:                     LoadBalancer

LoadBalancer Ingress:     localhost

Port:                     <unset>  80/TCP

TargetPort:               8080/TCP

(... truncated for readability ...)

You can see that the load balancer is listening on localhost, at port 80, so you can test it out by opening http://localhost in your browser, or using curl:

$ curl http://localhost

Hello, World!

Congrats, you’re now able to deploy Docker containers with Kubernetes and distribute traffic across your containers with a load balancer! But what if you want to update your app?

Example: Roll Out Updates with Kubernetes

Kubernetes Deployments have built-in support for rolling updates. Open up sample-app-deployment.yml and add the code shown in Example 51 to the bottom of the spec section:

Example 51. The YAML for doing rolling updates (ch3/kubernetes/sample-app-deployment.yml)

spec:



  # (... other params omitted for clarity ...)



  strategy:

    type: RollingUpdate

    rollingUpdate:

      maxSurge: 3

      maxUnavailable: 0

This configures the Deployment to do a rolling update where it can deploy up to 3 extra pods during the deployment, similar to the instance refresh you saw with ASGs. Run apply to update the Deployment with these changes:

$ kubectl apply -f sample-app-deployment.yml

deployment.apps/sample-app-deployment configured
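The two settings in Example 51 put bounds on the rollout: during an update, the total number of pods may not exceed replicas plus maxSurge, and the number of available pods may not drop below replicas minus maxUnavailable. Here is a quick sketch of that arithmetic. The rolloutBounds function is a hypothetical helper, and it assumes both settings are absolute numbers (Kubernetes also accepts percentages, which this sketch ignores).

```javascript
// Compute the pod-count limits a rolling update must respect.
// Assumption: maxSurge and maxUnavailable are absolute numbers.
function rolloutBounds({ replicas, maxSurge, maxUnavailable }) {
  return {
    maxTotalPods: replicas + maxSurge,
    minAvailablePods: replicas - maxUnavailable,
  };
}

console.log(rolloutBounds({ replicas: 3, maxSurge: 3, maxUnavailable: 0 }));
```

With the values from Example 51, that works out to at most six pods in total and never fewer than three available, so the rollout adds new pods before it removes old ones.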

Now, make a change to the sample app in docker/app.js, such as returning the text "Fundamentals of DevOps!" instead of "Hello, World!", as shown in Example 52:

Example 52. Update the app to respond with the text "Fundamentals of DevOps!" (ch3/docker/app.js)

res.end('Fundamentals of DevOps!\n')

To deploy this change, the first step is to build a new Docker image, with v2 as the new version. Run the build from the ch3/docker folder, where the Dockerfile and app.js live:

$ docker build -t sample-app:v2 .

The build will likely run in less than a second! This is because Docker has a built-in build cache, which, if used correctly, can dramatically speed up builds.

Next, open sample-app-deployment.yml one more time, and in the spec section, update the image from sample-app:v1 to sample-app:v2, as shown in Example 53:

Example 53. Update the Deployment to use the v2 image (ch3/kubernetes/sample-app-deployment.yml)

spec:



  # (... other params omitted for clarity ...)



    spec:

      containers:

        - name: sample-app

          image: sample-app:v2

Run apply one more time to deploy this change:

$ kubectl apply -f sample-app-deployment.yml

deployment.apps/sample-app-deployment configured

In the background, Kubernetes will kick off the rolling update. If you run get pods during this process, you’ll see up to six pods running at the same time (three old, three new):

$ kubectl get pods

NAME                                     READY   STATUS    RESTARTS   AGE

sample-app-deployment-64f97797fb-pnh96   1/1     Running   0          15m

sample-app-deployment-64f97797fb-tmprp   1/1     Running   0          15m

sample-app-deployment-64f97797fb-xmjfl   1/1     Running   0          15m

sample-app-deployment-6c5ff6d6ff-fxqd4   1/1     Running   0          21s

sample-app-deployment-6c5ff6d6ff-hvwjx   1/1     Running   0          21s

sample-app-deployment-6c5ff6d6ff-krkcs   1/1     Running   0          21s

After a little while, the three old pods will be undeployed, and you’ll be left with just the new ones. At that point, the load balancer will be responding with "Fundamentals of DevOps!"

$ curl http://localhost

Fundamentals of DevOps!

Get your hands dirty

Using YAML and kubectl is a great way to learn Kubernetes, and I’m using it in the examples in this blog post to avoid introducing extra tools, but raw YAML is not a great choice for production usage. In particular, YAML doesn’t have support for variables, templating, for-loops, conditionals, and other programming language features that allow for code reuse.
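To see what those missing features buy you, here is a sketch that builds a Deployment definition from a handful of parameters in a general-purpose language. It isn’t a recommendation to generate manifests this way: the deployment function, its parameters, and the second app in the list are all hypothetical. The output is JSON, which kubectl apply accepts in addition to YAML.

```javascript
// Hypothetical helper that builds a Deployment object from a few parameters.
// The labels are defined once and reused, so the selector can't drift out of
// sync with the pod template.
function deployment({ name, image, replicas, port }) {
  const labels = { app: `${name}-pods` };
  return {
    apiVersion: 'apps/v1',
    kind: 'Deployment',
    metadata: { name: `${name}-deployment` },
    spec: {
      replicas,
      selector: { matchLabels: labels },
      template: {
        metadata: { labels },
        spec: {
          containers: [{ name, image, ports: [{ containerPort: port }] }],
        },
      },
    },
  };
}

const apps = [
  { name: 'sample-app', image: 'sample-app:v1', replicas: 3, port: 8080 },
  { name: 'other-app', image: 'other-app:v1', replicas: 2, port: 9090 }, // hypothetical
];
const manifests = apps.map((app) => deployment(app));
console.log(JSON.stringify(manifests[0], null, 2));
```

In raw YAML, the name "sample-app-pods" has to be typed in two places and each new app means copying the whole file; here, a variable and a loop take care of both.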

Therefore, when using Kubernetes in production, instead of raw YAML, try out one of the following tools that can solve these gaps for you:

Example: Deploy a Kubernetes Cluster in AWS Using EKS

So far, you’ve been running Kubernetes locally, which is great for learning and testing. However, for production deployments, you’ll need to run a Kubernetes cluster on servers in a data center. Kubernetes is a complicated system: it’s more or less a cloud in and of itself, and setting it up and maintaining it is a significant undertaking. Fortunately, if you’re using the cloud, most cloud providers have managed Kubernetes offerings that make this considerably simpler. The one you’ll learn to use in this blog post series is Amazon’s Elastic Kubernetes Service (EKS), which can deploy and manage the control plane and worker nodes for you.

Watch out for snakes: EKS is not part of the AWS free tier!

While most of the examples in this blog post series are part of the AWS free tier, Amazon EKS is not: as of June 2024, the pricing is $0.10 per hour for the control plane. So please be aware that running the examples in this section will cost you a little bit of money.

The blog post series’s sample code repo contains a module called eks-cluster in the ch3/tofu/modules/eks-cluster folder that you can use to deploy a simple EKS cluster, which includes the following:

A fully-managed control plane.
Fully-managed worker nodes. EKS supports several types of worker nodes; the eks-cluster module uses an EKS managed node group, which deploys worker nodes in an ASG, so you’re making use of VM orchestration in addition to container orchestration, although the VM orchestration is mostly invisible to you, as AWS handles all the details.
IAM roles with the minimal permissions required by the control plane and worker nodes. An IAM role is similar to an IAM user, in that it’s an entity in AWS that can be granted IAM permissions. However, unlike IAM users, IAM roles are not associated with any one person and do not have permanent credentials (password or access keys). Instead, a role can be assumed by other IAM entities, such as the EKS control plane. So an IAM role is a mechanism for granting those services permissions to make certain API calls in your AWS account.
Everything deploys into the Default VPC (see the note on Default VPCs in A Note on Default Virtual Private Clouds).

To use the eks-cluster module, create a new folder called live/eks-sample to use as a root module:

$ cd fundamentals-of-devops

$ mkdir -p ch3/tofu/live/eks-sample

$ cd ch3/tofu/live/eks-sample

Inside of the eks-sample folder, create a file called main.tf, with the contents shown in Example 54:

Example 54. Configure the eks-cluster module (ch3/tofu/live/eks-sample/main.tf)

provider "aws" {

  region = "us-east-2"

}



module "cluster" {

  source = "github.com/brikis98/devops-book//ch3/tofu/modules/eks-cluster"



  name        = "eks-sample"        (1)

  eks_version = "1.29"              (2)



  instance_type        = "t2.micro" (3)

  min_worker_nodes     = 1          (4)

  max_worker_nodes     = 10         (5)

  desired_worker_nodes = 3          (6)

}

The preceding code configures the following parameters:

1	`name`: The name to use for the control plane, worker nodes, and all other resources created by the module.
2	`eks_version`: The version of Kubernetes to use. A new version comes out roughly three times per year.
3	`instance_type`: The type of instance to run for worker nodes.
4	`min_worker_nodes`: The minimum number of worker nodes to run.
5	`max_worker_nodes`: The maximum number of worker nodes to run.
6	`desired_worker_nodes`: The initial number of worker nodes to run.

To deploy the EKS cluster, authenticate to AWS as described in Authenticating to AWS on the command line, and run the following commands:

$ tofu init

$ tofu apply

After 3-5 minutes, the cluster should finish deploying. To explore the cluster with kubectl, you first need to authenticate to your cluster. The aws CLI has a built-in command for doing this:

$ aws eks update-kubeconfig --region <REGION> --name <CLUSTER_NAME>

Where <REGION> is the AWS region you deployed the EKS cluster into and <CLUSTER_NAME> is the name of the EKS cluster. The preceding code used us-east-2 and eks-sample for these, respectively, so you can run the following:

$ aws eks update-kubeconfig --region us-east-2 --name eks-sample

Once this is done, try running get nodes:

$ kubectl get nodes

NAME                                          STATUS   ROLES    AGE

ip-172-31-21-41.us-east-2.compute.internal    Ready    <none>   5m

ip-172-31-34-203.us-east-2.compute.internal   Ready    <none>   5m

ip-172-31-4-188.us-east-2.compute.internal    Ready    <none>   5m

This output looks a bit different from when you ran the command with the Kubernetes cluster from Docker Desktop. You should see three nodes, each of which is an EC2 instance in your managed node group.

The next step is to try deploying the sample app into the EKS cluster. However, there’s one problem: you’ve created a Docker image for the sample app, but that image only lives on your own computer. The EKS cluster in AWS won’t be able to fetch the image from your computer, so you need to push the image to a container registry that EKS can read from, as described in the next section.

Example: Push a Docker Image to ECR

There are a number of container registries out there, including Docker Hub, Amazon’s Elastic Container Registry (ECR), the Azure Container Registry, the Google Artifact Registry, the JFrog Docker Registry, and the GitHub Container Registry. If you’re using AWS, the easiest one to use is ECR, so let’s set that up.

For each Docker image you want to store in ECR, you have to create an ECR repository (ECR repo for short). The blog post series’s sample code repo includes a module called ecr-repo in the ch3/tofu/modules/ecr-repo folder that you can use to create an ECR repo.

To use the ecr-repo module, create a new folder called live/ecr-sample to use as a root module:

$ cd fundamentals-of-devops

$ mkdir -p ch3/tofu/live/ecr-sample

$ cd ch3/tofu/live/ecr-sample

In the ecr-sample folder, create a file called main.tf with the contents shown in Example 55:

Example 55. Configure the ecr-repo module (ch3/tofu/live/ecr-sample/main.tf)

provider "aws" {

  region = "us-east-2"

}



module "repo" {

  source = "github.com/brikis98/devops-book//ch3/tofu/modules/ecr-repo"



  name = "sample-app"

}

This code will create an ECR repo called "sample-app." Typically, the repo name should match your Docker image name.

You should also create outputs.tf with an output variable, as shown in Example 56:

Example 56. The ecr-sample module output variables (ch3/tofu/live/ecr-sample/outputs.tf)

output "registry_url" {

  value       = module.repo.registry_url

  description = "URL of the ECR repo"

}

The preceding code will output the URL of the ECR repo, which you’ll need to be able to push and pull images. To create the ECR repo, run the following commands:

$ tofu init

$ tofu apply

After a few seconds, you should see the registry_url output:

Apply complete! Resources: 1 added, 0 changed, 0 destroyed.



Outputs:



registry_url = "111111111111.dkr.ecr.us-east-2.amazonaws.com/sample-app"

Copy down that registry_url value, as you’ll need it shortly.
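It helps to know how to read that value. An ECR repo URL has the form <account-id>.dkr.ecr.<region>.amazonaws.com/<repo-name>, and a full image reference is that URL followed by a colon and a tag. The sketch below pulls the URL apart; parseEcrRepoUrl and imageReference are hypothetical helpers written for this illustration, and the v3 tag is the one used later in this section.

```javascript
// Split an ECR repo URL of the form
//   <account-id>.dkr.ecr.<region>.amazonaws.com/<repo-name>
// into its parts.
function parseEcrRepoUrl(url) {
  const match = /^(\d{12})\.dkr\.ecr\.([a-z0-9-]+)\.amazonaws\.com\/(.+)$/.exec(url);
  if (!match) {
    throw new Error(`Not an ECR repo URL: ${url}`);
  }
  const [, accountId, region, repoName] = match;
  return { accountId, region, repoName };
}

// Build a full image reference from a repo URL and a tag.
function imageReference(url, tag) {
  return `${url}:${tag}`;
}

const repoUrl = '111111111111.dkr.ecr.us-east-2.amazonaws.com/sample-app';
console.log(parseEcrRepoUrl(repoUrl));
console.log(imageReference(repoUrl, 'v3'));
```

The region embedded in the URL should match the region you pass to the aws CLI when you authenticate.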

Before you can push your Docker image to this ECR repo, you have to build the image with the right CPU architecture. Each Docker image you build is built for a specific CPU architecture: the docker build command, by default, builds for whatever CPU architecture you have on your own computer. For example, if you’re on a recent MacBook with an ARM CPU (e.g., the M1 or M2), your Docker images will be built for the arm64 architecture. This is a problem if you try to run those Docker images in the EKS cluster deployed by the eks-cluster module, as the t2.micro worker nodes in that cluster use the amd64 architecture, and won’t be able to run arm64 images.
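If you’re not sure which architecture your own computer uses, Node.js can tell you, although it uses slightly different names than Docker does (for example, x64 rather than amd64). The following sketch translates between the two. The dockerPlatform function and its lookup table are hypothetical and cover only the two architectures discussed here.

```javascript
const os = require('node:os');

// Map Node.js architecture names to the names Docker uses for platforms.
// Only the two architectures discussed here are covered; anything else
// returns undefined.
const NODE_TO_DOCKER_ARCH = { x64: 'amd64', arm64: 'arm64' };

function dockerPlatform(nodeArch = os.arch()) {
  const arch = NODE_TO_DOCKER_ARCH[nodeArch];
  return arch ? `linux/${arch}` : undefined;
}

console.log(`This machine: ${os.arch()} -> ${dockerPlatform()}`);
console.log(`t2.micro worker nodes: ${dockerPlatform('x64')}`);
```

If the two lines print different platforms, an image built with the default settings on your computer won’t run on the worker nodes.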

Therefore, you need to ensure that you build your Docker images for whatever architecture(s) you plan to deploy onto. Fortunately, Docker now ships with the buildx command, which makes it easy to build Docker images for multiple architectures. The very first time you use buildx, you need to create a builder for your target architectures, which the following command names multi-platform-builder. For example, if you’re on an ARM64 Mac, and you’re going to be deploying onto AMD64 Linux servers, use the following command:

$ docker buildx create \

  --use \

  --platform=linux/amd64,linux/arm64 \

  --name multi-platform-builder

Now you can run the following command to build a Docker image of the sample app for both architectures (note the use of a new tag, v3, for these images):

$ docker buildx build \

  --platform=linux/amd64,linux/arm64 \

  -t sample-app:v3 \

  .

Once the build is done, to be able to push the Docker image to ECR, you need to tag it using the registry URL of the ECR repo that you got from the registry_url output:

$ docker tag \

  sample-app:v3 \

  <YOUR_ECR_REPO_URL>:v3

Next, you need to authenticate to your ECR repo, which you can do using a combination of the aws CLI and the docker CLI, making sure to replace the last argument with the registry URL of your own ECR repo that you got from the registry_url output:

$ aws ecr \

  get-login-password \

  --region us-east-2 | \

  docker login \

  --username AWS \

  --password-stdin \

  <YOUR_ECR_REPO_URL>

Finally, you can push the Docker image to your ECR repo:

$ docker push <YOUR_ECR_REPO_URL>:v3

The first time you push, it may take a minute or two to upload the image. Subsequent pushes, due to Docker’s layer caching, will be faster.
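It can help to know how the registry URL you’ve been pasting into these commands is put together: it contains your AWS account ID, the region, and the repo name. The following is a rough sketch of pulling those parts out. The parseEcrRepoUrl and imageReference helpers are hypothetical names introduced here, and the pattern only covers URLs of the shape shown in the registry_url output above.

```javascript
// Hypothetical helper: split an ECR repo URL into its parts.
// Expected shape: <ACCOUNT_ID>.dkr.ecr.<REGION>.amazonaws.com/<REPO_NAME>
function parseEcrRepoUrl(repoUrl) {
  const pattern = /^(\d{12})\.dkr\.ecr\.([a-z0-9-]+)\.amazonaws\.com\/(.+)$/;
  const match = pattern.exec(repoUrl);
  if (!match) {
    throw new Error(`Not a recognized ECR repo URL: ${repoUrl}`);
  }
  const [, accountId, region, repoName] = match;
  return {
    accountId,
    region,
    repoName,
    // The part before the first slash is the registry host.
    registryHost: repoUrl.slice(0, repoUrl.indexOf("/")),
  };
}

// Build the full image reference used with docker tag and docker push.
function imageReference(repoUrl, tag) {
  return `${repoUrl}:${tag}`;
}

const exampleUrl = "111111111111.dkr.ecr.us-east-2.amazonaws.com/sample-app";
console.log(parseEcrRepoUrl(exampleUrl));
console.log(imageReference(exampleUrl, "v3"));
```

One practical use: the region embedded in the URL should match the --region argument you pass to aws ecr get-login-password.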

Example: Deploy a Dockerized App into an EKS Cluster

At this point, you are ready to deploy the sample app Docker image into the EKS cluster. The only change you need to make to the YAML you used to deploy locally is to switch the image in kubernetes/sample-app-deployment.yml to the v3 ECR repo URL, as shown in Example 57:

Example 57. Update the Deployment to use the Docker image from your ECR repo (ch3/kubernetes/sample-app-deployment.yml)

spec:



  # (... other params omitted for clarity ...)



    spec:

      containers:

        - name: sample-app

          image: <YOUR_ECR_REPO_URL>:v3

You can now apply both YAML files to deploy into your EKS cluster:

$ kubectl apply -f sample-app-deployment.yml

$ kubectl apply -f sample-app-service.yml

Deployments to an EKS cluster will take slightly longer than a local Kubernetes cluster, but after a minute or two, if you run the get pods command, you should see something like this:

$ kubectl get pods

NAME                                     READY   STATUS    RESTARTS   AGE

sample-app-deployment-59f5c6cd66-nk45z   1/1     Running   0          1m

sample-app-deployment-59f5c6cd66-p5jxz   1/1     Running   0          1m

sample-app-deployment-59f5c6cd66-pmjns   1/1     Running   0          1m

And if you run get services, you should see something like this:

NAME                     TYPE          EXTERNAL-IP                     PORT(S)

kubernetes               ClusterIP     <none>                          443

sample-app-loadbalancer  LoadBalancer  xx.us-east-2.elb.amazonaws.com  80:3225

If you look at the EXTERNAL-IP for sample-app-loadbalancer, you should see the domain name of an AWS ELB. Open this URL up in a web browser or using curl:

$ curl xx.us-east-2.elb.amazonaws.com

Fundamentals of DevOps!

If you get "Could not resolve host" errors, this is probably because the load balancer is still booting up or the health checks haven’t passed yet. Give it a minute or two more, and try again, and you should see the familiar "Fundamentals of DevOps!" text.
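If you’d rather not retry by hand, you can script the "wait and try again" loop. Here is a minimal sketch in Node.js; the retry and fetchBody helpers, the number of attempts, and the delay are all assumptions chosen for illustration, not part of the sample code repo.

```javascript
// Hypothetical helper: retry an async operation until it succeeds or the
// attempts run out, waiting delayMs milliseconds between attempts.
async function retry(operation, { attempts = 8, delayMs = 15000 } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await operation(attempt);
    } catch (err) {
      lastError = err;
      if (attempt < attempts) {
        await new Promise((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  // Every attempt failed: surface the most recent error to the caller.
  throw lastError;
}

// Fetch the response body, treating DNS failures (which make fetch reject)
// and non-2xx responses as errors worth retrying.
async function fetchBody(url) {
  const response = await fetch(url); // global fetch is available in Node.js 18+
  if (!response.ok) {
    throw new Error(`Unexpected status: ${response.status}`);
  }
  return response.text();
}
```

You could then call retry(() => fetchBody("http://<YOUR_ELB_DOMAIN>")).then(console.log), replacing the placeholder with the EXTERNAL-IP value from get services.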

Congrats, you’re now running a Dockerized application in a Kubernetes cluster in AWS!

Get your hands dirty

Here are a few exercises you can try at home to get a better feel for using Kubernetes for container orchestration:

By default, if you deploy a Kubernetes Service of type LoadBalancer into EKS, EKS will create a Classic Load Balancer, which is an older type of load balancer that is not generally recommended anymore. In most cases, you actually want an Application Load Balancer (ALB), as you saw in the VM orchestration section. To deploy an ALB, you need to make a few changes, as explained in the AWS documentation.
Try terminating one of the worker node instances using the AWS Console. How does the ELB handle it? How does EKS respond? Do you need to do anything to restore the instance or your containers?
Try using kubectl exec to get a shell (like an SSH session) into a running container.

When you’re done experimenting with the EKS cluster, run tofu destroy on both the eks-cluster and ecr-repo modules to undeploy all your infrastructure. This ensures that your account doesn’t start accumulating any unwanted charges.

Serverless Orchestration

All the orchestration options you’ve seen so far have required you to think about and manage the servers you’re using, though a bit less with each step up the abstraction ladder. The idea behind serverless is to allow you to focus entirely on your app code, without having to think about servers at all. There are of course still servers there, but they are behind the scenes, and fully managed for you.

The original model referred to as "serverless" was Functions as a Service (FaaS), which works as follows:

Create a deployment package which contains just the source code to run one function (rather than a whole app).
Upload the deployment package to your serverless provider, which is typically a cloud provider like AWS, GCP, or Azure (although you can also use tools like Knative to add support for serverless in your on-prem Kubernetes cluster).
Configure the serverless provider to trigger your function in response to certain events: e.g., an HTTP request, a file upload, a new message in a queue.
When the trigger goes off, the serverless provider executes your function, passing it information about the event as an input, and, in some cases, taking the data the function returns as an output, and passing it on elsewhere (e.g., sending it as an HTTP response).
When you need to deploy an update, you create a new deployment package and upload it to the serverless provider, who will use it to respond to all future triggers.
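To make this workflow more concrete, here is a toy, in-memory stand-in for a serverless provider. It’s only a sketch of the model described above: the ToyFaasProvider class and its method names are hypothetical, and a real provider does far more (packaging, isolation, scaling, billing, and so on).

```javascript
// A toy, in-memory stand-in for a serverless provider (all names hypothetical).
class ToyFaasProvider {
  constructor() {
    this.functions = new Map(); // function name -> handler (the "deployment package")
    this.triggers = new Map(); // event type -> function name
  }

  // Upload a deployment package. Uploading again under the same name
  // replaces the old code, which is how updates are rolled out.
  deploy(name, handler) {
    this.functions.set(name, handler);
  }

  // Configure which type of event triggers which function.
  addTrigger(eventType, name) {
    this.triggers.set(eventType, name);
  }

  // When a trigger goes off, run the function with the event as input
  // and hand back whatever the function returns.
  async fire(eventType, event) {
    const name = this.triggers.get(eventType);
    const handler = this.functions.get(name);
    if (!handler) {
      throw new Error(`No function deployed for event type "${eventType}"`);
    }
    return handler(event);
  }
}

const provider = new ToyFaasProvider();
provider.deploy("hello", async (event) => ({ statusCode: 200, body: `You requested ${event.path}` }));
provider.addTrigger("http-request", "hello");
provider.fire("http-request", { path: "/" }).then((response) => console.log(response));
```

Notice that nothing in the calling code mentions a server: you hand over a function, say what should trigger it, and the provider takes care of the rest.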

Key takeaway #4

Serverless orchestration is an immutable infrastructure approach where you deploy and manage functions without having to think about servers at all.

There are a few key points that are easy to miss that make serverless with the FaaS model stand out from all the other orchestration options:

You focus on your code, not on the hardware: The goal of serverless is that you don’t have to think about the hardware at all. If your trigger goes off 1,000 times per second or once per year, it’s completely up to the serverless provider to manage the servers, clusters, auto scaling, and auto healing that are necessary to handle that load.
You focus on your code, not the OS: The deployment package only includes your app code. Notably, it does not include anything about the OS or other tooling. Running, securing, and updating the OS is completely handled by the serverless provider.
You get even more speed: Serverless deployments are even faster than containers: whereas you can expect the build and deploy cycle to take 5-10 minutes with VMs and 1-5 minutes with containers, with serverless, it can take less than a minute. This is because the deployment packages are often tiny, containing just a small amount of source code for one function, and there are no servers or clusters to spin up, so deployments are fast.
You get even more efficiency: Serverless can make even more efficient use of computing resources than containers: instead of scheduling long-running apps, you schedule short-running functions, which you can move around the cluster extremely quickly onto any server that has spare resources. That said, most of these benefits accrue to the cloud providers, but they do pass some of those cost savings down to the end-user too, offering serverless at incredibly low prices.^[19]
Pricing scales perfectly with usage: With server, VM, and container orchestration, you typically pay per hour to rent whatever hardware you need, even if that hardware is sitting completely idle. With serverless, you pay per invocation, so the pricing scales exactly with usage. If usage is high, you pay more, but if usage goes to zero, most serverless providers can scale to zero, which means you pay nothing.
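The difference between the two pricing models is easiest to see with a little arithmetic. The following sketch compares them at a few usage levels. All of the prices are hypothetical numbers picked purely for illustration, and the per-invocation model is simplified: real providers typically also charge for how long each invocation runs and how much memory it uses.

```javascript
// All prices below are hypothetical and exist only to show the shape of each model.
const SERVER_PRICE_PER_HOUR = 0.01; // dollars per server per hour (assumed)
const PRICE_PER_MILLION_INVOCATIONS = 0.2; // dollars per million invocations (assumed)
const HOURS_PER_MONTH = 730;

// Hourly model: you pay for every hour the servers exist, busy or idle.
function monthlyHourlyCost(serverCount) {
  return serverCount * SERVER_PRICE_PER_HOUR * HOURS_PER_MONTH;
}

// Per-invocation model: you pay only when the function actually runs.
function monthlyInvocationCost(invocationsPerMonth) {
  return (invocationsPerMonth / 1_000_000) * PRICE_PER_MILLION_INVOCATIONS;
}

for (const invocations of [0, 1_000_000, 100_000_000]) {
  console.log(
    `${invocations} invocations/month: ` +
      `hourly model $${monthlyHourlyCost(1).toFixed(2)}, ` +
      `per-invocation model $${monthlyInvocationCost(invocations).toFixed(2)}`
  );
}
```

The hourly cost stays the same no matter how many requests arrive, whereas the per-invocation cost starts at zero and grows in step with usage.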

While FaaS has some major benefits, it also typically comes with a number of limitations:

Size limits: There are usually limits on deployment package size, event payload size, and response payload size.
Time limits: There is usually a maximum amount of time that your functions can run for (e.g., 15 minutes with AWS Lambda).
Disk space: You typically only have a small amount of storage available locally, and it’s usually ephemeral, so you can’t store anything permanent on it.
Performance: Since the servers are hidden from you, you have very little control over the hardware that you’re using, which can make performance tuning difficult.
Debugging: You usually can’t connect to the servers directly (e.g. via SSH), which can make debugging difficult.
Cold starts: Serverless often struggles with cold starts, where on the first run, or the first run after a period of idleness, the serverless provider needs to download your deployment package and run it, which can take up to several seconds. For some use cases, such as responding to live HTTP requests, a delay of several seconds is unacceptably slow.
Long-running connections: Use cases such as database connection pools and WebSockets are typically more complicated with FaaS. For example, with AWS Lambda, if you want a database connection pool, you have to run an entirely separate service called the Amazon RDS Proxy.
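One common way to soften the cold start and connection limitations with AWS Lambda is to do expensive setup outside the handler function: Lambda runs that code once when it starts a new execution environment, and then reuses the environment for later invocations while it stays warm. The following is a minimal sketch of the pattern, runnable locally; createExpensiveResource is a hypothetical stand-in for real setup work such as opening a connection.

```javascript
// Module scope: runs once per execution environment, i.e., during a cold start.
let initCount = 0;

function createExpensiveResource() {
  initCount++;
  // Stand-in for slow setup work such as opening a connection or loading config.
  return { createdAt: Date.now() };
}

const resource = createExpensiveResource();

// Handler scope: runs on every invocation and reuses the resource created above.
// In a real deployment package you would export this as exports.handler.
const handler = async (event) => {
  return {
    statusCode: 200,
    body: JSON.stringify({ initCount, resourceCreatedAt: resource.createdAt }),
  };
};

// Simulate two "warm" invocations in the same process: the setup ran only once.
handler({})
  .then(() => handler({}))
  .then((response) => console.log(response.body));
```

This doesn’t eliminate cold starts, as the very first invocation still pays for the setup, and you have no guarantee about how long an environment stays warm.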

The FaaS model of serverless first became prominent in 2015 with the release of AWS Lambda. It grew in popularity very quickly, and since then, other cloud providers have released their own FaaS offerings, such as GCP Cloud Functions and Azure Functions.

In fact, serverless has become so popular that these days, the term is being applied not only to FaaS, but to other models, too:

Google App Engine (GAE): Released in 2008, GAE predates AWS Lambda, and is perhaps the first serverless offering (though I don’t believe the term serverless was used back then), as it allowed you to deploy web apps without having to think about servers or clusters. However, this required that the apps were written in very specific ways: e.g., specific languages, frameworks, data stores, runtime limits, data access patterns, etc.
Serverless containers: A number of cloud providers these days allow you to run containers without having to manage the servers or clusters under the hood. For example, AWS Fargate lets you use Amazon EKS or Amazon ECS without having to run or manage any worker nodes yourself. Combining containers with serverless helps work around some of the limitations of FaaS: e.g., you can have long-running containers, which avoids issues with cold starts and long-running connections. However, this very same feature also nullifies the scale-to-zero benefits of serverless. Also, containers give you greater portability than serverless, as serverless depends on provider-specific deployment packages. However, containers are typically larger and container orchestration tools tend to be slower, so you lose some of the speed benefits, and containers include the OS and other tooling, so you have more maintenance work to do (that said, with containers, the OS kernel is shared with the underlying host, which is managed for you by the serverless provider).
Serverless databases: The term serverless is now being applied to databases too, such as Amazon Aurora Serverless. In this case, the term serverless typically implies two things. First, you can use these databases without having to worry about running or managing the underlying servers, hard-drives, etc. Second, these databases can typically scale to zero when not in use, so you don’t have to pay hourly to run a server when things are idle (however, you typically do still pay for data storage).

To get a feel for serverless, let’s try out what is arguably the most popular approach, which is AWS Lambda and FaaS. First, you’re going to deploy a Lambda function that can respond with "Hello, World!", and second, you’ll deploy API Gateway to trigger the Lambda function when HTTP requests come in.

Example: Deploy a Serverless Function with AWS Lambda

The blog post series’s sample code repo includes a module called lambda in the ch3/tofu/modules/lambda folder that can do the following:

Zip up a folder you specify into a deployment package.
Upload the deployment package as an AWS Lambda function.
Configure various settings for the Lambda function, such as memory, CPU, and environment variables.

To use the lambda module, create a live/lambda-sample folder to use as a root module:

$ cd fundamentals-of-devops

$ mkdir -p ch3/tofu/live/lambda-sample

$ cd ch3/tofu/live/lambda-sample

In the lambda-sample folder, create a file called main.tf with the contents shown in Example 58:

Example 58. Configure the lambda module (ch3/tofu/live/lambda-sample/main.tf)

provider "aws" {

  region = "us-east-2"

}



module "function" {

  source = "github.com/brikis98/devops-book//ch3/tofu/modules/lambda"



  name = "lambda-sample"         (1)



  src_dir = "${path.module}/src" (2)

  runtime = "nodejs20.x"         (3)

  handler = "index.handler"      (4)



  memory_size = 128              (5)

  timeout     = 5                (6)



  environment_variables = {      (7)

    NODE_ENV = "production"

  }



  # ... (other params omitted) ...



}

This code sets the following parameters:

1	`name`: The name to use for the Lambda function and all other resources created by this module.
2	`src_dir`: The directory which contains the code for the Lambda function. The `lambda` module will zip this folder up into a deployment package. Example 59 shows the contents of this folder.
3	`runtime`: The runtime used by this function. AWS Lambda supports several different runtimes, including Node.js, Python, Java, Ruby, and .NET, as well as the ability to create custom runtimes for all other languages.
4	`handler`: The handler or entrypoint to call your function. The format is `<FILE>.<FUNCTION>`, where `<FILE>` is the file in your deployment package and `<FUNCTION>` is the name of the function to call in that file. Lambda will pass this function the event information. The preceding code sets the handler to the `handler` function in index.js, which is shown in Example 59.
5	`memory_size`: The amount of memory to give the Lambda function. Adding more memory also proportionally increases the amount of CPU available, as well as the cost to run the function.
6	`timeout`: The maximum amount of time, in seconds, that the Lambda function is allowed to run. The preceding code sets it to 5 seconds; the maximum AWS Lambda allows is 15 minutes.
7	`environment_variables`: Environment variables to set for the function. The preceding code sets the `NODE_ENV` environment variable to "production" to tell the Node.js app and all its dependencies to run in production mode.
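If the `<FILE>.<FUNCTION>` handler format is new to you, the following sketch shows one way to read it. This is not how Lambda itself is implemented; parseHandlerSetting is a hypothetical helper that only illustrates which part of the string names the file and which part names the function.

```javascript
// Hypothetical helper: split a handler setting such as "index.handler" into
// the file to load and the name of the exported function to call.
function parseHandlerSetting(handlerSetting) {
  const lastDot = handlerSetting.lastIndexOf(".");
  if (lastDot <= 0 || lastDot === handlerSetting.length - 1) {
    throw new Error(`Expected <FILE>.<FUNCTION>, got: ${handlerSetting}`);
  }
  return {
    file: handlerSetting.slice(0, lastDot), // e.g., "index" (index.js in the package)
    functionName: handlerSetting.slice(lastDot + 1), // e.g., "handler"
  };
}

console.log(parseHandlerSetting("index.handler"));
```

So if you renamed the file to app.js and the exported function to main, you would set handler to "app.main".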

Create a folder called src in the lambda-sample folder, and inside src, create a file called index.js, which defines the handler, as shown in Example 59:

Example 59. The handler code in index.js (ch3/tofu/live/lambda-sample/src/index.js)

exports.handler = (event, context, callback) => {

  callback(null, {statusCode: 200, body: "Hello, World!"});

};

As you can see, this is a function that takes the event object as input and then uses the callback to return a response which is a 200 OK with the text "Hello, World!"
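The Node.js 20 runtime also accepts async handlers, where the value you return takes the place of the callback. The following sketch puts the two styles side by side and invokes both locally with an empty event, which is a handy way to try out a handler before deploying it. The names callbackHandler and asyncHandler are just for this illustration; in a deployment package you would assign whichever one you pick to exports.handler.

```javascript
// Callback-style handler, equivalent to the one in Example 59.
const callbackHandler = (event, context, callback) => {
  callback(null, { statusCode: 200, body: "Hello, World!" });
};

// Async-style handler: the resolved value becomes the response.
const asyncHandler = async (event, context) => {
  return { statusCode: 200, body: "Hello, World!" };
};

// Invoke both locally with an empty test event and an empty stub context.
callbackHandler({}, {}, (err, response) => {
  if (err) throw err;
  console.log("callback style:", response);
});

asyncHandler({}, {}).then((response) => {
  console.log("async style:", response);
});
```

Both styles produce the same response here, so which one you use is mostly a matter of taste.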

Deploy the lambda-sample module the usual way:

$ tofu init

$ tofu apply

apply should complete in just a few seconds: Lambda is fast! To see if it worked, open the Lambda console in your browser, click on the function called "lambda-sample," and you should see your function and the handler code, as shown in Figure 28:

Figure 28. The Lambda console shows your newly created function

Currently, the function has no triggers, so it doesn’t really do anything. You can manually trigger it by clicking the blue Test button. The console will pop up a box where you can enter test data in JSON format to send to the function as the event object; leave everything at its default value and click the Invoke button. That should run your function and show you log output that looks similar to Figure 29:

Figure 29. The output from manually triggering the Lambda function with a test event

As you can see, your function has run, and responded with the expected 200 OK and "Hello, World!"

Triggering Lambda functions manually is great for learning and testing, but in the real world, if you want to build a serverless web app, you need to be able to have HTTP requests trigger your function, as described in the next section.

Example: Deploy an API Gateway in Front of AWS Lambda

You can configure a variety of events to trigger your Lambda function: e.g., you can have AWS automatically run your Lambda function each time a file is uploaded to Amazon’s Simple Storage Service (S3), or a new message is written to a queue in Amazon’s Simple Queue Service (SQS), or each time you get a new email in Amazon’s Simple Email Service (SES). So Lambda is a great choice for building event-driven systems and background processing jobs.

You can also configure AWS to trigger a Lambda function each time you receive an HTTP request in API Gateway, which is a managed service you can use to expose an entrypoint for your apps, managing routing, authentication, throttling, and so on. You can also use API Gateway to create serverless web apps.

The blog post series’s sample code repo includes a module called api-gateway in the ch3/tofu/modules/api-gateway folder that can deploy an HTTP API Gateway, a version of API Gateway designed for simple HTTP APIs, that knows how to trigger a Lambda function. Example 60 shows how to update the lambda-sample module to use the api-gateway module:

Example 60. Configure the api-gateway module to trigger the Lambda function (ch3/tofu/live/lambda-sample/main.tf)

module "function" {

  source = "github.com/brikis98/devops-book//ch3/tofu/modules/lambda"



  # ... (other params omitted) ...



}



module "gateway" {

  source = "github.com/brikis98/devops-book//ch3/tofu/modules/api-gateway"



  name               = "lambda-sample"              (1)

  function_arn       = module.function.function_arn (2)

  api_gateway_routes = ["GET /"]                    (3)

}

This code sets the following parameters:

1	`name`: The name to use for the API Gateway and all the other resources created by the module.
2	`function_arn`: The ARN of the Lambda function the API Gateway should trigger when it gets HTTP requests. This is set to an output from the `lambda` module you configured earlier.
3	`api_gateway_routes`: The routes that should trigger the Lambda function. The preceding code configures an HTTP `GET` to the `/` path as the only route.
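If you later add more routes to api_gateway_routes, the same function receives requests for all of them, so the handler needs to tell them apart. Here is a rough sketch of one way to do that. It assumes the api-gateway module delivers events in the HTTP API payload format version 2.0, in which event.routeKey holds the matched route (e.g., "GET /"); check what your events actually look like before relying on this.

```javascript
// Assumption: events use HTTP API payload format version 2.0, where
// event.routeKey holds the matched route, e.g., "GET /".
const ROUTES = new Map([
  ["GET /", async () => ({ statusCode: 200, body: "Hello, World!" })],
]);

// In a deployment package you would export this as exports.handler.
const handler = async (event) => {
  const route = ROUTES.get(event.routeKey);
  if (!route) {
    // Defensive fallback for events that do not match a known route.
    return { statusCode: 404, body: "Not found" };
  }
  return route(event);
};

// Local invocation with a hand-written, minimal test event.
handler({ routeKey: "GET /" }).then((response) => console.log(response));
```

Supporting another route then takes two steps: add it to api_gateway_routes and add a matching entry to the ROUTES map.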

You should also add an output variable in outputs.tf, as shown in Example 61:

Example 61. The lambda-sample module’s outputs (ch3/tofu/live/lambda-sample/outputs.tf)

output "api_endpoint" {

  value = module.gateway.api_endpoint

}

This code will give you the API Gateway’s domain name as an output.

Deploy the updates:

$ tofu init

$ tofu apply

When apply completes, you should see the api_endpoint output:

Apply complete! Resources: 5 added, 0 changed, 0 destroyed.



Outputs:



api_endpoint = "https://iome6ldq7i.execute-api.us-east-2.amazonaws.com"

Open this output in a web browser, and you should see "Hello, World!" Congrats, API Gateway is now routing requests to your Lambda function! As load goes up and down, AWS will automatically scale your Lambda functions up and down, and API Gateway will automatically distribute traffic across these functions. And when there’s no load, your Lambda function will automatically scale to zero, so it won’t cost you a cent.

Example: Roll Out Updates with AWS Lambda

By default, AWS Lambda natively supports a nearly instantaneous deployment model: that is, if you upload a new deployment package, all new requests will start executing the code in that deployment package more or less immediately.

For example, try updating lambda-sample/src/index.js to respond with "Fundamentals of DevOps!" rather than "Hello, World!", as shown in Example 62:

Example 62. Update the Lambda function response text (ch3/tofu/live/lambda-sample/src/index.js)

exports.handler = (event, context, callback) => {

  callback(null, {statusCode: 200, body: "Fundamentals of DevOps!"});

};

Re-run apply to deploy these changes:

$ tofu apply

apply should complete in a few seconds, and if you retry the api_endpoint URL, you’ll see "Fundamentals of DevOps!" right away. So again, deployments with Lambda are fast! In fact, AWS Lambda does an instantaneous switchover from the old to the new version, so it’s effectively a blue-green deployment (which you’ll learn more about in Part 5).

Get your hands dirty

To avoid introducing too many new tools, this blog post uses OpenTofu to deploy Lambda functions, which works great for functions used for background jobs and event processing, but for serverless web apps, where you use a mix of Lambda functions and API Gateway, the OpenTofu code can get very verbose (especially the API Gateway parts). Moreover, if you’re using OpenTofu to manage a serverless web app, you have no easy way to run or test that web app (especially the API Gateway endpoints) locally.

If you’re going to be building serverless web apps for production use cases, try out tools that are purpose-built for serverless web apps instead, as they keep the code more concise and give you ways to test locally.

When you’re done experimenting with the serverless code, run tofu destroy to undeploy all your infrastructure. This ensures that your account doesn’t start accumulating any unwanted charges.

Comparing Orchestration Options

You’ve now seen the most common approaches to orchestration: server orchestration, VM orchestration, container orchestration, and serverless orchestration. Table 6 shows how these orchestration approaches compare in their ability to solve the core orchestration problems introduced in the beginning of the blog post:

Lossy compression

As there are dozens of different tools within each orchestration category, the tables in this section only try to show what you should expect from the typical tools in each category. Think of these tables as compressed guides to the strengths & weaknesses of each category, but be aware that, in the effort to compress this information, some of the variation within a category inevitably gets lost.

Table 6. How orchestration approaches compare in terms of the core orchestration problems
Problem	Server orchestration	VM orchestration	Container orchestration	Serverless orchestration
Deployment	Manual: Manually specify which servers should run which apps.	Supported: Define a template and the orchestrator spins up servers from that template.	Strong support: Set up worker nodes, define a template, and the orchestrator schedules containers on the worker nodes.	Strong support: Upload a deployment package and let the orchestration tool run it whenever it is triggered.
Update strategies	Supported: Limited strategies: e.g., Ansible rolling deployments.	Supported: Limited strategies: e.g., ASG rolling deployments.	Strong support: Multiple strategies: e.g., rolling, canary, blue-green.^[20]	Strong support: Multiple strategies: e.g., Lambda supports blue-green, canary, traffic shifting.
Scheduling	Not supported: There is no scheduler built-in.	Supported: A scheduler decides which VMs run where. As an end-user, you see (and pay for) one VM per server.	Strong support: A scheduler decides which containers run where. As an end-user, you see (and pay for) servers, but you get to run multiple containers per server.	Strong support: A scheduler decides where to run your deployment package. As an end-user, you see (and pay for) functions, not servers.
Rollback	Not supported: Mutable infrastructure practices have side effects, so there’s no automatic "undo." You always fix forward.	Strong support: With immutable infrastructure, if you hit an error with the new version, you go back to the old version.	Strong support: With immutable infrastructure, if you hit an error with the new version, you go back to the old version.	Strong support: With immutable infrastructure, if you hit an error with the new version, you go back to the old version.
Auto scaling	Not supported: The number of servers is fixed, and only changes manually.	Supported: E.g., AWS ASGs support auto scaling servers based on metrics, schedules, and historical patterns.	Supported: E.g., Kubernetes supports auto scaling of both pods and nodes based on metrics, schedules, and events.	Strong support: E.g., AWS Lambda handles scaling for you, including scale to zero, without you having to think about it at all.^[21]
Auto healing	Not supported: You have to manually restore servers and use process supervisors.	Supported: E.g., ASGs automatically replace instances that crash or fail ELB health checks.	Supported: E.g., Kubernetes automatically replaces nodes that crash or pods that fail any one of a variety of health checks (known as probes).	Strong support: E.g., AWS Lambda handles auto healing of servers without you having to think about it at all.
Configuration	Strong support: E.g., Ansible has support for variables, roles, templates, inventories, etc.	Supported: E.g., Create an OpenTofu module that exposes variables to configure ASGs for different environments.	Strong support: E.g., Kubernetes supports ConfigMaps, which give you a way to pass arbitrary key-value pairs to your apps.	Strong support: E.g., Lambda functions can get configuration from environment variables and the SSM Parameter Store.
Secrets management	Supported: E.g., Use Ansible Vault to encrypt and manage sensitive data.	Manual: You typically have to handle this yourself: e.g., have your app read from a secret store during boot.	Strong support: E.g., Kubernetes supports Secrets as a way to pass sensitive data to your apps.	Strong support: E.g., AWS Lambda can automatically fetch secrets from AWS Secrets Manager.
Load balancing	Manual: E.g., Manually deploy Nginx.	Strong support: E.g., Use AWS ASGs with ALBs.	Strong support: E.g., Use Kubernetes Services with Kubernetes Deployments.	Strong support: E.g., Use API Gateway to trigger Lambda functions in response to HTTP requests.
Service communication	Manual: E.g., Have Ansible pass the IP addresses of servers in its inventory to your apps.	Manual: E.g., You can use load balancers between ASGs, using AWS APIs to discover load balancer URLs.	Strong support: E.g., Use a Kubernetes Service to expose your app on a private IP within the cluster, and then discover IPs using environment variables or DNS.	Strong support: E.g., Lambda functions can trigger other Lambda functions either directly via API calls or indirectly via events.^[22]
Disk management	Manual: Manually attach and manage hard drives.	Supported: Ephemeral disks are typically supported, but permanent disks have to be managed manually.^[23]	Strong support: E.g., Kubernetes supports both Volumes and Persistent Volumes.	Not supported: E.g., The file system for Lambda functions is read-only. If you need to store data, you must use an external data store.

While the core orchestration problems define what an orchestration tool should do, it’s also important to consider how they do it. As you used the various orchestration approaches in this blog post, you probably saw that they varied across a number of other dimensions, such as speed, ease of learning, and so on. Table 7 shows how the different orchestration approaches compare across these dimensions, which I’ll refer to as the core orchestration attributes:

Table 7. How orchestration approaches compare in terms of core orchestration attributes
Dimension	Server orchestration	VM orchestration	Container orchestration	Serverless orchestration
Deployment speed	Weak: Simple code changes: 5-30 minutes. Major dependency or OS upgrades: 30-60 minutes.	Moderate: Building a new VM image and rolling it out: 5-30 minutes.	Strong: Building a new container image and rolling it out: 1-5 minutes.	Very strong: Building a new deployment package and rolling it out: 1 minute.
Maintenance	Weak: You have to maintain the servers, the OS and tools on each server, and the orchestration tool itself (e.g., Chef servers and agents).	Moderate: You have to maintain the virtual servers and the OS and tools in each VM image.	Weak: You have to maintain the servers, the OS and tools on each server and Docker image, and the orchestration tool itself (e.g., quarterly Kubernetes upgrades).	Very strong: There are no servers, no OS, and no orchestration tools to maintain.
Ease of learning	Strong: Most people understand this model quickly (a few days).	Strong: Most people understand this model quickly (a few days).	Weak: Most people understand containers quickly, but container orchestration tools, especially Kubernetes, take a long time to learn (a few weeks).	Very strong: Most people understand this model very quickly (less than a day).
Dev/prod parity	Weak: It’s rare to use a server orchestration tool (e.g., Ansible) in your local dev environment.	Weak: You can’t run most VM orchestration tools (e.g., AWS ASG) in your local dev environment.	Very strong: It’s very common to run Docker containers in your local dev environment.	Very strong: It’s very common to run serverless apps in your local dev environment.
Maturity	Strong: The oldest approach, with large, open source communities (e.g., Ansible, Chef, Puppet), so you get many person-years of maturity.	Moderate: The second-oldest approach, but mostly proprietary (e.g., AWS ASGs), so not as many person-years of maturity as the age would suggest.	Strong: A newer approach, but with massive, open source communities (especially Kubernetes), so you get many person-years of maturity.	Weak: The youngest approach, and mostly proprietary (e.g., AWS Lambda), so not mature at all.
Debugging	Strong: Full access to the servers and no extra layers of abstraction makes debugging easier, but mutable infrastructure practices (and configuration drift) make debugging harder.	Very strong: Full access to the virtual servers, a simple abstraction layer, and immutable VM images all make debugging easier.	Weak: Full access to the servers and immutable container images make debugging easier, but multiple layers of abstraction and the complexity of orchestration tools make debugging challenging.	Weak: No access to the servers, and everything is abstracted away from you, which can make debugging very challenging.
Long-running tasks	Very strong: Typically, long-running tasks work fine.	Very strong: Typically, long-running tasks work fine.	Very strong: Typically, long-running tasks work fine.	Weak: Limits on runtimes and numerous hoops to jump through for long-running connections.
Performance tuning	Very strong: Full control over the hardware.	Strong: Full control over the virtualized hardware. However, you may hit the noisy neighbor problem: other VMs running on the same underlying physical server sometimes cause performance issues.	Moderate: The same trade-offs as VMs for worker nodes, plus the added layer of containers, which makes the noisy neighbor problem and performance tuning more complicated.	Weak: No control over the underlying hardware, plus the additional challenge of cold starts, so performance tuning is very challenging.

I hope that next time you need to deploy an app, you can use Table 6 and Table 7 to pick the right tool for the job.
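If you want to be a bit more systematic about it, you can turn Table 7 into a rough scoring exercise. The following sketch copies the ratings from Table 7, but everything else is an assumption made for illustration: the numeric scale (Weak = 1 through Very strong = 4), the scoreApproaches helper, and the example weights. Treat the result as a conversation starter, not a verdict.

```javascript
// Numeric scale for the ratings in Table 7 (an assumption made for this sketch).
const RATING = { "Weak": 1, "Moderate": 2, "Strong": 3, "Very strong": 4 };

// Ratings copied from Table 7, in the order: server, VM, container, serverless.
const APPROACHES = ["Server", "VM", "Container", "Serverless"];
const TABLE_7 = {
  "Deployment speed": ["Weak", "Moderate", "Strong", "Very strong"],
  "Maintenance": ["Weak", "Moderate", "Weak", "Very strong"],
  "Ease of learning": ["Strong", "Strong", "Weak", "Very strong"],
  "Dev/prod parity": ["Weak", "Weak", "Very strong", "Very strong"],
  "Maturity": ["Strong", "Moderate", "Strong", "Weak"],
  "Debugging": ["Strong", "Very strong", "Weak", "Weak"],
  "Long-running tasks": ["Very strong", "Very strong", "Very strong", "Weak"],
  "Performance tuning": ["Very strong", "Strong", "Moderate", "Weak"],
};

// Score each approach as the weighted sum of its ratings, highest score first.
// Dimensions missing from weights count as 0 ("I don't care about this").
function scoreApproaches(weights) {
  return APPROACHES.map((name, i) => {
    let score = 0;
    for (const [dimension, ratings] of Object.entries(TABLE_7)) {
      score += (weights[dimension] ?? 0) * RATING[ratings[i]];
    }
    return { name, score };
  }).sort((a, b) => b.score - a.score);
}

// Hypothetical weights for a team that cares most about long-running tasks.
console.log(scoreApproaches({ "Long-running tasks": 3, "Debugging": 2, "Deployment speed": 1 }));
```

Changing the weights changes the ranking, which is the point: there is no single best orchestration approach, only the best fit for what you care about.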

Conclusion

You now know how to run your apps in a way that more closely handles the demands of production, including using multiple replicas to avoid having a single point of failure, deploying load balancers to distribute traffic across the replicas, and using deployment strategies to roll out updates to your replicas without downtime. You’ve seen a number of orchestration approaches for handling all of this, summarized via the 4 takeaways from this Part:

Server orchestration is an older, mutable infrastructure approach where you have a fixed set of servers that you maintain and update in place.
VM orchestration is an immutable infrastructure approach where you deploy and manage VM images across virtualized servers.
Container orchestration is an immutable infrastructure approach where you deploy and manage container images across a cluster of servers.
Serverless orchestration is an immutable infrastructure approach where you deploy and manage functions without having to think about servers at all.

As you worked your way through the first few parts of this blog post series, you wrote and executed a bunch of code, including Node.js, Ansible, OpenTofu, Docker, YAML, and so on. So far, you’ve been working on all this code alone, but in the real world, you’ll most likely need to work on code with a whole team of developers. How do you collaborate on code as a team so you aren’t constantly overwriting each other’s changes? How do you minimize bugs and outages? How do you package and deploy your changes on a regular basis? These questions are the focus of Part 4, How to Version, Build, and Test Your Code.

