Deploying a Secure Django App on AWS ECS Using Terraform and GitHub Actions

The Problem We’re Solving

In modern DevOps, deploying applications securely is non-negotiable—especially when dealing with production workloads.

But here’s the problem: Most ECS tutorials focus on “how to deploy fast” instead of “how to deploy securely.”

This project intentionally takes the security-first path, accepting some additional cost and complexity to achieve:

  • Private networking for containers

  • Secure image pulls without exposing the VPC to the public internet

  • Encrypted, managed databases

  • Clear separation between DevOps automation and runtime operations

That’s the problem this project solves.


Architecture Explained

High-Level Overview

  • Django App (Dockerized): the application

  • Terraform: Infrastructure as Code (IaC)

  • ECS Fargate: serverless container orchestration

  • RDS (PostgreSQL): managed database

  • ALB (Application Load Balancer): frontend routing

  • VPC Endpoints (Interface): private networking for ECR, S3, CloudWatch

  • CloudWatch Logs: centralized logging

  • GitHub Actions: CI/CD pipeline

  • Docker + ECR: container image build & storage


Architecture Diagram


File Structure Overview

Workflow Breakdown

  • Build: Docker image creation

  • Push: push the image to ECR

  • Infrastructure: Terraform applies ECS, RDS, ALB

  • Deploy: ECS pulls the image and serves the app

  • Destroy: optional cleanup step

  • Exit: workflow cancellation


Explanation of the Chosen Services

ECS Fargate

Security: No SSH, runs in private subnet, AWS-managed runtime. Cost: Pay per vCPU and memory; slightly more expensive for long-running tasks than EC2. HA: Automatically spans multiple AZs. Complexity: Easier to operate but less control over OS-level configs.

Application Load Balancer (ALB)

Security: Terminates HTTPS, handles SSL/TLS certificates securely. Cost: ~$18–$20/month base + traffic costs. HA: Regional service, auto-scales and load balances across AZs.

Complexity: Adds config overhead if using path-based routing or multiple target groups.

VPC with Public/Private Subnets

Security: Public ALB, private ECS tasks & RDS. Minimizes surface area. Cost: No direct cost, but subnet design affects resource placement and networking choices. HA: Subnets in multiple AZs for failover. Complexity: More complex Terraform code; requires careful design to avoid misconfiguration.

VPC Endpoints (S3, ECR, Logs, Secrets Manager)

Security: Keeps traffic private; no internet exposure for pulls/logs. Cost: ~$7.30/month per interface endpoint (e.g., 4 endpoints = ~$29.20). HA: Requires per-AZ deployment for true HA. Complexity: Each service needs a separate endpoint; setup can get messy fast.

Amazon RDS (PostgreSQL Multi-AZ)

Security: Encrypted at rest & in transit; runs in private subnet. Cost: ~$30–$40/month for dev size; production costs much higher. HA: Multi-AZ failover, automated backups. Complexity: No OS-level access; bound to AWS maintenance windows.

CloudWatch Logs (via VPC Endpoint)

Security: Logs sent privately via VPC endpoint. Cost: ~$7.30/month for the endpoint + $0.50/GB ingested. HA: AWS-managed; no single point of failure. Complexity: Needs careful retention management or costs can spiral.


Code Explanation

Containerize Django App

App File structure

In the ./Dockerfile section:

It copies the required files and installs the dependencies the application needs to run.

It uses a slim base image and a multi-stage build, minimizing RUN and COPY instructions so the image ends up with fewer layers.

The ENTRYPOINT points to a script that starts the app correctly.
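A minimal sketch of such a Dockerfile, assuming a Python base image and a requirements.txt at the repo root (image tags and paths are illustrative, not taken from the repo):

```dockerfile
# --- Build stage: install dependencies into a virtualenv ---
FROM python:3.12-slim AS builder
WORKDIR /app
COPY requirements.txt .
RUN python -m venv /opt/venv && \
    /opt/venv/bin/pip install --no-cache-dir -r requirements.txt

# --- Runtime stage: copy only what the app needs ---
FROM python:3.12-slim
WORKDIR /app
COPY --from=builder /opt/venv /opt/venv
COPY . .
ENV PATH="/opt/venv/bin:$PATH"
ENTRYPOINT ["./entrypoint.sh"]
```

The multi-stage split keeps build-time tooling (pip caches, compilers) out of the final layer, which is what keeps the image small.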

In the ./entrypoint.sh section:

It contains the startup commands the app needs, agreed upon with the dev team.

Note: the user-creation step is for testing purposes only.
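A hedged sketch of what such an entrypoint could look like, assuming gunicorn as the app server and a hypothetical project name `myproject` (the actual commands come from the dev team):

```shell
#!/bin/sh
set -e  # stop on the first failing command

# Apply pending database migrations before serving traffic
python manage.py migrate --noinput

# Collect static assets for serving
python manage.py collectstatic --noinput

# Test-only superuser; remove before real production use
python manage.py createsuperuser --noinput || true

# Hand off to the application server as PID 1
exec gunicorn myproject.wsgi:application --bind 0.0.0.0:8000
```

Using `exec` for the final command lets the app server receive ECS stop signals directly instead of being wrapped by the shell.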


Infrastructure Using Terraform

Infrastructure File structure

Modules

Subnets

Everything used in the VPC network lives in this directory.

In subnets/variables.tf section:

These are all the variables the subnet module needs to work.

In subnets/main.tf Section:

Configured 4 subnets:

  • 2 public subnets

  • 2 private subnets

Used count so the subnet creation repeats twice per tier, and used cidrsubnet() to derive each CIDR block.
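The count/cidrsubnet pattern described above can be sketched like this (variable names such as `vpc_cidr` and `azs` are assumptions, not the repo's actual names):

```hcl
# Two public and two private subnets, one pair per AZ.
resource "aws_subnet" "public" {
  count                   = 2
  vpc_id                  = var.vpc_id
  cidr_block              = cidrsubnet(var.vpc_cidr, 8, count.index)
  availability_zone       = var.azs[count.index]
  map_public_ip_on_launch = true
}

resource "aws_subnet" "private" {
  count             = 2
  vpc_id            = var.vpc_id
  cidr_block        = cidrsubnet(var.vpc_cidr, 8, count.index + 2)
  availability_zone = var.azs[count.index]
}
```

With a 10.0.0.0/16 VPC, `cidrsubnet(var.vpc_cidr, 8, n)` yields /24 blocks like 10.0.0.0/24, 10.0.1.0/24, and so on, so the offsets keep the public and private ranges from overlapping.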

Configured 4 endpoints:

  • 3 interface endpoints (ECR dkr, ECR api, CloudWatch Logs)

    • dkr => lets ECS push/pull images via ECR's predefined URL

    • api => lets ECS request authorization tokens from ECR

    • Logs => for troubleshooting if the ECS task errors after pulling the image

  • 1 gateway endpoint (S3)

    • S3 => lets ECS fetch the image layers after resolving the predefined URL from ECR

Used vpc_id, region, and vpc_endpoint_sg as variables because they are filled in by the root main.tf.
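As a sketch, the endpoints described above might look like this (resource names are illustrative; only the AWS `service_name` strings are fixed by AWS):

```hcl
# Interface endpoint for ECR image pulls; ecr.api and logs
# follow the same pattern with a different service_name.
resource "aws_vpc_endpoint" "ecr_dkr" {
  vpc_id              = var.vpc_id
  service_name        = "com.amazonaws.${var.region}.ecr.dkr"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = aws_subnet.private[*].id
  security_group_ids  = [var.vpc_endpoint_sg]
  private_dns_enabled = true
}

# Gateway endpoint for S3, attached to the private route table
# so image layers are fetched without internet access.
resource "aws_vpc_endpoint" "s3" {
  vpc_id            = var.vpc_id
  service_name      = "com.amazonaws.${var.region}.s3"
  vpc_endpoint_type = "Gateway"
  route_table_ids   = [aws_route_table.private.id]
}
```

Note the structural difference: interface endpoints attach to subnets and a security group, while the S3 gateway endpoint attaches to route tables.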

Configured two route tables:

  • One public route table

  • One private route table

The public route table is attached to the public subnets so that, once the Application Load Balancer is configured, users can reach it via the internet gateway.

The private route table is attached to the private subnets to keep the ECS tasks and the RDS database secured.

Configured one internet gateway so users can access the application through it.
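A sketch of that routing setup (resource names are assumptions):

```hcl
resource "aws_internet_gateway" "this" {
  vpc_id = var.vpc_id
}

# Public route table: default route out through the internet gateway.
resource "aws_route_table" "public" {
  vpc_id = var.vpc_id
  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.this.id
  }
}

resource "aws_route_table_association" "public" {
  count          = 2
  subnet_id      = aws_subnet.public[count.index].id
  route_table_id = aws_route_table.public.id
}

# Private route table: no internet route at all; traffic to AWS
# services flows through the VPC endpoints instead.
resource "aws_route_table" "private" {
  vpc_id = var.vpc_id
}
```

The absence of a 0.0.0.0/0 route in the private table is the whole point: nothing in those subnets can reach, or be reached from, the public internet.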

In subnets/output.tf section:

The outputs that are passed to the other modules.


Computes

Everything used for the compute services lives in this directory.

In computes/variable.tf section:

These are all the variables the ECS resources need to work.

In computes/main.tf section:

Configured ECR with a dynamic name and passed it along to other services where needed.

Created the ECS cluster and the ECS service.

Used data "aws_caller_identity" {} to get the account ID, which is useful when you have a cross-account ECR.

Used aws_ecs_task_definition to set up the container settings: which cluster it runs in, the resources it needs, IAM roles, the image URL, and finally the environment variables or secrets for the app in the image.
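A sketch of such a task definition (family name, sizes, and port are illustrative assumptions):

```hcl
resource "aws_ecs_task_definition" "app" {
  family                   = "django-app"
  requires_compatibilities = ["FARGATE"]
  network_mode             = "awsvpc"
  cpu                      = 256
  memory                   = 512
  execution_role_arn       = aws_iam_role.ecs_task_execution_role.arn
  task_role_arn            = aws_iam_role.ecs_task_role.arn

  container_definitions = jsonencode([{
    name         = "django"
    image        = "${aws_ecr_repository.app.repository_url}:latest"
    portMappings = [{ containerPort = 8000 }]
    logConfiguration = {
      logDriver = "awslogs"
      options = {
        awslogs-group         = aws_cloudwatch_log_group.app.name
        awslogs-region        = var.region
        awslogs-stream-prefix = "django"
      }
    }
  }])
}
```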


Used aws_cloudwatch_log_group to collect logs from the running containers.

Used "aws_iam_role" "ecs_task_execution_role" to declare which service is allowed to assume the policies attached to it later.

Note: created aws_iam_role twice because it is AWS best practice; as AWS says, split the execution role and the task role to enforce least privilege.

In ecs_execution_role_policy, the policies ECS needs for pulling, pushing, authorization, etc. are attached.

ecs_task_s3_policy does what its name suggests: it allows Get/List on S3 objects, which ECS uses to fetch the image layers.
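The two-role split described above could be sketched like this (role and resource names are assumptions; the managed policy ARN is AWS's standard one):

```hcl
# Both roles are assumable only by ECS tasks.
data "aws_iam_policy_document" "ecs_assume" {
  statement {
    actions = ["sts:AssumeRole"]
    principals {
      type        = "Service"
      identifiers = ["ecs-tasks.amazonaws.com"]
    }
  }
}

# Execution role: used by the ECS agent to pull the image and write logs.
resource "aws_iam_role" "ecs_task_execution_role" {
  name               = "ecs-task-execution-role"
  assume_role_policy = data.aws_iam_policy_document.ecs_assume.json
}

resource "aws_iam_role_policy_attachment" "ecs_execution_role_policy" {
  role       = aws_iam_role.ecs_task_execution_role.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy"
}

# Task role: assumed by the running container itself (e.g. S3 reads).
resource "aws_iam_role" "ecs_task_role" {
  name               = "ecs-task-role"
  assume_role_policy = data.aws_iam_policy_document.ecs_assume.json
}
```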

Used aws_lb to create the ALB, and aws_lb_target_group to target the ECS tasks on the container's dynamic port.
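A sketch of that load balancer wiring (names and ports are illustrative):

```hcl
resource "aws_lb" "app" {
  name               = "django-alb"
  load_balancer_type = "application"
  subnets            = var.public_subnet_ids
  security_groups    = [var.alb_sg]
}

resource "aws_lb_target_group" "app" {
  name        = "django-tg"
  port        = 8000
  protocol    = "HTTP"
  vpc_id      = var.vpc_id
  target_type = "ip" # required for Fargate (awsvpc) tasks
}

resource "aws_lb_listener" "http" {
  load_balancer_arn = aws_lb.app.arn
  port              = 80
  protocol          = "HTTP"
  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.app.arn
  }
}
```

`target_type = "ip"` matters here: Fargate tasks register by private IP rather than by instance ID, which is how the ALB reaches containers on their dynamic ports.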

In computes/output.tf section:

Finally, used an output block, ecr_repo_url, to expose the ECR URL to other services.


Root Configurations

Let's step out of the modules and integrate every configuration we've made.

In ./variable.tf section:

These are all the variables that the services need to work.

In ./main.tf section:

Configured the VPC to connect all subnets to the same network, and used aws_default_route_table to attach every subnet to it so they can communicate with each other easily.

Used module "subnet" to call the resources defined in the Subnets section.

Used module "computes" to call the resources defined in the Computes section.

Configured PostgreSQL with the requirements needed for high availability, security, and backups for disaster recovery.

Configured the security groups with the fewest rules possible to reduce the attack surface.
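A sketch of an RDS instance meeting those requirements (instance size, retention period, and names are assumptions, not the repo's values):

```hcl
resource "aws_db_instance" "postgres" {
  identifier              = "django-db"
  engine                  = "postgres"
  instance_class          = "db.t3.micro"
  allocated_storage       = 20
  multi_az                = true  # synchronous standby in a second AZ
  storage_encrypted       = true  # encryption at rest
  backup_retention_period = 7     # automated daily backups for DR
  db_subnet_group_name    = aws_db_subnet_group.private.name
  vpc_security_group_ids  = [aws_security_group.rds.id]
  username                = var.db_username
  password                = var.db_password # supply via tfvars or secrets
  skip_final_snapshot     = false
}
```

The three security/HA claims map directly to three arguments: `multi_az` for failover, `storage_encrypted` for encryption at rest, and `backup_retention_period` for disaster recovery.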

In ./output.tf section:

Used the outputs to pass the important URLs and environment values through the pipeline/workflow. (This comes up later in the Workflow Using GitHub Actions section.)

In the ./terraform-prod.tfvars or ./terraform-dev.tfvars section (whichever stage you use):

The values for the variables declared in the ./variable.tf section.


Workflow Using Github Action

Workflow File Structure:

In the .github/workflows/workflow.yml section:

It starts with a workflow_dispatch, meaning the pipeline is only triggered manually. The person triggering it must choose two inputs: action (apply or destroy) and approve (approve or dont). This double confirmation prevents accidental deployments or infrastructure destruction. It acts as a safety lock to avoid surprises in production.
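The double-confirmation trigger described above might be declared like this (input descriptions are illustrative):

```yaml
# Manual trigger with a two-input safety lock.
on:
  workflow_dispatch:
    inputs:
      action:
        description: "apply or destroy"
        type: choice
        options: [apply, destroy]
      approve:
        description: "approve or dont"
        type: choice
        options: [approve, dont]
```

Each job then gates on both inputs, e.g. `if: ${{ inputs.action == 'apply' && inputs.approve == 'approve' }}`, so no job runs unless both choices line up.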

In env, the workflow passes the inputs that the later steps need.

Under jobs: we find infrastructure, build, push, destroy, and exit.

In infrastructure :

This job runs only when both action is set to apply and approve is set to approve. Inside, it initializes Terraform and deploys the AWS infrastructure needed for your Django app: the ECS cluster, Application Load Balancer, security groups, RDS database, and ECR repository. Terraform reads the Docker image name from the environment, but the actual image does not exist yet; at this point the pipeline is just setting up the infrastructure shell. The Terraform apply uses terraform-prod.tfvars, which likely contains production configuration like instance sizes, DB passwords (hopefully through variables or secrets), and VPC IDs. This separation allows infrastructure to be provisioned first, independently of the Docker build process.
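A sketch of such an infrastructure job (secret names, region, and action versions are assumptions):

```yaml
infrastructure:
  if: ${{ inputs.action == 'apply' && inputs.approve == 'approve' }}
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - uses: aws-actions/configure-aws-credentials@v4
      with:
        aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
        aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
        aws-region: us-east-1
    - name: Terraform init and apply
      run: |
        terraform init
        terraform apply -auto-approve -var-file=terraform-prod.tfvars
```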

Next comes the build job, which also runs only if action is apply and approve is approve. Its purpose is to build the Docker image for the Django application. It uses docker build to create the image locally and saves it as image.tar. Instead of pushing the image right away, the pipeline uploads it as an artifact using GitHub's actions/upload-artifact. This allows the push job to download and reuse the same image later, ensuring consistency between build and deployment. It also avoids rebuilding the same image multiple times if other steps fail, which is good for debugging and repeatability, but using docker save/load is slower compared to building and pushing directly in a single step.
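The build job's save-and-upload approach could look like this sketch (image and artifact names are illustrative):

```yaml
# Build the image once and hand it to later jobs as an artifact.
build:
  if: ${{ inputs.action == 'apply' && inputs.approve == 'approve' }}
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - name: Build image
      run: docker build -t django-app:latest .
    - name: Save image as tarball
      run: docker save django-app:latest -o image.tar
    - uses: actions/upload-artifact@v4
      with:
        name: docker-image
        path: image.tar
```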


After building the image, the push job starts. This job depends on both infrastructure and build jobs completing successfully. It first downloads the previously built image.tar from GitHub's artifact storage. Then it uses Terraform outputs to get the dynamically created ECR repository URL. This is important because the infrastructure layer controls where the image is supposed to go, and the workflow doesn't hardcode the ECR URL. After that, it loads the Docker image from image.tar, tags it with the ECR repo URL, and pushes it to AWS ECR. At this point, the ECS service can pull the image directly from ECR in future deploys. This separation between build and push makes the workflow more flexible, but also introduces a risk: if Terraform outputs are wrong or missing (for example, ecr_repo_url is not properly set), the push will fail even if the image build succeeded.
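A sketch of that push job, assuming the Terraform state exposes an `ecr_repo_url` output as described (step names and the artifact name are illustrative):

```yaml
# Reuse the built image; the ECR URL comes from Terraform outputs.
push:
  needs: [infrastructure, build]
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - uses: actions/download-artifact@v4
      with:
        name: docker-image
    - name: Read ECR URL from Terraform
      run: |
        terraform init
        echo "ECR_URL=$(terraform output -raw ecr_repo_url)" >> "$GITHUB_ENV"
    - name: Load, tag, and push
      run: |
        docker load -i image.tar
        docker tag django-app:latest "$ECR_URL:latest"
        docker push "$ECR_URL:latest"
```

If `ecr_repo_url` is missing or empty, the tag step fails, which is exactly the failure mode the article warns about.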

The destroy job handles the infrastructure teardown. It only runs if action is destroy and approve is approve. This job checks out the repo, configures AWS credentials, initializes Terraform, creates a destroy plan, and then applies it. This process destroys everything: ECS services, ALB, RDS, ECR repos, and any other AWS resources managed by Terraform. The use of terraform plan -destroy ensures the operator can preview the destruction before applying if needed, but here the plan is auto-applied in one workflow run after approval. This is efficient but risky if not carefully monitored because resources are deleted immediately after confirmation.

Finally, the exit job handles the case where someone triggers the pipeline but selects approve as dont. Instead of failing silently, this job runs a simple echo "Action denied by reviewer." to make it clear that the workflow was intentionally aborted by human decision. This improves transparency in CI/CD logs, so that others reviewing the workflow understand it wasn’t a failure—it was a conscious choice not to proceed.

In the bigger picture, this pipeline is designed for safe, manual deployments rather than continuous integration or fast delivery. It prioritizes control over speed. All jobs are isolated: Terraform infra setup is separated from the Docker build and ECR push to reduce coupling and improve troubleshooting. However, there are tradeoffs. Docker artifacts are saved and loaded across jobs, which slows down the process compared to direct ECR pushes. There's no tagging strategy beyond latest, so production deployments might accidentally overwrite images. Also, there’s no rollback if something fails after infrastructure is created but before the image is pushed. This pipeline is good for environments where you want to prevent mistakes more than you want speed, but in mature CI/CD pipelines, you might eventually automate some parts while keeping approval only for destructive actions like destroy.


Final Thoughts

Security is not free. But breaches cost more.

This setup prioritizes security and high availability, even if that means paying for:

  • VPC endpoints

  • Multi-AZ RDS

  • Load Balancing

If you’re building something serious—not just a hobby project—this trade-off is justified.


Code Repository

Source Code


Contributors

Omar Tamer (Me)


Discussion

Would you choose lower cost with higher risk, or pay for security and redundancy upfront?

Let us know your thoughts!
