Deploying a Secure Django App on AWS ECS Using Terraform and GitHub Actions
The Problem We’re Solving
In modern DevOps, deploying applications securely is non-negotiable—especially when dealing with production workloads.
But here’s the problem: Most ECS tutorials focus on “how to deploy fast” instead of “how to deploy securely.”
This project intentionally takes the security-first path, accepting some additional cost and complexity to achieve:
Private networking for containers
Secure image pulls without exposing the VPC to the public internet
Encrypted, managed databases
Clear separation between DevOps automation and runtime operations
That’s the problem this project solves.
Architecture Explained
High-Level Overview
Django App (Dockerized)
The application
Terraform
Infrastructure as Code (IaC)
ECS Fargate
Serverless container orchestration
RDS (PostgreSQL)
Managed database
ALB (Application Load Balancer)
Frontend routing
VPC Endpoints (Interface)
Private networking for ECR, S3, CloudWatch
CloudWatch Logs
Centralized logging
GitHub Actions
CI/CD pipeline
Docker + ECR
Container image build & storage
Architecture Diagram

File Structure Overview
Workflow Breakdown
Build
Docker image creation
Push
Push image to ECR
Infrastructure
Terraform apply ECS, RDS, ALB
Deploy
ECS pulls image and serves app
Destroy
Optional cleanup step
Exit
Workflow cancellation
Explanation of the Chosen Services
ECS Fargate
Security: No SSH access; runs in a private subnet with an AWS-managed runtime.
Cost: Pay per vCPU and memory; slightly more expensive than EC2 for long-running tasks.
HA: Automatically spans multiple AZs.
Complexity: Easier to operate, but less control over OS-level configuration.
Application Load Balancer (ALB)
Security: Terminates HTTPS and handles SSL/TLS certificates securely.
Cost: ~$18–$20/month base + traffic costs.
HA: Regional service; auto-scales and load balances across AZs.
Complexity: Adds configuration overhead when using path-based routing or multiple target groups.
VPC with Public/Private Subnets
Security: Public ALB with private ECS tasks & RDS; minimizes the attack surface.
Cost: No direct cost, but subnet design affects resource placement and networking choices.
HA: Subnets in multiple AZs for failover.
Complexity: More complex Terraform code; requires careful design to avoid misconfiguration.
VPC Endpoints (S3, ECR, Logs, Secrets Manager)
Security: Keeps traffic private; no internet exposure for image pulls or logs.
Cost: ~$7.30/month per interface endpoint (e.g., 4 endpoints ≈ $29.20).
HA: Requires per-AZ deployment for true HA.
Complexity: Each service needs a separate endpoint; setup can get messy fast.
Amazon RDS (PostgreSQL Multi-AZ)
Security: Encrypted at rest & in transit; runs in a private subnet.
Cost: ~$30–$40/month at dev size; production costs are much higher.
HA: Multi-AZ failover, automated backups.
Complexity: No OS-level access; bound to AWS maintenance windows.
CloudWatch Logs (via VPC Endpoint)
Security: Logs are sent privately via the VPC endpoint.
Cost: ~$7.30/month for the endpoint + $0.50/GB of logs.
HA: AWS-managed; no single point of failure.
Complexity: Needs careful retention management or costs can spiral.
Code Explanation
Containerize Django App
App File structure
In ./dockerfile section:
Copies the required files and installs the dependencies the application needs to run.
Uses a slim base image and a multi-stage build, minimizing RUN and COPY instructions to keep the number of image layers low.
The ENTRYPOINT runs a script that starts the app correctly.
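A minimal multi-stage Dockerfile sketch along those lines (the Python version, file names, and port are assumptions, not the project's actual values):

```dockerfile
# --- Build stage: install dependencies into a virtualenv ---
FROM python:3.12-slim AS builder
WORKDIR /app
COPY requirements.txt .
RUN python -m venv /opt/venv && \
    /opt/venv/bin/pip install --no-cache-dir -r requirements.txt

# --- Runtime stage: copy only what the app needs to run ---
FROM python:3.12-slim
WORKDIR /app
COPY --from=builder /opt/venv /opt/venv
COPY . .
ENV PATH="/opt/venv/bin:$PATH"
EXPOSE 8000
ENTRYPOINT ["./entrypoint.sh"]
```

Keeping the pip install in its own stage means the final image carries only the virtualenv and the app code, not the build cache.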
In ./entrypoint.sh section:
Runs the startup commands the app needs, agreed on with the dev team.
The user-creation step is for testing purposes only.
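A sketch of what such an entrypoint script might look like (the project module name, test credentials, and exact commands are assumptions):

```bash
#!/bin/sh
set -e

# Apply database migrations and collect static assets
python manage.py migrate --noinput
python manage.py collectstatic --noinput

# Create a test user (testing purposes only -- remove for production)
python manage.py shell -c "from django.contrib.auth import get_user_model; U = get_user_model(); U.objects.filter(username='testuser').exists() or U.objects.create_superuser('testuser', 'test@example.com', 'changeme')"

# Hand off to the application server as PID 1
exec gunicorn myproject.wsgi:application --bind 0.0.0.0:8000
```

Using exec for the final command lets the app server receive ECS stop signals directly.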
Infrastructure Using Terraform
Infrastructure File structure
Modules
Subnets
Everything used for the VPC network lives in this directory.
In subnets/variables.tf section:
Declares all the variables the subnets module needs.
In subnets/main.tf Section:
Configured 4 Subnets
2 Public Subnets
2 Private Subnets
Used count to create each type of subnet twice, and cidrsubnet() to derive the CIDR ranges.
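A sketch of that subnet pattern (variable names like var.vpc_cidr and var.azs are assumptions):

```hcl
# Two public and two private subnets, one pair per AZ
resource "aws_subnet" "public" {
  count                   = 2
  vpc_id                  = var.vpc_id
  cidr_block              = cidrsubnet(var.vpc_cidr, 8, count.index)
  availability_zone       = var.azs[count.index]
  map_public_ip_on_launch = true
}

resource "aws_subnet" "private" {
  count             = 2
  vpc_id            = var.vpc_id
  cidr_block        = cidrsubnet(var.vpc_cidr, 8, count.index + 2)
  availability_zone = var.azs[count.index]
}
```

The count.index offset keeps the private CIDR blocks from colliding with the public ones.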
Configured 4 Endpoints
3 Interface endpoints (ECR dkr, ECR api, CloudWatch Logs)
dkr => lets ECS push/pull images via the predefined repository URL from ECR
api => lets ECS request authorization tokens from ECR
logs => for troubleshooting any errors in the ECS task after the image is pulled
1 Gateway endpoint (S3)
s3 => lets ECS fetch the image layers after resolving the predefined URL from ECR
Used vpc_id, region, and vpc_endpoint_sg as variables because they are filled in by the root main.tf.
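The endpoint setup described above might look roughly like this (only one interface endpoint is shown; ecr.api and logs follow the same pattern, and resource references are illustrative):

```hcl
# Interface endpoint: keeps ECR image pulls inside the VPC
resource "aws_vpc_endpoint" "ecr_dkr" {
  vpc_id              = var.vpc_id
  service_name        = "com.amazonaws.${var.region}.ecr.dkr"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = aws_subnet.private[*].id
  security_group_ids  = [var.vpc_endpoint_sg]
  private_dns_enabled = true
}

# Gateway endpoint for S3, attached to the private route table,
# so ECS can fetch image layers without internet access
resource "aws_vpc_endpoint" "s3" {
  vpc_id            = var.vpc_id
  service_name      = "com.amazonaws.${var.region}.s3"
  vpc_endpoint_type = "Gateway"
  route_table_ids   = [aws_route_table.private.id]
}
```

Note the difference: interface endpoints attach to subnets via ENIs, while the S3 gateway endpoint attaches to a route table.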
Configured Two Route table
One Public Route table
One Private Route table
The public route table is associated with the public subnets so that, once the Application Load Balancer is configured, users can reach it through the Internet Gateway.
The private route table is associated with the private subnets to keep the ECS tasks and the RDS database isolated.
Configured One Internet Gateway
The Internet Gateway is what gives users a path into the application from the internet.
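A sketch of the route table and Internet Gateway wiring (resource names are illustrative):

```hcl
resource "aws_internet_gateway" "igw" {
  vpc_id = var.vpc_id
}

# Public route table: default route to the internet via the IGW
resource "aws_route_table" "public" {
  vpc_id = var.vpc_id
  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.igw.id
  }
}

# Private route table: no internet route at all;
# S3 is reached through the gateway endpoint instead
resource "aws_route_table" "private" {
  vpc_id = var.vpc_id
}

resource "aws_route_table_association" "public" {
  count          = 2
  subnet_id      = aws_subnet.public[count.index].id
  route_table_id = aws_route_table.public.id
}
```

The private table's lack of a 0.0.0.0/0 route is what keeps ECS and RDS unreachable from the internet.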
In subnets/output.tf section:
The outputs that get passed to the other modules and services.
Computes
Everything used for the compute services lives in this directory.
In computes/variable.tf
Declares all the variables the ECS resources need.
In computes/main.tf section:
Configured ECR with a dynamic name and exposed it so other services can use it where needed.
Created ECS Cluster, and ECS Service
Used data "aws_caller_identity" {} to fetch the account ID, which is useful when working with a cross-account ECR.
Used aws_ecs_task_definition to set up the container settings: the cluster it runs in, the resources it needs, IAM roles, the image URL, and finally the environment variables or secrets the app in the image requires.
The ECS task definition depends on the ECR repository to get the image URL, which is why aws_ecs_task_definition cannot be created before the ECR repository is initialized.
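A sketch of what such a task definition might look like (names, sizes, and the Django settings module are assumptions):

```hcl
resource "aws_ecs_task_definition" "app" {
  family                   = "django-app"
  requires_compatibilities = ["FARGATE"]
  network_mode             = "awsvpc"
  cpu                      = 256
  memory                   = 512
  execution_role_arn       = aws_iam_role.ecs_task_execution_role.arn
  task_role_arn            = aws_iam_role.ecs_task_role.arn

  container_definitions = jsonencode([{
    name = "django"
    # Referencing the repository here creates the implicit
    # dependency on the ECR resource mentioned above
    image        = "${aws_ecr_repository.app.repository_url}:latest"
    portMappings = [{ containerPort = 8000 }]
    environment  = [{ name = "DJANGO_SETTINGS_MODULE", value = "myproject.settings" }]
    logConfiguration = {
      logDriver = "awslogs"
      options = {
        awslogs-group         = aws_cloudwatch_log_group.app.name
        awslogs-region        = var.region
        awslogs-stream-prefix = "django"
      }
    }
  }])
}
```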
Used aws_cloudwatch_log_group to get logs of the images.
Used "aws_iam_role" "ecs_task_execution_role" to define which service may assume the role and use the policies attached to it later.
Created aws_iam_role twice because it is AWS best practice: split the execution role and the task role to enforce least privilege.
In ecs_execution_role_policy, the policies ECS needs for pulling, pushing, authorization, and so on are attached.
ecs_task_s3_policy does what its name suggests: it allows Get/List on S3 objects, which ECS uses to fetch the image layers.
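The role split described above might be sketched like this (role names are illustrative; the task role's S3 policy would be attached separately):

```hcl
# Execution role: used by the ECS agent to pull images and write logs
resource "aws_iam_role" "ecs_task_execution_role" {
  name = "ecs-task-execution-role"
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Action    = "sts:AssumeRole"
      Principal = { Service = "ecs-tasks.amazonaws.com" }
    }]
  })
}

resource "aws_iam_role_policy_attachment" "ecs_execution_role_policy" {
  role       = aws_iam_role.ecs_task_execution_role.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy"
}

# Task role: assumed by the running container itself (e.g. for S3 Get/List)
resource "aws_iam_role" "ecs_task_role" {
  name               = "ecs-task-role"
  assume_role_policy = aws_iam_role.ecs_task_execution_role.assume_role_policy
}
```

Keeping the two roles separate means a compromised container only holds the task role's permissions, never the image-pull credentials.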
Used aws_lb to create the ALB, and aws_lb_target_group to point it at the ECS tasks on the container's dynamic port.
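A sketch of the ALB wiring (the security group variable and names are assumptions):

```hcl
resource "aws_lb" "app" {
  name               = "django-alb"
  load_balancer_type = "application"
  subnets            = aws_subnet.public[*].id
  security_groups    = [var.alb_sg]
}

# target_type "ip" is required for Fargate tasks (awsvpc network mode)
resource "aws_lb_target_group" "app" {
  name        = "django-tg"
  port        = 8000
  protocol    = "HTTP"
  target_type = "ip"
  vpc_id      = var.vpc_id
}

resource "aws_lb_listener" "http" {
  load_balancer_arn = aws_lb.app.arn
  port              = 80
  protocol          = "HTTP"
  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.app.arn
  }
}
```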
In computes/output.tf section:
Finally, the output block ecr_repo_url exposes the ECR URL so it can be passed to other services.
Root Configurations
Now let's step out of the modules and integrate everything we configured.
In ./variable.tf section:
Declares all the variables the root configuration needs.
In ./main.tf section:
Configured the VPC to put all subnets on the same network, and used aws_default_route_table to attach every subnet to it so they can communicate with each other easily.
Used module "subnet" to call the resources from the Subnets section above.
Used module "computes" to call the resources from the Computes section above.
Configured PostgreSQL with the requirements for high availability and security, plus backups for disaster recovery.
Configured the security groups with the minimum rules needed, to reduce the attack surface.
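A sketch of an RDS configuration with those properties (the identifier, instance class, and helper resources are assumptions):

```hcl
resource "aws_db_instance" "postgres" {
  identifier              = "django-db"
  engine                  = "postgres"
  instance_class          = "db.t3.micro"
  allocated_storage       = 20
  multi_az                = true   # HA: automatic failover to a standby AZ
  storage_encrypted       = true   # encryption at rest
  backup_retention_period = 7      # daily backups for disaster recovery
  db_subnet_group_name    = aws_db_subnet_group.private.name
  vpc_security_group_ids  = [aws_security_group.rds.id]
  username                = var.db_username
  password                = var.db_password # supply via tfvars or secrets, never hardcode
  skip_final_snapshot     = false
}
```

Placing the instance in a private subnet group plus encrypting storage covers both the "encrypted" and "private networking" goals from the architecture section.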
In ./output.tf section:
The outputs pass the important URLs and environment values through the pipeline/workflow. (This comes up later in the Workflow Using GitHub Action section.)
In the ./terraform-prod.tfvars or ./terraform-dev.tfvars section (whichever stage you use):
Holds the values for the variables declared in ./variable.tf.
Workflow Using Github Action
Workflow File Structure:
In .github/workflows/workflow.yml section:
It starts with a workflow_dispatch, meaning the pipeline is only triggered manually. The person triggering it must choose two inputs: action (apply or destroy) and approve (approve or dont). This double confirmation prevents accidental deployments or infrastructure destruction. It acts as a safety lock to avoid surprises in production.
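The trigger described above might be declared like this (input names come from the text; descriptions are illustrative):

```yaml
on:
  workflow_dispatch:
    inputs:
      action:
        description: "Terraform action to run"
        required: true
        type: choice
        options: [apply, destroy]
      approve:
        description: "Reviewer confirmation"
        required: true
        type: choice
        options: [approve, dont]
```

Using type: choice limits the operator to valid values, so a typo can never slip an unintended action into the pipeline.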
The env block passes the necessary inputs into the workflow.
Under jobs: we find infrastructure, build, push, destroy, and exit.
In infrastructure :
This job runs only when action is set to apply and approve is set to approve. Inside, it initializes Terraform and deploys the AWS infrastructure needed for the Django app: typically the ECS cluster, Application Load Balancer, security groups, RDS database, and ECR repository. Terraform reads the Docker image name from the environment, but the actual image does not exist yet; at this point the pipeline is just setting up the infrastructure shell. The Terraform apply uses terraform-prod.tfvars, which likely contains production configuration like instance sizes, DB passwords (hopefully through variables or secrets), and VPC IDs. This separation allows infrastructure to be provisioned first, independently of the Docker build process.
Next comes the build job, which also runs only if action is apply and approve is approve. Its purpose is to build the Docker image for the Django application. It uses docker build to create the image locally and saves it as image.tar. Instead of pushing the image right away, the pipeline uploads it as an artifact using GitHub's actions/upload-artifact. This allows the push job to download and reuse the same image later, ensuring consistency between build and deployment. It also avoids rebuilding the same image multiple times if other steps fail, which is good for debugging and repeatability, but using docker save/load is slower compared to building and pushing directly in a single step.
The build and push jobs are separated because push needs the infrastructure to be initialized before it can push the image to it. This also (indirectly) speeds up the workflow by letting build and infrastructure run in parallel.
After building the image, the push job starts. This job depends on both infrastructure and build jobs completing successfully. It first downloads the previously built image.tar from GitHub's artifact storage. Then it uses Terraform outputs to get the dynamically created ECR repository URL. This is important because the infrastructure layer controls where the image is supposed to go, and the workflow doesn't hardcode the ECR URL. After that, it loads the Docker image from image.tar, tags it with the ECR repo URL, and pushes it to AWS ECR. At this point, the ECS service can pull the image directly from ECR in future deploys. This separation between build and push makes the workflow more flexible, but also introduces a risk: if Terraform outputs are wrong or missing (for example, ecr_repo_url is not properly set), the push will fail even if the image build succeeded.
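A simplified sketch of such a push job (it assumes the artifact is named image, the local tag is django-app:latest, AWS credentials are already configured, and Terraform has been initialized with access to the state):

```yaml
push:
  needs: [infrastructure, build]
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - uses: actions/download-artifact@v4
      with:
        name: image
    - name: Load, tag, and push the image to ECR
      run: |
        # Read the dynamically created repo URL from Terraform outputs
        ECR_URL=$(terraform output -raw ecr_repo_url)
        docker load -i image.tar
        docker tag django-app:latest "$ECR_URL:latest"
        docker push "$ECR_URL:latest"
```

If ecr_repo_url is missing from the state, the run: step fails fast here, which matches the risk noted above.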
The destroy job handles the infrastructure teardown. It only runs if action is destroy and approve is approve. This job checks out the repo, configures AWS credentials, initializes Terraform, creates a destroy plan, and then applies it. This process destroys everything: ECS services, ALB, RDS, ECR repos, and any other AWS resources managed by Terraform. The use of terraform plan -destroy ensures the operator can preview the destruction before applying if needed, but here the plan is auto-applied in one workflow run after approval. This is efficient but risky if not carefully monitored because resources are deleted immediately after confirmation.
Finally, the exit job handles the case where someone triggers the pipeline but selects approve as dont. Instead of failing silently, this job runs a simple echo "Action denied by reviewer." to make it clear that the workflow was intentionally aborted by human decision. This improves transparency in CI/CD logs, so that others reviewing the workflow understand it wasn’t a failure—it was a conscious choice not to proceed.
In the bigger picture, this pipeline is designed for safe, manual deployments rather than continuous integration or fast delivery. It prioritizes control over speed. All jobs are isolated: Terraform infra setup is separated from the Docker build and ECR push to reduce coupling and improve troubleshooting. However, there are tradeoffs. Docker artifacts are saved and loaded across jobs, which slows down the process compared to direct ECR pushes. There's no tagging strategy beyond latest, so production deployments might accidentally overwrite images. Also, there’s no rollback if something fails after infrastructure is created but before the image is pushed. This pipeline is good for environments where you want to prevent mistakes more than you want speed, but in mature CI/CD pipelines, you might eventually automate some parts while keeping approval only for destructive actions like destroy.
Final Thoughts
Security is not free. But breaches cost more.
This setup prioritizes security and high availability, even if that means paying for:
VPC endpoints
Multi-AZ RDS
Load Balancing
If you’re building something serious—not just a hobby project—this trade-off is justified.
Code Repository
Contributors
Discussion
Would you choose lower cost with higher risk, or pay for security and redundancy upfront?
Let us know your thoughts!
Last updated