ALB Consolidation on AWS ECS — How We Cut Load Balancer Costs by 80%
The Problem
At Simpl, we ran 250+ microservices on AWS ECS, spread across multiple clusters — production, staging, sandbox, and a few special-purpose ones. Our deployment tool — Cloudlift, an open-source Python CLI for managing ECS services — had a simple model: every service with an HTTP interface gets its own dedicated Application Load Balancer.
One ALB per service. Simple, isolated, easy to reason about. Until it isn’t.
The Cost of Simplicity
Each ALB has a fixed hourly cost on AWS — roughly $0.0239/hour in ap-south-1 (Mumbai), regardless of traffic. That doesn't sound like much until you do the rough math: at ~730 hours a month, the fixed charge alone comes to about $17.50 per ALB per month.
On top of the fixed cost, you pay per LCU (Load Balancer Capacity Unit) based on traffic, new connections, and rule evaluations. For a typical service, that pushes total cost to $18-22/month per ALB. Now multiply by 250+ services:
Roughly five thousand dollars a month in ALB fixed costs across the org, before any traffic-based charges.
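Spelled out as a back-of-envelope sketch (using the rough per-hour and per-ALB figures quoted above, not exact billing data):

# Back-of-envelope ALB cost math using the rough figures quoted above
HOURLY_FIXED = 0.0239      # USD per ALB-hour in ap-south-1
HOURS_PER_MONTH = 730
ALB_COUNT = 250            # "250+" in practice

fixed_per_alb = HOURLY_FIXED * HOURS_PER_MONTH   # ~17.45 USD/month
fixed_fleet = fixed_per_alb * ALB_COUNT          # ~4,360 USD/month at exactly 250 ALBs
typical_fleet = 20 * ALB_COUNT                   # ~5,000 USD/month at ~$20/ALB incl. LCU

print(f"per ALB, fixed:    ${fixed_per_alb:.2f}/mo")
print(f"fleet of {ALB_COUNT}, fixed:  ${fixed_fleet:,.0f}/mo")
print(f"fleet of {ALB_COUNT}, typical: ~${typical_fleet:,}/mo")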
graph TD
subgraph "Before: Dedicated ALBs"
S1[Service A] --> ALB1[ALB A<br/>~$20/mo]
S2[Service B] --> ALB2[ALB B<br/>~$20/mo]
S3[Service C] --> ALB3[ALB C<br/>~$20/mo]
S4[...] --> ALB4[...]
S250[Service N] --> ALB250[ALB N<br/>~$20/mo]
end
style ALB1 fill:#ff6b6b,color:#fff
style ALB2 fill:#ff6b6b,color:#fff
style ALB3 fill:#ff6b6b,color:#fff
style ALB4 fill:#ff6b6b,color:#fff
style ALB250 fill:#ff6b6b,color:#fff
And the waste wasn’t just financial. Most of these services handled modest traffic. A service processing 100 requests/minute doesn’t need its own ALB — it needs a target group behind a shared one. We were provisioning load balancer infrastructure at the same granularity as services, which made no sense at our scale.
Beyond cost, the operational overhead was real. Each dedicated ALB brings its own set of CloudFormation resources: the load balancer itself, a security group, an HTTP and an HTTPS listener, and a certificate attachment. Across 250+ services, that meant:
- 250+ security groups tied to individual ALBs
- 500+ listeners (HTTP + HTTPS per ALB)
- 250+ SSL certificates attached to individual listeners
- CloudFormation stacks that were bloated with ALB-related resources
The Idea
ALBs support host-header based routing — a single ALB can look at the Host header in an incoming request and route it to different target groups. This is how most reverse proxies (nginx, HAProxy, Traefik) have worked forever. AWS calls them listener rules.
The plan was straightforward — do this at the cluster level, so each cluster (production, staging, sandbox, etc.) gets its own pair of shared ALBs:
- Create two shared ALBs per cluster — one internal, one internet-facing
- Services opt into the shared ALB by specifying a hostname
- Listener rules route traffic based on the Host header to the right target group
- The dedicated ALB mode stays as a fallback for services that need it
graph TD
subgraph "After: Cluster ALBs (per cluster)"
ALBInt[Internal ALB<br/>~$25/mo] --> TG1[Target Group A]
ALBInt --> TG2[Target Group B]
ALBInt --> TG3[Target Group C]
ALBInt --> TG250[Target Group N]
ALBPub[Public ALB<br/>~$25/mo]
TG1 --> S1[Service A]
TG2 --> S2[Service B]
TG3 --> S3[Service C]
TG250 --> S250[Service N]
end
style ALBInt fill:#51cf66,color:#fff
style ALBPub fill:#51cf66,color:#fff
From 250+ ALBs down to ~10 — just 2 per cluster. The rough math was obvious: ten shared ALBs at $20-25/month each is a couple of hundred dollars, versus roughly $5,000 before.
These are rough estimates — actual costs vary by region, traffic patterns, and LCU consumption. But the direction is clear. In practice, the shared ALBs handle more traffic per LCU (since they aggregate all services), so LCU costs are slightly higher per ALB — but the total is still dramatically lower because you’re not paying the fixed cost 250 times.
Implementation
This was built into Cloudlift — the tool generates CloudFormation templates from service configurations. The change touched two template generators and the configuration schema, and was merged upstream as PR #156.
How Cloudlift Works
Cloudlift takes declarative service configurations (stored in AWS Parameter Store) and generates CloudFormation templates for ECS services. When you run cloudlift deploy, it:
- Reads the service config
- Generates a CloudFormation template (using troposphere, a Python library for programmatic CloudFormation)
- Creates/updates the CloudFormation stack
- Deploys the new task definition
The ALB change had to work within this model — modify the template generators to conditionally create shared ALBs at the cluster level instead of dedicated ALBs per service.
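For readers who haven't used troposphere, the pattern a generator follows looks roughly like this (a minimal, standalone example, not Cloudlift's actual generator code):

from troposphere import Template
from troposphere.ecs import Cluster

# Declare resources as Python objects, then render the whole thing
# to a CloudFormation template.
template = Template()
template.add_resource(Cluster("ECSCluster", ClusterName="staging"))
print(template.to_json())

Cloudlift's generators do the same thing at a much larger scale: one builds the cluster (environment) template, the other builds each service's template.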
The Cluster Template
The first part was adding shared ALBs to the cluster template generator. Before this change, the cluster template created the VPC, subnets, ECS cluster, auto-scaling groups, and security groups. Load balancers were entirely the service template’s responsibility.
I added a _add_cluster_albs() method that creates two ALBs — one internal, one public:
def _add_cluster_albs(self):
    alb_configs = [
        {'count': 1, 'scheme': self.INTERNAL_ALB_SCHEME},
        {'count': 1, 'scheme': self.PUBLIC_ALB_SCHEME},
    ]
    for alb_config in alb_configs:
        count = alb_config['count']
        for index in range(count):
            index += 1  # 1-based
            alb_scheme = alb_config['scheme']
            alb = self._create_alb(alb_scheme, index)
            self.albs.append(alb)
            listeners = self._create_alb_listeners(alb, alb_scheme, index)
            for listener in listeners:
                self.alb_listeners.append(listener)
The count field was intentionally forward-looking. ALBs have a default limit of 100 listener rules (expandable to 200). If the number of services on a single cluster ever exceeded that, we could add more ALBs per scheme without restructuring the template — just bump the count.
Each ALB gets:
- A security group with ingress on 80/443 and an ingress rule allowing traffic from the ALB SG to the EC2 host SG
- An HTTP listener — returns a fixed 404 response (internal) or redirects to HTTPS (public)
- An HTTPS listener — returns a fixed 404 by default, with host-header rules added per service
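The _create_alb call referenced earlier isn't shown above; a simplified sketch of its shape, with hypothetical attribute names for the template, subnets, and security groups (the real method also handles tagging and public vs. private subnet selection):

from troposphere import Ref
from troposphere.elasticloadbalancingv2 import LoadBalancer

def _create_alb(self, alb_scheme, index):
    # Sketch only: self.template, self.subnets and self.alb_security_groups
    # are illustrative names, not Cloudlift's actual attributes.
    scheme = "internal" if alb_scheme == self.INTERNAL_ALB_SCHEME else "internet-facing"
    return self.template.add_resource(LoadBalancer(
        f"ALB{alb_scheme}{index}",
        Scheme=scheme,
        Type="application",
        Subnets=[Ref(subnet) for subnet in self.subnets],
        SecurityGroups=[Ref(self.alb_security_groups[alb_scheme])],
    ))

The listeners are where the interesting behavior lives: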
def _create_alb_listeners(self, alb, alb_scheme, index):
    # base_title derived from scheme + index, e.g. "Internal1"
    # Internal: both HTTP and HTTPS serve traffic (fixed 404 default)
    # Public: HTTP redirects to HTTPS, HTTPS serves traffic
    if alb_scheme == self.INTERNAL_ALB_SCHEME:
        http_listener = ALBListener(
            title=f"Http{base_title}",
            LoadBalancerArn=Ref(alb),
            Port=80,
            Protocol='HTTP',
            DefaultActions=[self._create_fixed_response_action()]
        )
    else:
        http_listener = ALBListener(
            title=f"Http{base_title}",
            LoadBalancerArn=Ref(alb),
            Port=80,
            Protocol='HTTP',
            DefaultActions=[self._create_redirect_action()]
        )
    # ...
The fixed 404 default action is important — if a request arrives with a Host header that doesn’t match any listener rule, it gets a clean 404 instead of hitting a random service. This was a security consideration in a fintech context.
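The two default-action helpers referenced above are small troposphere Action builders; their bodies look roughly like this (a sketch; the exact 404 message body is illustrative):

from troposphere.elasticloadbalancingv2 import Action, FixedResponseConfig, RedirectConfig

def _create_fixed_response_action(self):
    # Default action when no host-header rule matches: return a clean 404.
    return Action(
        Type="fixed-response",
        FixedResponseConfig=FixedResponseConfig(
            StatusCode="404",
            ContentType="text/plain",
            MessageBody="No matching host",  # illustrative message
        ),
    )

def _create_redirect_action(self):
    # Public HTTP listener: permanent redirect to HTTPS.
    return Action(
        Type="redirect",
        RedirectConfig=RedirectConfig(
            Protocol="HTTPS",
            Port="443",
            StatusCode="HTTP_301",
        ),
    )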
All ALB ARNs and listener ARNs are exported as CloudFormation outputs so the service template can reference them.
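In troposphere terms, the outputs are plain Output objects on the cluster template (the output keys and variable names here are illustrative, not necessarily the ones Cloudlift uses):

from troposphere import GetAtt, Output, Ref

# Illustrative output keys; the service template looks these up at deploy time.
self.template.add_output(Output(
    "InternalAlb1ListenerHttps",
    Description="HTTPS listener ARN of the internal cluster ALB",
    Value=Ref(https_listener),
))
self.template.add_output(Output(
    "InternalAlb1SecurityGroup",
    Description="Security group ID of the internal cluster ALB",
    Value=GetAtt(alb_security_group, "GroupId"),
))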
The Service Template
This is where the branching logic lives. The _add_alb() method now takes an alb_mode parameter:
def _add_alb(self, cd, service_name, config, launch_type, alb_mode):
    target_group_name = "TargetGroup" + service_name
    if alb_mode == 'cluster':
        target_group_name = target_group_name + 'Cluster'

    # Target group is always created (same for both modes)
    service_target_group = TargetGroup(
        target_group_name,
        # ... health check config, port, protocol, etc.
    )

    if alb_mode == 'dedicated':
        # Original behavior: create ALB, SG, listeners; also sets lb
        # (the load balancer DNS output)
        # ... (unchanged)
    else:
        # Cluster mode: just add listener rules to the shared ALB
        is_alb_internal = config.get('http_interface', {}).get('internal', True)
        self._add_listener_rules_to_cluster_alb(
            config, service_target_group, is_alb_internal
        )
        service_listener = None
        alb = None
        lb = None  # no dedicated load balancer DNS output in cluster mode
        svc_alb_sg = self._fetch_cluster_alb_sg_id(is_alb_internal)

    return alb, lb, service_listener, svc_alb_sg
In cluster mode, no ALB is created. Instead, the service template:
- Creates a target group (as before)
- Fetches the cluster ALB’s listener ARNs from the environment stack outputs
- Creates listener rules on the appropriate listeners with a host-header condition
- Returns the cluster ALB’s security group ID (fetched from stack outputs) instead of creating a new one
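The "fetch from stack outputs" steps are plain boto3 calls against the environment stack, something like this (the helper name and stack/output keys are illustrative):

import boto3

def fetch_cluster_stack_output(stack_name, output_key, region="ap-south-1"):
    # Illustrative helper: read one output value from the cluster/environment stack.
    cfn = boto3.client("cloudformation", region_name=region)
    stack = cfn.describe_stacks(StackName=stack_name)["Stacks"][0]
    for output in stack.get("Outputs", []):
        if output["OutputKey"] == output_key:
            return output["OutputValue"]
    raise KeyError(f"{output_key} not found in stack {stack_name}")

# e.g. the security group the ECS service must allow ingress from:
# sg_id = fetch_cluster_stack_output("staging-cluster", "InternalAlb1SecurityGroup")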
Host-Header Routing
The listener rule creation uses the hostname from the service config to create a host-header condition:
def _get_listener_rule(self, protocol, index, is_internal,
                       listener_arn, priority, hostname, target_group_arn):
    action = ListenerRuleAction(
        Type="forward",
        TargetGroupArn=target_group_arn
    )
    condition = Condition(
        Field="host-header",
        HostHeaderConfig=HostHeaderConfig(
            Values=[hostname]
        ),
    )
    # title derived from protocol + scheme + index
    listener_rule = ListenerRule(
        title=title,
        Actions=[action],
        Conditions=[condition],
        ListenerArn=listener_arn,
        Priority=priority
    )
    return listener_rule
The flow for a request hitting a cluster ALB:
sequenceDiagram
autonumber
participant Client
participant ALB as Cluster ALB
participant LR as Listener Rules
participant TG as Target Group
participant ECS as ECS Service
Client->>ALB: GET / (Host: api.internal.example.com)
ALB->>LR: Match Host header against rules
alt Host matches rule
LR->>TG: Forward to target group
TG->>ECS: Route to healthy container
ECS-->>Client: 200 OK
else No match
LR-->>Client: 404 No matching host found
end
Priority Assignment
ALB listener rules need a priority — an integer in the 1-50000 range that determines the evaluation order. This was a subtle problem. We couldn't use sequential priorities because services are deployed independently, and two services deploying at the same time could try to claim the same priority.
The solution was a random priority generator that picks from a large range and checks for conflicts. With a few hundred existing rules spread over nearly 50,000 possible priorities (100-49999 in our generator), the probability of a collision on a single attempt is well under 1%.
Negligible, and the generator retries on collision anyway:
def _get_listener_rule_priority(self, listener_arn, hostname):
    MIN_PRIORITY = 100
    MAX_PRIORITY = 49999
    elbv2_client = get_client_for('elbv2', self.env)
    rules = elbv2_client.describe_rules(
        ListenerArn=listener_arn, PageSize=400
    )

    # If a rule for this hostname already exists, reuse its priority
    for rule in rules['Rules']:
        for condition in rule['Conditions']:
            if (condition['Field'] == 'host-header' and
                    hostname in condition['HostHeaderConfig']['Values']):
                return int(rule['Priority'])

    # Generate a unique random priority
    existing_priorities = [
        int(r['Priority']) for r in rules['Rules']
        if not r['IsDefault']
    ]
    return self._generate_unique_priority(
        existing_priorities, MIN_PRIORITY, MAX_PRIORITY
    )
The idempotency check is key — if the service already has a listener rule (from a previous deploy), reuse its priority. This prevents priority churn on redeploys.
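The _generate_unique_priority helper isn't shown above; a minimal sketch of how it could work:

import random

def _generate_unique_priority(self, existing_priorities, min_priority, max_priority,
                              max_attempts=10):
    # Sketch only: pick a random priority and retry on the (rare) collision.
    taken = set(existing_priorities)
    for _ in range(max_attempts):
        candidate = random.randint(min_priority, max_priority)
        if candidate not in taken:
            return candidate
    raise Exception("Could not find a free listener rule priority")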
Configuration Schema
Services opt into cluster mode through their config:
{
    "services": {
        "MyService": {
            "http_interface": {
                "internal": true,
                "restrict_access_to": ["0.0.0.0/0"],
                "container_port": 8080,
                "health_check_path": "/health",
                "alb_mode": "cluster",
                "hostname": "my-service.internal.example.com"
            }
        }
    }
}
The alb_mode field accepts "cluster" or "dedicated" (default). There’s also an environment-level default in service_defaults, so an entire cluster can default to cluster ALB mode:
{
    "service_defaults": {
        "alb_mode": "cluster"
    }
}
This was important for migration — we could set the default to "cluster" for new services while existing services kept "dedicated" until explicitly migrated.
The Migration
We didn’t flip everything at once. Since we had separate clusters for prod, staging, and sandbox, staging was the obvious proving ground. The rollout was staged:
graph LR
A[Deploy cluster ALBs<br/>to staging] --> B[Migrate 5 low-traffic<br/>services in staging]
B --> C[Validate routing<br/>and health checks]
C --> D[Deploy cluster ALBs<br/>to production]
D --> E[Migrate services<br/>in batches of 20-30]
E --> F[Decommission old ALBs<br/>as services migrate]
What We Watched
During migration, we monitored:
- Target group health: Were containers passing health checks through the shared ALB?
- Listener rule count: ALBs have a default limit of 100 rules (expandable to 200). We tracked utilization.
- Latency: Was host-header routing adding measurable latency? (It wasn’t — negligible overhead for exact host-header matching.)
- 502/504 errors: Any routing failures during the switchover?
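The rule-count tracking in particular is a one-liner against the elbv2 API, roughly this (the 80% threshold is just an example alerting point):

import boto3

def count_listener_rules(listener_arn, region="ap-south-1"):
    # Count non-default rules on a shared listener. One page is enough:
    # the rule quota tops out at 200, well under the 400 page size.
    elbv2 = boto3.client("elbv2", region_name=region)
    rules = elbv2.describe_rules(ListenerArn=listener_arn, PageSize=400)["Rules"]
    return sum(1 for rule in rules if not rule["IsDefault"])

# e.g. flag listeners above 80% of the default 100-rule limit
# if count_listener_rules(arn) > 80: alert(...)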
The biggest risk was DNS. Each service had a CNAME pointing to its dedicated ALB’s DNS name. When migrating to cluster ALB, we had to update the CNAME to point to the cluster ALB and ensure the hostname in the listener rule matched. A mismatch means a 404.
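Before flipping a service's CNAME, a quick sanity check against the cluster ALB's own DNS name catches hostname mismatches early, something like this for an internal service (whose HTTP listener serves traffic, as described above):

import requests

def check_host_routing(alb_dns_name, hostname, path="/health"):
    # Hit the ALB directly, but present the service's hostname in the Host
    # header, the same value the listener rule matches on.
    response = requests.get(
        f"http://{alb_dns_name}{path}",
        headers={"Host": hostname},
        timeout=5,
    )
    # 200: the rule matched and a healthy target answered.
    # 404: the Host header didn't match any listener rule yet.
    return response.status_code

# check_host_routing("internal-alb-123456.ap-south-1.elb.amazonaws.com",
#                    "my-service.internal.example.com")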
The Gotchas
A few things bit us during the rollout:
1. Fargate security groups
In dedicated mode, the Fargate service’s security group allowed ingress from the dedicated ALB’s security group. In cluster mode, the ALB security group is different (it’s the cluster ALB’s SG). The initial implementation had a bug where it was still trying to Ref() the security group object, but in cluster mode it’s a string (the SG ID fetched from stack outputs). PR #158 fixed this.
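In essence, the fix boils down to handling both shapes (a simplified illustration, not the exact PR #158 diff):

from troposphere import Ref

# Dedicated mode: svc_alb_sg is a troposphere SecurityGroup resource, so it needs Ref().
# Cluster mode: svc_alb_sg is already a plain security group ID string from stack outputs.
source_sg = Ref(svc_alb_sg) if alb_mode == 'dedicated' else svc_alb_sg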
2. Listener rule naming conflicts
CloudFormation resource names need to be unique within a template. Our initial naming scheme for listener rules could produce conflicts when a service had both HTTP and HTTPS rules. The fix was using distinct prefixes (Internal/Public + HTTP/HTTPS + index).
3. Multiple hostnames
Some services needed to respond to more than one hostname (e.g., api.example.com and api.internal.example.com). The initial implementation only supported a single hostname. PR #159 added support for an array of hostnames.
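Since the host-header condition already accepts a list of values, the change is mostly plumbing the config through (illustrative):

# config['http_interface']['hostname'] may now be a string or a list of strings
hostnames = config['http_interface']['hostname']
if isinstance(hostnames, str):
    hostnames = [hostnames]

condition = Condition(
    Field="host-header",
    HostHeaderConfig=HostHeaderConfig(Values=hostnames),
)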
4. Optional alb_mode
Making alb_mode a required field would have forced every existing service to update its config. PR #160 made it optional, with fallback to the environment default.
The Results
After migrating the majority of internal services to cluster ALBs across all clusters (prod, staging, sandbox):
| Metric | Before | After |
|---|---|---|
| ALB count | 250+ across clusters | ~10 (2 per cluster + dedicated exceptions) |
| ALB monthly fixed cost | ~$5,000+ | ~$200 |
| Security groups (ALB-related) | 250+ | ~10 |
| CloudFormation resources per service | ~15 ALB-related | ~3 (target group + listener rules) |
| Deployment time | ~4-5 min (ALB creation) | ~2-3 min (rule creation only) |
The overall reduction in ALB fixed spend: from roughly $5,000/month to roughly $200/month, about 96%.
All numbers are rough — actual figures depend on region pricing, traffic, and how many services stayed on dedicated ALBs. And the 96% applies to fixed costs alone. LCU (traffic-based) charges stayed roughly the same — same traffic, just flowing through fewer ALBs — so once you include them, the reduction in total ALB spend works out to the ~80% in the title. On an annualized basis, that's on the order of $55-60K saved.
Deployments also got faster. Creating an ALB from scratch in CloudFormation takes 2-3 minutes. Creating a listener rule takes seconds.
When to Keep Dedicated ALBs
Not every service should use a shared ALB. We kept dedicated ALBs for:
- Public-facing services with unique security requirements: Services that needed IP-based allow lists that differed from the cluster default.
- Services with custom SSL certificates: The cluster ALB uses a single wildcard cert. Services with specific domain certs needed their own ALB.
- Services approaching the listener rule limit: If a cluster ALB was getting close to 200 rules, some services stayed on dedicated ALBs.
The alb_mode: "dedicated" option exists specifically for these cases.
What I Learned
This was one of those changes where the idea is simple but the implementation has a lot of edge cases. A few takeaways:
Backwards compatibility matters more than you think. The target group naming had to be different for cluster mode (TargetGroupCluster suffix) to avoid CloudFormation trying to replace existing target groups when a service switched modes. A resource name change in CloudFormation means delete + create, which means downtime.
CloudFormation outputs are the glue. The entire pattern works because the cluster template exports ALB ARNs, listener ARNs, and security group IDs as stack outputs. The service template reads these outputs at deploy time. Without this cross-stack reference pattern, the service template would need to make AWS API calls to discover cluster resources — fragile and slow.
Random priority generation is fine. I was initially concerned about using random priorities for listener rules, but the priority only matters for ordering evaluation when multiple rules could match. Since we use exact host-header matching (not wildcards), rule priority is irrelevant in practice — only one rule will ever match a given Host header.
Open source amplifies impact. This change was merged into the upstream Cloudlift repo. Anyone running Cloudlift on ECS now has access to cluster ALB mode. The specific code I wrote lives in a tool used by multiple teams, and the pattern is reusable for any organization running ECS at scale.
Conclusion
The ALB-per-service model is a fine default. It’s simple, isolated, and works well at small scale. But it doesn’t scale — the cost and operational overhead grow linearly with service count, while the actual traffic per ALB doesn’t justify the dedicated infrastructure.
Cluster-level ALBs with host-header routing is the pattern that AWS’s own documentation recommends at scale, but tooling to automate it was missing from Cloudlift. Adding it was a matter of modifying two template generators and the configuration schema — a few hundred lines of Python that eliminated thousands of dollars in monthly spend.
The PR is public if you want to dig into the implementation. And if you’re running ECS with Cloudlift, alb_mode: "cluster" is there waiting for you.