ALB Consolidation on AWS ECS — How We Cut Load Balancer Costs by 80%
The Problem
At Simpl, we ran 250+ microservices on AWS ECS, spread across multiple clusters — production, staging, sandbox, and a few special-purpose ones. Our deployment tool — Cloudlift, an open-source Python CLI for managing ECS services — had a simple model: every service with an HTTP interface gets its own dedicated Application Load Balancer.
One ALB per service. Simple, isolated, easy to reason about. Until it isn’t.
The Cost of Simplicity
Each ALB has a fixed hourly cost on AWS — roughly $0.0239/hour in ap-south-1 (Mumbai), regardless of traffic. That doesn't sound like much until you do the rough math: at ~730 hours a month, the fixed charge alone comes to about $17.50 per ALB per month.
On top of the fixed cost, you pay per LCU (Load Balancer Capacity Unit) based on traffic, new connections, and rule evaluations. For a typical service, that pushes total cost to $18-22/month per ALB. Now multiply by 250+ services:
Roughly five thousand dollars a month in ALB fixed costs across the org, before any traffic-based charges.
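Spelled out as a back-of-envelope sketch (using the rough per-hour and per-ALB figures quoted above, not exact billing data):

# Back-of-envelope ALB cost math using the rough figures quoted above
HOURLY_FIXED = 0.0239      # USD per ALB-hour in ap-south-1
HOURS_PER_MONTH = 730
ALB_COUNT = 250            # "250+" in practice

fixed_per_alb = HOURLY_FIXED * HOURS_PER_MONTH   # ~17.45 USD/month
fixed_fleet = fixed_per_alb * ALB_COUNT          # ~4,360 USD/month at exactly 250 ALBs
typical_fleet = 20 * ALB_COUNT                   # ~5,000 USD/month at ~$20/ALB incl. LCU

print(f"per ALB, fixed:    ${fixed_per_alb:.2f}/mo")
print(f"fleet of {ALB_COUNT}, fixed:  ${fixed_fleet:,.0f}/mo")
print(f"fleet of {ALB_COUNT}, typical: ~${typical_fleet:,}/mo")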
graph TD
subgraph "Before: Dedicated ALBs"
S1[Service A] --> ALB1[ALB A<br/>~$20/mo]
S2[Service B] --> ALB2[ALB B<br/>~$20/mo]
S3[Service C] --> ALB3[ALB C<br/>~$20/mo]
S4[...] --> ALB4[...]
S250[Service N] --> ALB250[ALB N<br/>~$20/mo]
end
style ALB1 fill:#ff6b6b,color:#fff
style ALB2 fill:#ff6b6b,color:#fff
style ALB3 fill:#ff6b6b,color:#fff
style ALB4 fill:#ff6b6b,color:#fff
style ALB250 fill:#ff6b6b,color:#fff
And the waste wasn’t just financial. Most of these services handled modest traffic. A service processing 100 requests/minute doesn’t need its own ALB — it needs a target group behind a shared one. We were provisioning load balancer infrastructure at the same granularity as services, which made no sense at our scale.
Beyond cost, the operational overhead was real. Each dedicated ALB brings its own set of CloudFormation resources: the load balancer itself, a security group, an HTTP and an HTTPS listener, and a certificate attachment. Across 250+ services, that meant:
- 250+ security groups tied to individual ALBs
- 500+ listeners (HTTP + HTTPS per ALB)
- 250+ SSL certificates attached to individual listeners
- CloudFormation stacks that were bloated with ALB-related resources
The Idea
ALBs support host-header based routing — a single ALB can look at the Host header in an incoming request and route it to different target groups. This is how most reverse proxies (nginx, HAProxy, Traefik) have worked forever. AWS calls them listener rules.
The plan was straightforward — do this at the cluster level, so each cluster (production, staging, sandbox, etc.) gets its own pair of shared ALBs:
- Create two shared ALBs per cluster — one internal, one internet-facing
- Services opt into the shared ALB by specifying a hostname
- Listener rules route traffic based on the Host header to the right target group
- The dedicated ALB mode stays as a fallback for services that need it
graph TD
subgraph "After: Cluster ALBs (per cluster)"
ALBInt[Internal ALB<br/>~$25/mo] --> TG1[Target Group A]
ALBInt --> TG2[Target Group B]
ALBInt --> TG3[Target Group C]
ALBInt --> TG250[Target Group N]
ALBPub[Public ALB<br/>~$25/mo]
TG1 --> S1[Service A]
TG2 --> S2[Service B]
TG3 --> S3[Service C]
TG250 --> S250[Service N]
end
style ALBInt fill:#51cf66,color:#fff
style ALBPub fill:#51cf66,color:#fff
From 250+ ALBs down to ~10 — just 2 per cluster. The rough math was obvious: ten shared ALBs at $20-25/month each is a couple of hundred dollars, versus roughly $5,000 before.
These are rough estimates — actual costs vary by region, traffic patterns, and LCU consumption. But the direction is clear. In practice, the shared ALBs handle more traffic per LCU (since they aggregate all services), so LCU costs are slightly higher per ALB — but the total is still dramatically lower because you’re not paying the fixed cost 250 times.
Implementation
This was built into Cloudlift — the tool generates CloudFormation templates from service configurations. The change touched two template generators and the configuration schema, and was merged upstream as PR #156.
How Cloudlift Works
Cloudlift takes declarative service configurations (stored in AWS Parameter Store) and generates CloudFormation templates for ECS services. When you run cloudlift deploy, it:
- Reads the service config
- Generates a CloudFormation template (using troposphere, a Python library for programmatic CloudFormation)
- Creates/updates the CloudFormation stack
- Deploys the new task definition
The ALB change had to work within this model — modify the template generators to conditionally create shared ALBs at the cluster level instead of dedicated ALBs per service.
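For readers who haven't used troposphere, the pattern a generator follows looks roughly like this (a minimal, standalone example, not Cloudlift's actual generator code):

from troposphere import Template
from troposphere.ecs import Cluster

# Declare resources as Python objects, then render the whole thing
# to a CloudFormation template.
template = Template()
template.add_resource(Cluster("ECSCluster", ClusterName="staging"))
print(template.to_json())

Cloudlift's generators do the same thing at a much larger scale: one builds the cluster (environment) template, the other builds each service's template.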
The Cluster Template
The first part was adding shared ALBs to the cluster template generator. Before this change, the cluster template created the VPC, subnets, ECS cluster, auto-scaling groups, and security groups. Load balancers were entirely the service template’s responsibility.
I added a _add_cluster_albs() method that creates two ALBs — one internal, one public:
def _add_cluster_albs(self):
    alb_configs = [
        {'count': 1, 'scheme': self.INTERNAL_ALB_SCHEME},
        {'count': 1, 'scheme': self.PUBLIC_ALB_SCHEME},
    ]
    for alb_config in alb_configs:
        count = alb_config['count']
        for index in range(count):
            index += 1  # 1-based
            alb_scheme = alb_config['scheme']
            alb = self._create_alb(alb_scheme, index)
            self.albs.append(alb)
            listeners = self._create_alb_listeners(alb, alb_scheme, index)
            for listener in listeners:
                self.alb_listeners.append(listener)
The count field was intentionally forward-looking. ALBs have a default limit of 100 listener rules (expandable to 200). If the number of services on a single cluster ever exceeded that, we could add more ALBs per scheme without restructuring the template — just bump the count.
Each ALB gets:
- A security group with ingress on 80/443 and an ingress rule allowing traffic from the ALB SG to the EC2 host SG
- An HTTP listener — returns a fixed 404 response (internal) or redirects to HTTPS (public)
- An HTTPS listener — returns a fixed 404 by default, with host-header rules added per service
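The _create_alb call referenced earlier isn't shown above; a simplified sketch of its shape, with hypothetical attribute names for the template, subnets, and security groups (the real method also handles tagging and public vs. private subnet selection):

from troposphere import Ref
from troposphere.elasticloadbalancingv2 import LoadBalancer

def _create_alb(self, alb_scheme, index):
    # Sketch only: self.template, self.subnets and self.alb_security_groups
    # are illustrative names, not Cloudlift's actual attributes.
    scheme = "internal" if alb_scheme == self.INTERNAL_ALB_SCHEME else "internet-facing"
    return self.template.add_resource(LoadBalancer(
        f"ALB{alb_scheme}{index}",
        Scheme=scheme,
        Type="application",
        Subnets=[Ref(subnet) for subnet in self.subnets],
        SecurityGroups=[Ref(self.alb_security_groups[alb_scheme])],
    ))

The listeners are where the interesting behavior lives: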
def _create_alb_listeners(self, alb, alb_scheme, index):
    # base_title derived from scheme + index, e.g. "Internal1"
    # Internal: both HTTP and HTTPS serve traffic (fixed 404 default)
    # Public: HTTP redirects to HTTPS, HTTPS serves traffic
    if alb_scheme == self.INTERNAL_ALB_SCHEME:
        http_listener = ALBListener(
            title=f"Http{base_title}",
            LoadBalancerArn=Ref(alb),
            Port=80,
            Protocol='HTTP',
            DefaultActions=[self._create_fixed_response_action()]
        )
    else:
        http_listener = ALBListener(
            title=f"Http{base_title}",
            LoadBalancerArn=Ref(alb),
            Port=80,
            Protocol='HTTP',
            DefaultActions=[self._create_redirect_action()]
        )
    # ...
The fixed 404 default action is important — if a request arrives with a Host header that doesn’t match any listener rule, it gets a clean 404 instead of hitting a random service. This was a security consideration in a fintech context.
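The two default-action helpers referenced above are small troposphere Action builders; their bodies look roughly like this (a sketch; the exact 404 message body is illustrative):

from troposphere.elasticloadbalancingv2 import Action, FixedResponseConfig, RedirectConfig

def _create_fixed_response_action(self):
    # Default action when no host-header rule matches: return a clean 404.
    return Action(
        Type="fixed-response",
        FixedResponseConfig=FixedResponseConfig(
            StatusCode="404",
            ContentType="text/plain",
            MessageBody="No matching host",  # illustrative message
        ),
    )

def _create_redirect_action(self):
    # Public HTTP listener: permanent redirect to HTTPS.
    return Action(
        Type="redirect",
        RedirectConfig=RedirectConfig(
            Protocol="HTTPS",
            Port="443",
            StatusCode="HTTP_301",
        ),
    )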
All ALB ARNs and listener ARNs are exported as CloudFormation outputs so the service template can reference them.
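In troposphere terms, the outputs are plain Output objects on the cluster template (the output keys and variable names here are illustrative, not necessarily the ones Cloudlift uses):

from troposphere import GetAtt, Output, Ref

# Illustrative output keys; the service template looks these up at deploy time.
self.template.add_output(Output(
    "InternalAlb1ListenerHttps",
    Description="HTTPS listener ARN of the internal cluster ALB",
    Value=Ref(https_listener),
))
self.template.add_output(Output(
    "InternalAlb1SecurityGroup",
    Description="Security group ID of the internal cluster ALB",
    Value=GetAtt(alb_security_group, "GroupId"),
))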
The Service Template
This is where the branching logic lives. The _add_alb() method now takes an alb_mode parameter:
def _add_alb(self, cd, service_name, config, launch_type, alb_mode):
    target_group_name = "TargetGroup" + service_name
    if alb_mode == 'cluster':
        target_group_name = target_group_name + 'Cluster'

    # Target group is always created (same for both modes)
    service_target_group = TargetGroup(
        target_group_name,
        # ... health check config, port, protocol, etc.
    )

    if alb_mode == 'dedicated':
        # Original behavior: create ALB, SG, listeners; also sets lb
        # (the load balancer DNS output)
        # ... (unchanged)
    else:
        # Cluster mode: just add listener rules to the shared ALB
        is_alb_internal = config.get('http_interface', {}).get('internal', True)
        self._add_listener_rules_to_cluster_alb(
            config, service_target_group, is_alb_internal
        )
        service_listener = None
        alb = None
        lb = None  # no dedicated load balancer DNS output in cluster mode
        svc_alb_sg = self._fetch_cluster_alb_sg_id(is_alb_internal)

    return alb, lb, service_listener, svc_alb_sg
In cluster mode, no ALB is created. Instead, the service template:
- Creates a target group (as before)
- Fetches the cluster ALB’s listener ARNs from the environment stack outputs
- Creates listener rules on the appropriate listeners with a host-header condition
- Returns the cluster ALB’s security group ID (fetched from stack outputs) instead of creating a new one
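The "fetch from stack outputs" steps are plain boto3 calls against the environment stack, something like this (the helper name and stack/output keys are illustrative):

import boto3

def fetch_cluster_stack_output(stack_name, output_key, region="ap-south-1"):
    # Illustrative helper: read one output value from the cluster/environment stack.
    cfn = boto3.client("cloudformation", region_name=region)
    stack = cfn.describe_stacks(StackName=stack_name)["Stacks"][0]
    for output in stack.get("Outputs", []):
        if output["OutputKey"] == output_key:
            return output["OutputValue"]
    raise KeyError(f"{output_key} not found in stack {stack_name}")

# e.g. the security group the ECS service must allow ingress from:
# sg_id = fetch_cluster_stack_output("staging-cluster", "InternalAlb1SecurityGroup")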
Host-Header Routing
The listener rule creation uses the hostname from the service config to create a host-header condition:
def _get_listener_rule(self, protocol, index, is_internal,
                       listener_arn, priority, hostname, target_group_arn):
    action = ListenerRuleAction(
        Type="forward",
        TargetGroupArn=target_group_arn
    )
    condition = Condition(
        Field="host-header",
        HostHeaderConfig=HostHeaderConfig(
            Values=[hostname]
        ),
    )
    # title derived from protocol + scheme + index
    listener_rule = ListenerRule(
        title=title,
        Actions=[action],
        Conditions=[condition],
        ListenerArn=listener_arn,
        Priority=priority
    )
    return listener_rule
The flow for a request hitting a cluster ALB:
sequenceDiagram
autonumber
participant Client
participant ALB as Cluster ALB
participant LR as Listener Rules
participant TG as Target Group
participant ECS as ECS Service
Client->>ALB: GET / (Host: api.internal.example.com)
ALB->>LR: Match Host header against rules
alt Host matches rule
LR->>TG: Forward to target group
TG->>ECS: Route to healthy container
ECS-->>Client: 200 OK
else No match
LR-->>Client: 404 No matching host found
end
Priority Assignment
ALB listener rules need a priority — an integer in the 1-50000 range that determines the evaluation order. This was a subtle problem. We couldn't use sequential priorities because services are deployed independently, and two services deploying at the same time could try to claim the same priority.
The solution was a random priority generator that picks from a large range and checks for conflicts. With a few hundred existing rules spread over nearly 50,000 possible priorities (100-49999 in our generator), the probability of a collision on a single attempt is well under 1%.
Negligible, and the generator retries on collision anyway:
def _get_listener_rule_priority(self, listener_arn, hostname):
    MIN_PRIORITY = 100
    MAX_PRIORITY = 49999
    elbv2_client = get_client_for('elbv2', self.env)
    rules = elbv2_client.describe_rules(
        ListenerArn=listener_arn, PageSize=400
    )

    # If a rule for this hostname already exists, reuse its priority
    for rule in rules['Rules']:
        for condition in rule['Conditions']:
            if (condition['Field'] == 'host-header' and
                    hostname in condition['HostHeaderConfig']['Values']):
                return int(rule['Priority'])

    # Generate a unique random priority
    existing_priorities = [
        int(r['Priority']) for r in rules['Rules']
        if not r['IsDefault']
    ]
    return self._generate_unique_priority(
        existing_priorities, MIN_PRIORITY, MAX_PRIORITY
    )
The idempotency check is key — if the service already has a listener rule (from a previous deploy), reuse its priority. This prevents priority churn on redeploys.
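The _generate_unique_priority helper isn't shown above; a minimal sketch of how it could work:

import random

def _generate_unique_priority(self, existing_priorities, min_priority, max_priority,
                              max_attempts=10):
    # Sketch only: pick a random priority and retry on the (rare) collision.
    taken = set(existing_priorities)
    for _ in range(max_attempts):
        candidate = random.randint(min_priority, max_priority)
        if candidate not in taken:
            return candidate
    raise Exception("Could not find a free listener rule priority")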
Configuration Schema
Services opt into cluster mode through their config:
{
    "services": {
        "MyService": {
            "http_interface": {
                "internal": true,
                "restrict_access_to": ["0.0.0.0/0"],
                "container_port": 8080,
                "health_check_path": "/health",
                "alb_mode": "cluster",
                "hostname": "my-service.internal.example.com"
            }
        }
    }
}
The alb_mode field accepts "cluster" or "dedicated" (default). There’s also an environment-level default in service_defaults, so an entire cluster can default to cluster ALB mode:
{
    "service_defaults": {
        "alb_mode": "cluster"
    }
}
This was important for migration — we could set the default to "cluster" for new services while existing services kept "dedicated" until explicitly migrated.
The Migration
We didn’t flip everything at once. Since we had separate clusters for prod, staging, and sandbox, staging was the obvious proving ground. The rollout was staged:
graph LR
A[Deploy cluster ALBs<br/>to staging] --> B[Migrate 5 low-traffic<br/>services in staging]
B --> C[Validate routing<br/>and health checks]
C --> D[Deploy cluster ALBs<br/>to production]
D --> E[Migrate services<br/>in batches of 20-30]
E --> F[Decommission old ALBs<br/>as services migrate]
What We Watched
During migration, we monitored:
- Target group health: Were containers passing health checks through the shared ALB?
- Listener rule count: ALBs have a default limit of 100 rules (expandable to 200). We tracked utilization.
- Latency: Was host-header routing adding measurable latency? (It wasn’t — negligible overhead for exact host-header matching.)
- 502/504 errors: Any routing failures during the switchover?
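The rule-count tracking in particular is a one-liner against the elbv2 API, roughly this (the 80% threshold is just an example alerting point):

import boto3

def count_listener_rules(listener_arn, region="ap-south-1"):
    # Count non-default rules on a shared listener. One page is enough:
    # the rule quota tops out at 200, well under the 400 page size.
    elbv2 = boto3.client("elbv2", region_name=region)
    rules = elbv2.describe_rules(ListenerArn=listener_arn, PageSize=400)["Rules"]
    return sum(1 for rule in rules if not rule["IsDefault"])

# e.g. flag listeners above 80% of the default 100-rule limit
# if count_listener_rules(arn) > 80: alert(...)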
The biggest risk was DNS. Each service had a CNAME pointing to its dedicated ALB’s DNS name. When migrating to cluster ALB, we had to update the CNAME to point to the cluster ALB and ensure the hostname in the listener rule matched. A mismatch means a 404.
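Before flipping a service's CNAME, a quick sanity check against the cluster ALB's own DNS name catches hostname mismatches early, something like this for an internal service (whose HTTP listener serves traffic, as described above):

import requests

def check_host_routing(alb_dns_name, hostname, path="/health"):
    # Hit the ALB directly, but present the service's hostname in the Host
    # header, the same value the listener rule matches on.
    response = requests.get(
        f"http://{alb_dns_name}{path}",
        headers={"Host": hostname},
        timeout=5,
    )
    # 200: the rule matched and a healthy target answered.
    # 404: the Host header didn't match any listener rule yet.
    return response.status_code

# check_host_routing("internal-alb-123456.ap-south-1.elb.amazonaws.com",
#                    "my-service.internal.example.com")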
The Gotchas
A few things bit us during the rollout:
1. Fargate security groups
In dedicated mode, the Fargate service’s security group allowed ingress from the dedicated ALB’s security group. In cluster mode, the ALB security group is different (it’s the cluster ALB’s SG). The initial implementation had a bug where it was still trying to Ref() the security group object, but in cluster mode it’s a string (the SG ID fetched from stack outputs). PR #158 fixed this.
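In essence, the fix boils down to handling both shapes (a simplified illustration, not the exact PR #158 diff):

from troposphere import Ref

# Dedicated mode: svc_alb_sg is a troposphere SecurityGroup resource, so it needs Ref().
# Cluster mode: svc_alb_sg is already a plain security group ID string from stack outputs.
source_sg = Ref(svc_alb_sg) if alb_mode == 'dedicated' else svc_alb_sg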
2. Listener rule naming conflicts
CloudFormation resource names need to be unique within a template. Our initial naming scheme for listener rules could produce conflicts when a service had both HTTP and HTTPS rules. The fix was using distinct prefixes (Internal/Public + HTTP/HTTPS + index).
3. Multiple hostnames
Some services needed to respond to more than one hostname (e.g., api.example.com and api.internal.example.com). The initial implementation only supported a single hostname. PR #159 added support for an array of hostnames.
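Since the host-header condition already accepts a list of values, the change is mostly plumbing the config through (illustrative):

# config['http_interface']['hostname'] may now be a string or a list of strings
hostnames = config['http_interface']['hostname']
if isinstance(hostnames, str):
    hostnames = [hostnames]

condition = Condition(
    Field="host-header",
    HostHeaderConfig=HostHeaderConfig(Values=hostnames),
)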
4. Optional alb_mode
Making alb_mode a required field would have forced every existing service to update its config. PR #160 made it optional, with fallback to the environment default.
The Results
After migrating the majority of internal services to cluster ALBs across all clusters (prod, staging, sandbox):
| Metric | Before | After |
|---|---|---|
| ALB count | 250+ across clusters | ~10 (2 per cluster + dedicated exceptions) |
| ALB monthly fixed cost | ~$5,000+ | ~$200 |
| Security groups (ALB-related) | 250+ | ~10 |
| CloudFormation resources per service | ~15 ALB-related | ~3 (target group + listener rules) |
| Deployment time | ~4-5 min (ALB creation) | ~2-3 min (rule creation only) |
The overall reduction in ALB fixed spend: from roughly $5,000/month to roughly $200/month, about 96%.
All numbers are rough — actual figures depend on region pricing, traffic, and how many services stayed on dedicated ALBs. And the 96% applies to fixed costs alone. LCU (traffic-based) charges stayed roughly the same — same traffic, just flowing through fewer ALBs — so once you include them, the reduction in total ALB spend works out to the ~80% in the title. On an annualized basis, that's on the order of $55-60K saved.
Deployments also got faster. Creating an ALB from scratch in CloudFormation takes 2-3 minutes. Creating a listener rule takes seconds.
When to Keep Dedicated ALBs
Not every service should use a shared ALB. We kept dedicated ALBs for:
- Public-facing services with unique security requirements: Services that needed IP-based allow lists that differed from the cluster default.
- Services with custom SSL certificates: The cluster ALB uses a single wildcard cert. Services with specific domain certs needed their own ALB.
- Services approaching the listener rule limit: If a cluster ALB was getting close to 200 rules, some services stayed on dedicated ALBs.
The alb_mode: "dedicated" option exists specifically for these cases.
What I Learned
This was one of those changes where the idea is simple but the implementation has a lot of edge cases. A few takeaways:
Backwards compatibility matters more than you think. The target group naming had to be different for cluster mode (TargetGroupCluster suffix) to avoid CloudFormation trying to replace existing target groups when a service switched modes. A resource name change in CloudFormation means delete + create, which means downtime.
CloudFormation outputs are the glue. The entire pattern works because the cluster template exports ALB ARNs, listener ARNs, and security group IDs as stack outputs. The service template reads these outputs at deploy time. Without this cross-stack reference pattern, the service template would need to make AWS API calls to discover cluster resources — fragile and slow.
Random priority generation is fine. I was initially concerned about using random priorities for listener rules, but the priority only matters for ordering evaluation when multiple rules could match. Since we use exact host-header matching (not wildcards), rule priority is irrelevant in practice — only one rule will ever match a given Host header.
Open source amplifies impact. This change was merged into the upstream Cloudlift repo. Anyone running Cloudlift on ECS now has access to cluster ALB mode. The specific code I wrote lives in a tool used by multiple teams, and the pattern is reusable for any organization running ECS at scale.
Conclusion
The ALB-per-service model is a fine default. It’s simple, isolated, and works well at small scale. But it doesn’t scale — the cost and operational overhead grow linearly with service count, while the actual traffic per ALB doesn’t justify the dedicated infrastructure.
Cluster-level ALBs with host-header routing is the pattern that AWS’s own documentation recommends at scale, but tooling to automate it was missing from Cloudlift. Adding it was a matter of modifying two template generators and the configuration schema — a few hundred lines of Python that eliminated thousands of dollars in monthly spend.
The PR is public if you want to dig into the implementation. And if you’re running ECS with Cloudlift, alb_mode: "cluster" is there waiting for you.