Kubernetes on AWS · open source

Your AWS data-transfer bill is a black box. Tollwing names the pod paying it, and gets cross-AZ right.

Tollwing is an open-source eBPF agent that attributes every byte of Kubernetes traffic to the exact pod, the conversation, and the AWS billing path paying for it. Live, in dollars, across all 9 paths. No app changes, ~0.1–0.5% CPU.

Run make demo · 60s · no cloud account · Star on GitHub
  ① THE DIFFERENTIATOR: cross-AZ cost that post-DNAT-only tools miss

     cart [shop/us-east-1a]  ──1.0 GiB──▶  checkout (ClusterIP) [shop/us-east-1b]

        billing path     attributed to     cost
        cross_az         checkout          $0.01   ◀── correct
        ───────────────────────────────────────
        total                              $0.01

  Post-DNAT-only tools (Kubecost / OpenCost) see only the rewritten
  destination IP, so they bill this to cart, or miss it.

$0.01 a GiB sounds like nothing. At Kafka RF=3 replication scale it is tens of thousands of dollars a month, billed to a pod you cannot currently see.

Real output from make demo. Pure Go: no cloud account, no cluster, no kernel. Every dollar is bytes × a dated rate, re-derived independently by an oracle (make sim).

  • Apache-2.0 · read every line
  • ~0.1–0.5% CPU · in-kernel PERCPU aggregation
  • No app changes, no sidecars, no code edits
  • Every number oracle-tested · run make sim
Works with the tools you already run: Prometheus, OpenTelemetry, Grafana, OpenCost, Kubernetes, Helm
The black box

You get the total. You never get the culprit.

Your AWS bill lumps every byte that crosses an availability zone, hits a NAT gateway, or leaves the VPC into one opaque line: EC2-Other, DataTransfer. It tells you the number is big. It never tells you which workload spent it. Your tags describe how things were provisioned, not how they actually talk at runtime. So when the data-transfer line jumps, you are guessing.

A Kafka cluster with RF=3 replication crosses two AZ boundaries on every produced byte. That can be tens of thousands of dollars a month, and no existing tool attributes a dollar of it to a pod.

You can see the bill is high. You cannot see which pod is paying it. There is a structural reason every other tool gets this wrong, and here it is.

Honest by design

It refuses to guess.

Heuristic tools fill the gaps they can’t see with estimates. That is how a charge lands on the wrong pod. Tollwing does the opposite. Every dollar is bytes × a dated rate: traceable, never estimated. It counts each cross-AZ interaction’s cost exactly once, with no double-counting. And when a single agent genuinely cannot see a flow’s zone, it marks that leg Unknown instead of inventing a number. An independent oracle then re-derives every figure in make sim.

Cost numbers are honest and traceable

Never invent or double-count a dollar. Every figure resolves to bytes × a dated rate.

Accurate attribution over convenient approximation

Refuse to guess. When a leg can’t be seen, mark it Unknown instead of fabricating it.

That is why the demo bills cross-AZ to the right service exactly once.

Why everyone else bills the wrong pod

We see the connection twice.

When a pod calls a Kubernetes Service, kube-proxy rewrites the destination mid-flight (DNAT), like a mail forwarder swapping the address on the envelope. Any tool that reads the envelope after forwarding sees the wrong recipient, and often the wrong availability zone. That is how the cross-AZ charge lands on the wrong pod, or vanishes.

Post-DNAT-only tools (Kubecost / OpenCost)
cart · us-east-1a
kube-proxy DNAT: original destination lost
checkout · us-east-1b
sees one hop → bills cart, or misses it
Tollwing
cart · us-east-1a
① connect4, on the calling node: captures the ClusterIP the pod meant to reach
② the backend-node agent: records where the connection landed, in a zone it knows
checkout · us-east-1b
cross-AZ captured exactly once, attributed to the checkout Service, correctly · $0.01/GiB

We see the connection twice: what it meant to reach, and where it actually landed.

For the curious →

On the calling node, a cgroup/connect4 eBPF hook captures the pre-DNAT destination, the ClusterIP the pod meant to reach. A ClusterIP has no zone, so that agent leaves the zone Unknown rather than guessing it (P5). The agent on the backend node sees where the connection actually landed, in a zone it knows, and prices the cross-AZ movement exactly once, attributed to the Service (P4, no double-count). In-kernel PERCPU maps aggregate the bytes, so there is no per-packet userspace handoff. The two-phase capture is DEC-003; leaving the dialer leg Unknown rather than guessing it is DEC-010.
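
The two-phase idea above can be sketched in a few lines of Go. This is an illustration under stated assumptions, not the agent's implementation: the Leg and Attributed types and the Attribute function are hypothetical, and the sketch assumes the backend-node agent can resolve both the source and destination zones.

```go
package main

import "fmt"

// Zone is an availability-zone name. The empty string means Unknown.
type Zone string

const Unknown Zone = ""

// Leg is one agent's view of a connection (hypothetical shape).
type Leg struct {
	Service string // pre-DNAT Service the pod dialed; set on the dialer leg only
	SrcZone Zone
	DstZone Zone // Unknown on the dialer leg: a ClusterIP has no zone
	Bytes   uint64
}

// Attributed is the single priced record produced for a connection.
type Attributed struct {
	Path    string
	Service string
	Bytes   uint64
}

// Attribute combines the two legs. The Service name comes from the dialer
// leg; zones and bytes come from the backend leg only, so the bytes are
// counted once. If a zone is missing, the path stays "unknown".
func Attribute(dialer, backend Leg) Attributed {
	out := Attributed{Path: "unknown", Service: dialer.Service, Bytes: backend.Bytes}
	switch {
	case backend.SrcZone == Unknown || backend.DstZone == Unknown:
		// Refuse to guess: leave the path as "unknown".
	case backend.SrcZone == backend.DstZone:
		out.Path = "same_zone"
	default:
		out.Path = "cross_az"
	}
	return out
}

func main() {
	dialer := Leg{Service: "checkout", SrcZone: "us-east-1a", DstZone: Unknown, Bytes: 1 << 30}
	backend := Leg{SrcZone: "us-east-1a", DstZone: "us-east-1b", Bytes: 1 << 30}
	fmt.Printf("%+v\n", Attribute(dialer, backend))
}
```

Note that the dialer leg's byte count is deliberately ignored when pricing. That is what keeps the cross-AZ movement from being billed twice.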

Verify, don’t trust

See the right answer on your laptop in 60 seconds.

git clone https://github.com/tollwing/tollwing
cd tollwing
make demo
  • Runs the real cost engine: the same classification and dated-rate math the agent feeds in production, not a mock.
  • No AWS account, no cluster, no kernel. Clone to answer in ~60s; the engine itself runs in milliseconds.
  • Every dollar is oracle-tested. Run make sim and the oracle re-derives each number independently. Open the tests and check the math yourself.
② the breadth: per-pod cost by billing path
billing path          example flow       data       cost
────────────────────────────────────────────────────────
same_zone             api → cache        1.0 GiB    $0.00
cross_az              cart → checkout    1.0 GiB    $0.01
cross_region          api → replica      1.0 GiB    $0.02
nat_gateway           worker → nat       2.0 GiB    $0.09
internet_egress       api → internet     5.0 GiB    $0.36
vpc_peering           api → peer-db      1.0 GiB    $0.01
transit_gateway       api → tgw-peer     1.0 GiB    $0.02
vpc_endpoint          api → s3-endpoint  1.0 GiB    $0.01
cloud_service_public  api → s3-public    1.0 GiB    $0.00

Each row is an independent scenario, priced by the same engine. The dollar column owns the only color on the page. NAT vs VPC-endpoint is the boring, huge fix: $0.045/GB → $0.01/GB.
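
To make the NAT vs VPC-endpoint gap concrete, here is a small worked example in Go. The two per-GB rates are the ones quoted above. The 10,000 GB monthly volume and the MonthlySavings helper are hypothetical, and the sketch covers per-GB data charges only.

```go
package main

import "fmt"

const (
	natUSDPerGB      = 0.045 // NAT gateway rate quoted on this page
	endpointUSDPerGB = 0.01  // VPC endpoint rate quoted on this page
)

// MonthlySavings returns the per-GB data-charge saving from moving
// gb of traffic off a NAT gateway and onto a VPC endpoint.
func MonthlySavings(gb float64) float64 {
	return gb * (natUSDPerGB - endpointUSDPerGB)
}

func main() {
	gb := 10000.0 // hypothetical monthly volume, for illustration only
	fmt.Printf("NAT: $%.2f  endpoint: $%.2f  saved: $%.2f\n",
		gb*natUSDPerGB, gb*endpointUSDPerGB, MonthlySavings(gb))
}
```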

Want to watch it beat the incumbents on a real cluster? make sim-differential deploys OpenCost and Kubecost’s network daemon next to the Tollwing agent and prints all three numbers side by side. OpenCost reports $0; Kubecost reports same-zone/free with no service field; Tollwing attributes cross_az to the dialed Service.

Run make demo · Community tier: one AWS cluster, up to 100 nodes. That’s the whole engine, not a teaser.

From demo to live dollars in three steps.

make demo

Read a per-pod, per-billing-path cost report locally. No cloud account, no cluster, no kernel.

helm install tollwing-agent

Deploy the eBPF agent as a DaemonSet. Read-only. No app changes, no sidecars, ~0.1–0.5% CPU. Auto-detects provider + region via IMDS.

helm install tollwing-agent ./deploy/helm/tollwing-agent

Pin --set agent.provider / agent.region only if IMDS is blocked.

Open Grafana

Live per-pod, per-namespace dollars across all 9 AWS billing paths. The agent exposes tollwing_* Prometheus metrics on :9990/metrics; import the included 23-panel dashboard. Or run the CLI: tollwing-cli -server http://tollwing-server:8080 -hours 24.

Read-only observability. Apache-2.0. Verify every number.

Safe by construction

Yes, it runs in your kernel. Here’s why that’s safe.

Verifier-checked before it loads

Every Tollwing eBPF program is checked by the Linux kernel verifier before it runs. Unlike a kernel module, a verified program cannot crash your kernel, loop forever, or read arbitrary memory. At worst it produces incorrect data, never an outage.

Read-only

The agent attaches cgroup/connect4 and sock_ops hooks to count bytes. It never modifies, drops, or redirects a single packet.

Honest overhead, with the mechanism named

We aggregate in-kernel with PERCPU maps, so there is no per-packet userspace handoff. Overhead stays ~0.1–0.5% CPU per node. We will not tell you it is zero, and we publish our benchmark methodology so you can confirm it on your own nodes.

It’s Apache-2.0, so your security team can read exactly what runs before you deploy it. Read the architecture →

See the bill by pod. Find the expensive conversation. Cut the waste.

See the bill by pod & namespace

Every byte classified across all 9 AWS billing paths (same-zone, cross-AZ, cross-region, internet egress, NAT gateway, VPC peering, transit gateway, VPC endpoint, cloud-service public endpoint), in live dollars, not flow counts.

Find the expensive conversation

e.g. kafka-broker-2 → kafka-broker-0 cross-AZ replication, or spark-exec reaching S3 over a NAT gateway ($0.045/GB) instead of a VPC endpoint ($0.01/GB).

Drop it into your stack

tollwing_* Prometheus/OTel metrics, a 23-panel Grafana dashboard, an OpenCost-compatible plugin, and a CLI (tollwing-cli -hours 24).

Share the win

A one-page Cost Savings Report for the people who own the budget.

Everyone tells you the bill is high. Only Tollwing tells you which pod, talking to which pod, over which path, is paying it.

                          Per-pod network dollars          Pre-DNAT Service intent     9 AWS billing paths            Cost math                     Open source
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Tollwing                  ✓ pod + conversation             ✓ connect4                  ✓ 9-way                        bytes × dated rate            ✓ Apache-2.0
Kubecost / OpenCost       K8s cost allocation              post-DNAT blind spot        limited network view           cloud/K8s cost model          OpenCost Apache-2.0
Datadog CCM + CNM         broad cost + network visibility  not Service-intent capture  not a billing-path classifier  platform cost + flow metrics  SaaS
AWS Network Flow Monitor  workload flow metrics            no                          network performance focus      bytes, not pod dollars        AWS service

Kubecost, OpenCost, Datadog, and AWS Network Flow Monitor are excellent at their own jobs. Tollwing is narrower: per-pod Kubernetes network cost, classified by AWS billing path. Cross-AZ Service traffic is a structural blind spot of the post-DNAT view. Tollwing reproduces the head-to-head itself: make sim-differential.

A Tollwing maintainer

“We built Tollwing after watching cross-AZ charges get billed to the wrong pod, again and again. Tags tell you how things were provisioned, not how they actually talk. So we capture the connection before kube-proxy rewrites it, and we made every number checkable: run make demo and verify the math yourself. We’d rather you falsify it than trust us.”

New project, fresh public launch. Real platform-engineer quotes will replace this as they arrive. We won’t fake them in the meantime.

Free vs Enterprise

The full attribution engine is free, and Apache-2.0, forever.

Everything you’ve seen above is free and Apache-2.0: the eBPF agent, the 9-way per-pod classifier, the pre-DNAT service-intent capture, exact-once cross-AZ pricing, the cost engine, the demo, the Grafana dashboard, the CLI, the OpenCost plugin. That is the whole engine, not a teaser.

Community

Free

$0

For proving and running the AWS network-cost engine on one cluster.

  • 9-way per-pod classifier + the eBPF agent
  • Pre-DNAT service intent + exact-once cross-AZ pricing
  • Prometheus/OTel + Grafana + OpenCost plugin
  • CLI + Cost Savings Report
  • Single cluster, ≤100 nodes, AWS
Run make demo
Tollwing Enterprise · early access

Contact

Early

Self-hosted, license-gated. For teams that need reconciled bills, more clusters, and governance controls.

  • CUR reconciliation to your actual discounted rates
  • Multi-cluster + long retention
  • Anomaly detection, auto-remediation, what-if
  • GCP/Azure
  • SSO/RBAC (early access)
  • Signed offline license, no phone-home
Talk about Enterprise

Community is AWS-only and single-cluster on purpose. AWS’s billing-path complexity is where the hidden cross-AZ and NAT spend actually lives, so prove the value on one AWS cluster first and scale only if it pays off. If you never upgrade, nothing you built on the free tier stops working. There’s no hosted dependency and no phone-home.

Want a second set of eyes on a network-cost bill?

No form, no nurture sequence. Email the maintainer for Enterprise, design-partner access, or a pre-install read on whether Tollwing is likely to find anything useful.

hello@tollwing.com

Tollwing ships with a governance constitution, 11 ADRs (the pre-DNAT capture is DEC-003), and a CI governance gate, because we hold the cost logic to a high bar and you should be able to see exactly how. Constitution · Architecture · ADR log

See the connection before it’s rewritten, and the pod that’s actually paying.

Run make demo · Star on GitHub · Contact

Read the docs →