Tollwing is an open-source eBPF agent that attributes every byte of Kubernetes traffic to the exact pod, the conversation, and the AWS billing path that charges for it. Live, in dollars, across all 9 paths. No app changes, ~0.1–0.5% CPU.
① THE DIFFERENTIATOR: cross-AZ cost that post-DNAT-only tools miss

cart [shop/us-east-1a] ──1.0 GiB──▶ checkout (ClusterIP) [shop/us-east-1b]

| billing path | attributed to | cost |
|---|---|---|
| cross_az | checkout | $0.01 (correct) |
| total | | $0.01 |

Post-DNAT-only tools (Kubecost / OpenCost) see only the rewritten destination IP, so they bill this to cart, or miss it.
$0.01 a GiB sounds like nothing. At Kafka RF=3 replication scale it is tens of thousands of dollars a month, billed to a pod you cannot currently see.
Real output from make demo. Pure Go: no cloud account, no cluster, no kernel. Every dollar is bytes × a dated rate, re-derived independently by an oracle (make sim).
Your AWS bill lumps every byte that crosses an availability zone, hits a NAT gateway, or leaves the VPC into one opaque line: EC2-Other, DataTransfer. It tells you the number is big. It never tells you which workload spent it. Your tags describe how things were provisioned, not how they actually talk at runtime. So when the data-transfer line jumps, you are guessing.
A Kafka cluster with RF=3 and replicas spread across zones sends every produced byte across an AZ boundary twice. That can be tens of thousands of dollars a month, and no existing tool attributes a dollar of it to a pod.
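To put a number on that, here is a rough back-of-envelope sketch. It assumes the $0.01 per GiB cross-AZ rate from the demo output, one replica per zone (so the leader ships each produced byte to two other zones), and a made-up workload of 20 TiB produced per day. The function name and constants are illustrative and not part of Tollwing.

```go
package main

import "fmt"

// crossAZRatePerGiB is the cross-AZ rate shown in the demo output.
// It is an illustrative constant here, not a live AWS price.
const crossAZRatePerGiB = 0.01

// replicationCost returns the cross-AZ cost of copying producedGiB
// of data to crossAZCopies followers that sit in other zones.
func replicationCost(producedGiB float64, crossAZCopies int) float64 {
	return producedGiB * float64(crossAZCopies) * crossAZRatePerGiB
}

func main() {
	// Hypothetical workload: 20 TiB produced per day, over 30 days.
	producedGiB := 20.0 * 1024 * 30

	// RF=3 with one replica per zone: two cross-AZ copies per produced byte.
	fmt.Printf("$%.2f per month\n", replicationCost(producedGiB, 2))
}
```

Under those assumptions the replication traffic alone lands in the five-figure range each month, before any consumer reads cross a zone.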
You can see the bill is high. You cannot see which pod is paying it. There is a structural reason every other tool gets this wrong, and here it is.
Heuristic tools fill the gaps they can’t see with estimates. That is how a charge lands on the wrong pod. Tollwing does the opposite. Every dollar is bytes × a dated rate: traceable, never estimated. It counts each cross-AZ interaction’s cost exactly once, with no double-counting. And when a single agent genuinely cannot see a flow’s zone, it marks that leg Unknown instead of inventing a number. An independent oracle then re-derives every figure in make sim.
Never invent or double-count a dollar. Every figure resolves to bytes × a dated rate.
Refuse to guess. When a leg can’t be seen, mark it Unknown instead of fabricating it.
That is why the demo bills cross-AZ to the right service exactly once.
When a pod calls a Kubernetes Service, kube-proxy rewrites the destination mid-flight (DNAT), like a mail forwarder swapping the address on the envelope. Any tool that reads the envelope after forwarding sees the wrong recipient, and often the wrong availability zone. That is how the cross-AZ charge lands on the wrong pod, or vanishes.
We see the connection twice: what it meant to reach, and where it actually landed.
On the calling node, a cgroup/connect4 eBPF hook captures the pre-DNAT destination: the ClusterIP the pod meant to reach. A ClusterIP has no zone, so that agent leaves the zone Unknown rather than guessing it (P5, refuse to guess). The agent on the backend node sees where the connection actually landed, in a zone it knows, and prices the cross-AZ movement exactly once, attributed to the Service (P4, no double-counting). In-kernel PERCPU maps aggregate the bytes, so there is no per-packet userspace handoff. The two-phase capture is recorded as DEC-003; leaving the dialer leg Unknown rather than guessing it is DEC-010.
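The two views can be sketched in a few lines of Go. This is a simplified model, not the agent's code. The `Leg` type, its fields, and the `classify` and `priceLegs` helpers are hypothetical. It assumes both pods are in the same region and that the backend leg already carries the dialed Service name.

```go
package main

import "fmt"

// Leg is one agent's view of a connection.
type Leg struct {
	Observer string // "dialer" or "backend"
	SrcZone  string
	DstZone  string // empty when the observer cannot know it (a ClusterIP has no zone)
	Service  string // the Service the caller dialed
	Bytes    uint64
}

// classify labels a leg without ever guessing a missing zone.
func classify(l Leg) string {
	switch {
	case l.SrcZone == "" || l.DstZone == "":
		return "unknown"
	case l.SrcZone == l.DstZone:
		return "same_zone"
	default:
		return "cross_az"
	}
}

// priceLegs charges only legs that are provably cross-AZ, so a
// connection seen by two agents is billed once, to the Service.
func priceLegs(legs []Leg, usdPerGiB float64) map[string]float64 {
	bill := make(map[string]float64)
	for _, l := range legs {
		if classify(l) != "cross_az" {
			continue // unknown and same-zone legs add no cross-AZ dollars
		}
		bill[l.Service] += float64(l.Bytes) / (1 << 30) * usdPerGiB
	}
	return bill
}

func main() {
	legs := []Leg{
		// Dialer node: sees the ClusterIP, so the destination zone stays empty.
		{Observer: "dialer", SrcZone: "us-east-1a", Service: "checkout", Bytes: 1 << 30},
		// Backend node: sees where the connection actually landed.
		{Observer: "backend", SrcZone: "us-east-1a", DstZone: "us-east-1b", Service: "checkout", Bytes: 1 << 30},
	}
	for svc, usd := range priceLegs(legs, 0.01) {
		fmt.Printf("cross_az %s $%.2f\n", svc, usd)
	}
}
```

Because the dialer leg classifies as unknown, only the backend leg is priced, and the cart → checkout gibibyte shows up once rather than twice.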
git clone https://github.com/tollwing/tollwing
cd tollwing
make demo
| billing path | example flow | data | cost |
|---|---|---|---|
| same_zone | api → cache | 1.0 GiB | $0.00 |
| cross_az | cart → checkout | 1.0 GiB | $0.01 |
| cross_region | api → replica | 1.0 GiB | $0.02 |
| nat_gateway | worker → nat | 2.0 GiB | $0.09 |
| internet_egress | api → internet | 5.0 GiB | $0.36 |
| vpc_peering | api → peer-db | 1.0 GiB | $0.01 |
| transit_gateway | api → tgw-peer | 1.0 GiB | $0.02 |
| vpc_endpoint | api → s3-endpoint | 1.0 GiB | $0.01 |
| cloud_service_public | api → s3-public | 1.0 GiB | $0.00 |
Each row is an independent scenario, priced by the same engine. NAT vs VPC-endpoint is the boring, huge fix: $0.045/GB → $0.01/GB.
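As a quick worked example of that NAT-to-endpoint fix, the sketch below uses the two per-GB rates quoted above and a hypothetical 50,000 GB of monthly traffic. It covers data-processing charges only and ignores any hourly gateway or endpoint fees. The function and constant names are illustrative.

```go
package main

import "fmt"

const (
	natUSDPerGB      = 0.045 // NAT gateway data-processing rate quoted above
	endpointUSDPerGB = 0.01  // VPC endpoint rate quoted above
)

// monthlySavings is the per-GB difference times the volume moved
// from the NAT gateway path to the VPC endpoint path.
func monthlySavings(gb float64) float64 {
	return gb * (natUSDPerGB - endpointUSDPerGB)
}

func main() {
	// Hypothetical volume: 50,000 GB per month currently routed via NAT.
	fmt.Printf("$%.2f saved per month\n", monthlySavings(50000))
}
```

The saving scales linearly with volume, which is why knowing which pod sends those bytes through the NAT gateway matters more than the rate itself.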
Want to watch it beat the incumbents on a real cluster? make sim-differential deploys OpenCost and Kubecost’s network daemon next to the Tollwing agent and prints all three numbers side by side. OpenCost reports $0, Kubecost reports same-zone/free with no service field, and Tollwing attributes cross_az to the dialed Service.
Read a per-pod, per-billing-path cost report locally. No cloud account, no cluster, no kernel.
Deploy the eBPF agent as a DaemonSet. Read-only. No app changes, no sidecars, ~0.1–0.5% CPU. Auto-detects provider + region via IMDS.
helm install tollwing-agent ./deploy/helm/tollwing-agent
Pin --set agent.provider / agent.region only if IMDS is blocked.
Live per-pod, per-namespace dollars across all 9 AWS billing paths. The agent exposes tollwing_* Prometheus metrics on :9990/metrics; import the included 23-panel dashboard. Or run the CLI: tollwing-cli -server http://tollwing-server:8080 -hours 24.
Read-only observability. Apache-2.0. Verify every number.
Every Tollwing eBPF program is checked by the Linux kernel verifier before it runs. Unlike a kernel module, a verified program cannot crash your kernel, loop forever, or read arbitrary memory. At worst it produces incorrect data, never an outage.
The agent attaches cgroup/connect4 and sock_ops hooks to count bytes. It never modifies, drops, or redirects a single packet.
We aggregate in-kernel with PERCPU maps, so there is no per-packet userspace handoff. Overhead stays ~0.1–0.5% CPU per node. We will not tell you it is zero, and we publish our benchmark methodology so you can confirm it on your own nodes.
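The idea behind per-CPU aggregation can be shown with a userspace analogy. Each CPU adds bytes to its own slot, so the hot path never contends on a shared counter, and a reader sums the slots only when it wants a total. The `perCPUCounter` type is hypothetical and runs in plain Go. It is not the agent's eBPF map layout.

```go
package main

import "fmt"

// perCPUCounter mimics one per-CPU map value: one slot per CPU.
type perCPUCounter []uint64

// add records n bytes in the slot owned by cpu. In the kernel each
// CPU only touches its own slot, so no lock is needed on this path.
func (c perCPUCounter) add(cpu int, n uint64) {
	c[cpu] += n
}

// total sums every slot. This is the only work userspace does,
// and it happens per read interval rather than per packet.
func (c perCPUCounter) total() uint64 {
	var sum uint64
	for _, v := range c {
		sum += v
	}
	return sum
}

func main() {
	counter := make(perCPUCounter, 4) // pretend the node has 4 CPUs

	// Simulated packets handled on different CPUs.
	counter.add(0, 1500)
	counter.add(2, 9000)
	counter.add(0, 1500)

	fmt.Println("bytes:", counter.total())
}
```

The trade is a few extra bytes of memory per counter in exchange for a userspace cost that depends on how often you read, not on how many packets flow.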
It’s Apache-2.0, so your security team can read exactly what runs before you deploy it. Read the architecture →
Every byte classified across all 9 AWS billing paths (same-zone, cross-AZ, cross-region, internet egress, NAT gateway, VPC peering, transit gateway, VPC endpoint, cloud-service public endpoint), in live dollars, not flow counts.
e.g. kafka-broker-2 → kafka-broker-0 cross-AZ replication, or spark-exec reaching S3 over a NAT gateway ($0.045/GB) instead of a VPC endpoint ($0.01/GB).
tollwing_* Prometheus/OTel metrics, a 23-panel Grafana dashboard, an OpenCost-compatible plugin, and a CLI (tollwing-cli -hours 24).
A one-page Cost Savings Report for the people who own the budget.
| Tool | Per-pod network dollars | Pre-DNAT Service intent | 9 AWS billing paths | Cost math | Open source |
|---|---|---|---|---|---|
| Tollwing | ✓ pod + conversation | ✓ connect4 | ✓ 9-way | bytes × dated rate | ✓ Apache-2.0 |
| Kubecost / OpenCost | K8s cost allocation | post-DNAT blind spot | limited network view | cloud/K8s cost model | OpenCost Apache-2.0 |
| Datadog CCM + CNM | broad cost + network visibility | not Service-intent capture | not a billing-path classifier | platform cost + flow metrics | SaaS |
| AWS Network Flow Monitor | workload flow metrics | no | network performance focus | bytes, not pod dollars | AWS service |
Kubecost, OpenCost, Datadog, and AWS Network Flow Monitor are excellent at their own jobs. Tollwing is narrower: per-pod Kubernetes network cost, classified by AWS billing path. Cross-AZ Service traffic is a structural blind spot of the post-DNAT view. Tollwing reproduces the head-to-head itself: make sim-differential.
“We built Tollwing after watching cross-AZ charges get billed to the wrong pod, again and again. Tags tell you how things were provisioned, not how they actually talk. So we capture the connection before kube-proxy rewrites it, and we made every number checkable: run make demo and verify the math yourself. We’d rather you falsify it than trust us.”
New project, fresh public launch. Real platform-engineer quotes will replace this as they arrive. We won’t fake them in the meantime.
Everything you’ve seen above is free and Apache-2.0: the eBPF agent, the 9-way per-pod classifier, the pre-DNAT service-intent capture, exact-once cross-AZ pricing, the cost engine, the demo, the Grafana dashboard, the CLI, the OpenCost plugin. That is the whole engine, not a teaser.
For proving and running the AWS network-cost engine on one cluster.
Self-hosted, license-gated. For teams that need reconciled bills, more clusters, and governance controls.
Community is AWS-only and single-cluster on purpose. AWS’s billing-path complexity is where the hidden cross-AZ and NAT spend actually lives, so prove the value on one AWS cluster first and scale only if it pays off. If you never upgrade, nothing you built on the free tier stops working. There’s no hosted dependency and no phone-home.
No form, no nurture sequence. Email the maintainer for Enterprise, design-partner access, or a pre-install read on whether Tollwing is likely to find anything useful.
Tollwing ships with a governance constitution, 11 ADRs (the pre-DNAT capture is DEC-003), and a CI governance gate, because we hold the cost logic to a high bar and you should be able to see exactly how. Constitution · Architecture · ADR log