Tollwing is an open-source eBPF agent that meters every byte of pod TCP traffic in-kernel (UDP and QUIC are one flag away: -udp), attributes it to the pod that sent or received it, and prices it across 9 AWS billing paths, live, in dollars. Of the metered bytes, the few whose billing path can’t be proven are booked Unknown, never guessed; bytes it doesn’t meter are absent, never estimated. Every dollar is metered bytes × a dated rate. No app changes.
① THE DIFFERENTIATOR: cross-AZ cost that post-DNAT-only tools miss

cart [shop/us-east-1a] ──1.0 GiB──▶ checkout (ClusterIP) [shop/us-east-1b]

| billing path | attributed to | cost |
|---|---|---|
| cross_az | checkout | $0.01 ◀── correct |
| total | | $0.01 |

Post-DNAT-only tools (Kubecost / OpenCost) see only the rewritten destination IP, so they bill this to cart, or miss it.
A cent a gigabyte, charged in each direction, sounds like nothing. At Kafka RF=3 replication scale it is tens of thousands of dollars a month, billed to a pod you cannot currently see.
Real output from make demo. Pure Go: no cloud account, no cluster, no kernel. Every dollar is bytes × a dated rate, re-derived independently by an oracle (make sim).
Your AWS bill lumps every byte that crosses an availability zone, hits a NAT gateway, or leaves the VPC into one opaque line: EC2-Other, DataTransfer. It tells you the number is big. It never tells you which workload spent it. Your tags describe how things were provisioned, not how they actually talk at runtime. So when the data-transfer line jumps, you are guessing.
A Kafka cluster with RF=3 replication crosses two AZ boundaries on every produced byte. That can be tens of thousands of dollars a month. Datadog spreads it across workloads by traffic share; Kubecost’s heuristic buckets default pod-to-pod RFC1918 traffic to in-zone, i.e. free (issue #2464); AWS shows bytes, not dollars. As of July 2026, no tool we know of meters that flow to the pod pair, prices it by billing path, and shows you the dollars.
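The arithmetic behind that figure can be sketched in a few lines of Go. This is a back-of-envelope estimate, not Tollwing's cost engine. It assumes every replica sits in a different AZ, so each produced byte crosses RF-1 boundaries, and it uses the cent-a-gigabyte, each-direction rate quoted above. The function name and the 500,000 GB/month volume are hypothetical.

```go
package main

import "fmt"

// Assumed rate: $0.01 per GB, charged in each direction, for cross-AZ traffic.
const crossAZPerGBEachDirection = 0.01

// monthlyReplicationUSD estimates the cross-AZ replication charge when every
// replica lives in a different AZ. Each produced byte is copied to
// replicationFactor-1 followers, and each copy is billed on both sides.
func monthlyReplicationUSD(producedGBPerMonth float64, replicationFactor int) float64 {
	if replicationFactor < 1 {
		return 0
	}
	crossings := float64(replicationFactor - 1)
	return producedGBPerMonth * crossings * 2 * crossAZPerGBEachDirection
}

func main() {
	// Hypothetical cluster producing 500,000 GB per month
	// (roughly 190 MB/s sustained).
	fmt.Printf("$%.2f per month\n", monthlyReplicationUSD(500_000, 3))
}
```

With those assumptions the estimate comes to $20,000 a month, none of it visible per pod on the bill.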
You can see the bill is high. You cannot see which pod is paying it. There is a structural reason post-DNAT tools get this wrong, and here it is.
Heuristic tools fill the gaps they can’t see with estimates. That is how a charge lands on the wrong pod. Tollwing does the opposite. Every dollar is bytes × a dated rate: traceable, never estimated. It counts each cross-AZ interaction’s cost exactly once, with no double-counting. And when a single agent genuinely cannot see a flow’s zone, it marks that leg Unknown instead of inventing a number. An independent oracle then re-derives every figure in make sim.
P4: Never invent or double-count a dollar. Every figure resolves to bytes × a dated rate.
P5: Refuse to guess. When a leg can’t be seen, mark it Unknown instead of fabricating it.
That is why the demo bills cross-AZ to the right service exactly once.
When a pod calls a Kubernetes Service, kube-proxy rewrites the destination mid-flight (DNAT), like a mail forwarder swapping the address on the envelope. Any tool that reads the envelope after forwarding sees the wrong recipient, and often the wrong availability zone. That is how the cross-AZ charge lands on the wrong pod, or vanishes.
We see the connection twice: what it meant to reach, and where it actually landed.
On the calling node, a cgroup/connect4 eBPF hook captures the pre-DNAT destination, the ClusterIP the pod meant to reach. A ClusterIP has no zone, so that agent leaves the zone Unknown rather than guessing it (P5). The agent on the backend node sees where the connection actually landed, in a zone it knows, and prices the cross-AZ movement exactly once, attributed to the Service (P4, no double-count). In-kernel PERCPU maps aggregate the bytes, so there is no per-packet userspace handoff. The two-phase capture is DEC-003; leaving the dialer leg Unknown rather than guessing it is DEC-010.
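A minimal Go sketch of how the two legs could be priced, under stated assumptions. The `Leg` type, its field names, and `priceLeg` are hypothetical and are not Tollwing's actual data model. The rate is the $0.01 per GiB shown in the demo row. The dialer leg has no destination zone, so it is booked Unknown at $0; only the backend leg, where both zones are known, carries the cross-AZ charge.

```go
package main

import "fmt"

// Zone is an availability zone name; "" means this agent cannot prove it.
type Zone string

// Leg is one agent's view of a connection. All field names are hypothetical.
type Leg struct {
	Observer string // "dialer" or "backend"
	Service  string // the Service the pod dialed (pre-DNAT intent)
	SrcZone  Zone
	DstZone  Zone // empty on the dialer leg: a ClusterIP has no zone
	Bytes    uint64
}

const crossAZPerGiB = 0.01 // assumed rate, matching the demo row

// priceLeg returns the billing path and dollars for a single leg.
// A leg with a missing zone is booked Unknown at $0 rather than guessed.
func priceLeg(l Leg) (path string, usd float64) {
	if l.SrcZone == "" || l.DstZone == "" {
		return "unknown", 0
	}
	if l.SrcZone == l.DstZone {
		return "same_zone", 0
	}
	return "cross_az", float64(l.Bytes) / (1 << 30) * crossAZPerGiB
}

func main() {
	const gib = 1 << 30
	legs := []Leg{
		{Observer: "dialer", Service: "checkout", SrcZone: "us-east-1a", Bytes: gib},
		{Observer: "backend", Service: "checkout", SrcZone: "us-east-1a", DstZone: "us-east-1b", Bytes: gib},
	}
	total := 0.0
	for _, l := range legs {
		path, usd := priceLeg(l)
		total += usd
		fmt.Printf("%-8s %-9s %-9s $%.2f\n", l.Observer, l.Service, path, usd)
	}
	fmt.Printf("total $%.2f\n", total)
}
```

Because the Unknown leg contributes nothing, summing both legs still charges the 1.0 GiB once, which is the exactly-once property described above.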
git clone https://github.com/tollwing/tollwing
cd tollwing
make demo
| billing path | example flow | data | cost |
|---|---|---|---|
| same_zone | api → cache | 1.0 GiB | $0.00 |
| cross_az | cart → checkout | 1.0 GiB | $0.01 |
| cross_region | api → replica | 1.0 GiB | $0.01 |
| nat_gateway | worker → nat | 2.0 GiB | $0.27 |
| internet_egress | api → internet | 5.0 GiB | $0.45 |
| vpc_peering | api → peer-db | 1.0 GiB | $0.01 |
| transit_gateway | api → tgw-peer | 1.0 GiB | $0.01 |
| vpc_endpoint | api → s3-endpoint | 1.0 GiB | $0.01 |
| cloud_service_public | api → s3-public | 1.0 GiB | $0.00 |
Each row is an independent scenario, priced by the same engine, billing only the direction the provider bills: cross-region and transit gateway charge the sending side only, and the NAT row stacks $0.045/GB processing on the egress it fronts. NAT vs VPC-endpoint is the boring, huge fix: $0.045/GB → $0.01/GB.
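To check a few rows by hand, here is a minimal Go sketch of "bytes × a dated rate" with stacked rates. It is not the project's cost engine. The `Rate` type and `cost` function are hypothetical, the effective date is a placeholder, and the $0.09 egress rate is inferred from the $0.45 / 5.0 GiB row rather than stated in the text. The sketch also treats the quoted per-GB rates as per-GiB, as the demo rows appear to.

```go
package main

import (
	"fmt"
	"time"
)

// Rate is a per-GiB price plus the date it took effect, so every dollar
// can be traced back to a specific rate. Field names are hypothetical.
type Rate struct {
	Name          string
	USDPerGiB     float64
	EffectiveFrom time.Time
}

// cost prices a byte count against one or more stacked rates.
func cost(bytes uint64, rates ...Rate) float64 {
	gib := float64(bytes) / (1 << 30)
	total := 0.0
	for _, r := range rates {
		total += gib * r.USDPerGiB
	}
	return total
}

func main() {
	asOf := time.Date(2026, time.July, 1, 0, 0, 0, 0, time.UTC) // placeholder date
	natProcessing := Rate{"nat_processing", 0.045, asOf}
	egress := Rate{"internet_egress", 0.09, asOf} // inferred from the demo row
	endpoint := Rate{"vpc_endpoint", 0.01, asOf}

	const gib = 1 << 30
	// The NAT row stacks processing on top of the egress it fronts.
	fmt.Printf("nat_gateway     2 GiB $%.2f\n", cost(2*gib, natProcessing, egress))
	fmt.Printf("internet_egress 5 GiB $%.2f\n", cost(5*gib, egress))
	fmt.Printf("vpc_endpoint    1 GiB $%.2f\n", cost(1*gib, endpoint))
}
```

Under these assumptions the NAT row works out to 2 × ($0.045 + $0.09) = $0.27, matching the demo output.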
On a real cluster the difference is stark: where OpenCost reports $0 and Kubecost defaults the ClusterIP leg to same-zone/free with no service field (kubecost#2464, closed unfixed), Tollwing attributes cross_az to the dialed Service.
Read a per-pod, per-billing-path cost report locally. No cloud account, no cluster, no kernel.
Deploy the eBPF agent as a DaemonSet. Read-only. No app changes, no sidecars, designed to a 0.1–0.5%-of-one-core overhead budget (a design budget; see what’s measured so far). Auto-detects provider + region via IMDS.
helm install tollwing-agent ./deploy/helm/tollwing-agent
Pin --set agent.provider / agent.region only if IMDS is blocked.
Live per-pod, per-namespace dollars across all 9 AWS billing paths. The agent exposes tollwing_* Prometheus metrics on :9990/metrics, your Prometheus scrapes them, and the included 23-panel dashboard renders them. No control-plane server required for the live single-cluster view.
Read-only observability. Apache-2.0. Verify every number.
Every Tollwing eBPF program is checked by the Linux kernel verifier before it runs. Unlike a kernel module, a verified program cannot crash your kernel, loop forever, or read arbitrary memory. At worst it produces incorrect data, never an outage.
The agent attaches cgroup/connect4 and sock_ops hooks to count bytes. It never modifies, drops, or redirects a single packet.
We aggregate in-kernel with PERCPU maps, so there is no per-packet userspace handoff. The agent is designed to a 0.1–0.5%-of-one-core overhead budget, and we won’t quote a measured number until a reproducible benchmark ships. We will not tell you the overhead is zero. We publish what’s measured so far, and what isn’t, so you can measure it on your own nodes instead of taking our word for it.
It’s Apache-2.0, so your security team can read exactly what runs before you deploy it.
Every metered byte classified deterministically across 9 AWS billing paths (same-zone, cross-AZ, cross-region, internet egress, NAT gateway, VPC peering, transit gateway, VPC endpoint, cloud-service public endpoint), in live dollars, not flow counts. Anything unprovable lands in an explicit Unknown bucket, never in a guess.
e.g. kafka-broker-2 → kafka-broker-0 cross-AZ replication, or spark-exec reaching S3 over a NAT gateway ($0.045/GB) instead of a VPC endpoint ($0.01/GB).
tollwing_* Prometheus metrics your Grafana reads directly, the included 23-panel dashboard, and a standalone FOCUS-aligned JSON cost-export sidecar for external cost tooling. No new backend to run.
When one cluster proves it, Tollwing Enterprise adds the control plane: long-term history, a multi-cluster view, CLI + REST API, a one-page Cost Savings Report, and alerting.
As of July 2026, nothing else we can find ships all three at once: per-pod resolution, 9-way billing-path classification, and dollars derived from metered bytes. Here is the honest version of that table.
| | Per-pod network dollars | Pre-DNAT Service intent | Billing-path granularity | Cost math | Open source |
|---|---|---|---|---|---|
| Tollwing | ✓ pod + conversation | ✓ connect4 | ✓ 9-way + explicit Unknown | bytes × dated rate | ✓ Apache-2.0 |
| Kubecost / OpenCost | per pod via conntrack | post-DNAT blind spot (#2464) | 3 heuristic buckets | cloud/K8s cost model | OpenCost Apache-2.0 |
| Datadog CCM + CNM | workload-level, no per-pod | not Service-intent capture | 4 CUR transfer types | top-down bill spread | SaaS |
| AWS Container Network Observability | bytes, not dollars | no | per-workload cross-AZ + external | no dollars | AWS service |
Kubecost, OpenCost, Datadog, and AWS Container Network Observability are excellent at their own jobs; this table is about one narrow slice. Datadog CCM + CNM allocates real bill lines down to the workload by traffic share — a top-down spread across 4 CUR transfer types that requires CNM on every host and tracks no individual pod. Tollwing meters each flow bottom-up, pre-DNAT, and prices it per path: different question, different answer. Kubecost meters per pod but into 3 heuristic buckets, and loses ClusterIP intent (kubecost#2464, closed unfixed). AWS Container Network Observability shows per-workload bytes, not dollars.
Comparison as of 2026-07-02, from each vendor’s public docs. If it is wrong or goes stale, open an issue — we would rather correct this table than defend it.
“We built Tollwing after watching cross-AZ charges get billed to the wrong pod, again and again. Tags tell you how things were provisioned, not how they actually talk. So we capture the connection before kube-proxy rewrites it, and we made every number checkable: run make demo and verify the math yourself. We’d rather you falsify it than trust us.”
New project, fresh public launch. Real platform-engineer quotes will replace this as they arrive. We won’t fake them in the meantime.
Everything that attributes the cost is free and Apache-2.0: the eBPF agent, the 9-way per-pod classifier, the pre-DNAT service-intent capture, exact-once cross-AZ pricing, the cost engine, the service-dependency graph, the demo, and the FOCUS-aligned JSON cost-export sidecar. It runs on its own: the agent exposes Prometheus metrics your Grafana reads directly, no control-plane server required. That is the whole attribution engine, not a teaser.
“Forever” is not a slogan here; it is a versioned contract: OPEN-CORE.md. What shipped free stays free, the public tree stays Apache-2.0, accuracy and honesty fixes are always free, and the free agent contains no license code at all — there is nothing in it to unlock.
For seeing per-pod network cost live on one AWS cluster, in your own Grafana.
Self-hosted, license-gated. The control plane on top of the same agent: store it, scale it across clusters, alert on it, act on it.
The free agent is AWS-only and single-cluster on purpose: AWS’s billing-path complexity is where the hidden cross-AZ and NAT spend actually lives, so prove the value on one cluster first, in your own Grafana, and scale only if it pays off. Enterprise adds the control plane on top of the same agent: history beyond your Prometheus retention, a view across clusters, and the tooling to alert and act. If you never upgrade, the agent keeps working. There’s no hosted dependency and no phone-home, ever, and OPEN-CORE.md commits that the boundary only ever moves toward free.
No form, no nurture sequence. Email the maintainer for Enterprise, design-partner access, or a pre-install read on whether Tollwing is likely to find anything useful.
Tollwing ships with a governance constitution, a public ADR log (the pre-DNAT capture is DEC-003; the open-core boundary is DEC-013), and a CI governance gate, because we hold the cost logic to a high bar and you should be able to see exactly how. Constitution · Open-core boundary · Architecture · ADR log