Part 4 — Notes from the field: API Platform as the Foundation for AI
When Platforms Go Wrong
TL;DR: The past nine months of consulting with organisations have been an eye-opener, and I want to share some of what I have learned. As the AI build and agent race heats up, I’ve seen teams blame their API platform when the real issue was misalignment: the chosen platform didn’t fit how the organisation builds software or uses the cloud. Feature checklists drove procurement, not delivery realities. The result: fragmentation, slow delivery, and waste that can exceed $1M in a year once you factor in production hardening, rework, and platform team burn. With AI on every roadmap, a poorly implemented platform doesn’t just slow product teams: it accelerates agent sprawl and security risk. Here are some observations.
Why good teams end up with the wrong platform
Most product teams are excellent at building applications and domain APIs. That mindset doesn’t automatically translate to running an API platform—a multi-tenant, multi-team product that must balance paved roads, guardrails, and self-service. When platform selection over‑indexes on features and under‑weights fit‑for‑delivery, you get:
Mismatch with delivery model: The platform demands patterns (e.g., centralised gateways and bespoke pipelines) that fight how teams ship (e.g., trunk‑based, GitOps, ephemeral environments).
Mismatch with cloud posture: The platform’s opinion (control plane location, data plane requirements, VPC/VNet peering, private service connect, multi‑region) clashes with how you consume AWS/Azure/GCP.
Mismatch with operating model: The vendor demo assumed a single golden team; your reality is ten squads with different maturity levels, change windows, and compliance constraints.
The platform becomes the scapegoat for delivery friction that organisational misfit actually causes.
The hidden cost of misadventure (why it adds up to $1M+)
It’s not just the licence. Add these everyday line items:
Productionisation tax: bespoke CI/CD, custom runners, env promotion, secrets management, test harnesses, audit trails.
Rework: undoing multi‑gateway topology mistakes; re‑publishing APIs to meet auth, rate‑limit, or observability standards not baked in from day one.
Platform team burn: two to four engineers spending most cycles on manual enablement and firefighting instead of productising paved roads.
Opportunity cost: delayed product features, duplicated integration work, and snowflake exceptions that become permanent.
Across a year, these quietly compound to seven figures, without delivering the reliability, velocity, or AI‑readiness you bought the platform for.
Symptoms you’re living with the misalignment
Gateway drift: multiple gateways/clusters, each with different policies and pipelines.
Policy sprawl: every team writes its own JWT validation, mTLS, and rate‑limits; nobody trusts the central templates.
Manual enablement: onboarding a new API still requires a platform engineer on a Zoom call; the same is often true for your consumers when they need keys or credentials for your APIs.
Telemetry gaps: you can’t answer “what’s the 95th percentile latency per product per tenant?” without a war room.
Security exceptions: ad‑hoc bypasses for “just this one customer integration.” Exceptions become the norm.
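The policy-sprawl symptom is easiest to see by contrast with the paved-road alternative: one versioned policy module that every squad imports instead of re-implementing. A minimal sketch of a reusable per-tenant rate-limit policy (a token bucket; the class and parameter names are illustrative, not any vendor's API):

```python
import time
from dataclasses import dataclass, field


@dataclass
class RateLimitPolicy:
    """A versioned, shared rate-limit policy (illustrative names).

    Squads import and configure this instead of each writing their own
    limiter, so behaviour and audit are consistent across gateways.
    """
    version: str = "v1"
    capacity: int = 10           # maximum burst of requests per tenant
    refill_per_sec: float = 5.0  # sustained request rate per tenant
    _tokens: dict = field(default_factory=dict)
    _stamp: dict = field(default_factory=dict)

    def allow(self, tenant, now=None):
        """Return True if this tenant's request is within its budget."""
        now = time.monotonic() if now is None else now
        tokens = self._tokens.get(tenant, self.capacity)
        last = self._stamp.get(tenant, now)
        # Refill the bucket for the time elapsed since the last request.
        tokens = min(self.capacity, tokens + (now - last) * self.refill_per_sec)
        self._stamp[tenant] = now
        if tokens >= 1:
            self._tokens[tenant] = tokens - 1
            return True
        self._tokens[tenant] = tokens
        return False


# Usage: each tenant gets its own bucket from the same policy instance.
policy = RateLimitPolicy(capacity=2, refill_per_sec=0.0)
policy.allow("acme", now=0.0)   # True
policy.allow("acme", now=0.0)   # True
policy.allow("acme", now=0.0)   # False: burst budget drained
policy.allow("globex", now=0.0) # True: buckets are per tenant
```

The point is not the algorithm but the packaging: when the limiter ships as a versioned module with sensible defaults, the central templates become the path of least resistance rather than something nobody trusts.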
AI raises the stakes: from APIs to agents
AI agents intensify existing cracks:
Agent sprawl: business units spin up their own integrations, bypassing platform guardrails.
Inconsistent security: model‑to‑service calls skip central authN/Z and data redaction.
Unobservable conversations: prompts and tool invocations lack traceability tied to API products/tenants.
Design target: an API platform that treats agents like first‑class clients—subject to the same products, quotas, policies, and audit as human or app consumers. Concretely:
Tooling interfaces: standardise agent tools over APIs/events (e.g., HTTP+JSON, gRPC, AsyncAPI) with declarative scopes.
Policy‑as‑code: reusable authn/z, threat protection, PII redaction, and rate‑limits as versioned modules.
End‑to‑end observability: traces linking prompt → tool call → API product → backend, with tenant and data lineage tags.
Context governance: access control for RAG stores and cache layers tied to API entitlements.
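To make the end‑to‑end observability target concrete, here is a minimal stdlib sketch of the span chain and inherited tags; in practice you would use OpenTelemetry, and the attribute names here are illustrative, not a standard:

```python
import uuid
from dataclasses import dataclass


@dataclass
class Span:
    """A toy span: one step in a trace, carrying inherited attributes."""
    name: str
    trace_id: str
    attributes: dict
    parent: "Span | None" = None


def start_span(name, attributes, parent=None):
    # Children join the parent's trace and inherit its tags (tenant,
    # data lineage), so any span can be filtered by product or tenant.
    trace_id = parent.trace_id if parent else uuid.uuid4().hex
    merged = {**(parent.attributes if parent else {}), **attributes}
    return Span(name, trace_id, merged, parent)


# One trace linking prompt -> tool call -> API product -> backend:
prompt = start_span("agent.prompt", {"tenant": "acme", "prompt.id": "p-123"})
tool = start_span("agent.tool_call", {"tool": "get_invoice"}, parent=prompt)
api = start_span("api.request", {"api.product": "billing-v2"}, parent=tool)

assert api.trace_id == prompt.trace_id      # one trace end to end
assert api.attributes["tenant"] == "acme"   # tenant tag survives the chain
```

The design choice worth copying is the inheritance: because tenant and lineage tags propagate from the prompt span down to the API call, an agent's traffic is queryable with exactly the same questions you ask of human or app consumers.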
Closing thought
“The share of companies abandoning most of their AI initiatives jumped to 42% in 2025, up from 17% last year.”
— S&P Global Market Intelligence (via Cybersecurity Dive, Mar 18, 2025)
According to a Gartner report, over 85% of AI projects fail to meet expectations (Pixitech, 2025), with poor data quality, siloed systems, and process integration issues (where APIs play a pivotal role) as the most frequently cited root causes.
AI isn’t forgiving. Whatever cracks exist in your API platform will be amplified by agents, automation, and integration velocity. Start by aligning the platform to how you build and how you cloud. Make the paved road the obvious path, and you’ll unlock the upside you expected when you signed the contract.