We were promised that DevOps would eliminate the wall between development and operations. Developers would own their deployments. Operations would enable self-service. Everyone would move faster.
What we got was something else entirely.
Developers now spend 4-6 hours per week on infrastructure tasks: provisioning, debugging, waiting for access, chasing down configuration issues. That's at least half a day every week spent not building features. For a team of twenty engineers, that's roughly ten person-weeks of productivity lost every month to infrastructure friction.
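Here's a quick sanity check of that estimate, as a minimal sketch. The 40-hour work week and the four-week month are my assumptions for the arithmetic, not figures from any survey, and the function name is just for illustration.

```python
HOURS_PER_WORK_WEEK = 40  # assumption: a standard 40-hour week
WEEKS_PER_MONTH = 4       # assumption: rounded down from roughly 4.33


def person_weeks_lost_per_month(engineers: int, hours_per_week: float) -> float:
    """Person-weeks of engineering time lost per month across a team."""
    hours_lost_per_month = engineers * hours_per_week * WEEKS_PER_MONTH
    return hours_lost_per_month / HOURS_PER_WORK_WEEK


# Low end, midpoint, and high end of the 4-6 hour range for twenty engineers.
for hours in (4, 5, 6):
    print(hours, person_weeks_lost_per_month(20, hours))
```

Under those assumptions the range is 8 to 12 person-weeks per month, with the 5-hour midpoint landing on ten.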
The irony? Most companies have "implemented DevOps." They have CI/CD pipelines. They use Infrastructure as Code. They run Kubernetes. They check all the boxes. But their developers are still stuck in ticket queues, still waiting for someone else to provision a database, still debugging Terraform failures at 2 AM.
Platform engineering was supposed to fix this. Instead of every team building their own infrastructure toolchain from scratch, central platform teams would build golden paths—opinionated, well-supported routes to production that developers actually want to use.
The theory is sound. The execution is failing. Only 22% of teams report high satisfaction with their internal platforms. Nearly 30% of platform teams admit they don't measure success in any formal way. And organizations waste an estimated 27% of their cloud spend, which works out to nearly $200 billion annually across the industry.
Where Platform Engineering Goes Wrong
I've audited dozens of platform implementations across companies from seed-stage startups to Fortune 100 enterprises. The failures follow predictable patterns. Here are the four most common mistakes I see:
1. Building for Operations, Not Developers
This is the cardinal sin. Platform teams sit in infrastructure organizations. They're staffed by engineers who came from operations backgrounds. They measure success by uptime, cost reduction, and security compliance.
Developers measure success by shipping speed.
When these priorities conflict—and they always do—the platform optimizes for the metrics the platform team owns. That's why you end up with provisioning workflows that require six approval steps, or deployment pipelines that take thirty minutes to run comprehensive security scans on every commit, or environments that are "secure by default" and unusable without a week of configuration work.
The platform team reports that they've reduced cloud costs by 15%. Meanwhile, developer productivity dropped 20% and nobody uses the new "optimized" workflows.
2. The "Move Fast and Break Things" Fallacy
Some platform teams swing too far the other way. They read that Netflix or Spotify or Meta gives engineers unlimited production access and assume the same approach works for their 200-person SaaS company.
It doesn't.
Here's what actually happens: developers get access to powerful infrastructure tools they don't fully understand. They provision resources that aren't properly tagged for cost allocation. They leave test environments running for months. They accidentally expose databases to the internet because the defaults weren't secure.
Six months later, the platform team spends three months building guardrails and restrictions that should have existed from day one. Developer trust erodes. The platform gains a reputation for "constantly changing the rules."
3. Solving the Wrong Problems
Platform teams love infrastructure problems. They're comfortable with Kubernetes configurations and Terraform modules and networking rules. So they build solutions for those problems.
The actual bottlenecks developers face are often simpler: unclear documentation, inconsistent naming conventions, mysterious error messages, or just figuring out which team owns a particular service.
A developer trying to ship a feature doesn't need a more powerful infrastructure CLI. They need to know: which database should I use, how do I get credentials, and who do I ask when something breaks?
I worked with one company whose platform team had built an elaborate internal PaaS on top of Kubernetes. Developer adoption was 12%. After user research, they discovered the real pain point: developers couldn't find the documentation for basic tasks. The platform team spent two weeks organizing existing docs and adding a search interface. Developer satisfaction scores improved more after those two weeks than they had after the entire PaaS project.
4. Measuring Activity Instead of Outcomes
As noted earlier, nearly 30% of platform teams don't measure success in any formal way. Of those that do, most track vanity metrics: number of services onboarded, lines of Terraform written, cloud cost reductions achieved.
These metrics optimize for the platform team, not the developers they serve.
The teams that succeed track different numbers: time from code commit to production, developer NPS, mean time to recover from incidents, reduction in infrastructure-related support tickets. They treat platform engineering as a product problem, not an infrastructure problem.
The Platform as Product Framework
Here's the four-step framework I use to build platforms developers actually adopt. This works whether you're starting from scratch or rescuing a struggling platform initiative.
Step 1: Start with User Research (Yes, Really)
Before writing a single line of code, talk to your developers. Not infrastructure engineers—application developers who ship features to customers.
The goal isn't to validate your existing platform plans. It's to discover what developers actually need. You'll be surprised how often the solutions are simple and the problems are communication, not technology.
Step 2: Define Your Golden Paths
A golden path is an opinionated, supported workflow that handles 80% of common use cases. It's not a one-size-fits-all solution, and it's not a mandatory standard. It's the path of least resistance—the easiest way to do something that also happens to be the right way.
Start with one golden path. Common starting points:
- Web service deployment: Code commit → automated build → staging deployment → production deployment with canary rollout
- Database provisioning: Request → approved database type → automated provisioning → credentials delivered via secrets manager
- Environment creation: Template selection → resource sizing → automated setup → developer access granted
The key characteristics of a golden path:
- It works out of the box: No additional configuration needed for the standard case
- It's well-documented: Clear instructions, troubleshooting guides, and escalation paths
- It's supported: Someone owns it, maintains it, and responds when it breaks
- It's optional: Developers can choose another path if they have good reason
Golden paths reduce cognitive load. Instead of making hundreds of micro-decisions about infrastructure configurations, developers make one decision: use the golden path. Everything else is handled.
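To make this concrete, here's a minimal sketch of how the web service deployment path and the four characteristics above could be written down as data. Everything named here (the GoldenPath class, the owner, the docs URL) is hypothetical and for illustration only; it isn't the API of any real platform tool.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class GoldenPath:
    """Hypothetical record describing one golden path."""
    name: str
    stages: Tuple[str, ...]   # ordered steps a change moves through
    owner: str                # "it's supported": a named owning team
    docs_url: str             # "it's well-documented"
    mandatory: bool = False   # "it's optional" by default

    def next_stage(self, current: str) -> Optional[str]:
        """Stage that follows `current`, or None at the end of the path."""
        position = self.stages.index(current)  # ValueError if the stage is unknown
        if position + 1 < len(self.stages):
            return self.stages[position + 1]
        return None


def missing_characteristics(path: GoldenPath) -> List[str]:
    """List which golden path characteristics this definition fails to meet."""
    problems = []
    if not path.owner:
        problems.append("no owner: nobody responds when it breaks")
    if not path.docs_url:
        problems.append("no documentation link")
    if path.mandatory:
        problems.append("mandatory: golden paths should stay optional")
    return problems


# Illustrative values only; the owner and URL are made up.
WEB_SERVICE = GoldenPath(
    name="web-service-deployment",
    stages=(
        "code commit",
        "automated build",
        "staging deployment",
        "production deployment (canary)",
    ),
    owner="platform-team",
    docs_url="https://docs.example.internal/golden-paths/web-service",
)

print(WEB_SERVICE.next_stage("automated build"))
print(missing_characteristics(WEB_SERVICE))
```

Writing the path down as data forces the awkward questions early: who owns it, where the docs live, and whether anyone is being made to use it.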
Step 3: Measure Developer Experience
You can't improve what you don't measure. Platform teams need metrics that reflect the developer experience, not infrastructure health.
The Four Golden Metrics for Platform Engineering:
Developer Experience (DX) Score: Regular surveys asking developers to rate platform tools on a scale of 1-10. Track trends over time and segment by team, tenure, and tool type.
Time to First Deployment: How long does it take a new developer to get their first code into production? This measures onboarding friction and documentation quality.
Deployment Frequency: How often do teams deploy? The best platforms make deploying safe and easy, which enables smaller, more frequent changes.
Infrastructure-Related Support Tickets: Track tickets opened for infrastructure help, broken down by category. A rising trend indicates friction in the platform. Teams with mature internal developer platforms (IDPs) report 40% fewer of these tickets.
Review these metrics monthly in a platform team retrospective. The goal isn't to hit arbitrary targets—it's to identify where developer friction is highest and prioritize accordingly.
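None of these four metrics needs special tooling to get started. Here's a rough sketch of how each could be computed from data most teams already have. The function names and input shapes are my assumptions; in practice your survey tool, deploy logs, and ticketing system will dictate the details.

```python
from collections import Counter
from datetime import date
from statistics import mean
from typing import Iterable, List


def dx_score(ratings: List[int]) -> float:
    """Average of 1-10 survey ratings; rejects out-of-range answers."""
    if any(not 1 <= rating <= 10 for rating in ratings):
        raise ValueError("ratings must be between 1 and 10")
    return mean(ratings)


def time_to_first_deployment(start: date, first_deploy: date) -> int:
    """Days between a developer's start date and first production deploy."""
    return (first_deploy - start).days


def deploys_per_week(deploy_dates: Iterable[date],
                     window_start: date, window_end: date) -> float:
    """Average weekly deploy count inside an inclusive date window."""
    window_days = (window_end - window_start).days + 1
    in_window = [d for d in deploy_dates if window_start <= d <= window_end]
    return len(in_window) / (window_days / 7)


def tickets_by_category(categories: Iterable[str]) -> Counter:
    """Count infrastructure support tickets per category."""
    return Counter(categories)


# Made-up sample data for illustration.
print(dx_score([7, 8, 6, 9]))
print(time_to_first_deployment(date(2025, 3, 3), date(2025, 3, 10)))
print(tickets_by_category(["access", "database", "access"]).most_common(1))
```

Run the same calculations every month and compare against the previous run. The trend matters more than any single number.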
Step 4: Treat Support as a Feature
Platform teams often underestimate the support burden. Developers will have questions. Things will break in unexpected ways. Edge cases will emerge.
How you handle this determines whether developers trust the platform.
Establish clear support SLAs and communicate them. Typical structure:
- Golden path issues: Respond within 2 hours, resolve within 24 hours
- Documentation requests: Respond within 24 hours, update docs within 1 week
- Feature requests: Review weekly, respond with roadmap alignment or workaround
- Custom infrastructure needs: Consult within 48 hours, provide alternatives or escalation path
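SLAs only build trust if someone checks them. Here's a minimal sketch of the structure above as a lookup table with a breach check. The category keys and the Sla class are hypothetical, and treating "review weekly" as a one-week response window is my interpretation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class Sla:
    respond_within: timedelta
    resolve_within: Optional[timedelta] = None  # None: no resolution target stated


# Hypothetical category keys mapped to the targets listed above.
SLAS = {
    "golden_path": Sla(timedelta(hours=2), timedelta(hours=24)),
    "documentation": Sla(timedelta(hours=24), timedelta(weeks=1)),
    "feature_request": Sla(timedelta(weeks=1)),  # assumption: "review weekly"
    "custom_infrastructure": Sla(timedelta(hours=48)),
}


def response_breached(category: str, opened_at: datetime,
                      first_response_at: Optional[datetime],
                      now: datetime) -> bool:
    """True if the first response came, or is still pending, past the deadline."""
    deadline = opened_at + SLAS[category].respond_within
    answered_at = first_response_at if first_response_at is not None else now
    return answered_at > deadline


opened = datetime(2025, 1, 6, 9, 0)
# Golden path ticket still unanswered three hours after it was opened.
print(response_breached("golden_path", opened, None, opened + timedelta(hours=3)))
```

A check like this can run on a schedule and post breaches to the platform team's channel, so a missed SLA is visible before the developer has to chase it.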
Most importantly, every support interaction is an opportunity to improve. If multiple developers ask the same question, your documentation has a gap. If a particular error keeps occurring, your tooling needs adjustment. Track support themes and feed them into your platform roadmap.
The Cost of Getting It Wrong
Let's talk numbers. Gartner projects public cloud spending will hit $723.4 billion in 2025. The Flexera State of the Cloud Report consistently finds that 27% of that spend is wasted—unchanged for three consecutive years. That's nearly $200 billion in waste across the industry.
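The waste figure follows directly from the two numbers cited. A small sketch of the multiplication, with hypothetical constant names of my own:

```python
CLOUD_SPEND_2025_BILLION = 723.4  # Gartner projection cited above
WASTE_RATE = 0.27                 # Flexera waste estimate cited above


def wasted_spend_billion(spend_billion: float, waste_rate: float) -> float:
    """Estimated wasted cloud spend, in billions of dollars."""
    return spend_billion * waste_rate


print(round(wasted_spend_billion(CLOUD_SPEND_2025_BILLION, WASTE_RATE), 1))  # 195.3
```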
But cloud waste isn't just an infrastructure problem. It's a productivity problem.
When developers spend hours provisioning resources, debugging infrastructure issues, or waiting for access approvals, they're not building features. They're not fixing bugs. They're not talking to customers. The opportunity cost dwarfs the infrastructure cost.
A typical senior engineer costs $150,000-200,000 annually. If that engineer spends 20% of their time on infrastructure tasks due to platform friction (a full day per week, above the 4-6 hour figure cited earlier), that's $30,000-40,000 in lost productivity per person per year. For a team of twenty engineers, you're looking at $600,000-800,000 in annual value that's not being delivered to customers.
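You can plug your own numbers into the same estimate. This is a back-of-the-envelope sketch, not a financial model: it treats annual cost as salary alone and assumes lost time maps linearly to lost value. The function name is hypothetical.

```python
def lost_productivity(annual_cost: int, friction_percent: int,
                      engineers: int = 1) -> int:
    """Annual dollars of engineering time absorbed by platform friction.

    Uses integer percentages to avoid floating-point rounding noise.
    """
    return annual_cost * friction_percent * engineers // 100


# Per engineer, at the low and high end of the cost range.
print(lost_productivity(150_000, 20), lost_productivity(200_000, 20))
# For a team of twenty.
print(lost_productivity(150_000, 20, engineers=20),
      lost_productivity(200_000, 20, engineers=20))
```

Swap in your own team size, loaded cost, and measured friction percentage. Even at 10% friction the team-level number is hard to ignore.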
And that's before you factor in the attrition cost. Developers who spend their days fighting tools instead of shipping code don't stick around. Teams using well-designed internal platforms report 30% higher retention rates. In a competitive hiring market, that's worth real money.
The Real Talk
Platform engineering isn't a technology problem. It's an organizational design problem. It requires platform teams to think like product managers, not infrastructure operators. It requires leadership to measure developer productivity, not just cloud costs. It requires patience to build the right thing instead of rushing to build the shiny thing.
The companies winning right now aren't the ones with the most sophisticated Kubernetes setups. They're the ones whose developers can go from idea to production in hours, not days. They're the ones where infrastructure is invisible—reliable, fast, and out of the way.
If your platform initiative is struggling, stop building and start listening. Your developers will tell you exactly what they need. The question is whether you're ready to hear it.
Run the user research this week. Pick one golden path to polish. Set up one developer experience metric. Small improvements compound into platforms people actually use.