After being involved in the Kubernetes project for about a year, I have had some reflections about the configuration and infrastructure space. I have talked to a few Platform engineers and gained some interesting insights into how their organizations operate at scale. My observation is that most organizations have a tough time measuring or funding their Platform teams. Many Platform teams are not sure how to evaluate their own effectiveness. Combine this with the fact that Platform investment is generally disconnected from business KPIs, and the result is underfunded Platform teams.
Most Platform projects are evaluated based on activity instead of impact. In conversations about resource allocation, the head of Platform Engineering can usually offer only a backlog of outstanding projects, without a clear explanation of why delivering those projects faster will improve the company's bottom line. The usual narratives are:
1. Deploying a new type of infrastructure
2. Helping launch a new application
3. Taking on a set of applications which were acquired through M&A
The connection between those activities and company goals usually stops at uptime and security. Both are important, but both fall in the realm of “power grid” metrics: there is nothing to report until disaster strikes.
Given finite resources, and the clearer connection between line of business (LOB) applications and revenue, Platform investments tend to get pushed down the stack. I have talked to some Platform teams where half a dozen engineers support a software development organization of several thousand people.
Platform teams are frequently tasked with creating DevOps infrastructure and often scramble to keep up with demand. Specialized tooling such as Terraform has a low “return on learning” for LOB application engineers, so they try to push that work off their plate.
One answer teams come up with is using general-purpose programming stacks for infrastructure projects, but that also leads to some hard problems. End-to-end testing of a module that instantiates a load balancer, for instance, requires a non-trivial test setup.
How do you measure your Platform team’s success, and if you are on one, how do you know that you are making an impact? We all know that number of bugs, lines of code, etc. are not the right metrics. Being busy does not equal being important. DM @mikebz on Twitter or leave a comment here.