
Kubernetes, GitOps and owning your data

[Image: Computer and Database]

This year I have been working on the ConfigSync project at Google.  The concept of syncing a git repository into a Kubernetes cluster has been around for a long time; git-sync v2.0, for instance, was published in 2016.  What is the draw of using git as the source of truth when it is often just a mirror of what's in the cluster?  There are some obvious reasons, like sharing common code and preserving versions.  Today I'd like to focus on the data ownership aspect of the Git and Kubernetes combination, which is one of the most powerful features of Kubernetes and one that platform designers should keep in mind in the future.

The first key thing about Kubernetes is the Kubernetes Resource Model (KRM), which is described here.  While under the hood JSON or YAML gets sent over the wire to the cluster, the API style is very different from most REST APIs, which expect you to make a sequence of calls.  The focus of KRM is the final desired state, which is sent to or retrieved from the cluster.
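
To make this concrete, here is a minimal Deployment manifest (the names are hypothetical, chosen for illustration).  Rather than issuing a sequence of imperative calls, you hand the cluster the complete desired state:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: hello-web              # hypothetical name for illustration
    spec:
      replicas: 3                  # desired state: three copies should exist
      selector:
        matchLabels:
          app: hello-web
      template:
        metadata:
          labels:
            app: hello-web
        spec:
          containers:
          - name: web
            image: nginx:1.25      # the cluster converges toward this spec

The cluster's controllers continuously reconcile the actual state toward this spec: if a pod dies, a replacement appears without any further calls from you.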

There are multiple competing and complementary Kubernetes implementations, but all of them tend to do a decent job of instantiating the application configuration state defined by KRM.  Once you make the switch to thinking about your application configuration as data, it makes sense to use a version control system to track changes, record authors, and run pre-submit hooks and triggers.  Most importantly, keeping the configuration data's version control outside of the system that operates on it gives users the ability to implement multi-cloud scenarios, where the same(ish) data is used with systems like GKE and AKS.
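
As a sketch of what that looks like with Config Sync, a RootSync object points the cluster at a git repository and directory.  The repository URL and directory below are made up, and the field names follow my reading of the configsync.gke.io/v1beta1 API, so check the current docs:

    apiVersion: configsync.gke.io/v1beta1
    kind: RootSync
    metadata:
      name: root-sync
      namespace: config-management-system
    spec:
      sourceFormat: unstructured   # a plain directory of KRM manifests
      git:
        repo: https://github.com/example/config-repo   # hypothetical repo
        branch: main
        dir: clusters/prod         # the same repo can hold other clusters' dirs
        auth: none                 # public repo; real setups reference a secret

Point a second cluster at the same directory and both converge on the same configuration, which is exactly the multi-cloud story above.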

How does this relate to platform design more broadly?  There are two key system features that contribute to platform adoption:

  1. If the platform is designed to store and retrieve desired state in a standard and extensible way, it gives users a way to version, edit and verify that data outside of the system.  This can aid in compliance with data-retention laws.  The CRD sketch after this list shows what that extensibility looks like in Kubernetes.
  2. When there are multiple systems that can work on the data, all of those vendors can benefit from the expansion of the market for systems that operate on such data.  Kubernetes is one such example; HTML is another.
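
In Kubernetes the extensibility comes from CustomResourceDefinitions: you can teach the same store-and-retrieve machinery a new domain type.  A minimal sketch follows; the Widget type and its schema are invented for illustration:

    apiVersion: apiextensions.k8s.io/v1
    kind: CustomResourceDefinition
    metadata:
      name: widgets.example.com    # hypothetical group and kind
    spec:
      group: example.com
      scope: Namespaced
      names:
        plural: widgets
        singular: widget
        kind: Widget
      versions:
      - name: v1
        served: true
        storage: true
        schema:
          openAPIV3Schema:
            type: object
            properties:
              spec:
                type: object
                properties:
                  size:
                    type: integer

Once this is applied, Widget objects can be versioned, diffed and synced from git exactly like the built-in types.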

Imagine if your social media data were as easily exportable as a KRM object.  You could migrate your profile from one application for browsing social media to the next, perhaps prioritizing different privacy or usability features.  The same ideas can work in CRM systems.
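
A KRM-style profile might look like the sketch below; the API group, kind and fields are entirely invented to show the shape of the idea:

    apiVersion: social.example.com/v1   # hypothetical API group
    kind: Profile
    metadata:
      name: jane-doe
    spec:
      displayName: Jane Doe
      visibility: friends-only          # desired state, portable between apps
      follows:
      - alice
      - bob

Any client that understands the schema could render this profile, and moving to a competing app would be an export and an import rather than a rebuild.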

The resource model doesn’t have to be the same shape, but the principles of KRM can be applied to many domains, giving customers more control and more choice.  While it might be a tough sell when you are proving to your VCs what your moat is, you could be building the architecture that becomes the standard and expands the market as a whole.

