The Kubernetes Ownership Model
So you have decided to adopt Kubernetes in your organization? Congratulations! You may have made an awesome decision or a terrible one, I don’t know, and this blog will not help you answer that. Instead, the focus of this blog is on the initial challenges an organization of any size faces when adopting Kubernetes, and how you can avoid some of the pitfalls by planning carefully from the start. Let’s get started!
What is the Ownership Model?
The Kubernetes Ownership Model describes the responsibilities and accountabilities of the people or teams managing different aspects of a Kubernetes implementation. It ensures that people don’t step on each other’s toes and that governance becomes simpler. Let me present the model that has worked for me.

The model not only describes the owning teams but also the tools that enable the ownership. These tools encourage Infrastructure as Code principles like GitOps, so that you get auditable and repeatable systems. They are by no means the only tools that can be used for the job; there are many others. I provide here an opinionated list that has worked for me.
Another thing to remember is that the owning teams are representative roles. These roles (like IAM roles) can be assumed by the same person. This is common in startups where a person dons multiple hats to serve various needs.
Why do we need the Ownership Model?
When taking the first steps towards adopting Kubernetes, you will quickly realize that before deploying your own containerized application, you must install a myriad of tools that enable the different capabilities your application needs: Layer 7 routing with Nginx Ingress, DNS management with External DNS, TLS certificates with Cert Manager and many more. Before long you will face questions like “Who is responsible for these applications?”, “Where should we deploy them?”, “How do we track various application deployments?” and so on.
Even before this comes the question of Kubernetes itself. “Who is responsible for maintaining the Kubernetes clusters (EKS, AKS, GKE, KOPS etc)?”, “Can anyone request/create a new Kubernetes cluster?” and more.
All these questions and more can be addressed by implementing the Ownership Model.
The Core Infrastructure Layer
The first layer in the model covers the Kubernetes cluster itself and its worker nodes. Irrespective of whether you create your own cluster with Kops or opt for a managed offering from a cloud provider like EKS or AKS, there will be some management needed. You are still responsible for upgrading the control plane, updating the worker node AMIs and so on. The people responsible for this need elevated rights that should not be handed to regular DevOps teams.
When setting up this layer at my company (we chose EKS and AKS), I found Terraform to be the most straightforward tool for the job. It is a mature offering from HashiCorp with a strong community. Especially for EKS, which requires some post-installation steps such as setting up the aws-auth ConfigMap, I found the Terraform EKS Module extremely useful. In fact, you should be using Terraform to manage all of your cloud infrastructure.
Below is an example of a simple EKS cluster and a node group with the EKS module.
module "my_dev_eks" {
source = "terraform-aws-modules/eks/aws"
version = "12.2.0"
cluster_name = "my-dev-eks"
cluster_version = "1.18"
cluster_endpoint_private_access = true
cluster_endpoint_public_access = true
cluster_endpoint_public_access_cidrs = [
"0.0.0.0/0"
]
subnets = flatten([module.my_dev_eks_vpc.public_subnets, module.my_dev_eks_vpc.private_subnets])
vpc_id = module.my_dev_eks_vpc.vpc_id
enable_irsa = true
cluster_log_retention_in_days = 7
cluster_enabled_log_types = ["api", "scheduler", "audit"]
tags = var.tags
node_groups_defaults = {
ami_type = "AL2_x86_64"
disk_size = 20
}
# to manage aws-auth config map.
map_users = [
{
userarn = "arn:aws:iam::xxxxx:user/user@domain.com"
username = "username"
groups = ["system:masters"]
}
] node_groups = {
node_a = {
instance_type = var.nodegroup_kafka_brokers
subnets = [module.my_dev_eks_vpc.private_subnets[0]]
ami_release_version = "1.18.8-20201007"
version = "1.18"
desired_capacity = 1
max_capacity = 3
min_capacity = 1
key_name = "my ssh key"
}
}
}
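For context, the map_users entry above is what the module writes into the aws-auth ConfigMap in the kube-system namespace. Here is a minimal sketch of what the rendered ConfigMap roughly looks like; the exact contents, including the node role mappings, are generated by the module and the node group role ARN below is hypothetical.

apiVersion: v1
kind: ConfigMap
metadata:
  name: aws-auth
  namespace: kube-system
data:
  mapRoles: |
    - rolearn: arn:aws:iam::xxxxx:role/my-dev-eks-node-group-role   # hypothetical node group role
      username: system:node:{{EC2PrivateDNSName}}
      groups:
        - system:bootstrappers
        - system:nodes
  mapUsers: |
    - userarn: arn:aws:iam::xxxxx:user/user@domain.com
      username: username
      groups:
        - system:masters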
Just make sure that only specific people have write access to the Git repository holding the Terraform files, and you are good to go.
The Infrastructure Applications Layer
The second layer in the model covers the applications powering the infrastructure needed by your actual business applications. The promise of Kubernetes lies not just in its scheduling and container orchestration, but also in the infrastructure automation enabled by the excellent applications built by the Kubernetes community, most of which have now been adopted by the CNCF.
I remember the time when requesting a firewall change took over 14 business days. Today it is a commit in a Git repository and then some applications doing their magic in a matter of seconds. What a time to be an Ops person!
This layer can be owned by the same team that owns the first layer: the system admins. But if you want a separation, this set of applications can be handled by a separate Infrastructure Admin team. The right answer here is whatever fits your existing organization structure.
So what are the applications in this layer? This layer deals with infrastructure concerns like DNS, TLS certificates, TCP/HTTP routing, continuous deployment etc. and aims to automate them as much as possible. Kubernetes controller applications like External DNS, Cert Manager, Ingress Controllers, ArgoCD etc. make this happen by building on Kubernetes API extensions (Custom Resource Definitions, or CRDs, and their controllers). Adopt these applications in your architecture and make your life easy!
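To make that concrete, here is a minimal sketch of how these controllers cooperate: a single Ingress resource annotated so that Cert Manager issues the TLS certificate and External DNS creates the DNS record. The hostname, secret name and issuer name are made up for illustration.

apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: my-app
  annotations:
    kubernetes.io/ingress.class: nginx                          # routed by the Nginx Ingress controller
    cert-manager.io/cluster-issuer: letsencrypt-prod            # hypothetical ClusterIssuer managed by Cert Manager
    external-dns.alpha.kubernetes.io/hostname: app.example.com  # External DNS creates this DNS record
spec:
  tls:
    - hosts:
        - app.example.com
      secretName: my-app-tls   # Cert Manager stores the issued certificate in this Secret
  rules:
    - host: app.example.com
      http:
        paths:
          - path: /
            backend:
              serviceName: my-app
              servicePort: 80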
The next question is how to adopt these tools. Almost all infrastructure applications have one thing in common: Helm charts. All of them offer sophisticated Helm charts that allow you to configure every aspect of the application. I have found this to be the fastest way to test out an application and even deploy it to production. It also helps with multiple environments (you do have separate Kubernetes clusters for PROD and TEST, right?); you don’t want to find yourself upgrading Nginx Ingress to a new version directly in PROD. One misconfiguration and suddenly no traffic reaches your application. Ouch!
So we have decided to adopt these applications using their Helm charts. Great! How do we apply IaC principles to these Helm charts? Are you deploying Helm charts with configurations set via --set ...? That is a classic example of imperative commands. Where are you tracking them? Can you say with certainty which configuration is currently applied to the application running in PROD? I found the answers to these questions in Helmfile. If Helm is the package manager for Kubernetes resources, then Helmfile is the package manager for Helm itself. Crazy, right? Helmfile lets you define your Helm deployments across multiple environments declaratively. Here is an example of a helmfile that deploys the ingress-nginx and external-dns Helm charts into two different Kubernetes clusters declaratively.
environments:
  my-dev-eks:
  my-prod-eks:

repositories:
  - name: ingress-nginx
    url: https://kubernetes.github.io/ingress-nginx
  - name: bitnami
    url: https://charts.bitnami.com/bitnami

helmDefaults:
{{ if eq .Environment.Name "my-dev-eks" }}
  kubeContext: arn:aws:eks:eu-central-1:xxxx:cluster/my-dev-eks
{{ end }}
{{ if eq .Environment.Name "my-prod-eks" }}
  kubeContext: arn:aws:eks:eu-central-1:xxxx:cluster/my-prod-eks
{{ end }}

releases:
  - name: nginx-ingress
    namespace: nginx
    createNamespace: true
    chart: ingress-nginx/ingress-nginx
    labels:
      app: nginx-ingress
    version: 3.3.0
    values:
      - apps/nginx-ingress/values-{{ .Environment.Name }}.yaml
    secrets:
      - apps/nginx-ingress/secrets-{{ .Environment.Name }}.yaml

  - name: external-dns
    namespace: external-dns
    createNamespace: true
    chart: bitnami/external-dns
    labels:
      app: external-dns
    version: 3.2.3
    values:
      - apps/external-dns/values-{{ .Environment.Name }}.yaml
    secrets:
      - apps/external-dns/secrets-{{ .Environment.Name }}.yaml
Once defined, deploying Nginx Ingress into the my-dev-eks cluster is as simple as:

helmfile -e my-dev-eks -l app=nginx-ingress apply

Here -e selects the environment and -l selects the release via its labels.
Simple, right? Of course, instead of running this command manually, you can set up a pipeline that runs it automatically whenever the relevant configuration changes, but that is beyond the scope of this blog.
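The helmfile above references per-environment values files such as apps/nginx-ingress/values-my-dev-eks.yaml. As a rough sketch of what such a file could contain (the keys are standard ingress-nginx chart values; the concrete numbers are only illustrative):

# apps/nginx-ingress/values-my-dev-eks.yaml
controller:
  replicaCount: 1              # keep the dev footprint small; PROD would run more replicas
  service:
    type: LoadBalancer
    annotations:
      service.beta.kubernetes.io/aws-load-balancer-type: nlb   # provision an NLB on AWS
  metrics:
    enabled: true              # expose Prometheus metrics for experimentation in dev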
The Business Applications Layer
Finally, we come to the last layer of the model, the layer that was ultimately the reason for adopting Kubernetes in the first place: the Business Applications layer. These are the applications that provide value to your customers and make money for your business. This is the end game. It is easy to forget the real goal and lose yourself in Kubernetes!
These are the applications you are most familiar with. You have probably already containerized them, or are on your way to doing so. You may also have decided how you will define your Kubernetes resources: Kustomize, Helm charts, Jsonnet etc. What is the best way to deploy them to different Kubernetes clusters?
With GitOps in mind, at my company I decided to go with ArgoCD. FluxCD was a close second, but ArgoCD ultimately won thanks to features like multi-tenancy, multi-cluster support, integration with Azure AD/OIDC and an impressive UI, all of which were important for me. After over 6 months of use, I am super happy with the choice. Using Helm charts for Kubernetes resources and ArgoCD for continuous deployment, a developer’s job reduces to simply committing the new version to the Git repository. Put this logic in your existing CI pipeline and you have an end-to-end CI/CD pipeline that can deploy your new commit to the Kubernetes cluster automatically.
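As a rough illustration of that last step, assuming GitLab CI and a values file that holds the image tag (the repository URL, file path and job name below are hypothetical), the “commit the new version” part of the pipeline could look something like this:

# .gitlab-ci.yml (sketch): after building and pushing the image, bump the tag that ArgoCD watches
bump-image-tag:
  stage: deploy
  image: alpine:3.12
  script:
    - apk add --no-cache git openssh-client
    - git config --global user.email "ci@example.com" && git config --global user.name "CI"
    - git clone git@gitlab.com:my-awesome-app-resources-repo.git resources   # hypothetical resources repo; needs a deploy key or token
    - cd resources
    - sed -i "s/^  tag:.*/  tag: $CI_COMMIT_SHORT_SHA/" values/values.yaml   # assumes the chart reads the image tag from this file
    - git commit -am "Deploy $CI_COMMIT_SHORT_SHA"
    - git push origin master   # ArgoCD notices the change and syncs the cluster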
Here is an example of an ArgoCD Application CRD spec that reads the Kubernetes resources defined in the Git repository under source and applies them to the Kubernetes cluster defined in destination:
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-awesome-app
  finalizers:
    - resources-finalizer.argocd.argoproj.io
spec:
  destination:
    namespace: prod
    server: https://kubernetes.default.svc
  source:
    path: my-awesome-helm-charts
    repoURL: git@gitlab.com/my-awesome-app-resources-repo
    targetRevision: master
    helm:
      valueFiles:
        - values/values.yaml
        - values/secrets.yaml
  project: my-apps
  syncPolicy:
    automated: {}
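The project: my-apps field above refers to an ArgoCD AppProject, which is where the multi-tenancy features come in. Here is a minimal sketch of such a project, restricting which repositories and destinations its applications may use; the repository pattern and description are illustrative.

apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: my-apps
  namespace: argocd
spec:
  description: Business applications
  sourceRepos:
    - git@gitlab.com/my-awesome-*          # only repositories matching this pattern may be deployed
  destinations:
    - namespace: prod
      server: https://kubernetes.default.svc
  clusterResourceWhitelist: []             # business applications may not create cluster-scoped resources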
You may be wondering why I did not go with Helmfile for Business applications even though I was using Helm Charts. The answer to that lies in Deployment Cadence.
Deployment Cadence
Deployment cadence defines how frequently new versions of the software are released and deployed. In the Kubernetes Ownership Model, the deployment cadence is slowest in the first layer and fastest at the final business applications layer.
In the first layer, new versions of EKS and AKS are usually released about a month after new upstream Kubernetes versions. EKS worker node AMI versions are released more frequently; we usually chose to roll them out once a month or even less often. For such a low cadence, Terraform fits the bill. It doesn’t require anything running live in the cluster or anywhere else; it is latent code invoked by a pipeline when requested, with all Terraform state maintained in separate S3 buckets.
In the second layer, the infrastructure applications have far more frequent releases. Depending on the criticality and the usefulness of a release, you may choose to deploy them weekly or so. Helmfile also does not need any active component running all the time; it is another latent piece of code, executed only when needed. Perfectly suited for this cadence.
Finally, in the business applications layer, deployment cadence is at its highest. New features are being added all the time and deployment happens every other day, if not every day (that’s the dream). At this cadence, you need maximum control and agility. ArgoCD has its own application controller running within the cluster, polling the Git repository every few minutes for changes to apply to the cluster. As good as Helmfile is, it just won’t cut it for the cadence requirements of this layer. It is also interesting to note that ArgoCD does not actually deploy Helm releases; it does the equivalent of helm template | kubectl apply -f -, effectively converting the Helm chart into a collection of raw Kubernetes resources applied directly to the cluster. I prefer this to Helm releases for my business applications.
Conclusion
In this blog, I wanted to highlight how the Kubernetes adoption journey involves so much more than just deploying your containerized business applications. You can get started very quickly, but to run anything at production scale without compromising on governance, security and reliability, you need to apply Infrastructure as Code principles across your infrastructure stack. I hope the Kubernetes Ownership Model presented here helps you get started on this journey safely!