Troubleshooting Kubernetes Leases

Updated  by Bryan.Seay@sysdig.com

Summary

Sysdig Agent 12.0.0 and later tries to use Kubernetes Leases to control how it pulls data from the API Server. If the agent cannot create leases, it falls back to the previous algorithm. This document explains how to fix problems that occur when the agent tries to create leases.

Symptom

Agent logs show the following error:

Error, lease_pool_manager[2989554]: Cannot access leases objects: leases.coordination.k8s.io is forbidden: User "system:serviceaccount:sysdig-agent:sysdig-agent" cannot list resource "leases" in API group "coordination.k8s.io" in the namespace "sysdig-agent"
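One way to confirm that an agent is hitting this error is to filter its logs for the `lease_pool_manager` component. The following is a minimal sketch, not an official Sysdig tool: the helper name `find_lease_errors` is hypothetical, and the usage line assumes the DaemonSet and namespace are both named `sysdig-agent`, as in the error above.

```shell
# Hypothetical helper: keep only log lines in which lease_pool_manager
# reports a "forbidden" error. Reads log text on stdin.
find_lease_errors() {
  grep 'lease_pool_manager' | grep 'forbidden'
}

# Usage (assumes the DaemonSet and namespace are both named sysdig-agent;
# kubectl picks a single pod of the DaemonSet to read logs from):
#   kubectl logs -n sysdig-agent daemonset/sysdig-agent | find_lease_errors
```

If the filter prints nothing for one pod, the same check can be repeated against other agent pods, since each agent logs independently.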

Resolution

Prerequisites

  • Sysdig Agent v12.0.0 or above (a subset of features has existed since v11.3.0)

  • Kubernetes v1.14 or above

Benefits of Using Leases

Using leases, the agent can efficiently control when and how it pulls data from the API Server. In small Kubernetes clusters (fewer than 50 nodes), this is a nice-to-have feature that gives easy insight into what the agent is doing. In large clusters (more than 200 nodes), using leases is strongly recommended to ensure that the agent does not overload the API Server.

Agent Privileges to Create and Update Leases

For most Kubernetes objects, the agent has `get, list and watch` privileges. For leases, it needs `get, list, watch, create, and update`, so that it can create and update the lease objects used to make distributed decisions.


If the agent isn’t given create and update permissions, lease setup fails right after the agent starts, and the agent falls back to the previous method of gathering Kubernetes data. This method has a larger impact on the API Server and is not recommended for Kubernetes clusters larger than 200 nodes, or for any cluster where the API Server(s) do not have significant CPU headroom.
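The privileges described above can be checked from outside the agent with `kubectl auth can-i`. The sketch below makes several assumptions: the service account and namespace are taken from the error message in the Symptom section, the function name `check_lease_verbs` is hypothetical, and the user running it must be allowed to impersonate service accounts (`--as`).

```shell
# Assumed values, taken from the error message in the Symptom section.
SA="system:serviceaccount:sysdig-agent:sysdig-agent"
NS="sysdig-agent"

# Hypothetical helper: print yes/no for each verb the agent needs on leases.
check_lease_verbs() {
  for verb in get list watch create update; do
    # "kubectl auth can-i" exits non-zero when the answer is "no",
    # so "|| true" keeps the loop going and still captures the answer.
    answer=$(kubectl auth can-i "$verb" leases.coordination.k8s.io \
      --as="$SA" -n "$NS" 2>/dev/null || true)
    printf '%s: %s\n' "$verb" "${answer:-unknown}"
  done
}

# Usage:
#   check_lease_verbs
```

Any verb reported as `no` points at a rule missing from the agent’s ClusterRole. A result of `unknown` means kubectl itself returned nothing, for example because the cluster was unreachable.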

Configuring Existing Agent Installation to Use Leases

Note: This section applies only to users who configured an agent before September 2021 and who aren’t using Helm charts to upgrade their agent version.


Existing users need to update their ClusterRole and DaemonSet to match the latest versions on the master branch of draios/sysdig-cloud-scripts:

  • sysdig-cloud-scripts/sysdig-agent-clusterrole.yaml

  • sysdig-cloud-scripts/sysdig-agent-daemonset-v2.yaml

Step 1: Add lease permissions to clusterrole 

The following patch adds the ability to read and write leases to the agent’s ClusterRole. It assumes the ClusterRole is named `sysdig-agent`. ClusterRoles are cluster-scoped, so no namespace flag is needed.

$ kubectl patch clusterrole sysdig-agent --type json --patch='[{"op": "add", "path": "/rules/-", "value": {"apiGroups": ["coordination.k8s.io"], "resources": ["leases"], "verbs": ["get", "list", "create", "update", "watch"]}}]'


Alternatively, edit the ClusterRole and add the following:

rules:
- apiGroups:
  - coordination.k8s.io
  resources:
  - leases
  verbs:
  - get
  - list
  - create
  - update
  - watch
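A JSON patch with `"op": "add"` on `/rules/-` appends a new rule every time it runs, so applying the patch twice leaves a duplicate rule in the ClusterRole. The sketch below guards against that. It assumes the ClusterRole is named `sysdig-agent`; the function name `ensure_lease_rule` is hypothetical. It only checks whether the role mentions the `coordination.k8s.io` API group and does not verify which verbs are granted.

```shell
# Hypothetical guard: apply the lease rule only if the ClusterRole does not
# already reference the coordination.k8s.io API group.
ensure_lease_rule() {
  role="${1:-sysdig-agent}"   # assumed ClusterRole name
  if kubectl get clusterrole "$role" -o yaml | grep -q 'coordination.k8s.io'; then
    echo "skipped: $role already references coordination.k8s.io"
  else
    kubectl patch clusterrole "$role" --type json --patch='[{"op": "add", "path": "/rules/-", "value": {"apiGroups": ["coordination.k8s.io"], "resources": ["leases"], "verbs": ["get", "list", "create", "update", "watch"]}}]'
  fi
}

# Usage:
#   ensure_lease_rule              # uses the assumed name sysdig-agent
#   ensure_lease_rule my-role-name # or pass a different ClusterRole name
```

If the role already references the API group but lacks `create` or `update`, edit the existing rule by hand rather than adding a second one.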

Step 2: Add DownwardAPI to daemonset

The following will pass the agent’s pod name and namespace down to the agent so that the agent knows this information before ever contacting the API Server.


Edit the DaemonSet and add the `podinfo` volume and its volume mount shown here, keeping any existing `volumes` and `volumeMounts` entries:

spec:
  template:
    spec:
      volumes:
      - name: podinfo
        downwardAPI:
          defaultMode: 420
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.name
            path: name
      containers:
      - name: sysdig-agent
        volumeMounts:
        - mountPath: /etc/podinfo
          name: podinfo
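After the DaemonSet rolls out, the Downward API files should be readable inside the agent pods at `/etc/podinfo/name` and `/etc/podinfo/namespace`. The following sketch checks one pod. It assumes the agent runs in the `sysdig-agent` namespace, that the first pod listed there is an agent pod, and that `cat` is available in the container; the function name `verify_podinfo` is hypothetical.

```shell
NS="sysdig-agent"   # assumed namespace

# Hypothetical helper: print the pod name and namespace that the
# Downward API volume exposes inside the first pod in $NS.
verify_podinfo() {
  pod=$(kubectl get pods -n "$NS" -o name | head -n 1)
  [ -n "$pod" ] || { echo "no pods found in $NS" >&2; return 1; }
  for f in name namespace; do
    # Downward API files have no trailing newline, so print each on its own line.
    printf '%s: %s\n' "$f" "$(kubectl exec -n "$NS" "$pod" -- cat "/etc/podinfo/$f")"
  done
}

# Usage:
#   verify_podinfo
#   kubectl get leases -n "$NS"   # lists any lease objects in the namespace
```

If both values print correctly and lease objects appear in the namespace, the ClusterRole and DaemonSet changes are likely in effect.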


Known Issues

Cold-start leases intentionally spread out the load on the API Server. As a result, each agent takes longer to build its cache, which can lead to missing metadata when an agent pod or process is restarted.
