Kubernetes Persistent Volumes and Claims-explained in detail

Introduction:
Understanding Kubernetes Persistent Volumes (PV) and Persistent Volume Claims (PVC) is crucial for managing data in your cluster efficiently. PVs act as storage resources, while PVCs provide a way for users to request specific storage resources. Together, they form a robust system for handling data persistence in containerized environments.

Managing storage in Kubernetes presents challenges due to the dynamic and ephemeral nature of containerized environments. Ensuring data persistence, scalability, and efficient resource utilization are critical concerns. Navigating these challenges requires a solid understanding of Kubernetes storage concepts.

First, we will learn what is a persistent Volume in Kuberenetes and how to create it and next, we will learn what is a Persistent volume claim and how it is bound to persistent volumes lastly we will learn how to integrate all these in our application pod.

Prerequisites.

Must have a basic knowledge of Kubernetes and its components.
You can visit our previous articles
An overview of the kubernetes cluster
Understanding-the-core-building-blocks-of-kubernetes.

What is the difference between an Ephemeral volume and a Persistent Volume?

In Kubernetes, both volumes and persistent volumes (PVs) are used to manage and provide storage to pods, but they serve different purposes and have distinct characteristics:

Ephemeral Volume:
- Scope: Associated with a particular pod.
- Lifetime: Tied to the pod’s lifecycle. When the pod is deleted, the volume is also deleted.
- Usage: Used to share data between containers in the same pod.
- Definition: Defined within the pod specification, under the volumes field.
- Management: Managed by the pod that uses it.
Persistent Volume (PV):
- Scope: Exists independently of any pod.
- Lifetime: Persists beyond the lifecycle of any pod using it.
- Usage: Shared across multiple pods. Can be claimed by PersistentVolumeClaims (PVCs).
- Definition: Defined as a cluster-level resource. Created separately from pods and can be dynamically provisioned or pre-existing.
- Management: Managed by the cluster administrator. Bound to PVCs, which are then used by pods.

Ephemeral volumes are scoped into individual pods and are suitable for sharing data between containers within the same pod. Persistent volumes, on the other hand, are cluster-level resources that provide shared storage to multiple pods, and their lifecycle is independent of any specific pod. Persistent volumes are particularly useful for scenarios where data needs to persist beyond the lifecycle of individual pods.

Table of Contents

Persistent Volumes

PVs decouple storage provisioning from the application lifecycle, providing a persistent storage solution that survives pod restarts and rescheduling.
A PersistentVolume (PV) is a piece of storage in the cluster that has been provisioned by an administrator or dynamically provisioned using Storage Classes. It is a resource in the cluster just like a node is a cluster resource. PVs are volume plugins like Volumes but have a lifecycle independent of any individual Pod that uses the PV.

There are two ways PVs may be provisioned: statically or dynamically.

Static

A cluster administrator creates a number of PVs. They carry the details of the real storage, which is available for use by cluster users. They exist in the Kubernetes API and are available for consumption.

Dynamic

When none of the static PVs the administrator created to match a user’s PersistentVolumeClaim, the cluster may try to dynamically provision a volume, especially for the PVC. This provisioning is based on StorageClasses.

Types of Storage Supported by PVs:

Local Storage: Directly attached to the node, suitable for applications requiring high-performance access.
Networked Storage: Utilizes network protocols (NFS, iSCSI) for shared storage accessible across nodes.
Cloud Provider Storage: Leverages cloud-specific storage solutions like AWS EBS or Azure Disk are supported on older Kubernetes versions but are deprecated now.

Types of Persistent Volumes.

Persistent volume types are implemented as plugins. Kubernetes currently supports the following plugins:

csi – Container Storage Interface (CSI)
fc – Fibre Channel (FC) storage
hostPath – HostPath volume (for single node testing only; WILL NOT WORK in a multi-node cluster; consider using local volume instead)
iscsi – iSCSI (SCSI over IP) storage
local – local storage devices mounted on nodes.
nfs – Network File System (NFS) storage.

Hostpath volume.

apiVersion: v1
kind: Pod
metadata:
  name: test-pd
spec:
  containers:
  - image: registry.k8s.io/test-webserver
    name: test-container
    volumeMounts:
    - mountPath: /test-pd
      name: test-volume
  volumes:
  - name: test-volume
    hostPath:
      # directory location on host
      path: /data
      # this field is optional
      type: Directory

This example assumes that you want to use local storage on the host machine, Here the container directory /test-pd is mapped to the directory on the node /data.

This is similar to our docker volumes which typically store data in the /var/lib/docker/volumes directory but relying on hostPath for in kubernetes has limitations, especially in scenarios where pods can be rescheduled to different nodes. The use of hostPath ties the storage directly to a specific node’s local filesystem, and if a pod gets rescheduled to a different node, it might not have access to the same path, leading to potential data loss or inconsistency.

Hence we can use other storage options like an NFS or CSI.

NFS

This is a sample YAML manifest that uses NFS as a storage option.

apiVersion: v1
kind: PersistentVolume
metadata:
  name: my-pvc
spec:
  capacity:
    storage: 5Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Recycle
  storageClassName: slow
  mountOptions:
    - hard
    - nfsvers=4.1
  nfs:
    path: /tmp
    server: 172.17.0.2

The server represents the NFS server IP address and also the path to map in the NFS server.

Components in Persistent volumes

Capacity

Generally, a PV will have a specific storage capacity. This is set using the PV’s capacity attribute. Currently, storage size is the only resource that can be set or requested. Future attributes may include IOPS, throughput, etc.

Volume Mode

Kubernetes supports two volume modes: Filesystem (default) and Block. Filesystem mounts volumes as directories, creating a filesystem if the volume is backed by an empty block device. Setting volumeMode to Block presents the volume as a raw block device, offering fast access without a filesystem layer. Applications in Pods must handle raw block devices in Block mode. See Raw Block Volume Support for an example.

Access Modes

A persistent volume can be mounted on a host in any way supported by the resource provider. As shown in the table below, providers will have different capabilities and each PV’s access modes are set to the specific modes supported by that particular volume. For example, NFS can support multiple read/write clients, but a specific NFS PV might be exported on the server as read-only. Each PV gets its own set of access modes describing that specific PV’s capabilities.

The access modes are:

ReadWriteOnce the volume can be mounted as read-write by a single node.

ReadWriteOnce access mode still can allow multiple pods to access the volume when the pods are running on the same node.

ReadOnlyMany the volume can be mounted as read-only by many nodes.ReadWriteManythe volume can be mounted as read-write by many nodes.

In the CLI, the access modes are abbreviated to:

RWO – ReadWriteOnce
ROX – ReadOnlyMany
RWX – ReadWriteMany
RWOP – ReadWriteOncePod

class

A persistent volume (PV) in Kubernetes can be associated with a class using the storageClassName attribute. A PV of a specific class can only be bound to PersistentVolumeClaims (PVCs) that request that class. If a PV has no storageClassName, it has no class and can only be bound to PVCs that request no specific class.

Previously, the volume.beta.kubernetes.io/storage-class annotation was used instead of storageClass Name. Although the annotation still works, it will be fully deprecated in a future Kubernetes release.

Reclaim Policy

Current reclaim policies are:

Retain — manual reclamation
Recycle — basic scrub (rm -rf /thevolume/*)
Delete — associated storage asset such as AWS EBS or GCE PD volume is deleted
Currently, only NFS and HostPath support recycle

Mount Options

A Kubernetes administrator can specify additional mount options for when a Persistent Volume is mounted on a node.

Note: Not all Persistent Volume types support mount options.

The following volume types support mount options:

azureFile
cephfs (deprecated in v1.28)
cinder (deprecated in v1.18)
gcePersistentDisk (deprecated in v1.28)
iscsi
nfs
rbd (deprecated in v1.28)
vsphereVolume

Mount options are not validated. If a mount option is invalid, the mount fails.

In the past, the annotation volume.beta.kubernetes.io/mount-options was used instead of the mountOptions attribute. This annotation is still working; however, it will become fully deprecated in a future Kubernetes release.

Phase

A PersistentVolume will be in one of the following phases once it has deployed you can check the status using kubectl get pv command

Available a free resource that is not yet bound to a claim.
Bound the volume is bound to a claim.
Released the claim has been deleted, but the associated storage resource is not yet reclaimed by the cluster.
Failed the volume has failed its (automated) reclamation.

PersistentVolumeClaims (PVC)

Now we have learned about pv next let’s look into an example of PVC manifest in Kubernetes
and different components in it.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: myclaim
spec:
  accessModes:
    - ReadWriteOnce
  volumeMode: Filesystem
  resources:
    requests:
      storage: 8Gi
  selector:
    matchLabels:
      release: "stable"
    matchExpressions:
      - {key: environment, operator: In, values: [dev]}

PersistentVolumeClaims (PVC) components

A PersistentVolumeClaim (PVC) is like a request form for storage in a Kubernetes cluster. Here’s a breakdown of key concepts:

Access Modes:
- Describes how the storage can be accessed. For example, ReadWriteOnce means it can be accessed by a single node.
Volume Modes:
- Determines how the storage is used — either as a filesystem or a block device.
Resources:
- Specifies the amount of storage needed. Similar to ordering a specific quantity of a resource for your application.
Selector:
- Acts like a filter for available storage. Only volumes with matching labels can be used by the PVC.
Class:
- Specifies the type or class of storage. A PVC can request a specific class of storage, and only PVs of that class can be used.
  If not specified, PVCs without a requested class may use a default class set by the administrator. It helps in the dynamic provisioning of disks in clouds
- New PVCs without a specified class wait for a default to be available. When it arrives, existing PVCs may get updated to match the new default.

Binding in Kubernetes Storage.

When a user creates a PersistentVolumeClaim (PVC) with specific storage requirements, a control loop in the control plane actively seeks a matching PersistentVolume (PV) and binds them together.
In the case of dynamic provisioning, the loop automatically binds a newly provisioned PV to the PVC. Even if the user requests less storage, they receive at least what was asked for. Once bound, the PVC-to-PV binding is exclusive, maintaining a one-to-one mapping using a bi-directional ClaimRef.
Unmatched claims persist until a suitable volume becomes available, at which point they bind. This process ensures efficient and exclusive mapping of storage resources in the Kubernetes cluster.

Claims in pods

Pods access storage by using the claim as a volume. Persistent volume Claims must exist in the same namespace as the Pod using the claim.
The cluster finds the claim in the Pod’s namespace and uses it to get the PersistentVolume backing the claim. The volume is then mounted to the host and into the Pod.

apiVersion: v1
kind: Pod
metadata:
  name: mypod
spec:
  containers:
    - name: myfrontend
      image: nginx
      volumeMounts:
      - mountPath: "/var/www/html"
        name: mypd
  volumes:
    - name: mypd
      persistentVolumeClaim:
        claimName: myclaim

This example above uses the PVC my-claim and maps the container path /var/www/html to the volume. The PVC then bounds to the respective PV and binds them together.

But the problem is that we need to create persistent volumes and volume claims each time the pods request a volume it can be a difficult task because there might be 100 pods running.
in the case of cloud providers like AWS EBS, the administrator needs to provision 100 EBS volumes based on the size request from the pod
So what if there is a way to dynamically create Volumes on demand?
For that, you can use storage classes in Kubernetes.
Will see an example of how to dynamically provision storage using CSI provisioners in Kubernetes.

Container Storage Interface (CSI) in Kubernetes

Before CSI, adding new storage support to Kubernetes was tough. Storage plugins were part of Kubernetes core, making it hard for vendors to align with releases and causing reliability issues. CSI standardizes how storage systems interact with container orchestrators like Kubernetes, making the volume layer extensible.

How to use a CSI Volume?

Dynamic Provisioning:

You can enable automatic creation/deletion of volumes for CSI Storage plugins that support dynamic provisioning by creating a StorageClass pointing to the CSI plugin.

The following StorageClass, for example, enables dynamic creation of “fast-storage” volumes by a CSI volume plugin called “csi-driver.example.com” like for aws ebs it is ebs.csi.aws.com so need to change the provisioner according to your use case.

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-storage
provisioner: csi-driver.example.com
parameters:
  type: pd-ssd

Dynamic provisioning is triggered by the creation of an PersistentVolumeClaim object.

The following, for example, triggers dynamic provisioning using the StorageClass above.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-request-for-storage
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
  storageClassName: fast-storage

Here is the list of available CSI- drivers in Kubernetes.
https://kubernetes-csi.github.io/docs/drivers.html

Attaching and mounting CSI volumes in a pod

You can reference a PersistentVolumeClaim that is bound to a CSI volume in any pod or pod template to create that volume dynamically when the pod is deployed.

kind: Pod
apiVersion: v1
metadata:
  name: my-pod
spec:
  containers:
    - name: my-frontend
      image: nginx
      volumeMounts:
      - mountPath: "/var/www/html"
        name: my-csi-volume
  volumes:
    - name: my-csi-volume
      persistentVolumeClaim:
        claimName: my-request-for-storage

The StorageClass defines the provisioning details, the PVC requests the storage, and the Pod uses the dynamically provisioned volume by referencing the PVC to map the container path /var/www/html using volumeMounts.

When the pod referencing a CSI volume is scheduled, Kubernetes will trigger the appropriate operations against the external CSI plugin (ControllerPublishVolume, NodeStageVolume, NodePublishVolume, etc.) to ensure the specified volume is attached, mounted, and ready to use by the containers in the pod.

Summary:
In this article we have learned what is Persistent volumes and different kinds of Persistent volumes in Kubernetes also we learned how to claim these volumes using Persistent volume claims and lastly we have learned how to dynamically provison these volumes using storage classes.

One comment

Alberto D. Harris
December 11, 2023 / 7:24 Reply
Wonderful site you have here but I was wondering if you knew of any user discussion forums that cover the same topics discussed in this article? I’d really love to be a part of group where I can get advice from other knowledgeable individuals that share the same interest. If you have any recommendations, please let me know. Thanks a lot!