Kubernetes PVC Cleanup: Find Persistent Volumes Nobody Uses
Kubernetes PVC cleanup should start with the claim, the mounted workload, and the storage class, not with a list of expensive disks. A persistent volume claim can look abandoned because the pod is gone, but the data may still be needed for rollback, audit, migration, or a batch job that has not run yet.
The useful output is a PVC retirement decision: who owns the data, what mounted it, when it was last written, whether a backup or export exists, what reversible action comes first, and what rule prevents the same orphaned storage from returning. PVC cleanup is cost cleanup, but it is also data-risk cleanup.
Key takeaways
- Treat every PVC as data until an owner proves it is disposable.
- Check mounts, StatefulSets, Jobs, storage class behavior, reclaim policy, snapshots, and restore expectations before deleting anything.
- Prefer detach, snapshot, archive, or expiry review before final removal.
- Record the evidence in the same place the workload is managed, especially if GitOps or Helm can recreate the claim.
- Prevent future waste by changing how stateful workloads request storage.
Separate Unmounted From Disposable
An unmounted PVC is only a candidate. It is not proof that the data is safe to remove. A PVC can be temporarily unmounted during a migration, a failed rollout, a restore exercise, or a StatefulSet rename. Cleanup starts by separating “not mounted now” from “no longer needed.”
Use a review table that speaks in storage terms, not generic asset terms.
| Field | Why it matters |
|---|---|
| Claim | Namespace, PVC name, requested size, access mode, storage class, and bound PV |
| Mount path | Pod, StatefulSet, Deployment, Job, or CronJob that mounted the claim |
| Data role | Cache, queue, upload staging, database files, search index, artifact store, or unknown |
| Last write evidence | Application metric, filesystem timestamp, database checkpoint, backup log, or owner confirmation |
| Reclaim behavior | Whether deleting the claim retains or deletes the underlying volume |
| First action | Keep, resize, snapshot, detach, archive, mark expiring, or remove after approval |
This table keeps the review honest. A cache volume with a rebuild path is different from a database volume with the only copy of production-like test data.
Evidence That A PVC Has No Current Consumer
The strongest PVC evidence connects Kubernetes objects to application reality. Kubernetes can show a bound claim, pods that reference it, events, storage class, and metadata. It does not know whether the data still has legal, debugging, or migration value.
| Check | What to look for | Cleanup signal |
|---|---|---|
| Pod mounts | spec.volumes[].persistentVolumeClaim.claimName, mounted paths, and recent pod restarts | No active or expected pod references the claim |
| Controller source | StatefulSet volume claim templates, Helm values, GitOps manifests, and release history | The claim is not recreated by current deployment config |
| Bound volume | PV reclaim policy, storage class, capacity, zone, and CSI driver | Final action and recovery path are understood |
| Data protection | Snapshots, backup jobs, restore tests, retention policy, and export location | Data is retained elsewhere or explicitly disposable |
| Workload schedule | CronJobs, paused Jobs, migration plans, and release calendars | Quiet period covers the real use pattern |
| Owner review | Service owner, data owner, platform owner, or product owner signoff | Somebody accountable accepts the risk |
Avoid single-metric decisions. Low I/O can mean “unused”, but it can also mean “cold data”, “ready for restore”, or “only written during incidents.” A missing pod can mean “orphaned”, but it can also mean “StatefulSet temporarily removed during a failed deploy.”
When evidence conflicts, choose an intermediate state: snapshot, mark for expiry, attach an owner, or create a ticket with a short review window.
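For the controller source check above, one quick test is whether a claim's name matches what a StatefulSet would create or reuse. The sketch below assumes the standard naming pattern for claims created from volume claim templates (template name, StatefulSet name, ordinal), ordinals that start at 0, and a StatefulSet loaded as a dict from `kubectl get statefulset -o json`. The function name and return labels are illustrative, not part of any tool.

```python
def classify_claim(pvc_name: str, statefulset: dict) -> str:
    """Relate a PVC name to a StatefulSet's volume claim templates.

    Assumes the naming pattern <template>-<statefulset>-<ordinal> and
    ordinals starting at 0 (hypothetical helper for illustration).
    """
    sts_name = statefulset["metadata"]["name"]
    spec = statefulset.get("spec", {})
    replicas = spec.get("replicas", 1)
    for template in spec.get("volumeClaimTemplates", []):
        prefix = f"{template['metadata']['name']}-{sts_name}-"
        suffix = pvc_name[len(prefix):]
        if pvc_name.startswith(prefix) and suffix.isdigit():
            # A claim above the replica count is unmounted now, but a
            # scale-up would reattach it with its old data.
            if int(suffix) < replicas:
                return "current-replica"
            return "scaled-down-replica"
    return "unrelated"
```

A "scaled-down-replica" result is a good example of conflicting evidence: no pod mounts the claim, yet the current deployment config can bring it back into use, so an intermediate state fits better than deletion.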
Read-Only PVC Scan
Use read-only kubectl commands to build the candidate list. kubectl get supports all-namespace queries, wide output, label and field selectors, and JSON output; kubectl describe gives a detailed view of a single object.
kubectl get pvc --all-namespaces
kubectl get pv -o wide
kubectl get pods --all-namespaces -o json
kubectl describe pvc "$PVC_NAME" -n "$NAMESPACE"
kubectl get events -n "$NAMESPACE" --field-selector involvedObject.kind=PersistentVolumeClaim,involvedObject.name="$PVC_NAME"
The JSON pod output is useful for finding claimName references in workload volumes. It does not prove the underlying filesystem is empty, backed up, or safe to discard. Pair it with application and backup evidence.
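As a rough illustration of the claimName lookup, the sketch below compares two JSON exports. It assumes you have saved `kubectl get pvc --all-namespaces -o json` and `kubectl get pods --all-namespaces -o json` and loaded each with `json.load`; the function names are made up for this example.

```python
def referenced_claims(pods: dict) -> set:
    """Collect (namespace, claimName) pairs referenced by existing pods."""
    refs = set()
    for pod in pods.get("items", []):
        namespace = pod["metadata"]["namespace"]
        for volume in pod.get("spec", {}).get("volumes", []):
            claim = volume.get("persistentVolumeClaim")
            if claim:
                refs.add((namespace, claim["claimName"]))
    return refs


def unreferenced_claims(pvcs: dict, pods: dict) -> list:
    """Return PVCs that no pod in the export references, sorted for review."""
    all_claims = {
        (pvc["metadata"]["namespace"], pvc["metadata"]["name"])
        for pvc in pvcs.get("items", [])
    }
    return sorted(all_claims - referenced_claims(pods))
```

The result is only a candidate list. It sees pods that exist at export time, so a claim used by a CronJob that has not run recently or by a scaled-down StatefulSet will still appear here.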
Decide Between Resize, Archive, And Removal
PVC cleanup does not always mean deletion. The correct action depends on data role, confidence, and reversibility.
| Situation | Better first move | Why |
|---|---|---|
| Oversized active PVC | Create a smaller replacement claim and migrate the data, since Kubernetes supports expanding a PVC but not shrinking it in place | The workload still needs state, just not that much |
| Unmounted cache volume | Mark expiring, confirm rebuild path, then remove after owner approval | Cache loss is usually recoverable but still operationally noisy |
| Old migration volume | Snapshot or archive metadata, then delete after the migration owner signs off | Migration data can be useful for rollback or audit |
| Database-like files | Verify backup, restore test, retention requirement, and application owner approval | Data loss risk dominates storage savings |
| Unknown owner | Label, ticket, and quarantine with a review date | Lack of ownership is not deletion evidence |
Track the cleanup candidate with a simple priority score:
| Score | Good sign | Bad sign |
|---|---|---|
| Impact | Meaningful spend, risk, toil, noise, or confusion disappears | The item is cheap and low-risk but politically distracting |
| Confidence | Owner, purpose, and dependency path are understood | The team is guessing from age or name |
| Reversibility | Restore, recreate, re-enable, or rollback path exists | Deletion would be the first real test |
| Prevention | A rule can stop recurrence | The same pattern will return next month |
Start with high-impact, high-confidence, reversible candidates. Defer confusing items only if they get an owner and a date; otherwise “defer” becomes another word for keeping waste permanently.
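The score can be kept in a few lines of code. This sketch assumes a 0 to 3 scale per dimension and a threshold of 2 for "high"; both are illustrative choices, not part of the table above, and the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    name: str
    impact: int          # each dimension scored 0-3 (assumed scale)
    confidence: int
    reversibility: int
    prevention: int
    owner: Optional[str] = None
    review_date: Optional[str] = None

    @property
    def score(self) -> int:
        return self.impact + self.confidence + self.reversibility + self.prevention


def triage(candidates: list) -> dict:
    """Split candidates into start-now, validly deferred, and unowned."""
    start, deferred, unowned = [], [], []
    for c in candidates:
        if min(c.impact, c.confidence, c.reversibility) >= 2:
            start.append(c)
        elif c.owner and c.review_date:
            deferred.append(c)   # a deferral counts only with owner and date
        else:
            unowned.append(c)
    start.sort(key=lambda c: c.score, reverse=True)
    return {"start": start, "deferred": deferred, "needs_owner_and_date": unowned}
```

Anything that lands in the third bucket is the "permanent defer" risk: it needs an owner and a review date before it counts as a decision.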
PVC Cases That Need Patience
Some cleanup candidates are supposed to look quiet. Do not rush these cases:
- StatefulSet PVCs after a failed rollout, rename, or chart migration.
- Claims used by CronJobs that run monthly, quarterly, or after delayed upstream delivery.
- Volumes containing database files, uploaded customer assets, search indexes, or queues.
- Claims with Retain reclaim behavior where deleting the PVC leaves a PV that still needs a plan.
- Claims in regulated, audit, incident, or security-analysis namespaces.
For these cases, use a longer observation window, explicit owner approval, and a staged reduction. The point is not to avoid cleanup; it is to avoid making the first proof of dependency an outage.
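For the reclaim-behavior case in particular, it helps to know what deleting a claim would do before anyone proposes it. The sketch below joins a PVC to its bound PV through `spec.volumeName` and reads `spec.persistentVolumeReclaimPolicy`. It assumes both objects come from `kubectl get ... -o json`; the function name and result keys are illustrative.

```python
CONSEQUENCES = {
    "retained": "PV and backing storage remain after claim deletion; plan PV cleanup",
    "deleted": "PV and backing storage are removed with the claim; confirm backups first",
    "unknown": "No bound PV or unrecognized policy; review manually",
}


def deletion_consequence(pvc: dict, pvs: dict) -> str:
    """Return a key of CONSEQUENCES describing what deleting the PVC would do."""
    pv_by_name = {pv["metadata"]["name"]: pv for pv in pvs.get("items", [])}
    volume_name = pvc.get("spec", {}).get("volumeName")
    pv = pv_by_name.get(volume_name)
    if pv is None:
        return "unknown"
    policy = pv.get("spec", {}).get("persistentVolumeReclaimPolicy")
    if policy == "Retain":
        return "retained"
    if policy == "Delete":
        return "deleted"
    return "unknown"
```

A "retained" result means the cost does not disappear when the claim does, so the review needs a second line item for the PV itself.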
Run The PVC Review
Run Kubernetes PVC cleanup as a data retirement review, not an open-ended cluster hygiene project.
- Export PVCs, PVs, storage classes, and pod volume references for one cluster or namespace group.
- Add owner, data role, mount history, reclaim behavior, backup evidence, and risk if wrong.
- Remove false positives such as active StatefulSet claims and restore targets.
- Ask owners to choose keep, resize, snapshot, archive, expire, remove, or investigate.
- Apply the least permanent useful action first and record the watch signal.
- Complete final removal only after the review window covers the workload’s real schedule.
- Save the evidence with the workload manifest, Helm release, GitOps app, or platform ticket.
For broader cleanup planning, pair this guide with related notes in the cleanup library. Use the main cloud cost optimization checklist to decide whether the cleanup work has enough upside for a focused sprint; it is also a useful companion for infrastructure cleanup.
Prevent Orphaned PVCs At Creation Time
Prevention should change how stateful workloads request storage. Owner labels help, but PVC waste usually returns when teams can create durable storage without declaring its data role, backup plan, and retirement path.
- Require PVC labels for owner, data role, environment, retention class, and backup policy.
- Make temporary environments use storage classes and quotas designed for short-lived data.
- Add Helm or GitOps review checks for large requested sizes and missing expiry metadata.
- Document whether each stateful workload can rebuild, restore, or safely discard its volume.
- Put PVC age, requested size, storage class, and mount status into the platform dashboard.
The recurring review should be short: sort by impact, pick the unclear items, assign owners, and close the loop on anything nobody claims. If the review keeps producing the same class of candidate, fix the creation path instead of celebrating repeated cleanup.
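The label and size checks from the list above could run as a pre-merge review step. This is a minimal sketch, assuming PVC manifests are already parsed into dicts; the label keys, the 100 GiB threshold, and the function names are placeholders to adapt, and the quantity parser covers only the common binary and decimal suffixes rather than the full Kubernetes quantity syntax.

```python
# Hypothetical label keys; use whatever convention your platform defines.
REQUIRED_LABELS = ("owner", "data-role", "environment", "retention-class", "backup-policy")

_UNITS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
          "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12}


def quantity_to_bytes(quantity: str) -> int:
    """Convert a storage quantity such as '500Gi' to bytes (common suffixes only)."""
    for suffix in sorted(_UNITS, key=len, reverse=True):
        if quantity.endswith(suffix):
            return int(float(quantity[:-len(suffix)]) * _UNITS[suffix])
    return int(float(quantity))


def review_claim(manifest: dict, max_bytes: int = 100 * 2**30) -> list:
    """Return review findings for one PVC manifest; an empty list means it passes."""
    findings = []
    labels = manifest.get("metadata", {}).get("labels") or {}
    for key in REQUIRED_LABELS:
        if not labels.get(key):
            findings.append(f"missing label: {key}")
    requested = (manifest.get("spec", {}).get("resources", {})
                 .get("requests", {}).get("storage"))
    if requested is None:
        findings.append("no storage request found")
    elif quantity_to_bytes(requested) > max_bytes:
        findings.append(f"large request: {requested}")
    return findings
```

Returning findings instead of raising lets the same function feed a CI comment, a dashboard, or an admission policy written elsewhere.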
Example Decision Record
Use a compact record so the cleanup can be reviewed later without reconstructing the whole investigation.
| Field | Example entry for this cleanup |
|---|---|
| Candidate | Unused persistent volume claims in Kubernetes clusters |
| Why it looked stale | Bound claim with no current pod mount, old namespace, oversized request, or completed migration |
| Evidence checked | Pod volume references, StatefulSet templates, PV reclaim policy, storage class, backups, snapshots, and owner signoff |
| First reversible move | Snapshot, archive metadata, mark expiring, or detach from a retired workload |
| Watch signal | Restore request, failed batch job, missing uploaded file, application error, or owner complaint |
| Final action | Remove only after backup and retention checks match the data role |
| Prevention rule | Require owner, data role, retention class, backup policy, and expiry metadata for new claims |
This record is intentionally small. If the decision needs a long narrative, the candidate is probably not ready for removal yet. Keep investigating until the owner, evidence, reversible move, and prevention rule are clear.
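If the record lives next to the workload config, a small structure can flag what is still missing. The sketch below uses the fields from the FAQ's record format (candidate, owner, evidence, chosen action, watch signals, rollback path, final date, prevention rule); the class and method names are hypothetical, and treating every field as required is an assumption.

```python
from dataclasses import dataclass, fields


@dataclass
class DecisionRecord:
    candidate: str = ""
    owner: str = ""
    evidence: str = ""
    chosen_action: str = ""
    watch_signals: str = ""
    rollback_path: str = ""
    final_date: str = ""
    prevention_rule: str = ""

    def gaps(self) -> list:
        """Names of fields that are still empty or whitespace."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

    def ready_for_removal(self) -> bool:
        """True only when every field has been filled in."""
        return not self.gaps()
```

For example, a record with only `candidate` and `owner` set reports the other six fields as gaps, which is a quick signal that the candidate needs more investigation.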
FAQ
How often should teams review Kubernetes PVCs?
Use a window long enough to include batch schedules, traffic peaks, and deployment cycles for the first decision, then set a recurring cadence based on change rate. Fast-moving non-production systems may need monthly review; slower systems can be quarterly if every unclear item has an owner and a review date.
What is the safest first action for an unused PVC?
The safest first action is usually to confirm an owner and gather mount evidence. After that, snapshot or mark the claim for expiry before final deletion, especially when the data role is unclear.
What should not be removed quickly?
Do not rush claims with database-like files, customer uploads, queue state, audit data, restore targets, rare batch workloads, or Retain reclaim behavior that leaves another storage object to manage.
How do you make the decision useful later?
Write the decision as a small operational record: candidate, owner, evidence, chosen action, watch signals, rollback path, final date, and prevention rule. That format helps future engineers, search engines, and AI assistants understand the cleanup without guessing.