Azure Log Analytics Cleanup: Reduce Retention Waste

Azure Log Analytics workspaces often start as a shared place to send diagnostic settings, Kubernetes logs, Application Insights data, firewall logs, and platform events. The waste appears later, when every source keeps the same retention period even though each one has a different investigation value. A workspace can look harmless in the portal while a small number of verbose tables quietly dominate storage and query cost.

Azure Log Analytics cleanup is the work of setting retention by evidence instead of habit. The goal is not to delete logs aggressively. The goal is to decide which tables need hot query access, which logs should move to a cheaper retention path, which producers should reduce noise, and which streams should stop being collected.

This guide is for platform, SRE, security, and FinOps teams reviewing Azure Monitor costs. By the end, you should have a workspace retention matrix, a list of noisy tables, owner-approved exceptions, and a creation rule that prevents every new diagnostic setting from inheriting an expensive default.

Start With Workspaces, Not Log Lines

Do not begin by arguing over a single retention number. Start by listing workspaces and the reason each workspace exists. A production security workspace, a dev cluster workspace, and a temporary migration workspace should not have the same cleanup path.

Use a read-only Azure CLI inventory to find obvious review candidates:

az monitor log-analytics workspace list \
  --query "[].{name:name,resourceGroup:resourceGroup,retentionInDays:retentionInDays,sku:sku.name,location:location,tags:tags}" \
  --output table

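This output shows workspace-level retention, SKU, and location for each workspace. Table output generally omits nested values such as tags, so rerun the command with --output json when you need ownership clues from tags. The list does not prove that old logs are unused, and it does not show table-level ingest. Treat it as the first routing list for the review.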

Build a workspace inventory with these fields:

Field | Azure-specific evidence to capture | Why it changes the cleanup decision
Workspace purpose | Security operations, app diagnostics, AKS logs, migration, sandbox, or shared platform telemetry | Determines whether old logs are compliance evidence, debugging context, or disposable noise
Current retention | Workspace retention plus any table-level retention exceptions | Separates default policy from deliberate exceptions
Top data types | Usage by table, solution, or diagnostic source | Shows whether one source creates most of the bill
Connected resources | Diagnostic settings, Application Insights links, Defender, Sentinel, AKS, firewalls, and subscriptions | Prevents shortening retention for logs another team depends on
Owner and reviewer | Team tag, resource group owner, workspace access group, or runbook | Gives the cleanup decision a person who can accept risk
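One way to keep the inventory reviewable is to hold each workspace as a small record and flag the rows that are not ready for a retention decision. The sketch below is illustrative only: the class name, field names, and workspace names are assumptions for this example, not an Azure schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class WorkspaceRecord:
    """One row of the workspace inventory (field names are hypothetical)."""
    name: str
    purpose: str                      # e.g. "security", "app-diagnostics", "migration"
    retention_days: int               # workspace-level default retention
    table_exceptions: Dict[str, int] = field(default_factory=dict)  # table -> days
    top_data_types: List[str] = field(default_factory=list)
    connected_resources: List[str] = field(default_factory=list)
    owner: Optional[str] = None


def review_flags(ws: WorkspaceRecord) -> List[str]:
    """Return the gaps that should be closed before retention is changed."""
    flags = []
    if not ws.owner:
        flags.append("no owner to accept risk")
    if not ws.purpose:
        flags.append("purpose not recorded")
    if not ws.top_data_types:
        flags.append("top data types not measured")
    if not ws.connected_resources:
        flags.append("connected resources not listed")
    return flags


# Hypothetical sample rows; names and values are made up for illustration.
inventory = [
    WorkspaceRecord("law-prod-sec", "security", 365, owner="secops",
                    top_data_types=["AzureDiagnostics"],
                    connected_resources=["Sentinel"]),
    WorkspaceRecord("law-migration-tmp", "migration", 90),
]

for ws in inventory:
    print(ws.name, review_flags(ws))
```

A workspace with an empty flag list is ready for a table-level review; one with flags goes back to its owner first.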

The first cleanup win is often not deletion. It is discovering that non-production resources send verbose diagnostics to a production workspace, or that a retired service still streams logs because its diagnostic setting outlived the app.
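A check for the first pattern can be sketched in a few lines, assuming you have already exported diagnostic settings into simple records. The dictionary keys, resource names, and environment labels below are hypothetical; they do not match any Azure API response shape.

```python
def misrouted_settings(settings, workspace_env):
    """Find non-production resources that send logs to a production workspace.

    settings: list of dicts with 'resource', 'resource_env', and 'workspace' keys.
    workspace_env: dict mapping workspace name -> environment label.
    """
    findings = []
    for setting in settings:
        target_env = workspace_env.get(setting["workspace"])
        if target_env == "prod" and setting["resource_env"] != "prod":
            findings.append(
                (setting["resource"], setting["resource_env"], setting["workspace"])
            )
    return findings


# Hypothetical sample data.
workspace_env = {"law-prod-shared": "prod", "law-dev": "dev"}
settings = [
    {"resource": "app-checkout", "resource_env": "prod", "workspace": "law-prod-shared"},
    {"resource": "app-checkout-dev", "resource_env": "dev", "workspace": "law-prod-shared"},
    {"resource": "aks-sandbox", "resource_env": "dev", "workspace": "law-dev"},
]

print(misrouted_settings(settings, workspace_env))
```

Each finding is a routing question for the resource owner, not an automatic deletion candidate.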

Find The Tables That Drive Retention Cost

Workspace retention is too blunt to review alone. One table may contain high-value security events, while another table contains chatty debug logs from a test environment. The cleanup decision should be made at the table or source level whenever the workspace mixes workloads.

In the Log Analytics query experience, use the workspace’s usage data to rank high-volume tables:

Usage
| where TimeGenerated > ago(30d)
| summarize TotalQuantity = sum(Quantity) by DataType, Solution
| order by TotalQuantity desc

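This query helps identify the largest data types over the last 30 days. Quantity in the Usage table is reported in megabytes (see the QuantityUnit column), and the sum includes non-billable data unless you also filter on IsBillable. It is not a billing invoice and it does not prove whether a table is valuable; it points the review toward the streams that deserve owner attention first.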
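Once the query results are exported, it can help to see how concentrated the volume is. The following is a minimal sketch that ranks data types and stops once a chosen share of total volume is covered. The function name, the 80% cutoff, and the sample quantities are assumptions for illustration; the division by 1000 mirrors the MB-to-GB conversion used in Microsoft's sample usage queries.

```python
def rank_tables(rows, top_share=0.8):
    """Rank data types by volume until `top_share` of the total is covered.

    rows: iterable of (data_type, quantity_mb) pairs, e.g. exported Usage results.
    Returns a list of (data_type, gb, share, cumulative_share) tuples.
    """
    totals = {}
    for data_type, quantity_mb in rows:
        totals[data_type] = totals.get(data_type, 0.0) + quantity_mb

    grand_total = sum(totals.values())
    if grand_total == 0:
        return []

    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    result, cumulative = [], 0.0
    for data_type, mb in ranked:
        share = mb / grand_total
        cumulative += share
        result.append((data_type, mb / 1000, round(share, 3), round(cumulative, 3)))
        if cumulative >= top_share:
            break  # remaining tables are the long tail
    return result


# Made-up quantities in MB, for illustration only.
rows = [
    ("ContainerLogV2", 600_000),
    ("AzureDiagnostics", 250_000),
    ("AppTraces", 100_000),
    ("Heartbeat", 50_000),
]

for entry in rank_tables(rows):
    print(entry)
```

With the sample data, two tables cover 85% of the volume, which is the kind of short list worth taking to owners first.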

Look for these patterns before changing retention:

Pattern | Evidence check | Good cleanup move
Debug logs from non-production apps | App environment tags, deployment slots, diagnostic categories, and recent incident references | Shorten hot retention and reduce log level at the producer
AKS container logs with high churn | Namespace, pod, and cluster ownership; restart loops; verbose sidecars | Fix noisy workloads before moving retention around
Firewall or gateway logs | Security use cases, allowlist investigations, incident timelines, and export requirements | Keep approved security lookback; archive only after the owner signs off
Duplicate diagnostic sinks | Same resource streaming to multiple workspaces, Event Hubs, and storage accounts | Remove duplicate collection after confirming consumers
Temporary migration logs | Migration ticket, cutover date, and post-cutover support window | Set a specific expiry date and review owner

The most useful review is table-specific: “ContainerLogV2 from dev AKS namespaces needs seven hot days and no archive” is better than “reduce Log Analytics retention.”
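A table-specific decision like that one can be recorded as a row in a retention matrix. The sketch below shows one possible shape, with an environment-specific rule taking priority over a catch-all rule. The class, the "*" wildcard convention, the owner names, and every day count other than the seven-day dev example are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class RetentionRule:
    table: str
    environment: str   # "dev", "prod", or "*" for any environment
    hot_days: int
    archive_days: int  # 0 means no archive
    owner: str


def find_rule(rules: List[RetentionRule], table: str,
              environment: str) -> Optional[RetentionRule]:
    """Prefer an exact environment match, then a '*' rule, else None."""
    fallback = None
    for rule in rules:
        if rule.table != table:
            continue
        if rule.environment == environment:
            return rule
        if rule.environment == "*":
            fallback = rule
    return fallback


# Hypothetical matrix; only the dev ContainerLogV2 row comes from the example above.
rules = [
    RetentionRule("ContainerLogV2", "*", hot_days=30, archive_days=0, owner="platform"),
    RetentionRule("ContainerLogV2", "dev", hot_days=7, archive_days=0, owner="platform"),
]

print(find_rule(rules, "ContainerLogV2", "dev"))
print(find_rule(rules, "ContainerLogV2", "prod"))
print(find_rule(rules, "Heartbeat", "dev"))  # no rule yet: needs a decision
```

A lookup that returns None is useful output in its own right, because it names a table that nobody has made a retention decision about.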

Separate Retention, Ingestion, And Query Waste

Teams sometimes treat Log Analytics cleanup as a storage-only task. That misses two other sources of waste: collecting logs nobody needs and running broad queries over long windows. A cleaner workspace usually needs changes in all three places.

Retention cleanup asks how long logs should remain queryable. Ingestion cleanup asks whether the logs should be collected at all. Query cleanup asks whether dashboards, alerts, and investigations scan more data than they need.

Use this decision table:

Symptom | What to inspect | Better first action
Old data rarely queried | Saved queries, dashboards, workbooks, alert rules, and incident notes | Shorten hot retention or archive after the required lookback
New data arrives too fast | Diagnostic categories, app log level, sampling, Kubernetes namespaces, and noisy fields | Reduce collection or sampling before retention changes
Queries are expensive or slow | Time filters, table filters, joins, and workbook defaults | Narrow query windows and table selection
Workspace has unclear ownership | Tags, RBAC groups, resource group owner, linked services, and cost center | Assign an owner before changing retention

Do not rush security or audit logs just because they are large. Security logs can be quiet until an investigation needs them. Do not rush customer support logs when support promises a longer lookback period than engineering remembers. Those commitments should drive retention.

Choose A Safer Retention Change

Make the first change reversible and observable. For a table that appears over-retained, a staged path is usually safer than a direct purge:

  1. Name the owner and the reason the table is collected.
  2. Confirm the required lookback for incidents, customer support, audit, compliance, and product debugging.
  3. Check recent queries, workbooks, alerts, and exports that read the table.
  4. Reduce noisy producers where possible.
  5. Shorten hot retention for the least risky table or environment first.
  6. Watch for failed investigations, broken workbooks, alert gaps, and owner complaints.
  7. Document the final retention rule and exception date.
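The early steps in the list above work as preconditions for the change in step 5, so they can be expressed as a simple gate. This is a sketch under stated assumptions: the class and field names are hypothetical, and it assumes the required lookback must be served entirely from hot retention, which may not hold if your team accepts archived data for older investigations.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ChangeRequest:
    table: str
    owner: Optional[str]
    required_lookback_days: Optional[int]  # None means nobody has confirmed it
    consumers_checked: bool                # queries, workbooks, alerts, exports reviewed
    current_hot_days: int
    proposed_hot_days: int


def blockers(req: ChangeRequest) -> List[str]:
    """List the reasons a retention change should not go ahead yet."""
    problems = []
    if not req.owner:
        problems.append("name an owner")
    if req.required_lookback_days is None:
        problems.append("confirm the required lookback")
    elif req.proposed_hot_days < req.required_lookback_days:
        problems.append("proposed retention is shorter than the required lookback")
    if not req.consumers_checked:
        problems.append("check queries, workbooks, alerts, and exports")
    if req.proposed_hot_days >= req.current_hot_days:
        problems.append("proposal does not shorten retention")
    return problems


# Hypothetical requests for illustration.
ready = ChangeRequest("ContainerLogV2", "platform-team", 7, True, 30, 7)
not_ready = ChangeRequest("AzureDiagnostics", None, None, False, 30, 30)

print(blockers(ready))
print(blockers(not_ready))
```

An empty list means the request can move to the staged change; the monitoring and documentation steps still happen afterward and are not modeled here.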

For dev, preview, and sandbox workspaces, the answer may be a short retention period plus better local debugging. For production security streams, the answer may be keeping longer retention but moving some data out of hot query paths. For compliance logs, the cleanup may be proving that the current retention is justified and should be budgeted explicitly.

Prevention: Make Retention Part Of Onboarding

Repeat waste usually starts when a new resource enables diagnostic settings without a retention decision. Fix the creation path:

  • Require each diagnostic setting to name a log owner, purpose, environment, and review date.
  • Keep a short approved-retention menu, such as dev diagnostics, production app logs, security events, and compliance exceptions.
  • Add a monthly review of top workspace tables by volume and unresolved owner.
  • Require pull requests that add new logging categories to explain the investigation question those logs answer.
  • Give temporary migration and incident workspaces an expiry date at creation time.
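The first two rules in the list above lend themselves to an automated check in a pipeline or review script. The sketch below validates a metadata record for a new diagnostic setting. The field names and the retention menu are hypothetical, and the day counts are placeholders rather than recommendations.

```python
from datetime import date

# Hypothetical approved-retention menu; day counts are placeholders only.
APPROVED_RETENTION = {
    "dev-diagnostics": 7,
    "production-app-logs": 30,
    "security-events": 180,
    "compliance-exception": 365,
}

REQUIRED_FIELDS = ("owner", "purpose", "environment", "review_date", "retention_class")


def validate_setting(meta, today):
    """Return a list of problems with a diagnostic setting's metadata."""
    errors = [f"missing {name}" for name in REQUIRED_FIELDS if not meta.get(name)]

    retention_class = meta.get("retention_class")
    if retention_class and retention_class not in APPROVED_RETENTION:
        errors.append(f"unapproved retention class: {retention_class}")

    review_date = meta.get("review_date")
    if review_date and review_date <= today:
        errors.append("review date is not in the future")
    return errors


today = date(2025, 1, 15)  # fixed date so the example is repeatable

good = {
    "owner": "payments-team",
    "purpose": "checkout error triage",
    "environment": "prod",
    "review_date": date(2025, 7, 1),
    "retention_class": "production-app-logs",
}
bad = {"owner": "", "purpose": "debug", "environment": "dev", "retention_class": "forever"}

print(validate_setting(good, today))
print(validate_setting(bad, today))
```

Keeping the menu in one place means a new retention class needs an explicit approval before any setting can use it.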

This is more effective than a yearly purge because it changes how logs enter Azure Monitor. New logs arrive with a reason to exist, and exceptions have an owner before they become permanent.

FAQ

What is the safest first Azure Log Analytics cleanup?

Rank workspaces and tables by volume, then repair ownership and retention intent before deleting anything. The safest first technical change is often reducing noisy non-production logging or shortening retention for a clearly temporary stream.

How long should Azure Log Analytics retention be?

Use the period your team must support for incidents, audits, customer support, and compliance. Different tables can justify different periods. A global default is easy to manage but often too expensive or too risky.
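As a small worked example, the required period for a table is the longest commitment anyone depends on, and it helps to record which commitment drives it. The function name and the day counts below are made up for illustration.

```python
def required_retention_days(commitments):
    """Return (days, reason) for the longest lookback commitment.

    commitments: dict mapping a reason (e.g. "audit") to days of lookback.
    """
    if not commitments:
        raise ValueError("no commitments recorded; ask the owner before choosing a period")
    reason = max(commitments, key=commitments.get)
    return commitments[reason], reason


# Hypothetical commitments for one table.
commitments = {"incident response": 30, "customer support": 90, "audit": 60}

days, reason = required_retention_days(commitments)
print(f"{days} days, driven by {reason}")
```

Recording the driving reason makes the period easier to revisit when that commitment changes.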

Should old logs be archived instead of deleted?

Archive when the logs have real investigation, compliance, or customer support value but do not need fast query access. Do not archive data just to avoid deciding; archive exceptions should still have owners and review dates.

What should not be rushed?

Slow down on security events, access logs, payment or customer-impact timelines, legal-retention records, and anything used by Sentinel, Defender, alert rules, workbooks, or incident response. These logs can look unused until the day they matter.

Summary

Azure Log Analytics cleanup should produce a retention matrix, not a blanket deletion. Inventory workspaces, rank high-volume tables, separate retention from ingestion and query waste, and make table-specific decisions with owners. The durable fix is to make retention a required part of every new diagnostic setting so expensive defaults do not become permanent.