Azure Log Analytics Cleanup: Reduce Retention Waste

Azure Log Analytics workspaces often start as a shared place to send diagnostic settings, Kubernetes logs, Application Insights data, firewall logs, and platform events. The waste appears later, when every source keeps the same retention period even though each one has a different investigation value. A workspace can look harmless in the portal while a small number of verbose tables quietly dominate storage and query cost.

Azure Log Analytics cleanup is the work of setting retention by evidence instead of habit. The goal is not to delete logs aggressively. The goal is to decide which tables need hot query access, which logs should move to a cheaper retention path, which producers should reduce noise, and which streams should stop being collected.

This guide is for platform, SRE, security, and FinOps teams reviewing Azure Monitor costs. By the end, you should have a workspace retention matrix, a list of noisy tables, owner-approved exceptions, and a creation rule that keeps new diagnostic settings from inheriting an expensive default.

Start With Workspaces, Not Log Lines

Do not begin by arguing over a single retention number. Start by listing workspaces and the reason each workspace exists. A production security workspace, a dev cluster workspace, and a temporary migration workspace should not have the same cleanup path.

Use a read-only Azure CLI inventory to find obvious review candidates:

az monitor log-analytics workspace list \
  --query "[].{name:name,resourceGroup:resourceGroup,retentionInDays:retentionInDays,sku:sku.name,location:location,tags:tags}" \
  --output table

This output shows workspace-level retention and ownership clues. It does not prove that old logs are unused, and it does not show table-level ingest. Treat it as the first routing list for the review.
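If you save the same inventory as JSON (`--output json` instead of `--output table`), a short script can turn it into a review queue. The sketch below is a minimal example under stated assumptions: the `owner` tag key and the 90-day review threshold are hypothetical policy choices, not Azure defaults, and a flag means "ask the owner", not "cut retention".

```python
import json
from typing import Any

# Hypothetical review policy: adjust to your own tagging and retention standards.
OWNER_TAG = "owner"         # assumed tag key; some organizations use "team" instead
RETENTION_REVIEW_DAYS = 90  # assumed threshold that triggers an owner review


def review_candidates(workspaces: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Flag workspaces from `az monitor log-analytics workspace list --output json`."""
    flagged = []
    for ws in workspaces:
        reasons = []
        tags = ws.get("tags") or {}  # tags may be null when none are set
        if OWNER_TAG not in tags:
            reasons.append("no owner tag")
        retention = ws.get("retentionInDays") or 0
        if retention > RETENTION_REVIEW_DAYS:
            reasons.append(
                f"retention {retention}d is above the {RETENTION_REVIEW_DAYS}d review threshold"
            )
        if reasons:
            flagged.append(
                {"name": ws["name"], "resourceGroup": ws.get("resourceGroup"), "reasons": reasons}
            )
    return flagged


if __name__ == "__main__":
    # Illustrative sample shaped like the CLI's JSON output; not real workspaces.
    sample = json.loads("""[
      {"name": "law-prod-security", "resourceGroup": "rg-sec", "retentionInDays": 365,
       "tags": {"owner": "secops"}},
      {"name": "law-dev-aks", "resourceGroup": "rg-dev", "retentionInDays": 30, "tags": null},
      {"name": "law-shared", "resourceGroup": "rg-platform", "retentionInDays": 30,
       "tags": {"owner": "platform"}}
    ]""")
    for item in review_candidates(sample):
        print(item["name"], "->", "; ".join(item["reasons"]))
```

In practice you would redirect the CLI output to a file and pass the parsed list to `review_candidates`. In the sample, the production security workspace is flagged only for review; long retention there may be entirely justified.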

Build a workspace inventory with these fields:

Field	Azure-specific evidence to capture	Why it changes the cleanup decision
Workspace purpose	Security operations, app diagnostics, AKS logs, migration, sandbox, or shared platform telemetry	Determines whether old logs are compliance evidence, debugging context, or disposable noise
Current retention	Workspace retention plus any table-level retention exceptions	Separates default policy from deliberate exceptions
Top data types	Usage by table, solution, or diagnostic source	Shows whether one source creates most of the bill
Connected resources	Diagnostic settings, Application Insights links, Defender, Sentinel, AKS, firewalls, and subscriptions	Prevents shortening retention for logs another team depends on
Owner and reviewer	Team tag, resource group owner, workspace access group, or runbook	Gives the cleanup decision a person who can accept risk
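The Current retention field is easier to fill when table overrides are separated from the workspace default. The following sketch assumes table records exported from `az monitor log-analytics workspace table list --output json` expose `name`, `retentionInDays`, and `totalRetentionInDays`; verify those field names against your CLI version before relying on it. Missing values are treated as inheriting the workspace default.

```python
from typing import Any


def split_retention(workspace_days: int, tables: list[dict[str, Any]]) -> dict[str, list[str]]:
    """Separate tables that match the workspace default from deliberate exceptions."""
    result: dict[str, list[str]] = {"default": [], "exception": [], "long_term": []}
    for table in tables:
        # Assumed field names; a missing value is treated as inheriting the default.
        # Note: an explicit override equal to the default is indistinguishable here.
        interactive = table.get("retentionInDays") or workspace_days
        total = table.get("totalRetentionInDays") or interactive
        bucket = "default" if interactive == workspace_days else "exception"
        result[bucket].append(table["name"])
        if total > interactive:
            # Data kept past interactive retention sits in the cheaper long-term (archive) tier.
            result["long_term"].append(table["name"])
    return result


if __name__ == "__main__":
    # Illustrative records, not output from a real workspace.
    tables = [
        {"name": "AzureActivity", "retentionInDays": 30, "totalRetentionInDays": 30},
        {"name": "SecurityEvent", "retentionInDays": 90, "totalRetentionInDays": 365},
        {"name": "ContainerLogV2", "retentionInDays": 7, "totalRetentionInDays": 7},
    ]
    print(split_retention(30, tables))
```

Every name in the `exception` bucket should map to an owner and a reason in the inventory; an exception nobody can explain is a cleanup candidate.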

The first cleanup win is often not deletion. It is discovering that non-production resources send verbose diagnostics to a production workspace, or that a retired service still streams logs because its diagnostic setting outlived the app.
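Both of those first wins can be found mechanically once diagnostic settings are flattened into one record per source and destination. This is a hypothetical sketch: the `DiagnosticStream` record, the environment labels, and the `active_services` set (for example, exported from a service catalog) are assumptions, not Azure objects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiagnosticStream:
    """Hypothetical flattened record: one source resource sending to one workspace."""
    resource_id: str
    resource_env: str  # e.g. read from an "environment" tag on the source (assumed tag)
    workspace: str


def find_misrouted_streams(
    streams: list[DiagnosticStream],
    workspace_env: dict[str, str],
    active_services: set[str],
) -> list[tuple[str, str]]:
    """Flag streams from retired services and non-production sources feeding production."""
    active = {s.lower() for s in active_services}  # Azure resource IDs are case-insensitive
    findings = []
    for s in streams:
        if s.resource_id.lower() not in active:
            findings.append(
                (s.resource_id, "not in active service list; diagnostic setting may have outlived the app")
            )
        elif s.resource_env != "prod" and workspace_env.get(s.workspace) == "prod":
            findings.append(
                (s.resource_id, f"{s.resource_env} resource sends to production workspace {s.workspace}")
            )
    return findings


if __name__ == "__main__":
    # Illustrative IDs and names only.
    streams = [
        DiagnosticStream("/subscriptions/demo/resourceGroups/rg-dev/app-dev", "dev", "law-prod"),
        DiagnosticStream("/subscriptions/demo/resourceGroups/rg-old/app-legacy", "prod", "law-prod"),
    ]
    active = {"/subscriptions/demo/resourceGroups/rg-dev/app-dev"}
    for resource, reason in find_misrouted_streams(streams, {"law-prod": "prod"}, active):
        print(resource, "->", reason)
```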

Find The Tables That Drive Retention Cost

Workspace retention is too blunt to review alone. One table may contain high-value security events, while another table contains chatty debug logs from a test environment. The cleanup decision should be made at the table or source level whenever the workspace mixes workloads.

In the Log Analytics query experience, query the workspace’s Usage table to rank high-volume tables:

Usage
| where TimeGenerated > ago(30d)
| summarize TotalQuantity = sum(Quantity) by DataType, Solution
| order by TotalQuantity desc

This query ranks data types by ingested volume over the last 30 days (the Quantity column is reported in megabytes). It is not a billing invoice and it does not prove whether a table is valuable; it points the review toward the streams that deserve owner attention first.
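Once the query results are exported (for example as CSV or JSON), a short ranking shows how concentrated the volume is. The sketch below assumes rows carrying the `DataType`, `Solution`, and `TotalQuantity` columns produced by the query above, with `TotalQuantity` in megabytes; the 80% coverage target is an arbitrary illustration.

```python
from typing import Any


def top_tables(rows: list[dict[str, Any]], share: float = 0.8) -> list[tuple[str, str, float]]:
    """Return the fewest (DataType, Solution) rows that together cover `share` of total volume."""
    ordered = sorted(rows, key=lambda r: r["TotalQuantity"], reverse=True)
    total = sum(r["TotalQuantity"] for r in ordered)
    picked: list[tuple[str, str, float]] = []
    covered = 0.0
    for r in ordered:
        # Taking the largest rows first gives the smallest set that reaches the target.
        if total == 0 or covered / total >= share:
            break
        covered += r["TotalQuantity"]
        percent = round(100 * r["TotalQuantity"] / total, 1)
        picked.append((r["DataType"], r["Solution"], percent))
    return picked


if __name__ == "__main__":
    # Illustrative rows in MB, not real usage data.
    rows = [
        {"DataType": "Perf", "Solution": "LogManagement", "TotalQuantity": 50.0},
        {"DataType": "ContainerLogV2", "Solution": "ContainerInsights", "TotalQuantity": 600.0},
        {"DataType": "AzureDiagnostics", "Solution": "LogManagement", "TotalQuantity": 250.0},
        {"DataType": "SecurityEvent", "Solution": "Security", "TotalQuantity": 100.0},
    ]
    for data_type, solution, percent in top_tables(rows):
        print(f"{data_type} ({solution}): {percent}% of 30-day volume")
```

If two or three data types cover most of the volume, those owners are the first review meetings to schedule.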

Look for these patterns before changing retention:

Pattern	Evidence check	Good cleanup move
Debug logs from non-production apps	App environment tags, deployment slots, diagnostic categories, and recent incident references	Shorten hot retention and reduce log level at the producer
AKS container logs with high churn	Namespace, pod, and cluster ownership; restart loops; verbose sidecars	Fix noisy workloads before moving retention around
Firewall or gateway logs	Security use cases, allowlist investigations, incident timelines, and export requirements	Keep approved security lookback; archive only after the owner signs off
Duplicate diagnostic sinks	Same resource streaming to multiple workspaces, Event Hubs, and storage accounts	Remove duplicate collection after confirming consumers
Temporary migration logs	Migration ticket, cutover date, and post-cutover support window	Set a specific expiry date and review owner

The most useful review is table-specific: “ContainerLogV2 from dev AKS namespaces needs seven hot days and no archive” is better than “reduce Log Analytics retention.”
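One way to keep decisions at that level is a small retention matrix keyed by table and environment. Everything in this sketch is illustrative: the `RetentionRule` type, the owners, and the day counts are placeholders, and a missing rule deliberately returns `None` so the stream is routed to an owner instead of silently inheriting the workspace default.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RetentionRule:
    table: str
    environment: str   # "dev", "prod", or "*" for any environment
    hot_days: int      # interactive (queryable) retention
    archive_days: int  # extra days in the cheaper long-term tier; 0 means none
    owner: str


# Placeholder matrix entries, not recommendations.
MATRIX = [
    RetentionRule("ContainerLogV2", "dev", 7, 0, "platform-team"),
    RetentionRule("ContainerLogV2", "prod", 30, 60, "platform-team"),
    RetentionRule("SecurityEvent", "*", 90, 275, "secops"),
]


def lookup(table: str, environment: str) -> Optional[RetentionRule]:
    """Prefer an environment-specific rule, then a wildcard; None means no decision yet."""
    exact = [r for r in MATRIX if r.table == table and r.environment == environment]
    wildcard = [r for r in MATRIX if r.table == table and r.environment == "*"]
    matches = exact or wildcard
    return matches[0] if matches else None


if __name__ == "__main__":
    for table, env in [("ContainerLogV2", "dev"), ("SecurityEvent", "prod"), ("AppTraces", "prod")]:
        rule = lookup(table, env)
        print(table, env, "->", rule if rule else "no rule: route to an owner")
```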

Separate Retention, Ingestion, And Query Waste

Teams sometimes treat Log Analytics cleanup as a storage-only task. That misses two other sources of waste: collecting logs nobody needs and running broad queries over long windows. A cleaner workspace usually needs changes in all three places.

Retention cleanup asks how long logs should remain queryable. Ingestion cleanup asks whether the logs should be collected at all. Query cleanup asks whether dashboards, alerts, and investigations scan more data than they need.

Use this decision table:

Symptom	What to inspect	Better first action
Old data rarely queried	Saved queries, dashboards, workbooks, alert rules, and incident notes	Shorten hot retention or archive after the required lookback
New data arrives too fast	Diagnostic categories, app log level, sampling, Kubernetes namespaces, and noisy fields	Reduce collection or sampling before retention changes
Queries are expensive or slow	Time filters, table filters, joins, and workbook defaults	Narrow query windows and table selection
Workspace has unclear ownership	Tags, RBAC groups, resource group owner, linked services, and cost center	Assign an owner before changing retention
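The decision table can be read as a priority order. The sketch below encodes one such order under stated assumptions: ownership is checked first and ingestion before retention, as the table's "before" wording suggests, while the signal names and the 1.5x growth and 30-day staleness thresholds are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StreamSignals:
    """Hypothetical signals gathered during review; field names are illustrative."""
    has_owner: bool
    days_since_last_read: int  # from saved queries, workbooks, alerts, incident notes
    ingest_growth: float       # e.g. this month's volume divided by last month's
    wide_query_windows: bool   # dashboards or alerts scanning more time than they need


def first_action(s: StreamSignals, growth_limit: float = 1.5, stale_days: int = 30) -> str:
    # Ownership first: nobody can accept retention risk for an unowned stream.
    if not s.has_owner:
        return "assign an owner before changing retention"
    # Ingestion before retention: shortening retention on a runaway stream hides the cause.
    if s.ingest_growth >= growth_limit:
        return "reduce collection or sampling at the producer"
    if s.wide_query_windows:
        return "narrow query windows and table selection"
    if s.days_since_last_read >= stale_days:
        return "shorten hot retention or archive after the required lookback"
    return "no change; keep monitoring"


if __name__ == "__main__":
    print(first_action(StreamSignals(True, 60, 1.1, False)))
```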

Do not rush to shorten retention on security or audit logs just because they are large. Security logs can be quiet until an investigation needs them. Likewise, do not rush changes to customer support logs when support has promised a longer lookback period than engineering remembers. Those commitments, not table size, should drive retention.

Choose A Safer Retention Change

Make the first change reversible and observable. For a table that appears over-retained, a staged path is usually safer than a direct purge:

Name the owner and the reason the table is collected.
Confirm the required lookback for incidents, customer support, audit, compliance, and product debugging.
Check recent queries, workbooks, alerts, and exports that read the table.
Reduce noisy producers where possible.
Shorten hot retention for the least risky table or environment first.
Watch for failed investigations, broken workbooks, alert gaps, and owner complaints.
Document the final retention rule and exception date.
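The staged steps above can be turned into a pre-change gate. This minimal sketch assumes a hypothetical `RetentionChange` request, takes the required lookback as the longest of the confirmed requirements, and only builds the `az monitor log-analytics workspace table update` command (with its `--retention-time` and `--total-retention-time` options) for review; it does not run it.

```python
import shlex
from dataclasses import dataclass, field


@dataclass
class RetentionChange:
    """Hypothetical change request for one table; all names are illustrative."""
    resource_group: str
    workspace: str
    table: str
    owner: str
    proposed_hot_days: int
    proposed_total_days: int
    lookbacks: dict[str, int] = field(default_factory=dict)  # e.g. {"incidents": 14, "audit": 90}
    consumers_checked: bool = False  # queries, workbooks, alerts, and exports reviewed


def blockers(c: RetentionChange) -> list[str]:
    """List reasons the change should not proceed yet; empty means the gate passes."""
    problems = []
    if not c.owner:
        problems.append("no owner named")
    if not c.lookbacks:
        problems.append("required lookback not confirmed")
    else:
        # Assumes every lookback can be served from total (hot + long-term) retention.
        # If a team needs fast queries, compare against proposed_hot_days instead.
        required = max(c.lookbacks.values())
        if c.proposed_total_days < required:
            problems.append(
                f"total retention {c.proposed_total_days}d is below required lookback {required}d"
            )
    if not c.consumers_checked:
        problems.append("consumers not checked")
    if c.proposed_total_days < c.proposed_hot_days:
        problems.append("total retention cannot be shorter than hot retention")
    return problems


def az_command(c: RetentionChange) -> str:
    """Build (but do not run) the table-level retention update for review."""
    args = [
        "az", "monitor", "log-analytics", "workspace", "table", "update",
        "--resource-group", c.resource_group,
        "--workspace-name", c.workspace,
        "--name", c.table,
        "--retention-time", str(c.proposed_hot_days),
        "--total-retention-time", str(c.proposed_total_days),
    ]
    return shlex.join(args)


if __name__ == "__main__":
    change = RetentionChange("rg-dev", "law-dev-aks", "ContainerLogV2", "platform-team",
                             7, 7, {"incidents": 7, "debugging": 3}, consumers_checked=True)
    issues = blockers(change)
    print(issues if issues else az_command(change))
```

Keeping the command as a reviewed string makes the exact change easy to record next to the retention rule and exception date.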

For dev, preview, and sandbox workspaces, the answer may be a short retention period plus better local debugging. For production security streams, the answer may be keeping longer retention but moving some data out of hot query paths. For compliance logs, the cleanup may be proving that the current retention is justified and should be budgeted explicitly.

Prevention: Make Retention Part Of Onboarding

Repeat waste usually starts when a new resource enables diagnostic settings without a retention decision. Fix the creation path:

Require each diagnostic setting to name a log owner, purpose, environment, and review date.
Keep a short approved-retention menu, such as dev diagnostics, production app logs, security events, and compliance exceptions.
Add a monthly review of top workspace tables by volume and unresolved owner.
Ask pull requests that add new logging categories to explain the investigation question those logs answer.
Give temporary migration and incident workspaces an expiry date at creation time.
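These onboarding rules are simple enough to check automatically, for example in a pull-request or provisioning pipeline. The sketch below assumes a hypothetical request dictionary; the menu keys mirror the approved-retention menu described above, and the day counts are placeholders, not recommendations.

```python
from datetime import date
from typing import Any

# Approved-retention menu; day counts are placeholders.
APPROVED_RETENTION = {
    "dev-diagnostics": 7,
    "prod-app-logs": 30,
    "security-events": 90,
    "compliance-exception": None,  # set per exception, with an owner and review date
}
REQUIRED_FIELDS = ("owner", "purpose", "environment", "review_date")


def validate_request(req: dict[str, Any], today: date) -> list[str]:
    """Return problems with a hypothetical diagnostic-setting onboarding request."""
    problems = [f"missing {name}" for name in REQUIRED_FIELDS if not req.get(name)]
    if req.get("retention_class") not in APPROVED_RETENTION:
        problems.append(f"retention class {req.get('retention_class')!r} is not on the approved menu")
    review = req.get("review_date")
    if isinstance(review, date) and review <= today:
        problems.append("review date must be in the future")
    if req.get("temporary") and not req.get("expiry_date"):
        problems.append("temporary workspace or stream needs an expiry date")
    return problems


if __name__ == "__main__":
    request = {"owner": "payments-sre", "purpose": "checkout API errors", "environment": "prod",
               "review_date": date(2030, 1, 1), "retention_class": "prod-app-logs"}
    print(validate_request(request, date.today()) or "request accepted")
```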

This is more effective than a yearly purge because it changes how logs enter Azure Monitor. New logs arrive with a reason to exist, and exceptions have an owner before they become permanent.

FAQ

What is the safest first Azure Log Analytics cleanup?

Rank workspaces and tables by volume, then repair ownership and retention intent before deleting anything. The safest first technical change is often reducing noisy non-production logging or shortening retention for a clearly temporary stream.

How long should Azure Log Analytics retention be?

Use the period your team must support for incidents, audits, customer support, and compliance. Different tables can justify different periods. A global default is easy to manage but often too expensive or too risky.

Should old logs be archived instead of deleted?

Archive when the logs have real investigation, compliance, or customer support value but do not need fast query access. Do not archive data just to avoid deciding; archive exceptions should still have owners and review dates.

What should not be rushed?

Slow down on security events, access logs, payment or customer-impact timelines, legal-retention records, and anything used by Sentinel, Defender, alert rules, workbooks, or incident response. These logs can look unused until the day they matter.

Summary

Azure Log Analytics cleanup should produce a retention matrix, not a blanket deletion. Inventory workspaces, rank high-volume tables, separate retention from ingestion and query waste, and make table-specific decisions with owners. The durable fix is to make retention a required part of every new diagnostic setting so expensive defaults do not become permanent.