Azure Log Analytics Cleanup: Reduce Retention Waste
Azure Log Analytics workspaces often start as a shared place to send diagnostic settings, Kubernetes logs, Application Insights data, firewall logs, and platform events. The waste appears later, when every source keeps the same retention period even though each one has a different investigation value. A workspace can look harmless in the portal while a small number of verbose tables quietly dominate ingestion and retention cost.
Azure Log Analytics cleanup is the work of setting retention by evidence instead of habit. The goal is not to delete logs aggressively. The goal is to decide which tables need hot query access, which logs should move to cheaper archive retention, which producers should reduce noise, and which streams should stop being collected.
This guide is for platform, SRE, security, and FinOps teams reviewing Azure Monitor costs. By the end, you should have a workspace retention matrix, a list of noisy tables, owner-approved exceptions, and a creation rule that prevents every new diagnostic setting from inheriting an expensive default.
Start With Workspaces, Not Log Lines
Do not begin by arguing over a single retention number. Start by listing workspaces and the reason each workspace exists. A production security workspace, a dev cluster workspace, and a temporary migration workspace should not have the same cleanup path.
Use a read-only Azure CLI inventory to find obvious review candidates:
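```bash
# Read-only: lists workspaces with their workspace-level retention setting.
# Table output may omit nested values such as tags; use --output json to see them.
az monitor log-analytics workspace list \
  --query "[].{name:name,resourceGroup:resourceGroup,retentionInDays:retentionInDays,sku:sku.name,location:location,tags:tags}" \
  --output table
```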
This output shows workspace-level retention and ownership clues. It does not prove that old logs are unused, and it does not show table-level ingest. Treat it as the first routing list for the review.
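To turn that listing into a routing list, a small script can flag workspaces that lack an owner or exceed a retention threshold. The sketch below assumes the same command was run with `--output json` and the same `--query` projection. The `owner` tag key, the 90-day threshold, and the sample workspace names are hypothetical and should be replaced with your own conventions.

```python
import json
from typing import Any

# Hypothetical review policy; neither value is an Azure default.
MAX_RETENTION_DAYS = 90
OWNER_TAG = "owner"


def review_candidates(workspaces: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return workspaces that need an owner or a retention review."""
    candidates = []
    for ws in workspaces:
        tags = ws.get("tags") or {}  # tags can be null in the CLI output
        reasons = []
        if not tags.get(OWNER_TAG):
            reasons.append("missing owner tag")
        retention = ws.get("retentionInDays")
        if retention is not None and retention > MAX_RETENTION_DAYS:
            reasons.append(f"retention {retention}d exceeds {MAX_RETENTION_DAYS}d")
        if reasons:
            candidates.append({"name": ws.get("name"), "reasons": reasons})
    return candidates


# Made-up sample shaped like the projected CLI output.
SAMPLE_JSON = """
[
  {"name": "law-sec-prod", "resourceGroup": "rg-sec", "retentionInDays": 365,
   "sku": "PerGB2018", "location": "westeurope", "tags": {"owner": "secops"}},
  {"name": "law-dev-aks", "resourceGroup": "rg-dev", "retentionInDays": 30,
   "sku": "PerGB2018", "location": "westeurope", "tags": null},
  {"name": "law-app-prod", "resourceGroup": "rg-app", "retentionInDays": 30,
   "sku": "PerGB2018", "location": "westeurope", "tags": {"owner": "payments"}}
]
"""

workspaces = json.loads(SAMPLE_JSON)
for candidate in review_candidates(workspaces):
    print(candidate["name"], "->", "; ".join(candidate["reasons"]))
```

A flagged workspace is only a review candidate. Long retention on a security workspace may be fully justified once the owner confirms the lookback requirement.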
Build a workspace inventory with these fields:
| Field | Azure-specific evidence to capture | Why it changes the cleanup decision |
|---|---|---|
| Workspace purpose | Security operations, app diagnostics, AKS logs, migration, sandbox, or shared platform telemetry | Determines whether old logs are compliance evidence, debugging context, or disposable noise |
| Current retention | Workspace retention plus any table-level retention exceptions | Separates default policy from deliberate exceptions |
| Top data types | Usage by table, solution, or diagnostic source | Shows whether one source creates most of the bill |
| Connected resources | Diagnostic settings, Application Insights links, Defender, Sentinel, AKS, firewalls, and subscriptions | Prevents shortening retention for logs another team depends on |
| Owner and reviewer | Team tag, resource group owner, workspace access group, or runbook | Gives the cleanup decision a person who can accept risk |
The first cleanup win is often not deletion. It is discovering that non-production resources send verbose diagnostics to a production workspace, or that a retired service still streams logs because its diagnostic setting outlived the app.
Find The Tables That Drive Retention Cost
Workspace retention is too blunt to review alone. One table may contain high-value security events, while another table contains chatty debug logs from a test environment. The cleanup decision should be made at the table or source level whenever the workspace mixes workloads.
In the Log Analytics query experience, use the workspace’s usage data to rank high-volume tables:
```kusto
// Quantity in the Usage table is reported in MB.
Usage
| where TimeGenerated > ago(30d)
| summarize TotalQuantity = sum(Quantity) by DataType, Solution
| order by TotalQuantity desc
```
This query helps identify the largest data types over the last 30 days. It is not a billing invoice and it does not prove whether a table is valuable; it points the review toward the streams that deserve owner attention first.
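Once the query results are exported, it helps to know how few tables account for most of the volume. The following sketch, which assumes rows of `(DataType, quantity in MB)` copied from the query output, returns the smallest set of top tables that covers a chosen share of the total. The 80% coverage target and the sample volumes are illustrative assumptions, not measurements.

```python
from typing import Iterable


def top_tables(rows: Iterable[tuple[str, float]], coverage: float = 0.8):
    """Return (table, volume_gb, cumulative_share) for the largest tables
    until their combined volume reaches `coverage` of the total."""
    totals: dict[str, float] = {}
    for data_type, quantity_mb in rows:
        totals[data_type] = totals.get(data_type, 0.0) + quantity_mb

    grand_total = sum(totals.values())
    if grand_total == 0:
        return []

    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    result = []
    running = 0.0
    for data_type, quantity_mb in ranked:
        running += quantity_mb
        share = running / grand_total
        result.append((data_type, round(quantity_mb / 1024, 2), round(share, 3)))
        if share >= coverage:
            break
    return result


# Made-up 30-day volumes in MB.
sample_rows = [
    ("ContainerLogV2", 61440.0),
    ("AzureDiagnostics", 25600.0),
    ("AppTraces", 10240.0),
    ("Heartbeat", 5120.0),
]

for table, gb, cumulative in top_tables(sample_rows):
    print(f"{table}: {gb} GB, cumulative share {cumulative:.0%}")
```

The short list this produces is where owner conversations should start; the remaining tables can usually wait for a later review cycle.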
Look for these patterns before changing retention:
| Pattern | Evidence check | Good cleanup move |
|---|---|---|
| Debug logs from non-production apps | App environment tags, deployment slots, diagnostic categories, and recent incident references | Shorten hot retention and reduce log level at the producer |
| AKS container logs with high churn | Namespace, pod, and cluster ownership; restart loops; verbose sidecars | Fix noisy workloads before moving retention around |
| Firewall or gateway logs | Security use cases, allowlist investigations, incident timelines, and export requirements | Keep approved security lookback; archive only after the owner signs off |
| Duplicate diagnostic sinks | Same resource streaming to multiple workspaces, Event Hubs, and storage accounts | Remove duplicate collection after confirming consumers |
| Temporary migration logs | Migration ticket, cutover date, and post-cutover support window | Set a specific expiry date and review owner |
The most useful review is table-specific: “ContainerLogV2 from dev AKS namespaces needs seven hot days and no archive” is better than “reduce Log Analytics retention.”
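One way to keep decisions at that level of detail is to record each one as a row in a retention matrix. The sketch below is a minimal in-memory version; the field names, owners, and the second rule's numbers are hypothetical, and only the `ContainerLogV2` example comes from the text above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RetentionRule:
    table: str
    scope: str         # which producers the rule covers
    hot_days: int      # days of fast query access
    archive_days: int  # extra days kept after the hot period; 0 means none
    owner: str         # person or team who accepted the decision


def find_rule(matrix: list[RetentionRule], table: str, scope: str) -> Optional[RetentionRule]:
    """Return the rule for a table and scope, or None if nobody decided yet."""
    for rule in matrix:
        if rule.table == table and rule.scope == scope:
            return rule
    return None


def describe(rule: RetentionRule) -> str:
    archive = f"{rule.archive_days} archive days" if rule.archive_days else "no archive"
    return (f"{rule.table} from {rule.scope} needs {rule.hot_days} hot days "
            f"and {archive} (owner: {rule.owner})")


# Hypothetical matrix entries for illustration.
matrix = [
    RetentionRule("ContainerLogV2", "dev AKS namespaces", 7, 0, "platform-team"),
    RetentionRule("AzureDiagnostics", "prod firewalls", 90, 275, "security-operations"),
]

rule = find_rule(matrix, "ContainerLogV2", "dev AKS namespaces")
if rule is not None:
    print(describe(rule))
```

A missing rule is useful information in itself: a table and scope with no entry has no agreed retention and no accountable owner.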
Separate Retention, Ingestion, And Query Waste
Teams sometimes treat Log Analytics cleanup as a storage-only task. That misses two other sources of waste: collecting logs nobody needs and running broad queries over long windows. A cleaner workspace usually needs changes in all three places.
Retention cleanup asks how long logs should remain queryable. Ingestion cleanup asks whether the logs should be collected at all. Query cleanup asks whether dashboards, alerts, and investigations scan more data than they need.
Use this decision table:
| Symptom | What to inspect | Better first action |
|---|---|---|
| Old data rarely queried | Saved queries, dashboards, workbooks, alert rules, and incident notes | Shorten hot retention or archive after the required lookback |
| New data arrives too fast | Diagnostic categories, app log level, sampling, Kubernetes namespaces, and noisy fields | Reduce collection or sampling before retention changes |
| Queries are expensive or slow | Time filters, table filters, joins, and workbook defaults | Narrow query windows and table selection |
| Workspace has unclear ownership | Tags, RBAC groups, resource group owner, linked services, and cost center | Assign an owner before changing retention |
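The effect of narrowing a query is easy to estimate with simple arithmetic. As a rough sketch, assuming each table ingests a steady daily volume, the data a query has to consider grows with the number of tables and the number of days in its window. The daily volumes below are invented, and the result is an upper bound rather than a measured scan size, because filters can reduce the data actually read.

```python
def estimated_scan_gb(daily_gb_by_table: dict[str, float],
                      tables: list[str],
                      window_days: int) -> float:
    """Upper-bound estimate: daily volume of the selected tables times the window."""
    return sum(daily_gb_by_table[name] for name in tables) * window_days


# Made-up daily ingestion per table, in GB.
daily_gb = {"ContainerLogV2": 2.0, "AzureDiagnostics": 0.5, "AppTraces": 0.25}

# A workbook default that reads every table over 30 days.
broad = estimated_scan_gb(daily_gb, list(daily_gb), window_days=30)

# The same investigation limited to one table and the last day.
narrow = estimated_scan_gb(daily_gb, ["AppTraces"], window_days=1)

print(f"broad: {broad} GB, narrow: {narrow} GB")
```

The gap between the two numbers is the reason to review workbook defaults and alert query windows alongside retention settings.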
Do not rush changes to security or audit logs just because they are large. Security logs can sit unqueried until an investigation needs them. Be equally careful with customer support logs when support has promised a longer lookback period than engineering remembers. Those commitments should drive retention.
Choose A Safer Retention Change
Make the first change reversible and observable. For a table that appears over-retained, a staged path is usually safer than a direct purge:
- Name the owner and the reason the table is collected.
- Confirm the required lookback for incidents, customer support, audit, compliance, and product debugging.
- Check recent queries, workbooks, alerts, and exports that read the table.
- Reduce noisy producers where possible.
- Shorten hot retention for the least risky table or environment first.
- Watch for failed investigations, broken workbooks, alert gaps, and owner complaints.
- Document the final retention rule and exception date.
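The steps above can be sketched as a pre-change gate that blocks any retention change with missing evidence and orders the rest from least to most risky. Everything here is an assumption for illustration: the field names, the environment risk ranking, and the sample changes are hypothetical, and the code does not call Azure.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical ranking; lower numbers are changed first.
RISK_ORDER = {"sandbox": 0, "dev": 1, "preview": 2, "prod": 3}


@dataclass
class RetentionChange:
    table: str
    environment: str
    proposed_hot_days: int
    owner: str = ""
    required_hot_lookback_days: Optional[int] = None  # None means not confirmed
    consumers_checked: bool = False  # queries, workbooks, alerts, exports


def blockers(change: RetentionChange) -> list[str]:
    """List the evidence still missing before this change is safe to apply."""
    problems = []
    if not change.owner:
        problems.append("no named owner")
    if change.required_hot_lookback_days is None:
        problems.append("required lookback not confirmed")
    elif change.proposed_hot_days < change.required_hot_lookback_days:
        problems.append("proposed retention is shorter than required lookback")
    if not change.consumers_checked:
        problems.append("consumers not checked")
    return problems


def rollout_order(changes: list[RetentionChange]) -> list[RetentionChange]:
    """Return unblocked changes, least risky environment first."""
    ready = [c for c in changes if not blockers(c)]
    return sorted(ready, key=lambda c: RISK_ORDER.get(c.environment, len(RISK_ORDER)))


changes = [
    RetentionChange("AppTraces", "prod", 30, "payments", 30, True),
    RetentionChange("ContainerLogV2", "dev", 7, "platform-team", 7, True),
    RetentionChange("AzureDiagnostics", "prod", 30, "", None, False),
]

for change in rollout_order(changes):
    print("apply:", change.table, change.environment)
for change in changes:
    if blockers(change):
        print("blocked:", change.table, "->", ", ".join(blockers(change)))
```

Keeping blocked changes visible, rather than silently dropping them, gives the review a list of evidence gaps to close before the next round.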
For dev, preview, and sandbox workspaces, the answer may be a short retention period plus better local debugging. For production security streams, the answer may be keeping longer retention but moving some data out of hot query paths. For compliance logs, the cleanup may be proving that the current retention is justified and should be budgeted explicitly.
Prevention: Make Retention Part Of Onboarding
The repeat waste usually starts when a new resource enables diagnostic settings without a retention decision. Fix the creation path:
- Require each diagnostic setting to name a log owner, purpose, environment, and review date.
- Keep a short approved-retention menu, such as dev diagnostics, production app logs, security events, and compliance exceptions.
- Add a monthly review of the top workspace tables by volume and of any tables that still have no confirmed owner.
- Require pull requests that add new logging categories to explain the investigation question those logs answer.
- Give temporary migration and incident workspaces an expiry date at creation time.
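These onboarding rules can be enforced with a small validation step in whatever pipeline creates diagnostic settings. The sketch below assumes a request is a plain dictionary. The field names, the profile keys, and the day counts in the menu are hypothetical placeholders, not Azure settings or recommended values.

```python
from datetime import date

# Hypothetical approved-retention menu: profile name -> hot retention in days.
APPROVED_RETENTION_MENU = {
    "dev-diagnostics": 7,
    "production-app-logs": 30,
    "security-events": 180,
    "compliance-exception": 365,
}

REQUIRED_FIELDS = ("owner", "purpose", "environment", "review_date", "retention_profile")


def validate_request(request: dict, today: date) -> list[str]:
    """Return the reasons a new diagnostic setting request should be rejected."""
    errors = [f"missing {name}" for name in REQUIRED_FIELDS if not request.get(name)]

    profile = request.get("retention_profile")
    if profile and profile not in APPROVED_RETENTION_MENU:
        errors.append(f"unknown retention profile: {profile}")

    review_date = request.get("review_date")
    if review_date and review_date <= today:
        errors.append("review_date must be in the future")

    # Temporary migration and incident workspaces need an expiry at creation time.
    if request.get("temporary") and not request.get("expiry_date"):
        errors.append("temporary workspace needs expiry_date")
    return errors


today = date(2025, 1, 15)  # fixed date so the example is reproducible

good = {
    "owner": "platform-team",
    "purpose": "AKS troubleshooting",
    "environment": "dev",
    "review_date": date(2025, 7, 1),
    "retention_profile": "dev-diagnostics",
}
bad = {"owner": "migration-squad", "environment": "prod",
       "retention_profile": "forever", "temporary": True}

print(validate_request(good, today))
print(validate_request(bad, today))
```

Rejecting a request with a clear list of reasons keeps the check cheap for the requester and makes the retention decision explicit at the moment logging is enabled.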
This is more effective than a yearly purge because it changes how logs enter Azure Monitor. New logs arrive with a reason to exist, and exceptions have an owner before they become permanent.
FAQ
What is the safest first Azure Log Analytics cleanup?
Rank workspaces and tables by volume, then repair ownership and retention intent before deleting anything. The safest first technical change is often reducing noisy non-production logging or shortening retention for a clearly temporary stream.
How long should Azure Log Analytics retention be?
Use the longest lookback your team must support for incidents, audits, customer support, and compliance. Different tables can justify different periods. A global default is easy to manage but is often too expensive for noisy tables or too short for high-value ones.
Should old logs be archived instead of deleted?
Archive when the logs have real investigation, compliance, or customer support value but do not need fast query access. Do not archive data just to avoid deciding; archive exceptions should still have owners and review dates.
What should not be rushed?
Slow down on security events, access logs, payment or customer-impact timelines, legal-retention records, and anything used by Sentinel, Defender, alert rules, workbooks, or incident response. These logs can look unused until the day they matter.
Summary
Azure Log Analytics cleanup should produce a retention matrix, not a blanket deletion. Inventory workspaces, rank high-volume tables, separate retention from ingestion and query waste, and make table-specific decisions with owners. The durable fix is to make retention a required part of every new diagnostic setting so expensive defaults do not become permanent.