Simple Log Aggregation with Azure Metrics Aggregator
I've been pushing to move our monitoring focus from infrastructure related alerts to more of a Service Level Objective minded approach. Our projects run on Azure, and we use the Azure native tooling to monitor and send alerts when thresholds are reached.
Azure Monitor provides a few types of alerts we can use, and of these the Log Search Alert is the most powerful. My use case however is a bit hard to implement with just the application's own data, as there is a limit of how much data can be queried for the alert. If for example I wanted to make a generic alert of If we have used more than x% of our error budget within the last 10 minutes, trigger the alert
, I couldn't really calculate the error budget in real time for the whole 28 day period we usually use for calculating it. I believe the limit is something like 2 days of data.
So I figured a simple solution is just to create a tool (AMAG) that can be used to run arbitary KQL queries (with certain limitations) against a Log Analytics workspace (and even a resource in case of metrics). The resulting aggregation will then be saved as a metric value or a log value in a custom table inside Log Analytics.
This allows me to schedule some aggregation jobs wherever I want, like an Azure Pipeline which save aggregated numbers each run. Then my alerts just compare the current status against those aggregates, making it easy to check how much error budget has been spent so far in the alerting logic.
There are some prerequisites setting this up, but the repo will get you started.
Check it out below! If you find any issues, create a issue in the GH repo, or even better, create a PR.