Aggregating Metrics with Grafana and InfluxDB

Eoin McCartan

Introduction

Hello, I’m Eoin, a Placement Software Engineer at Hamilton Robson. Since joining the Belfast team this summer, part of my varied role has involved developing a partially managed cloud solution. This solution integrates AWS Managed Grafana and InfluxDB with several other AWS services to visualise real-time data from IoT devices.
If you’re an engineer working with Grafana and InfluxDB and trying to aggregate data at a large scale, this post walks through a solution that is quick and simple to implement.
While building a Grafana dashboard to visualise the time-series data, I ran into multiple issues when displaying the aggregated time windows. In this technical blog I am going to walk through the errors I came across, and ultimately how I solved them!

Understanding Why Data Aggregation is Important

Why is data aggregation important? It’s quite straightforward: when data is displayed at large scale and across a long time range, it becomes difficult to pinpoint significant spikes or drops.

Example of Non-Aggregated Data

In the example below, I am displaying 7 days’ worth of data. As you can see, there are so many data points that they all blur into one line.

Example of Aggregated Data

My goal is that when a user views 7 days’ worth of data, the panel presents 50 data points: the queried range is split into 50 equal windows and each window is reduced to a single value (an average, for example), making patterns easier to read and analyse.

Common Data Aggregation Challenges

We are using the Flux v2.0 BETA language for writing our queries to InfluxDB.

1. Query Type Errors

Trying to aggregate data within a query proves to be more difficult than you would expect. Flux does not support comparisons or arithmetic on some value types, such as timestamps, without first converting them to a numeric type. For example, attempting to subtract one timestamp from another results in a type error.
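
As a minimal sketch of this type issue, the snippet below uses two literal timestamps (illustrative values, not taken from our data) in place of the Grafana time range variables. Each timestamp is converted to a nanosecond count with uint() before subtracting, and the result is converted back with duration(). The array.from() call at the end is only there so the script returns a table when run on its own.

```flux
import "array"

// Illustrative timestamps (assumptions) standing in for the dashboard time range.
start = 2024-01-01T00:00:00Z
stop = 2024-01-08T00:00:00Z

// stop - start    // rejected by Flux: time values cannot be subtracted directly

// Convert each timestamp to nanoseconds since the Unix epoch, then subtract.
// For these two values the difference is 604800000000000 ns (7 days).
elapsedNs = uint(v: stop) - uint(v: start)

// Convert the nanosecond count back into a duration value.
elapsed = duration(v: elapsedNs)

// Return the results as a one-row table so the script produces output.
array.from(rows: [{elapsedNs: elapsedNs, elapsed: string(v: elapsed)}])
```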

2. Query Performance Issues

As queries become more complex, their performance can slow down significantly or, in some cases, return no data at all. When dealing with data reported at intervals of 5 minutes or less, viewing data from 6 months, 1 year or even 2 years ago may become difficult.
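
To give a rough sense of scale, the sketch below counts the raw points a single series produces when it reports every 5 minutes. The interval and the day counts used for each range are assumptions for illustration, and array.from() is used only to return the figures as a table.

```flux
import "array"

// Assumed reporting interval: one point every 5 minutes.
pointsPerDay = 24 * 60 / 5 // 288 points per series per day

// Raw points per series for a few dashboard ranges (day counts are approximate).
array.from(
    rows: [
        {span: "7 days", points: pointsPerDay * 7}, // 2016
        {span: "6 months", points: pointsPerDay * 182}, // 52416
        {span: "1 year", points: pointsPerDay * 365}, // 105120
        {span: "2 years", points: pointsPerDay * 730}, // 210240
    ],
)
```

Every additional field or device multiplies these figures, which is why un-aggregated long ranges quickly become slow to query and hard to read.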

3. Limited Documentation

Documentation for the aggregateWindow function was extremely limited: there is no mention of how to perform conditional statements or arithmetic operations on the values.
It is only when you encounter errors, and research how to convert values to different types in Flux, that you stumble upon the Flux standard library.

Solution

InfluxDB Query

The simplest approach is to perform the aggregation directly within the query, sizing the window from the selected time range.
It’s also important to note that this query is completely dynamic: as you change the date range selector in Grafana, the panels automatically re-aggregate when the new data is queried.

from(bucket: "sensorData")
    |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
    |> filter(fn: (r) => r["_measurement"] == "exampleSensorData" and (r["_field"] == "temperature" or r["_field"] == "humidity"))
    |> aggregateWindow(every: duration(v: (uint(v: v.timeRangeStop) - uint(v: v.timeRangeStart)) / uint(v: 50)), fn: last)
    |> yield(name: "mean")

You could also use a conditional expression to choose the window size, for example:

|> aggregateWindow(every: if int(v: v.timeRangeStop) - int(v: v.timeRangeStart) < 60 * 60 * 1000000000 then 1m else 10m, fn: last, createEmpty: false)
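
The same idea can be extended to more than two tiers by chaining else if. The sketch below is a variation under stated assumptions: the thresholds (1 hour, 1 day) and the window sizes (1m, 10m, 1h) are illustrative, and rangeNs, hourNs, dayNs and window are names introduced here for readability.

```flux
// Length of the selected time range in nanoseconds.
rangeNs = int(v: v.timeRangeStop) - int(v: v.timeRangeStart)

// One hour and one day expressed in nanoseconds.
hourNs = 60 * 60 * 1000000000
dayNs = 24 * hourNs

// Illustrative tiers (assumptions): under 1 hour -> 1m, under 1 day -> 10m, otherwise 1h.
window = if rangeNs < hourNs then 1m else if rangeNs < dayNs then 10m else 1h

from(bucket: "sensorData")
    |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
    |> filter(fn: (r) => r["_measurement"] == "exampleSensorData" and r["_field"] == "temperature")
    // createEmpty: false drops windows that contain no data.
    |> aggregateWindow(every: window, fn: last, createEmpty: false)
```

Fixed tiers keep window boundaries on round intervals, at the cost of a point count that varies with the selected range.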

 

Query Breakdown

|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  • Restricts the data to the records within the selected time range.
  • v.timeRangeStart and v.timeRangeStop are dynamic variables provided by Grafana itself.
|> aggregateWindow(every: duration(v: (uint(v: v.timeRangeStop) - uint(v: v.timeRangeStart)) / uint(v: 50)), fn: last)
  • Aggregates the data into time windows; every: sets the size of each window, computed from the length of the selected time range.
  • Both timestamps are converted to nanoseconds with uint() so they can be subtracted. The result is divided by 50 to give 50 windows, and duration() converts it back to a duration. Change 50 to show more or fewer points.
  • fn: last selects the last value within each window; use fn: mean instead if you want the average of each window.
|> yield(name: "mean")
  • Outputs the result of the query under the name mean so the aggregated data can be referenced in the Grafana panel. The name is only a label and does not change which aggregate function is applied.
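
The window calculation can also be wrapped in a small helper so that the number of points becomes a single parameter. The sketch below is a variation rather than the exact query we run: windowFor is a hypothetical helper name, and it uses fn: mean (instead of fn: last) to plot the average of each window. It relies on the Grafana time range variables, so it is meant to be run from a Grafana panel.

```flux
// windowFor is a hypothetical helper: it splits the range between start and
// stop into `points` equal windows and returns the size of one window.
windowFor = (start, stop, points) =>
    duration(v: (uint(v: stop) - uint(v: start)) / uint(v: points))

from(bucket: "sensorData")
    |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
    |> filter(fn: (r) => r["_measurement"] == "exampleSensorData" and (r["_field"] == "temperature" or r["_field"] == "humidity"))
    // fn: mean averages each window; createEmpty: false skips windows with no data.
    |> aggregateWindow(
        every: windowFor(start: v.timeRangeStart, stop: v.timeRangeStop, points: 50),
        fn: mean,
        createEmpty: false,
    )
    |> yield(name: "mean")
```

For a 7-day range and 50 points, each window works out to 12,096 seconds, or 3 hours 21 minutes 36 seconds.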

Preview Example

If you found this post useful, make sure you follow us on LinkedIn and let us know what your experience with Grafana is!
