The System Dashboard That Actually Works Across Your Entire Fleet
With countless system dashboards already available for Grafana (Windows dashboards, Linux dashboards, and variations on the themes of CPU usage, memory consumption, disk space, network traffic, etc.), why build another one?
At first, building another system metrics dashboard felt unnecessary, and we were as skeptical as anyone. But once we evaluated the existing options against the realities of production monitoring, the conclusion was unavoidable: we had to build another one.
Let me explain why…
Dashboard Fatigue Is Real and Deserved
Most observability teams suffer from dashboard fatigue. Instead of improving over time, dashboards become overly complex (more panels, more metrics, more variations, more rows) without delivering meaningful value to the team.
The problem is not the lack of system metrics. The problem is that most dashboards are metric-first, not operational-first. They show what is available, not what is useful.
During an incident, when a system is on fire, operators rarely ask:
“What is the exact CPU utilization at second-level granularity?”
They ask instead:
- Which systems are unhealthy?
- How severe is the issue?
- Is this isolated or systemic?
- Where should I look next?
Many existing dashboards struggle to answer these questions quickly.
Why Build Another One (or… Two?)
We did not build new system dashboards to provide more metrics, better PromQL, or prettier charts, although we’re not unhappy about the last part.
We built them because we identified a fundamental gap in the observability ecosystem. Most system dashboards are:
- OS-specific (Windows or Linux, rarely both)
- Metric-centric rather than decision-centric
- Unsuitable for heterogeneous enterprise environments
Our goal was simple: design dashboards that help operators make decisions quickly, across all the systems they actually run.
Jurassic Stack: Legacy Systems That Refuse to Go Extinct
The Grafana ecosystem has historically paid limited attention to platforms such as IBM AIX and HP-UX.
These platforms may not be fashionable, but try telling that to:
- Banks running core transaction processing on IBM AIX
- Manufacturers running production-critical industrial control systems on HP-UX
- Insurance companies whose decades-old policy systems are too risky to migrate
In practice, these systems are often more critical than newer cloud-native workloads. They just don’t get written about on tech blogs.
Yet most dashboards implicitly assume a homogeneous Linux or Windows environment.
We deliberately chose a different path: one dashboard model that works across Windows, Linux, IBM AIX, and HP-UX.
The Technical Foundation: Built for Longevity
The architecture is deliberately straightforward:
- MetricsHub collects metrics across Windows, Linux, IBM AIX, and HP-UX
- OpenTelemetry ensures vendor neutrality and future-proofing
- Prometheus provides reliable, scalable storage
- Grafana delivers visualization and alerting
What matters is the outcome: a normalized metric model across all platforms.
You get the same operational experience whether you’re troubleshooting a Windows Server 2025 box or an IBM Power 750 running AIX 6.1.
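To make the idea of a normalized metric model more concrete, here is a minimal Python sketch. It is not how MetricsHub works internally: the raw field names (`pct_processor_time`, `cpu_idle_pct`, `cpu_busy_pct`) and the `normalize` helper are hypothetical, invented purely for illustration. The target name `system.cpu.utilization` is borrowed from the OpenTelemetry system semantic conventions, where utilization is expressed as a ratio between 0 and 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    host: str
    os_type: str   # "windows", "linux", "aix" or "hpux"
    name: str      # normalized metric name
    value: float   # utilization as a ratio between 0 and 1


# Hypothetical raw field names and conversions per platform.
# Real collectors expose different names; these are assumptions.
RAW_CPU_FIELDS = {
    "windows": ("pct_processor_time", lambda v: v / 100.0),  # busy percent
    "linux": ("cpu_idle_pct", lambda v: 1.0 - v / 100.0),    # idle percent
    "aix": ("cpu_busy_pct", lambda v: v / 100.0),            # busy percent
    "hpux": ("cpu_idle_pct", lambda v: 1.0 - v / 100.0),     # idle percent
}


def normalize(host: str, os_type: str, raw: dict) -> Sample:
    """Map a platform-specific CPU reading onto one shared metric."""
    field, convert = RAW_CPU_FIELDS[os_type]
    value = round(convert(raw[field]), 4)
    return Sample(host, os_type, "system.cpu.utilization", value)


if __name__ == "__main__":
    print(normalize("win01", "windows", {"pct_processor_time": 75.0}))
    print(normalize("aix01", "aix", {"cpu_busy_pct": 75.0}))
```

Once every platform reports the same metric name and unit, a single panel or alert rule can serve the whole fleet, with the operating system reduced to a label.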
Two Dashboards, Two Fundamental Questions
Rather than trying to do everything in a single view, we deliberately separated concerns into two dashboards, each designed to answer a specific operational question.
The Overview Dashboard: “Is my fleet healthy?”
The Overview dashboard provides immediate situational awareness across all monitored systems.
At a glance, it answers:
- How many hosts are unhealthy?
- How many issues are critical?
- Are problems concentrated on a specific OS family?
- Which systems require immediate attention?
The design is alert-driven rather than metric-driven. Problems are surfaced first; details follow.
Aggregation by host and operating system helps identify patterns quickly, without forcing users to interpret dozens of charts.
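As a rough illustration of this alert-first aggregation, the Python sketch below derives the four answers from a flat list of active alerts. The `Alert` shape, the two severity levels, and the ranking rule (most critical alerts first, then most alerts overall) are assumptions made for the example, not a description of the dashboard's actual queries.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    host: str
    os_type: str
    severity: str  # assumed levels: "critical" or "warning"


def fleet_summary(alerts):
    """Summarize active alerts into fleet-level answers."""
    unhealthy_hosts = {a.host for a in alerts}
    critical = [a for a in alerts if a.severity == "critical"]

    # Count each unhealthy host once per OS family.
    host_os_pairs = {(a.host, a.os_type) for a in alerts}
    by_os = Counter(os_type for _, os_type in host_os_pairs)

    # Rank hosts: most critical alerts first, then most alerts overall.
    crit_per_host = Counter(a.host for a in critical)
    total_per_host = Counter(a.host for a in alerts)
    ranked = sorted(
        unhealthy_hosts,
        key=lambda h: (-crit_per_host[h], -total_per_host[h], h),
    )

    return {
        "unhealthy_hosts": len(unhealthy_hosts),
        "critical_issues": len(critical),
        "unhealthy_by_os": dict(by_os),
        "attention_order": ranked,
    }


if __name__ == "__main__":
    active = [
        Alert("aix01", "aix", "critical"),
        Alert("aix01", "aix", "warning"),
        Alert("aix02", "aix", "warning"),
        Alert("win01", "windows", "critical"),
    ]
    print(fleet_summary(active))
```

The point of the sketch is the ordering of concerns: problems are counted and ranked first, and no raw metric needs to be read to decide where to look next.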
The Host Dashboard: “Why is this system unhealthy?”
The host-level dashboard is designed for fast diagnosis, not exhaustive analysis.
It focuses on a carefully curated set of metrics:
- CPU utilization: Is the host overloaded?
- Memory usage: Are we approaching out-of-memory conditions?
- Filesystem capacity and utilization: Are any filesystems close to full?
- Network health: Are we dropping packets?
- System uptime: Did this just reboot unexpectedly?
- Active alerts in context: Which alerts are firing alongside these symptoms?
There are no decorative charts. No nice-to-have metrics.
The goal is to reduce cognitive load and enable fast correlation. During an incident, speed matters more than completeness.
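The curated checks above can be sketched as a small triage function. This is only an illustration in Python: the `HostSnapshot` fields, the threshold values, and the finding labels are all hypothetical, and sensible thresholds depend on the workload.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HostSnapshot:
    cpu_utilization: float     # ratio between 0 and 1
    memory_utilization: float  # ratio between 0 and 1
    fs_utilization: dict       # mount point -> ratio between 0 and 1
    dropped_packets: int       # drops seen during the sampling window
    uptime_seconds: float


# Illustrative thresholds only; tune them for your environment.
CPU_MAX = 0.90
MEMORY_MAX = 0.90
FS_MAX = 0.85
RECENT_REBOOT_SECONDS = 600


def diagnose(snapshot: HostSnapshot) -> list:
    """Return a short list of findings for one host."""
    findings = []
    if snapshot.cpu_utilization > CPU_MAX:
        findings.append("cpu overloaded")
    if snapshot.memory_utilization > MEMORY_MAX:
        findings.append("memory pressure")
    for mount, used in sorted(snapshot.fs_utilization.items()):
        if used > FS_MAX:
            findings.append(f"filesystem {mount} nearly full")
    if snapshot.dropped_packets > 0:
        findings.append("dropping packets")
    if snapshot.uptime_seconds < RECENT_REBOOT_SECONDS:
        findings.append("recent reboot")
    return findings


if __name__ == "__main__":
    snap = HostSnapshot(0.95, 0.50, {"/": 0.40, "/var": 0.92}, 0, 120)
    print(diagnose(snap))
```

A handful of yes/no findings like these is what the host view aims to make visible at a glance, so the operator can correlate them with the active alerts.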
Why These Dashboards Stand Out
What makes these dashboards different is not a single feature, but a combination of deliberate design choices:
- Unified monitoring across modern and legacy operating systems: The only Grafana dashboards covering IBM AIX, HP-UX, Windows, and Linux without distinction
- Normalized metrics via MetricsHub + OpenTelemetry: One data model, regardless of platform
- Clear separation of concerns: Fleet awareness vs. host diagnosis. Different questions, different dashboards
- Alert-first, operationally focused design: Built for troubleshooting
- Thoughtful panel selection aimed at real-world usage: Every visualization earns its place
The result: dashboards that reflect how systems are actually monitored, not just what metrics are available to collect.
Why This Matters for Grafana
Grafana’s strength has always been its ability to turn complex data into actionable insight.
These dashboards embrace that philosophy by prioritizing clarity, consistency, and operational relevance.
They show that:
- Observability does not stop at cloud-native workloads
- Legacy systems deserve first-class visibility
- Good dashboards are designed around decisions, not metrics
Sometimes, the most valuable dashboards are not the most complex ones, but the ones that help you understand what is happening and what to do next, as quickly as possible.
That is why we built one… well, two.