The System Dashboard That Actually Works Across Your Entire Fleet

With countless system dashboards already available for Grafana (Windows dashboards, Linux dashboards, and variations on the themes of CPU usage, memory consumption, disk space, network traffic, etc.), why build another one?

If you're skeptical, so were we. At first, building yet another system metrics dashboard felt unnecessary. But once we evaluated existing options against the realities of production monitoring, the decision became unavoidable: we had to build another one.

Let me explain why…

Dashboard Fatigue Is Real and Deserved

Most observability teams suffer from dashboard fatigue. Instead of improving over time, dashboards become overly complex (more panels, more metrics, more variations, more rows) without delivering meaningful value to the team.

The problem is not the lack of system metrics. The problem is that most dashboards are metric-first, not operational-first. They show what is available, not what is useful.

During an incident, when a system is on fire, operators rarely ask:

“What is the exact CPU utilization at second-level granularity?”

They ask instead:

  • Which systems are unhealthy?
  • How severe is the issue?
  • Is this isolated or systemic?
  • Where should I look next?

Many existing dashboards struggle to answer these questions quickly.

Why Build Another One (or… Two?)

We did not build new system dashboards to provide more metrics, better PromQL, or prettier charts, although we’re not unhappy about the last part.

We built them because we identified a fundamental gap in the observability ecosystem. Most system dashboards are:

  • OS-specific (Windows or Linux, rarely both)
  • Metric-centric rather than decision-centric
  • Unsuitable for heterogeneous enterprise environments

Our goal was simple: design dashboards that help operators make decisions quickly, across all the systems they actually run.

Jurassic Stack: Legacy Systems That Refuse to Go Extinct

The Grafana ecosystem has historically paid limited attention to platforms such as IBM AIX and HP-UX.

These platforms may not be fashionable, but try telling that to:

  • Banks running core transaction processing on IBM AIX
  • Manufacturers running production-critical industrial control systems on HP-UX
  • Insurance companies whose decades-old policy systems are too risky to migrate

In practice, these systems are often more critical than newer cloud-native workloads. They just don’t get written about on tech blogs.

Yet most dashboards implicitly assume a homogeneous Linux or Windows environment.

We deliberately chose a different path: one dashboard model that works across Windows, Linux, IBM AIX, and HP-UX.

The Technical Foundation: Built for Longevity

The architecture is deliberately straightforward:

  • MetricsHub collects metrics across Windows, Linux, IBM AIX, and HP-UX
  • OpenTelemetry ensures vendor neutrality and future-proofing
  • Prometheus provides reliable, scalable storage
  • Grafana delivers visualization and alerting
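One small but concrete step in this pipeline is metric naming. OpenTelemetry-style names use dots (for example `system.cpu.utilization`), while classic Prometheus metric names may only contain letters, digits, underscores, and colons. The Python sketch below shows only the character-sanitizing part of that translation, under that assumption. Real OpenTelemetry-to-Prometheus exporters may also append unit and `_total` suffixes, which this sketch deliberately ignores, and it is not MetricsHub's own code.

```python
import re

# Classic Prometheus metric names must match [a-zA-Z_:][a-zA-Z0-9_:]*.
_INVALID = re.compile(r"[^a-zA-Z0-9_:]")


def to_prometheus_name(otel_name: str) -> str:
    """Sanitize an OpenTelemetry-style metric name for Prometheus.

    Simplified sketch: replaces every invalid character with an underscore
    and prefixes a leading digit. Unit and `_total` suffixes that real
    exporters may add are intentionally not handled here.
    """
    name = _INVALID.sub("_", otel_name)
    if name and name[0].isdigit():
        name = "_" + name
    return name


if __name__ == "__main__":
    for n in ["system.cpu.utilization", "system.filesystem.usage", "system.uptime"]:
        print(n, "->", to_prometheus_name(n))
```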

What matters is the outcome: a normalized metric model across all platforms.

Get the same operational experience whether you’re troubleshooting a Windows Server 2025 box or an IBM Power 750 running AIX 6.1.
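To make "a normalized metric model" concrete, here is a minimal Python sketch assuming two hypothetical collectors: one reports CPU busy time as a percentage (0 to 100), the other reports the idle fraction (0 to 1). Both are converted into the same record, so a single rule works identically on every platform. The `CpuSample` record, the adapter functions, and the host names are illustrative assumptions, not MetricsHub's actual data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CpuSample:
    """Platform-neutral CPU reading (hypothetical normalized record)."""
    host: str
    os_type: str
    utilization: float  # busy ratio, always in [0.0, 1.0]


def from_busy_percent(host: str, os_type: str, busy_percent: float) -> CpuSample:
    # Hypothetical collector that reports busy time on a 0-100 scale.
    return CpuSample(host, os_type, busy_percent / 100.0)


def from_idle_ratio(host: str, os_type: str, idle_ratio: float) -> CpuSample:
    # Hypothetical collector that reports the idle fraction on a 0-1 scale.
    return CpuSample(host, os_type, 1.0 - idle_ratio)


def overloaded(samples: list[CpuSample], threshold: float = 0.9) -> list[str]:
    # One rule for the whole fleet, regardless of which platform produced the data.
    return [s.host for s in samples if s.utilization > threshold]


if __name__ == "__main__":
    fleet = [
        from_busy_percent("win-app-01", "windows", 95.0),
        from_idle_ratio("aix-core-01", "aix", 0.40),
        from_idle_ratio("hpux-plc-01", "hpux", 0.02),
        from_busy_percent("lnx-web-01", "linux", 35.0),
    ]
    print(overloaded(fleet))  # ['win-app-01', 'hpux-plc-01']
```

Once the conversion happens at collection time, every dashboard panel and alert rule downstream can stay OS-agnostic.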

Two Dashboards, Two Fundamental Questions

Rather than trying to do everything in a single view, we deliberately separated concerns into two dashboards, each designed to answer a specific operational question.

The Overview Dashboard: “Is my fleet healthy?”

The Overview dashboard provides immediate situational awareness across all monitored systems.

[Screenshot: System Performance Metrics - Overview (MetricsHub)]

At a glance, it answers:

  • How many hosts are unhealthy?

[Screenshot: Number of unhealthy hosts detected by MetricsHub]

  • How many issues are critical?

[Screenshot: Number of critical alerts]

  • Are problems concentrated on a specific OS family?

[Screenshot: Problems concentrated on a specific OS family]

  • Which systems require immediate attention?

[Screenshot: Systems requiring immediate attention]

The design is alert-driven rather than metric-driven. Problems are surfaced first; details follow.

Aggregation by host and operating system helps identify patterns quickly, without forcing users to interpret dozens of charts.
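The Overview questions boil down to a few aggregations over active alerts. The Python sketch below assumes each firing alert has already been reduced to a hypothetical `(host, os_type, severity)` tuple with "critical" and "warning" levels; it illustrates the logic only and is not the dashboard's actual query, which runs in Grafana and Prometheus.

```python
from collections import Counter, defaultdict
from typing import NamedTuple


class Alert(NamedTuple):
    host: str
    os_type: str
    severity: str  # hypothetical levels: "critical" or "warning"


def fleet_summary(alerts: list[Alert], top_n: int = 3) -> dict:
    """Summarize active alerts the way an alert-first overview would."""
    unhealthy_hosts = {a.host for a in alerts}
    critical = [a for a in alerts if a.severity == "critical"]

    # Are problems concentrated on one OS family? Count distinct hosts per OS.
    hosts_by_os: dict[str, set[str]] = defaultdict(set)
    for a in alerts:
        hosts_by_os[a.os_type].add(a.host)
    unhealthy_by_os = {os_type: len(h) for os_type, h in hosts_by_os.items()}

    # Which systems need attention first? Rank by critical count, then total.
    per_host_critical = Counter(a.host for a in critical)
    per_host_total = Counter(a.host for a in alerts)
    ranked = sorted(
        unhealthy_hosts,
        key=lambda h: (-per_host_critical[h], -per_host_total[h], h),
    )

    return {
        "unhealthy_hosts": len(unhealthy_hosts),
        "critical_issues": len(critical),
        "unhealthy_by_os": unhealthy_by_os,
        "attention_first": ranked[:top_n],
    }


if __name__ == "__main__":
    active = [
        Alert("aix-core-01", "aix", "critical"),
        Alert("aix-core-01", "aix", "warning"),
        Alert("aix-core-02", "aix", "critical"),
        Alert("win-app-01", "windows", "warning"),
    ]
    print(fleet_summary(active))
```

Ranking by critical count first reflects the alert-first design: a host with one critical alert outranks a host with several warnings.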

The Host Dashboard: “Why is this system unhealthy?”

The host-level dashboard is designed for fast diagnosis, not exhaustive analysis.

[Screenshot: System Performance Metrics (MetricsHub)]

It focuses on a carefully curated set of metrics:

  • CPU utilization: Is the host overloaded?

[Screenshot: System Performance Metrics (MetricsHub) - CPU utilization]

  • Memory usage: Are we approaching out-of-memory conditions?

[Screenshot: System Performance Metrics (MetricsHub) - Memory usage]

  • Filesystem capacity and utilization: Do we need capacity planning?

[Screenshot: System Performance Metrics (MetricsHub) - Filesystem capacity and utilization]

  • Network health: Are we dropping packets?

[Screenshot: System Performance Metrics (MetricsHub) - Network health]

  • System uptime: Did this just reboot unexpectedly?

[Screenshot: System Performance Metrics (MetricsHub) - System uptime]

  • Active alerts in context: Are there active alerts we should correlate with?

[Screenshot: System Performance Metrics (MetricsHub) - Active alerts]

There are no decorative charts. No nice-to-have metrics.

The goal is to reduce cognitive load and enable fast correlation. During an incident, speed matters more than completeness.
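Each curated panel maps to a yes/no question, so the diagnosis can be sketched as a function over two consecutive readings of one host. The field names and thresholds below (90% CPU and memory, 85% filesystem, 10 minutes of uptime) are illustrative assumptions, not MetricsHub or dashboard defaults. Two readings are needed because dropped packets are a cumulative counter and uptime resets on reboot; when the counter goes down, the sketch assumes it restarted from zero, which is the same assumption Prometheus makes for counter resets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HostReading:
    """Hypothetical snapshot of the curated host metrics."""
    cpu_utilization: float         # ratio 0-1
    memory_utilization: float      # ratio 0-1
    filesystem_utilization: float  # ratio 0-1 for the fullest filesystem
    dropped_packets_total: int     # cumulative counter since boot
    uptime_seconds: float


def diagnose(prev: HostReading, cur: HostReading) -> list[str]:
    """Answer the host dashboard's questions for two consecutive readings."""
    findings = []
    if cur.cpu_utilization > 0.90:
        findings.append("CPU: host is overloaded")
    if cur.memory_utilization > 0.90:
        findings.append("Memory: approaching out-of-memory conditions")
    if cur.filesystem_utilization > 0.85:
        findings.append("Filesystem: plan capacity")
    # Uptime going backwards, or a very small value, means a recent restart.
    if cur.uptime_seconds < prev.uptime_seconds or cur.uptime_seconds < 600:
        findings.append("Uptime: host rebooted recently")
    # If the counter went down, assume it was reset to zero in between.
    dropped = cur.dropped_packets_total
    if dropped >= prev.dropped_packets_total:
        dropped -= prev.dropped_packets_total
    if dropped > 0:
        findings.append(f"Network: {dropped} packets dropped since last reading")
    return findings


if __name__ == "__main__":
    before = HostReading(0.50, 0.70, 0.80, 1200, 86_400.0)
    after = HostReading(0.95, 0.72, 0.88, 1250, 86_460.0)
    for line in diagnose(before, after):
        print(line)
```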

Why These Dashboards Stand Out

What makes these dashboards different is not a single feature, but a combination of deliberate design choices:

  • Unified monitoring across modern and legacy operating systems
    The only Grafana dashboards covering IBM AIX, HP-UX, Windows, and Linux without distinction

  • Normalized metrics via MetricsHub + OpenTelemetry
    One data model, regardless of platform

  • Clear separation of concerns
    Fleet awareness vs. host diagnosis—different questions, different dashboards

  • Alert-first, operationally focused design
    Built for troubleshooting

  • Thoughtful panel selection aimed at real-world usage
    Every visualization earns its place

The result: dashboards that reflect how systems are actually monitored, not just what metrics are available to collect.

Why This Matters for Grafana

Grafana’s strength has always been its ability to turn complex data into actionable insight.

These dashboards embrace that philosophy by prioritizing clarity, consistency, and operational relevance.

They show that:

  • Observability does not stop at cloud-native workloads
  • Legacy systems deserve first-class visibility
  • Good dashboards are designed around decisions, not metrics

Sometimes, the most valuable dashboards are not the most complex ones, but the ones that help you understand what is happening and what to do next, as quickly as possible.

That is why we built one… well, two.