Degraded Performance
If you observe delays in data collection, missing data points, or timeouts, enable the self-monitoring feature as described in the Monitoring Configuration[1] page. This feature provides detailed metrics about job execution times, helping you identify inefficiencies such as misconfigurations, bottlenecks, or performance issues in specific components.
When self-monitoring is enabled, the metricshub.job.duration metric provides insights into task execution times. Key attributes include:
- job.type: The operation performed by MetricsHub. Possible values are:
  - discovery: Identifies and registers components.
  - collect: Gathers telemetry data from monitored components.
  - simple: Executes a straightforward task.
  - beforeAll or afterAll: Performs preparatory or cleanup operations.
- monitor.type: The component being monitored, such as:
  - Hardware metrics: cpu, memory, physical_disk, or disk_controller.
  - Environmental metrics: temperature or battery.
  - Logical entities: connector.
- connector_id: The unique identifier for the connector, such as HPEGen10IloREST for the HPE Gen10 iLO REST connector.
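To make these attributes concrete, here is a minimal Python sketch (illustrative only, not part of MetricsHub) that splits a single sample line, in the text form shown in the Example section further down, into its attributes and duration value:

import re

# Illustrative only: parse one metricshub.job.duration sample written in the
# text form shown in the Example section of this page.
SAMPLE = ('metricshub.job.duration{job.type="discovery", '
          'monitor.type="cpu", connector_id="HPEGen10IloREST"} 0.030')

def parse_job_duration(line: str) -> dict:
    """Return the attributes and the duration (in seconds) of a sample line."""
    match = re.match(r'metricshub\.job\.duration\{(?P<attrs>[^}]*)\}\s+(?P<value>[\d.]+)', line)
    if not match:
        raise ValueError(f"Unrecognized sample line: {line!r}")
    attributes = dict(re.findall(r'([\w.]+)="([^"]*)"', match.group("attrs")))
    attributes["duration_seconds"] = float(match.group("value"))
    return attributes

print(parse_job_duration(SAMPLE))
# {'job.type': 'discovery', 'monitor.type': 'cpu',
#  'connector_id': 'HPEGen10IloREST', 'duration_seconds': 0.03}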
These metrics can be viewed in Prometheus/Grafana or in the metricshub-agent-$resourceId-$timestamp.log file. Refer to the MetricsHub Log Files[2] page for details on locating and interpreting log files.
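If you scrape these metrics into Prometheus, you can also fetch them programmatically through the standard Prometheus HTTP API. The sketch below is a hedged example: it assumes Prometheus runs at http://localhost:9090 and that your exporter publishes the metric under a name such as metricshub_job_duration (dots in OpenTelemetry metric and attribute names are typically converted to underscores, and a unit suffix may be appended); adjust the query to the name your setup actually exposes.

import requests

# Illustrative only. Assumes Prometheus at PROMETHEUS_URL and a metric named
# "metricshub_job_duration"; your exporter may use a different name or suffix.
PROMETHEUS_URL = "http://localhost:9090"
QUERY = "metricshub_job_duration"

response = requests.get(f"{PROMETHEUS_URL}/api/v1/query",
                        params={"query": QUERY}, timeout=10)
response.raise_for_status()

for sample in response.json()["data"]["result"]:
    labels = sample["metric"]   # e.g. job_type, monitor_type, connector_id
    _, value = sample["value"]  # [unix timestamp, value as a string]
    print(f"{labels} -> {float(value):.3f} s")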
Example
Example of metrics emitted for the HPEGen10IloREST connector:
metricshub.job.duration{job.type="discovery", monitor.type="enclosure", connector_id="HPEGen10IloREST"} 0.020
metricshub.job.duration{job.type="discovery", monitor.type="cpu", connector_id="HPEGen10IloREST"} 0.030
metricshub.job.duration{job.type="discovery", monitor.type="temperature", connector_id="HPEGen10IloREST"} 0.025
metricshub.job.duration{job.type="discovery", monitor.type="connector", connector_id="HPEGen10IloREST"} 0.015
metricshub.job.duration{job.type="collect", monitor.type="cpu", connector_id="HPEGen10IloREST"} 0.015
In this example:
- During discovery:
  - The enclosure monitor takes 0.020 seconds.
  - The cpu monitor takes 0.030 seconds.
  - The temperature monitor takes 0.025 seconds.
  - The connector monitor takes 0.015 seconds.
- During collect, the cpu metrics collection takes 0.015 seconds.
These metrics indicate that MetricsHub is functioning as expected, with task durations well within acceptable ranges. Jobs exceeding 5 seconds may require further investigation.
For example, if a job takes more than 5 seconds, as shown below:
metricshub.job.duration{job.type="collect", monitor.type="network", connector_id="WbemGenNetwork"} 5.8
- Identify the job.type, monitor.type, and connector_id. In this example, collecting network metrics with the WbemGenNetwork connector is the bottleneck.
- Check the metricshub-agent-$resourceId-$timestamp.log file for the start and end timestamps of each job step to identify where the performance degradation occurs, as shown in the sketch after this list.
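As a starting point for that log review, you can narrow the log down to the suspect connector. The sketch below is a generic filter (it assumes nothing about the MetricsHub log format beyond the file being plain text); the file name and connector id are placeholders for your own values:

from pathlib import Path

# Illustrative only: print every log line mentioning the suspect connector so the
# start and end timestamps of each job step are easy to compare side by side.
LOG_FILE = Path("metricshub-agent-myResource-2025-01-01-12-00-00.log")  # placeholder name
CONNECTOR_ID = "WbemGenNetwork"

with LOG_FILE.open(encoding="utf-8", errors="replace") as log:
    for line_number, line in enumerate(log, start=1):
        if CONNECTOR_ID in line:
            print(f"{line_number}: {line.rstrip()}")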
You can also:
- Verify resource availability: Ensure the monitored system has sufficient CPU, memory, and storage resources to handle monitoring tasks.
- Check MetricsHub configuration: Review your configuration to ensure MetricsHub is set up correctly.
- Restart services: If configurations appear correct, try restarting relevant services.
- Inspect network configurations: Check for network latency or connectivity issues between MetricsHub and the monitored resources, and ensure network settings (e.g., firewalls or proxies) are not causing delays. A quick latency check is sketched after this list.
- Examine logs: Look for warnings or errors in the MetricsHub logs[2] or the monitored system's logs to identify potential problems.
- Review timeouts: Ensure timeout settings are appropriate for the environment to prevent unnecessary delays or retries.
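For the network check mentioned above, a rough latency figure can be obtained by timing a TCP connection from the MetricsHub host to the management interface of the monitored resource. This is a generic sketch; the host name and port are placeholders, not MetricsHub settings:

import socket
import time

# Illustrative only: measure rough TCP connection latency from the MetricsHub
# host to a monitored resource. HOST and PORT are placeholders (e.g. 443 for a
# REST/iLO management interface).
HOST = "my-monitored-host.example.com"
PORT = 443
ATTEMPTS = 5

for attempt in range(1, ATTEMPTS + 1):
    start = time.perf_counter()
    try:
        with socket.create_connection((HOST, PORT), timeout=5):
            elapsed_ms = (time.perf_counter() - start) * 1000
            print(f"attempt {attempt}: connected in {elapsed_ms:.1f} ms")
    except OSError as error:
        print(f"attempt {attempt}: connection failed ({error})")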