Monitoring Configuration

MetricsHub extracts metrics from the resources configured in the config/metricshub.yaml file. These resources can be hosts, applications, or other components running in your IT infrastructure. Each resource is typically associated with a physical location, such as a data center or server room, or a logical location, like a business unit. In MetricsHub, these locations are referred to as sites. In highly distributed infrastructures, multiple resources can be organized into resource groups to simplify management and monitoring.

To reflect this organization, you are asked to define your resource group first, followed by your site and its corresponding resources in the config/metricshub.yaml file stored in:

  • C:\ProgramData\MetricsHub\config on Windows systems
  • ./metricshub/lib/config on Linux systems

Important: We recommend using an editor supporting the Schemastore[1] to edit MetricsHub's configuration YAML files (Example: Visual Studio Code[2] and vscode.dev[3], with RedHat's YAML extension[4]).

Step 1: Configure resource groups

Note: For centralized infrastructures, resourceGroups are not required. Simply configure resources as explained in Step 2[5].

Create a resource group for each site to be monitored under the resourceGroups: section:

resourceGroups:
  <resource-group-name>: 
    attributes:
      site: <site-name> # Specify where resources are hosted

Replace:

  • <resource-group-name> with the actual name of your resource group
  • <site-name> with the name of a logical or physical location. This value must be unique.

Example:

resourceGroups:
  boston:
    attributes:
      site: boston

At this stage, you can configure sustainability metrics reporting. For more details, refer to the Sustainability[6] page.

Step 2: Configure resources

Resources can be configured either:

  • under the resources section located at the top of the config/metricshub.yaml file (recommended for centralized infrastructures)

    attributes:
      site: <central-site>
    
    resources:
      <resource-id>:
        attributes:
          host.name: <hostname>
          host.type: <type>
        <protocol-configuration>
    
  • or under the resource group you previously specified (recommended for highly distributed infrastructures)

    resourceGroups: 
      <resource-group-name>: 
        attributes:
          site: <site-name> 
        resources:
          <resource-id>:
            attributes:
              host.name: <hostname>
              host.type: <type>
            <protocol-configuration>
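
As an illustration of the first option, the sketch below fills in the centralized layout with concrete values. The site name main-datacenter and the host details are placeholders chosen for this example, and the SNMP settings simply reuse the defaults described under Protocols and credentials:

```yaml
# Site attribute shared by every resource below (hypothetical site name)
attributes:
  site: main-datacenter

resources:
  myHost1:
    attributes:
      host.name: my-host-01   # hostname or IP address of the resource
      host.type: linux
    protocols:
      snmp:
        version: v2c
        community: public
        port: 161
        timeout: 120s
```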
    

The syntax to adopt for configuring your resources depends on whether they have unique or similar characteristics (such as device type, protocols, and credentials).

Syntax for unique resources

resources:
  <resource-id>:
    attributes:
      host.name: <hostname> 
      host.type: <type>  
    <protocol-configuration> 

Syntax for resources sharing similar characteristics

resources:
  <resource-id>:
    attributes:
      host.name: [ <hostname1>, <hostname2>, etc. ]
      host.type: <type>
      host.extra.attribute: [ <extra-attribute-for-hostname1>, <extra-attribute-for-hostname2>, etc. ]
    <protocol-configuration>

Whatever the syntax adopted, replace:

  • <hostname> with the actual hostname or IP address of the resource
  • <type> with the type of resource to be monitored. Possible values are:
    • win[7] for Microsoft Windows systems
    • linux[8] for Linux systems
    • network[9] for network devices
    • oob for Out-of-band management cards
    • storage[10] for storage systems
    • aix[11] for IBM AIX systems
    • hpux[12] for HP UX systems
    • solaris[13] for Oracle Solaris systems
    • tru64[14] for HP Tru64 systems
    • vms[14] for HP Open VMS systems

    Check out the Connector Directory[15] to find out which type corresponds to your system.
  • <protocol-configuration> with the protocol(s) MetricsHub will use to communicate with the resources: http[16], ipmi[17], jdbc[18], oscommand[19], ping[20], ssh[21], snmp[22], wbem[23], wmi[24], or winrm[25]. Refer to Protocols and Credentials[26] for more details.

Note: You can use the ${env::ENV_VARIABLE_NAME} syntax in the config/metricshub.yaml file to call your environment variables.
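
A minimal sketch of this syntax, assuming an environment variable named WMI_PASSWORD (a hypothetical name) is defined for the account running MetricsHub. The fragment goes under a resource, in place of <protocol-configuration>:

```yaml
protocols:
  wmi:
    timeout: 120s
    username: myusername
    # Resolved from the WMI_PASSWORD environment variable (hypothetical name)
    password: ${env::WMI_PASSWORD}
```

This keeps the password itself out of the configuration file.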

Example

resourceGroups:
  boston:
    attributes:
      site: boston
    resources:
      myBostonHost1:
        attributes:
          host.name: my-boston-host-01
          host.type: storage
        <protocol-configuration>
      myBostonHost2:
        attributes:
          host.name: my-boston-host-02
          host.type: storage
        <protocol-configuration>
  chicago:
    attributes:
      site: chicago
    resources:
      myChicagoHost1:
        attributes:
          host.name: my-chicago-host-01
          host.type: storage
        <protocol-configuration>
      myChicagoHost2:
        attributes:
          host.name: my-chicago-host-02
          host.type: storage
        <protocol-configuration>
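
The example above uses the syntax for unique resources. As a sketch of the alternative, the two Boston hosts could be declared with the syntax for resources sharing similar characteristics, assuming both hosts accept the same protocol settings. The resource ID myBostonHosts and the SNMP values are illustrative only:

```yaml
resourceGroups:
  boston:
    attributes:
      site: boston
    resources:
      myBostonHosts:   # hypothetical resource ID covering both hosts
        attributes:
          host.name: [ my-boston-host-01, my-boston-host-02 ]
          host.type: storage
        protocols:
          snmp:        # assumed to be valid for both hosts
            version: v2c
            community: public
            port: 161
            timeout: 120s
```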

Protocols and credentials

HTTP

Use the parameters below to configure the HTTP protocol:

Parameter Description
http Protocol used to access the host.
hostname The name or IP address of the resource. If not specified, the host.name attribute will be used.
port The HTTPS port number used to perform HTTP requests (Default: 443).
username Name used to establish the connection with the host via the HTTP protocol.
password Password used to establish the connection with the host via the HTTP protocol.
timeout How long until the HTTP request times out (Default: 60s).

Example

resourceGroups:
  boston:
    attributes:
      site: boston
    resources:
      myHost1:
        attributes:
          host.name: my-host-01
          host.type: storage
        protocols:
          http:
            https: true
            port: 443
            username: myusername
            password: mypwd
            timeout: 60

ICMP Ping

Use the parameters below to configure the ICMP ping protocol:

Parameter Description
hostname The name or IP address of the resource. If not specified, the host.name attribute will be used.
ping Protocol used to test the host reachability through ICMP.
timeout How long until the ping command times out (Default: 5s).

Example

resourceGroups:
  boston:
    attributes:
      site: boston
    resources:
      myHost1:
        attributes:
          host.name: my-host-01
          host.type: linux
        protocols:
          ping:
            timeout: 10s

IPMI

Use the parameters below to configure the IPMI protocol:

Parameter Description
ipmi Protocol used to access the host.
hostname The name or IP address of the resource. If not specified, the host.name attribute will be used.
username Name used to establish the connection with the host via the IPMI protocol.
password Password used to establish the connection with the host via the IPMI protocol.

Example

resourceGroups:
  boston:
    attributes:
      site: boston
    resources:
      myHost1:
        attributes:
          host.name: my-host-01
          host.type: oob
        protocols:
          ipmi:
            username: myusername
            password: mypwd

JDBC

Use the parameters below to configure JDBC to connect to a database:

Parameter Description
jdbc JDBC configuration used to connect to a database on the host
hostname The name or IP address of the resource. If not specified, the host.name attribute will be used.
timeout How long until the SQL query times out (Default: 120s).
username Name used to authenticate against the database.
password Password used to authenticate against the database.
url The JDBC connection URL to access the database.
type The type of database (e.g., Oracle, PostgreSQL, MSSQL, Informix, Derby, H2).
port The port number used to connect to the database.
database The name of the database instance to connect to on the server.

Example

resourceGroups:
  boston:
    attributes:
      site: boston
    resources:
      db-host:
        attributes:
          host.name: my-host-02
          host.type: win
        protocols:
          jdbc:
            hostname: my-host-02
            username: dbuser
            password: dbpassword
            url: jdbc:mysql://my-host-02:3306/mydatabase
            timeout: 120s
            type: mysql
            port: 3306
            database: mydatabase

OS commands

Use the parameters below to configure OS Commands that are executed locally:

Parameter Description
osCommand Protocol used to access the host.
hostname The name or IP address of the resource. If not specified, the host.name attribute will be used.
timeout How long until the local OS Commands time out (Default: 120s).
useSudo Whether sudo is used or not for the local OS Command: true or false (Default: false).
useSudoCommands List of commands for which sudo is required.
sudoCommand Sudo command to be used (Default: sudo).

Example

resourceGroups:
  boston:
    attributes:
      site: boston
    resources:
      myHost1:
        attributes:
          host.name: my-host-01
          host.type: linux
        protocols:
          osCommand:
            timeout: 120
            useSudo: true
            useSudoCommands: [ cmd1, cmd2 ]
            sudoCommand: sudo

SSH

Use the parameters below to configure the SSH protocol:

Parameter Description
ssh Protocol used to access the host.
hostname The name or IP address of the resource. If not specified, the host.name attribute will be used.
timeout How long until the command times out (Default: 120s).
port The SSH port number to use for the SSH connection (Default: 22).
useSudo Whether sudo is used or not for the SSH Command (true or false).
useSudoCommands List of commands for which sudo is required.
sudoCommand Sudo command to be used (Default: sudo).
username Name to use for performing the SSH query.
password Password to use for performing the SSH query.
privateKey Private Key File to use to establish the connection to the host through the SSH protocol.

Example

resourceGroups:
  boston:
    attributes:
      site: boston
    resources:
      myHost1:
        attributes:
          host.name: my-host-01
          host.type: linux
        protocols:
          ssh:
            timeout: 120
            port: 22
            useSudo: true
            useSudoCommands: [ cmd1, cmd2 ]
            sudoCommand: sudo
            username: myusername
            password: mypwd
            privateKey: /tmp/ssh-key.txt

SNMP

Use the parameters below to configure the SNMP protocol:

Parameter Description
snmp Protocol used to access the host.
hostname The name or IP address of the resource. If not specified, the host.name attribute will be used.
version The version of the SNMP protocol (v1, v2c).
community The SNMP Community string to use to perform SNMP v1 or v2c queries (Default: public).
port The SNMP port number used to perform SNMP queries (Default: 161).
timeout How long until the SNMP request times out (Default: 120s).

Example

resourceGroups:
  boston:
    attributes:
      site: boston
    resources:
      myHost1:
        attributes:
          host.name: my-host-01
          host.type: linux
        protocols:
          snmp:
            version: v1
            community: public
            port: 161
            timeout: 120s

      myHost2:
        attributes:
          host.name: my-host-02
          host.type: linux
        protocols:
          snmp:
            version: v2c
            community: public
            port: 161
            timeout: 120s

SNMP version 3

Use the parameters below to configure the SNMP version 3 protocol:

Parameter Description
snmpv3 Protocol used to access the host using SNMP version 3.
hostname The name or IP address of the resource. If not specified, the host.name attribute will be used.
timeout How long until the SNMP request times out (Default: 120s).
port The SNMP port number used to perform SNMP version 3 queries (Default: 161).
contextName The name of the SNMP version 3 context, used to identify the collection of management information.
authType The SNMP version 3 authentication protocol (MD5, SHA or NoAuth) to ensure message authenticity.
privacy The SNMP version 3 privacy protocol (DES, AES or NONE) used to encrypt messages for confidentiality.
username The username used for SNMP version 3 authentication.
privacyPassword The password used to encrypt SNMP version 3 messages for confidentiality.
password The password used for SNMP version 3 authentication.
retryIntervals The intervals (in milliseconds) between SNMP request retries.

Example

resourceGroups:
  boston:
    attributes:
      site: boston
    resources:
      myHost3:
        attributes:
          host.name: my-host-03
          host.type: linux
        protocols:
          snmpv3:
            port: 161
            timeout: 120s
            contextName: myContext
            authType: SHA
            privacy: AES
            username: myUser
            privacyPassword: myPrivacyPassword
            password: myAuthPassword 

WBEM

Use the parameters below to configure the WBEM protocol:

Parameter Description
wbem Protocol used to access the host.
hostname The name or IP address of the resource. If not specified, the host.name attribute will be used.
protocol The transport protocol used to access the host: HTTP or HTTPS.
port The port number used to perform WBEM queries (Default: 5989 for HTTPS or 5988 for HTTP).
timeout How long until the WBEM request times out (Default: 120s).
username Name used to establish the connection with the host via the WBEM protocol.
password Password used to establish the connection with the host via the WBEM protocol.
vcenter vCenter hostname providing the authentication ticket, if applicable.

Example

resourceGroups:
  boston:
    attributes:
      site: boston
    resources:
      myHost1:
        attributes:
          host.name: my-host-01
          host.type: storage
        protocols:
          wbem:
            protocol: https
            port: 5989
            timeout: 120s
            username: myusername
            password: mypwd
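
The vcenter parameter listed in the table is not used in the example above. A sketch of how it might be added, assuming a vCenter server named my-vcenter-01 (a hypothetical hostname) provides the authentication ticket for the monitored host. The fragment goes under a resource, in place of <protocol-configuration>:

```yaml
protocols:
  wbem:
    protocol: https
    port: 5989
    timeout: 120s
    username: myusername
    password: mypwd
    vcenter: my-vcenter-01   # hypothetical vCenter hostname providing the ticket
```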

WMI

Use the parameters below to configure the WMI protocol:

Parameter Description
wmi Protocol used to access the host.
hostname The name or IP address of the resource. If not specified, the host.name attribute will be used.
timeout How long until the WMI request times out (Default: 120s).
username Name used to establish the connection with the host via the WMI protocol.
password Password used to establish the connection with the host via the WMI protocol.

Example

resourceGroups:
  boston:
    attributes:
      site: boston
    resources:
      myHost1:
        attributes:
          host.name: my-host-01
          host.type: win
        protocols:
          wmi:
            timeout: 120s
            username: myusername
            password: mypwd

WinRM

Use the parameters below to configure the WinRM protocol:

Parameter Description
winrm Protocol used to access the host.
hostname The name or IP address of the resource. If not specified, the host.name attribute will be used.
timeout How long until the WinRM request times out (Default: 120s).
username Name used to establish the connection with the host via the WinRM protocol.
password Password used to establish the connection with the host via the WinRM protocol.
protocol The protocol used to access the host: HTTP or HTTPS (Default: HTTP).
port The port number used to perform WQL queries and commands (Default: 5985 for HTTP or 5986 for HTTPS).
authentications Ordered list of authentication schemes: NTLM, KERBEROS (Default: NTLM).

Example

resourceGroups:
  boston:
    attributes:
      site: boston
    resources:
      myHost1:
        attributes:
          host.name: my-host-01
          host.type: win
        protocols:
          winrm:
            protocol: http
            port: 5985
            username: myusername
            password: mypwd
            timeout: 120s
            authentications: [ntlm]
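
For comparison, a sketch of the same resource reached over HTTPS, with Kerberos tried before NTLM. It only combines values listed in the table above (HTTPS, port 5986, ordered authentication schemes) and assumes the host has a WinRM HTTPS listener and Kerberos available:

```yaml
resourceGroups:
  boston:
    attributes:
      site: boston
    resources:
      myHost1:
        attributes:
          host.name: my-host-01
          host.type: win
        protocols:
          winrm:
            protocol: https
            port: 5986                          # default port for HTTPS
            username: myusername
            password: mypwd
            timeout: 120s
            authentications: [kerberos, ntlm]   # tried in this order
```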

Step 3: Configure additional settings

Customize resource hostname

By default, the host.name attribute specified for a resource determines both:

  • the hostname used to execute requests against the resource for collecting metrics
  • the hostname associated with each OpenTelemetry metric collected for the resource.

If your resource requires different hostnames for these purposes, you can customize the configuration as follows.

Example for unique resources

Here’s an example of customizing the hostname for a unique resource:

resources:
  myHost1:
    attributes:
      host.name: custom-hostname # Hostname applied to the collected metrics 
      host.type: linux
    protocols:
      snmp:
        hostname: my-host-01 # Hostname used for the SNMP requests
        version: v1
        community: public
        port: 161
        timeout: 1m

Example for resources sharing similar characteristics

For resources with shared characteristics, you can define multiple hostnames in the configuration:

resources:
  shared-characteristic-hosts:
    attributes:
      host.name: [ custom-hostname1, custom-hostname2 ] # Hostnames applied to the collected metrics 
      host.type: linux
    protocols:
      snmp:
        hostname: [ my-host-01, my-host-02 ] # Hostnames used for the SNMP requests
        version: v1
        community: public
        port: 161
        timeout: 1m

Important: Ensure the values of host.name are listed in the exact same order as those in hostname. Each value listed in host.name must correspond to the value at the same position in hostname. Misaligned orders will result in mismatched data and inconsistencies in the collected metrics for each resource.

Customize resource monitoring

If the connectors included in MetricsHub do not collect the metrics you need, you can configure one or several monitors to obtain this data from your resource and specify its corresponding attributes and metrics in MetricsHub.

A monitor defines how MetricsHub collects and processes data for the resource. For each monitor, you must provide the following information:

  • its name
  • the type of job it performs (e.g., simple for straightforward monitoring tasks)
  • the data sources from which metrics are collected
  • how the collected metrics are mapped to MetricsHub's monitoring model.

Configuration

Follow the structure below to declare your monitor:

<resource-group>:
  <resource-key>:
    attributes:
      # <attributes...>
    protocols:
      # <credentials...>
    monitors:
      <monitor-name>:
        <job>: # Job type, e.g., "simple"
          sources:
            <source-name>:
              # <source-content>
          mapping:
            source: <mapping-source-reference>
            attributes:
              # <attributes-mapping...>
            metrics:
              # <metrics-mapping...>

Basic Authentication settings

Enterprise Edition authentication

In the Enterprise Edition, MetricsHub's internal OTLP Exporter authenticates itself with the OpenTelemetry Collector's OTLP gRPC Receiver[29] by including the HTTP Authorization request header with the credentials.

These settings are already configured in the config/metricshub.yaml file of MetricsHub Enterprise Edition. Changing them is not recommended unless you are familiar with managing communication between the MetricsHub OTLP Exporter and the OpenTelemetry Collector's OTLP Receiver.

To override the default value of the Basic Authentication Header, configure the otel.exporter.otlp.metrics.headers and otel.exporter.otlp.logs.headers parameters under the otel section:

# Internal OpenTelemetry SDK configuration
otel:
  # OpenTelemetry SDK Autoconfigure properties
  # https://github.com/open-telemetry/opentelemetry-java/tree/main/sdk-extensions/autoconfigure
  # MetricsHub Default configuration
  otel.metrics.exporter: otlp
  otel.exporter.otlp.metrics.endpoint: https://localhost:4317
  otel.exporter.otlp.metrics.protocol: grpc
  otel.exporter.otlp.metrics.headers: Authorization=Basic <base64-username-password>
  otel.exporter.otlp.logs.headers: Authorization=Basic <base64-username-password>
resourceGroups: # ...

where <base64-username-password> credentials are built by first joining your username and password with a colon (myUsername:myPassword) and then encoding the value in base64.
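
For instance, with the placeholder credentials myUsername and myPassword, the string myUsername:myPassword encodes to bXlVc2VybmFtZTpteVBhc3N3b3Jk in base64. The sketch below shows the two header lines with that value; the rest of the otel section is unchanged:

```yaml
otel:
  # base64("myUsername:myPassword") = bXlVc2VybmFtZTpteVBhc3N3b3Jk
  otel.exporter.otlp.metrics.headers: Authorization=Basic bXlVc2VybmFtZTpteVBhc3N3b3Jk
  otel.exporter.otlp.logs.headers: Authorization=Basic bXlVc2VybmFtZTpteVBhc3N3b3Jk
```

Base64 is an encoding, not encryption, so the file should be protected like any other file containing credentials.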

Warning: If you update the Basic Authentication Header, you must generate a new .htpasswd file for the OpenTelemetry Collector Basic Authenticator[30].

Community Edition authentication

If your OTLP Receiver requires authentication headers, configure the otel.exporter.otlp.metrics.headers and otel.exporter.otlp.logs.headers parameters under the otel section:

otel:
  otel.exporter.otlp.metrics.headers: <custom-header1>
  otel.exporter.otlp.logs.headers: <custom-header2>

resourceGroups: # ...
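
As an illustration, the sketch below assumes a receiver that expects an API key in a header named x-api-key. Both the header name and the <your-api-key> placeholder are hypothetical; use whatever header your OTLP Receiver documents, written as a name=value pair:

```yaml
otel:
  # Hypothetical header name and placeholder value
  otel.exporter.otlp.metrics.headers: x-api-key=<your-api-key>
  otel.exporter.otlp.logs.headers: x-api-key=<your-api-key>

resourceGroups: # ...
```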

Monitoring settings

Collect period

By default, MetricsHub collects metrics from the monitored resources every minute. To change the default collect period:

  • For all your resources, add the collectPeriod parameter just before the resourceGroups section:

    collectPeriod: 2m
    
    resourceGroups: # ...
    
  • For a specific resource, add the collectPeriod parameter at the resource level. In the example below, we set the collectPeriod to 1m30s for myHost1:

    resourceGroups:
      boston:
        attributes:
          site: boston
        resources:
          myHost1:
            attributes:
              host.name: my-host-01
              host.type: linux
            protocols:
              snmp:
                version: v1
                community: public
                port: 161
                timeout: 120s
            collectPeriod: 1m30s # Customized
    

Warning: Collecting metrics too frequently can cause CPU-intensive workloads.

Connectors

When running MetricsHub, the connectors are automatically selected based on the device type provided and the enabled protocols. However, you have the flexibility to specify which connectors should be utilized or omitted.

The connectors parameter allows you to force, select, or exclude specific connectors. Connector names or category tags should be separated by commas, as illustrated in the example below:

resourceGroups:
  boston:
    attributes:
      site: boston
    resources:
      myHost1:
        attributes:
          host.name: my-host-01
          host.type: win
        protocols:
          wmi:
            timeout: 120s
            username: myusername
            password: mypwd
        connectors: [ "#system" ]

  • To force a connector, precede the connector identifier with a plus sign (+), as in +MIB2.
  • To exclude a connector from automatic detection, precede the connector identifier with an exclamation mark (!), like !MIB2.
  • To stage a connector for processing by automatic detection, configure the connector identifier, for instance, MIB2.
  • To stage a category of connectors for processing by automatic detection, precede the category tag with a hash (#), such as #hardware or #system.
  • To exclude a category of connectors from automatic detection, precede the category tag to be excluded with an exclamation mark and a hash sign (!#), such as !#system.

Notes:

  • Any misspelled connector will be ignored.
  • Misspelling a category tag will prevent automatic detection from functioning, because no connectors will be staged.

Examples

  • Example 1:

    connectors: [ "#hardware" ]
    

    The core engine will automatically detect connectors categorized under hardware.

  • Example 2:

    connectors: [ "!#hardware", "#system" ]
    

    The core engine will perform automatic detection on connectors categorized under system, excluding those categorized under hardware.

  • Example 3:

    connectors: [ DiskPart, MIB2, "#system" ]
    

    The core engine will automatically detect connectors named DiskPart, MIB2, and all connectors under the system category.

  • Example 4:

    connectors: [ +DiskPart, MIB2, "#system" ]
    

    The core engine will force the execution of the DiskPart connector and then proceed with the automatic detection of MIB2 and all connectors under the system category.

  • Example 5:

    connectors: [ DiskPart, "!#system" ]
    

    The core engine will perform automatic detection exclusively on the DiskPart connector.

  • Example 6:

    connectors: [ +Linux, MIB2 ]
    

    The core engine will force the execution of the Linux connector and subsequently perform automatic detection on the MIB2 connector.

  • Example 7:

    connectors: [ "!Linux" ]
    

    The core engine will perform automatic detection on all connectors except the Linux connector.

  • Example 8:

    connectors: [ "#hardware", "!MIB2" ]
    

    The core engine will perform automatic detection on connectors categorized under hardware, excluding the MIB2 connector.

To find out which connectors are available, refer to the Connectors Directory[31].

Alternatively, you can list the available connectors using the command below:

$ metricshub -l

For more information about the metricshub command, refer to MetricsHub CLI (metricshub)[32].

Patch Connectors

By default, MetricsHub loads connectors from the connectors subdirectory within its installation directory. However, you can extend this functionality by adding a custom directory for additional connectors. This can be done by specifying a patch directory in the metricshub.yaml configuration file.

To configure an additional connector directory, set the patchDirectory property to the path of your custom connectors directory, as shown in the example below:


patchDirectory: /opt/patch/connectors # Replace with the path to your patch connectors directory.

loggerLevel: ...

Customize data collection

MetricsHub allows you to customize data collection on your Windows or Linux servers, specifying exactly which processes or services to monitor. This customization is achieved by configuring the following connector variables:

Connector Variable | Available for | Usage
matchCommand | Linux - Processes (ps)[33], Windows - Processes (WMI)[34] | Used to specify the command lines to monitor on a Linux or Windows server.
matchName | Linux - Processes (ps)[33], Windows - Processes (WMI)[34] | Used to specify the processes to monitor on a Linux or Windows server.
matchUser | Linux - Processes (ps)[33] | Used to specify the users to include.
serviceNames | Linux - Service (systemctl)[35], Windows - Services (WMI)[36] | Used to specify the services to monitor on a Linux or Windows server.

Refer to the Connectors directory[37], and more specifically to the Variables section of each connector, for the supported variables and their accepted values.

Procedure

In the config/metricshub.yaml file, locate the resource for which you wish to customize data collection and specify the variables attribute available under the additionalConnectors section:

resources:
  <host-id>:
    attributes:
      host.name: <hostname>
      host.type: <type>
    additionalConnectors:
      <connector-custom-id>: # Unique ID. Use 'uses' if different from the original connector ID
        uses: <connector-original-id> # Optional - Original ID if not in key
        force: true # Optional (default: true); false for auto-detection only
        variables:
          <variable-name>: <value>

Property | Description
<connector-custom-id> | Custom ID for this additional connector.
uses | (Optional) ID of the original connector that this additional connector is based on. If not specified, the key ID is used.
force | (Optional) Set to false if you want the connector to only be activated when detected (Default: true - always activated).
variables | Specify the connector variable to be used and its value (Format: <variable-name>: <value>).

Note: If a connector is added under the additionalConnectors section with missing or unspecified variables, those variables will automatically be populated with default values defined by the connector itself.
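
The sketch below shows how the pieces might fit together for a Linux host. It rests on several assumptions that you should check against the Connectors directory: the original connector ID LinuxProcess, the custom ID jenkinsProcess, and the value format accepted by matchName are all illustrative and not taken from this page:

```yaml
resources:
  myHost1:
    attributes:
      host.name: my-host-01
      host.type: linux
    # protocols omitted for brevity
    additionalConnectors:
      jenkinsProcess:          # hypothetical custom ID, different from the original ID
        uses: LinuxProcess     # assumed ID of the original connector
        force: true            # always activate, without waiting for detection
        variables:
          matchName: jenkins   # assumed value; see the connector's Variables section
```

Because the custom ID differs from the original connector ID, the uses property is what ties the entry to the connector it is based on.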

Filter monitors

A monitor is any entity tracked by MetricsHub within the main resource, such as processes, services, storage volumes, or physical devices like disks.

To manage the volume of telemetry data sent to your observability platform and therefore reduce costs and optimize performance, you can specify which monitors to include or exclude.

You can apply monitor inclusion or exclusion in data collection for the following scopes:

  • All resources
  • All the resources within a specific resource group. A resource group is a container that holds resources to be monitored and generally refers to a site or a specific location.
  • A specific resource

This is done by adding the monitorFilters parameter in the relevant section of the config/metricshub.yaml file as described below:

Filter monitors | Add monitorFilters
For all resources | In the global section (top of the file)
For all the resources of a specific resource group | Under the corresponding <resource-group-name> section
For a specific resource | Under the corresponding <resource-id> section

The monitorFilters parameter accepts the following values:

  • +<monitor_name> for inclusion
  • "!<monitor_name>" for exclusion.

To obtain the monitor name:

  1. Refer to the MetricsHub Connector Library[31]
  2. Click the connector of your choice (e.g.: WindowsOS Metrics[40])
  3. Scroll down to the Metrics section and note down the relevant monitor Type.

Warning: Excluding monitors may lead to missed outage detection or inconsistencies in collected data, such as inaccurate power consumption estimates or other metrics calculated by the engine. Use exclusions carefully to avoid overlooking important information. The monitoring of critical devices such as batteries, power supplies, CPUs, fans, and memories should not be disabled.

Example 1: Including monitors for all resources

monitorFilters: [ +enclosure, +fan, +power_supply ] # Include specific monitors globally
resourceGroups: ...

Example 2: Excluding monitors for all resources

monitorFilters: [ "!volume" ] # Exclude specific monitors globally

Example 3: Including monitors for all resources within a specific resource group

resourceGroups:
  <resource-group-name>:
    monitorFilters: [ +enclosure, +fan, +power_supply ] # Include specific monitors for this group
    resources: ...

Example 4: Excluding monitors for all resources within a specific resource group

resourceGroups:
  <resource-group-name>:
    monitorFilters: [ "!volume" ] # Exclude specific monitors for this group
    resources: ...

Example 5: Including monitors for a specific resource

resourceGroups:
  <resource-group-name>:
    resources:
      <resource-id>:
        monitorFilters: [ +enclosure, +fan, +power_supply ] # Include specific monitors for this resource

Example 6: Excluding monitors for a specific resource

resourceGroups:
  <resource-group-name>:
    resources:
      <resource-id>:
        monitorFilters: [ "!volume" ] # Exclude specific monitors for this resource

Discovery cycle

MetricsHub periodically performs discoveries to detect new components in your monitored environment. By default, MetricsHub runs a discovery after 30 collects. To change this default discovery cycle:

  • For all your resources, add the discoveryCycle just before the resourceGroups section:

    discoveryCycle: 15
    
    resourceGroups: # ...
    
  • For a specific host, add the discoveryCycle parameter at the resource level and indicate the number of collects after which a discovery will be performed. In the example below, we set the discoveryCycle to be performed after 5 collects for myHost1:

    resourceGroups:
      boston:
        attributes:
          site: boston
        resources:
          myHost1:
            attributes:
              host.name: my-host-01
              host.type: linux
            protocols:
              snmp:
                version: v1
                community: public
                port: 161
                timeout: 120s
            discoveryCycle: 5 # Customized
    

Warning: Running discoveries too frequently can cause CPU-intensive workloads.
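
The two levels can also be combined in the same file. The sketch below assumes that a discoveryCycle set at the resource level takes precedence over the global value for that resource; this precedence is an assumption rather than something stated above, and myHost2 is a hypothetical second resource added for illustration:

```yaml
discoveryCycle: 15 # Global value: a discovery after every 15 collects

resourceGroups:
  boston:
    attributes:
      site: boston
    resources:
      myHost1:
        attributes:
          host.name: my-host-01
          host.type: linux
        protocols:
          snmp:
            version: v1
            community: public
            port: 161
            timeout: 120s
        discoveryCycle: 5 # Assumed to override the global value for myHost1 only
      myHost2: # Hypothetical resource with no override, so the global value applies
        attributes:
          host.name: my-host-02
          host.type: linux
        protocols:
          snmp:
            version: v1
            community: public
            port: 161
            timeout: 120s
```

Keeping the frequent discovery on a single resource limits the extra CPU load mentioned in the warning above.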

Resource Attributes

Add labels in the attributes section to override the data collected by the MetricsHub Agent or add additional attributes to the Host Resource[41]. These attributes are added to each metric of that Resource when exported to time series platforms like Prometheus.

In the example below, we add an app attribute to indicate that myHost1 runs the Jenkins application:

resourceGroups:
  boston:
    attributes:
      site: boston
    resources:
      myHost1:
        attributes:
          host.name: my-host-01
          host.type: windows
          app: Jenkins
        protocols:
          http:
            https: true
            port: 443
            username: myusername
            password: mypwd
            timeout: 60s
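
Several labels can be added in the same attributes section. In the sketch below, environment and team are hypothetical label names chosen for illustration; they are not attributes defined by MetricsHub:

```yaml
resourceGroups:
  boston:
    attributes:
      site: boston
    resources:
      myHost1:
        attributes:
          host.name: my-host-01
          host.type: windows
          app: Jenkins
          environment: production # Hypothetical label
          team: platform          # Hypothetical label
        protocols:
          http:
            https: true
            port: 443
            username: myusername
            password: mypwd
            timeout: 60s
```

Each label is expected to appear on every metric exported for myHost1, in the same way as the app attribute.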

Hostname resolution

By default, MetricsHub uses the configured host.name value as-is to populate the Host Resource[41] attributes. This ensures that the host.name remains consistent with what is configured.

To resolve the host.name to its Fully Qualified Domain Name (FQDN), set the resolveHostnameToFqdn configuration property to true as shown below:

resolveHostnameToFqdn: true

resourceGroups: # ...

With this setting, every configured resource resolves its host.name to its FQDN.

To enable FQDN resolution for a specific resource group, set the resolveHostnameToFqdn property to true under the desired resource group configuration as shown below:

resourceGroups:
  boston:
    resolveHostnameToFqdn: true
    attributes:
      site: boston
    resources:
      # ...

This ensures that all resources within the boston resource group will resolve their host.name to FQDN.

To enable FQDN resolution for an individual resource within a resource group, set the resolveHostnameToFqdn property to true under the resource configuration as shown below:

resourceGroups:
  boston:
    attributes:
      site: boston
    resources:
      my-host-01:
        resolveHostnameToFqdn: true
        attributes:
          host.name: my-host-01
          host.type: linux

In this case, only my-host-01 will resolve its host.name to FQDN, while other resources in the boston group will retain their original host.name values.

Warning: If the resolution fails or returns an unexpected result, the resulting host.name value may differ from the expected one, which can affect metric identity.
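
The levels described above could be mixed, for example to enable resolution for a whole group while excluding one resource. The sketch below assumes that the most specific setting wins and that false is accepted at the resource level; both points are assumptions, and my-host-02 is a hypothetical resource:

```yaml
resourceGroups:
  boston:
    resolveHostnameToFqdn: true # Applies to all resources in the boston group
    attributes:
      site: boston
    resources:
      my-host-01:
        attributes:
          host.name: my-host-01
          host.type: linux
      my-host-02: # Hypothetical resource
        resolveHostnameToFqdn: false # Assumed to keep the configured host.name as-is
        attributes:
          host.name: my-host-02
          host.type: linux
```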

Job pool size

By default, MetricsHub runs up to 20 discovery and collect jobs in parallel. To increase or decrease the number of jobs MetricsHub can run simultaneously, add the jobPoolSize parameter just before the resourceGroups section:

jobPoolSize: 40 # Customized

resourceGroups: # ...

Warning: Running too many jobs in parallel can lead to an OutOfMemory error.

Sequential mode

By default, MetricsHub sends queries to a resource in parallel. Parallel mode is faster than sequential mode, but too many simultaneous requests can overload the targeted system and cause it to fail.

To force all the network calls to be executed in sequential order:

  • For all your resources (NOT RECOMMENDED), add the sequential parameter before the resourceGroups section and set it to true:

    sequential: true
    
    resourceGroups: # ...
    
  • For a specific resource, add the sequential parameter at the resource level and set it to true. In the example below, we enable the sequential mode for myHost1:

    resourceGroups:
      boston:
        attributes:
          site: boston
        resources:
          myHost1:
            attributes:
              host.name: my-host-01
              host.type: linux
            protocols:
              snmp:
                version: v1
                community: public
                port: 161
                timeout: 120s
            sequential: true # Customized
    

Warning: Sending requests in sequential mode slows down the monitoring significantly. Instead of using the sequential mode, you can increase the maximum number of allowed concurrent requests in the monitored system, if the manufacturer allows it.

StateSet metrics compression

By default, MetricsHub compresses StateSet metrics to reduce unnecessary reporting of zero values and to avoid high cardinality in time series databases. This compression can be configured at various levels: globally, per resource group, or for a specific resource.

Compression configuration stateSetCompression

This configuration controls how StateSet metrics are reported, specifically whether zero values should be suppressed or not.

  • Supported values:
    • none: No compression is applied. All StateSet metrics, including zero values, are reported on every collection cycle.
    • suppressZeros (default): MetricsHub compresses StateSet metrics by reporting the zero value only the first time a state transitions to zero. Subsequent reports will include only the non-zero state values.

To configure the StateSet compression level, you can apply the stateSetCompression setting in the following scopes:

  1. Global configuration (applies to all resources):

    Add stateSetCompression to the root of the config/metricshub.yaml file:

    stateSetCompression: suppressZeros # set to "none" to disable the StateSet compression
    resourceGroups: ...
    
  2. Per resource group (applies to all resources within a specific group):

    Add stateSetCompression within a specific resourceGroup in config/metricshub.yaml:

    resourceGroups:
      <resource-group-name>:
        stateSetCompression: suppressZeros # set to "none" to disable the StateSet compression
        resources: ...
    
  3. Per resource (applies to a specific resource):

    Add stateSetCompression for an individual resource in config/metricshub.yaml:

    resourceGroups:
      <resource-group-name>:
        resources:
          <resource-id>:
            stateSetCompression: suppressZeros # set to "none" to disable the StateSet compression
    
How it works

By default, with suppressZeros enabled, MetricsHub optimizes metric reporting by suppressing repeated zero values after the initial transition. Only non-zero state metrics will continue to be reported.

Example: Monitoring the health status of a resource

Let’s say MetricsHub monitors the health status of a specific resource, which can be in one of three states: ok, degraded, or failed.

When compression is disabled (stateSetCompression: none), MetricsHub will report all states, including zeros, during each collection cycle. For example:

hw.status{state="ok"} 0
hw.status{state="degraded"} 1
hw.status{state="failed"} 0

Here, the resource is in the degraded state, but the metrics for the ok and failed states are also reported with values of 0. This leads to unnecessary data being sent.

When compression is enabled (stateSetCompression: suppressZeros), MetricsHub reports only the non-zero state, which reduces the amount of data reported. For the same scenario, the report would look like this:

hw.status{state="degraded"} 1

In this case, only the degraded state is reported, and the zero values for ok and failed are suppressed after the initial state transition.
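
As a sketch of how the scopes might be combined, the configuration below keeps the default compression globally and disables it for a single resource. It assumes that the resource-level setting takes precedence over the global one, which is not stated above. The comments give an illustrative timeline, derived from the description of suppressZeros, for a resource whose status changes from degraded to ok:

```yaml
stateSetCompression: suppressZeros # Global default

resourceGroups:
  boston:
    attributes:
      site: boston
    resources:
      myHost1:
        stateSetCompression: none # Assumed to override the global value for myHost1
        attributes:
          host.name: my-host-01
          host.type: linux

# Illustrative timeline with suppressZeros (status changes from degraded to ok):
#   cycle N:    hw.status{state="degraded"} 1
#   cycle N+1:  hw.status{state="ok"} 1
#               hw.status{state="degraded"} 0   (zero reported once, at the transition)
#   cycle N+2:  hw.status{state="ok"} 1
```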

Timeout, duration and period format

Timeouts, durations and periods are specified as a number followed by a unit. Units can be combined, as in 1m15s:

Unit  Description                     Examples
s     seconds                         120s
m     minutes                         90m, 1m15s
h     hours                           1h, 1h30m
d     days (based on a 24-hour day)   1d
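
The global settings described in the sections above can sit together at the root of config/metricshub.yaml, before the resourceGroups section. The sketch below combines them with a timeout written in minutes; the values are illustrative only, and the order of the root-level keys is assumed not to matter:

```yaml
discoveryCycle: 15                 # Discovery after every 15 collects
jobPoolSize: 40                    # Up to 40 discovery and collect jobs in parallel
resolveHostnameToFqdn: true        # Resolve each host.name to its FQDN
stateSetCompression: suppressZeros # Default StateSet compression

resourceGroups:
  boston:
    attributes:
      site: boston
    resources:
      myHost1:
        attributes:
          host.name: my-host-01
          host.type: linux
        protocols:
          snmp:
            version: v1
            community: public
            port: 161
            timeout: 2m # Same duration as 120s
```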