
Nvidia-Smi

Description

This connector provides hardware information (clocking) about most Nvidia GPUs.

Target

Typical platform: Any system with Nvidia GPUs

Operating systems: Microsoft Windows, Linux

Prerequisites

Leverages: NVIDIA drivers with NVIDIA-SMI support.

Technology and protocols: System Commands

Examples

CLI

metricshub HOSTNAME -t win -c +NvidiaSmi --wmi -u USER

metricshub.yaml

resourceGroups:
  <RESOURCE_GROUP>:
    resources:
      <HOSTNAME-ID>:
        attributes:
          host.name: <HOSTNAME> # Change with actual host name
          host.type: win
        selectConnectors: [ NvidiaSmi ] # Optional, to load only this connector
        protocols:
          wmi:
            username: <USERNAME> # Change with actual credentials
            password: <PASSWORD> # Encrypted using metricshub-encrypt

Connector Activation Criteria

The Nvidia-Smi connector is activated automatically, and its status is reported as OK, if all of the criteria below are met:

  • The command below succeeds on the monitored host
    • Command: nvidia-smi
    • Output contains: Driver Version (regex)
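
The criterion above can be approximated by hand before enabling the connector. The following is a minimal sketch, not the connector's actual implementation: it runs nvidia-smi on the local machine (MetricsHub runs it on the monitored host through the configured protocol) and applies the same "Driver Version" pattern. The function names output_matches and nvidia_smi_available are hypothetical helpers introduced for illustration.

```python
import re
import subprocess

# Pattern taken from the activation criterion: output must contain "Driver Version".
DRIVER_VERSION = re.compile(r"Driver Version")


def output_matches(output: str) -> bool:
    """Return True if the text contains the 'Driver Version' marker."""
    return DRIVER_VERSION.search(output) is not None


def nvidia_smi_available(timeout: float = 10.0) -> bool:
    """Run nvidia-smi locally and apply the same check as the criterion.

    Assumption: nvidia-smi is on the PATH of the machine running this script.
    """
    try:
        result = subprocess.run(
            ["nvidia-smi"],
            capture_output=True,
            text=True,
            timeout=timeout,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        # Command missing, not executable, or it hung: criterion not met.
        return False
    return result.returncode == 0 and output_matches(result.stdout)


if __name__ == "__main__":
    print("criteria met" if nvidia_smi_available() else "criteria not met")
```

If this check fails on a host that does have an Nvidia GPU, the usual suspects are a missing driver installation or nvidia-smi not being on the PATH of the account used for monitoring.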

Metrics

Type: enclosure
  Collected Metrics:
    • hw.status{hw.type="enclosure", state="present"}

Type: fan
  Collected Metrics:
    • hw.fan.speed_ratio
    • hw.fan.speed_ratio.limit{limit_type="low.critical"}
    • hw.fan.speed_ratio.limit{limit_type="low.degraded"}
    • hw.status{hw.type="fan", state="present"}
  Specific Attributes:
    • hw.parent.type
    • id
    • name
    • sensor_location

Type: gpu
  Collected Metrics:
    • hw.energy{hw.type="gpu"}
    • hw.gpu.io{direction="receive"}
    • hw.gpu.io{direction="transmit"}
    • hw.gpu.memory.limit
    • hw.gpu.memory.utilization
    • hw.gpu.utilization{task="decoder"}
    • hw.gpu.utilization{task="encoder"}
    • hw.gpu.utilization{task="general"}
    • hw.power{hw.type="gpu"}
    • hw.status{hw.type="gpu", state="present"}
  Specific Attributes:
    • driver_version
    • firmware_version
    • hw.parent.type
    • id
    • info
    • model
    • name
    • serial_number
    • vendor

Type: temperature
  Collected Metrics:
    • hw.status{hw.type="temperature", state="present"}
    • hw.temperature
    • hw.temperature.limit{limit_type="high.critical"}
    • hw.temperature.limit{limit_type="high.degraded"}
  Specific Attributes:
    • hw.parent.type
    • id
    • name
    • sensor_location

Type: voltage
  Collected Metrics:
    • hw.status{hw.type="voltage", state="present"}
    • hw.voltage
  Specific Attributes:
    • hw.parent.type
    • id
    • name
    • sensor_location
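
To get a feel for the raw values behind metrics such as GPU utilization, temperature, or power, nvidia-smi can be queried directly. The sketch below is illustrative only: this page does not state which nvidia-smi invocations the connector uses, so the chosen query fields and the helper names parse_query_output and query_gpus are assumptions, not the connector's method.

```python
import csv
import io
import subprocess

# Query fields accepted by `nvidia-smi --query-gpu`. Which fields the
# connector itself reads is not documented here; this list is an assumption.
FIELDS = [
    "name",
    "driver_version",
    "temperature.gpu",
    "utilization.gpu",
    "memory.total",
    "power.draw",
]


def parse_query_output(text, fields=FIELDS):
    """Turn CSV output (no header, no units) into one dict per GPU."""
    rows = []
    for record in csv.reader(io.StringIO(text), skipinitialspace=True):
        if not any(value.strip() for value in record):
            continue  # skip blank lines
        rows.append(dict(zip(fields, (value.strip() for value in record))))
    return rows


def query_gpus():
    """Run nvidia-smi locally and return the parsed rows.

    Raises an exception if nvidia-smi is missing or exits with an error.
    """
    cmd = [
        "nvidia-smi",
        "--query-gpu=" + ",".join(FIELDS),
        "--format=csv,noheader,nounits",
    ]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_query_output(result.stdout)


if __name__ == "__main__":
    for gpu in query_gpus():
        print(gpu)
```

Values are returned as strings exactly as nvidia-smi prints them; converting them to numbers or to the units reported by MetricsHub is left out of this sketch.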