
DEC Monitoring

Welcome to the monitoring setup for DEC! This guide explains how to access the Grafana dashboards we’ve set up to monitor CPU, memory, disk usage, memory errors, and node availability across the cluster, and how to use two command-line monitoring tools.

This monitoring system is a prototype and will be improved over time. If you have any feedback or suggestions, please let us know! It is not yet exposed to the public internet, so from outside the CPPM network you need to access it via SSH port forwarding.

Accessing Grafana

The Grafana service runs on mardec. From the CPPM network, you can access it directly at http://mardec.in2p3.fr:3000.

From any other network, you need to set up SSH port forwarding:

ssh -L 3000:localhost:3000 mardec.in2p3.fr
  • Leave this terminal window open during your session
  • Then open your browser and visit http://localhost:3000
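If you open the tunnel often, a small shell function can wrap the command above. This is only a sketch: the function name `grafana_tunnel`, its optional local-port argument and the `DRY_RUN` switch are our own additions, not part of the DEC setup. The `-N` option tells ssh not to run a remote command, so the terminal just holds the tunnel. Choosing another local port (e.g. 8080) helps if port 3000 is already in use on your machine; you then browse to http://localhost:8080 instead.

```bash
# Hypothetical helper (not provided by DEC): open the Grafana tunnel.
# Usage: grafana_tunnel [local_port]   (local_port defaults to 3000)
grafana_tunnel() {
  local local_port="${1:-3000}"
  # -N: forward only, do not run a remote command.
  # -L: local_port on this machine -> port 3000 on mardec (Grafana).
  local cmd=(ssh -N -L "${local_port}:localhost:3000" mardec.in2p3.fr)
  if [[ -n "${DRY_RUN:-}" ]]; then
    # Print the ssh command instead of running it.
    printf '%s\n' "${cmd[*]}"
    return 0
  fi
  echo "Grafana: http://localhost:${local_port} (Ctrl-C closes the tunnel)"
  "${cmd[@]}"
}
```

After adding it to your ~/.bashrc, run `grafana_tunnel` (or `grafana_tunnel 8080`) and leave that terminal open as described above.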

Login Credentials

Use the following login:

Username: decuser
Password: (sent to you by email)

If you’ve lost the password, just ask the admin (nicely 😄).

Available Dashboards

Once logged in, you'll see:

  • All Nodes: CPU & Memory Usage
    → Live CPU and memory usage for all nodes in a single panel: http://mardec.in2p3.fr:3000/goto/T3oMR9TNR?orgId=1

  • DEC Disk Usage Overview
    → Disk usage across all DEC disks: http://mardec.in2p3.fr:3000/goto/5IjxRrTHg?orgId=1

  • DEC Memory Errors
    → Live EDAC (Error Detection And Correction) memory error counts for all nodes: http://mardec.in2p3.fr:3000/goto/6UfV5CANR?orgId=1

(If you are using SSH port forwarding, replace mardec.in2p3.fr with localhost in these URLs.)
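The Memory Errors dashboard shows EDAC counters. On Linux, the same counters are exposed per memory controller under /sys/devices/system/edac/mc/ (ce_count for corrected errors, ue_count for uncorrected ones), so you can also check a single node from its shell. This is a sketch assuming the EDAC driver for the node's hardware is loaded; we don't know exactly which collector feeds the dashboard, and the function name `edac_counts` is hypothetical.

```bash
# Hypothetical helper: print corrected (ce) and uncorrected (ue) memory
# error counts for each memory controller on the current node, read from
# the Linux EDAC sysfs interface.
# ASSUMPTION: the EDAC driver is loaded; if not, nothing is printed.
# Usage: edac_counts [sysfs_root]
edac_counts() {
  local root="${1:-/sys/devices/system/edac/mc}"
  local mc ce ue
  for mc in "$root"/mc[0-9]*; do
    [[ -d "$mc" ]] || continue   # glob matched nothing: no controllers
    ce=$(<"$mc/ce_count")
    ue=$(<"$mc/ue_count")
    printf '%s ce=%s ue=%s\n' "$(basename "$mc")" "$ce" "$ue"
  done
}
```

A non-zero ue count deserves more attention than corrected errors, since those errors could not be fixed by the hardware.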

Command Line Monitoring Tools

CPU Load Monitoring

Check CPU usage across the cluster with the script /softdec/tools/mardec_load.sh.

Usage:

/softdec/tools/mardec_load.sh [par]

Parameters:

  • none (default): shows the CPU load of each node and creates the machinefile for mpiexec with 1 thread per node
  • number: specifies the number of threads for the machinefile (default = 1)
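The format of the generated machinefile is not documented here. If it follows the MPICH/Hydra convention of one `host` or `host:count` entry per line (an assumption; check the file the script actually writes), the sketch below sums the slots so you can pass a matching process count to mpiexec's `-n` option. The function name `machinefile_slots` is hypothetical.

```bash
# Hypothetical helper: total process slots in a machinefile.
# ASSUMPTION: one "host" or "host:count" entry per line (MPICH/Hydra style).
# Usage: machinefile_slots FILE
machinefile_slots() {
  awk -F: '
    /^[[:space:]]*$/ || /^[[:space:]]*#/ { next }   # skip blank and comment lines
    { total += (NF > 1 && $2 != "") ? $2 : 1 }     # "host:N" counts N, "host" counts 1
    END { print total + 0 }
  ' "$1"
}
```

For example, after `/softdec/tools/mardec_load.sh 4`, run `machinefile_slots` on the file the script produced and use the result as the `-n` value.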

Memory Usage Monitoring

Check memory usage across the cluster with the script /softdec/tools/mardec_mem.sh.

Usage:

/softdec/tools/mardec_mem.sh [par]

Parameters:

  • none (default): shows the memory usage of each node and creates the machinefile for mpiexec with 1 thread per node
  • number: specifies the number of threads for the machinefile (default = 1)
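Both scripts print a one-off snapshot. To follow memory usage over a period of time, you can call the script repeatedly. The function below is a hypothetical helper (`snapshot` and its arguments are our own) that runs any command a given number of times, with a timestamp header before each run. Keep in mind that, per the description above, each run of mardec_mem.sh also creates the machinefile again.

```bash
# Hypothetical helper: run a command COUNT times, INTERVAL seconds apart,
# printing a timestamp header before each run.
# Usage: snapshot COUNT INTERVAL command [args...]
snapshot() {
  local count="$1" interval="$2"
  shift 2
  local i
  for (( i = 1; i <= count; i++ )); do
    printf '=== %s ===\n' "$(date '+%Y-%m-%d %H:%M:%S')"
    "$@"
    # No need to wait after the last run.
    if (( i < count )); then
      sleep "$interval"
    fi
  done
}
```

For example, `snapshot 10 60 /softdec/tools/mardec_mem.sh >> mem_log.txt` would record ten snapshots one minute apart in a file named mem_log.txt (a name chosen for illustration).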