
Since the previous publication about ALUMET on LinkedIn, BULL's Digital Continuum R&D team has validated another innovative use case for ALUMET in the context of the EXIGENCE Project. This contribution focused on converting heterogeneous energy signals from VMs, baremetal servers, and GPU-accelerated nodes into an eco-aware monitoring and orchestration framework for AI applications. The framework is built on a harmonized data model that extends the OpenTelemetry semantics catalog and greatly reduces the technical knowledge application developers need to use it.

ALUMET's modular plugin architecture was the answer: one tool with three configurations, so that each node gets exactly the measurement strategy its hardware supports. In this case, the selection was TDP estimation for VMs, RAPL for baremetal CPU-only nodes, and RAPL+NVML for GPU-accelerated baremetal nodes.
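The selection above can be sketched as a small decision function. This is only an illustration: the `Node` type, its fields, and the source labels (`tdp-estimation`, `rapl`, `nvml`) are hypothetical names chosen for this example, not ALUMET's actual configuration keys or plugin identifiers.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical description of a cluster node (not an ALUMET type)."""
    name: str
    virtualized: bool
    has_gpu: bool


def measurement_sources(node: Node) -> list[str]:
    """Return illustrative measurement-source labels for a node.

    Mirrors the selection described in the text:
    VMs -> TDP estimation, baremetal CPU-only -> RAPL,
    GPU-accelerated baremetal -> RAPL + NVML.
    """
    if node.virtualized:
        # Assumption: hardware counters are not exposed inside the VM,
        # so an estimation strategy is used instead.
        return ["tdp-estimation"]
    sources = ["rapl"]
    if node.has_gpu:
        sources.append("nvml")
    return sources


if __name__ == "__main__":
    for n in (
        Node("vm-1", virtualized=True, has_gpu=False),
        Node("cpu-1", virtualized=False, has_gpu=False),
        Node("gpu-1", virtualized=False, has_gpu=True),
    ):
        print(n.name, measurement_sources(n))
```

Keeping the decision in one place is what lets a single tool serve all three node types: only the list of active sources changes, not the rest of the pipeline.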

The figure depicts the infrastructure made available to the project for the deployment of Use Case 2, which comprises three different types of nodes in a single Kubernetes cluster. This alone represents a core challenge in energy monitoring, as each node type exposes power consumption differently.

More on the advantages of ALUMET's approach and the justification of this selection is available in deliverable D4.3 (see all of EXIGENCE's public deliverables HERE).

Another core challenge was to provide eco-driven automations that require no specialized technical knowledge of the underlying infrastructure or its low-level metrics. This facilitates adoption of the eco-aware orchestration framework and keeps it compatible with different clusters without requiring changes at the application level. To achieve this, the metric exposure layer was enriched with eco-related exporters, whose output is then composed into a harmonized data model extending the OpenTelemetry semantics catalog. The final selection of exporters covers:

  • Real-time energy mix, carbon intensity, and energy cost for the location of the node, pulled from ENTSO-E's public API.
  • Hardware characterization per node: PUE, embodied emissions, and lifecycle data, pulled from the node's datasheets and historical data.
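As a rough illustration of how such exporter data might be combined with a raw energy reading, the sketch below turns measured joules into facility-level energy, operational emissions, and cost. The `NodeContext` type, the field names, and the choice to scale IT energy by PUE are assumptions made for this example. They are not the project's actual data model or metric names.

```python
from __future__ import annotations

from dataclasses import dataclass

JOULES_PER_KWH = 3_600_000.0


@dataclass(frozen=True)
class NodeContext:
    """Hypothetical per-node context assembled from the eco exporters."""
    carbon_intensity_g_per_kwh: float  # grid carbon intensity (gCO2eq/kWh)
    energy_price_eur_per_kwh: float    # energy cost at the node's location
    pue: float                         # Power Usage Effectiveness (>= 1.0)


def enrich(energy_joules: float, ctx: NodeContext) -> dict[str, float]:
    """Derive eco-related values from one measured energy reading."""
    # Assumption: facility energy = IT energy * PUE.
    facility_kwh = energy_joules / JOULES_PER_KWH * ctx.pue
    return {
        "energy_kwh": facility_kwh,
        "operational_emissions_g": facility_kwh * ctx.carbon_intensity_g_per_kwh,
        "energy_cost_eur": facility_kwh * ctx.energy_price_eur_per_kwh,
    }


if __name__ == "__main__":
    ctx = NodeContext(
        carbon_intensity_g_per_kwh=200.0,
        energy_price_eur_per_kwh=0.25,
        pue=1.5,
    )
    print(enrich(3_600_000.0, ctx))
```

The values used in the example are made up. In practice they would come from the exporters listed above.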

The generated data model additionally enriches all the previous metrics with the calculated SCI (Software Carbon Intensity) score, a green score that is hard to manipulate and can be used to compare nodes and workloads across hardware types and industries. The data model covers both per-node metrics and per-workload metrics, the latter obtained thanks to ALUMET's attribution and correlation plugins. It then drives eco-aware orchestration automations natively integrated in the Kubernetes ecosystem:

  • A custom green scheduler plugin, integrated into the Kubernetes scheduling framework, that ranks nodes by their SCI scores.
  • A Prometheus-based HPA implementation that scales workloads based on the harmonized data model, e.g., scaling out the pods of a deployment when carbon intensity is low or reacting to real-time energy prices. This enables a shift from performance-only scaling to sustainability-driven elasticity.
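To make the SCI-based ranking more concrete, here is a minimal sketch. It uses the SCI formula from the Green Software Foundation specification, SCI = ((E × I) + M) per R, where E is energy consumed, I is carbon intensity, M is embodied emissions, and R is the functional unit. The function names and the sample numbers are hypothetical, and the ranking is a plain sort. It does not reproduce the project's scheduler plugin.

```python
from __future__ import annotations


def sci(
    energy_kwh: float,
    carbon_intensity_g_per_kwh: float,
    embodied_g: float,
    functional_units: float,
) -> float:
    """Software Carbon Intensity: ((E * I) + M) per R, in gCO2eq per unit."""
    if functional_units <= 0:
        raise ValueError("functional_units must be positive")
    operational_g = energy_kwh * carbon_intensity_g_per_kwh
    return (operational_g + embodied_g) / functional_units


def rank_nodes(scores: dict[str, float]) -> list[str]:
    """Order node names from lowest (greenest) to highest SCI score."""
    return sorted(scores, key=lambda name: scores[name])


if __name__ == "__main__":
    # Made-up inputs, purely for illustration.
    scores = {
        "vm-1": sci(2.0, 100.0, 50.0, 10.0),
        "cpu-1": sci(1.0, 100.0, 25.0, 10.0),
        "gpu-1": sci(4.0, 100.0, 100.0, 10.0),
    }
    print(rank_nodes(scores))
```

Because the score is normalized by a functional unit, nodes with very different hardware can be compared on the same scale, which is what makes it usable as a ranking key.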

The result is a developer-friendly, eco-aware monitoring and orchestration framework built on top of the monitoring capabilities of ALUMET.

Alumet is a modular, open-source framework designed to measure and monitor metrics such as energy consumption and performance across hardware and software systems.

It provides a flexible, plugin-based pipeline that collects data from multiple sources, processes it, and exports results, enabling users to build efficient and customized measurement tools with minimal overhead.