This solution improves the use of HPC systems by identifying past jobs that closely resemble the workload currently being analyzed. It uses the system's execution history to build a clustering model in which each job is represented by an embedding that captures its essential characteristics.

These embeddings are grouped into clusters of comparable jobs. To find jobs similar to the current one, the method locates the historical job closest to it and retrieves the cluster that job belongs to.
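The lookup described above can be sketched as follows. This is only an illustration under stated assumptions: the source does not name the embedding, the clustering algorithm, or the distance measure, so the two-dimensional embeddings, the small k-means routine, the Euclidean distance, and all function and job names below are hypothetical.

```python
import math

def kmeans(points, k, iterations=20):
    """Tiny k-means (an assumed choice); the first k points seed the centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def similar_jobs(history, labels, current):
    """Return ids of the historical jobs sharing a cluster with the job nearest to `current`."""
    job_ids = list(history)
    nearest = min(
        range(len(job_ids)),
        key=lambda i: math.dist(history[job_ids[i]], current),
    )
    target = labels[nearest]
    return [jid for jid, lab in zip(job_ids, labels) if lab == target]

# Hypothetical embeddings: (normalized CPU use, normalized memory use).
history = {
    "job-a": (0.90, 0.20),
    "job-b": (0.10, 0.80),
    "job-c": (0.85, 0.25),
    "job-d": (0.15, 0.90),
}
labels = kmeans(list(history.values()), k=2)
print(similar_jobs(history, labels, current=(0.8, 0.3)))
```

In this toy data the CPU-heavy jobs and the memory-heavy jobs form separate clusters, and a CPU-heavy current job is matched to the former. A real deployment would presumably use richer embeddings and a clustering model fitted on the full execution history.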

Once similar jobs have been identified, their collective behavior provides valuable insights for improving system efficiency.

Depending on the context, the method can support actions such as reducing unnecessary resource consumption, preventing the execution of malicious or zombie jobs, or detecting abnormal system behavior that may require maintenance. The insights derived from similar past jobs can be applied at different stages: before launching the current job, during its execution, or after its completion to optimize future workloads.
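As a rough illustration of two of these uses, the sketch below trims an over-sized memory request before launch and flags unusual memory use during execution. The source does not describe how such decisions are made, so the statistics chosen (median, maximum), the headroom and factor values, and the function names are all assumptions.

```python
from statistics import median

def suggest_memory_gb(similar_peaks_gb, requested_gb, headroom=1.2):
    """Before launch: cap the request at the median peak of similar jobs plus headroom."""
    if not similar_peaks_gb:
        return requested_gb  # no history to learn from, keep the request
    return min(requested_gb, median(similar_peaks_gb) * headroom)

def looks_abnormal(observed_gb, similar_peaks_gb, factor=2.0):
    """During execution: flag use far above the largest peak seen in similar jobs."""
    return bool(similar_peaks_gb) and observed_gb > factor * max(similar_peaks_gb)

# Hypothetical peak memory (GB) recorded for the similar past jobs.
peaks = [10.0, 12.0, 11.0]
print(suggest_memory_gb(peaks, requested_gb=64.0))
print(looks_abnormal(40.0, peaks))
```

The same pattern could apply to other recorded quantities, such as runtime or energy use, provided the execution history contains them.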

Through this adaptive and data-driven strategy, the solution enables more informed decisions regarding job management and resource allocation, ultimately contributing to a more efficient and reliable HPC environment.

Status: Submitted to the European Patent Office