Energy reduction for AI loads - webinar highlights

 

Background

Cloud service providers and data centre owners have power consumption limits. Finding the optimal balance between performance (Time to Solution) and power consumption (Energy to Solution) has significant economic implications. What’s more, the growing demand for ever larger AI/ML workloads on GPUs makes it imperative to carefully manage power consumption while maximising the use of GPUs, which are high-value assets.
The webinar "Energy reduction for AI loads", jointly organised by ANDREAS and AI-SPRINT, demonstrated how service providers and data centres can keep power consumption under control by optimising workloads while lowering cloud resource usage costs for cloud users.

 

Energy in DCs for AI and the ANDREAS solution

The webinar was opened by Fabrizio Magugliani, Strategic Planning and Business Development at E4 Computer Engineering SpA, who provided a complete overview of the requirements of today's data centres in terms of consumption, the growing demand for installed power, and drivers for innovation in the energy field. He concluded his presentation by spotlighting the main objectives of the ANDREAS project to support an efficient use of high-value computational resources.

Katarzyna Materka, Quality Manager at 7bulls.com, focused on the ANDREAS architecture, which is designed to support containerised AI training jobs in on-demand GPU systems while minimising the expected energy costs. She also highlighted the ANDREAS framework for handling the complete training process of Artificial Intelligence models, including a job scheduler, a job optimiser and a REST-based API integrator.

More insights on energy reduction and AI came from Danilo Ardagna, Associate Professor at Politecnico di Milano, who explained how the job optimisation is performed. The solution is based on a greedy algorithm. Experimental results show energy cost reductions of 5% and 10% and a total cost reduction between 30% and 62% compared with first-principle methods.
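To give a feel for what a greedy approach to energy-aware job placement can look like, here is a minimal Python sketch. It is not the ANDREAS optimiser, whose details were not covered in this summary. The names `Option`, `Job` and `greedy_assign`, the per-GPU power figures, the deadlines and the electricity price are all hypothetical and chosen only for illustration. The sketch assumes each training job comes with a few candidate GPU configurations and handles jobs in order of deadline, giving each one the lowest-energy configuration that still meets its deadline and fits in the GPUs that remain free.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    """One hypothetical way to run a job: GPU count, runtime, power per GPU."""
    gpus: int
    hours: float
    watts_per_gpu: float

    def energy_kwh(self) -> float:
        # Energy to Solution = power x time, converted from Wh to kWh.
        return self.gpus * self.watts_per_gpu * self.hours / 1000.0


@dataclass(frozen=True)
class Job:
    name: str
    deadline_hours: float
    options: tuple  # tuple of Option


def greedy_assign(jobs, total_gpus, price_per_kwh):
    """Greedy pass: earliest deadline first, lowest-energy feasible option."""
    free = total_gpus
    plan = {}        # job name -> (chosen Option, energy cost)
    postponed = []   # jobs with no feasible option right now
    for job in sorted(jobs, key=lambda j: j.deadline_hours):
        feasible = [
            o for o in job.options
            if o.gpus <= free and o.hours <= job.deadline_hours
        ]
        if not feasible:
            postponed.append(job.name)
            continue
        best = min(feasible, key=lambda o: o.energy_kwh())
        plan[job.name] = (best, best.energy_kwh() * price_per_kwh)
        free -= best.gpus
    return plan, postponed


# Illustrative data only (assumed values, not measurements from the webinar).
jobs = [
    Job("A", 10.0, (Option(1, 12.0, 250.0), Option(2, 7.0, 250.0), Option(4, 4.0, 250.0))),
    Job("B", 6.0, (Option(1, 5.0, 300.0), Option(2, 3.0, 300.0))),
    Job("C", 12.0, (Option(2, 6.0, 250.0),)),
]
plan, postponed = greedy_assign(jobs, total_gpus=4, price_per_kwh=0.2)

for name, (option, cost) in plan.items():
    print(f"{name}: {option.gpus} GPU(s), {option.hours} h, cost {cost:.2f}")
print("postponed:", postponed)
```

A greedy pass like this is fast and easy to reason about, but it does not guarantee a globally optimal plan. In this toy run, job C is postponed because the earlier choices leave only one GPU free.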

Paweł Skrzypek, Chief Technology Officer at AI Investments Ltd, and Maciej Riedl, Developer at 7bulls.com, concluded with a live demonstration.

 

ANDREAS evolution in AI-SPRINT

Danilo Ardagna, Scientific Coordinator of AI-SPRINT, explained how this EU-funded project aims to overcome a twofold challenge: handling the huge amounts of data in cloud and IoT systems, and dealing with their complexity in terms of data heterogeneity, lack of engineering skills, and unstructured development processes.

AI-SPRINT plans to develop an integrated design and runtime framework that provides simplified programming models to reduce the time required for developing new AI applications. The framework developed by ANDREAS can be exploited by AI-SPRINT, in particular for the development of its thematic use cases (Personalised Healthcare, Maintenance & Inspection, and Farming 4.0).

 

Roundtable and Q&A

The webinar was followed by an interesting panel discussion, moderated by Stephanie Parker, Senior Research Analyst at Trust-IT Services, which brought together members of the AI-SPRINT project and external experts: Paweł Bochniarz, Partner at ASI ValueTech Seed Fund VC; Daniele Cesarini, HPC Specialist at Cineca High Performance Computing Department; Michela Milano, Full Professor at Università di Bologna; Gianluca Palermo, Associate Professor at Politecnico di Milano; and Maria Vaquero, Innovation Manager at Cloud&Heat.

Discussions with the panel of experts focused on their rich and diverse viewpoints on how the landscape will evolve over the next 3 to 5 years and the priorities to focus on in the short term in the areas of:

  • Energy efficiency in supercomputers in the shift towards exascale.
  • Power-aware aspects of drug discovery in exascale architectures.
  • Energy efficiency in data centres for applications that need a lot of computing resources. 
  • AI and decision support in application areas like energy, computing and sustainability.
  • AI/ML with heavy GPU usage, e.g. in neural network training and computer vision, from both an HPC and SME/start-up perspective.

 

Participants and Poll questions

The webinar attracted a total of 29 participants from 10 countries.

The interactive features included live polling with the audience to better understand their familiarity with energy efficiency and transfer of research results. The following graphs show the main findings and takeaways from the short surveys:

Watch the recording