Approaching System Reliability in the AI Era

#reliability #electronics #packaging #AI #systems #liquid #cooling

-- complex systems, model-based, testing, liquid cooling, power delivery ...


Ensuring hardware system reliability is increasingly critical in the evolving AI landscape, particularly within data centers. Drawing upon extensive experience leading reliability initiatives for cutting-edge hardware, this presentation will outline a general methodology for designing reliable complex AI systems. It will emphasize the necessity of a multidisciplinary approach, integrating model-based system engineering, rigorous reliability testing, and continuous system improvements, as exemplified by advancements in liquid cooling and power delivery technologies for high-performance AI processors. The talk will focus on the reliability approach needed for resilience in complex, AI-driven environments.



  Date and Time



  • Date: 26 Jun 2025
  • Time: 07:00 PM UTC to 08:00 PM UTC
  • Registration starts 18 May 2025 03:53 PM UTC
  • Registration ends 26 Jun 2025 08:00 PM UTC
  • No Admission Charge


  Speakers

Venkata Chivukula of Microsoft Corp.

Biography:

Venkata Chivukula is a Senior System Technology Engineer at Microsoft, specializing in liquid cooling and data center power and cooling system innovation within the Cloud and AI organization. Prior to Microsoft, he was a Senior System Reliability Engineer at Google, leading the development of liquid-cooled TPUs (including v5e and Trillium), GPU systems (A100, H100, GB200), and advanced power delivery technologies over five years. His earlier experience includes technical roles at Qualcomm, Bosch, GlobalFoundries, and Intel, focusing on fingerprint sensors, MEMS microphones, RF modules, and CMOS process technology. He is an IEEE Senior Member with a PhD in Electrical Engineering from Rensselaer Polytechnic Institute and has authored over 30 journal papers and 10 conference publications on MEMS, acoustic sensors, and vertical power technology, earning multiple best paper awards. His awards include the Google Tech Impact Award, Feats of Engineering, Cloud Impact Awards, Qualcomm’s Qual Star Award, Bosch Quality Prize, and the Qorvo Innovation Award.

Address: United States