The IAP UW Workshop on the Future of AI in the Cloud will be conducted on Friday, May 9, 2025 at UW.
Venue: Husky Union Building, Room 334, UW, Seattle, WA
Time: 8:30AM–4PM (Badge Pick-up at 8:30AM). Advance registration is required. Please register here.
This workshop is co-organized by Prof. Stephanie Wang and the IAP.
Agenda – Please see the Speaker Abstracts and Bios below, along with Testimonials from previous Workshops. Please check back later for additional speakers and updates.
8:30-8:55 – Badge Pick-up – Coffee/Tea and Breakfast Food/Snacks
8:55-9:00 – Welcome – Prof. Stephanie Wang, University of Washington
9:00-9:30 – Keynote: Dr. Ricardo Bianchini, Technical Fellow and Corporate Vice President at Microsoft, “Challenges and opportunities in datacenter power and sustainability in the AI era”
9:30-10:00 – Prof. Ratul Mahajan, University of Washington, “Application-defined networking”
10:00-10:30 – Dr. Ulf Hanebutte, Distinguished Engineer, Marvell, “Towards a flexible infrastructure supporting diverse AI workloads of today and tomorrow”
10:30-11:00 – Prof. Arvind Krishnamurthy, Short-Dooley Professor in Computer Science & Engineering
11:00-11:30 – Lightning Session for Student Posters
11:30-12:30 – Lunch and Poster Viewing
12:30-1:00 – Keynote: Dr. Vinod Grover, Senior Distinguished Engineer, Nvidia, "The Essence of CUDA C++ : Past, Present, and Future"
1:00-1:30 – Prof. Stephanie Wang, Assistant Professor in Computer Science & Engineering
1:30-2:00 – Prof. Natasha Jaques, Assistant Professor in Computer Science & Engineering
2:00-2:30 – Dr. Brad Beckmann, Fellow in Research and Advanced Development, AMD, "Advancing Energy Efficient AI Communication"
2:30-3:00 – Prof. Baris Kasikci, Associate Professor in Computer Science & Engineering
3:00-4:00 – Best Poster Award and Reception
ABSTRACTS and BIOS (alphabetical order by last name)
Dr. Brad Beckmann, Fellow in Research and Advanced Development, AMD, "Advancing Energy Efficient AI Communication."
Abstract: Reducing power consumption is the dominant challenge for ML system designs. AMD has achieved tremendous scalability in accelerator throughput by leveraging chiplet technology, but this improvement is not free. Much like the rise of multi-core processors two decades ago required software to embrace multi-threaded programming to achieve high performance, tomorrow’s processors will force software to optimize for intra-chip locality to achieve high performance. This talk will highlight how to partition future GPU programs within the chip for power efficiency and how to optimize the subsequent collective communication for the on-chip memory hierarchy.
Bio: Brad Beckmann is a Fellow in AMD’s Research and Advanced Development group. Brad leads a team of researchers pursuing next-generation hardware and software technologies for scale-up/scale-out GPU networking. Brad joined AMD in 2007 and has led projects innovating in GPU memory consistency models, GPU cache coherence, simulation, and on-chip networks. He also co-led the initial development and release of the gem5 simulator in 2011. He has published over 30 conference and journal papers and co-authored over 40 granted patents. Prior to AMD, Brad was a software developer for Microsoft’s Windows Server Performance team. Brad has a PhD in Computer Science from the University of Wisconsin-Madison.
Dr. Ricardo Bianchini, Technical Fellow and Corporate Vice President, Microsoft, “Challenges and opportunities in datacenter power and sustainability in the AI era.”
Abstract: As society's interest in generative AI models and their capabilities continues to soar, we are witnessing an unprecedented surge in compute demand. This surge is stressing every aspect of the cloud ecosystem at a time when hyperscale providers are striving to become carbon-neutral. In this talk, I will address the challenges in managing the power, energy, and sustainability of this expanding AI infrastructure. I will also quickly overview some of my team's early efforts to tackle these challenges and explore potential research avenues going forward. Ultimately, we will need a large research and development effort to create a more sustainable and efficient future for AI.
Short bio: Dr. Ricardo Bianchini is a Technical Fellow and Corporate Vice President at Microsoft Azure, where he leads the team responsible for managing Azure’s Compute workload, server capacity, and datacenter infrastructure with a strong focus on efficiency and sustainability. Before joining Azure in 2022, Ricardo led the Systems Research Group and the Cloud Efficiency team at Microsoft Research (MSR). During his tenure at MSR, he created research projects in power efficiency and intelligent resource management that resulted in large-scale production systems across Microsoft. Prior to joining Microsoft in 2014, he was a Professor at Rutgers University, where he conducted research in datacenter power and energy management, cluster-based systems, and other cloud-related topics. Ricardo is a Fellow of both the ACM and IEEE.
Dr. Vinod Grover, Senior Distinguished Engineer, Nvidia, "The Essence of CUDA C++ : Past, Present, and Future"
Abstract: CUDA began as a way to harness GPU power for general-purpose computing. Over time, NVIDIA developed a vision of virtualized GPU architecture, built around C++ integration and the SIMT programming model. This approach enabled breakthroughs in high-performance computing and deep learning, culminating in innovations like Tensor Cores. Looking ahead, CUDA is evolving toward tile-based programming and mega-kernels, with large language models (LLMs) assisting in development for distributed systems.
Bio: Vinod Grover is a Senior Distinguished Engineer at NVIDIA. Since 2007, he has led the development of CUDA C++, a foundational technology for GPU programming. His recent work focuses on improving performance and productivity in deep learning using language and compiler technologies. Prior to NVIDIA, he held roles at Sun Microsystems and Microsoft. Vinod earned a B.S. in Physics from IIT Delhi and an M.S. in Computer Science from Syracuse University.
Dr. Ulf Hanebutte, Distinguished Engineer, Marvell, “Towards a flexible infrastructure supporting diverse AI workloads of today and tomorrow.”
Abstract: Accelerating the AI workloads of today while enabling a flexible infrastructure that provides opportunities to accelerate the AI workloads of tomorrow is a fundamental computer science challenge. To this end, concepts like Near-Memory-Computing and Data Acceleration and Offload, not long ago only research, are now product offerings. This talk will explore CXL-based Near-Memory-Compute Accelerators in the context of AI workloads and provide an introduction to Marvell’s DAO (Data Acceleration Offload) high-performance open-source solution framework and the recently established DAO research facility to foster academic research and collaborations.
Bio: Dr. Ulf Hanebutte is a Distinguished Engineer at Marvell with a focus on HW/SW co-design within AI/ML architecture. In this role he has contributed to multiple generations of ML inference accelerator HW and their SW stacks. Collaborating and solving big problems together has marked his extensive career, both at the National Labs and in the private sector, with projects ranging from HPC at Exa-scale to IoT for energy-efficient buildings. He holds a Ph.D. from Northwestern University and a Dipl. Ing. in Aerospace Engineering from the University of Stuttgart.
Prof. Ratul Mahajan, University of Washington, “Application-defined networking.”
Abstract: Many new physical and virtual networks are built today to serve a handful of known applications, unlike the Internet which was built to support unknown applications. We argue that the implementation of such networks should be completely application-specific and not layered on top of general-purpose network abstractions from the Internet age. Such layering tends to more than double the latency of applications or makes it difficult to support application-specific handling.
We propose application-defined networking in which application developers specify network functionality in a high-level language and a controller generates a custom distributed implementation that runs across available hardware and software resources. We have instantiated this approach for microservices and service meshes. Our language can express common application network functions in only 7-28 lines of code, and the generated implementation lowers RPC processing latency by up to 82%.
Bio: Ratul Mahajan is an Associate Professor at the University of Washington (Paul G. Allen School of Computer Science). He is also the co-director of UW FOCI (Future of Cloud Infrastructure) and an Amazon Scholar. Prior to that, he was a Co-founder and CEO of Intentionet (acquired by Amazon), a company that pioneered intent-based networking and network verification, and a Principal Researcher at Microsoft Research.
Ratul is a computer systems researcher with a networking focus and has worked on a broad set of topics, including network verification, connected homes, network programming, optical networks, Internet routing and measurements, and mobile systems. He has published over fifty papers in top venues such as SIGCOMM, SOSP, MobiCom, CHI, and PLDI, and many of the technologies that he has helped develop are part of real-world systems at Microsoft and other companies.
Ratul has been recognized as an ACM Distinguished Scientist, an ACM SIGCOMM Rising Star, and a Microsoft Research Graduate Fellow. His papers have won the ACM SIGCOMM Test-of-Time Award, the IEEE William R. Bennett Prize, the ACM SIGCOMM Best Paper Award (twice), and the HVC Best Paper Award. He received his PhD from the University of Washington and his B.Tech from the Indian Institute of Technology, Delhi, both in Computer Science and Engineering.
Testimonials from Previous Workshops
Professor David Patterson, the Pardee Professor of Computer Science, UC Berkeley, Turing Award Laureate, “I saw strong participation at the Cloud Workshop, with some high energy and enthusiasm; and I was delighted to see industry engineers bring and describe actual hardware, representing some of the newest innovations in the data center.”
Professor Christos Kozyrakis, Professor of Electrical Engineering & Computer Science, Stanford University, “As a starting point, I think of these IAP workshops as ‘Hot Chips meets ISCA’, i.e., an intersection of industry’s newest solutions in hardware (Hot Chips) with academic research in computer architecture (ISCA); but more so, these workshops additionally cover new subsystems and applications, and in a smaller venue where it is easy to discuss ideas and cross-cutting approaches with colleagues.”
Professor Hakim Weatherspoon, Professor of Computer Science, Cornell University, “I have participated in three IAP Workshops since the first one at Cornell in 2013 and it is great to see that the IAP premise was a success now as it was then, bringing together industry and academia in a focused workshop and an all-day exchange of ideas. It was a fantastic experience and I look forward to the next IAP Workshop.”
Professor Ken Birman, the N. Rama Rao Professor of Computer Science, Cornell University, “I actually thought it was a fantastic workshop, an unquestionable success, starting from the dinner the night before, through the workshop itself, to the post-event reception for the student Best Poster Awards.”
Dr. Carole-Jean Wu, Research Scientist, AI Infrastructure, Meta Research, and Professor of CSE, Arizona State University, “The IAP Cloud Computing workshop provides a great channel for valuable interactions between faculty/students and the industry participants. I truly enjoyed the venue learning about research problems and solutions that are of great interest to Meta, as well as the new enabling technologies from the industry representatives. The smaller venue and the poster session fostered an interactive environment for in-depth discussions on the proposed research and approaches and sparked new collaborative opportunities. Thank you for organizing this wonderful event! It was very well run.”
Nathan Pemberton, PhD student, UC Berkeley (currently Applied Scientist at AWS), "IAP workshops provide a valuable chance to explore emerging research topics with a focused group of participants, and without all the time/effort of a full-scale conference. Instead of rushing from talk to talk, you can slow down and dive deep into a few topics with experts in the field."
Dr. Pankaj Mehra, VP Product Planning, Samsung (currently Professor at Ohio State University and Founder at Elephance Memory), "Terrifically organized Workshops that give all parties -- students, faculty, industry -- valuable insights to take back"
Professor Vishal Shrivastav, Purdue University, “Attending the IAP workshops as a PhD student at Cornell was a great experience and very rewarding. I really enjoyed the many amazing talks from both the industry and academia. My personal conversations with several industry leaders at the workshop will definitely guide some of my future research.”
Professor Ana Klimovic, ETH Zurich, “I attended three IAP workshops as a PhD student at Stanford, and I am consistently impressed by the quality of the talks and the breadth of the topics covered. These workshops bring top-tier industry and academia together to discuss cutting-edge research challenges. It is a great opportunity to exchange ideas and get inspiration for new research opportunities.”
Dr. Richard New, VP Research, Western Digital, “IAP workshops provide a great opportunity to meet with professors and students working at the cutting edge of their fields. It was a pleasure to attend the event – lots of very interesting presentations and posters.”