Organizers (left to right): Prof. Manya Ghobadi (MIT), Prof. Mohammad Alizadeh (MIT), and Prof. Arvind Krishnamurthy (UW).
Agenda: Videos of Presentations
9:00 am PDT: Welcome – Workshop Organizers
Session 1: Distributed Systems (Chair: Manya Ghobadi)
9:05 - 9:50 (45 mins): Ion Stoica, UC Berkeley, "Ray: A Universal Framework for Distributed Computing"
9:50 - 10:35 (45 mins): Luis Ceze, University of Washington and OctoML, "Improving Model Performance, Portability and Productivity with Apache TVM and the Octomizer"
10:35 - 11:00 (25 mins): Ben Klenk, Nvidia, "Accelerated Computing Needs Accelerated Networks"
11:00 - 11:15 Break (Gather Online)
Session 2: Big Data (Chair: Arvind Krishnamurthy)
11:15 - 12:00 (45 mins): Matei Zaharia, Stanford, "What’s Next in Infrastructure for ML?"
12:00 - 12:45 (45 mins): Vinod Kathail, Xilinx, "Innovative HW and SW Solutions for AI Acceleration"
12:45 - 1:30 Lunch break (Gather Online)
Session 3: Hardware (Chair: Mohammad Alizadeh)
1:30 - 2:15 (45 mins): Darius Bunandar, Lightmatter, "Accelerating Artificial Intelligence with Light"
2:15 - 3:00 (45 mins): Amar Phanishayee, Microsoft, "Project Fiddle: Fast and Efficient Infrastructure for Distributed Deep Learning"
3:00 - 3:30 (30 mins): Derek Chickles, Marvell, "Introduction to the Octeon 10 DPU and Integrated Inferencing"
Abstracts and Bios (listed alphabetically by last name). Please check back later for updates.
Darius Bunandar, Lightmatter, "Accelerating Artificial Intelligence with Light"
Abstract: Lightmatter is leading the evolution of computing to reduce its impact on our planet while enabling the next giant leaps in human progress. By unifying the unique properties of light as an ideal carrier of information with the interoperability of electronics, Lightmatter creates photonic processors and interconnects that are faster, more efficient, and cooler than anything else on earth. At the core of Lightmatter’s technology are silicon photonic devices that perform linear algebra calculations with light. Lightmatter recently secured Series B funding from Viking Global, Matrix Partners, Spark Capital, and GV (formerly Google Ventures). In this talk, I will present the insights that sparked our core technology, the evolution of our photonic platform, and how this platform can accelerate AI.
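For readers new to the linear-algebra claim above, here is a minimal NumPy sketch (purely illustrative, not Lightmatter code): the core computation of a dense neural-network layer is a matrix-vector product, which is exactly the operation a photonic tensor core carries out in the analog optical domain.

```python
import numpy as np

# A dense layer reduces to a matrix-vector product plus a nonlinearity.
# A photonic tensor core performs the y = W @ x step optically; here we
# model it digitally for illustration only.
def dense_layer(W: np.ndarray, x: np.ndarray) -> np.ndarray:
    y = W @ x                  # the linear-algebra core a photonic MVM unit accelerates
    return np.maximum(y, 0.0)  # ReLU, applied electronically

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 8))
x = rng.standard_normal(8)
print(dense_layer(W, x))
```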
Bio: Darius Bunandar is a Founder and Chief Scientist of Lightmatter, where he coordinates the company’s research and development in new hardware architectures as well as new algorithms. Before founding Lightmatter, Darius obtained his PhD in physics at MIT, studying quantum computation and communications using compact nanophotonic circuits. His interest in computing began when he was simulating extreme physical phenomena on supercomputers. With the Caltech-Cornell Simulating eXtreme Spacetimes (SXS) collaboration, he developed software to visualize how binary black holes distort the night sky by warping space-time with their gravitational pull. He also briefly worked on large-scale blast simulations and experiments at BakerRisk, an engineering consulting firm in San Antonio, Texas. He previously earned BS degrees in both Physics and Mechanical Engineering from the University of Texas at Austin.
Prof. Luis Ceze, University of Washington and OctoML, "Improving Model Performance, Portability and Productivity with Apache TVM and the Octomizer"
Abstract: There is an increasing need to bring machine learning to a diverse set of hardware devices. Current approaches typically rely on vendor-specific operator libraries and frameworks, and require significant engineering effort. In this talk, we will present an overview of the Apache TVM open-source stack, which exposes graph- and operator-level optimizations to provide performance portability for machine learning workloads across diverse hardware back-ends. TVM solves compiler optimization challenges by employing a learning-based approach for rapid exploration of optimizations, saving months of engineering time and offering state-of-the-art performance in both edge and server use cases. We will discuss how TVM offers broad model coverage and makes effective use of hardware resources. We will end the talk with a sneak peek at OctoML's Octomizer, a SaaS platform for continuous model optimization, benchmarking, and packaging.
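As orientation, here is a minimal sketch of the TVM flow the abstract describes: import a model, apply graph- and operator-level optimizations under a pass context, and build for a chosen hardware back-end. The model file and input name/shape are hypothetical, and exact API names vary across TVM releases.

```python
import onnx
import tvm
from tvm import relay
from tvm.contrib import graph_executor

onnx_model = onnx.load("model.onnx")        # hypothetical model file
shape_dict = {"input": (1, 3, 224, 224)}    # assumed input name and shape
mod, params = relay.frontend.from_onnx(onnx_model, shape_dict)

# Graph- and operator-level optimizations run under a PassContext; the
# target string selects the hardware back-end (e.g., "llvm", "cuda").
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target="llvm", params=params)

# Load the compiled module for inference on the local CPU.
dev = tvm.cpu()
module = graph_executor.GraphModule(lib["default"](dev))
```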
Bio: Luis Ceze is a Co-founder and CEO of OctoML and a Professor in the Paul G. Allen School of Computer Science and Engineering at the University of Washington. His research focuses on the intersection of computer architecture, programming languages, machine learning, and biology. His current focus is on approximate computing for efficient machine learning and DNA-based data storage. He co-directs the Molecular Information Systems Lab (misl.bio) and the Systems and Architectures for Machine Learning lab (sampl.ai). He has co-authored over 100 papers in these areas and has had several papers selected as IEEE Micro Top Picks and CACM Research Highlights. His research has been featured prominently in media outlets including The New York Times, Popular Science, MIT Technology Review, and The Wall Street Journal. He is a recipient of an NSF CAREER Award, a Sloan Research Fellowship, a Microsoft Research Faculty Fellowship, the 2013 IEEE TCCA Young Computer Architect Award, the 2020 ACM SIGARCH Maurice Wilkes Award, and the UIUC Distinguished Alumni Award.
Derek Chickles, Marvell, "Introduction to the Octeon 10 DPU and Integrated Inferencing"
Abstract: With the shift from application-specific compute to data-centric compute, there is a growing need for a platform that can process this data inline and in real time. Marvell’s Octeon 10 DPU meets this need with best-in-class raw compute performance, packet acceleration, crypto processing, and now integrated inferencing for machine learning. This integration significantly reduces the I/O requirements and latency penalty associated with other inferencing methods.
Bio: Derek Chickles leads the machine learning software group at Marvell, which develops the toolchain, runtime drivers, and DPU-optimized models, all co-designed with Marvell hardware. Outside of ML, his past work includes development of Marvell’s LiquidIO™ SmartNIC platform. He earned his BS degree in Computer Science from the University of Colorado Boulder.
Vinod Kathail, Xilinx, "Innovative HW and SW Solutions for AI Acceleration"
Abstract: Over the last ten years or so, Machine Learning (ML) or Artificial Intelligence (AI) has transformed a diverse set of application areas, ranging from the edge to the cloud. To satisfy the ever-growing demand for ML/AI, there has been an explosion of innovative work on HW accelerators to meet the increased performance needs and on high-level SW frameworks that enable data scientists to easily deploy ML-based solutions. In this talk, we will describe the innovative HW devices and SW programming tools for ML that Xilinx is building to address these industry trends.
Bio: Vinod Kathail is a Xilinx Fellow and Chief Architect for the Vitis SW Environment. At Xilinx, he initiated the software programmability effort that resulted in Vitis and led the company-wide focus on embedded vision, including the use of machine learning. Prior to Xilinx, Vinod was the founding CEO and later CTO of Synfora and a principal scientist at HP Labs. Vinod brings over 25 years of experience in heterogeneous programming environments, parallel and VLIW architectures, parallelizing compilers, and high-level synthesis. Vinod received an ScD in Electrical Engineering and Computer Science from MIT. He holds over 25 patents and has authored numerous research publications.
Ben Klenk, Nvidia, "Accelerated Computing Needs Accelerated Networks"
Abstract: Artificial Intelligence has become ubiquitous and has proven itself to excel at many tasks, from object and speech recognition, image classification, and natural language processing and translation to chip design and physics simulation. Training AI networks is not only computationally intensive; networks keep growing, and their computational demands increase at a staggering rate. It is for that reason that GPUs have become the main processor for training. However, more and more GPUs are needed to keep up with compute demands, rendering the interconnection network increasingly important. In this talk, I will explain why we need to put an emphasis on the network and share some of our research on improving the network for AI training, including in-network computation and emerging technologies such as silicon photonics.
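For context, the bandwidth-critical collective in data-parallel training is an all-reduce over gradients; this is the traffic pattern that in-network computation (e.g., reductions performed in the switch) targets. The PyTorch sketch below shows that pattern under stated assumptions: it is launched with torchrun, uses the NCCL backend, and the gradient buffer is a stand-in.

```python
import os
import torch
import torch.distributed as dist

# Launch with: torchrun --nproc_per_node=<num_gpus> this_script.py
dist.init_process_group(backend="nccl")
local_rank = int(os.environ["LOCAL_RANK"])  # set by torchrun
torch.cuda.set_device(local_rank)

# Stand-in gradient buffer; in real training this holds backprop results.
grad = torch.ones(1 << 20, device="cuda") * dist.get_rank()

# Sums the buffer across all GPUs in place. As models grow, this step is
# what makes the interconnect a first-class concern.
dist.all_reduce(grad, op=dist.ReduceOp.SUM)
dist.destroy_process_group()
```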
Bio: Benjamin Klenk is a Sr. Research Scientist in NVIDIA’s Networking Research Group. He received his PhD in Computer Engineering from Heidelberg University, Germany. His research ranges from network interfaces and GPU-initiated communication mechanisms to network acceleration for deep learning and HPC applications. Benjamin authored several conference and workshop papers in the area of GPU networking, including a paper on in-network reductions for GPU networks that was presented at ISCA in 2020.
Amar Phanishayee, Microsoft, "Project Fiddle: Fast and Efficient Infrastructure for Distributed Deep Learning"
Abstract: The goal of Project Fiddle is to build efficient systems infrastructure for fast distributed DNN training; we aim to make training 100x more efficient. To achieve this, we take a broad view of training: from a single GPU, to multiple GPUs on a machine, all the way to training on large clusters. Our innovations cut across the systems stack: memory management, structuring parallel computation across GPUs and machines, speeding up communication between accelerators and across machines, optimizing the data ingest and output pipelines, and schedulers for DNN training on large multi-tenant clusters. In this talk, I'll give you an overview of Project Fiddle and focus on a couple of recent ideas to speed up data loading (input) and checkpointing (output).
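To make the checkpointing idea concrete, here is a minimal sketch of one way to overlap checkpoint I/O with training: take a quick in-memory snapshot on the training thread, then persist it on a background thread so the GPU keeps computing. This illustrates the general idea only; it is not Project Fiddle's implementation, and the helper name is ours.

```python
import threading
import torch

def checkpoint_async(model: torch.nn.Module, path: str) -> threading.Thread:
    """Hypothetical helper: snapshot quickly, write to disk off-thread."""
    # The snapshot is a cheap CPU copy taken synchronously so training can
    # resume immediately; the slow torch.save happens in the background.
    snapshot = {k: v.detach().cpu().clone() for k, v in model.state_dict().items()}
    t = threading.Thread(target=torch.save, args=(snapshot, path))
    t.start()
    return t  # caller should join() before the next checkpoint to bound memory

# Usage sketch inside a training loop:
#   if step % ckpt_interval == 0:
#       ckpt_thread = checkpoint_async(model, f"ckpt_{step}.pt")
```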
Bio: Amar Phanishayee is a Sr. Principal Researcher at Microsoft Research in Redmond. The goal of his research is to enable the creation of high-performance and efficient networked systems for large-scale data-intensive computing. His research centers on radically rethinking the design of datacenter-based systems: from infrastructure for compute, storage, and networking to distributed systems and protocols that are scalable, robust to failures, and use resources efficiently. His recent focus has been on leading Project Fiddle at MSR (https://aka.ms/msr-fiddle). Amar received his Ph.D. in Computer Science from Carnegie Mellon University in 2012. His research has been recognized by awards such as the ACM SOSP Best Paper Award in 2009 and Carnegie Mellon's Allen Newell Award for Research Excellence.
Prof. Ion Stoica, UC Berkeley, "Ray: A Universal Framework for Distributed Computing"
Abstract: Distributed computing is becoming the norm. This trend is driven by the rapidly increasing gap between the computational requirements of machine learning and data applications, on one hand, and the capabilities of a single server, on the other. Unfortunately, building distributed applications today is extremely hard. Ray aims to make it as easy to program a cluster of machines as it is to program a laptop.
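A minimal Ray example of the "program a cluster like a laptop" idea: a plain Python function becomes a distributed task with a single decorator, and futures hide where the work actually runs. (A small sketch using Ray's public API; the function itself is illustrative.)

```python
import ray

ray.init()  # connects to an existing cluster if configured, else runs locally

@ray.remote
def square(x):
    return x * x

# Each call returns a future immediately; tasks are scheduled across
# whatever machines the cluster has available.
futures = [square.remote(i) for i in range(100)]
print(sum(ray.get(futures)))  # block and collect the results
```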
Bio: Ion Stoica is a Professor in the EECS Department at the University of California, Berkeley, and the Director of RISELab (https://rise.cs.berkeley.edu/). He is currently doing research on cloud computing and AI systems. Past work includes Apache Spark, Apache Mesos, Tachyon, Chord DHT, and Dynamic Packet State (DPS). He is an ACM Fellow and has received numerous awards, including the Mark Weiser Award (2019), the SIGOPS Hall of Fame Award (2015), and several test-of-time awards. He also co-founded three companies: Anyscale (2019), Databricks (2013), and Conviva (2006).
Prof. Matei Zaharia, Stanford, "What’s Next in Infrastructure for ML?"
Abstract: Building production ML applications is expensive and difficult because of their computational cost, data cost, and complex failure modes. I’ll discuss these challenges from two perspectives: the Stanford DAWN lab and experience with commercial ML customers at Databricks. I’ll then cover two emerging ideas to help address these challenges. The first is software development platforms tailored for ML, often called “ML platforms”, which standardize the interfaces used in ML applications to make them easier to build and maintain. I’ll give a few examples, including the open-source MLflow system from Databricks and the Model Assertions abstraction we developed at Stanford. The second idea is model designs that are more “infrastructure-friendly” and “ops-friendly” by design. As a concrete example, I will discuss retrieval-oriented NLP models such as Stanford's ColBERT, which query documents from a corpus to perform tasks such as question answering. This approach offers multiple advantages, including low computational cost, high interpretability, and very low-cost updates to the model’s “knowledge”. These models are an exciting alternative to large language models such as GPT-3.
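As a taste of the "ML platform" idea, here is a small MLflow tracking example: standard APIs record parameters and metrics for every run so experiments stay reproducible and comparable across a team. The run name and values are illustrative, not from the talk.

```python
import mlflow

with mlflow.start_run(run_name="baseline"):
    # Standardized interfaces: every run logs its configuration...
    mlflow.log_param("learning_rate", 1e-3)
    mlflow.log_param("batch_size", 64)
    # ...and its results, so runs can be compared in the MLflow UI.
    for epoch, acc in enumerate([0.71, 0.78, 0.82]):  # stand-in metrics
        mlflow.log_metric("val_accuracy", acc, step=epoch)
```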
Bio: Matei Zaharia is an Assistant Professor of Computer Science at Stanford University and Chief Technologist at Databricks. He started the Apache Spark project during his PhD at UC Berkeley in 2009, and has worked broadly in datacenter systems, co-starting the Apache Mesos project and contributing as a committer on Apache Hadoop. Today, Matei tech-leads the MLflow development effort at Databricks in addition to other aspects of the platform. Matei’s research work was recognized through the 2014 ACM Doctoral Dissertation Award for the best PhD dissertation in computer science, an NSF CAREER Award, and the US Presidential Early Career Award for Scientists and Engineers (PECASE).