The Stanford-UCSC Workshop on the Future of Cloud Computing was organized by Prof. Christos Kozyrakis and Prof. Heiner Litz and held on Tuesday, December 11, 2018 at the UC Santa Cruz Silicon Valley campus in Santa Clara, CA.
Agenda - Videos of Presentations
8:00-8:30AM Badge Pick-up – Coffee/Tea and Breakfast Food/Snacks
8:30-8:40AM Welcome – Prof. Christos Kozyrakis, Stanford and Prof. Heiner Litz, UCSC
8:40-9:00AM Ana Klimovic, Stanford, “Pocket: Elastic Ephemeral Storage for Serverless Analytics”
9:00-9:20AM Heiner Litz, UCSC, “Making Storage Programmable with OpenChannel SSDs”
9:20-9:40AM Ethan Miller, UCSC, “Rethinking the Operating System Stack for Byte-Addressable Non-Volatile Memories”
9:40-10:00AM Allen Samuels, Western Digital, “Unleashing Innovation from the Core to the Edge”
10:00-10:20AM Pankaj Mehra, Samsung, “Introducing Samsung SmartSSD: From Devices to Subsystems”
10:20-10:40AM Lightning Round of Student Posters
10:40-12:30PM Poster Viewing and Networking Lunch
12:30-12:50PM Arvind Krishnamurthy, University of Washington, “Programming the Data Plane”
12:50-1:10PM Kishore Atreya, Marvell, “Scaling Storage Efficiently for Composable Infrastructure Environments”
1:10-1:30PM Jose Renau, UCSC, “Live Hardware Development at UCSC”
1:30-1:50PM Sam Bayliss and Ralph Wittig, Xilinx, “Xilinx 7nm Versal Family with AI Engine Processing Fabric”
1:50-2:10PM Gabriel Southern, Intel, “Accelerating Cloud Applications with Intel Xeon CPUs and FPGAs”
2:10-2:30PM Michael Papamichael, Microsoft, “Catapult and Brainwave: Powering a Configurable Intelligent Cloud”
2:30-2:50PM BREAK - Refreshments and Poster Viewing
2:50-3:10PM Elton de Souza, IBM, “Farm to Table Supply Chain Provenance on IBM LinuxONE: Blockchain, Analytics, Databases, Security”
3:10-3:30PM Fazil Osman, Broadcom, “Learnings on Datacenter Disaggregation from Stingray SoC”
3:30-3:50PM Oksana Tkachuk, Amazon Web Services, “Automated Reasoning about Security of Amazon Web Services”
3:50-4:10PM Muhammad Shahbaz, Stanford, “Elmo: Source-Routed Multicast for Cloud Services”
4:10-4:30PM Rob Currie, UCSC, “Cross Border Lambdas in Pediatric, Hereditary and Consumer Genomics”
4:30-4:50PM Keith Winstein, Stanford, “Tiny Functions for Codecs, Protocols, Compilation, and (maybe) Soon Everything.”
4:50-5:30PM Reception - Refreshments and Poster Awards
Abstracts and Bios
Kishore Atreya, Marvell, “Scaling Storage Efficiently for Composable Infrastructure Environments”
Abstract: As the world moves to a more connected way of life, the applications that drive our day-to-day lives have come to require large amounts of data and computation. Data centers are growing rapidly, and both their capital and operational costs are rising. It is highly desirable to scale resources dynamically with application workload rather than statically, leading to a more composable computing infrastructure. Composable infrastructure allows operators to purchase storage and compute independently of each other while presenting a single virtual machine interface on which an application runs.
Composable infrastructure relies on disaggregated storage and compute in order to be cost effective and to provide scale. Marvell has introduced the concept of an IP-attached NVMe-oF SSD with its 88SN2400 part, enabling lower-power, lower-cost storage enclosures that work in conjunction with software-defined storage systems to provide efficient scale-out storage for data-center-sized composable infrastructure.
Bio: Kishore graduated with a BS in Computer Engineering from Georgia Tech and an MS in Electrical and Computer Engineering from Santa Clara University. He was an initial member of the Cavium Xpliant team, which built the first production programmable switch on the market with the XP80. He worked as a software architect and project lead for next-generation Xpliant devices prior to the Cavium-Marvell merger, and he is currently a software product manager in the Networking BU at Marvell Semiconductor with a focus on next-generation applications including composable infrastructure and edge data centers.
Sam Bayliss and Ralph Wittig, Xilinx, “Xilinx 7nm Versal Family with AI Engine Processing Fabric”
Abstract: Versal, a combination of the words “versatile” and “universal”, is a new heterogeneous computing category that offers a broad range of architectures under the same umbrella and has programmable hardware and software at its foundation. Versal comes at a time of immense change driven by trends such as AI; the rapid growth of data and the need to collect, store, and analyze it; mobility; and 5G networking, all of which require an adaptive compute environment that can quickly address the changing dynamics of the industry.
Bio: Sam Bayliss is a Principal Engineer and Ralph Wittig is a Fellow at Xilinx; both work in the Xilinx Office of the CTO. They led the advanced development behind the AI Engine processing fabric.
Rob Currie, UCSC, “Cross Border Lambdas in Pediatric, Hereditary and Consumer Genomics”
Abstract: Containers have revolutionized the deployment of complex stacks of code within and between data centers, effectively scaling how a lambda can migrate around the web. The genomics community has adopted containers to package very complex bioinformatics pipelines, both for reproducible science and for provenance in clinical settings. The Treehouse Childhood Cancer Initiative has used this method to analyze pediatric data at three hospitals, two of which are in Canada, to solve the complex privacy issues involved in cross-border collaboration on human subject data. The same approach is being piloted in the BRCA Challenge to analyze the large bodies of data, both genomic and clinical, siloed in the health system and increasingly in consumers’ hands, with the goal of eliminating variants of unknown significance while preserving privacy.
Bio: Rob Currie is CTO of the UCSC Genomics Institute. He has over 25 years of experience with Silicon Valley early stage technology companies including executive positions with Universal Audio, Dash Navigation/Blackberry, Strangeberry/TiVo, Marimba/BMC and Digidesign/Avid. Rob received a BS in EECS from UC Berkeley and an MBA from University of Chicago Booth.
Elton de Souza, IBM, “Farm to Table Supply Chain Provenance on IBM LinuxONE: Blockchain, Analytics, Databases, Security”
Abstract: In this session, we will cover the IBM Food Trust solution deployed in production by IBM for provenance in the food supply chain. Wider adoption of this solution could have mitigated the recent outbreak of lettuce tainted with Escherichia coli (E. coli) across the USA. We will cover challenges for Food Trust and production blockchain solutions in general, such as blockchain selection, cross-chain compatibility, scalability, security, operations, DevOps, applications of graph theory, modularity in the private, hybrid, and public cloud space, and applicability to general blockchain problems.
Bio: Elton is the Chief Architect for two of IBM’s Innovation Labs based out of New York. He earned his BASc in Computer Engineering from the University of Toronto, with an emphasis on software engineering and networking. Other technical interests include pharmacokinetics and pharmacodynamics of modern cancer and diabetes medication, consumable quantum computing and applications of DNA in next-generation storage.
Ana Klimovic, Stanford, “Pocket: Elastic Ephemeral Storage for Serverless Analytics”
Abstract: Serverless computing is becoming increasingly popular, enabling users to quickly launch thousands of short-lived tasks in the cloud with high elasticity and fine-grain billing. While these properties make serverless computing appealing for interactive data analytics, a key challenge is managing intermediate data shared between tasks. Since communicating directly between short-lived serverless tasks is difficult, the natural approach is to store such ephemeral data in a common remote data store. However, existing storage systems are not designed to meet the demands of serverless applications in terms of elasticity, performance and cost. We present Pocket, a distributed, elastic, multi-media data store that automatically scales resources to provide applications with desired I/O performance at low cost. Pocket dynamically rightsizes resources across multiple dimensions (storage capacity, network bandwidth, CPU cores) and leverages different storage technologies to minimize the cost of each job's resource allocation while ensuring jobs are not bottlenecked on I/O. We show that Pocket achieves similar performance to ElastiCache Redis for serverless analytics applications while reducing cost by almost 60%.
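To make the usage model concrete, here is a minimal Python sketch of the pattern Pocket targets: short-lived tasks exchanging intermediate data through a shared, elastic ephemeral store. The client API and the rightsizing hints below are illustrative assumptions, not Pocket’s actual interface.

```python
# Toy stand-in for an elastic ephemeral store; all names are hypothetical.
import uuid

class EphemeralStore:
    """A Pocket-like store would spread data across DRAM/flash/disk tiers
    and rightsize capacity, bandwidth, and CPU per job; this toy only
    models the usage pattern."""
    def __init__(self):
        self._objects = {}

    def register_job(self, job_id, capacity_gb=None, peak_mbps=None):
        # Optional hints let a controller rightsize resources up front
        # instead of overprovisioning every job.
        self._objects[job_id] = {}

    def put(self, job_id, key, value):
        self._objects[job_id][key] = value

    def get(self, job_id, key):
        return self._objects[job_id][key]

    def deregister_job(self, job_id):
        # Ephemeral data is reclaimed as soon as the job finishes.
        self._objects.pop(job_id, None)

# A map task writes intermediate data; a reduce task (typically a
# different short-lived lambda) reads it back via the shared store.
store = EphemeralStore()
job = str(uuid.uuid4())
store.register_job(job, capacity_gb=10, peak_mbps=800)
store.put(job, "shuffle/part-0", b"intermediate bytes")
assert store.get(job, "shuffle/part-0") == b"intermediate bytes"
store.deregister_job(job)
```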
Bio: Ana Klimovic is a final-year Ph.D. student at Stanford University, advised by Professor Christos Kozyrakis. Her research interests are in computer systems and computer architecture. She is particularly interested in building high-performance, resource-efficient storage and computing systems for large-scale datacenters. Before coming to Stanford, Ana graduated from the Engineering Science program at the University of Toronto. Ana is a Microsoft Research Ph.D. Fellow, a Stanford Graduate Fellow, and an Accel Innovation Scholar.
Arvind Krishnamurthy, University of Washington, “Programming the Data Plane”
Abstract: Emerging networking architectures are allowing for flexible and reconfigurable packet processing at line rate both at the NIC and the switch. These emerging technologies address a key limitation with software defined networking solutions such as OpenFlow, which allow for custom handling of flows only as part of the control plane. Many network protocols, such as those that perform resource allocation, require per-packet processing, which is feasible only if the data plane can be customized to the needs of the protocol. These new technologies thus have the potential to address this limitation and truly enable a "Software Defined Data Plane" that provides greater performance and isolation for datacenter applications.
Despite their promising new functionality, intelligent NICs and flexible switches are not all-powerful; they have limited state, support limited types of operations, and limit per-packet computation in order to operate at line rate. Our work addresses these limitations and helps accelerate both distributed systems and network protocols within the datacenter. It thus represents a first step toward understanding what is required of NICs and switches to enable data plane programmability.
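As a concrete illustration, the Python sketch below models a toy per-packet telemetry stage of the kind a programmable data plane enables: each switch appends a bounded amount of per-hop state to every packet, which control-plane-only flow handling cannot do at line rate. The stage model and field names are illustrative assumptions, not a real switch API.

```python
# Toy model of bounded per-packet work in a programmable data plane.
def make_telemetry_stage(switch_id, queue):
    """Per-packet stage: append this hop's ID and current queue depth.

    Line-rate hardware bounds state and per-packet computation; this
    stage performs one read and one append, nothing more.
    """
    def process(pkt):
        pkt.setdefault("hop_info", []).append((switch_id, len(queue)))
        return pkt
    return process

# Chain two "switches": the packet accumulates a hop-by-hop record that
# an end host or a resource-allocation protocol can act on per packet.
s1 = make_telemetry_stage("tor1", queue=[None] * 3)
s2 = make_telemetry_stage("spine4", queue=[None] * 17)
pkt = s2(s1({"payload": b"..."}))
print(pkt["hop_info"])  # [('tor1', 3), ('spine4', 17)]
```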
Bio: Arvind Krishnamurthy is a Professor of Computer Science and Engineering at the University of Washington. His research interests span all aspects of building practical and robust computer systems, and he has worked on projects that improve the robustness, security, and performance of Internet-scale systems. A recent focus of his work has been to develop ways to dramatically improve the performance of networked applications deployed inside datacenters by rearchitecting all layers of the datacenter software stack.
Heiner Litz, UCSC, “Making Storage Programmable with OpenChannel SSDs”
Abstract: Non-volatile flash storage devices have traditionally been a black box. In particular, users have had no visibility into where their data is stored, how it is protected against data loss, and when it can be retrieved from the device. For data center operators, black-box devices are undesirable, as they do not provide the transparency required to improve utilization, TCO, and performance predictability. OpenChannel SSDs enable users to decide where data is stored, how data is stored, and how fast data can be retrieved from the device. Furthermore, OpenChannel devices provide real-time telemetry data about write amplification, garbage collection, and device health, which opens up numerous optimization opportunities. This talk will present the current state of OpenChannel SSDs and point out some research directions that leverage the unique features of OpenChannel devices.
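The sketch below illustrates, in Python, the kind of control an open-channel interface hands the host: the device exposes its geometry and raw block operations, and a host-side flash translation layer decides placement. The device class and its methods are illustrative assumptions, not the actual OpenChannel specification.

```python
# Toy open-channel device: geometry and raw appends are visible to the host.
class OpenChannelDevice:
    def __init__(self, channels=4, blocks_per_channel=8):
        self.geometry = (channels, blocks_per_channel)
        self.blocks = {(c, b): [] for c in range(channels)
                       for b in range(blocks_per_channel)}
        # Telemetry an open-channel device can surface in real time.
        self.telemetry = {"erases": 0, "write_amplification": 1.0}

    def append(self, channel, block, data):
        self.blocks[(channel, block)].append(data)

    def read(self, channel, block, idx):
        return self.blocks[(channel, block)][idx]

# A host-side FTL can stripe writes across channels for bandwidth, or
# isolate tenants on disjoint channels for predictable latency.
dev = OpenChannelDevice()
for i, page in enumerate([b"a", b"b", b"c", b"d"]):
    dev.append(channel=i % dev.geometry[0], block=0, data=page)
print(dev.read(channel=1, block=0, idx=0), dev.telemetry)
```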
Bio: Heiner Litz is an assistant professor at the University of California, Santa Cruz. Before that, he was a lecturer and staff research fellow at Stanford University, advised by Prof. Christos Kozyrakis and Prof. David Cheriton. He received his Ph.D. in computer architecture from Mannheim/Heidelberg University and recently spent a year as a visiting researcher at Google developing next-generation data center infrastructure. His research interests include computer architecture, operating systems, and storage, with a focus on performance and scalability in large-scale distributed systems.
Pankaj Mehra, Samsung, “Introducing Samsung SmartSSD: From Devices to Subsystems”
Abstract: Samsung scientists pioneered the idea of In-Storage Computing. During the past few years, this idea has been well received by top database conferences. Our new SmartSSD product brings programmable acceleration to Samsung V-NAND. This new acceleration technology features the speed and parallelism of hardware but also supports programming and deploying accelerated functions as easily as software. It provides the perfect platform for analyzing terabytes of data within seconds on each device, and for scaling to petabytes and beyond simply by adding more devices. I will share early results and invite the research community to explore new ways to think about scaling data-intensive computations.
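A minimal sketch of the in-storage-computing idea, assuming a hypothetical device API: ship a small function to the drive and move only the reduced result over the host interface, rather than every block.

```python
# Hypothetical near-data processing interface; not the SmartSSD API.
class SmartDrive:
    def __init__(self, records):
        self._records = records  # data resident on the device

    def offload(self, func):
        # Runs next to the flash on the device's accelerator; only the
        # (small) result crosses the host interface.
        return func(self._records)

drive = SmartDrive(records=[{"temp": t} for t in range(1000)])

# A host-side scan would transfer 1000 records; the offloaded
# filter-and-count transfers a single integer.
hot = drive.offload(lambda rs: sum(1 for r in rs if r["temp"] > 990))
print(hot)  # 9
```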
Bio: Pankaj Mehra is VP of Product Planning at Samsung. He was previously VP and Senior Fellow at both SanDisk and Western Digital, and WW CTO of Fusion-io.
Ethan Miller, UCSC, “Rethinking the Operating System Stack for Byte-Addressable Non-Volatile Memories”
Abstract: The introduction of byte-addressable non-volatile memory on the memory bus promises upheaval in the data access model of applications and in the consistency support requirements of processors. Operating systems, too, must be ready for an evolution in how they provide applications with access to persistent data, how they handle security and access control for applications, and how they manage and persist kernel and application state across power interruptions. This talk will describe Twizzler, a new operating system designed for non-volatile memory that we are developing to meet these challenges of byte-addressable NVM. Twizzler presents applications and middleware with an access model for persistent memory based around direct persistent data access with minimal kernel involvement. Twizzler provides applications with the power to follow pointers across objects in the system without the need to use archaic I/O models to copy and buffer data. Access control is implemented in a distributed computing-friendly way with signed capabilities, enabling components of an application to be isolated from each other to improve security and enable fault isolation. By making these changes, Twizzler enables systems to fully leverage byte-addressable non-volatile memory with minimal overhead.
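The following Python sketch illustrates the access model described above, assuming a Twizzler-like design: persistent objects named by IDs, linked by (object, field) pointers that applications dereference directly instead of copying data through read/write system calls. All names here are illustrative.

```python
# Toy persistent heap standing in for byte-addressable NVM.
persistent_heap = {}

def create_object(obj_id, payload):
    persistent_heap[obj_id] = payload

def deref(ptr):
    # A cross-object pointer is (object ID, field), valid across reboots
    # because it does not depend on process-local virtual addresses.
    obj_id, field = ptr
    return persistent_heap[obj_id][field]

# A linked structure that survives restarts: objects reference each
# other by ID, and following a pointer needs no kernel I/O path.
create_object("inode:42", {"name": "report.txt",
                           "data": ("blob:7", "bytes")})
create_object("blob:7", {"bytes": b"hello, NVM"})

ptr = persistent_heap["inode:42"]["data"]
print(deref(ptr))  # b'hello, NVM'
```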
Bio: Ethan L. Miller is a Professor in the Computer Science and Engineering Department at the University of California, Santa Cruz, where he holds the Veritas Presidential Chair in Storage. At UCSC, he is the Director of the NSF Industry/University Cooperative Research Center for Research in Storage Systems (CRSS) and the Associate Director of the Storage Systems Research Center (SSRC). He is a Fellow of the IEEE and an ACM Distinguished Scientist, and his publications have received multiple Best Paper awards. Prof. Miller received an Sc.B. from Brown University in 1987 and a Ph.D. from UC Berkeley in 1995, and has been on the UC Santa Cruz faculty since 2000. He has co-authored over 140 papers in a range of topics in file and storage systems, operating systems, parallel and distributed systems, information retrieval, and computer security; his research has received over 10,000 citations. He was a member of the team that developed Ceph, a scalable high-performance distributed file system for scientific computing that is now being adopted by several high-end computing organizations. His work on reliability and security for distributed storage is also widely recognized, as is his work on secure, efficient long-term archival storage and scalable metadata systems. His current research projects, which are funded by the National Science Foundation, Department of Energy, and industry support for the CRSS and SSRC, include system support for byte-addressable non-volatile memory, archival storage systems, scalable metadata systems, reliable and secure storage systems, and issues in ultra-scale storage systems. Prof. Miller's broader interests include file systems, operating systems, parallel and distributed systems, information retrieval, and computer security. Prof. Miller has also worked closely with industry to help move research results into commercial use at companies such as NetApp, Veritas, and Pure Storage. Additional information is available at https://www.crss.ucsc.edu/person/elm.php.
Allen Samuels, Western Digital, “Unleashing Innovation from the Core to the Edge”
Abstract: Open standards and open source software have long been key to innovation in the computer industry. In recent years, open source hardware has also become available. Western Digital believes that future problems and workloads will require new data processing architectures that are best built using open technologies. Three recent contributions to the open source RISC-V project are discussed.
Bio: Allen joined SanDisk in 2013 as an Engineering Fellow and is responsible for directing research in the systems and software group in the Office of the CTO of Western Digital. He has over 40 years of industry experience, having previously served as Chief Architect at Weitek Corp. and Citrix and founded several companies, including AMKAR Consulting, Orbital Data Corporation, and Cirtas Systems. Allen holds 48 patents and graduated with a Bachelor of Science in Electrical Engineering from Rice University.
Fazil Osman, Broadcom, “Learnings on Datacenter Disaggregation from Stingray SoC”
Abstract: In early 2017, Broadcom started sampling its Stingray SoC, the first 64-bit ARM SoC running standard Linux targeted at disaggregation of the datacenter. Being able to migrate code from an existing standard x86 server, or to develop code using standard tools, has resulted in some very interesting use cases. This talk will focus on what we have learned from engagements with customers as well as researchers.
Bio: Fazil Osman is a Distinguished Engineer in the Compute and Connectivity BU at Broadcom. He has over 30 years of technical and leadership experience spanning a broad range of fields such as semiconductors, systems, storage, and networking. Among other things, he designed the first commercial single-chip Ethernet switch; Cisco’s first Ethernet switch, the Catalyst 1200; and the first 10Gbps TCP termination and storage ASIC. Presently, at Broadcom, he is driving the company’s strategy for SmartNICs as well as storage disaggregation.
Michael Papamichael, Microsoft, “Catapult and Brainwave: Powering a Configurable Intelligent Cloud”
Abstract: Project Catapult is the technology behind Microsoft’s hyperscale acceleration fabric that uses reconfigurable logic to accelerate both network plane functions and applications. In this Configurable Cloud architecture, a layer of reconfigurable logic (FPGAs) is placed between the network switches and the servers, enabling network flows to be programmably transformed at line rate, as well as local and remote application acceleration, since the FPGAs can communicate directly over a converged network with ultra-low latency at datacenter scale. Project Brainwave is Microsoft’s deep learning acceleration platform that leverages Catapult for serving state-of-the-art, pre-trained DNN models and is used in production services for Bing and Azure. By pinning model parameters entirely in high-bandwidth on-chip memories, the FPGAs achieve near-peak processing efficiencies at low batch sizes, a critical requirement for real-time AI services. In this talk, I will provide an overview of the Catapult project, discuss the evolution of our acceleration fabric, and highlight aspects of the Brainwave architecture.
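A back-of-the-envelope calculation shows why pinning parameters on-chip matters at low batch sizes: a batch-1 matrix-vector multiply touches every weight exactly once, so weight bandwidth, not compute, sets the speed limit. The bandwidth figures below are assumed round numbers for illustration, not Microsoft’s.

```python
# Why batch-1 DNN serving wants weights in on-chip memory (assumed numbers).
layer_weights_bytes = 4096 * 4096 * 2        # one 4096x4096 fp16 layer
dram_bw = 80e9                               # ~80 GB/s off-chip (assumed)
onchip_bw = 10e12                            # ~10 TB/s aggregate on-chip (assumed)

t_dram = layer_weights_bytes / dram_bw       # weights streamed per request
t_onchip = layer_weights_bytes / onchip_bw   # weights already resident

print(f"off-chip: {t_dram * 1e6:.0f} us/layer, "
      f"on-chip: {t_onchip * 1e6:.2f} us/layer")
# off-chip: 419 us/layer vs. on-chip: 3.36 us/layer -- two orders of
# magnitude, which is why Brainwave-style designs pin parameters on-chip.
```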
Bio: Michael K. Papamichael is a Researcher in the Silicon System Futures group at Microsoft. His research interests are in the broader area of computer architecture with emphasis on hardware acceleration, reconfigurable computing, on-chip interconnects, and methodologies to facilitate hardware specialization. He holds a PhD in Computer Science from Carnegie Mellon University.
Jose Renau, UCSC, “Live Hardware Development at UCSC”
Abstract: Prof. Renau will present the research efforts of his team at the University of California, Santa Cruz. The talk focuses on Live ASIC/FPGA flows to improve the productivity of hardware design.
The Live ASIC/FPGA flow projects aim to go from Verilog code to place-and-route in a few seconds. The few-seconds goal is to provide a productive environment that matches human short-term memory. We aim to generate correct results, not approximate models.
Synthesis, placement, and routing turnaround times are among the major bottlenecks in digital design productivity. Engineers usually wait several hours to get accurate results for design changes that are often quite small. The Live ASIC/FPGA flow is accurate and equivalent to current flows; this improvement can be achieved only because of the focus on incremental synthesis.
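A toy sketch of the incremental-synthesis idea behind such a flow: cache per-module synthesis results keyed by a hash of the module’s source, so a small edit re-synthesizes only the module it touches. This illustrates the caching principle only, not the actual Live flow.

```python
import hashlib

synthesis_cache = {}

def synthesize_module(name, verilog_src):
    key = hashlib.sha256(verilog_src.encode()).hexdigest()
    if key in synthesis_cache:
        return synthesis_cache[key]        # hit: reuse the prior result
    netlist = f"<netlist for {name}>"      # stand-in for real synthesis
    synthesis_cache[key] = netlist
    return netlist

design = {"alu": "module alu(...); endmodule",
          "regfile": "module regfile(...); endmodule"}
for name, src in design.items():
    synthesize_module(name, src)           # cold run: everything misses

# A small edit to one module: on the next run only 'alu' misses the
# cache, which is what keeps turnaround in the few-seconds range.
design["alu"] = "module alu(/* tweaked */); endmodule"
synthesize_module("alu", design["alu"])
```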
Bio: Jose Renau (http://www.soe.ucsc.edu/~renau) is a professor in the CSE department at the University of California, Santa Cruz. His research focuses on computer architecture, including design effort metrics and models, infrared thermal measurements, low-power and thermal-aware design, process variability, thread-level speculation, FPGA/ASIC design, Fluid Pipelines, Pyrope (a modern hardware description language), and Live flows to improve the productivity of hardware design.
Muhammad Shahbaz, Stanford, “Elmo: Source-Routed Multicast for Public Clouds”
Abstract: Modern cloud applications (e.g., publish-subscribe systems, streaming telemetry, database and state-machine replication, and more) frequently exhibit one-to-many communication patterns and, at the same time, require sub-millisecond latencies and high throughput. IP multicast can achieve these requirements but has data- and control-plane scalability limitations that make it challenging to offer it as a service for hundreds of thousands of tenants, typical of cloud environments. Tenants, therefore, must rely on unicast-based approaches, e.g., application-layer or overlay-based, to support multicast in their applications, imposing overhead on throughput and end-host CPU utilization with higher and unpredictable latencies.
In this talk, we present Elmo, a system that overcomes the data- and control-plane scalability limitations that pose a barrier to multicast deployment in public clouds. Our key insight is that emerging programmable switches and the unique characteristics of data-center topologies (namely, symmetry and the limited number of switches on any path) enable the use of efficient source-routed multicast in these cloud environments. In our approach, software switches (like PISCES) encode the multicast forwarding policy inside packets, which are then processed by hardware switches (like Barefoot Tofino) at line rate. Doing so alleviates the pressure on switching hardware resources (e.g., group tables) and reduces control-plane overhead during churn.
In a three-tier data-center topology with 27,000 hosts, our evaluation shows that Elmo can support a million multicast groups using a 325-byte packet header, requiring as few as 1,100 multicast group-table entries (on average) inside hardware switches, and having a traffic overhead as low as 5% of ideal multicast.
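The sketch below illustrates the core encoding step in Python: the sender’s software switch writes a per-switch output-port bitmap into the packet header, and each hardware switch replicates the packet by reading its own bitmap instead of consulting a group table. The header layout and port count are illustrative assumptions.

```python
# Toy source-routed multicast header: one output-port bitmap per switch.
PORTS_PER_SWITCH = 8  # assumed for the sketch; real switches have more

def encode_header(hops):
    """hops: list of (switch_id, set_of_output_ports)."""
    header = []
    for switch_id, ports in hops:
        bitmap = 0
        for p in ports:
            bitmap |= 1 << p
        header.append((switch_id, bitmap))
    return header

def forward(switch_id, header):
    """Per-packet action at a hardware switch: find this switch's bitmap
    and replicate to each set port -- no group-table lookup needed."""
    for sid, bitmap in header:
        if sid == switch_id:
            return [p for p in range(PORTS_PER_SWITCH) if bitmap & (1 << p)]
    return []

hdr = encode_header([("tor1", {0, 3}), ("spine2", {1, 2, 5})])
print(forward("spine2", hdr))  # [1, 2, 5]
```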
Bio: Muhammad Shahbaz is a Postdoctoral Scholar in the Department of Electrical Engineering at Stanford University. His research focuses on the design and development of domain-specific languages, compilers, and architectures and their applications to emerging workloads (including machine learning and self-driving networks). In the past, Shahbaz has built open-source systems like PISCES, SDX, OSNT, and NetFPGA-10G that are widely used in industry and academia. Shahbaz's work appears at top-tier networking and systems conferences, including ACM SIGCOMM and USENIX NSDI. His work on SDX received the Internet2 Innovation Award. Shahbaz received his Ph.D. and M.A. in Computer Science from Princeton University and his B.E. in Computer Engineering from the National University of Sciences and Technology (NUST). Before joining Princeton University, Shahbaz worked as a research assistant at the University of Cambridge Computer Laboratory on the CTSRD and MRC2 projects, and was a Senior Design Engineer at the Center for Advanced Research in Engineering (CARE).
Gabriel Southern, Intel, “Accelerating Cloud Applications with Intel Xeon CPUs and FPGAs”
Abstract: Cloud computing applications can be accelerated using specialized hardware, but it is not feasible to deploy ASIC accelerators for every cloud application. FPGAs can solve many problems that prevent deployment of ASIC hardware accelerators. This talk describes the Intel Programmable Acceleration Card (PAC) and the open-source Open Programmable Acceleration Engine (OPAE) software framework that are used together to accelerate cloud applications using Intel FPGAs.
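For orientation, here is a hypothetical Python sketch of the host-side workflow such a framework supports: enumerate an accelerator function unit (AFU) by ID, share a buffer with it, and start it through a control register. OPAE’s real interface is a C API; every name below is an illustrative assumption.

```python
# Hypothetical host-side accelerator workflow; not the OPAE API.
class AFU:
    def __init__(self, afu_id):
        self.afu_id = afu_id
        self.registers = {}
        self.shared = None

    def write_csr(self, offset, value):
        # MMIO write to a control/status register (toy model).
        self.registers[offset] = value

    def share_buffer(self, buf):
        # Host memory the FPGA could DMA to and from (toy model).
        self.shared = buf

def enumerate_afus(afu_id):
    # Real enumeration matches AFUs loaded on cards in the system;
    # here we fabricate one so the sketch runs.
    return [AFU(afu_id)]

afu = enumerate_afus("00000000-0000-0000-0000-000000000000")[0]  # example ID
afu.share_buffer(bytearray(4096))  # staging buffer for input/output
afu.write_csr(0x40, 1)             # hypothetical "start" register
```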
Bio: Gabriel Southern is an engineer in Intel's Programmable Solutions Group. He joined Intel in 2016 and is working on developing solutions for FPGAs in the data center. Prior to joining Intel, Dr. Southern was a student at UC Santa Cruz where his research focused on performance analysis and simulation methodology. Dr. Southern received his Ph.D. from UC Santa Cruz, his M.S. from George Mason University, and his B.S. from the University of Virginia.
Oksana Tkachuk, Amazon Web Services, “Automated Reasoning about Security of Amazon Web Services”
Abstract: With the rapid growth of AWS services and features comes great responsibility to ensure their security. Do the services use proper authentication and authorization mechanisms? Are resources protected using the right access controls? Is customer data properly encrypted? Does a customer’s network have any hosts reachable from the internet? To scale with AWS’s growth, these and other questions related to security requirements need to be answered automatically. This talk covers some of the AWS projects addressing the challenges of automated reasoning about the security of AWS.
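To give a flavor of the kind of question being automated, the Python sketch below answers “is any port open to the internet?” directly from a toy security-group configuration, rather than by probing hosts. Production AWS tools encode such questions for constraint solvers; this simplification is mine.

```python
import ipaddress

# Toy security-group rules: which source ranges may reach which ports.
security_group_rules = [
    {"cidr": "10.0.0.0/8", "port": 22},   # SSH only from the private VPC
    {"cidr": "0.0.0.0/0",  "port": 443},  # HTTPS open to the internet
]

def internet_reachable_ports(rules):
    """A rule admits internet traffic if its source CIDR is not confined
    to private address space; 0.0.0.0/0 trivially is not."""
    open_ports = []
    for rule in rules:
        net = ipaddress.ip_network(rule["cidr"])
        if not net.is_private:
            open_ports.append(rule["port"])
    return open_ports

# The configuration itself proves the answer; no scanning involved.
print(internet_reachable_ports(security_group_rules))  # [443]
```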
Bio: Oksana Tkachuk leads a project at AWS Security with the goal of scaling security reviews through the use of automated reasoning. In 2008, she received a PhD on the topic of automated environment generation for software analysis. She developed and applied software analysis techniques at Fujitsu Laboratories of America (2005-2011) for commercial software, at NASA Ames Research Center (2011-2017) for safety-critical software, and at AWS Security (2017-present) to reason about the security of AWS.
Keith Winstein, Stanford, “Tiny Functions for Codecs, Protocols, Compilation, and (maybe) Soon Everything.”
Abstract: Networks and applications frequently treat one another as strangers. By expressing large systems as compositions of small, pure functions, we've found it's possible to achieve tighter couplings between these components, improving performance without giving up modularity or the ability to debug. I'll discuss our experience with systems that demonstrate this basic idea: ExCamera (NSDI 2017) parallelizes video encoding into thousands of tiny tasks on cloud-functions infrastructure; Salsify (NSDI 2018) uses a purely functional video codec to achieve lower end-to-end latency and better picture quality than Skype, FaceTime, Hangouts, and WebRTC; Lepton (NSDI 2017) uses a purely functional JPEG/VP8 transcoder to compress images in parallel across a distributed network filesystem and has been broadly deployed at Dropbox to compress hundreds of petabytes of user files; Functional Containers (ongoing) abstract applications from underlying cloud-functions compute and storage services to support massive interactive parallelism for a variety of applications. Expressing systems and protocols as compositions of small, pure functions may open up a wave of general-purpose lambda computing, permitting us to transform everyday operations into massively parallel -- and understandable -- applications.
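A minimal sketch of the purely functional codec idea underlying these systems: if encoding is a pure function from (state, frame) to (new state, chunk), then workers that start from saved states produce chunks that splice together exactly as a sequential encoder’s output would. The toy integer state below stands in for real decoder state.

```python
# Pure-function encoding: no hidden state, so work can be partitioned.
def encode(state, frame):
    new_state = state + 1  # toy stand-in for full decoder state
    chunk = f"chunk(state={state}, frame={frame})"
    return new_state, chunk

def encode_range(start_state, frames):
    state, chunks = start_state, []
    for f in frames:
        state, c = encode(state, f)
        chunks.append(c)
    return state, chunks

# Two workers, given the right start states, reproduce exactly what one
# sequential worker would -- the property ExCamera exploits to encode
# video on thousands of cloud functions in parallel.
s_mid, first_half = encode_range(0, ["f0", "f1"])
_, second_half = encode_range(s_mid, ["f2", "f3"])
_, sequential = encode_range(0, ["f0", "f1", "f2", "f3"])
assert first_half + second_half == sequential
```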
Bio: Keith Winstein is an assistant professor of computer science and, by courtesy, of electrical engineering at Stanford University. His research group designs networked systems that cross traditional abstraction boundaries, using statistical and functional techniques. He and his colleagues made the Mosh (mobile shell) tool, the Sprout and Remy systems for computer-generated congestion control, the Mahimahi network emulator, the Lepton JPEG-recompression tool, the ExCamera and Salsify systems for low-latency video coding and lambda computing, the Guardian Agent for secure delegation across a network, and the Pantheon of Congestion Control. Winstein has received the Usenix ATC Best Paper Award, the Usenix NSDI Community Award, a Google Faculty Research Award and Facebook Faculty Award, the ACM SIGCOMM Doctoral Dissertation Award, a Sprowls award for best doctoral thesis in computer science at MIT, and the Applied Networking Research Prize. Winstein previously served as a staff reporter at The Wall Street Journal and later worked at Ksplice Inc., a startup company (now part of Oracle Corp.) where he was the vice president of product management and business development and also cleaned the bathroom. Winstein did his undergraduate and graduate work at MIT.