The IAP University of Texas Workshop on the Future of Cloud Computing Applications and Infrastructure was organized by Professor Chris Rossbach and held on Tuesday, December 10, 2019, on the UT Austin campus in the Gates Dell Complex.
Agenda - Workshop Presentation Videos
8:00-8:30AM - Badge Pick-up – Coffee/Tea and Breakfast Food/Snacks
8:25-8:30AM - Welcome – Prof. Chris Rossbach, UT
8:30-9:00AM - Dr. Derek Chiou, UT and Microsoft, “Accelerating Microsoft's Cloud”
9:00-9:30AM - Larry Wikelius, Marvell, “New Applications and Accelerators for ARM-based Servers”
9:30-10:00AM - Prof. Vijay Chidambaram, UT, “Designing File Systems and Concurrent Data Structures for Persistent Memory”
10:00-10:30AM - Dr. Vinod Kathail, Xilinx, “Vitis and Vitis AI: ML Application Acceleration from Cloud to Edge”
10:30-11:00AM - Prof. Hovav Shacham, UT, “Data Dependent Instruction Timing Channels”
11:00AM-12:30PM - Lunch and Cloud Poster Viewing
12:30-1:00PM - Prof. Keshav Pingali, UT, “Single-Machine Analytics on Massive Graphs Using Intel Optane DC Persistent Memory”
1:00-1:30PM - Grant Mackey, Western Digital Research, “Computational Storage: A Brief History and Where is it going?”
1:30-2:00PM - Prof. Chris Rossbach, UT, “System Software in the Wake of Moore's Law”
2:00-2:30PM - Dr. Niall Gaffney, TACC, “Research Using HPC for Big Data and AI”
2:30-3:00PM - Break - Refreshments and Poster Viewing
3:00-3:30PM - Prof. Kristen Grauman, UT and Facebook AI Research, “Egocentric Visual Learning”
3:30-4:00PM - Dr. Hui Lei, Futurewei, “Towards the Industrialization of AI”
4:00-4:30PM - Prof. Simon Peter, UT, “E3: Energy-Efficient Microservices on SmartNIC-Accelerated Servers”
4:30-5:00PM - Prof. Emmett Witchel, UT, “Achieving System Security in the Era of Secure Enclaves”
5:00-5:30PM - Reception - Refreshments and Poster Awards
Abstracts and Bios (alphabetical order by speaker)
Prof. Vijay Chidambaram, UT, “Designing File Systems and Concurrent Data Structures for Persistent Memory”
Abstract: Intel recently released DC Persistent Memory, a new class of storage technology that offers near-DRAM latency and bandwidth. Persistent memory is byte addressable and can be directly accessed using processor loads and stores. In this talk, I'll discuss lessons learned from building concurrent data structures and file systems for persistent memory. I'll introduce SplitFS, our new file system for persistent memory, which reduces the software overhead for common file-system operations. I'll also present RECIPE, our approach for transforming an existing concurrent DRAM data structure into a crash-consistent version for persistent memory. RECIPE-converted data structures, which we term the P-* family of data structures, outperform hand-crafted concurrent data structures for persistent memory by as much as 5x on YCSB workloads.
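The load/store interface shifts crash consistency to software: a store is durable only after its cache line is flushed and a fence orders the flush before dependent writes. Below is a minimal C sketch of that idiom (illustrative only, not SplitFS or RECIPE code); the PM file path and layout are hypothetical, and it assumes a DAX-mounted file and a CLWB-capable CPU (compile with -mclwb).

    /* Crash-consistent update on persistent memory: write the payload,
     * persist it, then set a commit flag and persist that. A reader
     * after a crash never sees the flag without the payload. */
    #define _GNU_SOURCE            /* for MAP_SYNC / MAP_SHARED_VALIDATE */
    #include <fcntl.h>
    #include <immintrin.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    static void pm_persist(const void *addr, size_t len) {
        /* Flush each cache line in [addr, addr+len), then fence. */
        uintptr_t p = (uintptr_t)addr & ~(uintptr_t)63;
        for (; p < (uintptr_t)addr + len; p += 64)
            _mm_clwb((void *)p);
        _mm_sfence();
    }

    int main(void) {
        int fd = open("/mnt/pmem/data", O_RDWR | O_CREAT, 0644); /* hypothetical path */
        if (fd < 0 || ftruncate(fd, 4096) < 0) { perror("pm"); return 1; }
        char *pm = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                        MAP_SHARED_VALIDATE | MAP_SYNC, fd, 0);
        if (pm == MAP_FAILED) { perror("mmap"); return 1; }

        strcpy(pm + 64, "hello, persistent world");  /* payload */
        pm_persist(pm + 64, 64);
        pm[0] = 1;                                   /* commit flag */
        pm_persist(pm, 1);

        munmap(pm, 4096);
        close(fd);
        return 0;
    }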
Bio: Vijay Chidambaram is an Assistant Professor in the Computer Science department at the University of Texas at Austin. His research tackles the challenges of building high-performance, reliable storage systems for emerging technologies (such as persistent memory) and emerging applications (such as blockchains and machine learning). His papers have received Best Paper Awards at USENIX ATC 2018, FAST 2018, and FAST 2017. He was awarded the NSF CAREER Award in 2018, the SIGOPS Dennis M. Ritchie Dissertation Award in 2016, the Microsoft Research Fellowship in 2014, and the University of Wisconsin-Madison Alumni Scholarship in 2009.
Dr. Derek Chiou, UT and Microsoft, “Accelerating Microsoft's Cloud”
Abstract: This talk describes Microsoft’s balanced acceleration efforts, which include both FPGAs and ASICs. FPGAs provide programmability with hardware performance, while ASICs provide density, power, and cost advantages for fixed functions. Examples of both will be provided. Microsoft is moving toward open accelerator ecosystems, as demonstrated by the release of the Project Zipline compression standard with RTL code.
Bio: Derek Chiou is a Partner Architect at Microsoft, responsible for infrastructure hardware architecture and future uses of FPGAs, and a Research Scientist in the Electrical and Computer Engineering Department at The University of Texas at Austin. He co-started the Azure SmartNIC effort and led the Bing FPGA team to the first deployment of Bing ranking on FPGAs. Until 2016, he was an associate professor at UT. Before going to UT, Dr. Chiou was a system architect at Avici Systems, a manufacturer of terabit core routers. Dr. Chiou received his Ph.D., S.M., and S.B. degrees in Electrical Engineering and Computer Science from MIT.
Prof. Kristen Grauman, UT and Facebook AI Research, “Egocentric Visual Learning”
Abstract: Computer vision has seen major success in learning to recognize objects from massive “disembodied” Web photo collections labeled by human annotators. Yet cognitive science tells us that perception develops in the context of acting in the world, and without intensive supervision. Meanwhile, many realistic vision tasks require not only categorizing a well-composed human-taken photo, but also actively deciding where to look in the first place. In the context of these challenges, we are exploring how machine perception benefits from anticipating the sights and sounds an agent will experience as a function of its own actions. Based on this premise, we introduce methods for learning to look around intelligently in novel environments, learning from video how to interact with objects, and perceiving audio-visual streams for both semantic and spatial context. Together, these are steps toward first-person perception, where interaction with the world is itself a supervisory signal.
Bio: Kristen Grauman is a Professor in the Department of Computer Science at the University of Texas at Austin and a Research Scientist at Facebook AI Research. Her research in computer vision and machine learning focuses on visual recognition and search. Before joining UT Austin in 2007, she received her Ph.D. at MIT. She is an AAAI Fellow, IEEE Fellow, Sloan Fellow, and a recipient of the NSF CAREER Award, ONR YIP, PECASE, the PAMI Young Researcher Award, and the 2013 IJCAI Computers and Thought Award. She and her collaborators were recognized with best paper awards at CVPR 2008, ICCV 2011, and ACCV 2016, and with a 2017 Helmholtz Prize “test of time” award. She served as a Program Chair of the Conference on Computer Vision and Pattern Recognition (CVPR) in 2015 and Neural Information Processing Systems (NeurIPS) in 2018, and she currently serves as Associate Editor-in-Chief of the Transactions on Pattern Analysis and Machine Intelligence (PAMI).
Dr. Vinod Kathail, Xilinx, “Vitis and Vitis AI: ML Application Acceleration from Cloud to Edge”
Abstract: Xilinx’s Vitis and Vitis AI development environments are motivated by three industry trends: AI proliferation, the need for heterogeneous computing, and applications that span cloud to edge to endpoint. Vitis is a comprehensive software development environment, including a rich set of open-source libraries, for seamlessly building and deploying complete accelerated applications. Vitis AI, an integral part of Vitis, enables AI inference acceleration on Xilinx platforms, including Alveo cards, FPGA instances in the cloud, and embedded platforms. Vitis AI supports the industry’s leading deep learning frameworks, such as TensorFlow and Caffe, and offers comprehensive APIs to prune, quantize, optimize, and compile pre-trained networks to achieve the highest AI inference performance on Xilinx platforms. In this talk, we provide an overview of the Vitis and Vitis AI development environments.
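For readers unfamiliar with the quantization step mentioned above, here is a self-contained C sketch of symmetric int8 post-training quantization of a single weight tensor. It illustrates the general technique only; it is not the Vitis AI API, and the weights are made up.

    /* Symmetric int8 quantization: pick a scale from the largest
     * absolute weight, round each weight to an int8 multiple of it,
     * and keep the scale for dequantization. Compile with -lm. */
    #include <math.h>
    #include <stdint.h>
    #include <stdio.h>

    static float quantize(const float *w, int8_t *q, int n) {
        float max = 0.0f;
        for (int i = 0; i < n; i++)
            if (fabsf(w[i]) > max) max = fabsf(w[i]);
        float scale = (max > 0.0f) ? max / 127.0f : 1.0f;
        for (int i = 0; i < n; i++)
            q[i] = (int8_t)lrintf(w[i] / scale);
        return scale;
    }

    int main(void) {
        float w[] = {0.80f, -1.20f, 0.05f, 0.33f};   /* made-up weights */
        int8_t q[4];
        float scale = quantize(w, q, 4);
        for (int i = 0; i < 4; i++)
            printf("w=%+.2f  q=%+4d  back=%+.4f\n", w[i], q[i], q[i] * scale);
        return 0;
    }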
Bio: Vinod Kathail is a Xilinx Fellow and Chief Architect for the Vitis Development Environment. He is also leading the company-wide focus on embedded vision, including machine learning usage in edge and endpoint applications. At Xilinx, he initiated the software programmability effort for the Zynq family and earlier developed and drove the adoption of SDSoC. Prior to joining Xilinx, Vinod was the founding CEO and later CTO of Synfora, a high-level synthesis startup. Vinod brings over 25 years of experience in heterogeneous programming environments, high-performance parallel and VLIW architectures, parallelizing compilers, and high-level synthesis, working in both research labs (HP Labs) and startups. Vinod received a B.Tech in Electrical Engineering from MANIT, Bhopal, an M.Tech in Computer Science from IIT Kanpur, and an ScD in Electrical Engineering and Computer Science from MIT. He holds over 25 patents and has authored numerous research publications.
Dr. Hui Lei, Futurewei, “Towards the Industrialization of AI”
Abstract: We are witnessing a great awakening of AI, thanks to the proliferation of specialized chips, breakthroughs in machine learning techniques, and the explosion of digital data. Although many enterprises consider AI strategically important, only a very small portion of them have revenue-generating AI systems running today. Advancing from machine learning prototypes to production AI systems poses many challenges and requires a systematic approach. In this talk, I will discuss the issues an enterprise will have to address in order to adopt AI in a scalable manner and realize its full benefits. I will also make a case for a cloud-based enterprise AI platform and outline the technical requirements for such a platform.
Bio: Hui Lei is Vice President and CTO of Cloud and Big Data at Futurewei Technologies. Previously he was Director and CTO of Watson Health Cloud at IBM, an IBM Distinguished Engineer, and an IBM Master Inventor. He is a Fellow of the IEEE, Editor-in-Chief of the IEEE Transactions on Cloud Computing, and a past Chair of the IEEE Technical Committee on Business Informatics and Systems. He has been recognized with the prestigious IEEE Computer Society Technical Achievement Award for his pioneering work on big data engineering. He holds a Ph.D. in Computer Science from Columbia University.
Grant Mackey, Western Digital Research, “Computational Storage: A Brief History and Where is it going?”
Abstract: “Computational Storage” is the latest term for the co-location of compute and storage in the I/O path. The term is nebulous, the definition contentious, and the implementation unclear. Are we in a technology hype cycle, or is it going to happen this time around? This talk will review what has been done in the past, who some of the current players are, and some of what WD is working on today to help make computational storage a reality.
Bio: Grant Mackey is a senior technologist at Western Digital and during his six-year tenure has served on various research teams within the company. Project areas during this time include discrete-event simulation of parallel and distributed storage system architectures and new memory fabrics, as well as prototyping efforts around novel storage architectures and new computational technologies. He has mentored several graduate students and has been a voting member of several university research centers, providing support and direction to various storage-related research projects. An avid supporter and consumer of open-source software, Grant pioneered voluntary open-source code efforts at Western Digital. Previously, Grant spent some time in the folly that is tech startups, learning that you can invent something that isn’t Kafka on your own, then have a very large company release it some time later and ruin your value proposition. He has also spent time at Los Alamos National Lab researching high-performance computing storage architectures and how they can better serve the analysis efforts of petascale scientific simulations. Grant received his BS, MS, and (pending; don’t go into industry while doing a) PhD from the University of Central Florida. Born in the state of Florida, this walking internet meme currently resides with his family in Orange County, California. He enjoys brewing beer and working (forever) on his dissertation.
Prof. Simon Peter, UT, “E3: Energy-Efficient Microservices on SmartNIC-Accelerated Servers”
Abstract: We investigate the use of SmartNIC-accelerated servers to execute microservice-based applications in the data center. By offloading suitable microservices to the SmartNIC’s low-power processor, we can improve server energy-efficiency without latency loss. However, as a heterogeneous computing substrate in the data path of the host, SmartNICs bring several challenges to a microservice platform: network traffic routing and load balancing, microservice placement on heterogeneous hardware, and contention on shared SmartNIC resources. We present E3, a microservice execution platform for SmartNIC-accelerated servers. E3 follows the design philosophies of the Azure Service Fabric microservice platform and extends key system components to a SmartNIC to address the above-mentioned challenges. E3 employs three key techniques: ECMP-based load balancing via SmartNICs to the host, network topology-aware microservice placement, and a data-plane orchestrator that can detect SmartNIC overload. Our E3 prototype using Cavium LiquidIO SmartNICs shows that SmartNIC offload can improve cluster energy-efficiency up to 3× and cost efficiency up to 1.9× at up to 4% latency cost for common microservices, including real-time analytics, an IoT hub, and virtual network functions.
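As a rough illustration of the ECMP-based dispatch mentioned above (not E3's actual code), the C sketch below hashes a flow's 5-tuple and maps the result onto a fixed target set, so every packet of a flow consistently reaches the same microservice instance on either the SmartNIC or the host. The target names are hypothetical.

    /* ECMP-style dispatch: FNV-1a over the packed 5-tuple selects a
     * consistent target for all packets of the same flow. */
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    static uint32_t fnv1a(const uint8_t *p, size_t len) {
        uint32_t h = 2166136261u;
        for (size_t i = 0; i < len; i++) { h ^= p[i]; h *= 16777619u; }
        return h;
    }

    int main(void) {
        const char *targets[] = {"smartnic-core-0", "smartnic-core-1",
                                 "host-core-0", "host-core-1"};
        uint32_t src_ip = 0x0a000001, dst_ip = 0x0a000002;  /* 10.0.0.1 -> 10.0.0.2 */
        uint16_t src_port = 49152, dst_port = 80;
        uint8_t proto = 6;                                  /* TCP */

        uint8_t key[13];      /* pack fields to avoid struct padding */
        memcpy(key, &src_ip, 4);       memcpy(key + 4, &dst_ip, 4);
        memcpy(key + 8, &src_port, 2); memcpy(key + 10, &dst_port, 2);
        key[12] = proto;

        printf("flow -> %s\n", targets[fnv1a(key, sizeof key) % 4]);
        return 0;
    }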
Bio: Simon Peter is an Assistant Professor in the Department of Computer Science at the University of Texas at Austin. He works to dramatically improve data center efficiency and reliability by designing, building, and evaluating new alternatives for their hardware and software components. He currently co-designs operating system networking and storage stacks with new hardware technologies to push server I/O efficiency an order of magnitude beyond today's capabilities, and he is also working on the networking issues that arise when pushing performance that far.
Simon is the director of the Texas Systems Research Consortium, where he collaborates closely with industry to shape the future of cloud computing. His work is supported by VMware, Microsoft Research, Huawei, Google, Citadel Securities, and ARM. He was twice awarded the Jay Lepreau Best Paper Award, in 2014 and 2016, and a Memorable Paper Award in 2018. He received an NSF CAREER Award and is a Sloan Research Fellow. Before joining UT Austin in 2016, he was a research associate at the University of Washington from 2012 to 2016. He received a Ph.D. in Computer Science from ETH Zurich in 2012 and an MSc in Computer Science from the Carl von Ossietzky University of Oldenburg, Germany, in 2006.
Prof. Keshav Pingali, UT, “Single-Machine Analytics on Massive Graphs Using Intel Optane DC Persistent Memory”
Abstract: Graph analytics systems today must process very large graphs that have billions of nodes and edges and require several TB of storage. Since the main memory of most computers is limited to a few hundred GB, graphs of this size must be processed either on clusters or by out-of-core processing. However, both approaches have large overheads, and they support only a limited set of graph-processing algorithms. Intel Optane DC Persistent Memory is a transformative memory technology that has higher density and lower cost than DRAM but can be accessed efficiently at the byte level like DRAM. This enables affordable machines with several TB of memory. In this talk, we describe our experience using such a machine for in-memory analytics of massive graphs with the Galois system.
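A short C sketch of why byte addressability matters here (illustrative only, not Galois code; the file path and layout are hypothetical): a graph stored in compressed sparse row (CSR) form on a DAX-mounted PM device can simply be mapped and traversed with ordinary loads, with no out-of-core I/O or cluster partitioning.

    /* Map a CSR graph file and scan vertex degrees with plain loads.
     * Assumed layout: uint64 vertex count n, then n+1 offsets, then
     * the edge array of destination IDs. */
    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(void) {
        int fd = open("/mnt/pmem/graph.csr", O_RDONLY);   /* hypothetical path */
        struct stat st;
        if (fd < 0 || fstat(fd, &st) < 0) { perror("graph"); return 1; }

        const uint64_t *base = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
        if (base == MAP_FAILED) { perror("mmap"); return 1; }

        uint64_t n = base[0];
        const uint64_t *offsets = base + 1;   /* n + 1 entries; edges follow */

        uint64_t max_deg = 0;
        for (uint64_t v = 0; v < n; v++) {    /* ordinary loads into PM */
            uint64_t deg = offsets[v + 1] - offsets[v];
            if (deg > max_deg) max_deg = deg;
        }
        printf("%llu vertices, max degree %llu\n",
               (unsigned long long)n, (unsigned long long)max_deg);
        return 0;
    }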
Bio: Keshav Pingali is a Professor in the Department of Computer Science at the University of Texas at Austin, and he holds the W.A. "Tex" Moncrief Chair of Computing in the Oden Institute at UT Austin. Pingali is a Fellow of the IEEE, ACM, and AAAS. He received the IIT Kanpur Distinguished Alumnus Award in 2013. Between 2008 and 2011, he was the co-Editor-in-Chief of the ACM Transactions on Programming Languages and Systems. He has also served on the NSF CISE Advisory Committee.
Prof. Chris Rossbach, UT and VMware, “System Software in the Wake of Moore's Law”
Abstract: As CPUs' ability to provide transparent performance improvements through Dennard scaling continues to decline, emerging hardware increasingly leverages the surplus transistor budget at smaller process sizes to provide specialized accelerators designed to efficiently support particular types of computation. While this is leading to greater diversity in hardware platforms, OS and hypervisor designs have retained a decades-old guiding principle: CPUs are the dominant compute resource at the center of a constellation of I/O devices. This organization is increasingly untenable, and system software needs to be reorganized to reflect this ongoing architectural sea change. Moreover, OS evolution to better manage non-CPU compute raises interesting questions about what hardware features non-CPU compute devices should support so that OSes can manage them well. This talk will describe recent research projects toward better virtualization support in both hardware and software.
Bio: Chris Rossbach is an Assistant Professor at UT Austin, an affiliated Senior Researcher with the VMware Research Group, and an alumnus of Microsoft Research's Silicon Valley Lab. He received his Ph.D. in computer science from The University of Texas at Austin in 2009. Chris’s research focuses on OS, hypervisor, and architectural support for emerging hardware.
Prof. Hovav Shacham, UT, “Data Dependent Instruction Timing Channels”
Abstract: We show that floating-point instructions on modern processors exhibit data-dependent timing variation: the running time of the same instruction, applied to the same registers, can vary by an order of magnitude depending on the register values. We show that attackers can exploit this variation to violate intended privacy guarantees in software that manipulates both secrets and untrusted inputs, such as Web browsers. We discuss possible mitigations at the application, compiler, and microarchitectural levels.
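A minimal C sketch of the kind of measurement behind this claim (an assumed setup, not the authors' harness): time the same multiply with normal versus subnormal operands. On many x86 microarchitectures the subnormal case runs dramatically slower. Compile without -ffast-math so subnormals are not flushed to zero.

    /* Same instruction, different values: multiplying subnormal
     * (denormal) doubles typically triggers a slow microcode path. */
    #include <stdio.h>
    #include <time.h>

    static double time_mul(volatile double x, volatile double y) {
        struct timespec a, b;
        volatile double acc = 0.0;
        clock_gettime(CLOCK_MONOTONIC, &a);
        for (int i = 0; i < 10000000; i++)
            acc = x * y;                 /* volatile blocks constant folding */
        clock_gettime(CLOCK_MONOTONIC, &b);
        (void)acc;
        return (b.tv_sec - a.tv_sec) + (b.tv_nsec - a.tv_nsec) / 1e9;
    }

    int main(void) {
        printf("normal operands   : %.3f s\n", time_mul(1.5, 1.5));
        printf("subnormal operand : %.3f s\n", time_mul(1e-310, 0.5));
        return 0;
    }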
Bio: Hovav Shacham joined the University of Texas at Austin in 2018. His research interests are in applied cryptography, systems security, privacy-enhancing technologies, and technology policy. Shacham was a student at Stanford and a postdoctoral fellow at the Weizmann Institute. From 2007 to 2018, he was on the faculty at the University of California, San Diego. He received the 2017 ACM CCS Test-of-Time Award for his 2007 paper that introduced return-oriented programming. Shacham took part in California's 2007 "Top-to-Bottom" voting systems review and served on the advisory committee for California's 2011–13 post-election risk-limiting audit pilot program. His work has been cited by the National Academies, the Federal Trade Commission, the National Highway Traffic Safety Administration, and the RAND Corporation.
Larry Wikelius, Marvell, “New Applications and Accelerators for ARM-based Servers”
Abstract: This presentation will review details of the ThunderX server processor roadmap, based on the Armv8-A architecture, with particular focus on Marvell optimizations for critical workloads. Additionally, it will highlight specific applications and customer use cases, unique to the Arm architecture, that deliver differentiated performance and value to end users, as well as Marvell’s architectural innovations in the server SoC. The presentation will also highlight key customer deployments in the server and data center industry.
Bio: Larry Wikelius serves as Vice President – Ecosystem and Partner Enabling for Marvell. He originally held this position at Cavium and has continued in the role following Marvell’s acquisition of the company in July 2018. He has built a network of strategic partners, including blue-chip companies such as Nvidia, HPE, and Microsoft, that for the first time in the server industry fully support Arm®-based servers. Wikelius holds an MBA from Northeastern University and a BS in Computer Science from the University of Minnesota.
Prof. Emmett Witchel, UT, “Achieving System Security in the Era of Secure Enclaves”
Abstract: GPUs have become ubiquitous in the cloud due to the dramatic performance gains they enable in domains such as machine learning (ML) and computer vision (CV). However, offloading GPU computation to the cloud requires placing enormous trust in providers and administrators. Recent proposals for GPU trusted execution environments (TEEs) are promising, but they fail to address very real side-channel concerns. To illustrate the severity of the problem, we demonstrate a novel attack and then discuss a system that facilitates computation on cloud GPUs while eliminating side channels. The system is based on a novel GPU stream abstraction that ensures execution and interaction through untrusted components are independent of any secret data.
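As a CPU-side analogy for that data-independence property (the system's actual mechanism operates on GPU streams, so this is not its code), the C sketch below shows a branchless constant-time select: the instruction sequence and memory accesses are identical whichever value the secret bit takes, unlike an if/else, which leaks through timing.

    /* Constant-time select: build an all-ones or all-zeros mask from
     * the secret bit and combine both inputs with the same operations. */
    #include <stdint.h>
    #include <stdio.h>

    static uint32_t ct_select(uint32_t secret_bit, uint32_t a, uint32_t b) {
        uint32_t mask = (uint32_t)-(int32_t)(secret_bit & 1);
        return (a & mask) | (b & ~mask);   /* a if bit set, else b */
    }

    int main(void) {
        printf("%u %u\n", ct_select(1, 111, 222), ct_select(0, 111, 222));
        return 0;
    }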
Bio: Emmett Witchel is a professor in computer science at The University of Texas at Austin. He received his doctorate from MIT in 2004. He and his group are interested in operating systems, security, and fast computation.