Random Sampling on Big Data: Techniques and Applications

Abstract

While the prevailing approach to the big data challenge is to scale up/out the computation, a complementary approach is to scale down the data, which is highly effective when small errors in the results can be tolerated. Random sampling is one of the main tools for scaling down the data, and has been well studied in both computer science and statistics. However, as modern big data systems continue to evolve, they impose new constraints and requirements on sampling methods. This talk will present some recent results on random sampling in large-scale database systems, including sampling over data streams, in distributed systems, on spatial data, and for SQL queries.

Speaker

Prof. Ke YI
Associate Professor
Department of Computer Science & Engineering
Hong Kong University of Science and Technology
Hong Kong, China

Date & Time

27 Jan 2016 (Wednesday) 11:00 - 12:00

Venue

E11-4045 (University of Macau)

Organized by

Department of Computer and Information Science

Biography

Ke YI is an Associate Professor in the Department of Computer Science and Engineering, Hong Kong University of Science and Technology. He obtained his Bachelor's degree from Tsinghua University (2001) and his Ph.D. from Duke University (2006), both in computer science. His research spans theoretical computer science and database systems, and he publishes in the top venues of both areas, such as JACM, TODS, SIGMOD/PODS, and FOCS. He has received a Google Faculty Research Award (2010), a Young Investigator Research Award from HKUST (2012), and an ACM SIGMOD Best Demonstration Award (2015). He currently serves as an Associate Editor of TODS and TKDE.