Synopsis: This course focuses on the design of data science algorithms and characterizing properties of privacy and fairness. Our daily lives are actively being monitoring on web browsers, social networks, wearable devices and even robots. These data are routinely analyzed by statistical and machine learning tools to infer aggregate patterns of our behavior (with applications to science, medicine, advertising, IoT, etc). However, these data also contain private information about us that we would not like revealed to others (e.g., medical history, sexual orientation, locations visited, etc.). Disclosure of such data leads to our privacy being breached. Moreover, acting on data with such sensitive information could result in physical or financial harm, and discrimination, leading to issues in fairness.
A grand challenge that pervades all fields of computer science (and more generally scientific) research is: how to learn from data collected from individuals while provably ensuring that (a) private properties of individuals are not revealed by the results of the learning process, and (b) the decisions taken as a result of data analysis ensure fairness.
In this course, we will study recent work in computer science that mathematically formulates these societal constraints. For privacy, we will study differential privacy, a breakthrough privacy notion: an algorithm is differentially private if its output is insensitive to (small) changes in the input. Differential private algorithms have found applications in developing algorithms with provable privacy guarantees while learning from data from varied domains (e.g., social science, medicine, communications) and in varied modalities (e.g., tables, graphs, streams), and is used by government organizations and internet corporations like Google and Apple to collect and analyze data. Differential privacy has also been shown to help design better learning algorithms. We will also investigate how to formulate the notion of fairness in data analysis mathematical. We will study both the theory and practice of designing private and fair data analysis algorithms and their applications to data arising from real-world systems.
Prerequisites:
The course is open to interested graduate and undergraduate students with sufficient mathematical maturity. Basic knowledge in algorithms, proof techniques, and probability will be assumed. Familiarity with databases and machine learning would help but is not necessary.