From Deep Data Integration to Using LLMs to Query Unstructured and Structured Data

Abstract

We are organizing an IDS-CS Joint Seminar by Dr Wang-Chiew Tan from Meta on entity linking and LLMs this Tuesday morning at 10am (please note that the venue is COM3 MPH). Light refreshments will be served after the talk.

Date
Sep 26, 2023 10:00 AM — 11:30 AM
Location
COM-3-01, Multi-Purpose Hall

In recent years, we have witnessed the widespread adoption of deep learning techniques as avant-garde solutions to different computational problems. In data integration, the use of deep learning techniques has helped establish several state-of-the-art results in long-standing problems, including information extraction, entity matching, data cleaning, and table understanding. In this talk, I will reflect on the strengths of deep learning and how they have helped move the needle forward in data integration. I will also discuss a few challenges associated with solutions based on deep learning techniques and describe some opportunities for future work. Recently, Large Language Models (LLMs) have emerged as a powerful tool for accessing parametric knowledge, but the potential of tapping into the vast expanse of external or private data remains largely unexplored. This talk presents an open-source question-answering system that seamlessly integrates model parameters with knowledge from external data sources to enhance its predictive capabilities. Our larger vision transcends question answering. We envision a personal insight assistant, adept at sifting through your past data to offer invaluable insights that help you make informed decisions and plan with foresight.

Wang-Chiew is a research scientist at Meta AI. Before that, she was the Head of Research at Megagon Labs, where she led the research efforts on building advanced technologies to enhance search by experience. Prior to joining Megagon Labs, she was a Professor of Computer Science at the University of California, Santa Cruz. She also spent two years at IBM Research - Almaden. She received her B.Sc. (First Class) in Computer Science from the National University of Singapore and her Ph.D. in Computer Science from the University of Pennsylvania. Her research interests include data integration and exchange, data provenance, and natural language processing. She is the recipient of an NSF CAREER award, a Google Faculty Award, and an IBM Faculty Award. She has co-authored best papers and is a co-recipient of the 2014 ACM PODS Alberto O. Mendelzon Test-of-Time Award, the 2018 ICDT Test-of-Time Award, and the 2020 Alonzo Church Award. She received the 2019 VLDB Women in Database Research Award. She was on the VLDB Board of Trustees (2014-2019) and is a Fellow of the ACM.