IDSNLP 2024: NUS IDS NLP-SIG Workshop for TheWebConf 2024

Time	Event
8:30 - 8:45	Welcome and Opening Remarks Speaker: See-Kiong Ng (NUS)
8:45 - 9:15	When Spatio-Temporal Data Meet Large Language Models Speaker: Yuxuan Liang (HKUST)
9:15 - 9:45	Integrating Large Language Models into Recommender Systems Speaker: Jiarui Jin (SJTU)
9:45 - 10:30	LLM-based Clarification in Conversational Search: Going beyond Unimodality and System Performance Speakers: Mohammad Aliannejadi (University of Amsterdam), Yifei Yuan (University of Copenhagen)
10:30 - 11:00	Coffee Break
11:00 - 11:30	Sailor: Open Language Models for South-East Asia Speaker: Qian Liu (Sea AI lab)
11:30 - 12:00	From Keyword to Conversational Search: Leveraging Behavioral Data and Deep Learning in E-Commerce Speaker: Yupin Huang (Amazon)
12:00 - 12:30	Enhancing LLMs’ Reliability via Generation and Verification Speaker: Wenya Wang (NTU)
12:30 - 13:30	Lunch
13:30 - 15:00	Poster Session

Speakers

The following speakers are invited to give keynotes at IDSNLP 2024. Please click the profile image to view the detailed content of the talk.

Title: Welcome and Opening Remarks
Speaker: See-Kiong Ng

Bio: See-Kiong Ng (Ph.D., Carnegie Mellon University), a recipient of the Singapore National Computer Board's overseas scholarship (1986), is currently Professor of Practice at the Department of Computer Science of the School of Computing, National University of Singapore (NUS), and Director, Translational Research for the university's Institute of Data Science. Founded in May 2016, the Institute is the focal point at NUS for developing integrated data science capabilities and nurturing data scientists for Singapore's Smart Nation initiative. It pushes the boundary of data science through transdisciplinary upstream research, and creates impact by translating the research outcomes into real-life applications by collaborating with partners from the industry and public agencies.

Title: When Spatio-Temporal Data Meet Large Language Models
Speaker: Yuxuan Liang

Abstract: Spatio-temporal data mining is crucial for understanding the complexities inherent in various real-world systems and applications. Despite the significant progress of large language models (LLMs), the development of artificial general intelligence (AGI) with spatio-temporal data capabilities is still in its early stages. Most existing ML or DL models heavily rely on domain knowledge and extensive model tuning, predominantly focusing on prediction tasks. In this talk, we argue that current LLMs have the potential to revolutionize spatio-temporal data mining, thereby promoting efficient decision-making and advancing towards a more universal form of spatio-temporal analytical intelligence. Such advancement could unlock a wide range of possibilities, such as spatio-temporal forecasting, location understanding, and urban planning. This talk encourages researchers and practitioners to recognize the potential of LLMs in advancing spatio-temporal data mining and emphasizes the need for trust in these related efforts.

Bio: Dr. Yuxuan Liang is currently an Assistant Professor at Intelligent Transportation Thrust, also affiliated with Data Science and Analytics Thrust, Hong Kong University of Science and Technology (Guangzhou). He is working on the research, development, and innovation of spatio-temporal data mining and AI, with a broad range of applications in smart cities. Prior to that, he obtained his PhD degree at School of Computing, National University of Singapore, supervised by Prof. Roger Zimmermann and Prof. David S. Rosenblum. He also worked closely with Dr. Yu Zheng and Dr. Junbo Zhang from JD Technology. He published 70+ papers in refereed journals (e.g., TPAMI, AI, TKDE, TMC) and conferences (such as KDD, NeurIPS, ICLR, WWW, ECCV, IJCAI, AAAI, and MM). His publications collectively gathered 3,700 citations on Google Scholar, with h-index of 29 and i10-index of 50. Among them, three papers (GeoMAN, ST-MetaNet and STMTMVL) were selected as the most influential IJCAI/KDD papers according to PaperDigest, which indicates their significant impacts on both industry and academia. He also served as a PC member (or reviewer) for some prestigious conferences, including KDD, ICML, ICLR, NeurIPS, WWW, CVPR, ICCV, ECCV, IJCAI, AAAI, SIGSPATIAL (outstanding PC), and Ubicomp.

Title: Exploring Interaction Patterns of Sequence Data in Recommender Systems
Speaker: Jiarui Jin

Abstract: Interaction patterns, also known as cross-features in tabular data, are considered essential in recommender systems, having inspired widely adopted models such as factorization machines and their extensions. This talk aims to present methods for defining and capturing these interaction patterns in sequence data within recommender systems. Additionally, we will explore their application in both traditional item recommendation and the emerging field of anchor recommendation tasks.

Bio: Jiarui is a Zhiyuan Honor Ph.D. Student at Apex Data and Knowledge Management Lab and Wu Honor Class, Computer Science Department, Shanghai Jiao Tong University starting from September 2019. His graduate research advisors are Prof. Yong Yu, Prof. Weinan Zhang and Prof. Jun Wang (University College London). Previously, Jiarui was a visiting student researcher at Artificial Intelligence Center, University College London. He was a research intern or an applied scientist intern at Amazon Web Services Shanghai AI-Labs, Taobao Live, Alimama, DiDi AI-Labs, and closely collaborated with China Merchants Bank. He earned his B.Eng. from Yingcai Honor Class of University of Electronic Science and Technology of China in 2019.

Title: LLM-based Clarification in Conversational Search: Going beyond Unimodality and System Performance
Speaker: Mohammad Aliannejadi

Abstract: Conversational search systems aim to improve user experience by taking a more active role in an information-seeking session. This can be achieved in a mixed-initiative setup where the system would also pose questions to the users. The questions could have multiple goals, such as asking for feedback or clarification. Research in this area has emphasized the importance of this area and proved it leads to improved system performance. In this talk, we will focus on the latest developments and findings in this area by taking two steps further. First, we present our work on multimodal clarifying questions where the system couples the questions with useful images to help users even further. We describe our data collection and evaluation pipeline and present our newly collected dataset. We propose a novel pipeline for this problem and present the results of our proposed method, as well as the baselines. Second, we discuss how mixed-initiative actions of a system can influence user experience. In particular, we will share the results of two user studies, focusing on clarifying questions. Our findings show that while asking high-quality clarifying questions leads to improved user experience, the risk of asking low-quality questions is still high and should be taken into account in system design.

Bio: I am an assistant professor at IRLab (formerly known as ILPS), the University of Amsterdam in The Netherlands. I obtained my Ph.D. in the Faculty of Informatics, Università della Svizzera italiana (USI) in Lugano, Switzerland. During my Ph.D., I spent four months visiting CIIR at the University of Massachusetts, Amherst, USA. Before my Ph.D., I completed my MSc degree in the Department of Computer Engineering and Information Technologies at Tehran Polytechnic, Tehran, Iran.

Title: LLM-based Clarification in Conversational Search: Going beyond Unimodality and System Performance
Speaker: Yifei Yuan

Bio: I am currently a postdoc fellow at Coastal NLP group at University of Copenhagen, working with Prof. Anders Søggard. I obtained my PhD degree in October 2023 at CUHK text mining group, supervised by Prof.Lam Wai. I obtained my bachelor’s degree at Harbin Institute of Technology in 2019, majoring in bioinformatics. Currently, my research interest focuses on multimodal information retrieval and text mining problems, including social media based fashion analysis, multimodal dialogue systems, etc. I’m also interested in other aspects of NLP and IR in the multimodal domain.

Title: Sailor: Open Language Models for South-East Asia
Speaker: Qian Liu

Abstract: Sailor is a family of open language models ranging from 0.5B to 7B parameters, which are tailored for South-East Asian (SEA) languages. These models are continually pre-trained from existing language models, covering over 200B to 400B tokens across languages such as English, Chinese, Vietnamese, Thai, Indonesian, Malay, and Lao. In this talk, I will provide an overview of the Sailor models, discuss the key technical innovations, and present the empirical findings from our evaluation. I will also highlight the challenges and lessons learned in tailoring language models for multilingual settings. The talk will conclude with a discussion on the potential future directions and opportunities for further research and development in this area.

Bio: I am a Research Scientist (NLP) at the Sea AI Lab, an industry research lab based in Singapore 🇸🇬. Currently we are still actively seeking (Senior) Research Scientists for all directions to join our team, and we encourage you to reach out if you are interested. Please don't hesitate to send me an email for further details or to express your interest. My primary research interests are in natural language processing, particularly code generation, table pre-training and natural langauge reasoning. I have been fortunate to work with an amazing set of researchers in the past. I did my Ph.D. thesis at the joint program of Beihang University and Microsoft Research Asia where I was advised by Jian-Guang Lou and Bei Chen. The research topic during my Ph.D. journey is semantic parsing, which aims to translate a user’s natural language sentences into machine-executable formal programming language to accomplish relevant tasks. My thesis revolves around my efforts to develop cost-effective, generalizable, and interactive semantic parsing systems.

Title: From Keyword to Conversational Search: Leveraging Behavioral Data and Deep Learning in E-Commerce
Speaker: Yupin Huang

Abstract: In this talk, we explore the evolution of e-commerce search engines from traditional keyword-based systems to advanced conversational interfaces, highlighting our latest research and technological advancements. The first half of the presentation will focus on enhancements in keyword search technologies, where deep learning and behavioral data have significantly improved the precision of product ranking and user experience. The latter half will introduce our transition to conversational search, addressing the unique challenges posed by natural language queries that often involve complex semantics or contextual understanding. Key examples include searches like “what’s the easy-to-clean coffee maker?” and “show me shoes with big ribbons.” This shift is further catalyzed by the broader adoption of chatbots, making such interactions increasingly common. We present the Conversational Product Understanding Foundation Model, a novel approach that transforms natural language queries and product information into embeddings, thereby revolutionizing the search and shopping experience by supporting dynamic, context-aware dialogue. The session will detail ConvoFM’s development, its integration into search frameworks, and its impact on both search accuracy and user engagement.

Bio: Senior Applied Scientist at Amazon.com.

Title: Enhancing LLMs’ Reliability via Generation and Verification
Speaker: TODO

Abstract: In the rapidly evolving landscape of artificial intelligence, ensuring the reliability of Large Language Models (LLMs) is paramount for their application across diverse domains. This talk presents our exploration of strategies aimed at enhancing the reliability of LLMs through the lens of tailored generation and verification. In the first part, I will delve into two of our recent works towards equipping LLMs with the capability to generate additional contextual information that significantly improves the transparency and quality of their output. The augmented generation includes crucial elaborations/explanations related to the question, or referencing citations as supporting evidence in LLMs’ responses. In the second part, I will discuss two of our efforts towards verification models aimed at scrutinizing and validating the outputs of LLMs to ensure their accuracy and logical consistency. The verification models are essential for detecting fallacious and inconsistent rationales generated by LLMs, and thus could be used as intermediate checking modules towards reliable generations.

Bio: I am an Assistant Professor with the school of Computer Science and Engineering at Nanyang Technological University. Prior to joining NTU, I worked with Noah Smith and Hanna Hajishirzi as a Postdoc in Paul G. Allen School of Computer Science and Engineering at the University of Washington. I completed my PhD under the supervision of Sinno Jialin Pan at Nanyang Technological University.

NUS IDS NLP-SIG Workshop for TheWebConf 2024

Welcome!

Programme

Speakers

See-Kiong Ng

Yuxuan Liang

Jiarui Jin

Mohammad
Aliannejadi

Yifei Yuan

Qian Liu

Yupin Huang

Wenya Wang

Poster Papers

Organizers

Location

Contact Us