Shengli Hu, Research

Research Interests

Substantive: Computational Social Science, Cognitive Linguistics, Economics of AI, Creativity, Consumer Experience, Media/Memorable Consumption, Behavioral Economics;
Methodological: Computer Vision and Natural Language Processing Applications, Counterfactual Machine Learning, Experimental Economics, Functional Data Analysis;

Research Philosophy

I try to publish as the first author if and only if the answer is YES to all of the following:

Am I irreplaceable in this line of work? Am I one of the only few in the world with the necessary skills, motivation, and interest to materialize this?
Will this work make the world/society better?
Is this one of the "I wish I did it" projects rather than "I'm glad it's done (possibly by others) and it's quite nice"?
Is it NOT combining hot trendy topics that many others could be working on?

Selected Publications and Conferences:

A Diverse and Interpretable Benchmark for Viti- and Vini-cultural Visual Understanding. In Proceedings of AAAI Conference on Artificial Intelligence (AAAI 2022) AIAFS Workshop.

We present four new datasets for viticultural and vinicultural visual understanding: iVineyard, iCellar, iGrapevine, and VinePathology. We designed, gathered data for, cleaned, and provided numerical and natural language annotations for these datasets in collaboration with domain experts with the aim of (1) accelerating AI adoption in the realms of viticulture and oenology; (2) improving data efficiency and interpretability with data collection, task formulation, and annotation processes informed by domain expertise; (3) benchmarking the performance of representation learning algorithms on a suite of challenging downstream viti- and vini-cultural tasks that go beyond standard species classification. We provide analyses of qualitative and quantitative results of downstream tasks including fine-grained visual categorization, fine-grained image retrieval, image geo-localization, and object discovery, thus shedding light on the strengths and weaknesses of feature representations across a diverse set of tasks that are of scientific importance to viticulturists and oenologists.

[📕 BOOK] Neural Networks and Nebbiolo --- Artificial Intelligence for Wine. 2021. Ciel d'Avril.

This book is a proof of concept for how artificial intelligence could, and should, be applied to each and every aspect of the wine industry from vine to wine, to assist wine professionals in improving their professional skills, productivity, and efficiency, to change the wine industry for the better, and ultimately to enrich wine consumers' experiences. We ask, answer, illustrate, and demonstrate the solutions to a diverse range of questions relevant to wine professionals and enthusiasts, including but not limited to:

How could AI be leveraged for improving viticulture such as vineyard management and natural disaster response?
What are the essential components and techniques to enable speech assistants like Alexa or Google Home to answer any wine-related questions?
How could AI automatically come up with reasonable wine pairing suggestions, whether it be with food, music, or art?
How could AI help flying winemakers and globe-trotting wine professionals optimize their lifelong wine experiences?
How could AI techniques tailor and optimize for each wine taster the best blind tasting strategies based on personal strengths and weaknesses?
What factors could influence bidder behaviors at wine auctions, and which auction design elements play a role in the auctioneer's expected revenue from the auction? How could AI methods help design the optimal auction mechanism for the auctioneer?
Are fine and rare wines worth considering as potential alternative assets relative to traditional assets for investment? How could AI improve wine collectors' investment portfolio management strategies?
Could AI assist vine-growers, viticulturists, and geneticists in accurate identification of thousands of grape varieties around the globe?
What makes a great wine list? How to leverage AI to automatically evaluate wine lists objectively? How to automatically generate wine lists according to themes, preferences, moods, and occasions?
What are some AI techniques that would enable us to automatically generate wine maps according to any artistic styles?
How could we scientifically pinpoint the causal effect of Terroir versus Vigneron on wine? What are some AI techniques that would enable us to know for sure if a wine's quality is caused by winemaking practices, vintage variations, climats or lieux-dits, etc.?
What makes a cocktail creative? How could we automatically generate creative cocktail recipes with AI?

Detecting Domain-specific Expertise and Credibility in Text and Speech. 2020. In proceedings of InterSpeech 2020.

We investigate and explore the interplay of credibility and expertise level in text and speech. We collect a unique domain-specific multimodal dataset and analyze a set of acoustic-prosodic and linguistic features in both credible and less credible speech by professionals of varying expertise levels. Our analyses shed light on potential indicators of domain-specific perceived credibility and expertise, as well as the interplay in between. Moreover, we build multimodal and multi-task deep learning models that outperform human performance by 6.2% in credibility and 3.8% in expertise level, building upon state-of-the-art self-supervised pre-trained language models. To our knowledge, this is the first multimodal multi-task study that analyzes and predicts domain-specific credibility and expertise level at the same time.

Multimodal Detection of Crisis Events in Social Media. 2020. Joint work with Mahdi Abavisani, Liwei Wu, Joel Tetreault, Alex Jaimes. In proceedings of Computer Vision and Pattern Recognition (CVPR).

Recent developments in image classification and natural language processing, coupled with the rapid growth in social media usage, have enabled fundamental advances in detecting breaking events around the world in real-time. Emergency response is one such area that stands to gain from these advances. By processing billions of texts and images a minute, events can be automatically detected to enable emergency response workers to better assess rapidly evolving situations and deploy resources accordingly. To date, most event detection techniques in this area have focused on image-only or text-only approaches, limiting detection performance and impacting the quality of information delivered to crisis response teams. In this paper, we present a new multimodal fusion method that leverages both images and texts as input. In particular, we introduce a cross-attention module that can filter uninformative and misleading components from weak modalities on a sample by sample basis. In addition, we employ a multimodal graph-based approach to stochastically transition between embeddings of different multimodal pairs during training to better regularize the learning process as well as dealing with limited training data by constructing new matched pairs from different samples. We show that our method outperforms the unimodal approaches and strong multimodal baselines by a large margin on three crisis-related tasks.

Detecting Concealed Information in Text and Speech. 2019. In proceedings of Association for Computational Linguistics (ACL). [pdf] [poster] [🏆 Best Paper Award Nominee]

Motivated by infamous cheating scandals in various industries and political events, we address the problem of detecting concealed information in technical settings. In this work, we explore acoustic-prosodic and linguistic indicators of information concealment by collecting a unique corpus of professionals practicing for oral exams while concealing information. We reveal subtle signs of concealed information in speech and text, and compare and contrast them with those in the deception detection literature, thus uncovering the link between concealing information and deception. We then present a series of experiments that automatically detect concealed information from text and speech. We compare the use of acoustic-prosodic, linguistic, and individual feature sets, using different machine learning models. Finally, we present a multi-task learning framework with acoustic, linguistic, and individual features that outperforms human performance by over 15%.

Single Stage Prediction with Embedded Topic Modeling of Online Reviews for Mobile App Development and Management. 2018. Joint work with Shawn Mankad and Anandasivam Gopal. Annals of Applied Statistics. [pdf] [ArXiv].

Mobile apps are one of the building blocks of the mobile digital economy. A feature that differentiates mobile apps from traditional enterprise software is online reviews, which are available on app marketplaces and represent a valuable source of consumer feedback on the app. We create a supervised topic modeling approach for app developers to use mobile reviews as useful sources of quality and customer feedback, thereby complementing traditional software testing. The approach is based on a constrained matrix factorization that leverages the relationship between term frequency and a given response variable in addition to co-occurrences between terms to recover topics that are both predictive of consumer sentiment and useful for understanding the underlying textual themes. The factorization can provide guidance on a single app’s performance as well as systematically compare different apps over time for benchmarking of features and consumer sentiment. We apply our approach using a dataset of over 81,000 mobile reviews over several years for two of the most reviewed online travel agent apps from the iOS and Google Play marketplaces.

Understanding Perceptual and Conceptual Fluency at a Large Scale. 2018. In proceedings of European Conference on Computer Vision (ECCV). [pdf]

We create a dataset of 543,758 logo designs spanning 39 industrial categories and 216 countries. We experiment with and compare how different deep convolutional neural network (hereafter, DCNN) architectures, pretraining protocols, and weight initializations perform in predicting design memorability and likability. We propose and provide estimation methods based on training DCNNs to extract and evaluate two independent constructs for designs: perceptual distinctiveness ("perceptual fluency" metrics) and ambiguity in meaning ("conceptual fluency" metrics) of each logo. We provide causal-inference evidence that both constructs significantly affect memory for a logo design, consistent with cognitive elaboration theory. The effect on liking, however, is interactive, consistent with processing fluency (e.g., Lee and Labroo (2004), Landwehr et al. (2011)).

Somm: Into the Models. 2018. In proceedings of Empirical Methods in Natural Language Processing (EMNLP). [pdf]

To what extent would the sommelier profession, or wine steward, be displaced by machine learning algorithms? There are at least three essential skills that make a sommelier: wine theory, blind tasting, and beverage service, as exemplified in the rigorous certification processes of certified sommeliers and above (advanced and master) with the most authoritative body in the industry, the Court of Master Sommeliers (hereafter CMS). We propose and train corresponding machine/deep learning models that match these skills, and compare algorithmic results with real data collected from a large group of certified wine professionals. We find that our machine learning models outperform human sommeliers on most tasks, most notably in deduction as an essential part of blind tasting, where hierarchically supervised Latent Dirichlet Allocation outperforms sommeliers' judgment calls by over 6% in terms of F1-score; in beverage service (wine and food pairing), a modified Siamese neural network based on BiLSTM achieves better results than sommeliers by 2%. This demonstrates, contrary to popular opinion in the industry, that the sommelier profession is at least to some extent automatable, barring economic (Kleinberg et al., 2017) and psychological (Dietvorst, 2015) complications.
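For readers unfamiliar with the F1-score used in the comparison above, here is a minimal sketch of how it is computed for a single class. The labels are made up for illustration and are not data from the paper, and the abstract does not state how scores were averaged across classes, so the binary form here is an assumption.

```python
def f1_score(y_true, y_pred, positive=1):
    """Binary F1: harmonic mean of precision and recall for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels: 1 = "correct grape variety deduced", 0 = otherwise.
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1]
score = f1_score(y_true, y_pred)  # precision 3/4, recall 3/4
```

Because F1 penalizes both false positives and false negatives, it is a stricter yardstick than raw accuracy when classes are imbalanced.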

Never Tell Me The Odds: How Belief Dynamics Shape Audience Experience in Sports. 2017. NLP+CSS, Association for Computational Linguistics, Vancouver, Canada, 2017. [slides]

Why do people watch movies, attend sports events, read news or novels? What are the underlying drivers of their experienced utility while consuming such experiential goods? Using social media sentiments as proxies for audience experience, we explore how belief dynamics manifest in the form of suspense, surprise, prior expectations, and peak-end effects, drawing on theories in behavioral economics and media psychology. While there has been extensive prior work looking into the predictive power of consumer sentiment, its antecedents and drivers in the setting of experiential goods have rarely been studied. We combine sentiment analysis with time series analysis (functional linear models and vector autoregressive models) and provide an integrated framework for experienced utility estimation. We uncover evidence of greater effects of experienced surprise than suspense at the ending, moderated by prior expectations in the setting of sporting events.
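As a rough illustration of the belief-dynamics quantities mentioned above, the sketch below treats surprise as the absolute period-to-period change in a belief (for example, a win probability) and summarizes a sequence with a simple peak-end average. Both definitions are common simplifications assumed here for illustration, not necessarily the operationalization used in this work, and the belief path is invented.

```python
def surprise_path(beliefs):
    """Absolute change in belief between consecutive periods."""
    return [abs(later - earlier) for earlier, later in zip(beliefs, beliefs[1:])]


def peak_end(values):
    """Peak-end summary: average of the most intense moment and the last one."""
    return (max(values) + values[-1]) / 2


# Hypothetical in-game win probabilities for the home team.
beliefs = [0.50, 0.60, 0.40, 0.90, 1.00]
surprises = surprise_path(beliefs)   # one value per belief update
summary = peak_end(surprises)        # weights the biggest swing and the ending
```

Suspense, by contrast, is forward-looking (how much beliefs are expected to move next), so estimating it requires a model of future belief updates rather than the realized path alone.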

Understanding Perceptual and Visual Fluency at a Large Scale. 2017. Mutual Benefits of Cognitive and Computer Vision Workshop at International Conference on Computer Vision (ICCV). [pdf]

Information Design and Audience Experience. 2017. Marketing Science Conference. Los Angeles. CA.

Dynamics of Ideation in Crowdsourcing Platforms. 2013. The Institute for Operations Research and the Management Sciences (INFORMS) Annual Conference. Phoenix, AZ.

Managing Green Supply: Carbon Labeling Implications in Supply Chains. 2012. INFORMS International Conference. Beijing. China.

Selected Work in Progress:

Acoustic-prosodic Indicators of Concealed Information in Text and Speech.

Linguistic Indicators of Concealed Information in Text and Speech.

A Trip Down Memory Lane: Identifying and 3D Reconstructing Memorable Experiences in the Wild.

Playlist Curation and the Consumption of Music

As the streaming music industry gains momentum, music playlisting has become the new battlefield for major streaming service providers. We study the role of playlists in shaping consumer music listening behavior by asking two questions: (1) given a music playlist (a sequence of tracks), how should we specify and estimate a utility function for an individual consumer? (2) how does a consumer create her own music playlists given what she has listened to in the past? We propose a two-stage sequence model based on Long Short-Term Memory recurrent neural networks combined with explicit and implicit preferences for individual tracks from a hazard and a discrete choice model. In the first stage, consumers encode tracks they have listened to into their memory, represented as sequence embeddings augmented by revealed preferences; in the second stage, they create their own personal playlists based on retrieved memory encoded in the first stage. We model and visualize consumers' music memory formation and retrieval processes using a dataset of consumer music streaming and playlisting behavior. We find great consumer heterogeneity and uncover three patterns in playlisting behaviors, given revealed preferences for individual tracks: (1) "narrow bracketing"; (2) "bunching"; (3) "Peter Pan".

Selected Resting Papers:

An Evolutionary Theory of Creativity and Ideation. 2015.

Harbingers of Entrepreneurial Failure: Evidence from Half A Million Business Pitches. 2016.

Decentralized Matching in Supply Chains: An Experimental Investigation. 2014.