We study and model the process by which humans summarize creative documents (e.g., from a movie script to a synopsis). We develop a customized topic model based on Poisson Factorization and inspired by the creativity literature, which links the text in a summary to the text in the original document. Traditional Poisson Factorization approximates documents as positive combinations of topics, i.e., as points in the cone defined by a set of topics (in the Euclidean space defined by the words in the vocabulary). The model proposed here captures not only this "inside the cone" portion of a document, but also the "outside the cone" portion that is not explained by a combination of common topics. The model captures how these two types of content are weighed in summaries as compared to full documents. In addition, it captures writing norms that influence the extent to which each topic appears in summaries compared to full documents. We apply this model to a dataset of marketing academic papers and their abstracts, and to a dataset of movie scripts and their synopses. We illustrate a practical application of our research by creating a public, online interactive tool meant to serve as a "sounding board" for users interested in writing summaries of creative documents.
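The "inside the cone" and "outside the cone" geometry described above can be illustrated with a small sketch. This is not the paper's model or inference procedure: it swaps the Poisson likelihood for a plain Euclidean projection (non-negative least squares by coordinate descent), and the function name `project_onto_cone`, the toy topics, and the toy word counts are all hypothetical, chosen only to show how a document splits into a part explained by a positive combination of topics and a leftover part.

```python
def project_onto_cone(doc, topics, sweeps=200):
    """Split a word-count vector into a part inside the topic cone and a residual.

    Assumption: squared Euclidean distance and cyclic coordinate descent are
    used here for illustration; the paper itself uses Poisson Factorization.
    """
    weights = [0.0] * len(topics)
    residual = [float(x) for x in doc]  # doc minus the current reconstruction
    for _ in range(sweeps):
        for k, topic in enumerate(topics):
            norm_sq = sum(t * t for t in topic)
            if norm_sq == 0.0:
                continue  # an all-zero topic cannot explain anything
            # Best weight for topic k with the others held fixed, clipped at zero
            # so the combination stays positive (i.e., inside the cone).
            step = sum(r * t for r, t in zip(residual, topic)) / norm_sq
            new_weight = max(0.0, weights[k] + step)
            delta = new_weight - weights[k]
            residual = [r - delta * t for r, t in zip(residual, topic)]
            weights[k] = new_weight
    inside = [d - r for d, r in zip(doc, residual)]
    return weights, inside, residual


# Hypothetical vocabulary of four words and two common topics.
topics = [
    [1.0, 1.0, 0.0, 0.0],  # topic A uses words 0 and 1
    [0.0, 0.0, 1.0, 0.0],  # topic B uses word 2
]
doc = [2, 2, 3, 5]  # word 3 is not covered by any common topic

weights, inside, outside = project_onto_cone(doc, topics)
print("topic weights:", weights)
print("inside the cone:", inside)
print("outside the cone:", outside)
```

In this toy case the counts for the first three words are fully explained by the two topics, while the count for the fourth word ends up entirely in the residual, which plays the role of the "outside the cone" content.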
Toubia, Olivier. "A Poisson Factorization Topic Model for the Study of Creative Documents (and Their Summaries)." Journal of Marketing Research (forthcoming).