Natural Language Processing for the Long Tail
David Bamman, Assistant Professor, UC Berkeley School of Information
Matrix is located on the 8th floor of Barrows Hall, on the UC Berkeley campus, near Telegraph Avenue and Bancroft Way, just up the hill from Sather Gate. There are entrances at both ends of the building, but only one of the elevators on the eastern side goes directly to the 8th floor. Alternatively, you can take the stairs to the 7th floor and walk up one more flight from there.
Over the past few years, natural language processing (NLP) has become an increasingly important element in computational research in the humanities and social sciences, enabling sophisticated analyses that go far beyond simple word counting. However, there is a substantial gap between the quality of the NLP used by researchers in the humanities and the state of the art. NLP research has overwhelmingly focused not only on one language (English) but also on one domain (newswire), leaving many other languages, dialects, and domains (such as literary text) underserved.
In this talk, David Bamman, Assistant Professor in the UC Berkeley School of Information, will advocate for two changes that he thinks are necessary to drive the next generation of textual work in the computational humanities. First, he will argue for the importance of structured linguistic representations in computational models of text, surveying several recent projects that have leveraged that structure to good effect. Second, he will advocate for the development of high-quality NLP for the long tail of languages, dialects, and domains that humanists study, an area in which humanists are in the best position to take the reins and make progress.
By combining standard machine learning techniques with disciplinary expertise that only humanists can provide, we can both dramatically expand the range of texts in our cultural record to which NLP can be applied and use the linguistic structure we infer to help define new tasks altogether.
Hosted in conjunction with the Digital Humanities at Berkeley Summer Institute, this event is open to the public. Food and drink will be served. RSVP through Eventbrite.