Text and data mining with the Chinese Text Project
Donald Sturgeon
This hands-on workshop introduces participants to complete text and data mining workflows, from digital transcription and annotation of premodern works through to extraction of data derived from their contents, using materials and tools from the Chinese Text Project (https://ctext.org). It consists of four parts:
1. Using the Chinese Text Project: how to use this crowdsourced editing platform to create and obtain accurate, linked digital transcriptions of premodern Chinese texts.
2. Interactive text mining: extracting and visualizing statistical properties and relationships from transcribed texts. Types of analysis include pattern matching of words and phrases, identification of text reuse, and patterns of vocabulary usage; visualizations include summarization via networks, charts, and textual heatmaps.
3. Annotation and knowledge extraction: annotating, disambiguating, and linking references to entities (such as names of people, places, and eras) in a premodern text to authority databases; extracting knowledge claims about these entities (such as dates of birth, death, or appointment to a particular bureaucratic office); and contributing them to a crowdsourced knowledge base.
4. Interactive data mining: extracting and visualizing data from annotated texts and extracted knowledge claims. This includes simple querying of the knowledge base for particular types of information through the online interface, as well as the basics of the widely used SPARQL query language.
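As a rough illustration of what identifying text reuse (part 2) can involve, the Python sketch below finds character sequences shared by two short passages. It is a simplified stand-in for illustration only, not the method used by the Chinese Text Project's tools; the function names and the n-gram length are assumptions made for this example, and no programming is needed to take part in the workshop.

```python
def char_ngrams(text, n):
    """Return the set of all n-character substrings of text."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def shared_ngrams(a, b, n=4):
    """Return the n-grams that occur in both a and b, sorted."""
    return sorted(char_ngrams(a, n) & char_ngrams(b, n))


# Two short illustrative strings, punctuation removed (hypothetical input).
passage_a = "學而時習之不亦說乎"
passage_b = "子曰學而時習之"

# Print every 4-character sequence the two passages have in common.
print(shared_ngrams(passage_a, passage_b, n=4))
```

Longer shared sequences are stronger evidence of reuse than shorter ones, so the choice of `n` trades sensitivity against false matches.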
This workshop does not assume any prior background in digital methods, and requires only a web browser (recommended: Google Chrome or Firefox). Participants are encouraged to create a free account on ctext.org prior to the workshop: https://ctext.org/account.pl
Further information:
• https://ctext.org/dh
• Crowdsourcing the Historical Record: Creating Linked Open Data for Chinese History at Scale, International Journal of Humanities and Arts Computing 16:1, 2022.
• Chinese Text Project: a dynamic digital library of premodern Chinese, Digital Scholarship in the Humanities, 2019.
• Digital Approaches to Text Reuse in the Early Chinese Corpus, Journal of Chinese Literature and Culture 5:2, 2018.
Where does the event happen?
Staatsbibliothek zu Berlin
Potsdamer Straße 33
Hörsaal 320
10785 Berlin