A practical introduction to topic modelling
Christian Göbel
How can one quickly make sense of a body of text that is too large for a single person to read? This workshop provides a practical introduction to topic modelling, a form of text mining that uncovers hidden semantic structures ("topics") in a corpus of documents. Topic modelling is a form of unsupervised machine learning suited to showing how prominent certain topics are in a corpus, how they are connected, and how they develop over time. With some caution, researchers can also use topic modelling algorithms to classify documents and thereby make them amenable to statistical analysis.
The workshop consists of four parts. First, participants will receive a brief introduction to the use cases of topic modelling, the most commonly used algorithms, and their strengths and weaknesses. Second, participants will learn how to preprocess text for analysis (remove stop words, lemmatise words, segment Chinese-language documents), select the hyperparameters of Latent Dirichlet Allocation (LDA) models, and decide on an appropriate number of topics. Third, we will use the fitted model to classify documents, inspect a random sample of classified documents, and discuss the accuracy of the classification.
Finally, participants will learn how to visualise the prevalence, development and connection of topics as a bar chart, line diagram and correlation plot, respectively.
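To give a feel for the preprocessing, model fitting and classification steps described above, here is a minimal sketch in plain Python. It is not the workshop's own material: the tooling used in the workshop is not stated here, so the function names (`preprocess`, `fit_lda`, `classify`), the toy stop word list, the hyperparameter values (`alpha`, `beta`, number of topics) and the example sentences are all illustrative assumptions. Lemmatisation and Chinese word segmentation are left out, and the model is fitted with a basic collapsed Gibbs sampler rather than an optimised library routine.

```python
import random
from collections import Counter

# Toy stop word list (an assumption; real projects use much longer lists).
STOP_WORDS = {"the", "a", "of", "and", "in", "to", "is"}


def preprocess(text):
    """Lowercase, split on whitespace, strip punctuation, drop stop words."""
    tokens = (tok.strip(".,;:!?\"'()") for tok in text.lower().split())
    return [tok for tok in tokens if tok and tok not in STOP_WORDS]


def fit_lda(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling.

    alpha and beta are the Dirichlet hyperparameters for the document-topic
    and topic-word distributions. Returns one topic distribution per document.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]        # topic counts per document
    topic_word = [Counter() for _ in range(num_topics)]  # word counts per topic
    topic_total = [0] * num_topics                       # token counts per topic
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(num_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(z_doc)

    # Repeatedly resample each token's topic given all other assignments.
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + beta * vocab_size)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1

    # Smoothed estimate of each document's topic proportions.
    theta = []
    for d, doc in enumerate(docs):
        denom = len(doc) + alpha * num_topics
        theta.append([(doc_topic[d][k] + alpha) / denom for k in range(num_topics)])
    return theta


def classify(theta):
    """Assign each document to its most probable topic."""
    return [max(range(len(row)), key=row.__getitem__) for row in theta]


raw_docs = [
    "The court ruled on the appeal.",
    "The judge and the court heard the appeal.",
    "Rain and wind in the forecast.",
    "The forecast is rain.",
]
docs = [preprocess(text) for text in raw_docs]
theta = fit_lda(docs, num_topics=2)
labels = classify(theta)
```

Each row of `theta` is a document's estimated topic mix, and `labels` holds the single most probable topic per document. Assigning a document to one topic discards the rest of its mix, which is one reason a random sample of classified documents should be read by hand before the labels are used in statistical analysis.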
Where does the event happen?
Staatsbibliothek zu Berlin
Potsdamer Straße 33
Simon-Bolivar-Saal
10785 Berlin
When does the event happen?
Begin:
End: