Quantitative Analysis on Licensed Materials: CrossAsia Ngram Service & LoGaRT
Hou Ieong Brent HO
N-grams are an essential tool in linguistic research and digital humanities, enabling the analysis of language patterns, textual variations, terminology shifts, and more. However, obtaining meaningful N-gram data from commercial databases presents significant challenges, including technical difficulties and access restrictions due to licensing rights. CrossAsia's on-demand N-gram service aims to overcome these obstacles by providing researchers with accessible, high-quality N-gram data tailored to their specific needs. By offering a flexible and customizable solution, CrossAsia empowers researchers to make substantial advancements in their respective fields while adhering to licensing regulations.
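For readers new to the term, an N-gram is a contiguous sequence of N units (characters or words) taken from a text. The following is a minimal sketch of counting character N-grams in plain Python. The function name `char_ngrams` and the sample string are illustrative assumptions, and the sketch does not reflect the data formats or tooling of the CrossAsia service.

```python
from collections import Counter


def char_ngrams(text, n):
    """Count contiguous character n-grams in `text` (hypothetical helper)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # Slide a window of width n across the text, one character at a time.
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


# Illustrative sample string only; not taken from any licensed database.
sample = "地方志地方"
bigrams = char_ngrams(sample, 2)

# Print the bigrams from most to least frequent.
for gram, count in bigrams.most_common():
    print(gram, count)
```

Character-level windows are used here because Chinese text is not whitespace-delimited; for languages with word boundaries, the same sliding window can be applied to a list of tokens instead.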
This workshop introduces participants to CrossAsia’s N-gram service and demonstrates how to use it in linguistic research and digital humanities projects. The workshop is structured in three main parts:
1.  Introduction to CrossAsia N-gram Service and its Data Formats (15 mins)
 We will begin with an overview of CrossAsia's N-gram service, explaining how the data is collected, how to access it, and which data formats it comes in. This session provides the foundational understanding necessary for the hands-on activities that follow.
2.  Hands-on Session with Dataset Sample in Google Colab (45 mins)
 Participants will take part in a hands-on session using Google Colab. Working with sample datasets, attendees will learn how to access and manipulate N-gram data, building confidence in using cloud-based tools for textual analysis.
3.  Introduction to Orange Data Mining Tool and Hands-on Session (45 mins)
 The workshop will briefly introduce Orange, a local data mining tool written in Python. Participants will work with a sample dataset, covering basic textual analysis and visualization techniques using Orange.
Prerequisites:
Participants are required to have a Google Colab account (https://colab.google/) and to download and install Orange (https://orangedatamining.com/) on their local machines prior to the workshop.
Shih-Pei Chen
This 90-minute workshop will introduce the Local Gazetteers Research Tools (LoGaRT), which currently hosts 4,410 titles of full-text digitized Chinese local gazetteers (difangzhi 地方志) published during late imperial China and the Republican era. The 4,410 titles come from two sources: 4,000 are licensed materials from Zhongguo Fangzhi Ku (Beijing Erudition), and the license covers only CrossAsia users. The other 410 titles were digitized by MPIWG from Harvard Yenching Library’s rare book collection and are open access. Users who sign up for a LoGaRT account from outside of MPIWG can immediately see the 410 open access titles. In the past year, we have also made the metadata of the Zhongguo Fangzhi Ku available to general users, which includes the book metadata and the section headings of the entire collection within LoGaRT.
In this workshop, in addition to showing the basic functions of LoGaRT, I will show how a general user can already make use of the metadata within LoGaRT to observe general patterns in this large collection. I will showcase how running a section search in this collection can help us understand the knowledge structure encompassed by this genre and how it changed over time.
Prerequisites: 
Participants can sign up for a LoGaRT account prior to the workshop at this page: https://logart.mpiwg-berlin.mpg.de/LGServices2/#/signin. Watching the recorded online tutorials before the workshop is optional.
Online tutorials:
•   https://content.mpiwg-berlin.mpg.de/mpiwg/online/permanent/Media_Online/Video/2020/2020-06-19_TALK_HarvardYenchinLoGaRT/2020-06-19_TALK_HarvardYenchinLoGaRT_SChen.mp4 
•   https://www.youtube.com/playlist?list=PLhOCf20UlVNtwaQwegrBfzOxSgWit4UZY
Staatsbibliothek zu Berlin
Potsdamer Straße 33
Dietrich-Bonhoeffer-Saal
10785 Berlin