Dates  Topics  Assignment  

1  April 10  Introduction to this lecture. Tagging with HMM. slides
 * Install Python on your laptop computer.
 * Learn the basics of Python if you are a novice.
 * Read the note on HMMs by Michael Collins.
 * Implement your HMM-based POS tagger.
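For reference, a minimal sketch of Viterbi decoding for a bigram HMM tagger (not a complete solution to the assignment; the tag set and probabilities below are toy values made up for illustration, and a real tagger should work in log space):

```python
from collections import defaultdict

def viterbi(words, tags, trans, emit):
    """Bigram-HMM Viterbi decoding.

    trans[(t_prev, t)] and emit[(t, w)] hold probabilities;
    plain probabilities are used here for clarity.
    """
    n = len(words)
    best = [{} for _ in range(n)]  # best[i][t]: best score of a path ending in t
    back = [{} for _ in range(n)]  # back[i][t]: previous tag on that path
    for t in tags:
        best[0][t] = trans[("<s>", t)] * emit[(t, words[0])]
    for i in range(1, n):
        for t in tags:
            score, prev = max(
                (best[i - 1][p] * trans[(p, t)] * emit[(t, words[i])], p)
                for p in tags
            )
            best[i][t] = score
            back[i][t] = prev
    # follow the back-pointers from the best final tag
    last = max(tags, key=lambda t: best[n - 1][t])
    path = [last]
    for i in range(n - 1, 0, -1):
        path.append(back[i][path[-1]])
    return list(reversed(path))

# Toy parameters (hypothetical two-tag language: D = determiner, N = noun).
tags = ["D", "N"]
trans = defaultdict(float, {("<s>", "D"): 0.9, ("<s>", "N"): 0.1,
                            ("D", "N"): 0.9, ("D", "D"): 0.1,
                            ("N", "N"): 0.5, ("N", "D"): 0.5})
emit = defaultdict(float, {("D", "the"): 0.9, ("D", "dog"): 0.01,
                           ("N", "dog"): 0.5, ("N", "the"): 0.01})
print(viterbi(["the", "dog"], tags, trans, emit))  # → ['D', 'N']
```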
2  April 17  Text classification with naive Bayes classifiers. slides
 * Read "A Comparison of Event Models for Naive Bayes Text Classification" by McCallum and Nigam.
 * Install MeCab, and try to find sentences that MeCab cannot analyze correctly.
3  April 24  The method of Lagrange multipliers. Maximum likelihood estimation. Maximum a posteriori estimation. slides
 * Read Sections 1 and 2 of the following tutorial on Lagrange multipliers: tutorial by Dan Klein. Try to give an intuitive explanation of this method when the solution space is 3-dimensional.
 * Implement a naive Bayes classifier. Train it on this file, and test it on this file. Each line of these files consists of a class label (+1 or -1) and a segmented sentence.
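A minimal sketch of a multinomial naive Bayes classifier with add-one (Laplace) smoothing, trained on lines in the same label-plus-segmented-sentence format as the assignment files (the toy training lines below are made up; the real files should be read from disk):

```python
import math
from collections import defaultdict

class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def train(self, examples):
        # examples: list of (label, [words])
        self.class_count = defaultdict(int)
        self.word_count = defaultdict(lambda: defaultdict(int))
        self.vocab = set()
        for label, words in examples:
            self.class_count[label] += 1
            for w in words:
                self.word_count[label][w] += 1
                self.vocab.add(w)

    def predict(self, words):
        total = sum(self.class_count.values())
        best, best_score = None, -math.inf
        for c, n_c in self.class_count.items():
            # log prior + sum of smoothed log likelihoods
            score = math.log(n_c / total)
            denom = sum(self.word_count[c].values()) + len(self.vocab)
            for w in words:
                score += math.log((self.word_count[c][w] + 1) / denom)
            if score > best_score:
                best, best_score = c, score
        return best

# Lines in the assignment format: a label ("+1" or "-1"),
# then a segmented sentence (toy data for illustration).
train_lines = ["+1 good movie", "+1 great fun movie",
               "-1 bad movie", "-1 boring bad film"]
nb = NaiveBayes()
nb.train([(l.split()[0], l.split()[1:]) for l in train_lines])
print(nb.predict("great movie".split()))  # → '+1'
```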
4  May 1  Maximum likelihood estimation. Maximum a posteriori estimation. Bag-of-words representation of documents. SVM. slides
 * Derive the MAP estimate of the multinomial model of naive Bayes classifiers.
 * Derive the dual problem of the optimization problem of soft-margin SVM.
 * Use an SVM tool (e.g., TinySVM) to train a model on this file, and test it on this file. You need to write a script that converts those files into the input format of the tool.
 * Read Section 2.1 of this tutorial.
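A sketch of the kind of conversion script the assignment asks for, assuming the tool accepts the sparse `label id:value ...` format used by SVM-light and TinySVM (the binary bag-of-words feature scheme here is one arbitrary choice):

```python
def to_svm_format(lines, feature_ids=None):
    """Convert '+1 w1 w2 ...' lines into the sparse
    'label id:value ...' format used by SVM-light/TinySVM.

    feature_ids maps words to integer ids; it is extended for
    unseen words so the same mapping can be reused for the
    test file.
    """
    if feature_ids is None:
        feature_ids = {}
    out = []
    for line in lines:
        label, *words = line.split()
        ids = set()
        for w in words:
            if w not in feature_ids:
                feature_ids[w] = len(feature_ids) + 1  # ids start at 1
            ids.add(feature_ids[w])
        # binary bag-of-words features, ids in increasing order
        out.append(label + " " + " ".join(f"{i}:1" for i in sorted(ids)))
    return out, feature_ids

train, fmap = to_svm_format(["+1 good movie", "-1 bad movie"])
print(train[0])  # → '+1 1:1 2:1'
```

When converting the test file, pass the `fmap` built from the training file so feature ids stay consistent.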
  May 8  NO LECTURE  
5  May 15  Named-entity extraction. Dependency analysis. slides
 * Read Sections 3 and 5.1 of the "CaboCha" paper, "Japanese Dependency Analysis using Cascaded Chunking" (CoNLL 2002), and answer the following questions:
   * Which static features are used?
   * Which dynamic features are used?
   * Are dynamic features effective? If so, in what situations?
   * Which kernel function is used?
   * What benefit does the use of that kernel function have? (Not written in the paper; think for yourself.)
6  May 22  Log-linear models. Conditional random fields (CRF). slides
 * Read Sections 1, 2, and 3 of the tutorial on CRFs.
 * Read Section 6.3 of the book (in Japanese) to review CRFs, and try to understand the forward-backward algorithm.
7  May 29  Forward-backward algorithm. Text summarization. slides
 * Read the following paper and learn how the weights on words are calculated in their work: Yih et al., 2007.
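A minimal sketch of the forward-backward algorithm on a linear chain, computing the posterior marginal over states at each position (plain probabilities for clarity; a real implementation should rescale or work in log space to avoid underflow, and the numbers below are toy values):

```python
def forward_backward(init, trans, emits):
    """Posterior state marginals for a linear chain.

    init[s]: initial probability of state s; trans[s][t]:
    transition probability; emits[i][s]: score of the
    observation at position i under state s.
    """
    n, S = len(emits), len(init)
    alpha = [[0.0] * S for _ in range(n)]  # forward scores
    beta = [[0.0] * S for _ in range(n)]   # backward scores
    for s in range(S):
        alpha[0][s] = init[s] * emits[0][s]
    for i in range(1, n):
        for t in range(S):
            alpha[i][t] = emits[i][t] * sum(
                alpha[i - 1][s] * trans[s][t] for s in range(S))
    for s in range(S):
        beta[n - 1][s] = 1.0
    for i in range(n - 2, -1, -1):
        for s in range(S):
            beta[i][s] = sum(
                trans[s][t] * emits[i + 1][t] * beta[i + 1][t]
                for t in range(S))
    Z = sum(alpha[n - 1][s] for s in range(S))  # partition function
    return [[alpha[i][s] * beta[i][s] / Z for s in range(S)]
            for i in range(n)]

# Two states, three observations (toy numbers).
marg = forward_backward(
    init=[0.5, 0.5],
    trans=[[0.7, 0.3], [0.3, 0.7]],
    emits=[[0.9, 0.2], [0.8, 0.1], [0.1, 0.9]],
)
print([round(p, 3) for p in marg[0]])
```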
8  June 5  Text summarization. slides
 * Take a rest.
9  June 12  k-means clustering, EM, PLSI. slides
 * Derive the update equations for the product model.
 * Answer the following questions with reference to Hofmann's paper:
   * How are "documents" integrated into the model?
   * What is tempered EM? What is the update equation for PLSI when tempered EM is used?
   * What is folding-in? What kind of calculation is needed for folding-in?
 * Implement PLSI, train it on this file, and calculate the perplexity of this file.
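A compact sketch of PLSI trained with EM, plus a perplexity function (the documents below are toy data; note that computing the perplexity of a held-out file additionally requires folding-in, which is exactly what Hofmann's paper describes):

```python
import math
import random

def normalize(v):
    s = sum(v)
    return [x / s for x in v] if s > 0 else [1.0 / len(v)] * len(v)

def train_plsi(docs, K, iters=50, seed=0):
    """EM training for PLSI: p(w|d) = sum_z p(z|d) p(w|z)."""
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    widx = {w: i for i, w in enumerate(vocab)}
    D, V = len(docs), len(vocab)
    p_z_d = [normalize([rng.random() + 0.1 for _ in range(K)]) for _ in range(D)]
    p_w_z = [normalize([rng.random() + 0.1 for _ in range(V)]) for _ in range(K)]
    for _ in range(iters):
        nd = [[0.0] * K for _ in range(D)]  # expected counts n(d, z)
        nw = [[0.0] * V for _ in range(K)]  # expected counts n(z, w)
        for d, doc in enumerate(docs):
            for w in doc:
                i = widx[w]
                # E-step: posterior p(z|d,w) ∝ p(z|d) p(w|z)
                post = normalize([p_z_d[d][z] * p_w_z[z][i] for z in range(K)])
                for z in range(K):
                    nd[d][z] += post[z]
                    nw[z][i] += post[z]
        # M-step: re-normalize the expected counts
        p_z_d = [normalize(row) for row in nd]
        p_w_z = [normalize(row) for row in nw]
    return p_z_d, p_w_z, widx

def perplexity(docs, p_z_d, p_w_z, widx):
    """exp of the negative average log-likelihood per token."""
    ll, n = 0.0, 0
    for d, doc in enumerate(docs):
        for w in doc:
            p = sum(p_z_d[d][z] * p_w_z[z][widx[w]] for z in range(len(p_w_z)))
            ll += math.log(p)
            n += 1
    return math.exp(-ll / n)

docs = [["apple", "apple", "banana"], ["dog", "dog", "cat"],
        ["apple", "banana", "banana"], ["cat", "dog", "cat"]]
p_z_d, p_w_z, widx = train_plsi(docs, K=2)
print(round(perplexity(docs, p_z_d, p_w_z, widx), 2))
```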
10  June 19  LDA slides
 * Implement Gibbs sampling for LDA, and train it on this file. Each line of this file corresponds to a document, which is represented as the set of nouns, verbs, adverbs, and adjectives that appear in the document.
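A minimal sketch of collapsed Gibbs sampling for LDA (the hyperparameters and documents below are arbitrary toy choices; the real training file should be read one document per line):

```python
import random

def lda_gibbs(docs, K, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA.

    Resamples the topic z of every token from
    p(z=k | rest) ∝ (n_dk + alpha) * (n_kw + beta) / (n_k + V*beta).
    """
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    widx = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    n_dk = [[0] * K for _ in docs]       # topic counts per document
    n_kw = [[0] * V for _ in range(K)]   # word counts per topic
    n_k = [0] * K                        # total tokens per topic
    z = []                               # topic assignment of each token
    for d, doc in enumerate(docs):       # random initialization
        zd = []
        for w in doc:
            k = rng.randrange(K)
            zd.append(k)
            n_dk[d][k] += 1; n_kw[k][widx[w]] += 1; n_k[k] += 1
        z.append(zd)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for j, w in enumerate(doc):
                k, i = z[d][j], widx[w]
                # remove the token's current assignment from the counts
                n_dk[d][k] -= 1; n_kw[k][i] -= 1; n_k[k] -= 1
                # conditional distribution over topics, then sample
                p = [(n_dk[d][t] + alpha) * (n_kw[t][i] + beta) / (n_k[t] + V * beta)
                     for t in range(K)]
                r = rng.random() * sum(p)
                k = 0
                while r > p[k]:
                    r -= p[k]
                    k += 1
                z[d][j] = k
                n_dk[d][k] += 1; n_kw[k][i] += 1; n_k[k] += 1
    return z, n_dk, n_kw

docs = [["apple", "banana", "apple"], ["dog", "cat", "dog"],
        ["banana", "apple", "banana"], ["cat", "dog", "cat"]]
z, n_dk, n_kw = lda_gibbs(docs, K=2)
print(n_dk)
```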
  June 26  NO LECTURE  
11  July 3  Check LDA code. slides
 * No assignment. But see the slides for details on the report submission (GRADING 1).
12  July 10  Derivation of the update equations for LDA's Gibbs sampling. Sentiment analysis. slides, survey by Kaji-san
 * Watch this video (10 minutes).
 * GRADING 2: Read the submission, write the review form, and send it to me by July 23rd.
13  July 17  Linguistic resources. Conference presentations. slides

14  July 24  NO LECTURE 