Computational Linguistics

2014, Thursdays, periods 3-4 (10:45-12:15)
Lecture room: G311

Schedule

No. Date Topics Assignments
1 April 10 Introduction to this lecture.
Tagging with HMMs. slides, slides
- *install Python on your laptop.
- *learn the basics of Python if you are a novice.
- read the note on HMMs by Michael Collins.
- implement your own HMM-based POS tagger (a minimal Viterbi sketch follows below).
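For the tagger assignment, here is a minimal sketch of Viterbi decoding for an HMM, assuming the transition and emission probabilities have already been estimated from a tagged corpus; the dictionary layout, the "<s>" start symbol, and the 1e-12 floor for unseen events are illustrative choices, not part of the assignment.

    import math

    def viterbi(words, tags, trans, emit):
        """Return the most probable tag sequence for `words` under an HMM.

        trans[(prev_tag, tag)] and emit[(tag, word)] are probabilities
        estimated beforehand (e.g., by maximum likelihood with smoothing).
        Log probabilities are used to avoid underflow; unseen events get
        a small floor probability.
        """
        def lp(table, key):
            return math.log(table.get(key, 1e-12))
        best = [{} for _ in words]   # best[i][t]: best log prob ending in tag t
        back = [{} for _ in words]   # back[i][t]: previous tag on that best path
        for t in tags:
            best[0][t] = lp(trans, ("<s>", t)) + lp(emit, (t, words[0]))
        for i in range(1, len(words)):
            for t in tags:
                p = max(tags, key=lambda s: best[i-1][s] + lp(trans, (s, t)))
                best[i][t] = best[i-1][p] + lp(trans, (p, t)) + lp(emit, (t, words[i]))
                back[i][t] = p
        # trace the best path backwards from the best final tag
        path = [max(best[-1], key=best[-1].get)]
        for i in range(len(words) - 1, 0, -1):
            path.append(back[i][path[-1]])
        return list(reversed(path))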
2 April 17 Text classification with naive Bayes classifiers. slides - read "A Comparison of Event Models for Naive Bayes Text Classification" by McCallum and Nigam.
- install MeCab, and try to find sentences that MeCab cannot analyze correctly.
3 April 24 The method of Lagrange multipliers.
Maximum likelihood estimation.
Maximum a posteriori estimation. slides
- read Sections 1 and 2 of the tutorial on Lagrange multipliers by Dan Klein.
Try to give an intuitive explanation of this method when the solution space is 3-dimensional.
- implement a naive Bayes classifier (a minimal sketch follows below). Train it on this file, and test it on this file.
Each line of these files consists of a class (+1 or -1) and a segmented sentence.
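A minimal sketch of a multinomial naive Bayes classifier for files in the format described above (one line = a class, +1 or -1, followed by a segmented sentence). The add-one smoothing used here is one common choice and corresponds to the MAP estimate under a uniform Dirichlet prior, which the next lecture covers; the file names "train.txt" and "test.txt" are placeholders for the linked files.

    import math
    from collections import Counter, defaultdict

    def train(path):
        """Each line: a class label (+1 or -1) followed by a segmented sentence."""
        doc_count = Counter()                # class -> number of documents
        word_count = defaultdict(Counter)    # class -> word -> frequency
        vocab = set()
        with open(path, encoding="utf-8") as f:
            for line in f:
                label, *words = line.split()
                doc_count[label] += 1
                word_count[label].update(words)
                vocab.update(words)
        return doc_count, word_count, vocab

    def classify(words, doc_count, word_count, vocab):
        n_docs = sum(doc_count.values())
        scores = {}
        for label in doc_count:
            # log prior + log likelihood with add-one smoothing
            # (the MAP estimate under a uniform Dirichlet prior)
            total = sum(word_count[label].values())
            scores[label] = math.log(doc_count[label] / n_docs)
            for w in words:
                scores[label] += math.log(
                    (word_count[label][w] + 1) / (total + len(vocab)))
        return max(scores, key=scores.get)

    model = train("train.txt")
    correct = total = 0
    with open("test.txt", encoding="utf-8") as f:
        for line in f:
            label, *words = line.split()
            correct += classify(words, *model) == label
            total += 1
    print(f"accuracy: {correct / total:.3f}")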
4 May 1 Maximum likelihood estimation.
Maximum a posteriori estimation.
Bag-of-words representation of documents.
SVM.
slides
- derive the MAP estimate of the multinomial model of naive Bayes classifiers.
- derive the dual problem of the soft-margin SVM optimization problem.
- use an SVM tool (e.g., TinySVM) to train a model on this file, and test it on this file. You need to write a script that converts those files into the input format of the tool (a possible conversion script is sketched below).
- read Section 2.1 of this tutorial.
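One way to write the conversion script: map each word to a feature index and emit binary bag-of-words vectors in the SVM-light-style format that TinySVM reads (a label followed by index:value pairs, with indices in increasing order). The file names are placeholders for the linked files, and you should check the exact format against the tool's documentation.

    vocab = {}  # word -> feature index (indices start at 1)

    def to_svm_line(line):
        label, *words = line.split()
        ids = {vocab.setdefault(w, len(vocab) + 1) for w in words}
        return label + " " + " ".join(f"{i}:1" for i in sorted(ids))

    for src, dst in [("train.txt", "train.svm"), ("test.txt", "test.svm")]:
        with open(src, encoding="utf-8") as fin, \
             open(dst, "w", encoding="utf-8") as fout:
            for line in fin:
                if line.strip():
                    fout.write(to_svm_line(line) + "\n")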
-- May 8 NO LECTURE
5 May 15 Named-Entity Extraction
Dependency Analysis
slides
- read Sections 3 and 5.1 of the "CaboCha" paper, "Japanese Dependency Analysis using Cascaded Chunking" (CoNLL 2002),
and answer the following questions:
-- which static features are used?
-- which dynamic features are used?
-- are dynamic features effective? If so, in what situation?
-- which kernel function is used?
-- what benefit does the use of the kernel function above have? (This is not written in the paper; think about it yourself.)
6 May 22 Log-linear Model
Conditional Random Fields (CRF)
slides
- read Sections 1, 2, and 3 of the tutorial on CRFs.
- read Section 6.3 of the book (in Japanese) to review CRFs, and try to understand the forward-backward algorithm (a minimal sketch follows below).
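A minimal sketch of the forward-backward algorithm on a linear chain, assuming the position-wise potentials (for a CRF, the exponentiated feature scores) are given as a table; all names here are illustrative. In practice the values should be rescaled at each position or kept in log space to avoid underflow.

    def forward_backward(psi, n_states, n_pos):
        """Forward-backward on a linear chain.

        psi[i][p][s] is the (unnormalized) potential of moving from state p
        at position i-1 to state s at position i; psi[0][0][s] holds the
        initial potentials. Returns alpha, beta, and the partition value Z.
        """
        alpha = [[0.0] * n_states for _ in range(n_pos)]
        beta = [[0.0] * n_states for _ in range(n_pos)]
        for s in range(n_states):
            alpha[0][s] = psi[0][0][s]
        for i in range(1, n_pos):                      # forward pass
            for s in range(n_states):
                alpha[i][s] = sum(alpha[i-1][p] * psi[i][p][s]
                                  for p in range(n_states))
        for s in range(n_states):
            beta[n_pos-1][s] = 1.0
        for i in range(n_pos - 2, -1, -1):             # backward pass
            for s in range(n_states):
                beta[i][s] = sum(psi[i+1][s][q] * beta[i+1][q]
                                 for q in range(n_states))
        Z = sum(alpha[n_pos-1][s] for s in range(n_states))
        # the marginal P(state s at position i) is alpha[i][s] * beta[i][s] / Z
        return alpha, beta, Z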
7 May 29 Forward-backward algorithm
Text summarization
slides
- read the following paper and learn how the weights on words are calculated:
Yih et al., 2007
8 June 5 Text summarization. slides - take a rest.
9 June 12 k-means clustering, EM, PLSI slides - derive the update equations for the product model.
- answer the following questions with reference to Hofmann's paper:
  * how is "document" integrated into the model?
  * what is tempered EM? What is the update equation for PLSI when tempered EM is used?
  * what is folding-in? What kind of calculation is needed for folding-in?
- implement PLSI (a minimal EM sketch follows below), train it on this file, and calculate the perplexity of this file.
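A minimal sketch of EM for PLSI, assuming the documents have been turned into a document-word count matrix; the use of NumPy and the random initialization are my own choices, not prescribed by the assignment. Perplexity on held-out data is the exponential of the negative average log likelihood per token, and computing it for the test file requires folding-in to re-estimate P(d|z) while keeping P(w|z) fixed.

    import numpy as np

    def plsi_em(counts, n_topics, n_iter=100, seed=0):
        """EM for PLSI: P(d, w) = sum_z P(z) P(d|z) P(w|z).

        counts is an (n_docs, n_words) array of word frequencies n(d, w).
        Returns P(z), P(d|z), P(w|z).
        """
        rng = np.random.default_rng(seed)
        n_docs, n_words = counts.shape
        p_z = np.full(n_topics, 1.0 / n_topics)
        p_d_z = rng.random((n_topics, n_docs))
        p_d_z /= p_d_z.sum(axis=1, keepdims=True)
        p_w_z = rng.random((n_topics, n_words))
        p_w_z /= p_w_z.sum(axis=1, keepdims=True)
        for _ in range(n_iter):
            # E-step: P(z|d,w) is proportional to P(z) P(d|z) P(w|z)
            post = p_z[:, None, None] * p_d_z[:, :, None] * p_w_z[:, None, :]
            post /= post.sum(axis=0, keepdims=True)
            # M-step: re-estimate from the expected counts n(d,w) P(z|d,w)
            expected = counts[None, :, :] * post
            p_w_z = expected.sum(axis=1)
            p_w_z /= p_w_z.sum(axis=1, keepdims=True)
            p_d_z = expected.sum(axis=2)
            p_d_z /= p_d_z.sum(axis=1, keepdims=True)
            p_z = expected.sum(axis=(1, 2))
            p_z /= p_z.sum()
        return p_z, p_d_z, p_w_z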
10 June 19 LDA
slides
- implement Gibbs sampling for LDA (a minimal sketch follows below).
- train it on this file. Each line of the file corresponds to a document, represented as the set of nouns, verbs, adverbs, and adjectives that appear in it.
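A minimal sketch of collapsed Gibbs sampling for LDA, assuming each document has already been mapped to a list of integer word ids; the values of alpha, beta, and the iteration count are illustrative defaults. The formula used to resample each token's topic is the update equation derived in lecture 12.

    import random

    def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, n_iter=500, seed=0):
        """Collapsed Gibbs sampling for LDA.

        docs: list of documents, each a list of word ids in 0 .. V-1.
        Each token's topic is resampled from
          P(z = k | rest)  proportional to
          (n_dk + alpha) * (n_kw + beta) / (n_k + V * beta).
        """
        rng = random.Random(seed)
        V = 1 + max(w for doc in docs for w in doc)
        n_dk = [[0] * n_topics for _ in docs]        # topic counts per document
        n_kw = [[0] * V for _ in range(n_topics)]    # word counts per topic
        n_k = [0] * n_topics                         # total tokens per topic
        z = [[rng.randrange(n_topics) for _ in doc] for doc in docs]
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                n_dk[d][k] += 1; n_kw[k][w] += 1; n_k[k] += 1
        for _ in range(n_iter):
            for d, doc in enumerate(docs):
                for i, w in enumerate(doc):
                    k = z[d][i]                      # remove the current assignment
                    n_dk[d][k] -= 1; n_kw[k][w] -= 1; n_k[k] -= 1
                    p = [(n_dk[d][j] + alpha) * (n_kw[j][w] + beta)
                         / (n_k[j] + V * beta) for j in range(n_topics)]
                    r = rng.random() * sum(p)        # draw a new topic
                    for k, pk in enumerate(p):
                        r -= pk
                        if r <= 0:
                            break
                    z[d][i] = k                      # add the new assignment back
                    n_dk[d][k] += 1; n_kw[k][w] += 1; n_k[k] += 1
        return z, n_dk, n_kw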
-- June 26 NO LECTURE
11 July 3 Check LDA code.
slides
No assignment, but see the slides for details on the report submission (GRADING 1).
12 July 10 Derivation of the update equations for LDA's Gibbs sampling (see the equation below).
Sentiment analysis.
slides, survey by Kaji-san
- watch this video (10 minutes).
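For reference, the update equation derived here is commonly written as follows, where the superscript -i means the counts exclude token i and V is the vocabulary size:

    P(z_i = k \mid \mathbf{z}_{-i}, \mathbf{w})
      \propto \left( n_{d,k}^{-i} + \alpha \right)
              \frac{n_{k,w_i}^{-i} + \beta}{n_{k}^{-i} + V\beta}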

GRADING 2: Read the submission, write the review form, and send it to me by July 23rd.
13 July 17 Linguistic resources.
Conference presentations.
slides
14 July 24 NO LECTURE

Grading

Basically, grading will be based on the following two things:
1. code for LDA: you are to write and submit your LDA code (due on July 16).
2. review of a research paper: you are to read a research paper and write a review of it (due on July 23).

TAKAMURA, Hiroya
4259 Nagatsuta-cho, Midori-ku, Yokohama 226-8503, Mail-box R2-7
Precision and Intelligence Laboratory, Tokyo Institute of Technology
phone & fax 045-924-5295