Very shallow study notes on Word sense disambiguation
WSD (word sense disambiguation) is harder than POS tagging.
- Accuracy: about 70% for WSD vs. 95% for POS tagging.
- WSD data is harder to annotate: even humans struggle to know all the senses of a word.
Both tasks are dominated by machine learning methods:
- WSD: SVM
- POS tagging: HMM
- Deep approach:
  - Encode human knowledge in a computer-readable format.
  - Very hard to use in practice.
- Shallow approach:
  - A statistical method.
  - Works most of the time; when a word is ambiguous, look at the words within a context window around it to reduce the ambiguity.
  - bass: a low-frequency sound, or a kind of fish.
    - The two senses can be distinguished by counting word co-occurrences:
      - bass + sound -> low frequency
      - bass + fish or sea -> fish
  - A hard example: "A dog barks at a tree."
    - bark: the sound a dog makes, or the bark of a tree?
    - With a window size of 2 or less, the only cue inside the window is "dog", so bark gets the dog sense; at size 3, "tree" also enters the window and points to the other sense.
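- A naive baseline for POS tagging (simply give each word its most frequent tag) already achieves 90% accuracy. That result is rather discouraging: from the very start, there is at most 10% room for improvement.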
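The window-based co-occurrence idea from the shallow approach above can be sketched in Python. This is a toy under loud assumptions: the cue-word sets are hand-written from the bass and bark examples in these notes (a real system would learn co-occurrence statistics from a sense-tagged corpus, or train a classifier such as the SVM mentioned earlier), tokenization is a plain regex with no lemmatization (so the target must be given as it appears, e.g. `barks`), and `BASS_CUES`, `BARK_CUES`, `score_senses`, and `best_sense` are hypothetical names, not from any library.

```python
from __future__ import annotations

import re

# Hypothetical cue words per sense, hand-picked from the examples in the notes.
BASS_CUES = {"low frequency": {"sound"}, "fish": {"fish", "sea"}}
BARK_CUES = {"dog barking": {"dog"}, "tree bark": {"tree"}}


def tokenize(text: str) -> list[str]:
    """Lower-case the text and keep only alphabetic tokens (no lemmatization)."""
    return re.findall(r"[a-z]+", text.lower())


def window(tokens: list[str], index: int, size: int) -> list[str]:
    """Up to `size` tokens on each side of tokens[index], excluding the target itself."""
    return tokens[max(0, index - size):index] + tokens[index + 1:index + 1 + size]


def score_senses(text: str, target: str, cues: dict[str, set[str]], size: int) -> dict[str, int]:
    """Count how many words inside the window are cue words for each sense."""
    tokens = tokenize(text)
    context = window(tokens, tokens.index(target), size)  # ValueError if target is absent
    return {sense: sum(word in words for word in context) for sense, words in cues.items()}


def best_sense(text: str, target: str, cues: dict[str, set[str]], size: int = 3) -> str | None:
    """Return the single highest-scoring sense, or None if no cue matched or senses tie."""
    scores = score_senses(text, target, cues, size)
    top = max(scores.values())
    winners = [sense for sense, score in scores.items() if score == top]
    return winners[0] if top > 0 and len(winners) == 1 else None
```

For example, `best_sense("He caught a huge bass in the sea", "bass", BASS_CUES)` picks `"fish"`. On the bark sentence, `size=2` returns `"dog barking"`, while `size=3` returns `None` because `dog` and `tree` each contribute one cue, which is exactly why the notes call it a hard example.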
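The most-frequent-tag baseline from the last point can be sketched like this. The tiny tagged corpus and its Penn Treebank-style tags are made up for illustration, tagging unseen words with the overall most frequent tag is an assumed fallback, and the function names are hypothetical; the accuracy on this toy data says nothing about the 90% figure.

```python
from __future__ import annotations

from collections import Counter, defaultdict

# Hypothetical toy corpus: each sentence is a list of (word, tag) pairs.
TRAIN = [
    [("the", "DT"), ("dog", "NN"), ("barks", "VBZ")],
    [("the", "DT"), ("bark", "NN"), ("of", "IN"), ("the", "DT"), ("tree", "NN")],
    [("dogs", "NNS"), ("bark", "VBP"), ("at", "IN"), ("trees", "NNS")],
    [("a", "DT"), ("bark", "NN"), ("beetle", "NN")],
]
TEST = [
    [("the", "DT"), ("dogs", "NNS"), ("bark", "VBP")],
    [("a", "DT"), ("cat", "NN"), ("barks", "VBZ")],
]


def train_most_frequent_tag(corpus):
    """Return each word's most frequent tag, plus the overall most frequent tag."""
    per_word: dict[str, Counter] = defaultdict(Counter)
    overall: Counter = Counter()
    for sentence in corpus:
        for word, tag in sentence:
            per_word[word.lower()][tag] += 1
            overall[tag] += 1
    best = {word: counts.most_common(1)[0][0] for word, counts in per_word.items()}
    return best, overall.most_common(1)[0][0]


def tag_words(words, best, fallback):
    """Tag each word with its most frequent training tag; unseen words get `fallback`."""
    return [best.get(word.lower(), fallback) for word in words]


def accuracy(corpus, best, fallback):
    """Fraction of tokens whose predicted tag matches the gold tag."""
    correct = total = 0
    for sentence in corpus:
        words, gold = zip(*sentence)
        predicted = tag_words(words, best, fallback)
        correct += sum(p == g for p, g in zip(predicted, gold))
        total += len(gold)
    return correct / total


best, fallback = train_most_frequent_tag(TRAIN)
print(accuracy(TEST, best, fallback))
```

On this toy test set it gets 5 of 6 tokens right. The miss is the verb bark in "the dogs bark", tagged NN because bark was a noun more often in training, which illustrates the ambiguous cases a per-word baseline cannot handle.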