The following basic skills and knowledge are recommended for anyone interested in computational linguistics (also known as natural language processing or NLP). All the topics listed below are ones that we wish we had started learning earlier! They will make you more confident and versatile in your job or graduate program. Even if you don’t have time to finish the list, there is always time to start!
Under each recommendation are links to high-quality online tutorials (often free!) and UF courses (growing!). Note that some courses fulfill requirements for a degree in Computer Science, Linguistics, or Data Science. A major/minor in any two of those fields is a marketable combination!
If you are not sure yet whether you are really interested in computational linguistics, do the first recommendation in each section.
Computer Science
A general introductory course will make later classes less confusing.
- Harvard’s CS50 on EdX is a great course!
- Other options: Khan Academy’s Computer Science Principles
Fluency in Python is a must!
- You can learn basic programming concepts in any programming language, but Python is dominant for NLP and other subareas of AI.
A course on data structures and algorithms helps you think like an efficient programmer.
- There are several highly rated courses on Udemy.com. Choose a course that uses Python.
Information theory, particularly Markov chains and information entropy, helps you grasp the foundations of machine learning.
- Khan Academy’s Information Theory unit in Computer Science
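For a first taste of what information entropy measures, here is a minimal sketch in Python. The function name `shannon_entropy` and the sample strings are illustrative only and do not come from any of the courses listed here. The function computes the average number of bits needed per character, based on how often each character occurs.

```python
import math
from collections import Counter

def shannon_entropy(text):
    """Return the Shannon entropy of a string, in bits per character."""
    counts = Counter(text)   # how many times each character occurs
    total = len(text)
    # Sum p * log2(1/p) over the observed characters.
    return sum((n / total) * math.log2(total / n) for n in counts.values())

# A string with one repeated character is fully predictable,
# while four equally likely characters need two bits each.
print(shannon_entropy("aaaa"))  # 0.0
print(shannon_entropy("abab"))  # 1.0
print(shannon_entropy("abcd"))  # 2.0
```

The less predictable the next character is, the higher the entropy. The same idea underlies how language models are evaluated.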
Bonus: A second programming language such as Java, C#, or C++ will boost job prospects.
- Find a class through your local library, an organization like Galvanize, MeetUp.com, or a local community college.
Math
A solid foundation in statistics is a must!
- LIN 4005 Stats for Linguists
- Khan Academy’s AP/College Statistics or Statistics and Probability
You may need to study the following topics separately because introductory statistics courses often do not cover them:
- Logistic regression – Logistic Regression Tutorial on YouTube
- Probability, particularly Bayes’ theorem – Khan Academy’s Probability unit and Bayes Theorem by 3Blue1Brown
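To see why Bayes’ theorem matters for NLP, here is a small sketch in Python based on a toy spam filter. The function name `posterior` and all of the probabilities are made up for illustration. They are assumptions, not real data and not taken from the tutorials listed above.

```python
def posterior(prior, likelihood, likelihood_if_not):
    """Bayes' theorem for a yes/no hypothesis H and a piece of evidence E.

    prior             = P(H)
    likelihood        = P(E | H)
    likelihood_if_not = P(E | not H)
    Returns P(H | E).
    """
    evidence = likelihood * prior + likelihood_if_not * (1 - prior)  # P(E)
    return likelihood * prior / evidence

# Hypothetical numbers: 20% of emails are spam, the word "free" appears
# in 50% of spam emails and in 5% of non-spam emails.
p_spam_given_free = posterior(prior=0.2, likelihood=0.5, likelihood_if_not=0.05)
print(round(p_spam_given_free, 3))  # 0.714
```

Seeing the word raises the estimated chance of spam from 20% to about 71%. Naive Bayes text classifiers apply this same update for every word in a document.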
A good grasp of matrix and vector operations and reading related symbols is necessary.
- Khan Academy’s Linear Algebra course
Being able to read calculus formulae without panicking is a great help. Understanding them is even better!
- You may need precalculus and/or trigonometry – Khan Academy’s Precalculus
- Take Calculus 1 – Khan Academy’s Calculus I or AP/College Calculus BC
- After Calculus 1, become familiar with partial derivatives – Partial Derivatives unit in Khan Academy’s AP/College Calculus course.
Linguistics
An introduction to language as a science is a must!
Being able to read the International Phonetic Alphabet is especially useful for speech technology.
- Interactive websites such as the MIT learning tool
- LIN 3201 Sounds of Human Language
A course in morphosyntax or semantics helps you think about language as a science (not a collection of words).
Bonus: An advanced course in syntax or semantics clarifies theoretical formalisms.
- LIN 4500 Introduction to Syntax
- LIN 4850 Formal Semantics
- Or any course that covers quantification or sentential, propositional, predicate, or first-order logic
- Or read these books
Summer Schools
Consult with your academic advisor about getting college credit for a summer school.
- Linguistic Institutes are hosted by the Linguistic Society of America in odd-numbered years. They are a great option for linguistics newbies and linguistics majors. Several computational linguistics courses are usually offered.
- The European Summer School in Logic, Language and Information is held every summer and offers courses in formal and computational linguistics.
- The North American Summer School of Logic, Language, and Information is a similar event held in even-numbered years.
- The Johns Hopkins Center for Language and Speech Processing’s intensive summer research workshops on speech and language engineering are usually preceded by a two-week summer school that is open to anyone.
- The Lisbon Machine Learning School is not specific to NLP and is recommended for someone very comfortable with Python.