Installing NLTK on Heroku

I love the Natural Language Toolkit (NLTK) for Python, but one of its problems (shared by many mature libraries) is that it has not been updated to work with the latest packaging standards. For example:

pip install nltk

will not actually work unless PyYAML is already installed. You will get an error like this:

File "nltk/yamltags.py", line 1, in <module>
import yaml
ImportError: No module named yaml

If you add PyYAML to your requirements.txt before NLTK, pip install will work locally. This is what my requirements.txt file looks like:

Django==1.3
psycopg2==2.4.2
http://pyyaml.org/download/pyyaml/PyYAML-3.10.tar.gz
http://pypi.python.org/packages/source/n/nltk/nltk-2.0.1rc1.tar.gz
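Since the order of the entries is what matters here, it can be worth checking it before you push. The following is only a minimal sketch: the helper names package_index and yaml_precedes_nltk are my own (they are not part of pip or Heroku), and matching entries by substring is an assumption that happens to suit the file above.

```python
def package_index(lines, name):
    """Return the index of the first requirement line mentioning `name`, or None."""
    for i, line in enumerate(lines):
        entry = line.strip().lower()
        # Skip blank lines and comments; match by substring (an assumption).
        if entry and not entry.startswith("#") and name.lower() in entry:
            return i
    return None


def yaml_precedes_nltk(lines):
    """True if a PyYAML entry appears before the NLTK entry, or NLTK is absent."""
    nltk_at = package_index(lines, "nltk")
    if nltk_at is None:
        return True
    yaml_at = package_index(lines, "pyyaml")
    return yaml_at is not None and yaml_at < nltk_at


# The requirements.txt shown above, as a list of lines.
REQUIREMENTS = """\
Django==1.3
psycopg2==2.4.2
http://pyyaml.org/download/pyyaml/PyYAML-3.10.tar.gz
http://pypi.python.org/packages/source/n/nltk/nltk-2.0.1rc1.tar.gz
""".splitlines()

if __name__ == "__main__":
    print(yaml_precedes_nltk(REQUIREMENTS))
```

In a real project you would read the lines from requirements.txt instead of the inline string.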

However, pip install will still fail on Heroku even with this requirements.txt file. Thanks to Derek Willis for the workaround:

  1. Remove NLTK from requirements.txt but leave PyYAML

  2. Do a git push to heroku

  3. Add NLTK back in after PyYAML in requirements.txt

  4. Do another git push to deploy

NLTK will now install successfully on Heroku!
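The two pushes in the workaround amount to deploying two versions of requirements.txt in sequence. As a rough sketch (the function name two_stage_requirements is hypothetical, and filtering on the substring "nltk" is an assumption), the two versions could be derived like this:

```python
def two_stage_requirements(lines):
    """Split a requirements list into the contents for the two pushes.

    first_push:  everything except NLTK (steps 1 and 2)
    second_push: the full list, NLTK included after PyYAML (steps 3 and 4)
    """
    first_push = [line for line in lines if "nltk" not in line.lower()]
    second_push = list(lines)
    return first_push, second_push


FULL_REQUIREMENTS = [
    "Django==1.3",
    "psycopg2==2.4.2",
    "http://pyyaml.org/download/pyyaml/PyYAML-3.10.tar.gz",
    "http://pypi.python.org/packages/source/n/nltk/nltk-2.0.1rc1.tar.gz",
]

if __name__ == "__main__":
    first, second = two_stage_requirements(FULL_REQUIREMENTS)
    print("\n".join(first))
    print("---")
    print("\n".join(second))
```

Each list would be written to requirements.txt, committed, and pushed in turn; the script only prepares the file contents and does not touch git or Heroku.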