May 9, 2018

Using NLTK library with AWS Lambda

This is a walkthrough of creating a simple serverless app that finds the part-of-speech tags of an input text.

1 Create virtual environment

To keep this app's dependencies separate from the system-wide ones, create a dedicated virtual environment with:

~ mkvirtualenv nltk_env

2 Install nltk

In the virtual environment, use pip to install the nltk package:

(nltk_env) ~ pip install nltk

3 Download nltk data

Pip doesn't install the additional data files that the app needs, but nltk has helper functions to download them:

Read more
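As a rough illustration of where these steps lead, here is a minimal sketch of a Lambda handler that returns part-of-speech tags. It is not the post's own code: the event shape (a `text` key), the response layout, and the optional `tagger` parameter are assumptions made for this example, and it presumes that nltk and the data it needs are already bundled with the function.

```python
import json


def tag_text(text, tagger=None):
    """Return [token, tag] pairs for the given text.

    If no tagger is supplied, NLTK is imported lazily and used for
    tokenizing and tagging. This assumes the nltk package and its
    tokenizer/tagger data are available in the deployment package.
    """
    if tagger is None:
        import nltk  # imported lazily so the module loads without nltk

        def tagger(s):
            return nltk.pos_tag(nltk.word_tokenize(s))

    # Convert tuples to lists so the result is JSON-friendly.
    return [list(pair) for pair in tagger(text)]


def lambda_handler(event, context, tagger=None):
    """Entry point; the {"text": ...} event shape is an assumption."""
    text = event.get("text", "")
    body = {"tags": tag_text(text, tagger)}
    return {"statusCode": 200, "body": json.dumps(body)}
```

The `tagger` argument is only there so the handler can be exercised locally with a stand-in function; when Lambda calls `lambda_handler(event, context)`, the default NLTK path is used.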

April 16, 2018

Extracting keyphrases from texts: unsupervised algorithm TopicRank

Keyphrase extraction is the task of identifying single or multi-word expressions that represent the main topics of a document. There are two approaches to extracting topics (and/or keyphrases) from a text: supervised and unsupervised.

Supervised approach

The supervised approach treats this as a multi-label, multi-class classification problem, where the input features can be either the text converted to a bag-of-words, or the text treated as a stream of vectors, which are pre-trained word embeddings. For bag-of-words features, a linear SVM is a good classifier.

Read more
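To make the bag-of-words input concrete, here is a minimal sketch using only the Python standard library. The tokenizer, the helper names, and the two sample documents are assumptions for illustration; a real pipeline would more likely use a library vectorizer and pass the resulting vectors to a linear SVM implementation such as scikit-learn's `LinearSVC`.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_vocabulary(documents):
    """Map each distinct token to a column index (alphabetical order)."""
    tokens = sorted({tok for doc in documents for tok in tokenize(doc)})
    return {tok: i for i, tok in enumerate(tokens)}


def bag_of_words(text, vocabulary):
    """Return a count vector for text; unknown tokens are ignored."""
    vector = [0] * len(vocabulary)
    for tok, count in Counter(tokenize(text)).items():
        index = vocabulary.get(tok)
        if index is not None:
            vector[index] = count
    return vector


# Hypothetical sample documents, used only to show the shape of the output.
docs = ["Keyphrase extraction finds topics", "Topics summarize a document"]
vocab = build_vocabulary(docs)
vectors = [bag_of_words(doc, vocab) for doc in docs]
```

Each document becomes a fixed-length vector of token counts, so word order is discarded; that is the trade-off against the embedding-stream representation.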

© Alexey Smirnov 2023