This is a walkthrough of creating a simple serverless app that finds the part-of-speech tags of an input text.
1 Create virtual environment
To keep this app's dependencies separate from system-wide packages, create a virtual environment (mkvirtualenv is provided by virtualenvwrapper):
~ mkvirtualenv nltk_env
2 Install nltk
In the virtual environment, use pip to install the nltk package:
(nltk_env) ~ pip install nltk
3 Download nltk data
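pip does not install the additional data files that the app needs, but nltk has helper functions to download them. The handler in step 6 also calls word_tokenize and pos_tag, which rely on their own data packages (punkt and averaged_perceptron_tagger in NLTK releases from that period), so download those the same way as tagsets, shown here: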
(nltk_env) ~ python
Python 3.6.2 (v3.6.2:5fd33b5926, Jul 16 2017, 20:11:06)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import nltk
>>> nltk.download('tagsets')
[nltk_data] Downloading package tagsets to /Users/as/nltk_data...
[nltk_data] Unzipping help/tagsets.zip.
True
4 Copy downloaded nltk data to current directory
The helper functions download the extra data to the user's home directory, so you need to copy it next to the app code:
(nltk_env) ~ cp -R /Users/as/nltk_data/* ./
5 Copy site packages from virtualenv directory
Now copy all the packages from the site-packages folder of the virtual environment to the folder with the app:
(nltk_env) ~ cp -R /Users/as/.virtualenvs/nltk_env/lib/python3.6/site-packages/* ./
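To find the site-packages folder, run which python inside the virtual environment: site-packages sits under lib/python3.6, next to the bin directory that holds the interpreter.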
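If the path is not obvious, the interpreter itself can report it. A minimal sketch, assuming it is run with the virtual environment's Python; the helper name site_packages_dir is made up for this example:

```python
import sysconfig


def site_packages_dir() -> str:
    """Return the directory where pip installs pure-Python packages."""
    # "purelib" is the install-scheme key for pure-Python packages;
    # inside an activated virtualenv it points into that environment.
    return sysconfig.get_paths()["purelib"]


if __name__ == "__main__":
    print(site_packages_dir())
```

The printed path can be used as the source of the cp -R command above.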
6 Create the Lambda function code
Save the following as lambda_function.py, the module name that the --handler option in step 10 points to:
import imp
import sys

# Register empty placeholder modules before nltk is imported.
sys.modules["sqlite"] = imp.new_module("sqlite")  # (1)
sys.modules["sqlite3.dbapi2"] = imp.new_module("sqlite.dbapi2")

import nltk
from nltk.data import load

# Maps each Penn Treebank tag to a tuple whose first item is its description.
tagdict = load('help/tagsets/upenn_tagset.pickle')


def lambda_handler(event, context):
    text = event.get('text')
    tokenized = nltk.word_tokenize(text)
    tagged = nltk.pos_tag(tokenized)
    # Map every word to the description of its part-of-speech tag.
    return {word: tagdict[tag][0] for word, tag in tagged}
(1) Since libsqlite3-dev is not installed in the container that runs Lambda, this workaround of creating empty dummy modules is needed.
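The imp module used above is deprecated and was removed in Python 3.12, so on newer runtimes the same workaround can be written with types.ModuleType. A minimal sketch, assuming the stubs must be registered before nltk is first imported; the helper name install_stub is hypothetical:

```python
import sys
import types


def install_stub(name: str) -> types.ModuleType:
    """Register an empty placeholder module under `name` unless one is already loaded."""
    if name not in sys.modules:
        sys.modules[name] = types.ModuleType(name)
    return sys.modules[name]


# Same module names as in the handler above; run this before `import nltk`.
install_stub("sqlite")
install_stub("sqlite3.dbapi2")
```

Unlike the original lines, this version leaves a module alone if it has already been imported, so it does no harm in an environment where sqlite3 works.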
7 Check the size
Lambda limits the size of the function code (50 MB for a zipped package uploaded directly, 250 MB unzipped), so check it with:
(nltk_env) ~ du -sh ./ | cut -f1
187M
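The same check can be scripted. A minimal sketch, assuming the 250 MB quota that AWS documents for an unzipped deployment package; the function names and the constant are made up for this example. It sums file sizes rather than disk blocks, so the figure can differ slightly from what du reports:

```python
import os

# Assumed quota: 250 MB for the unzipped deployment package.
UNZIPPED_LIMIT_BYTES = 250 * 1024 * 1024


def dir_size_bytes(root: str) -> int:
    """Sum the sizes of all regular files below `root`."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            # Skip symlinks so that linked files are not counted twice.
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def fits_lambda_limit(root: str, limit: int = UNZIPPED_LIMIT_BYTES) -> bool:
    """Return True when the folder is within the assumed unzipped size limit."""
    return dir_size_bytes(root) <= limit


if __name__ == "__main__":
    size = dir_size_bytes(".")
    print(f"{size / (1024 * 1024):.0f}M, fits: {fits_lambda_limit('.')}")
```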
8 Zip everything
To deploy the Lambda function, zip the contents of the folder:
(nltk_env) ~ zip -r -9 -q ./lambda.zip *
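Where the zip command is not available, the standard library can build a comparable archive. A minimal sketch, assuming Python 3.7 or newer (for the compresslevel argument); the function name zip_folder is hypothetical. Paths are stored relative to the folder, so lambda_function.py ends up at the archive root, where the handler setting looks for it:

```python
import os
import zipfile


def zip_folder(folder: str, archive_path: str) -> None:
    """Write every file under `folder` into `archive_path` at maximum compression."""
    archive_abs = os.path.abspath(archive_path)
    with zipfile.ZipFile(archive_abs, "w", compression=zipfile.ZIP_DEFLATED,
                         compresslevel=9) as archive:
        for dirpath, _dirnames, filenames in os.walk(folder):
            for filename in filenames:
                path = os.path.join(dirpath, filename)
                # Do not add the archive to itself when it lives inside `folder`.
                if os.path.abspath(path) == archive_abs:
                    continue
                archive.write(path, arcname=os.path.relpath(path, folder))


if __name__ == "__main__":
    zip_folder(".", "lambda.zip")
```

Unlike zip -r with a * pattern, this sketch also picks up hidden files at the top level of the folder.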
9 Upload to S3
The zipped Lambda code is uploaded to S3, from where it will be deployed:
(nltk_env) ~ aws s3 mb s3://serverless-nltk
(nltk_env) ~ aws s3 cp ./lambda.zip s3://serverless-nltk
10 Create lambda
Use the AWS CLI to create the Lambda function and tell it where on S3 the code resides:
(nltk_env) ~ aws lambda create-function \
--function-name serverless-nltk \
--runtime python3.6 \
--role arn:aws:iam::1234567890:role/lambda_basic_execution \
--handler lambda_function.lambda_handler \
--code S3Bucket=serverless-nltk,S3Key=lambda.zip \
--environment Variables={NLTK_DATA=./}
Key things here are:
- the role ARN, which can be found in IAM (look for the role named lambda_basic_execution)
- the environment variable NLTK_DATA, which tells nltk where to look for data
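Before building the web page it can be worth checking the deployed function from Python. A minimal sketch, assuming boto3 is installed and AWS credentials with permission to invoke the function are configured; the helper names are made up for this example, and the function name and region are the ones used in this walkthrough:

```python
import json


def build_payload(text: str) -> bytes:
    """Encode the event the handler expects: a JSON object with a 'text' key."""
    return json.dumps({"text": text}).encode("utf-8")


def tag_text(text: str, function_name: str = "serverless-nltk",
             region: str = "us-east-1") -> dict:
    """Invoke the Lambda function synchronously and return its decoded response."""
    import boto3  # imported lazily so build_payload works without boto3

    client = boto3.client("lambda", region_name=region)
    response = client.invoke(
        FunctionName=function_name,
        InvocationType="RequestResponse",
        Payload=build_payload(text),
    )
    # 'Payload' is a streaming body holding the JSON returned by the handler.
    return json.loads(response["Payload"].read())


if __name__ == "__main__":
    print(tag_text("Serverless apps are fun"))
```

If the handler raises, the decoded response holds the error details instead of the word-to-tag mapping, which makes missing nltk data easy to spot.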
Now let’s create a simple javascript application that will call lambda with user input from the page:
- Go to AWS Cognito
- Create a new identity pool
- In the first step check Enable access to unauthenticated identities
- In the sample code step select JavaScript and copy the IdentityPoolId (needed in the invocation script later)
- Go to IAM
- Find the role for unauthenticated access (it will look like Cognito_serverless_nltkUnauth_Role)
- Select Permission and edit the role as JSON. It should look like this:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "mobileanalytics:PutEvents",
        "cognito-sync:*"
      ],
      "Resource": [
        "*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "lambda:InvokeFunction"
      ],
      "Resource": [
        "arn:aws:lambda:us-east-1:1234567890:function:serverless-nltk"
      ]
    }
  ]
}
The script that calls the Lambda function looks like this (it assumes the AWS SDK for JavaScript is already loaded on the page):
<script type="text/javascript">
  var button = document.getElementById('upload-button');
  AWS.config.credentials = new AWS.CognitoIdentityCredentials({IdentityPoolId: 'us-east-1:8b6a0b3d-6a2a-4c7d-b617-c8dafd8a1aec'});
  AWS.config.region = 'us-east-1';
  var lambda = new AWS.Lambda({region: 'us-east-1', apiVersion: '2015-03-31'});

  function htmlToElement(html) {
    var template = document.createElement('template');
    html = html.trim(); // Never return a text node of whitespace as the result
    template.innerHTML = html;
    return template.content.firstChild;
  }

  function call_lambda() {
    var pullParams = {
      FunctionName : 'serverless-nltk',
      InvocationType : 'RequestResponse',
      LogType : 'None',
      Payload : JSON.stringify({text: document.getElementById("exampleFormControlTextarea1").value})
    };
    // create variable to hold data returned by the Lambda function
    var pullResults;
    lambda.invoke(pullParams, function(error, data) {
      if (error) {
        console.log(error);
      } else {
        pullResults = JSON.parse(data.Payload);
        console.log(pullResults);
        var result = document.getElementById("result");
        result.innerHTML = '';
        for (var key in pullResults) {
          var text = htmlToElement('<span>' + key + ': </span>');
          var pos = htmlToElement('<span>' + pullResults[key] + '</span>');
          var line = htmlToElement('<h6></h6>');
          line.appendChild(text);
          line.appendChild(pos);
          result.appendChild(line);
        }
      }
    });
  }
</script>