Alexa, Is My Food Ready?

Amazon Echo

Introduction

A few months ago, I wrote a blog post on how I built an internet-connected grill thermometer so I could check on my low-and-slow cooking from any of my devices and get push notifications when the food was ready (you can read that post here). I recently acquired an Amazon Echo and was curious whether I could get this device to tell me the status of my grill with just my voice. For those who aren’t familiar with it, the Echo is a standalone digital assistant from Amazon, similar to Siri, but always on and ready for commands. It looks like a small speaker that sits in your house, and whenever you have a question for it, you can just say something like, ‘Alexa, what’s the weather’ and it’ll tell you. It does much more than that, but that’s the gist.

Skills

The question for me was whether or not I could add some custom functionality to the device. It has a bunch of built-in commands, but I wanted to make my own. Thankfully, Amazon thought of this and added the concept of ‘Skills’ that your Echo can use. A ‘Skill’ is a custom action developers can create to do whatever they want; they can either use it on their own device or publish it publicly for others to use. There is an entire list of skills in the Alexa app that you can add to your Echo.

Let’s take a look at how you can create your own skill and then we’ll see how we can connect that skill to our BBQ temperature data.

To initiate your custom skill, you have to speak to the Echo in a very specific format. To work with a custom skill, you have to include an invocation name for your skill and tell Alexa to ask that name to do something. It’s not the most natural way to ask the Echo to use your skill, but it’s required so the Echo knows to use your skill versus some other one. I’ve heard rumors that this requirement might go away in the future, but for now, we need to deal with it. Let’s review in detail the exact format it requires:

User: “Alexa, ask Grill Log for the temperature”

  • “Alexa” is the required wake word for the device, unless you changed it to ‘Amazon’.
  • “ask…for…” is the phrase to request a specific skill to do something. There are multiple variations of words supported here; you can see them all here.
  • “Grill Log” is the invocation name I’ve registered my custom skill to use, and it must be two words.
  • “the temperature” is the intent of the user (not related to Android intents) and will get passed to your service for processing.

Now that we know how to interact with a custom skill, let’s take a closer look at how to get that skill on our device. Getting the skill onto your device requires two things. First, you need to decide where you will process the user’s requests: either in your own web service or using AWS Lambda. Second, you have to register the skill on the Amazon Developer site and provide information about your skill, such as the name, endpoint, and interaction model. Don’t worry if this doesn’t make sense yet; we’ll go through it in detail below.

Processing User Requests

As stated above, you have two options to process the user’s request: your own web service or AWS Lambda. If you already have a web service infrastructure set up, it might make more sense for you to use the web service option, but I won’t go into the detail of what’s required to do this; you can read more about it on the Amazon dev site. It’s much easier to use Lambda, as it’s already within the Amazon suite of tools. If you aren’t familiar with Lambda, it’s a service that Amazon Web Services (AWS) provides that can run code in response to events. You can read more about Lambda here and read through a tutorial here. That code can be written in a number of different languages: JavaScript (Node.js), Java, or Python, so you can pick the one you’re most familiar with. You pay for how many requests and how much processing time you use, but the free tier includes one million requests per month.
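
If you’ve never seen a Lambda function before, the shape is simple: you export a handler that Lambda calls with the incoming event. Here’s a minimal Node.js sketch; for an Alexa skill, the event will be the JSON request the Alexa service sends:

// A minimal Lambda handler. Lambda invokes this with the incoming event;
// for an Alexa skill, that event is the request JSON from the Alexa service.
exports.handler = function (event, context) {
    console.log('Received event:', JSON.stringify(event));
    // Do your work here, then hand a result (or an error) back to Lambda.
    context.succeed({ message: 'Hello from Lambda' });
};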

In our example, we’re going to use Lambda. To create a new Lambda function, log into the AWS console, click Lambda, and then Create a Lambda Function. From here you can filter a list of Lambda templates to get started; in fact, there are already some Alexa templates for you to use. If you type alexa into the filter, you’ll get three different templates to choose from. Choose the alexa-skills-kit-color-expert template for JavaScript and you’ll have a simple template to work from. This has sample code that shows how to process different intents and work with sessions. For now, let’s just create the function and leave it as is. Take note of the ARN the console gives your function, as we’ll use it when we register our skill with the Amazon developer site. It should look something like arn:aws:lambda:us-east-1:123456789:function:yourFunctionName. We’ll come back to filling out the details of our code after we set up the interaction model; I find it easier to work out the intents and data before finishing the code. If you don’t have access to AWS, you can see what the sample looks like here and see a screenshot of the AWS Lambda console below.

AWS Lambda Console

Interaction Model

Now that you have some code you can call hosted in the AWS cloud, we can define our skill on the developer site. Go to developer.amazon.com, then to the Alexa section, and you can register your custom skill there. It’s worth noting that in order to test your custom skill, your device must be registered to the same account you use on the developer site. If you’d like to make the skill publicly available, you can complete the registration process and Amazon will then need to certify it for use. The certification process takes anywhere from a few days to a week. Once certified, it will appear in the list of available skills that anyone can add to their device. In my experience, the review team is very particular about the skills it certifies, which I think is a good thing, but I’d highly recommend you review the submission guidelines before you send it for certification.

The major steps in registering your skill are choosing an invocation name, pointing the skill to your endpoint, defining the intent schema file, defining any optional data slots, and defining the sample utterances users will speak to interact with your skill. Let’s break those steps down and see what they are all about.

The invocation name is the word or set of words that identifies your skill as the one the user wants to use. This can be tricky to choose, as you want it to flow nicely in the required word order of the ask or tell pattern. The name shouldn’t be complicated, and it should be easy to pronounce. You’ll also want to avoid any built-in names, like weather. For my skill, I’ll just name it Grill Log.

Next we set the endpoint of the custom skill to the Lambda function we created above. You’ll take the ARN name of the function and save it on the skill so it knows where to go when it sends the user’s requests. If you were using a custom web service, you’d put that here instead of the ARN.

Interaction Model Screen

On the Interaction Model screen, I think it’s best to walk through the sections from the bottom up, so let’s start with the Sample Utterances. Here we’ll list all the potential ways someone can interact with the skill. For my Grill Log skill, I’d like to have two types of intents: one where the user gets the temperature of the grill and the meat together, and one where the user can ask for one of them specifically. For the first intent, I’d list something like:

StatusIntent what’s the status

In this line, StatusIntent is the name of the intent I want to use. This is something you define, and you’ll use this key in your service to check which intent was passed in. So when the user says something like, ‘Alexa, ask Grill Log what’s the status’, the requested intent would be ‘StatusIntent’. Then in your processing code you’d react to that intent accordingly.
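
To make that concrete, the request your service receives for that phrase looks roughly like this (trimmed down to the relevant fields):

{
  "request": {
    "type": "IntentRequest",
    "intent": {
      "name": "StatusIntent"
    }
  }
}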

Now there could be multiple ways to say the same thing. For example:

StatusIntent can you tell me the status

StatusIntent what’s going on inside my grill

StatusIntent what’s going on

You can list additional utterances that would request the same intent. You just use the same intent name for each of the different utterances, and they’ll all be processed the same way.

Those examples are pretty straightforward. For the other type of intent, I want to know the temperature of either the grill or the meat. I could list them out as two different intent names, but that wouldn’t be optimal, as I’d have to duplicate all the utterances for meat and grill. Instead, we’ll use the concept of slots and custom slot types. You can think of a slot as a variable that you can use inside of your utterance. To set up an utterance to check the grill or the meat, you’d do something like:

TemperatureIntent what’s the temperature of the {temptype}

In this utterance, temptype is the slot that will be used in place of ‘meat’ or ‘grill’. When you want to use a custom slot like this, you need to define the slot type and the values it expects in the developer portal. To do this, you’d click Add Slot Type on the Interaction Model screen. For our example, you’d give it a type of LIST_OF_TYPES and then list the values grill and meat. You might ask why we didn’t use a type of temptype, since that’s what we’re using in the utterance. When you define custom slot types, they are available for use in any of the utterances you create; you might even have a case where two slots in the same utterance use the same data type. Think of the slot type as a reusable data type that you’ll map your slots to. That leads us to the last section we haven’t talked about: the Intent Schema.
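
When the user actually speaks a value for that slot, it arrives inside the intent object that’s passed to your service, and your code can read it with a line like this (assuming intent is the intent pulled out of the request):

var tempType = intent.slots.temptype.value; // 'grill' or 'meat'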

The Intent Schema is a JSON configuration string that defines your list of intents and which slots map to which data types. This configuration ties everything we’ve been talking about together. The file lists an array of all the intents you’ve set up utterances for. You’ll also want to include any of the built-in intents, like AMAZON.HelpIntent, if you want your skill to react to them. For our example, we’d have a configuration file like:

{
  "intents": [
    {
      "intent": "AMAZON.HelpIntent"
    },
    {
      "intent": "StatusIntent"
    },
    {
      "intent": "TemperatureIntent",
      "slots": [
        {
          "name": "temptype",
          "type": "LIST_OF_TYPES"
        }
      ]
    }
  ]
}

Here we’ve listed the two intents we set up utterances for. For the TemperatureIntent, we’ve defined a slots section and mapped the slot named temptype to the data type LIST_OF_TYPES. With this configuration file, we’ve now tied all our configuration together, and we’ll reference things from here in our processing code so we know what the user wants the skill to do.

Processing Code

Now that we’ve registered our skill and have a basic Lambda function set up, we need to fill out the details of what the skill is actually going to do. Since our code is just a Node.js script, we can connect to whatever we need, assuming it’s reachable from Lambda. For my skill, I need to connect to the database of BBQ temperatures and get the last reading.

Between the time I wrote the original BBQ article and now, Parse.com has unfortunately decided to shut down, so I’m in the process of migrating my code from Parse to AWS. To replace the database Parse was providing, I’ve decided to use AWS DynamoDB to hold the temperature data. I’m not going to go into the detail of how I’ve updated my existing scripts to use AWS instead of Parse, as that’s enough for a separate article. Let’s just assume that the grill data is getting saved into DynamoDB and we need the Alexa skill to read it.

My approach is to write a JavaScript service file that handles the logic of grabbing the data from the DB and returning the details, and then have the Alexa script call this new Grill service to get the data. That way, if I ever move the data to another database or change how it’s processed, it won’t affect the Alexa code. You can see my example Grill service code here.
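
In case that link isn’t handy, here’s a rough sketch of what such a service module could look like. The table name, key names, and attribute names below are hypothetical placeholders, not the exact schema from my project:

// grillService.js - a minimal sketch of a service module that reads the
// most recent temperature reading out of DynamoDB.
var AWS = require('aws-sdk'); // preinstalled in the Lambda environment

var docClient = new AWS.DynamoDB.DocumentClient();

// Fetch the most recent reading; table/key names here are hypothetical.
exports.getLatestReading = function (callback) {
    var params = {
        TableName: 'GrillTemperatures',           // assumed table name
        KeyConditionExpression: 'deviceId = :id', // assumed partition key
        ExpressionAttributeValues: { ':id': 'grill-1' },
        ScanIndexForward: false, // newest first, assuming a timestamp sort key
        Limit: 1
    };
    docClient.query(params, function (err, data) {
        if (err) { return callback(err); }
        // e.g. { grillTemp: 250, meatTemp: 155, timestamp: ... }
        callback(null, data.Items[0]);
    });
};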

Following the code used in the sample template, let’s talk through the flow of what happens when your Lambda function gets called. It first enters the handler function, where it checks the request type. A ‘LaunchRequest’ is the type used when you invoke the skill without an intent. For this type of request, you’ll want to respond with a welcoming message about what the skill does and how to use it. The other request type is ‘IntentRequest’, which means the user has provided a specific intent. Inside your code, you can pull out the intent and any slot values it may have, and then figure out what to do with that intent. In the sample code, you can see how the intents are checked in ‘onIntent’. Here you’d want to list all the intents you want to respond to, including any of the built-in ones. This list should match what you’ve set up in your interaction model.
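
Putting that flow into code, the skeleton ends up looking something like the sketch below, which follows the template’s structure with my intent names swapped in. The onLaunch, getStatus, getTemperature, and getHelpResponse helpers, where the real work happens, are elided here, and buildResponse is one of the template’s utility methods (another is sketched a little further down):

// Entry point: route the incoming request by type, as the template does.
exports.handler = function (event, context) {
    try {
        if (event.request.type === 'LaunchRequest') {
            // Invoked without an intent: respond with a welcome message.
            onLaunch(event.request, event.session,
                function (sessionAttributes, speechletResponse) {
                    context.succeed(buildResponse(sessionAttributes, speechletResponse));
                });
        } else if (event.request.type === 'IntentRequest') {
            onIntent(event.request, event.session,
                function (sessionAttributes, speechletResponse) {
                    context.succeed(buildResponse(sessionAttributes, speechletResponse));
                });
        }
    } catch (e) {
        context.fail('Exception: ' + e);
    }
};

// Dispatch each intent we declared in the interaction model; this list
// should match the intent schema exactly.
function onIntent(intentRequest, session, callback) {
    var intentName = intentRequest.intent.name;
    if (intentName === 'StatusIntent') {
        getStatus(intentRequest.intent, session, callback);
    } else if (intentName === 'TemperatureIntent') {
        getTemperature(intentRequest.intent, session, callback);
    } else if (intentName === 'AMAZON.HelpIntent') {
        getHelpResponse(callback);
    } else {
        throw 'Invalid intent';
    }
}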

Once we have the intent and slot information, we know what the user wants to do. In my code, I call the Grill service method, which uses the DynamoDB.DocumentClient from the ‘aws-sdk’ Node package (automatically available on AWS Lambda, so there’s no need to bundle it in node_modules) to fetch the last record from the database. I process the values and send them back to the Alexa code, which turns them into a sentence to return to the user.

Now that you have a sentence you want to return, you’ll have to build a response object containing the information to send back to the user. This response object contains four main pieces of information:

  • outputSpeech - The text string to speak to the user.
  • card - The information to put on the card object the user sees in the Alexa app.
  • reprompt - The text string to speak if the user doesn’t respond within a small time period.
  • shouldEndSession - A flag to either close the session or keep it open for the user to respond.

The sample code from the template you selected already has some utility methods to help create this response object, so I’d recommend using those to handle the response back to the user.
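
For reference, the template’s speechlet-building helper boils down to something like this, assembling those four pieces into the structure Alexa expects:

// Assemble the four pieces above into the speechlet response format.
function buildSpeechletResponse(title, output, repromptText, shouldEndSession) {
    return {
        outputSpeech: {
            type: 'PlainText',
            text: output // what Alexa speaks aloud
        },
        card: {
            type: 'Simple',
            title: title,
            content: output // what shows up in the Alexa app
        },
        reprompt: {
            outputSpeech: {
                type: 'PlainText',
                text: repromptText // spoken if the user doesn't answer
            }
        },
        shouldEndSession: shouldEndSession
    };
}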

Now that we’ve handled the incoming request, fetched the data, and sent back a response, our function is ready to test. You can view the completed Alexa skill file here.

Testing Your Skill

There are a few different ways you can test out your skill. The first way I’d test it is by directly testing the Lambda function using the web tools. If you’re logged into the AWS console, you can load up the Lambda function, configure its test data, and test it there. The nice part about testing here is that you immediately see the response and any console logging for that request. Hopefully, when you test the function, you’ll see your response object with all the speech you’ve generated. It gets a little tricky if you have to work with session data, but it works well for simple tests.

Another way you can test your skill is on the Amazon Alexa Developer site, the same site you used to register your skill and define its interaction model. If you bring up your skill, you can go to the ‘Test’ section and exercise your skill with the ‘Service Simulator’. What’s cool about this section is that you can hear the responses it would generate on the device: you can enter an utterance to test, see the request and response from Lambda, and then click a play button to hear what the voice generation actually sounds like. On this page, you can also test the Alexa speech generation directly by typing in anything you’d like to hear, which helps you figure out whether certain words might be problematic. Related to speech output, if you need finer control over how things are said, you can use Speech Synthesis Markup Language (SSML) to do things like insert audio files or speak words differently.
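
For example, to use SSML you switch the outputSpeech type from PlainText to SSML and wrap the text in a speak tag; a quick sketch, with a placeholder audio URL:

{
  "outputSpeech": {
    "type": "SSML",
    "ssml": "<speak>Your food is ready. <audio src='https://example.com/ding.mp3' /> Enjoy!</speak>"
  }
}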

Service Simulator

The last way you can test your skill is by simply asking your device. Your skill will be set up on the device tied to your developer account, and you can test it there. This is a good way to hear how the different utterances actually sound and to figure out whether you need to change them or add new ones if it’s difficult to articulate exactly what you’d like it to do.

Once you’ve completed testing your skill, you can either leave it in development, so it’s only available to you, or you can send it for certification so anyone can use it. Amazon doesn’t charge to put your skills in the Alexa store, so if you think anyone can benefit from your skill, there’s no reason not to try to get it certified. The process takes anywhere from a few days to a week, and you’ll want to be sure you’ve reviewed the submission checklist to ensure you’ve satisfied all the requirements. Some things that are easily overlooked before submission: not supporting the help intent, not responding to the stop/cancel intents, and not having enough variations of your utterances. You’ll also want to be sure that any example phrases in your description have a comma after ‘Alexa’ and include all the needed phrase words. One nice touch of the review process is that any issues they find or recommendations they have will be emailed to you, with examples specific to your code, so they’re pretty painless to fix.

Conclusion

Here we’ve walked through the steps to create an Alexa skill that can read the temperatures out of my Grill Log database. Amazon has made it pretty simple to create these skills; all you need to worry about is figuring out how you will connect to your data. In my case, I was simply calling into another AWS product, but you could be calling into anything available on the internet. The format for invoking skills on the device still feels a bit awkward at times, but I bet that gets improved over time. I’m amazed at how many things the Echo has already integrated with, and it’ll be interesting to see if Apple ever creates something similar with Siri and HomeKit to compete in this space.

One common question I often hear is if or when the Echo will support push notifications. Right now the device only wakes when it’s told to (with the exception of timers and alarms), so there is no way to have something wake it the way a notification arrives on your phone. This seems like a tricky problem to solve, as you wouldn’t want your Echo talking to you all day as notifications come in. I’d guess at some point they will release this feature, perhaps with users opting in so they know exactly what they’ll be notified about.

Hopefully, this article shows you how simple developing Alexa skills can be and how you can let your users interact with your services using just their voice.

(YouTube video can be found here.)