Connecting Alexa to a Raspberry Pi--Why bother? A hellish odyssey.
Let's start at the beginning. I got myself an Echo Dot for my birthday, and in my typical overenthusiasm for all things new and fresh, immediately set about trying to program new skills into it. This is what I do for almost any new piece of technology I come into contact with. This particular device cost me almost 2 months of effort and over $75 in additional unexpected expenses to achieve what I wanted.
For the TLDR, the final setup I used was as follows: Alexa -> Alexa Skills Kit skill -> Lambda Node.js function -> SQS message storage -> Node.js + Express receiver on a raspberry pi
Alexa is the voice-activated service used by the Echo, Echo Dot, and other Amazon smart home devices. The cheapest ones will cost you somewhere around $35, but the setup is almost painless. There is an echo app on most mobile devices, which once installed just requires you to walk through a basic internet connection setup. Once that's done, Alexa can answer basic questions like what's the weather, time, and perform simplistic functions like creating a to do list. Great right? Not really, since almost everything she can do out of the box can already be done by any smartphone--but she can't make calls or send texts unless you connect her to your phone. That's also simple enough to do, but we're still in the very shallow end of usefulness of a device that is touted as a speaker system, but has atrocious sound quality (subjective opinion here, but a pair of $35 speakers I compared it to from Target, which were also awful, were far superior to the sound quality of the Echo Dot).
Alexa cannot connect to anything in your local network since it relies on an internet-based service to process all of its commands. That's a gross oversimplification (and borderline untrue, since I'm about to detail how I rigged mine to do exactly that) but the gist of the Echo is that it's meant for things like public API connections or pre-baked skills.
Its speech pattern recognition software should be running separately from the functional pieces it relies on (lambda), but that doesn't appear to be the reality. Instead, Alexa can connect easily to any bluetooth or "pre-approved" smarthome devices it has an app already written for, such as Philips Hue or Harmony Hub, but it CANNOT connect to any other smart home devices you may have if this sort of app doesn't already exist in its library. Odds of her having an app for your smart device that actually works are low, and I don't know if this is because Amazon wants to force customers to buy its own name brand products or if it's just too lazy to dedicate developer hours to building useful software for one of its most popular products. Case in point, I wanted to connect Alexa to my smart TV, a Vizio, but there is no app for this SmartCast TV yet. I found one for a Samsung smart TV, but it had a measly 2 star rating, and another for Roku with 2 and a half...so you really are trying your luck with the available skills in Alexa's library. I find no fault with the devs who are contributing their free time to working on these projects, however, because I am one of them. It's just ridiculously difficult to connect Alexa to your LAN.
In case it's still unclear, I was very disappointed to find all of this out. There are already individual custom apps for most smart-home devices, and some of them can even be voice controlled through your phone already, so finding out that Alexa cannot interface with smarthome devices when she doesn't already have built-in software is pathetically lame. I wasn't ready to give up though.
Now you'd think the next step would be to simply create a custom app that could connect to whichever device you're trying to use. You'd be right, that is the next step, but you'd be dead wrong if you thought that process would be simple.
For starters, unless you want to shell out tons of additional money for one of the "hub" devices to connect through (Harmony, AnyMote, both with 2.5/5 stars in the Skills library), you'll need to set up your own local server, which will still cost money (just significantly less, we're talking $80+ for a hub, and ~$30 for a local server). For a hub that may or may not work (based on reviews) with Alexa's prebuilt apps, I wasn't willing to spend another $80 on top of the initial echo purchase. In my case, I bought a raspberry pi for this purpose.
Let me back up for a minute. Alexa has multiple types of custom skills, and smart home is just one of many. She can perform basic programming functionality through an A-B response system, something akin to A:"Alexa, where is Alexandria?" B:"Alexandria is in Egypt." She can also take basic input from users, such as A:"I'd like to go on a trip." B:"Where would you like to go?" A:"Chicago." B:"So you would like to go to Chicago, then?" A:"Yes." There is absolutely nothing wrong with this sort of basic A-B response stuff. It's great if you want to make an app to answer a question you already know the answer to, or have her google an answer for you or something (just don't bother trying to get your app approved if it contains the word Google, apparently it will be auto-rejected because of some sort of pissing contest going on between Amazon Echo and Google Home). But these skills are rudimentary, and not particularly useful for making Alexa part of your smart home. And once again, one is left to wonder why they can't just ask their phone to do something similar, as there are millions of apps already available for download that do practically everything you could program Alexa to do so far (just with significantly less effort). To be clear, you could program your phone to do practically everything Alexa can do, including hooking it up to a smart TV (which already has an individual app on my phone, in this case), the one caveat being that if you don't have your phone on you all the time, you have no way of accessing those programs conveniently. Additionally, some of the functionality available through both Alexa's skills and the custom apps for each smart home device you may own is extremely limited in scope.
For example, I have Philips Hue smart bulbs, and while Alexa can do things like change a scene or dim the lights, if I ask her to turn the kitchen red, she cannot do it unless I create a specific scene for a red hue. If I go in through the Philips Hue app instead, I can change a specific room's colors, but I cannot "group" my lights beyond 1 room (as in, if I want to put 1 light into 2 different groups, it's impossible), so setting up something like a "party lights" feature is out. Since I have an open kitchen/dining room area, it's obnoxious that I have to give 2 commands every time I want something changed in that wider area. There are additional third-party apps for something like this, and some of them seem to be quite powerful, but they are, once again, hit or miss on reliability, on being functional at all, and on serving the specific custom purpose you're looking for.
Worse than all of that, Alexa's "testing" functionality on the amazon site is not functional for any use case involving user input, most custom use cases, or anything outside of basic A-B responses involving no user feedback. There is a third-party testing site someone wrote, but again, it does not provide you with any response errors or any other useful information about what is actually being returned from the Echo. I had to program Alexa to literally read out JSON response strings to me as I was going through programming her to discover all this, and if you've never done that I highly discourage it. It is very user-unfriendly and not particularly useful unless you are desperate and determined.
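For anyone desperate enough to try the read-it-back trick, here is a minimal sketch of the idea rather than the exact code from this project. It assumes the standard Alexa custom-skill response envelope (version, response.outputSpeech, shouldEndSession); the helper names speak and debugSpeak are made up for illustration.

```javascript
// Build a minimal plain-text Alexa response envelope.
function speak(text, endSession = true) {
  return {
    version: '1.0',
    response: {
      outputSpeech: { type: 'PlainText', text },
      shouldEndSession: endSession,
    },
  };
}

// Hypothetical debugging helper: make Alexa read back the interesting parts
// of the incoming request as a JSON string. Keys whose values are undefined
// are dropped by JSON.stringify, so a missing intent simply does not appear.
function debugSpeak(event, maxLength = 300) {
  const request = event && event.request ? event.request : {};
  const summary = JSON.stringify({
    type: request.type,
    intent: request.intent,
  });
  // Truncate so she doesn't recite braces at you for a full minute.
  return speak(summary.slice(0, maxLength));
}
```

Returning debugSpeak(event) from the Lambda handler in place of the real response is enough to hear what actually arrived, which is about as pleasant as it sounds.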
Back to the real subject here, setting up a raspberry pi is not what I would consider modern or convenient either. They expect you to own an HDMI monitor, and hooking up the pi to a TV screen alone won't work. But let's get real, who owns an HDMI monitor in this day and age? I live in a world where everyone uses a laptop. And hooking up the pi to a laptop screen will NOT work since HDMI input and output are 2 very different beasts, and to achieve that sort of configuration you've got to do a lot of SSH/virtual screen program downloading. I had to download and use no less than 3 programs to get my pi set up, those being Putty, IP Scanner, and VNC Viewer. And I still wound up setting it up through the command line because VNC Viewer doesn't work out of the box for raspberry pi setup. Next problem: the raspberry pi requires a USB power source. Any cell phone charger will plug in, but that's just one more expense in this ridiculous effort to hook Alexa up to my TV, because if I used my own cell phone charger for the pi, I wouldn't be able to use it for my phone, since the pi has to remain on at all times to work as a continuous local server. Additionally, raspberry pis are picky about power. USB chargers all put out a nominal 5V, but plenty of cell phone chargers can't supply the current a pi wants (the Pi 3 is specced for a 2.5A supply), and an underpowered pi gets flaky, so a random charger is a gamble. We're now up to a total expense of about $70-$80 for this entire setup. Want to run your raspberry pi without an ethernet cord (on the wifi)? That's going to add another $7, or $1 for an ethernet cord, unless you buy the more expensive Pi 3+ that comes pre-installed with wifi drivers and a network adapter (around $60 for a starter kit).
Of course, keep in mind you can potentially bypass the hassle of the local server setup by throwing money at the problem and buying a hub. But there is no guarantee your hub will work with the pre-made Alexa skills for your particular hub, and if it doesn't, then you just wasted even more money.
Fantastic, you've now got a local server set up, all you have to do is connect it to the Echo. Simple, right? WRONG.
Alexa requires several things to actually have any sort of custom functionality, those being a developer "Skill" or "Smart Home" application, and a lambda function. By themselves, they can perform the A-B type responses mentioned above. However, to get everything hooked up to a local network server, you're going to need more than that. You'll either need an API build that is publicly hosted somewhere where the lambda function can hit it (PITA and a security risk and expensive), or an SQS setup (uber PITA), or to plug your local server directly into the echo (less painful, but now you'll likely need a wifi dongle; also note that I did not attempt this kind of setup so I cannot confirm that it would even work). Lambda functions and Alexa skills are housed on two separate domains with 2 separate logins by Amazon (don't get me started on their UI/UX), with an extremely convoluted process to connect the two involving ARN codes, endpoints, unnoticeable dropdown menu options, and a broken/not-fully-functional testing interface as mentioned above.
Again, this step is avoidable with the purchase of a hub, but the likelihood of your hub working with your smart home device and the echo is around 50% according to user reviews. Not great odds.
I went the route of SQS, so I would not have to expose my raspberry pi publicly to the internet, and could just run it as a local server.
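The reason this keeps the pi private is that the pi only ever makes outbound requests: it asks the queue whether anything is waiting, and nothing on the internet needs to reach in. A minimal sketch of that receiving side follows, assuming an aws-sdk v2 style SQS client passed in as sqs and a message body that is a JSON string; pollOnce and the handle callback are illustrative names, not part of any library.

```javascript
// Long-poll the queue once and hand each parsed message body to `handle`.
// `sqs` is assumed to be an aws-sdk v2 client, e.g.:
//   const AWS = require('aws-sdk');
//   const sqs = new AWS.SQS({ region: 'us-east-1' });
async function pollOnce(sqs, queueUrl, handle) {
  const result = await sqs.receiveMessage({
    QueueUrl: queueUrl,
    MaxNumberOfMessages: 1,
    WaitTimeSeconds: 20, // long polling: wait up to 20s for a message
  }).promise();

  const messages = result.Messages || [];
  for (const message of messages) {
    await handle(JSON.parse(message.Body));
    // Delete only after the handler succeeded; if it threw, the message
    // becomes visible again later and gets retried.
    await sqs.deleteMessage({
      QueueUrl: queueUrl,
      ReceiptHandle: message.ReceiptHandle,
    }).promise();
  }
  return messages.length;
}

// Usage sketch (runs forever):
//   (async () => { for (;;) await pollOnce(sqs, QUEUE_URL, runCommand); })();
```

The tradeoff is latency and a constantly running loop on the pi, in exchange for never opening a port on the home router.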
Let's start with the basics on how this is set up. The first thing you'll need (top level) is an endpoint application skill that Alexa can understand, and hit your lambda function from. This is set up through the Alexa skills section of one of Amazon's many sites. The actual setup here is clunky, overly verbose, and not particularly intuitive, but if you have no intention of testing anything, it otherwise functions fine. The intent->utterances section makes sense after you've messed around with it for a couple hours (and are a programmer...) but it will take another few hours to make sense of the variable or user input sections that can be appended to those utterances in dropdown-ish menus. The clunk is real, and it's cumbersome to deal with. If you spend an hour ahead of time watching some videos on how to use it, you might be able to shave off a few hours in learning time, but many of the videos I found were for a previous interface that is no longer visible on the actual Amazon skills page.
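To make the intent->utterances->variables relationship a little less abstract, here is a rough sketch of one intent written as a plain JavaScript object. The intent name, slot name, and utterances are invented for illustration (AMAZON.US_CITY is one of Amazon's built-in slot types), and the console may store things in a different shape than this. The small checker catches the easiest mistake to make: referencing a {variable} in an utterance without declaring it as a slot.

```javascript
// Hypothetical intent for the "I'd like to go on a trip" example.
const planTripIntent = {
  name: 'PlanTripIntent',
  slots: [{ name: 'city', type: 'AMAZON.US_CITY' }],
  samples: [
    'i would like to go on a trip',
    'i want to go to {city}',
    'take me to {city}',
  ],
};

// Return the names of any {slots} used in sample utterances that the intent
// never declares. An empty array means the utterances and slots line up.
function undeclaredSlots(intent) {
  const declared = new Set((intent.slots || []).map((slot) => slot.name));
  const missing = new Set();
  for (const sample of intent.samples) {
    for (const match of sample.matchAll(/\{(\w+)\}/g)) {
      if (!declared.has(match[1])) missing.add(match[1]);
    }
  }
  return [...missing];
}
```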
Once you have your skill set up, the next step is to set up a lambda function on the backend for it to access when it fires off that skill. This interface was...completely different. I am not sure if it was better or not. Equally overly-verbose, slightly less clunky, not at ALL intuitive, you will have to navigate a maze of hidden dropdown menus, security options and more to make this work. Pro tip, the npm aws-sdk is pre-installed on lambda functions, you don't need to include the node modules directory in a zipped upload if that's the only package you're planning to add or use. Took me at least a day or two of head-banging before I petitioned help from a coworker since that is not really documented anywhere (surprise!). The testing interface, once again, doesn't seem to be fully functional. I tried setting it up with an event.request.object in JSON as expected by the code, but no dice, still bailed out constantly thinking that the intent was being called on an undefined. Got around this with some ternary logic pointing to a default...not pretty but it worked. The expectation is that at the end of having an alexa skill and a lambda function hooked up successfully, you'll be almost ready to go. This is a comforting lie you'll try to convince yourself of for a while.
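The "default when the intent is undefined" workaround looks roughly like the sketch below. This is an illustration, not the code from this project: the intent names and replies are invented, and only the event.request.intent.name path and the response envelope come from the Alexa request/response format. One possible cause of the undefined is a test event without an intent object at all; a LaunchRequest, for instance, carries no intent.

```javascript
// Hypothetical intent names mapped to what Alexa should say back.
const REPLIES = {
  GameTimeIntent: 'Starting game time.',
  UnknownIntent: "Sorry, I didn't catch that.",
};

// Read the intent name defensively, falling back to a default instead of
// blowing up when event.request or event.request.intent is missing.
function intentName(event, fallback = 'UnknownIntent') {
  const request = event && event.request ? event.request : {};
  return request.intent && request.intent.name ? request.intent.name : fallback;
}

// Build the plain-text response for whatever intent (or default) we got.
function respond(event) {
  const name = intentName(event);
  const known = Object.prototype.hasOwnProperty.call(REPLIES, name);
  const text = known ? REPLIES[name] : REPLIES.UnknownIntent;
  return {
    version: '1.0',
    response: {
      outputSpeech: { type: 'PlainText', text },
      shouldEndSession: true,
    },
  };
}

// In Lambda this would be wired up as:
//   exports.handler = async (event) => respond(event);
```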
Because there is another step--setting up the messages to send to SQS (and be received off your server). This step was one of the most excruciating.
The SQS interface is nice looking. Intuitive, friendly, no crazy hidden dropdowns or missing functionality like multi-file inline editing (cough lambda). It will lead you into a false sense of security, because once you try to set up security for this thing...you'll probably cry. I set up no less than 6 accounts (key + secret, cognito with auth, cognito without auth, admin permissions, hard coded, everyone has access...) trying to get the stupid credentials to validate for a basic SQS send through lambda. Nothing worked. The solution (for me) was to use a config.json file that had my credentials in it, and just import that file. I deleted out probably 5 roles/users after that nonsense was taken care of, not counting the permissions I yanked from the lambda function's supplied role/policy.
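For reference, the sending side ends up being only a few lines once the credentials behave. The sketch below assumes the aws-sdk v2 client that lambda already provides and takes it as a parameter, so it does not care how the client got its credentials; the config.json route is shown in a comment. The message shape (command plus args) and the function names are made up for illustration. Granting sqs:SendMessage on the function's execution role is the route AWS documents, for what it's worth, even though it is the one that refused to cooperate here.

```javascript
// Build the parameters for an SQS sendMessage call. The body is a JSON
// string; the { command, args } shape is an assumption for this sketch.
function buildCommandMessage(queueUrl, command, args = {}) {
  return {
    QueueUrl: queueUrl,
    MessageBody: JSON.stringify({ command, args }),
  };
}

// `sqs` is assumed to be an aws-sdk v2 SQS client. One way to build it with
// credentials from a local file, as described above:
//   const AWS = require('aws-sdk');
//   AWS.config.loadFromPath('./config.json');
//   // config.json: { "accessKeyId": "...", "secretAccessKey": "...", "region": "..." }
//   const sqs = new AWS.SQS();
async function sendCommand(sqs, queueUrl, command, args) {
  const params = buildCommandMessage(queueUrl, command, args);
  const result = await sqs.sendMessage(params).promise();
  return result.MessageId;
}
```

Keeping long-lived keys in a file bundled with the function works, but treat that file like a password and keep it out of version control.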
All in all, it's taken me over two months to get my Echo to be what I consider the bare minimum of useful as a smart home device. I work as a professional developer with years of experience in the industry (although, full disclosure, this was my first stab at pretty much anything deep-dive AWS or raspberry pi related), and I've worked with all of the languages involved in setting up a custom Echo skill for multiple years. I can only imagine how horrifically painful this process must be for someone who is not a developer, does not understand how (or what to do) to achieve something like this, and who doesn't have money to throw at the problem. I imagine for most people, the echo is a decorative brick, a circular (and lousier) voice-activated phone.
For what it is, I give it 2/5 stars. For what it SHOULD be, I give it a -2/5 stars.
In the end, my Echo-Pi setup can now process any LAN API requests I want it to for my smartTv, my PhilipsHue Bulbs or anything else that isn't available over the wider internet. I can tell it to turn on my PS4 and it will switch on the TV, change the input to the appropriate setting, send a WOL request to my PS4 (hit or miss), and dim the lights for some game time all at once. This is one example of the sort of things the Echo can do once it has this kind of setup, whereas before it could only dim the lights.
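As a taste of the pi side, here is a minimal sketch of the wake-on-LAN piece using only Node's built-in dgram module. The magic packet format is standard (6 bytes of 0xFF followed by the target MAC address repeated 16 times); the broadcast address, port 9, and the device helper names in gameTime are assumptions for illustration, and whether a given console actually wakes up from it is, as noted, hit or miss.

```javascript
const dgram = require('dgram');

// Build a wake-on-LAN magic packet: 6 x 0xFF, then the MAC repeated 16 times.
function buildMagicPacket(mac) {
  const hex = mac.replace(/[:-]/g, '');
  if (!/^[0-9a-fA-F]{12}$/.test(hex)) {
    throw new Error(`Invalid MAC address: ${mac}`);
  }
  const macBytes = Buffer.from(hex, 'hex');
  return Buffer.concat([Buffer.alloc(6, 0xff), ...Array(16).fill(macBytes)]);
}

// Broadcast the packet on the LAN. Address and port are common defaults,
// not something verified against any particular device.
function sendWakeOnLan(mac, address = '255.255.255.255', port = 9) {
  return new Promise((resolve, reject) => {
    const socket = dgram.createSocket('udp4');
    socket.once('error', reject);
    socket.bind(() => {
      socket.setBroadcast(true); // must be set after the socket is bound
      socket.send(buildMagicPacket(mac), port, address, (err) => {
        socket.close();
        if (err) reject(err);
        else resolve();
      });
    });
  });
}

// Hypothetical "game time" routine: `devices` is an object of async helpers
// you would write yourself for your own TV, console, and lights.
async function gameTime(devices) {
  await devices.tvPowerOn();
  await devices.tvSetInput('HDMI-1');
  await devices.wakeConsole();
  await devices.dimLights(30);
}
```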
And Amazon wonders why no one is programming awesome software for their device.