Technical Lessons from Ship 1 – Building My First Twitter Bot

For my first ship project I built a Twitter bot, @realHellToupee. Below is the technical digest of lessons learned; head over to the non-technical post for other lessons learned and an intro to what these ship projects are all about :)

I thought this project was going to be super straightforward. I couldn’t have been more wrong.

The plan was simple: scan Twitter for new tweets containing “donald trump,” replace all instances of “trump” with “WhISIS,” and post the result back to Twitter. I ran into a host of snags along the way, and this post documents what I learned for future reference. To read the source code with comments, head to GitHub. Speaking of documenting things…

Documentation

The first lesson I learned was about clear documentation. I initially planned to use the REST API to search for tweets to edit. I needed to search for recent tweets, and thus had to pass a date parameter to the API.

Since I started by forking another Twitter bot’s code, I found out about a great library called Twit that streamlines accessing the Twitter API. Unfortunately, Twit’s documentation contained an ambiguous date, so I couldn’t tell whether the proper format was MM-DD-YYYY or DD-MM-YYYY.

Clearer documentation here would have saved me a half-hour of searching through the various tabs of Twitter documentation I had open.

On the flipside, I did get to make my first pull request :)

Node Libraries

Getting dates with javascript

No not those kinds of dates!

Getting (calendar) dates in JavaScript is quite weird. In the end I switched to the Streaming API instead of the REST API, since Streaming gives the most recent results (whereas REST sorts by some kind of relevance ranking), so I no longer needed to pass a date at all.

Here’s the code for how to do it.

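var today = new Date();
var dd = today.getDate();
var mm = today.getMonth() + 1; // getMonth() is zero-based: January is 0!

var yyyy = today.getFullYear();
// zero-pad single-digit days and months
if (dd < 10) {
  dd = '0' + dd;
}
if (mm < 10) {
  mm = '0' + mm;
}
var formatted = dd + '/' + mm + '/' + yyyy; // DD/MM/YYYY; we'd have to change this for twitter
console.log(formatted); // in a browser you could assign this to a form field instead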

While this wasn’t useful this time around, I could use this to name logfiles by date if I were outputting to *.log files instead of just console.log() for my debugging.
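As a rough sketch of that log-file idea (the helper names pad2 and logFileName and the "helltoupee" prefix are made up for illustration, not taken from the bot's source), the same zero-padding logic can be wrapped in a function that builds a date-stamped file name:

```javascript
// Zero-pad a number to two digits, e.g. 7 -> "07".
function pad2(n) {
  return n < 10 ? '0' + n : String(n);
}

// Build a name like "helltoupee-2016-01-05.log" from a Date (local time).
function logFileName(prefix, date) {
  var yyyy = date.getFullYear();
  var mm = pad2(date.getMonth() + 1); // getMonth() is zero-based
  var dd = pad2(date.getDate());
  return prefix + '-' + yyyy + '-' + mm + '-' + dd + '.log';
}

console.log(logFileName('helltoupee', new Date(2016, 0, 5))); // helltoupee-2016-01-05.log
```

Putting the year first means the files sort chronologically when listed alphabetically.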

Reading secrets files synchronously and asynchronously with node

I needed a way to load the Twitter secrets (various API keys) into the Twit library so it could make requests for me. My solution was to edit the .gitignore to ignore some specific text files in which I had stored my secrets. This way I could git push to my heart’s content and still keep my secrets safe.

Node has a built-in library called fs to handle reading and writing files. However, using fs.readFile() failed to load the secrets, while using fs.readFileSync() worked fine. My suspicion (which StackOverflow helpfully confirmed) is that the first command makes an asynchronous call, giving execution back to the program without waiting for the disk IO operations to conclude. This resulted in my code calling the Twit library prematurely, before the secrets files had a chance to be read.
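Here is a minimal sketch of the difference. The file name and the fake key are assumptions made up so the example can run on its own; the real secrets files in the repo may be named differently.

```javascript
var fs = require('fs');
var os = require('os');
var path = require('path');

// Hypothetical secrets file, created here only so the sketch is runnable.
var secretPath = path.join(os.tmpdir(), 'consumer_key_example.txt');
fs.writeFileSync(secretPath, 'not-a-real-key\n');

// Synchronous: blocks until the file is read, so the value is ready on the next line.
var syncKey = fs.readFileSync(secretPath, 'utf8').trim();

// Asynchronous: returns immediately; the contents only arrive inside the callback.
var asyncKey;
fs.readFile(secretPath, 'utf8', function (err, data) {
  if (err) throw err;
  asyncKey = data.trim();
  console.log('inside callback:', asyncKey);
});

console.log('sync read:', syncKey);
console.log('async read, checked too early:', asyncKey); // still undefined here
```

With fs.readFile(), anything that needs the secret (such as constructing the Twit client) has to happen inside the callback or after it has fired. For a handful of small files read once at startup, the blocking version is the simpler choice.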

Repeat a subroutine periodically with node

How to make my code periodically run a subroutine, like updating the current date or polling the Twitter streaming API for fresh quotes to botify? The answer is pretty simple: use either setInterval or setTimeout.

There are subtle differences in timing between the two, and setTimeout will run the code only once unless you schedule it again from inside its own callback. This article by John Resig goes into more detail, including how to use the timer IDs to clear them, which is useful if you want a function to repeat periodically until some condition is fulfilled.
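A small sketch of both patterns follows. The counters, the 10 ms delay, and the "stop after three runs" condition are placeholders for illustration; a real bot would use a much longer delay and its own stopping rule.

```javascript
// Pattern 1: setInterval repeats until the timer is cleared by its id.
var ticks = 0;
var intervalId = setInterval(function () {
  ticks += 1;
  console.log('interval tick', ticks);
  if (ticks >= 3) {
    clearInterval(intervalId); // stop repeating once the condition holds
  }
}, 10);

// Pattern 2: setTimeout fires once; re-arming it inside the callback makes it repeat.
var polls = 0;
function poll() {
  polls += 1;
  console.log('timeout poll', polls);
  if (polls < 3) {
    setTimeout(poll, 10); // schedule the next run only after this one has finished
  }
}
setTimeout(poll, 10);
```

The second pattern guarantees a gap between the end of one run and the start of the next, which matters if the subroutine itself can take a while.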

Check my code to see it in practice, along with helpful comments regarding other implementation details.

Node.js and Packages

I forked the code from darius’s metaphor-a-minute project, which helpfully provides the following build instructions:

npm install node-restclient@0.0.1

npm install twit@1.1.6

npm install express@2.5.9

node metaphor.js

So of course the first thing I did was update express to 4.x, whatever its latest version was. Big. Mistake.

See, programs are really built on top of lots of other programs, each of which has its own spidering set of dependencies, hacks, and odd bugfixes. It’s a marvel any of it hangs together at all, which is why specific versioning is so key. (No, it did not work after what I did :( )

Updating all the packages instead of npm install-ing the version specified in the readme.md is a bad idea.

I remembered seeing a nifty way to install dependencies for other programs: just typing npm install in the local directory of the source code would automatically install exactly the right versions of needed modules. Turns out it’s simple to generate an npm manifest in the form of a package.json file that lets you package your programs like this too.

Simply type npm init and you’re off to the races. Thankfully the defaults are pretty sensible (including the nifty ISC – Internet Software Consortium – license, essentially a simpler form of the permissive MIT license).

Tweaking the package.json defaults

Importantly, npm init gave me a bit of a headache with the caret operator ^. You can see the series of changes I made trying to get the project to build correctly on Heroku. That first link will clue you in to the nuances of the ^ operator in npm manifests, but essentially it means “accept all versions greater than or equal to the version specified, so long as you don’t roll over to a new major release.” (For versions below 1.0.0 the caret is stricter and won’t cross a minor release either.) This answer on Stack Overflow has good guidelines on which operator (^ or ~) to use when.
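To make the rule concrete, here is a hand-rolled sketch of how the two operators bound a version. This is not the semver package that npm itself uses; the function names are made up, and it only handles full three-part versions with no pre-release tags.

```javascript
// Parse "1.2.3" into [1, 2, 3]. Pre-release tags are not handled.
function parse(version) {
  return version.split('.').map(Number);
}

// Compare two parsed versions: negative, zero, or positive.
function compare(a, b) {
  for (var i = 0; i < 3; i++) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return 0;
}

// Exclusive upper bound for ^base: bump the leftmost non-zero part.
function caretUpperBound(b) {
  if (b[0] > 0) return [b[0] + 1, 0, 0];
  if (b[1] > 0) return [0, b[1] + 1, 0];
  return [0, 0, b[2] + 1];
}

// Exclusive upper bound for ~base: bump the minor part.
function tildeUpperBound(b) {
  return [b[0], b[1] + 1, 0];
}

function satisfiesCaret(version, base) {
  var v = parse(version), b = parse(base);
  return compare(v, b) >= 0 && compare(v, caretUpperBound(b)) < 0;
}

function satisfiesTilde(version, base) {
  var v = parse(version), b = parse(base);
  return compare(v, b) >= 0 && compare(v, tildeUpperBound(b)) < 0;
}

console.log(satisfiesCaret('4.13.4', '4.0.0')); // true: same major release
console.log(satisfiesCaret('5.0.0', '4.0.0'));  // false: rolled over to a new major
console.log(satisfiesCaret('0.3.0', '0.2.5'));  // false: below 1.0.0 the minor is locked
console.log(satisfiesTilde('1.3.0', '1.2.3'));  // false: ~ only allows patch updates
```

Pinning an exact version (no operator at all) is the most conservative option when a project is known to work with one specific release.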

You’ll see from my manifest that my mix is very strange – my npm is updated, but my node is very old. Should have updated my own system before starting development!

Deploying on the Cloud

As mentioned previously, storing secrets requires some thought when deploying remotely. My solution was to use a local secretstore for local development, and inject environment variables for remote deploy.

Node.js can happily read environment variables, and it’s easy to provide a graceful fallback in case they’re not present.

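var http = require('http');

function keepAlive(){
  // use the port from the environment if set (Heroku provides PORT), else fall back to 5000
  http.createServer(handleRequest).listen(process.env.PORT || 5000);
}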

Heroku’s version, “config vars,” can be found here, along with advice on how to deploy a node app on heroku (e.g. configuring the Procfile).
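The same fallback idea works for the secrets themselves. The sketch below is an illustration rather than the bot's actual loader: loadSecret, its injectable env argument, and the demo file are all made up here, while CONSUMERKEY is one of the variable names the bot reads.

```javascript
var fs = require('fs');
var os = require('os');
var path = require('path');

// Prefer the environment variable; fall back to a local (git-ignored) file.
// `env` is injectable so the function can be tried without touching process.env.
function loadSecret(name, fallbackPath, env) {
  env = env || process.env;
  if (env[name]) {
    return env[name];
  }
  return fs.readFileSync(fallbackPath, 'utf8').trim();
}

// Demo with a throwaway file standing in for the local secret store.
var demoPath = path.join(os.tmpdir(), 'consumerkey_demo.txt');
fs.writeFileSync(demoPath, 'key-from-file\n');

console.log(loadSecret('CONSUMERKEY', demoPath, { CONSUMERKEY: 'key-from-env' })); // key-from-env
console.log(loadSecret('CONSUMERKEY', demoPath, {})); // key-from-file
```

On a remote deploy the file never needs to exist, since the environment variable is found first; locally the file is read and nothing secret has to be exported in the shell.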

Sleeping on the cloud

I thought everything was hunky dory and Heroku was working (I saw a couple tweets go out), but then I saw the posts stop. Checked the logs using heroku logs --app APPNAME and it turned out that periodically my app was shutting down. No idea why!

Eventually I learned that Heroku had changed its policy, and free dynos must now sleep for 6 hours out of every 24-hour period. I did learn a nifty way to force Heroku to restart my dyno without using git push heroku master to force a new build: heroku restart. Previously I had been shuffling whitespace around to create a new git commit.

Running after logging out

I ended up switching to a different cloud provider instead, simply ssh’ing into an Ubuntu instance to configure an environment to run my node.js app. There were a couple of sticking points though, chief among them figuring out how to get a command to continue running after logging out of ssh.

The nifty command nohup is supposed to accomplish this, but while it worked as advertised for a while, eventually I had to manually redirect the standard output, as per the ServerFault.com link above.

My command for launching my app ended up looking like this:

nohup env ACCESSTOKEN="secret" ACCESSTOKENSECRET="secret" CONSUMERKEY="secret" CONSUMERSECRET="secret" nodejs helltoupee.js >> nohup.out 2>&1 &

disown -a

In Ubuntu the name node already belongs to a different package, so nodejs must be invoked instead. nohup and the standard output redirect act as previously described, while nohup.out is where the console output is supposed to go. The final & starts the process in the background, returning control of the shell to me for further commands, while disown -a removes all jobs from the current shell’s job table so they aren’t sent a hangup signal when the shell exits (good for us, since we’ll lose our current shell once we disconnect from ssh).

Final advice for Ubuntu and node.js

To get everything working hunky dory, here were the commands I used to install nodejs.

apt-get update
apt-get install -y curl
apt-get install -y git
curl -sL https://deb.nodesource.com/setup | bash -
apt-get install -y nodejs
git clone https://github.com/the-bobo/helltoupee.git
cd helltoupee
npm install --production

While what I did above will work, it’s worth noting the security risks of piping arbitrary code from a URL into your shell and executing it, which is what the curl | bash - line does. See here for an alternative approach.

Wrapping up

I learned a ton during this project. No doubt most of it is old hat to a seasoned programmer, but I’d say for my First Real Project (TM) as a product manager? Not too shabby. I’m excited to keep building more stuff and expanding my knowledge base.

Also, a parting mystery: for whatever reason, any time I visit the Heroku name for my app (fake example: blooming-onions-34324.heroku.com) it crashes the app. I have no idea why. If you can figure it out, let me know in the comments below!

 

 

 

bobo

 
