April 3, 2008
Guide to Getting Started with Merb and ActiveRecord

Spent a while today getting up and running with Merb, the minimalist modular alternative Ruby framework from Ezra and the good people at Engine Yard. Merb has been in a bit of chaos these recent months as it's gone through a major reworking to achieve a whole new level of performance as well as honest-to-goodness modularity, including choosing your own ORM and templating system. I've been watching Merb's development for some time waiting for it to get to a level of stability that looked safe enough to dive in; their most recent release, 0.9.2, combined with pressing needs in a few exciting new Grabb.it features, made today the day.
My first day with Merb has been mostly great, but the one thing I found really sorely missing was a tutorial on how to get started. In all fairness, the Merb team promises left and right that copious documentation will be coming as they settle down to 1.0. In the meantime, I thought I'd pitch in for any brave early adopters out there with this Guide to Installing Merb and ActiveRecord.
Acquire and Install the Source
So, my goal here was to get from 0 (no Merb on my machine whatsoever), to running a hello world for accessing the database from an existing Rails project using ActiveRecord from within Merb. The first step was to acquire the source. One of the downsides to Merb's modular architecture is the complexity involved with installing it (again, all relevant disclaimers here about how the Merb team will, I'm sure, be working to simplify and improve the process as they reach 1.0). At least for now, you actually have to get your hands on three different packages: merb-core (the base of the framework, a requirement), merb-more (has more advanced features like the command for actually creating a new app), and merb-plugins (this is where things like ORMs and templating systems live). Let's do that:
$ git clone git://github.com/wycats/merb-core.git
$ git clone git://github.com/wycats/merb-more.git
$ git clone git://github.com/wycats/merb-plugins.git
That'll get us the bleeding edge trunk version (about which more here).
Now that we've got the pieces, we need to install them, thusly:
$ cd merb-core ; rake install ; cd ..
$ cd merb-more ; rake install ; cd ..
$ cd merb-plugins/merb_activerecord ; rake install ; cd ../..
Create a New Project and Configure it for Active Record
We've got all the pieces; it's time to create our project and get it setup to use ActiveRecord. Go to the place you want to create your app and do this:
$ merb-gen app my_app
This is equivalent to the 'rails' command and will create your project directory with most of what you need. But since a basic project in Merb is assumed to be simpler than a basic project in Rails, you'll quickly notice that you don't have a models directory. Since we're actually going to need a model if we want to connect up to a database with ActiveRecord, go ahead and create that directory and create a file inside it for your model just as you would in Rails, for example my_resource.rb, which could look like this:
class MyResource < ActiveRecord::Base
end
We'll probably want a controller as well, so create a new file: controllers/my_resources.rb:
class MyResources < Application
  def show
    r = MyResource.find :first
    render r.some_method
  end
end
Notice that all Merb controllers inherit from Application just like Rails controllers inherit from ApplicationController. The naming choice there is kind of interesting because it reveals Merb's controller-centric history and philosophy (remember that the framework doesn't assume that we need models by default; it turns out there's a lot you can do with just controllers).
Since we're using ActiveRecord, we'll obviously need to tell Merb that we want it to go ahead and actually load AR as our ORM. Go into config/init.rb in your project and uncomment the line that says "use_orm :activerecord".
We're almost there! These last few steps will feel familiar from setting up a Rails app: letting the framework know about our route and database configuration. To set up your route, open up config/router.rb and, inside the 'prepare' block, add a line like this:
r.resources :my_resources
Merb's routes work pretty much like Rails's, but with a few more advanced features some of which are explained in the comments at the top of that file. If you need something other than standard RESTful routing, read those.
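If you haven't met RESTful routing before, here's a rough sketch in plain Ruby of the conventional mapping that a resources line stands for. To be clear, this is not Merb's router, just a lookup table for illustration; RESOURCE_ROUTES and route_for are names I made up for the example, and the exact set of routes Merb 0.9.2 generates may differ.

```ruby
# Illustration only: the conventional REST mapping behind a line like
# `r.resources :my_resources`. RESOURCE_ROUTES and route_for are hypothetical
# names for this sketch, not part of Merb.
RESOURCE_ROUTES = [
  # [HTTP verb, path pattern, controller action]
  ["GET",    %r{\A/my_resources\z},              :index],
  ["POST",   %r{\A/my_resources\z},              :create],
  ["GET",    %r{\A/my_resources/new\z},          :new],
  ["GET",    %r{\A/my_resources/([^/]+)/edit\z}, :edit],
  ["GET",    %r{\A/my_resources/([^/]+)\z},      :show],
  ["PUT",    %r{\A/my_resources/([^/]+)\z},      :update],
  ["DELETE", %r{\A/my_resources/([^/]+)\z},      :destroy]
]

# Return the action (and id, if the path carries one) for a request,
# or nil when nothing matches.
def route_for(verb, path)
  RESOURCE_ROUTES.each do |route_verb, pattern, action|
    next unless route_verb == verb
    match = pattern.match(path)
    return { :action => action, :id => match[1] } if match
  end
  nil
end

p route_for("GET", "/my_resources/7")
```

The "new" pattern is listed ahead of the "show" pattern on purpose, since otherwise "new" would be swallowed as an id.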
Finally, all we've got to do is configure database access and we'll be ready to roll. In Merb this looks exactly like Rails. In fact, I simply copied the database.yml file over from the Rails project that usually manages the db I wanted to access, dropped it in config/database.yml and it worked straight out of the box.
Ok! If you've made it this far, then you're probably more than ready for the big reveal. In your project directory do this:
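If you don't have a Rails project to borrow from, the file is just YAML with one section per environment. Here's a small sketch that parses a sample and pulls out one environment's settings. The adapter, database names and credentials below are invented for the example, and database_settings is my own helper, not anything Merb or ActiveRecord provides.

```ruby
require "yaml"

# A sample of what a Rails-style config/database.yml might contain.
# Every value here is made up for illustration.
SAMPLE_DATABASE_YML = <<-EOS
development:
  adapter: mysql
  database: my_app_development
  username: root
  password:
  host: localhost
production:
  adapter: mysql
  database: my_app_production
  username: deploy
  password: secret
  host: localhost
EOS

# Parse the YAML text and return the settings for one environment.
def database_settings(yaml_text, environment)
  config = YAML.load(yaml_text)
  config.fetch(environment) do
    raise ArgumentError, "no '#{environment}' section in database.yml"
  end
end

settings = database_settings(SAMPLE_DATABASE_YML, "development")
puts "#{settings['adapter']} -> #{settings['database']}"
```

Running something like this against your real file is a quick way to catch indentation mistakes before you start the server.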
$ merb
The server will start and you'll get a few log messages in your terminal that look like this:
$ merb
~ Loaded DEVELOPMENT Environment...
~ loading gem 'merb_activerecord' from ...
~ loading gem 'activerecord' from ...
~ Connecting to database...
~ Compiling routes...
~ Using 'share-nothing' cookie sessions (4kb limit per client)
~ Using Mongrel adapter
Once that settles down, browse to http://localhost:4000 and you should see the Merb welcome screen. Go to your expected url to see the result of your handiwork: i.e. http://localhost:4000/my_resources/7
If this works, you're officially up and running with Merb and ActiveRecord. If not, you'll see one of Merb's very stylish error screens and it'll be time to go to #merb on irc.freenode.net where all the friendly merbfolk hang out and are more than willing to help.
Tagged: merb, ruby, web, framework, install, 0.9.2
Posted by Greg at 9:35 PM | Comments (1)
March 7, 2008
Automating Firefox for Web Application Integration
This post explains how to control Firefox from the command line with Telnet and Ruby. After presenting some context to explain why I think this hack represents an important area of concern in contemporary web application development, I'll show how to execute it with actual install directions and code samples.
Ok, I'll say it: I think JavaScript is cool. One of my favorite effects of the move to the modern AJAX-oriented web application architecture has been the opportunity to move ever more functionality into the client. At Grabb.it, we like to say, "Anything you can implement in JavaScript is free." Instead of running on our servers, the JavaScript portion of our app runs on a distributed grid of thousands of machines maintained for us by our users. Also, despite the reputation given it by the Browser Wars, JavaScript is incredibly fun to develop in: it's lightweight and extremely flexible in a unique way that somehow forces you to constantly keep your code very closely tied to the data it's manipulating.
The one big downside to JavaScript is its runtime environment. Not only does code running in the browser confront a Gordian Knot of browser compatibility problems, but it's also irretrievably isolated from interoperating with other application code. While JavaScript libraries (like the inestimable jQuery) are increasingly proving the Alexander's sword of the browser compatibility Knot, the issue of lack of application interoperability is only just beginning to get serious. As JavaScript's innate advantages lure more and more application code into the browser, the question will be unavoidable: How do you get modules implemented in JavaScript to interact with those built in other languages that live in more traditional environments? How do you avoid duplicating all functionality that you put into the JavaScript portion of the application so that you can call it from outside the browser?
This week, trying to solve exactly these types of problems, I discovered a tantalizing avenue towards addressing some of these questions: browser automation from the command line and from scripting languages. Here was my situation.
As part of an upcoming Grabb.it project, I've built a highly interactive data browser for our customers. The JavaScript running on that page makes a series of JSON GET requests to gather all of the necessary information to compose its display and it makes a few AJAX POST requests to report back to the server on certain bits of status. But now, I wanted to trigger those POSTs programmatically on a schedule rather than waiting for customers to trigger them. The dilemma is that I'd already written this relatively sophisticated JavaScript application that makes all the necessary requests, implements the business logic, and knows how to POST in the data. I had two options: redo all of that work again in my server-side application (ick!) or figure out a way to trigger this JavaScript code by automating its runtime environment (the browser).
After a half day's research, here's what I discovered: there's a Firefox extension that allows other applications to establish JavaScript shell connections to a running Mozilla process via TCP/IP. It's called JSSH. Once you've got JSSH installed and running in Firefox, you can open a telnet connection to the browser that allows you to automate it using JavaScript commands to do things like load new pages or even manipulate the DOM on pages you've loaded. You can then automate this interaction using any scripting language with a telnet library. For the remainder of this post, I'll provide step-by-step instructions for running JSSH and for automating it with Ruby.
Install JSSH
The easiest way to install JSSH is to download the JSSH.xpi and open it with Firefox which will offer to install the extension (if you're interested in compiling Firefox with it from scratch or installing an existing binary, you should read these instructions).
Start Firefox with JSSH
Once you've got a copy of Firefox with JSSH installed, you'll need to run it. You can do this by providing the correct options when launching Firefox from the command line. On Mac OS X, that looks like this:
/Applications/Firefox.app/Contents/MacOS/firefox -jssh &
The "&" at the end of that line will background your command so it doesn't take over your terminal session.
Telnet into the JavaScript Shell
Once Firefox is running, we can use telnet to log into JSSH like so:
$ telnet localhost 9997
Trying ::1...
telnet: connect to address ::1: Connection refused
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
Welcome to the Mozilla JavaScript Shell!
>
Load a URL from JSSH
Now that we're in, we can tell Firefox to load pages for us, thusly:
var w0 = getWindows()[0]
var browser = w0.getBrowser()
browser.loadURI("http://www.urbanhonking.com/ideasfordozens")
And that's it! If the JavaScript application I wanted to run lived at "http://urbanhonking.com/ideasfordozens", we'd be done. That command would load the page and Firefox would interpret and run the JavaScript it found there.
Now, all we've got left to do is make it so that we can trigger this process from our application code. So, we'll...
Automate the Process with Ruby
Like any good scripting language, Ruby has a telnet library, which means that once we've got an instance of Firefox running with JSSH enabled, we can talk to it from Ruby whenever we want. Here's an example script that logs into the telnet shell and loads a series of URLs one at a time:
require 'net/telnet'

my_urls = ["http://urbanhonking.com/ideasfordozens", "http://atduskmusic.com", "http://grabb.it", "http://pdxpopnow.com"]

# start telnet session with the Firefox javascript shell and setup browser object
puts "starting telnet session"
firefox = Net::Telnet.new("Host" => "localhost", "Port" => 9997)
firefox.cmd "var w0 = getWindows()[0]"
firefox.cmd "var browser = w0.getBrowser()"

# load each page
my_urls.each do |url|
  puts "loading...#{url}"
  firefox.cmd "browser.loadURI('#{url}')"
  sleep 10 # so that the browser has time to load even if the page is slow
end

firefox.close
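One caveat about pasting URLs straight into a single-quoted JavaScript string the way the script above does: a URL containing a quote would end the string early and break the command. A possible fix, sketched here, is to let Ruby's json library produce the string literal. The jssh_load_command helper is my own name for this, not part of JSSH or Net::Telnet.

```ruby
require "json"

# Build the JavaScript statement that asks Firefox to load a URL.
# String#to_json yields a double-quoted, escaped string literal, so quotes or
# backslashes in the URL cannot terminate the JavaScript string early.
def jssh_load_command(url)
  "browser.loadURI(#{url.to_json})"
end

puts jssh_load_command("http://grabb.it")
puts jssh_load_command("http://example.com/?q='quoted'")
```

In the loop you would then send jssh_load_command(url) through firefox.cmd instead of interpolating by hand.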
Further Research: Screen Scraping JavaScript Heavy Sites
What else might this rickety bridge we've built to the JavaScript runtime environment be good for? One thing that immediately occurs to me is: screen scraping for sites with a lot of JavaScript. Another side effect of the rise of rich JavaScript applications has been to create intractable problems for people trying to do screen scraping. If the data you want is not in the page's HTML when you request it in the first place but is only written in later when the page's JavaScript runs then traditional spidering and screen scraping techniques will fail to find it. Freebase, the open database application built by Danny Hillis and his team, for example, uses a highly dynamic interface for presenting its data that is almost entirely based in JavaScript. Or, on the low-brow side, MySpace uses JavaScript throughout the forms in its interface to help with date picking and such. If you wanted to scrape or automate interaction with either of these sites, you'd need access to a runtime environment that could execute JavaScript.
I haven't really tackled this problem with JSSH, but I do have some leads. For example, here's how you get the html of the document:
> browser.contentDocument
[object XPCNativeWrapper [object HTMLDocument]]
> domDumpFull(domNode(browser.contentDocument))
<HTML><HEAD><META content="text/html...
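To pull that dump into Ruby, something along these lines might work. It's an untested-against-Firefox sketch: I'm assuming the session's cmd method hands back the shell's output followed by the "> " prompt, and that the browser variable was set up as in the script earlier. page_html and FakeJsshSession are names invented here; the fake session only exists so the helper can be tried without a browser running.

```ruby
# Ask a JSSH session for the current page's HTML.
# `session` is anything that responds to #cmd the way Net::Telnet does
# (takes a command string, returns the text the shell printed back).
def page_html(session)
  reply = session.cmd("domDumpFull(domNode(browser.contentDocument))")
  # Drop the trailing "> " prompt that the shell prints after its output.
  reply.sub(/\n?> \z/, "")
end

# A stand-in for a live connection so the helper can be tried offline.
class FakeJsshSession
  attr_reader :commands

  def initialize(canned_reply)
    @canned_reply = canned_reply
    @commands = []
  end

  # Record the command and hand back the canned reply.
  def cmd(command)
    @commands << command
    @canned_reply
  end
end

fake = FakeJsshSession.new("<HTML><HEAD></HEAD><BODY>hi</BODY></HTML>\n> ")
puts page_html(fake)
```

From there the HTML string could be handed to whatever parser you normally use for scraping.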
If you want to explore this avenue further, one of the best places to look is Firewatir, a project to add Firefox support to the WATIR browser testing framework. They do lots of click-by-click automation and checking for results, so I'm sure they've figured out approaches for a lot of what you'd confront when screen scraping. The JSSH documentation itself is useful and clear but not the most in depth.
Happy automating! Let me know what you discover...
Tagged: ruby, firefox, jssh, javascript, automation, browser, ajax, json
Posted by Greg at 10:54 AM | Comments (0)
March 1, 2008
Developing Single Serving Sites using Ruby CGI scripts on Dreamhost
There's been a lot of hullabaloo lately about Single Serving Sites. Stimulated by the inexplicable runaway success of Barack Obama is Your New Bicycle, these simple sites that provide a small dollop of amusement (isitchristmas.com) or utility (istwitterdown.com) have become all the rage.
Of course, I've been making them for a while now, e.g. Largehearted Goat and The NY Times Explains the Ratings. At first, creating a new SSS is a satisfying experience. You have a wacky idea and within a few hours, you've registered a domain, written some simple code, and put something up. But, over time, each one gets to be more and more of a burden. My ideas, at least, have involved sites that need to be constantly updated with new information over time. Since I've always implemented these sites with simple ruby scripts that run on my local machine and then upload the static finished versions of the sites, this has meant keeping an eye on unreliable cron jobs and, sometimes, hand maintenance. And, over the years, I've wondered if there was a better solution.
Today, I took the first steps towards finding one. It turns out that good-ole CGI scripts (so foreign to those of us whose main experience has taken place in the age of sophisticated web frameworks like Rails) make a great basis for SSS development. What follows is a basic introduction to writing and running CGI scripts with Ruby. I'll focus on Dreamhost as a deployment target since it's the service I have access to for this kind of thing and also the issues that arise there are probably not that dissimilar from what comes up with the other shared hosting services that are the natural habitat of SSSes.
Step One: Get Ruby and Rubygems up and running
If you plan on keeping your SSSes extremely simple, you might be able to skip this step. Dreamhost accounts come with an old version of Ruby already installed. If you don't need to install any custom gems for your scripts or control anything else about your Ruby environment, you can skip straight down to Step Two. However, if you plan on doing anything the least bit more sophisticated (from installing one individual gem all the way up to using Rails itself), you've got a bit of work to do before you can get started on the fun part.
Being far from an expert on unix build process and dependencies, I followed Nate Clark's excellent instructions for building Ruby and Rubygems on Dreamhost with just a few discrepancies. My biggest note is that the version of the Ruby source to which he links is out-of-date. I changed his line for downloading the Ruby source to:
$ wget ftp://ftp.ruby-lang.org/pub/ruby/1.8/ruby-1.8.6-p111.tar.gz
But, don't trust me! The most recent version of Ruby changes all the time and you can always find it at ruby-lang.org. The one other thing worth noting is that during any of the compilation steps, make may be killed by a timeout (Dreamhost kills processes that last longer than a given interval, a major pain point we'll be returning to in a moment when we try to install individual gems). It won't always be clear that a timeout was at fault, so if you see a 'configure', 'make', or 'make install' command die mysteriously, just go ahead and try again.
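If retyping the same command gets old, the "just try again" advice is easy to script. Here's a minimal sketch; with_retries is a helper name I made up, and the commented usage line assumes you're sitting in the unpacked Ruby source directory.

```ruby
# Run the block until it returns a truthy value, up to `attempts` times.
# The block receives the attempt number (starting at 1).
# Returns true on success, false if every attempt failed.
def with_retries(attempts)
  attempts.times do |i|
    return true if yield(i + 1)
  end
  false
end

# Hypothetical usage on the shared host (not run here):
#   with_retries(5) { |n| puts "make, attempt #{n}"; system("make") }
```

Kernel#system gives back a falsy value when the command fails or is killed, which is what makes it a natural fit for the block.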
Step Two: CGI Hello World with Ruby
Now that we've got Ruby and Rubygems installed, the first thing we want to do is to just run a basic 'hello world' CGI script to make sure that we've got things configured correctly. Here are the steps:
- Log into dreamhost over ssh.
- Navigate to the desired directory within your web space.
- Create a new file, e.g. test.cgi.
- Fill it with the following content (if you skipped Step One and are using the built-in DH Ruby, the first 'shebang' line should read: "#!/usr/bin/ruby" instead of what is shown):
#!/usr/bin/env ruby
require "cgi"
cgi = CGI.new("html3")
cgi.out("text/plain"){ "Hello World #{Time.now}" }
- Change the permissions of that file by running:
$ chmod 775 test.cgi
- Visit the script's URL in your browser. You should see something like:
Hello World Sat Mar 01 19:04:53 -0800 2008
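It helps to know how little is going on underneath: a CGI script just prints some header lines, a blank line, and then the body to standard output, and the web server relays that to the browser. Here's a sketch of the same hello world without the cgi library. The cgi_response helper is my own invention, and the exact headers the cgi library emits may differ from these.

```ruby
# Roughly what a CGI response consists of: header lines, a blank line, then
# the body. cgi_response is a hypothetical helper for this sketch.
def cgi_response(body, content_type = "text/plain")
  "Content-Type: #{content_type}\r\n" \
  "Content-Length: #{body.bytesize}\r\n" \
  "\r\n" \
  "#{body}"
end

print cgi_response("Hello World #{Time.now}")
```

If a script ever shows a server error instead of your output, a missing or malformed header block is one of the first things worth checking.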
Step Three: Install Gems by Hand
Now, if you plan on doing anything more interesting than displaying the date, the odds are you're going to want to take advantage of Ruby's great and growing wealth of libraries made available as Gems. Unfortunately, the RubyGems server system running at RubyForge has lately become incredibly slow. In order to find packages and detect dependencies, the default 'gem install' command has to communicate with that server and so becomes subject to the ruthless Dreamhost killing of long-running processes. In my experience, trying to install gems on Dreamhost will universally end in frustration:
$ gem install feedalizer
Bulk updating Gem source index for: http://gems.rubyforge.org
Killed
What to do?
Fortunately, in addition to using the RubyForge server to search for gems, it's possible to install them directly from .gem files if you can find them for your desired packages. For example, when building a recent SSS I wanted to use Feedalizer — a great library that scrapes web pages and automatically creates RSS feeds from the resulting content — so I hunted down the .gem file via the RubyForge website, downloaded it, attempted local installation, and then (when that failed) chased down the sources for the dependencies (in this case just Hpricot) to follow the same process. To give you a taste of things here were the gory details:
Installing Feedalizer:
$ wget http://rubyforge.org/frs/download.php/13797/feedalizer-0.1.0.gem
$ gem install feedalizer-0.1.0.gem
which threw an error complaining about the need for the Hpricot dependency, so:
$ wget http://code.whytheluckystiff.net/dist/hpricot-0.5.140.tgz
$ tar xzf hpricot-0.5.140.tgz
$ cd hpricot-0.5.140
$ rake gem
$ gem install pkg/hpricot-0.5.gem
The 'rake gem' step there is necessary because _why only makes the source directory available for direct download rather than an explicit .gem file. After this completes successfully, return to the 'gem install' step for Feedalizer above (be sure to return to the directory into which you downloaded the Feedalizer .gem before running it).
Much bumpier than the smooth ride normally provided by 'gem install', but it'll get the job done. You could clamber your way down similar rutted paths for most any other gem you needed for your script.
And that should be more than enough to get you started. If you run into any troubles, dive into the comments here and on the other blog posts linked for help. Also, if anyone has any experience using lightweight persistence strategies in this context, I'd love to hear about them, especially if they're file-system only; a lot of my scripts require saving records and I tend to use a rickety YAML-based system for them that could stand improving.
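For anyone wondering what a YAML-based system of that sort might look like, here's a bare-bones sketch using only the standard library. It isn't the code from my scripts; load_records and save_record are names made up for the example, and it assumes records are plain Hashes with string keys and only one process writes at a time.

```ruby
require "yaml"
require "tmpdir"

# Load every saved record, or an empty list if the file doesn't exist yet.
def load_records(path)
  File.exist?(path) ? YAML.load_file(path) : []
end

# Append one record (a plain Hash with string keys) and rewrite the file.
# Writing to a temp file and renaming keeps a crash mid-write from leaving
# a half-written YAML file behind.
def save_record(path, record)
  records = load_records(path) << record
  temp_path = "#{path}.tmp"
  File.open(temp_path, "w") { |f| f.write(YAML.dump(records)) }
  File.rename(temp_path, path)
  records
end

# Try it out in a throwaway directory.
Dir.mktmpdir do |dir|
  store = File.join(dir, "records.yml")
  save_record(store, { "title" => "first post", "views" => 1 })
  save_record(store, { "title" => "second post", "views" => 2 })
  puts load_records(store).length
end
```

Rewriting the whole file on every save is fine for the handful of records an SSS tends to keep, though it would get slow for anything large.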
Note: Thanks to The Pug Automatic for inspiring me to start down this path in the first place.
Tagged: ruby, cgi, dreamhost, rubygems, sss, single+serving+sites
Posted by Greg at 7:27 PM | Comments (0)
December 26, 2007
50 Blog Posts I'd Like to See in 2008
A while back, noted social media commentator Chris Brogan published a list of 100 blog posts he'd like to see other people write. The list was meant as a spur in the backside of readers who suffer from Blogger's Block, a final blow to the excuse that they can't think of anything to write about. While Brogan's actual topics were too focused on the incestuously insider world of social media for my taste ("44. The Difference Between Fark and Truemors"), the format was inspiring; I immediately started putting together my own list of posts I'd like to see.
Inevitably, my list reflects my interests and bias just as Brogan's does his. The posts I'd like to see are largely aimed at people like me: people who picked up Ruby in the last couple of years because of Rails and have got the language and framework pretty handily under control but lack the deep rooting in technical fundamentals and culture that comes from having gone to school for this stuff or having had a long career in it.
The theme is: "I know Ruby. Now what?"
Like Brogan, I'll ask that if you write on one of these topics link back here or come back to comment so I can find your post and gradually turn this list into something of a directory of this kind of info. Now, without further ado, here's my list:
- My First C Extension
- A Set Theory Primer for Relational Database Users
- What I Get Out of My Local Ruby Users' Group
- All About Postgres Indexes
- Ruby Was My First Language, Here's My Second
- The Five Most Useful Things I Learned in Computer Science
- My First Contribution to an Open Source Project
- Participating Actively on Mailing Lists
- Participating Actively on IRC
- Understanding and Using Threads
- Keeping Up to Date with New Developments to the Libraries You Care About
- How to Write Good API Documentation
- An Intro to Code Research, Or: Is There A Library for That?
- An Intro to Queues, Pools, Runners, and Inter-Process Communication
- Unicode Once and for All
- Timezones Once and for All
- How Do Migrations Actually Work?
- Basic System Administration for Developers
- Domain-based Programming in Javascript
- Instrumenting My Rails App
- Approaches to Processing Large Data Sets
- A Developer's Guide to Deployment
- Bootstrapping Rails Development for the Absolute Beginner
- Hey Look, I Did Something Useful with LISP!
- Approaching Your Idols: How to Start Conversations with Gray Beards and Gurus
- My Painless Gem Integration System
- Using Your Own Documentation
- Building a Rails App from Someone Else's Excel Spreadsheet
- Useful Ruby Outside of Rails
- Ruby as PHP Replacement
- Writing Simple CGI Scripts with Ruby
- Finding and Fixing Memory Leaks in Rails, Or: Why Are My Mongrels So Big?
- Writing and Managing Long-Running Processes
- SMS Integration in Rails
- What I Learned from Working with Statically Typed Languages
- My First Profiling Session
- Big Team Tools and Small Teams, Or: Why Is My Trac Empty?
- Rules of Thumb for Performant Ruby Code
- A Survey of Persistence Strategies Beyond Relational Databases
- Living with a Large Schema
- My First Cocoa Program
- Writing and Distributing Rails Apps for Desktop Installation
- Domain Registration and Hosting for Rails Apps
- Setting Up Subdomains and Pointing Them at Rails Apps
- My First Apache Configuration
- My First Nginx Configuration
- Is It Worth Releasing?: When to Open Source Your Work
- An Introduction to GUI Programming
- Lessons from Java for Someone Who's Never Written Any
- Practical Tips for Learning Protocols and Reading Specs
Posted by Greg at 1:18 PM | Comments (3)
September 20, 2007
git_tools: the Beginnings of a Rake Library for Git
There's been quite a lot of buzz lately amongst Rubyists (or at least amongst those I follow on Twitter) about the new version control system Git. As Linus Torvalds explained in a Google Tech Talk last May, licensing issues drove him to start work on Git to replace BitKeeper, under which Linux kernel developers had previously operated. Torvalds had three main requirements for the project: it had to be fast, it had to be distributed, and it had to be guaranteed never to lose any data.
For mere mortal developers like me, the most attractive one of these three is the second: Git's distributed architecture. In practice, the fact that Git is distributed means that you can separate the act of committing your changes from the act of publishing them. With Subversion, and other centralized systems, the only way to get your changes under version control is to push them out to the central server from which all of your collaborators pull. So, if you've got a set of changes that make a logical atomic commit, but break the build in a major way, you can't commit them without disrupting the work of others. Git, on the other hand, lets you make commits on your local machine without pushing any changes out to anyone. The result is that you can make a series of commits as you go that leave the app in whatever state you want -- even including creating your own experimental branches for trying out ideas, a process Git makes almost trivially easy -- and then when you're ready to push your changes out into the wider world, you can do so without anyone having to see the mess you made along the way. (A cool side effect of this architecture is that you can commit even if you don't have network access, for example if you're on a plane, which is where I wrote most of the code I'm about to tell you about...)
Anyway, as soon as I started using Git myself, I noticed the lack of a tool I use so often I'd stopped even noticing it was there: jchris's svn_tools plugin. It's a series of rake tasks for controlling Subversion. It provides a bunch of tasks to do common things like add all existing unrecognized files all the way up to bootstrapping a brand new Rails project into a new checkout. And Git definitely has some bumpy edges that would be nice to abstract out in this same way (like the need to type "git commit -a" for almost every commit and the need to manually tell Git about each new file individually when they're first being added). So, I set out to write a version of the same library for git, one that could provide some utilities that would speed up everyday activities and, eventually, some more complex macros for doing things like bootstrapping a new Rails project.
I didn't manage to accomplish that second goal, but I made enough progress towards the first that I thought it would be useful to throw the project out there to see if one of those feverishly twittering Rubyists busy trying out Git might want to pick it up and run it into the end zone. So, without further ado, here it is:
The main tasks of interest are:
rake git:add
- adds all files that need a manual git add
rake git:commit
- commits with the -a flag thrown so your new content will actually get committed
rake git:ignore
- removes the given files (or all files matching a quotation-surrounded pattern if given) from version control
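To give a flavor of what a task like git:add involves, here's a rough sketch of the idea. It is not the code from the git_tools archive: it assumes untracked files are listed with `git ls-files --others --exclude-standard`, and git_add_commands is a helper name invented for this example. Unusual filenames that Git quotes in its output aren't handled.

```ruby
# Turn the output of `git ls-files --others --exclude-standard` (one
# untracked path per line) into `git add` commands, one per file.
def git_add_commands(ls_files_output)
  ls_files_output.split("\n").reject { |line| line.strip.empty? }.map do |path|
    ["git", "add", "--", path]
  end
end

# Hypothetical Rakefile usage (not executed here):
#   namespace :git do
#     task :add do
#       untracked = `git ls-files --others --exclude-standard`
#       git_add_commands(untracked).each { |command| system(*command) }
#     end
#   end

sample = "README\nlib/new file.rb\n"
git_add_commands(sample).each { |command| puts command.join(" ") }
```

Passing each command to system as separate arguments, rather than one big string, means filenames with spaces don't need any shell quoting.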
A note on contributing: The archive of git_tools is itself a Git repository. One of the nice things about Git is that the files containing the versioning info are very compact and portable, so it's easy to ship around copies of a full repository with its entire history. If you make changes or improvements you'd like to see incorporated, just send me an archived version of the updated repository and I'll merge them in.
If you're intrigued and would like to learn more about Git I'd recommend the excellent tutorial intro to Git on Kernel.org and this how-to on building Git on OS X.
Happy Gitting!
Tagged: git, svn, cvs, ruby, rake, linux
Posted by Greg at 1:13 AM | Comments (2)
June 15, 2007
Presenting THUMBNAIL: a Ruby wrapper for the AWS Alexa Site Thumbnail service
I'm proud to announce the release of my very first gem: THUMBNAIL. It's a Ruby wrapper for the Amazon Web Services Alexa Site Thumbnail Service, which lets you automatically download thumbnails of any website or dynamically embed them in your own pages for an incredibly small fee ($0.20 per 1000 images). And it has a badass logo (see above)!
You can get it from RubyForge thusly:
$ sudo gem install thumbnail
You've got to do some bureaucratic overhead at Amazon before you can play (more about that on the THUMBNAIL homepage), but once you do it's just as easy as pie to download pix of sites from around the web like so:
require 'rubygems'
require 'thumbnail'
require 'open-uri'

t = Thumbnail::Client.new :access_key_id => YOUR_ACCESS_KEY_ID,
                          :secret_access_key => YOUR_SECRET_ACCESS_KEY
url = t.get("www.urbanhonking.com")[:thumbnail][:url]
File.open("urho.jpg", "w") { |f| f.write open(url).read }
where YOUR_ACCESS_KEY_ID and YOUR_SECRET_ACCESS_KEY are things you get from Amazon when signing up for the service. Running such code would make you the proud owner of a local copy of an image like this:
THUMBNAIL will also build you a url you can include in any webpage that will redirect to the site thumbnail you want like so:
require 'thumbnail'

t = Thumbnail::Client.new :access_key_id => YOUR_ACCESS_KEY_ID,
                          :secret_access_key => YOUR_SECRET_ACCESS_KEY,
                          :action => :redirect
url = t.get("www.twitter.com")
#=> http://ast.amazonaws.com/?Action=Redirect&AWSAccessKeyId=YOUR_ACCESS_KEY_ID&Signature=sdhfiawrkjw3h9bncoa8ue&Timestamp=2007-06-14T09:09:18.000Z&Url=www.twitter.com&Size=Large
I'm working on a Rails plugin that will provide a view helper so you can easily do this from any template, but for now you'll have to be satisfied with the additional sample code available on the THUMBNAIL RDoc.
There are all kinds of other details available at the THUMBNAIL homepage. And the code is, of course, available to all for free under the MIT License. So, Go! Get your thumbs dirty! Enjoy!
Tagged: thumbnail, ruby, aws, amazon, web, service, alexa, api, rubyforge, gem
Posted by Greg at 12:33 AM | Comments (2)
June 12, 2007
A Beginner's Guide to Practical Syntactic Magic: the tale of Hpricot's pseudo-constructor
I spent much of today working with Hpricot. And so, as when spending significant solo time with any of why the lucky stiff's code, I found myself admiring all the neat little syntactic knickknacks strewn about to cozy up the place.
One of the best is the way you get started. Hpricot is a toolkit for parsing and manipulating XHTML. So, obviously enough, just about every time you invoke it, you're going to want to pass it an XHTML document so it can, you know, prep it for parsing and manipulation. And how do you do that? What's the syntax?
Hpricot(my_document)
That's it. There's no "Hpricot::Base.new(my_document).parse" nonsense, or any of the other more or less torturous common options. Not a single character of syntax is wasted.
But, if you're a mere Ruby mortal, like me, you're probably looking at that code and going: 'Huh?' Isn't Hpricot a constant? It's capitalized. But it's taking an argument like a method. How is that even valid Ruby? How can the parser tell if it's a constant or a method?
Well, it turns out that there's no rule against having capitalized method names; the parser can tell it's a method because it's got an argument. And that's all that's required for it to be sent off to method- instead of constant-dispatch (as Chris pointed out, this is one advantage of not having Ruby be "turtles all the way down"; Smalltalk couldn't do this).
Beyond providing fodder for a Language Nerd Attack, though, what's the upshot? How's this fact help the man on the street? Well: there's nothing actually sophisticated going on here. So: you can do it too.
Here's an admittedly contrived (and useless) example:
class Dogger
  def initialize
    puts "dog"
  end
end

def Dogger()
  Dogger.new
end
a simple class definition followed by a simple method invoking it.
Which leaves us with the ability to write two snippets of code that, while they may look nearly the same, do very different things:
>> Dogger
=> Dogger
>> Dogger()
dog
=> #<Dogger:0x15d2478>
and that is exactly from where _why's use of this little quirk derives its leverage. This trick makes you feel like you're invoking a constructor or calling some other kind of class method when you are, in fact, doing nothing of the sort. Just as our Dogger() method above needn't have done anything remotely related to the Dogger class, _why could have named his method Clown() or ChunkyBacon() while still calling Hpricot.parse(input, opts) inside it (which is exactly what Hpricot() does).
But his chosen usage is particularly inspired. In one fell swoop, he gives his whole complex feature-ful library a single welcoming point of entry. You need never concern yourself with the internal machinery; just heave a document over the transom and let the library figure out what to do with it. And this is the wider lesson of _why: real power comes from combining the playfulness (better: the insouciance) needed to probe, question, and even bend the limits of the language with the discipline and aesthetic sense required to use what you find not to obfuscate and confuse, but to write elegant and, above all, more humane code.
I mean, Hpricot would definitely not be a better library if that method was called ChunkyBacon(). Right?
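Ruby's own core library uses the same capitalized-method convention for its conversion functions, such as Integer() and Array(). Here is a minimal sketch of the pattern applied to a made-up Document class. The class, its parse method, and the whitespace stripping are all hypothetical, invented for illustration, and say nothing about Hpricot's internals:

```ruby
# Core Ruby already ships capitalized methods that sit beside same-named classes.
number = Integer("42")   # Kernel#Integer, a method, not the Integer class
list   = Array(nil)      # Kernel#Array turns nil into an empty array

# Hypothetical class, invented for this illustration only.
class Document
  attr_reader :source

  def initialize(source)
    @source = source
  end

  # The "real" entry point that the capitalized method delegates to.
  def self.parse(input)
    new(input.to_s.strip)
  end
end

# A top-level method with the same name as the class. The parentheses and
# argument make Ruby dispatch to the method instead of looking up the constant.
def Document(input)
  Document.parse(input)
end

doc = Document("  <p>hello</p>  ")
puts doc.source
```

The caller only ever types Document(...), while the class keeps whatever internal machinery it likes behind parse.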
Tagged: ruby, why, dispatch, hpricot, syntax, chunkybacon
Posted by Greg at 1:29 AM | Comments (10)
May 2, 2007
My First Mongrel Handler: A Recipe for providing a JSONP Callback
One fact of life is: every technical community has its own recurring bugaboo, an unshakeable criticism that dogs it through its greatest successes, enduring in spite of furious defense and fact marshaling by supporters. Linux? Won't work on the desktop. Lisp? No standard libraries.
In the Rails community our bugaboo is performance. Whether you're a hawk or a dove on the issue, you probably weren't drawn to the framework because of its performance, but because it makes building even complex web apps a joy. You're also probably more than a little sick of hearing about performance in Rails.
But sometimes you're building something dead simple; you may not need Rails' leverage. And sometimes performance may actually be a showstopper. What then?
We encountered both of these conditions recently building a system to provide some JSONP callback functionality. Here's the spec: given a request for the JSON representation of some resource and the name of a callback function, return the JSON wrapped as an argument to the named function. In other words, take a request like:
http://mydomain.com/resource/2d8dbb1119a8.js?callback=myFunction
And return a response in the form:
myFunction({...resource represented as JSON...})
The point is for the user's function to get executed on their page with our JSON as an argument, allowing for data integration without the domain restrictions of AJAX.
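To make the spec concrete, here is a minimal sketch of the wrapping step on its own, with no web server involved. The jsonp_wrap helper and the sample resource hash are hypothetical names made up for this illustration:

```ruby
require 'json'

# Hypothetical helper: wraps a resource's JSON in a call to the named function.
def jsonp_wrap(callback, resource)
  "#{callback}(#{JSON.generate(resource)})"
end

# Made-up resource data, standing in for whatever the .js url would return.
body = jsonp_wrap("myFunction", { "id" => "2d8dbb1119a8", "plays" => 3 })
puts body
```

When a script tag on the user's page loads that text, the browser executes it as a call to their own myFunction with our data as the argument.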
Now, what does the code on our side have to do? It has to get the Javascript representation of the resource located at "mydomain.com/resource/2d8dbb1119a8" and then wrap the result in the function named in the "callback" param. That's it. That's the whole feature. All we need to implement it is: access to our data and some kind of structured access to the request itself. And, since this feature is part of our public data API (and so could potentially get called programmatically by other people's code) it would be great if it was fast and cheap.
One way to meet these demands would be to write an action in each of our controllers that loaded up the right objects and concatenated their JSON representations with the contents of the callback param to compose the proper text response. But, this would pollute our existing code and use all kinds of unnecessary resources. There must be a better way. If only we had a piece of code lying around that specialized in parsing requests and serving up representations of our resources...
And, of course, we do: Mongrel, everyone's favorite pure-Ruby web server. As a web server, Mongrel speaks HTTP natively; as a kick ass web server, Mongrel is very high performing; and as our web server, it sees all of our requests anyway before they even reach Rails.
At this point, you probably won't be surprised to learn that Mongrel has a feature specifically designed to do this kind of thing: Mongrel handlers. For our purposes, you can think of Mongrel handlers like controllers: they examine the request, do something based on what they find there, and return a response. The Mongrel handler sees the request before it ever reaches our Rails app, so it doesn't have to load up cgi.rb or any of the other resource-intensive, performance-troubled parts of the framework. Unlike Rails, it can also handle multiple requests per Mongrel instance. All of which adds up to a major performance win.
So, how would we build our JSONP callback code in a Mongrel handler? This is how:
require 'open-uri'

class JsonpHandler < Mongrel::HttpHandler
  def process(request, response)
    response.start(200) do |head, out|
      head["Content-Type"] = "text/javascript"
      callback = Mongrel::HttpRequest.query_parse(request.params["QUERY_STRING"])["callback"]
      json = open("http://#{DEPLOY_DOMAIN}#{request.params["PATH_INFO"]}").read
      out.write "#{callback}(#{json})"
    end
  end
end

uri "/jsonp", :handler => JsonpHandler.new, :in_front => true
Let's go through this bit by bit starting with the last line. When our script gets loaded up on Mongrel's startup (more nitty gritty on that below) that line tells the server that our handler class should be invoked for any uri that matches 'http://mydomain.com/jsonp/*' (The eagle eyed amongst you will notice that this means the url for requests to this service will have changed slightly from what we mentioned above: http://mydomain.com/jsonp/resource/2d8dbb1119a8.js?callback=myFunction instead of http://mydomain.com/resource/2d8dbb1119a8.js?callback=myFunction. This is the one price we pay for using Mongrel handlers; we need these requests to have something unique after the first slash so that Mongrel can know to intercept them before they reach Rails.). The in_front flag means that Mongrel should check for a match before waking up Rails at all.
When our handler gets invoked, its 'process' method gets called. As you can see, that method takes two arguments: the request and the response. We read the request to figure out what to do, writing the response as we go. We're going to return a response with code 200, so we invoke the start method with that as the argument. That method also takes a block with two local variables representing our response's header and its body (for more info, see the Mongrel::HttpResponse docs).
Inside the block, the first thing we do is set a header: since we're going to be returning executable Javascript, we set the appropriate Content-Type, "text/javascript". Now we get to the real action. We want to get the name of the callback value out of the query, so we grab the query string out of the request's params hash and use the Mongrel::HttpRequest's relevant class method to parse it. The result is a hash with key-value pairs representing everything after the "?" in our url. We pull the callback out of that and remember it.
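If you'd like to see what that parsed hash looks like without starting Mongrel, the following sketch approximates the step with the standard library's URI.decode_www_form. This is a stand-in, not Mongrel's own parser, and the sample query string (including its extra format parameter) is made up:

```ruby
require 'uri'

# Hypothetical stand-in for request.params["QUERY_STRING"] in the handler.
query_string = "callback=myFunction&format=js"

# decode_www_form returns [key, value] pairs; to_h turns them into a hash
# of everything after the "?" in the url.
params = URI.decode_www_form(query_string).to_h

callback = params["callback"]
puts callback
```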
Now, we've gotten the name of the callback function out of the request, so we're halfway there. All we've got left is to get the JSON representation of our resource. We've got two choices: we could load up our actual Rails objects or, we could use their urls. If we wanted to go the first route, we'd end up using Merb, a whole microframework that combines Mongrel handlers with ActiveRecord and ERB templates to provide a complete lightweight alternative to Rails. We don't need to go that far. All we've got to do is make a request for our own resource's .js url and we'll have what we need. At first glance, this may seem to be a security flaw (since it causes us to grab a url that is interpolated from a request that comes in), but we're only going for urls on our own domain, and we're just storing their contents in a string and then returning them to the agent who asked. The worst someone could do would be to request an image file or other expensive resource, which would then be garbled by having some text prepended to it and returned to them. We use open-uri to request the resource: DEPLOY_DOMAIN is a project-wide constant (basically localhost or our real domain) and request.params["PATH_INFO"] gives us the request from the first slash up to the '?' -- just what we need to construct the request for our resource. (There may be some performance downside from having an open-uri call in the midst of a Mongrel handler, but I don't know nearly enough about threading to speak intelligently on the subject.)
The last line of the block just puts the pieces together and writes the result as the response's body.
Once we've got this code written all we've got to do is tell Mongrel to load it up on launch:
$ mongrel_rails start -S path/to/jsonp_handler.rb
This post would not have been possible without: the slides from ezmobius's Merb/Mongrel talk. Read them for more (and much better informed) info on the subject.
Tagged: mongrel, handler, rails, ruby, jsonp, merb
Posted by Greg at 10:23 PM | Comments (1)
February 7, 2007
Dipping a Toe in the C
Last night, I took an unexpected jackknife into the underground wellspring of C-code that burbles beneath the calm surface of Ruby. As part of some preliminary research for a super secret and incredibly exciting project I'm planning, I learned about two related libraries, Ruby2C and Ruby Inline, that dip into that pool to accomplish opposite, but complementary, goals: translating Ruby into C for portability and flexibility, and writing custom Ruby methods in C in order to improve their performance.
Ruby was originally written in C and, for the most part, Ruby-related C coding is for the serious gray-bearded core language contributor. These two libraries give merely mortal programmers like me a chance to play with some of the power that comes from manipulating the language's internals. The downside of this dynamic is that I'm not smack dab in the center of the target audience for these projects the way I am with Rails and so I ran into a whole series of unaccustomed obstacles and inconveniences in the course of my dive including non-existent installation instructions, thin documentation, and incomplete, experimental code. While these may be familiar surroundings for the above-mentioned gray beards, they certainly aren't for me, and so I thought I'd take a moment to document what I learned about dealing with them on behalf of the next poor, desperately Googling soul to follow in my footsteps.
Take Ruby2C, or, should I call it "ruby_to_ansi_c"? Part of the Metaruby project, which aims -- seemingly in a spirit of pure language geekery -- to rewrite Ruby's core classes in Ruby itself, this library provides machinery for translating ruby code into its C equivalent. Beyond whatever self-referentialist uses this might have, there's definitely a practical upside to the portability it provides. I don't want to tip my hand too thoroughly, but I can think of some neat places where I'd like to stick code that requires C for entry.
'So far, this all sounds fine and dandy,' you say 'so what's the problem?' Well, my rhetorical question about the library's name hints at one facet of the unfriendliness involved. While the library usually calls itself Ruby2C in public, that is, in fact, almost never formally its name. Its rubygem goes by RubyToC and it includes two parallel libraries, one called "ruby_to_ansi_c" and one called "ruby_to_ruby_c". And, when it comes to code, names matter: installation, inclusion, and invocation are all impossible without getting them exactly right.
Another indicator of the size of the problem is the section of the README.txt that falls under the heading of Installation: "Um. Please don't install this crap yet..." So, I guess that leaves you with me. Anyway, without further ado, here's what I learned from installing Ruby2C and getting to hello world with it:
Install happens, like with pretty much any other gem (make sure you get the capitalization right), thusly:
$ sudo gem install RubyToC
Now that we've got the library, let's write some code. We're going to write a class that, when translated into C will just print some text to the screen when compiled and run:
require 'ruby_to_ansi_c'

class MyTest
  def say_hello
    puts "hello"
  end

  def main
    say_hello
    return 0
  end
end
Note the require line. The use of the "main" method is just a cute little hack from one of the RubyToC examples that makes the resulting program runnable. When translated, the new program will have a "main" function which is what gets called when you run a compiled C program from the command line.
Now, let's go ahead and do the translation:
result = RubyToAnsiC.translate_all_of MyTest
puts result
This produces the following C code:
long main();
void say_hello();

long
main() {
  say_hello();
  return 0;
}

void
say_hello() {
  puts("hello");
}
That C source may look a little funny since the RubyToC generator isn't much for producing aesthetic whitespace, but it'll compile and run, which is what matters. To test it out, copy that snippet into my_c_test.c and do the following:
$ gcc my_c_test.c -o my_c_test
$ ./my_c_test
hello
We could also just build the C for one individual method, like so:
RubyToAnsiC.translate(MyTest, "say_hello")
which would return just the second of the two functions in the above C source.
All of this is pretty simple and very powerful once you get it working. Of course, the code itself has a beautiful and clean user interface (in the form of these class 'translate' methods). It's just the websites and documentation that suck!
Now, the opposite of the ability to convert Ruby to C is the ability to write your own Ruby methods in C, just like the gray beards do. Unlike its converse, this process has obvious and wide-ranging benefits in the form of significant performance enhancements. Ruby is a high level language and not an especially zippy one. C, being closer to the machine, will almost always take care of an equivalent task in less time. The point of Ruby Inline is to let you rewrite the biggest performance choke points of your code in C to speed them along. Here's a basic usage example that adds an instance method called "say_hello" to our MyTest class that simply prints the text "hello" as many times as is asked:
require 'rubygems'
require 'inline'

class MyTest
  inline do |builder|
    builder.c <<-CODE
      void say_hello(int i){
        int n = 0;
        while(n < i){
          puts("hello");
          n++;
        }
      }
    CODE
  end
end

MyTest.new.say_hello 10
While this is a trivial example, its form is a common one for optimization based on Ruby Inline: a central slow loop or algorithm that we plan on running many times.
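Before rewriting a loop in C, it helps to have the plain Ruby version and a timing for it as a baseline. The sketch below is one way to get that using the standard Benchmark module. The say_hello_ruby method name, the StringIO sink, and the iteration count are my own choices for illustration and are not part of Ruby Inline:

```ruby
require 'benchmark'
require 'stringio'

class MyTest
  # Pure-Ruby counterpart of the inlined C function: writes "hello" i times.
  # The out parameter is a hypothetical addition so the output can be captured.
  def say_hello_ruby(i, out = $stdout)
    n = 0
    while n < i
      out.puts "hello"
      n += 1
    end
  end
end

# Write into an in-memory buffer so the timing isn't dominated by the terminal.
sink = StringIO.new
elapsed = Benchmark.realtime { MyTest.new.say_hello_ruby(10_000, sink) }
puts "10,000 iterations took #{elapsed} seconds in pure Ruby"
```

Timing the inlined version the same way would show whether the C rewrite is worth keeping for a given loop.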
A few things of note. Ruby Inline offers a number of options for including libraries, providing compilation instructions, and such. Take a good look at the documentation for details. Also, I've had some problems while playing with Ruby Inline in irb. They tend to take the form of "Errno::ENOENT: No such file or directory" errors. I don't think that this kind of code was really meant to run in a shell. It wants a source file to compile from and a static place to compile to. In addition, I think that wirble, a set of irb-enhancing tools I use, exacerbates things. In summary, run your Inline examples from files.
Tagged: ruby, c, ruby2c, inline, internals, metaruby
Posted by Greg at 3:04 AM | Comments (0)
January 26, 2007
Subverting Twitter
If it hasn't yet skittered across your radar, Twitter is a tiny web app that works a little like a universal IM away message. Created as a side project by Ev Williams and the gang at Odeo, Twitter lets you post tiny fragments of text (announcements of where you are and what you're doing, random thoughts and observations, etc.) via text message, IM, or the web. No more than 150 characters are allowed. It then broadcasts those messages to all of your friends and 'followers' and posts them to your own customizable page.
Here, for example, is my twitter page (warning: clicking may cause extreme boredom).
Anyway, it may be totally contrary to the spirit of such a pleasantly pointless thing, but I think I've found a use for Twitter that is actually…uh…useful: publishing commit messages from version-controlled coding projects.
Commit messages are the atomic unit of change in a project and can be the best way of keeping up with programming progress. Unfortunately, they all too often end up banished to obscure and unwieldy diff logs never to be regularly read. And worse, the knowledge of this sad fate leads harried coders to be lazy and uninformative in their composition making commit messages often doubly useless.
Maybe, publishing our commits to project-specific twitter pages, which our collaborators (and customers, and bosses) can follow in real time, will get us to give our messages some real zing as they become, not lost log entries, but comments in a conversation.
Or, at least, that's the theory, anyway. To test it out, I whipped up a solution for my current coding environment: a Rails project managed under Subversion. Specifically, I wrote a Rake task that prompts you for a commit message, runs the necessary svn commands to commit your code, and then forwards your commit message on to Twitter.
Well, more accurately, I extended Chris's svn tools plugin so that it uses addictedtonew's ruby twitter API library to post the message. Really, I did almost none of the work here, just tied together some real projects by others much more skilled than myself. Here's how you can do the same:
The first step is to get the svn tools plugin installed if you don't already have it. Chris mentioned today that he's thinking about refactoring it into a gem for use in Ruby projects more generally, but for now you can install it in your current Rails project like so:
> script/plugin install http://svn.rtra.in/public/plugins/svn_tools/
If you installed the plugin before January 26, 2007, you should reinstall it now, since Chris kindly made a little tweak to it to make my hack possible:
> script/plugin install http://svn.rtra.in/public/plugins/svn_tools/ --force
Once that's done, you're most of the way there. The next step is to install the twitter gem with hpricot upon which it depends:
> sudo gem install hpricot --source code.whytheluckystiff.net
> sudo gem install twitter
If you've already got the most avant-garde version of hpricot (today, it was 0.4.2), then you can skip the first of these two lines, but without it things will be quite bumpy.
Now, finally, all that's left is to add a new task to your project's own Rakefile. Anywhere in there (but not inside of a pre-existing namespace), add:
namespace :svn do
  task :twitter => :commit do
    email = 'me@mydomain.com'
    password = 'mydogsname'
    Twitter::Base.new(email, password).post(@message)
  end
end
Obviously, this needs to get filled out with a valid twitter email address and password. And don't forget to stick "require 'twitter'" somewhere above this to make the gem accessible.
That's it, you're totally set up. Run it from your project's root directory with:
> rake svn:twitter
It will prompt you for the commit message, commit your code, and then send your message off to twitter, as promised.
You can see an example of this system in action by following the twitter page for the commits on grabb.it, a new semi-super-secret mfdz skunkworks project Chris and I are working on.
If you try it out, stop by and let me know how it works for you. Plus, if there's demand, I can always package this up as a gem once Chris does likewise with his svn tools, which would greatly simplify the install process and make it valid for non-Rails projects as well.
Ok, Twitter away!
Tagged: twitter, ruby, api, rails, rake, svn, subversion, commit, message, version, control
Posted by Greg at 2:58 AM | Comments (5)
December 9, 2006
Getting ComputerKrafty: Arduino, Ruby, and Blurry Video of Some Blinking LEDs
(Arduino Serial Ruby on YouTube)
For the last month or so, Brett and Marcus from Tables Turned and I have been meeting weekly to teach ourselves Physical Computing, the use of micro-controllers like those found in cell phones and Roombas to build all kinds of interactive projects, from multimedia installations to scientific equipment.
We're using Arduino, a cheap and simple micro-controller chip and programming framework that's great for beginners. Between the three of us, we've got lots of ambitious projects we'd like to build, from immersive sound installations to wifi-enabled street walking robots, but in order to learn the basics, we're starting with a pretty simple project: building our own version of the children's toy Simon. If you're interested, you can follow our progress on the ComputerKraft wiki.
The two videos I've posted here show some early experiments we tried out while learning the ropes. The one below is amongst the first things we ever tried: reading the analog input from a knob and using its position to light up a changing number of LEDs.
The video at the top is from this week and I'm pretty proud of it. It shows a Ruby program running on my computer that reads input from a user and then lights up a different LED depending on what number it receives. This doesn't sound too impressive; after all, it's just another 'hello world'. But the elements involved are really exciting to me. With them in place, pretty much anything you can do in Ruby scripts, Arduino can know about -- reading RSS feeds, looking for files, user input, etc. Plus, from here, it doesn't take much more to get the interaction to flow both ways: when Arduino does something or senses something, it can get sent off to a Ruby program and from thence to files, the web etc.
If you're curious to know more about the technical details, you can check out the Ruby/serial demo page on the ComputerKraft wiki. It's got both the Ruby and C source code as well as an explanation of the hardware and links for downloading the ruby/serialport library (which does, in fact, work on OS X even though their documentation gives you little confidence that it would). Or, if Ruby's not your thing, you can check out Todbot's C code for doing this manually from the command line to accomplish something similar.
Tagged: arduino, ruby, serial, physical+computing, microcontroller, computerkraft
Posted by Greg at 4:47 PM | Comments (7)
October 20, 2006
learns_to use Expect for Easy Automation
One of the great hopes you might have in beginning to learn about technology and computers is that they will save you time and effort. This is such an obvious expectation that it almost goes without saying, but, in my experience, it is rarely fulfilled and really unrelated to the true joy of technological learning. That joy comes in gaining whole new abilities, not in slightly improving existing capacities. I was motivated to learn what I have about the web and programming because I wanted to publish my thoughts and my music for anyone in the world to read and hear and there was simply no other feasible way for me to do that. As my technical capacity has grown, I've come up with new ideas for things I wanted to do and make that I had never even known were possible. And now these ideas themselves drive me deeper into the technology in order to realize them.
Given this dynamic, I was a little shocked recently to come across Expect. For once, here's a command line utility that offers a staggering productivity increase without the attendant black hole of necessary technical mastery.
Expect is a tool for automating interactions with other programs. Expect scripts allow you to start up a program and then have the computer use it in your stead. In your Expect script, you write out a dialogue for the interaction, e.g. 'if the program says that, respond with this,' and then the script holds up your side of the 'conversation' with the program, providing feedback, entering inputs, making simple decisions.
Why is this useful? With Expect, you can write scripts that fire off relatively complex interactions with a single command, so you don't have to remember all the individual sub-steps. Or, even sexier, you can automate multi-stage tasks you've previously had to do by hand so that you can trigger them with cron so you never have to think about them ever again.
This may sound fuzzy and abstract so far, but Expect scripts actually turn out to be a cinch to write. As proof, I'll show you the simple script I worked up last night to automate my daily "production process" for Largehearted Goat. In my original post on the subject, I mentioned that the code behind Largehearted Goat required "just a little hand holding." Here's what was involved: (1) run the ruby script which reads the Largehearted Boy RSS feed, finds the Goats, and rewrites the html, (2) sftp into my web hosting and copy the new html file over the existing one being served up to Largehearted Goat. And here's the Expect script I worked up to get it all done (paths and passwords have been changed to protect the innocent):
#!/usr/local/bin/expect -f
spawn ruby /path/to/goat/script/goats.rb
expect eof
spawn sftp mylogin@myhost.com
expect -exact "Password:"
send "MySecretPassword
"
expect "sftp>"
send "put /path/to/my/new/html/file/goat.html path/to/my/online/goat/directory/
"
expect "sftp>"
send "exit
"
expect eof
So, here's how this works. The first line is just a necessary invocation to allow the Expect utility to read a set of commands from a file. The "spawn" command tells expect to start up a process, in the case of the second line, there, I'm running my ruby script. Already here, we have a big advantage over some other shell scripting choices available out there. Step (2), which I described above, only works properly if my ruby script has already been run. Otherwise, it would send the old version of the html up to the web and www.largeheartedgoat.com wouldn't change. Expect makes it incredibly easy to wait for the completion of that script. All we have to say is "expect eof" (for End Of File). That line tells Expect to wait for control to be returned to it from the previous process that it spawned before proceeding on.
Once the ruby script is done running, then it's time to go ahead and ftp the new html file into place. Since my host requires ssh for login, I've got to use SFTP (Secure File Transfer Protocol), which I invoke with the next spawn line. From here on in, all I'm really doing is alternating prompts I "expect" to see from SFTP with commands I want to "send" to it. One of the best things about Expect is that if any of these "expect" conditions aren't met, the script won't just go ahead with the rest of the interaction running roughshod over your files, but will instead shut down without taking further action.
So, yeah, it's pretty easy. If you can do this task once by hand using SFTP, you can write this Expect interaction no problem. The only clever thing going on is the use of the new line when submitting a command, like so:
send "MySecretPassword
"
Think of this line break as hitting 'return' in order to actually submit the command.
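For the Rubyists in the audience, the same expect-then-send rhythm can be sketched with Ruby's bundled pty and expect libraries. This is only an analogy to the Expect script above, not a replacement for it. The little shell one-liner that asks for a name is a hypothetical stand-in for sftp, and the five second timeouts are arbitrary:

```ruby
require 'pty'
require 'expect'

greeted = nil

# Hypothetical interactive program: prompts for a name, then greets it.
command = 'printf "Name: "; read name; echo "Hello, $name"'

PTY.spawn('sh', '-c', command) do |reader, writer, _pid|
  # Wait (up to 5 seconds) for the prompt, like 'expect "Name: "'.
  reader.expect('Name: ', 5) or raise 'never saw the prompt'

  # puts appends the newline, which plays the role of hitting return.
  writer.puts 'Greg'

  # Wait for the complete greeting line before reading the captured name.
  match = reader.expect(/Hello, (\w+)\r?\n/, 5) or raise 'never saw the greeting'
  greeted = match[1]
end

puts greeted
```

As in the Expect script, an unmet expectation stops the interaction (here by raising) instead of blindly sending the next command.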
Now, once your script is written, all you've got to do is make it executable by running 'chmod +x' on it and then actually call it like
$ expect my_new_script
You should see all of the normal output of your commands scroll by in the terminal. And once you've got it working, you can check out this great crontab tutorial to set it up to run automatically!
I've only barely scratched the surface here of what Expect can do. It's a real programming language, allowing branching based on the response of the program you're interacting with and a full vocabulary for logic and variables, etc. But even with just this limited Expect vocabulary, I bet you can save yourself a ton of time. Is there a simple process like this that you have to do every day? Automate it. Is there a complicated interaction you only have to do every once in a long while whose commands you always forget and so have to spend an hour re-googling? Next time you do it, capture it in an Expect script, save it somewhere and then just run it when you need it. Could you spend a long time fiddling with all the different options, improving your Expect chops? Sure you could. But why would you? This one's easy. This one's for getting things done.
Tagged: expect, shell, scripting, osx, mac, terminal, automation, crontab, sftp, ruby, largehearted, goat
Posted by Greg at 6:00 PM | Comments (4)
September 10, 2006
learns_to Modules and Namespaces: Lessons from Wrapping the del.icio.us api
Last night, I started working on putting together a Ruby-wrapper for the del.icio.us api. I need it to execute this little idea I had recently (more about that when it's done) and I was surprised to find that there wasn't anything too useful out there -- though it's probably because the api is so easy to use you barely need a wrapper around it for most projects. There were a few libraries, but nothing really clean and complete and nothing using the new v1 of the api.
Anyway, in the course of working on the wrapper, I came across a common problem: the need for multiple namespaces. In the api, method names are not unique across objects. For example, there's a method that gets posts for a user and one that gets tags, both called "get" (api.del.icio.us/v1/posts/get and api.del.icio.us/v1/tags/get, respectively). Obviously, those urls leave no confusion as to which "get" method gets which type of object. The question is: what device in ruby should I use to capture this with equivalent clarity?
Two strategies occurred to me immediately: modules and subclassing. According to the relevant section of Programming Ruby, "modules are a way of grouping together methods, classes, and constants. . .[They] provide a namespace and prevent name clashes." Well, that sounds like exactly what I want to do. I want to group together the api methods for posts so that they don't pollute the namespace for tags. Under this design, I would have multiple modules within my main class, one with the methods for each api "object," posts, tags, bundles, and whatnot.
So, to see if this would actually work, I ginned up a simple example of using modules inside a class. This is what it looked like:
#namespaced class methods
class Test
  module Gar
    def self.to_s
      puts "gar!"
    end
  end

  module Bax
    def self.to_s
      puts "bax!"
    end
  end
end

Test::Gar.to_s
Test::Bax.to_s
If you ruby this you'll see this output:
gar!
bax!
In other words, it seems to work for class methods.
But what about instance methods? I made my toy example a little more complicated:
class Best
  attr_accessor :dog

  def initialize
    @dog = "bot!"
  end

  module Gar
    def self.set_dog
      @dog = "gar!"
    end
  end

  module Bax
    def self.set_dog
      @dog = "bax!"
    end
  end
end

t = Best.new
puts t.dog
Best::Gar.set_dog
puts t.dog
Best::Bax.set_dog
puts t.dog
Unfortunately, this doesn't seem to work. The modules can't get access to the instance variable, @dog. The output ends up looking like this:
bot!
bot!
bot!
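A short sketch of where the assignment actually goes may help here. Inside a method defined with self., self is the module, so @dog ends up stored on the module object and never touches the Best instance. The Barks mixin below is a hypothetical addition (it is not in my toy example) showing that a module only reaches an instance's variables when it is mixed in with include:

```ruby
class Best
  attr_accessor :dog

  def initialize
    @dog = "bot!"
  end

  module Gar
    # Here self is the module Gar, so @dog is stored on Gar itself.
    def self.set_dog
      @dog = "gar!"
    end
  end

  # Hypothetical mixin: its methods run with self set to the Best instance.
  module Barks
    def set_dog
      @dog = "mixed in!"
    end
  end
  include Barks
end

t = Best.new
Best::Gar.set_dog
puts Best::Gar.instance_variable_get(:@dog)  # the value landed on the module
puts t.dog                                   # the instance is untouched

t.set_dog
puts t.dog                                   # changed via the mixed-in method
```

Mixing modules in doesn't rescue the namespacing idea, though: included methods all land in the one class, so two mixins that each define "get" would simply clash again.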
This means that I'm thrown back to trying to solve the problem with regular subclassing. I'll be defining a series of classes like this:
class Relicious
  attr_accessor :username, :password
  #my main class, connects to del.icio.us, etc.
end
class Post < Relicious
  def get
    #call the posts/get url
  end
end
class Tag < Relicious
  def get
    #call the tags/get url
  end
end
That way, each separate subclass can implement identically-named methods with no danger of namespace confusion. My initial instinct was that this pattern was slightly less elegant than what I was trying to achieve with modules because the subclasses all have to access the centralized connection methods and such in the parent class. The resulting usage code looks like this:
post = Post.new
post.username = "myusername"
post.password = "mypassword"
post.get
which is ugly (a post doesn't really have a username) and inefficient (you'd have to set the username and password attributes fresh if you called Tag.new since you'd have a new instance).
Thankfully, today Chris proposed a better solution, which, in retrospect, should have been obvious to me: wrapping up the child objects inside of accessors in the parent class and then only ever accessing them from there. This would turn the above usage code into this:
rel = Relicious.new
rel.username = "myusername"
rel.password = "mypassword"
posts = rel.posts.get
The namespace problem is solved, everything is meaningfully encapsulated, and the syntax is concise and clear. Sounds like good design. Now, all that's left is to actually implement it. . .
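As a rough idea of the shape this could take, here is a minimal sketch. The inner class names PostsApi and TagsApi are made up for illustration, and the get methods just return strings where the real wrapper would call the del.icio.us urls:

```ruby
class Relicious
  attr_accessor :username, :password

  # Each "namespace" is a small object holding a reference back to the
  # parent, so it can reach the shared credentials.
  # PostsApi and TagsApi are hypothetical names, not part of any library.
  class PostsApi
    def initialize(client)
      @client = client
    end

    def get
      # A real wrapper would call the posts/get url here.
      "posts/get as #{@client.username}"
    end
  end

  class TagsApi
    def initialize(client)
      @client = client
    end

    def get
      # A real wrapper would call the tags/get url here.
      "tags/get as #{@client.username}"
    end
  end

  # Accessors that build each child object once and hand it back.
  def posts
    @posts ||= PostsApi.new(self)
  end

  def tags
    @tags ||= TagsApi.new(self)
  end
end

rel = Relicious.new
rel.username = "myusername"
rel.password = "mypassword"
puts rel.posts.get
puts rel.tags.get
```

Both child objects have a method called "get," but since you only ever reach them through rel.posts and rel.tags there's no confusion, and the credentials get set exactly once.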
Posted by Greg at 6:19 PM | Comments (2)
August 28, 2006
learns_to build Academic Archive::Part 2:Setting up a New Rails App and a First Iteration on the Paper Model, Featuring our First Tests
Welcome to Part 2 of learns_to build Academic Archive, where I try to blog every last detail involved in building a Ruby on Rails application for publishing and peer-editing academic papers. As requested by Benjamin in the comments on Part 1, from now on, I'll be providing a table of contents to each post. So if you're looking for some specific piece of knowledge, you can jump right into the middle to get it. If you have any other ideas on how to make this series better, I'd love to hear about them in the comments.
Contents
- Creating a new Rails project
- Designing the Paper Model
- Setting Up the Database and Generating the Model
- Validating the Presence of Papers' Titles
- Getting Started with Testing: Fixtures
- Testing the Fixtures: Our First Test and First Test Helper
- Running Tests: Under Rake, Under Ruby
- How To Write a Test: Given, When, Then
- Philosophy of Testing
Well, we're airborne now. I posted Part 1 just before boarding a flight for LA and we just reached our cruising altitude.
At the end of Part 1, we'd thought our way through to a good starting design for the whole app and we were ready to start writing some real code. Specifically, we wanted to start with our central object: the Paper model. But before we write even our first line, we've got to do some setup and the tiniest bit more thinking.
Creating a new Rails project
First things first: run the "rails" command to generate the spine of a new Rails application in the file system:
gabc:~/Sites Greg$ rails archive
I ran this command from my "Sites" directory where I keep all my projects. It will generate a new folder in there called "archive" and inside it will create a whole bunch of files and folders which constitute a fresh default Rails application.
If you cd into this directory and run "rails --version" you may find that you've got an old version of the framework (mine was at 1.1.2). Rails is a relatively new framework and it's undergoing a ton of rapid development. This is good because it means that new features get added all the time which make your job easier and old bugs get fixed. To take full advantage of this situation, we want to always be running the most recent version (as I write this it's 1.1.6). Thankfully all this takes is a single command:
gabc:~/Sites/archive Greg$ rake rails:freeze:edge
We're using Rake, the handy-dandy Ruby build utility. Rake automates common ruby programming tasks like creating, writing, and running files (especially tests). We'll be using Rake constantly in the setup and development of our app; to see all that it can do run "rake -T" and you'll see a list of all the available rake commands with their descriptions. This particular rake command goes out and grabs the latest development ("edge") version of Rails and freezes a copy of it into our app's vendor/rails directory, so the app runs against that copy rather than whatever gem happens to be installed; run it again whenever you want to pick up newer changes. When you run it, you'll probably see a bunch of subversion changes scroll down your screen as the framework gets updated to the most recent version.
Now, I've got to confess that I did all of this setup so far at home last night. I knew that I'd be working without internet access while I was traveling and obviously commands like "rake rails:freeze:edge" have to go out over the wire to get their job done. Also, since I was going to be traveling, I wanted to grab a local copy of the Rails documentation which I normally use online. So, if you're working with dependable web access you might skip this step, but it's nice to know how for when you need it:
gabc:~/Sites/archive Greg$ rake doc:rails
Rake will go ahead and check to see if you've got any of the documentation, downloading it and installing it in your project's doc/api directory where you don't. It will take a good chunk of time and download a whole bunch of files.
Designing the Paper Model
Ok, we're good to go. Setup is done. We could start generating app-specific files and writing code right now if we wanted, but just the slightest bit more thinking and note-taking is probably in order first. We decided at the end of our last post that we were going to start work by building papers and then the surrounding paper-approval-category relationship. What we didn't discuss was any of the specifics of the Paper model itself. What is a "paper" really? What attributes does it have? Is that really the right name for it? During the electronic blackout period of our ascent here, I sketched some answers in my moleskine. I'll explain them now.
Oops. Speaking of electronic blackouts, I lost battery power just as I polished off that last paragraph. I spent the rest of the flight into LA napping and reading. Not altogether unpleasant. Now, I'm in the corner of an LAX gate about 100 yards from where my flight will board, hunched over the only open outlet in the vicinity, trying to catch a quick charge before my flight for NY boards in 45 minutes.
Anyway, the last question that I asked in the air over Oregon may seem kind of nit-picky, but when it comes to domain modeling, the names we choose for things turn out to be surprisingly important. They should be expressive and unambiguous. We need to be able to remember what they mean without confusion upon returning to our code after a long break. A good rule of thumb is: would this name make sense to someone who knows about the domain, but is not in any way a coder? For example, we could call our main object Article instead of Paper. Usage differs even within academia. In the humanities they tend to be papers when delivered at conferences and articles when printed in journals. Students and teachers think of them as papers. Engineers and scientists tend to lean towards papers as well -- for them "article" has a more formal ring to it. I chose paper instead of article because it has less linguistic ambiguity and talking about "an editor's articles" makes me think of parts of speech as much as written documents. You'll find as we go along that I do some hand wringing each time a new name needs to be coined. The process is even tougher when dealing with join models and other nouns that don't have a precise correlation to words in the real world (at work right now we're thinking about changing the name of a model from Batch to Batching because it really represents an event wherein some things are joined together into a batch. Both of those choices sound ugly and are confusing in different contexts).
So, what attributes does a Paper have? Here's a transcription of the sketch I made on my way in from Portland:
- title
- created_at
- updated_at
- url?
- file_column?
The first attribute is pretty self-explanatory. The next two are time stamps; created_at tells you when the paper first entered our system and updated_at when it was last changed. These are pretty standard in database-driven web apps and if you include them on your models in a Rails app, Rails will automatically make sure that they get set in the way you'd expect.
A note here about attributes and the role of the database in a Rails app. So far, we've talked about our models in terms of the way they capture real world objects into the abstraction of our design. From another point of view, though, our models are simply representations in code of the database tables we're going to create. The database acts as persistent memory for our program. Here's how it works. At various points along the way, for example when we create a fresh object, the instance of our model will correspond exactly to the state of one row in our database. In concrete terms, if we wrote:
thesis = Paper.create :title => "It's Not Just Academic"
Then the object stored in "thesis" would correspond exactly with a row in the papers table. Each of its attribute-reader methods would return precisely the values of the corresponding columns in the database. Now, say we start changing the values of our paper's attributes like so:
thesis.title = "It Is Just Academic"
Well now the object we have in memory, the paper we're working with in our Ruby code, has diverged from the corresponding paper that we've got saved in the database. This will remain true until we call "save" -- at which point Rails will write our version of the object to the database, updating each of the columns so they represent the current values of the attributes -- or "reload," which causes Rails to revert the paper we've got in memory to the state stored in the database: attributes get reset to the values of their corresponding columns, and whatever information we'd placed into those variables is overwritten.
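If it helps to see that divergence without a database in the way, here is a toy stand-in in plain Ruby. ToyPaper and its STORE hash are made up for illustration (the hash plays the role of the papers table); this is not how ActiveRecord is implemented, just the same idea in miniature:

```ruby
# Toy stand-in for a model; STORE plays the role of the papers table.
class ToyPaper
  STORE = {}  # id => title, standing in for database rows

  attr_accessor :title
  attr_reader :id

  def initialize(id, title)
    @id = id
    @title = title
  end

  # Write the in-memory value out to the "table".
  def save
    STORE[@id] = @title
    true
  end

  # Throw away in-memory changes and re-read the "row".
  def reload
    @title = STORE[@id]
    self
  end
end

thesis = ToyPaper.new(1, "It's Not Just Academic")
thesis.save

thesis.title = "It Is Just Academic"  # memory and "database" now disagree
puts ToyPaper::STORE[1]               # the stored row still has the old title

thesis.reload                         # discard the unsaved change
puts thesis.title
```

Calling save instead of reload at that last step would have gone the other way, pushing the new title into the store.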
The last two attributes on our Paper model, url and file_column, represent two different ideas I had for keeping track of the location of the actual HTML files that our authors upload. The first and simpler of the two (the one I'll probably start with, in other words) is url. That would just be a string that keeps track of the location in the file system to which we uploaded the HTML file. Under this system, the part of our code that accepts uploads will have to be sure to record the uploaded-file's name so that we'll know where to look for it and how to link to it. The other option "file_column" represents an option I know a little less about, the File Column Plugin. I've never actually used it myself, but I've heard tell of a Rails plugin that allows you to store uploaded files in the actual database itself, handling all of the conversion code so that you can access the file from the database just as you would any other attribute stored there. That sounds intriguing and probably has important optimization repercussions (in other words, it probably plays a big part in determining what resource the application will consume most voraciously: memory on disk, database calls, processor time, etc.). Right now, storing the url as a string seems simpler to me so I'm going to start with that while making a note that the file column plugin is something I should look into more closely later.
Setting Up the Database and Generating the Model
Now that does it for theory and it's time to start actually coding our app (finally!). Wait. Wait. I just realized we've got one more small piece of configuration business to take care of: setting up and configuring the database. This bit is easy and once you've made a few Rails apps you'll be able to do it by rote. There are a ton of different combinations of databases, database engines, operating systems, etc. out there, so I'm just going to tell you what I have to do to get set up. If you're running on a contemporary Mac with a well-configured copy of MySQL things shouldn't be too different for you. If not, Google around, there are plenty of resources out there to help you get things right. Here we go:
First I've got to create the trio of databases on which a Rails app depends: development, test, and production. I'll do this from the command line:
gabc:~/Sites/archive Greg$ mysql -p -u root
(type your root password)
mysql> create database archive_development;
mysql> create database archive_test;
mysql> create database archive_production;
mysql> exit
Then, I'll open up config/database.yml and add my MySQL password to each of the three entries. Now we should be totally good to go. Serious this time. Let's run the server just to make sure:
gabc:~/Sites/archive Greg$ mongrel_rails start -d
Bringing up localhost:3000 in my browser I see: "Welcome aboard: You're riding the Rails!"
At last, it's time to get started on our Paper model. First I'll run the Rails model generator to get all of the files I'll need created and set up:
gabc:~/Sites/archive Greg$ script/generate model Paper
This'll give us, in addition to the model itself, a unit test and fixtures that are all set up and ready to go as well as a migration for setting up the database to handle our new model.
I'll write the migration next since we've basically done all the work already when thinking about what attributes our papers need to have. Here it is (archive/db/migrate/001_create_papers.rb):
class CreatePapers < ActiveRecord::Migration
  def self.up
    create_table :papers do |t|
      t.column :title, :string
      t.column :url, :string
      t.column :updated_at, :datetime
      t.column :created_at, :datetime
    end
  end
  def self.down
    drop_table :papers
  end
end
The generator left me with empty self.up and self.down methods, which I've filled in to create the papers table with all the proper fields. Like I said above, the table that corresponds to our model is basically just another view on our model. When we save an individual Paper object the table will store the values that we've assigned to the object. And Rails provides us with convenient methods for reading them back out again. In a minute we'll get to using those, but first let's actually run our migration:
gabc:~/Sites/archive Greg$ rake migrate
Now the papers table exists and has the right fields. We can even go in right away and make a paper by hand if we want via Rails' "console", a shell the framework provides for interacting directly with our data. The console is a great place to sift through your data by hand or try out expressions when you're working on writing custom methods:
gabc:~/Sites/archive Greg$ script/console
>> thesis = Paper.new :title => "It's Not Just Academic"
=> #<Paper:0x26b6e5c @attributes={"updated_at"=>nil, "title"=>"It's Not Just Academic", "url"=>nil, "created_at"=>nil}, @new_record=true>
>> Paper.count
=> 0
>> thesis.save
=> true
>> Paper.count
=> 1
>> thesis
=> #<Paper:0x26b6e5c @attributes={"updated_at"=>Mon Aug 21 14:47:04 EDT 2006, "title"=>"It's Not Just Academic", "url"=>nil, "id"=>1, "created_at"=>Mon Aug 21 14:47:04 EDT 2006}, @new_record=false, @errors=#<ActiveRecord::Errors:0x2637a6c @base=#<Paper:0x26b6e5c ...>, @errors={}>>
>> thesis.title
=> "It's Not Just Academic"
If you follow along with that input, you'll see that I made a new paper with the title "It's Not Just Academic," storing it in a local variable called "thesis". Since I hadn't yet saved the new paper, there were still no papers to be found in the database. Then I did save it, which succeeded, returning true, and re-counted the papers in the database to discover that it was there now. Next, I looked at the object stored in thesis to find a paper different from the one I'd originally put there. It now had non-nil values for "created_at" and "updated_at" along with an additional instance variable by the name of @errors where Rails would store any errors that it happened upon while saving the object (you can read out the current errors on any object by saying something like this: thesis.errors.full_messages). And finally I used a method automatically added by Rails to read off the thesis's title attribute.
Validating the Presence of Papers' Titles
Ok. Now that we're past the total basics of getting our Paper model up and running, we can actually start doing something with it. What do we want the Paper model to do? Well, from when we thought about our screens earlier we know that when users upload papers they're going to be giving us two things: the title, and the HTML file. We're then going to need to store the title in the database, store the file in the filesystem, and store the file's location in the database as well, specifically in the url field we added to the papers table. It would be great if we could give the papers nice urls. For example, I'd love it if the url for my thesis could be something along the lines of: www.academicarchive.org/borenstein/art_history/its_not_just_academic.html. Now I don't want to think too hard about the "/borenstein/art_history" part right now because that's going to have to do with routing and right now I'm trying to concentrate on the Paper model. What I do know from this is that we don't want to save any papers into the database that don't have titles and we're going to want to figure out a system for making the titles our users give us safe to use as urls (there are rules about what can and can't be in a url, i.e. you can't have spaces, can't have apostrophes, they have to be under a certain length, etc.).
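As a preview of the second of those, here's a rough sketch of the kind of title clean-up I have in mind. The method name url_safe and the 50-character cap are placeholders I made up for illustration, not anything Rails gives us:

```ruby
# Hypothetical helper; the name and the default length cap are assumptions.
def url_safe(title, max_length = 50)
  slug = title.downcase
  slug = slug.gsub("'", "")            # drop apostrophes: "it's" becomes "its"
  slug = slug.gsub(/[^a-z0-9]+/, "_")  # any run of other characters becomes one underscore
  slug = slug.gsub(/\A_+|_+\z/, "")    # trim underscores from both ends
  slug[0, max_length]                  # keep it under the length cap
end

puts url_safe("It's Not Just Academic")  # prints its_not_just_academic
```

A real version would also need to deal with two papers whose titles clean up to the same string, which this sketch ignores.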
I want to take the first of these first: making sure that every paper we save in the database has a title. Thankfully, Rails makes this super easy with a system called validations. In essence, validations are just methods that automatically get run at different points in an object's life cycle (when you make a new one, when it gets saved, etc.), throwing errors unless the object meets certain criteria. When our app has actual views, we can use the validation errors to let our users know that they've done something wrong through on-screen feedback. At this point though, we're just going to use it to make sure that all of our papers have titles. The validation is a one-liner add, like so (in archive/app/models/paper.rb):
class Paper < ActiveRecord::Base
  validates_presence_of :title
end
What does Rails' implementation of this validation actually look like in practice? Let's jump into script/console and find out:
gabc:~/Sites/archive Greg$ script/console
Loading development environment.
>> thesis = Paper.new
=> #<Paper:0x2662e9c @attributes={"updated_at"=>nil, "title"=>nil, "url"=>nil, "created_at"=>nil}, @new_record=true>
>> thesis.title
=> nil
>> thesis.save!
ActiveRecord::RecordInvalid: Validation failed: Title can't be blank
from ./script/../config/../config/../vendor/rails/activerecord/lib/active_record/validations.rb:756:in `save!'
from (irb):3
You can see that we built a new paper and didn't assign it a title. Then when we tried to save the paper, Rails raised an "ActiveRecord::RecordInvalid" error that included a message explaining its cause and a traceback showing us exactly where in the code the problem came up (we called "save!" with the exclamation mark at the end because that tells Rails to throw an error in our face if one comes up; plain "save" would have just quietly returned false).
Getting Started with Testing: Fixtures
Now that we've finally written some actual code, our next job is to make sure that code actually works as we expect it to and that means tests. Testing is a big subject, but suffice it to say here that it has two main purposes: to make sure our code does what we think it does and to make it easy for us to change our code later on (if we make a major change and all the tests still pass, that's a good sign that the rest of our code still works; if they don't, well that means we've probably got some fixing to do). (Don't worry if you're totally new to testing and the whole concept seems a little fuzzy to you. It will become clear in a minute when we actually write our first test -- tests are one of those things, like spiral staircases, that are much easier to show than to describe.)
Anyway, for our tests to be most effective, we want to cover as much of our code as possible and that means starting right away. The more untested code you write the less likely you are to ever go back and add tests and the more likely you are to end up with confusing, unmaintainable code. In fact, some people insist that you should "test first," writing tests that define the behavior you want from your code before writing your code itself. That way you don't "overcode"; you make sure not only that your code works, but that it doesn't have any undesirable side effects. We may do some test first development a little later on, but right now we're in a simple enough situation that I'm perfectly happy to start testing with a whopping one line of existing code.
What do we want to test? We want to test that our code actually does require each paper to have a title like we're trying to get it to and, further, that a paper without a title will always throw an error. So, the first thing we need is some fake papers to play around with for testing. As part of its testing suite, Rails gives us a place to create these papers: the fixtures. You can think of fixtures as just like tables in the database, only they happen to be represented in a flat file. At the start of a test run, Rails loads the data in these files into a temporary testing database so you can access it in your test methods. This makes it perfect for creating different scenarios against which to run your code and make sure that it does the right thing. In our case, we're going to want to make some papers and see if our code can tell whether or not they're valid.
Rails already created our fixture file for us when we generated the Paper model, so let's open it up and take a look (it lives at test/fixtures/papers.yml):
# Read about fixtures at http://ar.rubyonrails.org/classes/Fixtures.html
first:
  id: 1
another:
  id: 2
Here's how this works: the non-indented lines are "names" by which we can refer to each entry. The other lines are pairs of column names and row values in the table. It will quickly become clear if I show you how I turned the version of my thesis we were playing with before in script/console into a fixture:
thesis:
  id: 1
  title: "It's Not Just Academic"
  created_at: 2006-08-21 09:34:28
  updated_at: 2006-08-21 09:34:28
Pretty self-explanatory. The one gotcha is the format of the "created_at" and "updated_at" fields, which look different than what Ruby printed to the screen when we were in script/console. This is MySQL datetime format. When I can't remember how it goes, I make a new record in script/console and then just go look at my database using a GUI tool like YourSQL (especially when I'm on an airplane on the way from NY to San Francisco with no access to the web). There are a few other things that commonly go wrong when working with fixtures and I'll just point them out here, while we're on the subject: (1) the .yml format (rhymes with "camel") is super picky about white space; indentations need to be 2-spaces wide, there can only be one space between the colon and the value, etc. (2) each entry in a particular fixture file needs to have a unique id; if you accidentally re-use the same id twice in one file everything will go haywire. (3) the test database doesn't necessarily get reloaded each time you run your test, only if you run it under rake; sometimes this can get especially confusing because the fixtures that get loaded up for one test tend to stick around for the next one and so you can have tests that pass or fail depending on what order you run them in (for example a functional test that fails when you run "rake test:functionals" may pass if you run just "rake" (which runs the units first before the functionals)).
If you're totally new to tests, some of that may have just seemed like gibberish. Don't worry about it. You can always reread that paragraph if you're running into mysterious errors at some future point down the line. . .
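If you do suspect gotcha number two (a re-used id), a quick sanity check with Ruby's standard yaml library can save some head-scratching. This is a sketch rather than anything Rails provides: the fixture text is inlined here (in the app you'd read test/fixtures/papers.yml instead), the duplicate id is deliberate, and I've left the timestamp columns out to keep it short:

```ruby
require 'yaml'

# Fixture text inlined for illustration; note the deliberately repeated id.
fixture_text = <<FIXTURE
thesis:
  id: 1
  title: "It's Not Just Academic"
another:
  id: 1
  title: "Oops, Same Id"
FIXTURE

# Each top-level name maps to a hash of column => value pairs.
fixtures = YAML.load(fixture_text)

# Group entry names by id and keep only the ids used more than once.
names_by_id = Hash.new { |hash, id| hash[id] = [] }
fixtures.each { |name, columns| names_by_id[columns["id"]] << name }
duplicates = names_by_id.select { |id, names| names.size > 1 }

duplicates.each do |id, names|
  puts "id #{id} is shared by: #{names.join(', ')}"
end
```

If the indentation in the fixture is off, the YAML.load call itself will usually complain or hand back something that isn't a hash of hashes, which is a handy way to catch gotcha number one too.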
Testing the Fixtures: Our First Test and First Test Helper
I'm back in Portland now and recovered from my travels. Where were we? That's right. We've got our fixture in place so it's time to write some tests! Before we try and test our actual code, though, it's probably a good idea to make sure that our fixture itself is well-formed, or else our tests will be pretty useless. I've got a little test helper method from some earlier projects that's super helpful for this (for full disclosure, like most things it was probably actually Chris's idea). If we want a method to be available to all our tests, we just stick it in test/test_helper.rb, so that's where we'll stick the following code (there's a helpful little comment in test_helper.rb that will guide you once you're in there):
def assert_all_valid klass
  klass.find(:all).each do |obj|
    assert obj.valid?, "#{obj.class} with id #{obj.id} is invalid"
  end
end
Let's walk through this method. First of all, it takes a class as an argument. Since "class" itself is a reserved word (a word that has special properties in Ruby and is hence unavailable as a name for a normal variable) we call it "klass". We might as well have called it "bob," but "klass" is conventional because it's easy to remember what it means. Once that's understood, there's not too much else going on here. We use Rails' "find(:all)" syntax to find all the members of our class and then we assert the validity of each particular member in turn, printing out a helpful message if the object is not valid. When defining custom test_helper methods of your own you'll save yourself a lot of headaches if you add as specific an error message as possible so that, when the test fails, it will be clear what went wrong as well as, importantly, which particular objects or attributes were involved (hence the inclusion of obj.class and obj.id in the message).
A note of syntactical explanation: Rails adds a method to our objects called "valid?" that returns true if the object passes its class's validations and false if not; "assert" is the simplest testing method, passing if its argument is true and failing if it is false. Put these two together and you've got a test that passes if and only if the object is valid.
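To see how those two pieces fit together without Rails in the picture, here is a plain-Ruby toy. Everything in it is a stand-in I made up for illustration: FakePaper#valid? mimics what validates_presence_of :title checks, the assert below is a bare-bones imitation of the real one, and this version of assert_all_valid takes a list of objects instead of a class since there's no database to find them in:

```ruby
# Stand-in for a model with a presence validation on title.
FakePaper = Struct.new(:id, :title) do
  def valid?
    !title.nil? && !title.strip.empty?
  end
end

# Bare-bones imitation of a test framework's assertion failure.
class AssertionFailed < StandardError; end

# Passes quietly if the condition is true, raises with the message if not.
def assert(condition, message = "assertion failed")
  raise AssertionFailed, message unless condition
  true
end

# Same shape as the helper above, but handed a list of objects directly.
def assert_all_valid(objects)
  objects.each do |obj|
    assert obj.valid?, "#{obj.class} with id #{obj.id} is invalid"
  end
end

papers = [FakePaper.new(1, "It's Not Just Academic"), FakePaper.new(2, nil)]

begin
  assert_all_valid papers
rescue AssertionFailed => e
  puts e.message  # names the class and id of the first invalid object
end
```

The first paper sails through, and the second trips the assertion with a message that says exactly which object was the problem.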
Now, let's write and run the test. In the test for our Paper model that Rails automatically stubbed out for us (test/unit/paper_test.rb), we'll replace the sample method with:
def test_fixtures
  assert_all_valid Paper
end
save the file and then run the tests like so:
gabc:~/Sites/archive Greg$ rake test:units
(in /Users/Greg/Sites/archive)
/opt/local/bin/ruby -Ilib:test "/opt/local/lib/ruby/gems/1.8/gems/rake-0.7.1/lib/rake/rake_test_loader.rb" "test/unit/paper_test.rb"
Loaded suite /opt/local/lib/ruby/gems/1.8/gems/rake-0.7.1/lib/rake/rake_test_loader
Started
F
Finished in 0.186302 seconds.
1) Failure:
test_fixtures(PaperTest)
[./test/unit/../test_helper.rb:30:in `assert_all_valid'
./test/unit/../test_helper.rb:29:in `assert_all_valid'
./test/unit/paper_test.rb:7:in `test_fixtures']:
Paper with id 2 is invalid.
<false> is not true.
1 tests, 2 assertions, 1 failures, 0 errors
rake aborted!
Command failed with status (1): [/opt/local/bin/ruby -Ilib:test "/opt/local...]
What's this? Our very first test and we've already failed it! Well, thanks to the message we added to our custom assertion, it's really easy to tell what's going on: we have an invalid paper in our fixtures (when tests fail or throw errors they print out Es and Fs and then report back on the problem with a trace, showing which lines in which files got run before the problem hit; if you're trying to track down a less obvious problem than this one, that trace will be your lifeline). If we look at our paper fixtures (test/fixtures/papers.yml), we'll see that, in addition to the paper we created above, we've got the second one that Rails automatically created for us still hanging around:
another:
  id: 2
And that paper is definitely not valid. Remember, we're validating the presence of our papers' titles and this one hasn't got one. It's only got an id. So, in order to get this test to pass, we've got to either delete this paper from our fixture or edit it so it'll be valid. Let's do the latter, like so:
another:
  id: 2
  title: "Simulacra and Simulacrum"
  created_at: 1996-08-21 09:34:28
  updated_at: 1996-08-21 09:34:28
Now, saving the file and rerunning should result in our first clean test run:
gabc:~/Sites/archive Greg$ rake db:test:prepare
(in /Users/Greg/Sites/archive)
gabc:~/Sites/archive Greg$ ruby test/unit/paper_test.rb
Loaded suite test/unit/paper_test
Started
.
Finished in 0.194362 seconds.
1 tests, 2 assertions, 0 failures, 0 errors
This is great! After a little bit of setup, we've successfully tested the code we just wrote: our validation catches papers that don't have titles.
Looking a little closer at the output from the test run, notice that we got credit for two assertions rather than just one. That's because our helper calls "assert obj.valid?" once for each paper it finds, and we've now got two papers in our fixtures. If we had three papers in our fixtures, we'd be told we made three assertions, and so on.
Running Tests: Under Rake, Under Ruby
It is probably worthwhile to spend a moment here on some of the specifics involved when running tests. There are four basic ways to run tests: a "full rake", just the units (the tests that exercise our models), just the functionals (those that exercise our controllers), or individual test files one at a time. The first three we do by invoking rake ("rake", "rake test:units", and "rake test:functionals" respectively) and the last we do by just running the test file as if it was any other ruby program ("ruby test/unit/paper_test.rb", for example). When you run your tests, Rails uses a different database from the one you're developing on. If you remember some of the configuration we did above, when we set up our database.yml file, we told Rails to use a database called "archive_test" for this purpose. At the start of each run, rake clears that database and then loads it up with the data you stored in your fixtures so that you'll have a controlled environment in which to do your testing. Further, the Rails testing framework keeps the data generated in each test method from polluting your database for other methods. Each test method gets a clean start.
Besides running different sets of test files, each of the three different rakes (full, units, and functionals) does this database destroying and recreating process separately. So, if you run a full rake, your test database gets destroyed and recreated twice, once at the start of the rake when the units run and once halfway through before the functionals do. Since rake only loads up the tables that you tell it to (by including different sets of fixtures at the top of each of your test files) this ordering can mean that you can get different results from the same test! Let's say you were working on a functional test. When you run that test under rake test:functionals only the set of tables explicitly asked for in the functionals tests get loaded. Under a full rake the units run first, so by the time your tests get run, the tables created by the units will still be hanging around. If your test's passing or failing hinges on this difference, you'll see different behavior in the two situations. If you encounter this issue just make sure that each of your tests calls all of the fixtures that it needs (don't forget the ones being referenced through associations either!).
And finally when you're just running a single test like "ruby test/unit/paper_test.rb" -- which can be a real time saver once you've got a lot of tests written and running the whole suite takes a full minute or two -- you don't have the benefit of rake's database loading at all. Your test will run with whatever the current state of the test database was leftover from your last rake. This can result in some seriously strange results that will have you chasing ghost bugs that aren't really there. To prevent that problem, simply run "rake db:test:prepare" before your test and rake will set up your test database just how you want it.
How To Write a Test: Given, When, Then
Now, while our first test definitely exercised the code we just wrote (the validation obviously got run), it plays kind of a more general role: guarding our paper fixtures from any invalid data. More to the point, if we stopped validating on the presence of a paper's title, the test would still pass (try it, go delete the whole line and then rerun your tests). Therefore, this can't quite be said to be a test on that validation as such. So, let's write one.
How, generally, do you write a test? Well, most tests have three parts: the setup that must be in place to accomplish some action, the actual code that runs the action (this is the code you're trying to test), and then some ideas about what we expect the effect of that action to be. Splitting these parts up in your mind and then addressing them one at a time usually makes it much easier to write a test. When I start my test methods, I find it helps to write these parts down explicitly as comments so I can keep focused on exactly what I have to do (plus it lets me do a bunch of typing, which feels productive, without having to actually do any thinking), like so (in test/unit/paper_test.rb):
def test_validates_presence_of_title
#given
#when
#then
end
Giving tests descriptive names is always a good idea since the whole point of them is that if you ever see them in a test run they should tell you exactly what's gone wrong. Rails will only run test methods that actually start with "test_", so a good recipe for naming tends to be appending some description of what you're testing onto there.
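To make the naming rule concrete, here's a minimal sketch of how a test runner could pick out test methods by their prefix. This is not the framework's actual implementation, just the idea behind it; the class FakePaperTest and the helper runnable_tests are names I made up for illustration.

```ruby
# A hypothetical test class: two tests and one ordinary helper method.
class FakePaperTest
  def test_validates_presence_of_title; end
  def test_paper_url; end
  def build_paper; end # no "test_" prefix, so it would never be run as a test
end

# Hypothetical helper: collect the names of the methods that look like tests.
def runnable_tests(klass)
  klass.public_instance_methods(false).map(&:to_s).grep(/\Atest_/).sort
end

puts runnable_tests(FakePaperTest)
```

Only the two methods whose names begin with "test_" get listed; build_paper is ignored, which is exactly why a misnamed test silently never runs.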
Back to the question of how to test our validation. Let's try to say the three parts of our test in words. Given a paper that has no title, when we try to save it, then the paper should throw an error, remain unsaved, and report itself invalid. Now, that's starting to sound like something I could write up in code. I'll give it a shot. Here's my first draft:
def test_validates_presence_of_title
#given
p = Paper.new
#when
p.save!
#then
assert !p.valid?
end
I make a new paper. Don't assign it a title. Try to save it. And then assert that it is not valid. Just like I planned. What happens when I run that test?
1) Error:
test_validates_presence_of_title(PaperTest):
ActiveRecord::RecordInvalid: Validation failed: Title can't be blank
/Users/Greg/Sites/archive/config/../vendor/rails/activerecord/lib/active_record/validations.rb:756:in `save!'
./test/unit/paper_test.rb:14:in `test_validates_presence_of_title'
Oops! Trying to save the paper failed, like it was supposed to, but the error that it threw prevented the rest of our test from executing. What we need to do is wrap our save call in an assertion which knows to expect the error, like so:
def test_validates_presence_of_title
#given
p = Paper.new
#when
assert_raises(ActiveRecord::RecordInvalid){p.save!}
#then
assert !p.valid?
end
This is a passing test. assert_raises takes an error type as an argument (thankfully we knew exactly what type of error to expect since we'd already seen it on the first run) and passes only if the code in its block throws that error.
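If you're curious how an assertion like that can work, here's a rough sketch of the idea in plain Ruby. It is not the real assert_raises, just a stand-in under my own assumptions: expect_raise, SketchAssertionFailed, and TitleMissing are all hypothetical names invented for this example.

```ruby
# Hypothetical stand-in for the failure a real test framework would report.
class SketchAssertionFailed < StandardError; end

# Hypothetical helper mimicking the idea behind assert_raises:
# run the block and pass only if it raises the expected error class.
# (Any other kind of error simply propagates to the caller.)
def expect_raise(error_class)
  yield
rescue error_class => e
  e # the expected error was raised: hand it back and carry on
else
  raise SketchAssertionFailed, "#{error_class} expected but nothing was raised"
end

# Hypothetical error class playing the role of ActiveRecord::RecordInvalid.
class TitleMissing < StandardError; end

# Passes: the block raises the error we said to expect.
caught = expect_raise(TitleMissing) { raise TitleMissing, "Title can't be blank" }
puts caught.message

# Fails: the block finishes without raising anything.
begin
  expect_raise(TitleMissing) { :saved }
rescue SketchAssertionFailed => e
  puts e.message
end
```

The important bit is that the expected error gets rescued inside the helper, so the rest of the test method keeps running instead of blowing up on the spot.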
Now, I'll show you just one more iteration of this test with a few more trimmings:
def test_validates_presence_of_title
#given
paper_count = Paper.count
p = Paper.new
assert !p.title
#when
assert_raises(ActiveRecord::RecordInvalid){p.save!}
#then
assert !p.valid?
assert_equal paper_count, Paper.count
end
What have I added? Start with the first and last lines. One of the things we'd said we wanted to test was that the paper should remain unsaved. Well, there are two sides to that: the object's side and the model's side. We're already testing for the error thrown by the call to "save!", but now we want to test the model side, i.e. that the number of papers in the database doesn't change. To test that, we store the count of papers into a local variable (paper_count) on the first line and then compare it to a fresh count on the last line ("count" is a useful method that Rails adds to all of your model classes; it gives the same number as Model.find(:all).length, but gets it from a SQL COUNT query instead of loading every record). As long as these two are the same, we'll know that nothing we've done has affected the count of papers in the database.
The other thing I've added is the assertion that, just after it is newly made, the paper does not have a title. While somewhat extraneous, the purpose of this assertion is to make explicit one of the assumptions in our given state: a new paper doesn't have a title. Since it's the very absence of that title that renders the paper invalid, it made sense to write an assertion verifying it before getting to the heart of the matter.
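The before-and-after count comparison is a pattern you'll reach for a lot, so here's a small sketch of how it could be pulled out into a helper. Everything in it is an assumption for the sake of the example: FakePaper is an in-memory stand-in (not ActiveRecord), FakeInvalid plays the role of the validation error, and count_unchanged? is a helper name I invented.

```ruby
# Hypothetical error class standing in for the validation error.
class FakeInvalid < StandardError; end

# In-memory stand-in for a model. This is NOT ActiveRecord.
class FakePaper
  @rows = [] # plays the role of the papers table

  class << self
    attr_reader :rows

    def count
      rows.length
    end
  end

  attr_accessor :title

  def valid?
    !title.nil? && !title.strip.empty?
  end

  def save!
    raise FakeInvalid, "Title can't be blank" unless valid?
    self.class.rows << self
    true
  end
end

# Hypothetical helper: run the block, then report whether the count stayed put.
def count_unchanged?(model)
  before = model.count
  yield
  before == model.count
end

# Given a paper with no title, a failed save should leave the count alone.
unchanged = count_unchanged?(FakePaper) do
  begin
    FakePaper.new.save!
  rescue FakeInvalid
    # expected: the invalid paper refuses to save
  end
end
puts unchanged

# A paper with a title does get saved, so the count moves.
titled = FakePaper.new
titled.title = "It's Not Just Academic"
puts count_unchanged?(FakePaper) { titled.save! }
```

Wrapping the comparison in a block keeps the "given" count and the "then" check in one place, so the body of the test can stay focused on the action itself.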
Philosophy of Testing
Is this overkill? This particular example is obviously somewhat contrived. I probably wouldn't be this thorough in testing such a simple situation if I wasn't trying to demonstrate the ins and outs of my thought process while writing tests. But what should our "philosophy of testing" be? Is it possible to have too many tests? What should be the thrust of the tests that we do write?
Like so many other things, answers to these questions are partially a matter of taste and partially a matter of responding to the particular situation you find yourself in. Both of those are hard to learn by any method besides experience (I work all day with coders who are better at them than I), but I think I can lay down a few guidelines that help shape my thinking.
Let's start with some don'ts:
- Don't test something that's part of the framework or a third-party library. If you don't trust other people's code enough to use it without redundant testing, you should probably just avoid using it altogether. Plus, this is just unnecessary extra work when the whole point of using libraries and frameworks is to avoid duplicating effort that other people have already put in. (To a certain extent we're breaking this rule in our test above, but not too badly. The key difference is that we're testing whether we've successfully used the framework to enforce a business logic rule (that papers must have titles) rather than whether or not the framework's code for enforcing that rule works in the first place.)
- Don't let your tests lock down the specifics of your code too much. When I first got into the swing of writing tests, I got hooked on assertions. I wanted to run up the score, to see more dots zoom across my screen. And so for a while, I picked up the bad habit of writing assertions on everything I knew to be true in my code: the exact wording of error messages, the exact values of a bunch of attributes in the fixtures, etc. This turned out to be a bad idea because it made my tests incredibly fragile. Anytime I'd twiddle around with my fixtures at all (say, to fix a typo), my tests would break. My tests were making more work for me when they were supposed to make my life easier. Which brings me to. . .
- Do write tests that ensure outcomes. Our goal with writing tests is to leverage a specific situation we've thought of (and, often, captured in the fixtures) into a general structure that will make sure that our code will act right in all situations. For example, in testing our validation above, we could have written something like this:
def test_validates_presence_of_title
#given
p = Paper.new
assert !p.valid?
#when
p.title = "My title"
#then
assert p.valid?
end
On the surface, this test seems a lot like the one we wrote above. It asserts that a paper without a title is not valid, adds a title to the paper, and then asserts that the paper is valid. What it doesn't do is engage with the more general purpose of our validation: preventing papers that lack titles from getting saved to the database. It also has some specifics hard coded into it: the choice of "My title" as a title. While that seems fine right now, what if we made a change later on that, say, required all of our titles to be formatted in Unicode for internationalization? Then this test would start to fail even though it was unrelated to the new code we were trying to write. It would become yet another spot in our code we had to change to add a new feature or to alter our design.
- Do write tests first to specify behavior. Oftentimes tests are just a better medium in which to think about the design for your program than the program itself. Writing a test lets you think precisely about what you want your code to do without worrying about how it's going to have to get it done. For example, take the goal I mentioned of having pretty urls for our papers (getting the url for my thesis to end with "its_not_just_academic.html"). Well, I still don't have a clear plan for how to accomplish that goal, but I know how to write a test on it:
def test_paper_url
#given
p = Paper.new :title => "It's Not Just Academic"
#when
p.save!
#then
assert_equal "its_not_just_academic.html", p.url
end
Right now, running this test will result in a failure:
1) Failure:
test_paper_url(PaperTest) [./test/unit/paper_test.rb:28]:
<"its_not_just_academic.html"> expected but was <nil>.
But now I've got the beginning of a kind of objective standard against which I can write my system for generating papers' urls from their
