2H2K Lawyer: Science Fiction Design, Artificial Labor, and Ubiquitous Interactive Machine Learning
http://urbanhonking.com/ideasfordozens/2014/01/07/2h2k-lawyer-science-fiction-design-artificial-labor-and-ubiquitous-interactive-machine-learning/
Tue, 07 Jan 2014

Intro

“2H2K: LawyeR” is a multimedia project exploring the fate of legal work in a future of artificial labor and ubiquitous interactive machine learning.

This project arose out of 2H2K, my ongoing collaboration with John Powers where we’re trying to use science fiction, urbanism, futurism, cinema, and visual effects to imagine what life could be like in the second half of the 21st century. One of the major themes to emerge in the 2H2K project is something we’ve taken to calling “artificial labor”. While we’re skeptical of the claims of artificial intelligence, we do imagine ever-more sophisticated forms of automation transforming the landscape of work and economics. Or, as John puts it, robots are Marxist.

Due to our focus on urbanism and the built-environment, John’s stories so far have mainly explored the impact of artificial labor on physical work: building construction, forestry, etc. For this project, I wanted to look at how automation will affect white collar work.

Having known a number of lawyers who worked at large New York firms such as Skadden and Kirkland & Ellis, one form of white collar work that seemed especially ripe for automation jumped out at me: document evaluation for legal discovery. As I’ll explain in more detail below, discovery is the most labor-intensive component of large corporate lawsuits and it seems especially amenable to automation through machine learning. Even the widespread application of technologies that already exist today would radically reduce the large number of high-paid lawyers and paralegals who currently do this work.

In the spirit of both 2H2K and the MIT Media Lab class, Science Fiction to Science Fabrication (for which this project acted as a final), I set out to explore the potential impact of machine learning on the legal profession through three inter-related approaches:

  • Prototyping a real interactive machine learning system for legal discovery.
  • Writing and illustrating a sci-fi comic telling the story of how it might feel to work in a law firm of 2050 that’s been transformed by this new technology.
  • Designing the branding for an imaginary firm working in this field.

For the rest of this post, I’ll discuss these parts of the project one-by-one and describe what I learned from each. These discussions will range from practical things I learned about machine learning and natural language processing to interface design issues to the narrative possibilities I discovered in my technical research (for example, the relationship between legal discovery and voyeurism).

Before beginning, though, I want to mention one of the most powerful and surprising things I learned in the course of this project. Using science fiction as the basis of a design process has led me to think that design fiction is incredibly broken. Most design fiction starts off with rank speculation about the future, imagining a futuristic device or situation out of whole cloth. Only then does it engage prototyping and visual effects technologies in order to communicate the consequences of the imagined device through “diegetic prototypes”, i.e. videos or other loosely narrative formats that depict the imagined technology in use.

This now seems perfectly backwards to me. For this project, by contrast, I started with a real but relatively cutting edge technology (machine learning for document recall). I then engaged with it as a programmer and technologist until I could build a system that worked well enough to give me (with my highly specialized technical knowledge) the experience of what it would be like to really use such a system in the real world. Having learned those lessons, I then set out to communicate them using a traditional storytelling medium (in this case, comics). I used my technical know-how to gain early-access to the legendarily unevenly distributed future and then I used my storytelling ability to relay what I learned.

Design fiction uses imagination to predict the future and prototyping to tell stories. Imagination sucks at resolving the complex causes that drive real world technology development and uptake. Prototyping sucks at producing the personal identification necessary to communicate a situation’s emotional effect. This new process – call it Science Fiction Design, maybe? – reverses this mistake. It uses prototyping and technological research to predict the future and storytelling media to tell stories.

(Much of the content of this post is reproduced in the third episode of the 2H2K podcast where John and I discuss this project. The 2H2K podcast consists of semi-regular conversations between the two of us about the stories and technologies that make up the project. Topics covered include urbanism, labor, automation, robots, interactive machine learning, cross-training, cybernetics, and craft. You can subscribe here.)

What is Discovery?

According to wikipedia:

Discovery, in the law of the United States, is the pre-trial phase in a lawsuit in which each party, through the law of civil procedure, can obtain evidence from the opposing party by means of discovery devices including requests for answers to interrogatories, requests for production of documents, requests for admissions and depositions.

In other words, when you’re engaged in a lawsuit, the other side can request internal documents and other information from your company that might help them prove their case or defend against yours. This can include internal emails and memos, call records, financial documents, and all manner of other things. In large corporate lawsuits the quantity of documents involved can be staggering. For example, during the US government’s lawsuit against Big Tobacco six million documents were discovered totaling more than 35 million pages.

Each of these documents needs to be reviewed for information that is relevant to the case. This is not simply a matter of searching for the presence or absence of particular words, but making a legal judgment based on the content of the document. Does it discuss a particular topic? Is it evidence of a particular kind of relationship between two people? Does it represent an order or instruction from one party to another?

In large cases this review is normally performed by hordes of first year associates, staff attorneys, and paralegals at large law firms. Before the crash of 2008, large law firms, which do the bulk of this kind of work and employ hundreds or even thousands of such workers, hired more than 30% of new law school graduates (see What’s New About the New Normal: The Evolving Market for New Lawyers in the 21st Century by Bernard A. Burk of UNC Chapel Hill).

As you can imagine, this process is wildly expensive both for law firms and their clients.

Legal Discovery and Machine Learning

Legal discovery is a perfect candidate for automation using recent advances in machine learning. From a machine learning perspective discovery is a classification task: each document must be labeled as either relevant or irrelevant to the case. Since the legal issues, people involved, and topics discussed vary widely between cases, discovery is a prime candidate for supervised learning, a machine learning approach where humans provide labels for a small subset of documents and then the machine learning system attempts to generalize to the full set.

Machine learning differs from traditional information retrieval systems such as full-text search exactly because of this ability to generalize. Machine learning systems represent their documents as combinations of “features”: the presence or absence of certain words, when a message was sent, who sent it, who received it, whether or not it includes a dollar amount or a reference to a stock ticker symbol, etc. (Feature selection is the single most critical aspect of machine learning engineering; more about it below when I describe the development of my system.) Supervised machine learning algorithms learn the patterns that are present in these features amongst the labeled examples they are given. They learn what types of combinations of features characterize documents that are relevant vs irrelevant and then they classify a new unseen document by comparing its features.
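To make this concrete, here’s a minimal sketch (in Python with scikit-learn, not code from this project) of turning an email into features like these and training a supervised classifier on a handful of labels. The sample emails, senders, and feature names are all invented for illustration.

```python
# Illustrative only: represent each email as a bag of features and let a
# supervised classifier generalize from a few labeled examples.
import re
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

def features(email):
    """Turn one email (dict with 'body', 'sender', 'date') into a feature dict."""
    feats = {
        "sender=" + email["sender"]: 1,
        "mentions_dollar_amount": 1 if re.search(r"\$\d", email["body"]) else 0,
        "date_scaled": email["date"],  # 0.0 = start of corpus, 1.0 = end
    }
    for word in set(email["body"].lower().split()):
        feats["word=" + word] = 1      # presence/absence of each word
    return feats

labeled = [
    ({"sender": "trader@enron.com", "body": "Buy 500 shares of ENE at $41", "date": 0.2}, 1),
    ({"sender": "friend@enron.com", "body": "Fantasy football picks due Friday", "date": 0.3}, 0),
]
vec = DictVectorizer()
X = vec.fit_transform([features(e) for e, _ in labeled])
y = [label for _, label in labeled]
clf = LogisticRegression().fit(X, y)   # learns feature patterns from the labels

unseen = {"sender": "trader@enron.com", "body": "Sell ENE before the call, $2M exposure", "date": 0.9}
print(clf.predict(vec.transform([features(unseen)])))  # predicted relevant (1) or irrelevant (0)
```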

Information retrieval systems are currently in widespread use throughout the legal field. One of the landmark information retrieval systems, IBM’s STAIRS, was even originally developed in order to reduce the expense of defending against an antitrust lawsuit in 1969, before being commercialized in 1973.

However, there is little public sign that machine learning techniques are in widespread use at all. (It’s impossible to know how widely these techniques are used within proprietary systems inside of firms, of course.) One of the most visible proponents of machine learning for legal discovery is former Bell Labs researcher, David Lewis. Lewis’s Purdue lecture, Machine Learning for Discovery in Legal Cases represents probably the best public survey of the field.

This seems on the verge of changing. In a March 2011 story in the New York Times, Armies of Expensive Lawyers, Replaced by Cheaper Software, John Markoff reported on a burgeoning set of companies beginning to compete in this field including Clearwell Systems, Cataphora, Blackstone Discovery, and Autonomy, which has since been acquired by HP. Strikingly, Bill Herr, one of the lawyers interviewed for Markoff’s story, used one of these new e-discovery systems to review a case his firm had worked on in the 80s and 90s and learned that the lawyers had only been 60 percent accurate, only “slightly better than a coin toss”.

Prototyping an Interactive Machine Learning System for E-Discovery

Having reviewed this history, I set out to prototype a machine learning system for legal discovery.

The first thing I needed in order to proceed was a body of documents from a legal case against which I could train and test a classifier. Thankfully in Brad Knox’s Interactive Machine Learning class this semester, I’d been exposed to the existence of the Enron corpus. Published by Andrew McCallum of CMU in 2004, the Enron corpus collects over 650,000 emails from 150 users obtained during the Federal Energy Regulatory Commission’s investigation of Enron and made public as part of the federal case against the company. The Enron emails make the perfect basis for working on this problem because they represent real in situ emails from a situation where there were actual legal issues at stake.

After obtaining the emails, I consulted with a lawyer in order to understand some of the legal issues involved in the case (I chose my favorite criminal defense attorney: my dad). The government’s case against Enron was huge, sprawling, and tied up with many technicalities of securities and energy law. We focused on insider trading: situations where Enron employees had access to information not available to the wider public, which they used for their own gain or to avoid losses. In the case of Enron this meant both knowledge about the commodities traded by the company and the company’s own stock price, especially in the time period of the latter’s precipitous collapse and the government’s investigation.

The World of Martin Cuilla

With this knowledge in hand, I was ready to label a set of Enron emails in order to start the process of training a classifier. And that’s when things started getting personal. In order to label emails as relevant or irrelevant to the question of insider trading I’d obviously need to read them.

So, unexpectedly I found myself spending a few hours reading 1028 emails sent and received by Martin Cuilla, a trader on the Western Canada Energy Desk at Enron. To get started labeling, I chose one directory within the dataset, a folder named “cuilla-m”. I wasn’t prepared for the intimate look inside someone’s life that awaited me as I conducted this technical task.

Of the 1028 emails belonging to Mr. Cuilla, about a third of them relate to the Enron fantasy football league, which he administered:

A chunk of them from early in the dataset reveal the planning details of Cuilla’s engagement and wedding.

They include fascinating personal trivia like this exchange where Cuilla buys a shotgun from a dealer in Houston:

In the later period of the dataset, they include conversations with other Enron employees who are drunk and evidence of Cuilla’s drinking and marital problems:

As well as evidence of an escalating gambling problem (not a complete shocker in a day trader):

And, amongst all of this personal drama, there are emails that may actually be material to the case where Cuilla discusses predictions of gas prices:

orders trades:

and offers to review his father’s stock portfolio to avoid anticipated losses (notice that his father also has an Enron email address):

In talking to friends who’ve worked at large law firms, I learned that this experience is common: large cases always become soap operas. Apparently, it’s normal when reading the previously private correspondence of any company to come across evidence of at least a few affairs, betrayals, and other such dramatic material. Part of working amongst hundreds of other lawyers, paralegals, and staff on such a case is the experience of becoming a collective audience for the soap opera that the documents reveal, gossiping about what you each have discovered in your reading.

As I learned in the course of building this prototype: this is an experience that will survive into a world of machine learning-based discovery. However, it will likely be transformed from the collective experience of large firms to a far more private and voyeuristic one as individuals (or distributed remote workers) do this work alone. This was an important revelation for me about the emotional texture of what this work might feel like in the future and (as you’ll see below) it became a major part of what I tried to communicate with the comic.

Feature Engineering and Algorithm Selection

Now that I’d labeled Martin Cuilla’s emails, I could begin the process of building a machine learning system that could successfully predict these labels. While I’ve worked with machine learning before, it’s always been in the context of computer vision, never natural language.

As mentioned above, the process of designing machine learning systems has two chief components: feature engineering and learning algorithm selection. Feature engineering covers what information you extract from each document to represent it. The learning algorithm is how you use those features (and your labels) to build a classifier that can predict labels (such as relevant/irrelevant) for new documents. Most of the prestige and publicity in the field goes to the creation of learning algorithms. However, in practice, feature engineering is much more important for solving real world problems. The best learning algorithm will produce terrible results with the wrong features. And, given good feature design, the best algorithms will only incrementally outperform the other options.

So, in this case, my lack of experience with feature engineering for natural language was a real problem. I barged forwards nonetheless.

For my first prototype, I extracted three different kinds of features: named entities, extracted addresses, and date-sent. My intuition was that named entities (i.e. stock symbols, company names, place names, etc) would represent the topics discussed, the people sending and receiving the messages would represent the command structure within Enron, and the date sent would relate to the progress of the government’s case and the collapse of the company.

I started by dividing Martin Cuilla’s emails into training and testing sets. I developed my system against the training set and then tested its results against the test set. I used CoreNLP, an open source natural language processing library from Stanford, to extract named entities from the training set. You can see the results in the github repo for this project here (note: all of the code for this project is available in my github repo, atduskgreg/disco, and the code from this stage of the experiment is contained in this directory). I treated this list as a “Bag of Words”, creating a set of binary features corresponding to each entity with the value of 1 given when an email included the entity and 0 when it did not. I then did something similar for the email addresses in the training set, which I also treated as a bag of words. Finally, to include the date, I transformed the date into a single feature: a float scaled to the timespan covered by the corpus. In other words, a 0.0 for this feature would mean an email was sent at the very start of the corpus and a 1.0 that it was the last email sent. The idea is that emails sent close together in time would have similar values.
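For illustration, here’s roughly what that feature scheme looks like as code. This is a sketch, not the prototype’s actual implementation (which used CoreNLP for the entity extraction); the entity and address lists below are typed in by hand.

```python
# Sketch of the first prototype's features: binary bag-of-entities, binary
# bag-of-addresses, and a single date feature scaled to the corpus timespan.
entities  = ["ENE", "El Paso", "Houston", "NYMEX"]                 # from the training set
addresses = ["martin.cuilla@enron.com", "jdoe@enron.com"]

def email_features(text, sender, recipients, sent_ts, corpus_start, corpus_end):
    entity_feats  = [1 if e.lower() in text.lower() else 0 for e in entities]
    address_feats = [1 if a in [sender] + recipients else 0 for a in addresses]
    # 0.0 = first email in the corpus, 1.0 = the last email sent
    date_feat = (sent_ts - corpus_start) / float(corpus_end - corpus_start)
    return entity_feats + address_feats + [date_feat]

print(email_features("Short ENE on NYMEX today",
                     "martin.cuilla@enron.com", ["jdoe@enron.com"],
                     sent_ts=1000000, corpus_start=900000, corpus_end=1100000))
```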

For the learning algorithm, I selected Random Decision Forest. Along with Support Vector Machines, Random Decision Forests are amongst the most effective widely-deployed machine learning algorithms. Unlike SVMs though, Random Decision Forests have a high potential for transparency. Due to the nature of the algorithm, most Random Decision Forest implementations provide an extraordinary amount of information about the final state of the classifier and how it was derived from the training data (see my analysis of Random Decision Forest’s interaction affordances for more). I thought this would make it a superior choice for an interactive e-discovery system since it would allow the system to explain the reasons for its classifications to the user, increasing their confidence and improving their ability to explore the data, add labels, tweak parameters, and improve the results.

Since I maintain the OpenCV wrapper for Processing and am currently in the process of integrating OpenCV’s rich machine learning libraries, I decided to use OpenCV’s Random Decision Forest implementation for this prototype.
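The prototype used OpenCV’s implementation through that wrapper; purely as a stand-in, here’s what training a Random Decision Forest and inspecting it looks like with scikit-learn on synthetic binary features. The point to notice is the transparency mentioned above: the trained forest can report how much each feature contributed.

```python
# Sketch only: a Random Decision Forest trained on fake binary features,
# then queried for per-feature importances that could be shown to the user.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.RandomState(0)
X = rng.randint(0, 2, size=(200, 6)).astype(float)       # 200 fake documents, 6 binary features
y = (X[:, 0].astype(int) & X[:, 3].astype(int))          # "relevance" depends on features 0 and 3

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
for i, importance in enumerate(forest.feature_importances_):
    print(f"feature {i}: importance {importance:.2f}")    # explainable to the user
```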

Results of the First Prototype: Accuracy vs Recall

The results of this first prototype were disappointing but informative. By the nature of legal discovery, it will always be a small minority of documents that are relevant to the question under investigation. In the case of Martin Cuilla’s emails, I’d labeled about 10% of them as relevant. This being the case, it is extremely easy to produce a classifier that has a high rate of accuracy, i.e. one that produces the correct label for a high percentage of examples. A classifier that labels every email as irrelevant will have an accuracy rate around 90%. And, as you can see from the console output in the image above, that’s exactly what my system achieved.

While this might sound impressive on paper, it is actually perfectly useless in practice. What we care about in the case of e-discovery is not accuracy, but recall. Where accuracy measures how many of our predicted labels were correct, recall measures how many of the total relevant messages we found. Whereas accuracy is penalized for false positives as well as false negatives, recall only cares about avoiding false negatives: not missing any relevant messages. It is quite easy for a human to go through a few thousand messages to eliminate any false positives. However, once a truly relevant message has been missed it will stay missed.

With the initial approach, our classifier only ever predicted that messages were irrelevant. Hence, the 90+% accuracy rate was accompanied by a recall rate of 0. Unacceptable.
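A quick numerical sketch of this accuracy-versus-recall gap, using scikit-learn’s metrics on a made-up label distribution that is 10% relevant:

```python
# With 10% relevant documents, predicting "irrelevant" for everything scores
# 90% accuracy while finding none of the relevant messages.
from sklearn.metrics import accuracy_score, recall_score

y_true = [1] * 10 + [0] * 90    # 10 relevant, 90 irrelevant
y_pred = [0] * 100              # a classifier that always says "irrelevant"

print(accuracy_score(y_true, y_pred))   # 0.9 -- looks impressive on paper
print(recall_score(y_true, y_pred))     # 0.0 -- useless for discovery
```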

Improving Recall: Lightside and Feature Engineering for Text

In order to learn how to improve on these results, I consulted with Karthik Dinakar, a PhD candidate at the lab who works with Affective Computing and Software Agents and is an expert in machine learning with text. Karthik gave some advice about what kinds of features I should try and pointed me towards Lightside.

Based on research done at CMU, Lightside is a machine learning environment specifically tailored to working with text. It’s built on top of Weka, a widely-used GUI tool for experimenting with and comparing machine learning algorithms. Lightside adds a suite of tools specifically to facilitate working with text documents.

Diving into Lightside, I set out to put Karthik’s advice into action. Karthik had recommended a different set of features than I’d previously tried. Specifically, he recommended unigrams and bigrams instead of named entities. Unigrams and bigrams are one- and two-word sequences, respectively. Their use is widespread throughout computational linguistics.

I converted the emails and my labels to CSV and imported them into Lightside. Its interface made it easy to try out these features, automatically calculating them from the columns I indicated. Lightside also made it easy to experiment with other computed features such as regular expressions. I ended up adding a couple of regexes designed to detect the presence of dollar amounts in the emails.
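For the curious, here’s roughly what those features amount to, sketched with scikit-learn rather than Lightside: binary unigram/bigram indicators plus a regular-expression flag for dollar amounts. The example emails are invented.

```python
# Sketch of unigram/bigram features and a dollar-amount regex feature.
import re
from sklearn.feature_extraction.text import CountVectorizer

emails = ["Sell the restricted stock before the announcement",
          "The deal closed at $2.5 million yesterday"]

vectorizer = CountVectorizer(ngram_range=(1, 2), binary=True)  # one- and two-word sequences
X = vectorizer.fit_transform(emails)
print(vectorizer.get_feature_names_out()[:8])   # e.g. 'closed', 'closed at', 'deal', ...

dollar_amount = [1 if re.search(r"\$\d[\d,.]*", e) else 0 for e in emails]
print(dollar_amount)                            # [0, 1]
```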

Lightside also provides a lot of additional useful information for evaluating classifier results. For example, it can calculate “feature weights”, how much each feature contributed to the classifier’s predictions.
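To give a sense of where such weights come from (this is not Lightside’s implementation), a linear classifier’s per-feature coefficients play exactly this role. The sketch below trains a small linear SVM on invented emails and lists the terms pulling hardest toward the “relevant” class.

```python
# Sketch only: per-feature weights from a linear SVM over unigram features.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC

emails = ["buy ene stock before the restricted period",
          "sell the stock, the deal is done",
          "fantasy football trade deadline is friday",
          "wedding planning and dinner on saturday"]
labels = [1, 1, 0, 0]        # 1 = relevant, 0 = irrelevant

vec = CountVectorizer(binary=True)
svm = LinearSVC().fit(vec.fit_transform(emails), labels)

weights = sorted(zip(svm.coef_[0], vec.get_feature_names_out()), reverse=True)
print(weights[:5])           # the highest-weighted "relevant" features
```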

Here’s a screenshot showing the highest-weighted features at one point in the process:

The first line is one of my regexes designed to detect dollar amounts. Other entries are equally intriguing: “trades”, “deal”, “restricted”, “stock”, and “ene” (Enron’s stock ticker symbol). Upon seeing these, I quickly realized that they would make an excellent addition to the final user interface. They provide insight into aspects of the emails the system has identified as relevant and potentially powerful user interface hooks for navigating through the emails to add additional labels and improve the system’s results (more about this below when I discuss the design and implementation of the interface).

In addition to tools for feature engineering, Lightside makes it easy to compare multiple machine learning algorithms. I tested out a number of options, but Random Decision Forest and SVM performed the best. Here were some of their results early on:

As you can see, we’re now finally getting somewhere. The confusion matrices compare the models’ predictions for each value (0 being irrelevant and 1 being relevant) with reality, letting you easily see false negatives, false positives, true negatives, and true positives. The bottom row of each matrix is the one that we care about. That row represents the relevant emails and shows the proportions with which the model predicted 0 or 1. We’re finally getting predictions of 1 for about half of the relevant emails.

Notice also, the accuracy rates. At 0.946 the Random Decision Forest is more accurate than the SVM at 0.887. However, if we look again at the confusion matrix, we can see that the SVM detected 11 more relevant emails. This is a huge improvement in recall so, despite Random Forest’s greater potential for transparency, I selected SVM as the preferred learning algorithm. As we learned above, recall matters above all else for legal discovery.

Building a Web Interface for Labeling and Document Exploration

So, now that I had a classifier well-suited to detecting relevant documents I set out to build an interface that would allow someone with legal expertise to use it for discovery. As in many other interactive machine learning contexts, designing such an interface is a problem of balancing the rich information and options provided by the machine learning algorithms with the limited machine learning knowledge and specific task focus of the user. In this case I wanted to make an interface that would help legal experts do their work as efficiently as possible while exposing them to as little machine learning and natural language processing jargon as possible.

(An aside about technical process: the interface is built as a web application in Ruby and Javascript using Sinatra, DataMapper, and jQuery. I imported the Enron emails into a Postgres database and set up a workflow to communicate bidirectionally with Lightside via CSVs (sending labels to Lightside and receiving lists of weighted features and predicted labels from Lightside). An obvious next iteration would be to use Lightside’s web server example to provide classification prediction and re-labeling as an HTTP API. I did some of the preliminary work on this and received much help from David Adamson of the Lightside project in debugging some of the problems I hit, but was unable to finish the work within the scope of this prototype. I plan to publish a simple Lightside API example in the future to document what I’ve learned and help others who’d like to improve on my work.)

The interface I eventually arrived at looks a lot like Gmail. This shouldn’t be too surprising since, at base, the user’s task is quite similar to that of Gmail’s users: browse, read, search, triage.

In addition to providing a streamlined interface for browsing and reading emails, the basic interface also displays the system’s predictions: highlighting in pink messages predicted as relevant. Further, it allows users to label messages as relevant or irrelevant in order to improve the classifier.

Beyond basic browsing and labeling, the interface provides a series of views into the machine learning system designed to help the user understand and improve the classifier. Simplest amongst these is a view that shows the system’s current predictions grouped by whether they’re predicted to be relevant or irrelevant. This view is designed to give the user an overview of what kind of messages are being caught and missed and a convenient place to correct these results by adding further labels.

The messages that have already been labeled show up in a sidebar on all pages. Individual labels can be removed if they were applied mistakenly.

The second such view exposes some technical machine learning jargon but also provides the user with quite a lot of power. This view shows the features extracted by Lightside, organized by whether they correlate with relevant or irrelevant emails. As you can see in the screenshot above, these features are quite informative about what message content is found in common amongst relevant emails.

Further, each feature is a link to a full-text search of the message database for that word or phrase. This may be the single most-powerful aspect of the entire interface. One of the lessons of the Google-era seems to be a new corollary to Clarke’s Third Law: any sufficiently advanced artificial intelligence is indistinguishable from search. These searches quite often turn up additional messages where the user can improve the results by applying their judgment to marginal cases by labeling them as relevant or irrelevant.

One major lesson from using this interface is that a single classifier is not flexible enough to capture all of the subtleties of a complex legal issue like insider trading. I can imagine dramatically improving on this current interface by adding an additional layer on top of what’s currently there that would allow the user to create multiple different “saved searches” each of which trained an independent classifier and which were composable in some way (for example through interface option that would automatically add the messages matching highly negatively correlated terms from one search to the relevant group of another). The work of Saleema Amershi from Microsoft Research is full of relevant ideas here, especially her ReGroup paper about on-demand group-creation in social networks and her work on interactive concept learning.

Further, building this interface led me to imagine other uses for it beyond e-discovery. For example, I can imagine the leaders of a large company wanting versions of these saved-search classifiers run against their employees’ communications in real time. Whether as a preventative measure against potential lawsuits, in order to capture internal ‘business intelligence’, or simply out of innate human curiosity, it’s difficult to imagine such tools, after they come into existence, not getting used for additional purposes. To extend William Gibson’s famous phrase into a law of corporate IT: the management finds its own uses for things.

This leads me to the next part of the project: making a sci-fi comic telling the story of how it might feel to work in a 2050 law firm that’s been transformed by these e-discovery tools.

The Comic: Sci-Fi Storytelling

When I first presented this project in class, everyone nodded along to the technical parts, easily seeing how machine learning would better solve the practical problem. But the part that really got them was when I told the story of reading and labeling Martin Cuilla’s emails. They were drawn into Cuilla’s story along with me and also intrigued by my experience of unexpected voyeurism.

As I laid out in the beginning of this post, the goal of this project was to use a “Science Fiction Design” process – using the process of prototyping to find the feelings and stories in this new technology and then using a narrative medium to communicate those.

In parallel with the technical prototype, I’ve been working on a short comic to do just this. Since I’m a slow writer of fiction and an even slower comics artist, the comic is still unfinished. I’ve completed a script and I have three pages with finished art, only one of which (shown at the top of this section) I’ve also lettered and completed post-production. In this section, I’ll outline some of the discoveries from the prototype that have translated into the comic, shaping its story and presenting emotional and aesthetic issues for exploration. I’ll also show some in-progress pages to illustrate.

The voyeurism inherent in the supervised learning process is the first example of this. When I experienced it, I knew it was something that could be communicated through a character in my comic story. In fact, it helped create the character: someone who’s isolated, working a job in front of a computer without social interaction, but intrigued by the human stories that filter in through that computer interface, hungering to get drawn into them. This is a character who’s ripe for a mystery, an accidental detective. The finished and lettered page at the top of this section shows some of this in action. It uses actual screenshots of the prototype’s interface as part of a section of the story where the character explains his job and the system he uses to do it.

But where does such a character work? What world surrounds him, in what milieu does e-discovery take place? Well, thinking about the structure of my machine learning prototype, I realized that it was unlikely that current corporate law firms would do this work themselves. Instead, I imagined that this work would be done by the specialized IT firms I’d already encountered doing it (like Cataphora and Blackstone Discovery).

Firms with IT and machine learning expertise would have an easier time adding legal expertise by hiring a small group of lawyers than law firms would have building up sophisticated technical expertise from scratch. Imagine the sales pitch an IT firm with these services could offer to a big corporate client: “In addition to securely managing your messaging and hosting, which we already do, now we can also provide defensive legal services that dramatically lower your costs in case of a lawsuit and reduce or eliminate your dependence on your super-expensive external law firm.” It’s a classic Clayton Christensen-esque case of disruption.

So, instead of large corporate law firms ever fully recovering from their circa–2008 collapse, I imagined that 2050 will see the rise of a new species of firm to replace them: hybrid legal-IT firms with heavy technological expertise in securely hosting large amounts of data and making it discoverable with machine learning. Today’s rooms full of paralegals and first-year associates will be replaced with tomorrow’s handful of sysadmins.

This is where my character works: at a tech company where a handful of people operate enormous data centers, instantly search and categorize entire corporate archives, and generally do the work previously done by thousands of prestigious and high-paid corporate lawyers.

And, as I mentioned in the last section, I don’t imagine that the services provided by such firms will stay limited to legal discovery. To paraphrase Chekhov, if in the first act you have created a way of surveilling employees, then in the following one you will surveil your own employees. In other words, once tools are built that use machine learning to detect messages that are related to any topic of inquiry, they’ll be used by managers of firms for preemptive prevention of legal issues, to capture internal business intelligence, and, eventually, to spy on their employees for trivial personal and political purposes.

Hence, my comic’s story twist comes when it turns out that the firm’s client has used their tools inappropriately and when, inevitably, the firm itself is revealed to be using them to spy on my main character. While he enjoys his private voyeuristic view into the lives of others, someone else is using the same tools to look into his.

Finally, a brief note about the style of the comic’s art. As you can see from the pages included here, the comic itself includes screenshots of the prototype interface I created early in the process. In addition to acting as background research, the prototype design process also created much more realistic computer interfaces than you’d normally see in fiction.

Another small use of this that I enjoyed was the text of the emails included at the bottom of that finished page. I started with the Enron emails and then altered the text to fit the future world of my story. (See the larger version where you can read the details.) My small tribute to Martin Cuilla and all he did for this project.

The other thing I’ve been experimenting with in the art style is the use of 3D models. In both the exterior of the building and the server room above, I downloaded or made 3D models (the building was created out of a 3D model of a computer fan, which I thought appropriate for a futuristic data center), rendered them as outlines, and then glued them onto my comics pages where I integrated them (through collage and in-drawing) with hand-drawn figures and other images. This was both pragmatic – radically accelerating the drawing of detailed perspective scenes, which would have otherwise been painstaking to create by hand – and designed to make the technology of this future world feel slightly absent and undefined, a blank slate onto which we can project our expectations of the future. After all, isn’t this how sci-fi props and scenery usually act?

Lawgorithm.com

Last and definitely least, as a lark I put together a website for the fictional firm described in the story (and whose branding adorned the interface prototype). I was quite proud of the domain I managed to secure for this purpose: lawgorithm.com. I also put an unreasonable amount of time into copying and satirizing the self-presentation style I imagined such a firm using: an unholy mashup of the pompous styling of corporate law firm websites like Skadden’s and the Apple-derivative style so common amongst contemporary tech startups. (I finished it off by throwing in a little Lorem Gibson for good measure.)

Despite a few masterpieces, satirical web design is an under-utilized medium. While comedic news sites like The Onion and The Daily Currant do look somewhat like the genre of news sites they skewer, they don’t take their visual mockery nearly as far as their textual mockery.

GestuRe: An Interactive Machine Learning System for Recognizing Hand Gestures
http://urbanhonking.com/ideasfordozens/2013/10/11/gesture-an-interactive-machine-learning-system-for-recognizing-hand-gestures/
Fri, 11 Oct 2013

(This post describes a project I completed for MAS.S62 Interactive Machine Learning at the MIT Media Lab. The assignment was to create a simple interactive machine learning system that gave the user visibility into the learning algorithm.)

GestuRe is a mixed-initiative interactive machine learning system for recognizing hand gestures. It attempts to give the user visibility into the classifier’s prediction confidence and control of the conditions under which the system actively requests labeled gestures when its predictions are uncertain.

Training object or gesture recognition systems is often a tedious and uncertain process. The slow loop from gathering images to training a model to testing the classifier separates the process of selecting training samples from the situation in which the system is actually used.

In building such systems, I’ve frequently been frustrated by the inability to add corrective samples at the moment a mistaken classification occurs. I’ve also struggled to feel confident in the state of the classifier at any point in the training process. This system attempts to address both problems, producing a recognition system that is fluid to train and whose reliability can be understood and improved.

GestuRe creates a feature vector from an input image from a live camera using Histogram of Oriented Gradients [Dalal & Triggs 2005]. The user selects a class label for these input images and the system then trains a Support Vector Machine-based classifier on these labeled samples [Vapnik 1995].
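GestuRe itself did this with OpenCV for Processing and a libsvm wrapper (see the implementation notes at the end of this post); purely as an illustration of the same HOG-plus-SVM pipeline, here is a sketch using scikit-image and scikit-learn, with random arrays standing in for camera frames.

```python
# Sketch of the pipeline: HOG feature vector per frame, SVM classifier,
# per-class probabilities for display to the user. Frames here are random noise.
import numpy as np
from skimage.feature import hog
from sklearn.svm import SVC

def hog_features(frame):
    return hog(frame, orientations=9, pixels_per_cell=(8, 8), cells_per_block=(2, 2))

frames = [np.random.rand(64, 64) for _ in range(10)]   # stand-ins for 64x64 grayscale frames
labels = [0] * 5 + [1] * 5                             # two gesture classes

clf = SVC(probability=True).fit([hog_features(f) for f in frames], labels)
probs = clf.predict_proba(hog_features(np.random.rand(64, 64)).reshape(1, -1))
print(probs)   # prediction likelihood for each class, as shown in the interface
```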

The system then displays to the user the prediction likelihood for each submitted class as well as the current classification. Further, the system shows the user all of the training samples captured for each class. Note: everywhere in the interface that the system presents a class to the user, it uses one of the training images to represent that class rather than text. This makes the interface easier to comprehend when the user’s attention is split between its output and their own image in the live video feed.

The user submits labeled samples in two phases. First they establish the classes by submitting a series of images for each distinct gesture they want the system to detect. Then, after initial training, the system begins classifying the live gestures presented by the user and displaying its results. In this phase, whenever the system’s confidence in its top prediction (as represented by the gap in probability between the most likely class and the second most likely) falls below a user-defined threshold, the system prompts the user for additional training samples.

This prompt consists of a modal interface which presents the user with a snapshot of the gesture that caused the low confidence prediction. Alongside this snapshot, the system presents a series of images representing each known class. The user selects the correct class, creating a new labeled sample and the system retrains the classifier, hopefully increasing prediction confidence.

This modal active learning mode places high demands on the user, risking the danger of the user feeling like they’re being treated as an oracle. To alleviate this feeling, GestuRe gives the user a series of parameters to control the conditions under which it prompts them for a labeled sample. First amongst these is a “confidence threshold”, which determines the minimum probability gap between the top two classes that will trigger a request for a label. A lower confidence threshold results in fewer requests but more incorrect predictions. A high threshold results in more persistent requests for labels but the eventual training of a higher quality classifier.

Second, the user can control how long the system will endure low-confidence predictions before requesting a sample. Since prediction probabilities fluctuate rapidly with live video input, the confidence threshold alone would trigger active learning too frequently, even on a well-trained classifier, simply due to the variations in the video input and, especially, the ambiguous states as the user moves their hands between well-defined gestures. The “time before ask” slider allows the user to determine the number of sequential below-threshold predictions before the system will prompt for a labeled sample. The system also displays a progress bar so the user can get a sense for when the system’s predictions are below the confidence threshold and how close it’s coming to prompting for more labeled samples.
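Put together, the trigger amounts to something like the sketch below (a simplification with invented parameter values): prompt for a label only when the probability gap between the top two classes stays under the threshold for a run of consecutive predictions.

```python
# Sketch of the active-learning trigger: confidence gap + time-before-ask.
class ActiveLearningTrigger:
    def __init__(self, confidence_threshold=0.25, time_before_ask=30):
        self.confidence_threshold = confidence_threshold  # minimum gap between top two probabilities
        self.time_before_ask = time_before_ask            # consecutive low-confidence frames required
        self.low_confidence_streak = 0

    def should_prompt(self, class_probabilities):
        top_two = sorted(class_probabilities, reverse=True)[:2]
        gap = top_two[0] - (top_two[1] if len(top_two) > 1 else 0.0)
        if gap < self.confidence_threshold:
            self.low_confidence_streak += 1
        else:
            self.low_confidence_streak = 0                # confidence recovered; reset the countdown
        return self.low_confidence_streak >= self.time_before_ask
```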

Finally, the system allows the user to turn active training off altogether. This mode is especially useful when adding a new gesture to the system by submitting a batch of samples. Also, it allows the user to experience the current quality of the system without being interrupted for new labels.

GestuRe could be further improved in two ways. First, it would help to show the user a visualization of the histogram of oriented gradients representation that is actually used in classification. This would help them identify noisy scenes, variable hand position, and other factors that were contributing to low classification confidence. Secondly, it would help to identify which classes needed additional clarifying samples. Possibly performing offline cross-validation on the saved samples in the background could help determine if the model had lower accuracy or precision for any particular class.

Finally, I look forward to testing GestuRe on other types of recognition problems beyond hand gestures. In addition to my past work with object recognition mentioned above, during development I discovered that GestuRe can be used to classify facial expressions as well, as this video, made with an early version of the interface, demonstrates:

Interactive Machine Learning with Funny Faces from Greg Borenstein on Vimeo.

I implemented GestuRe using the OpenCV computer vision library, libsvm, and the Processing creative coding framework. In particular, I used OpenCV for Processing, my own OpenCV wrapper library, as well as PSVM, my libsvm wrapper. (While OpenCV includes implementations of a series of machine-learning algorithms, its SVM implementation is based on an older version of libsvm which performs significantly worse with the same settings and data.)

GestuRe’s source code is available on Github here: atduskgreg/gestuRe.

Case and Molly: A Game Inspired by Neuromancer
http://urbanhonking.com/ideasfordozens/2013/10/04/case-and-molly-a-game-inspired-by-neuromancer/
Fri, 04 Oct 2013

Case and Molly First Playtest from Greg Borenstein on Vimeo.

“Case and Molly” is a prototype for a game inspired by William Gibson’s Neuromancer. It’s about the coordination between the virtual and the physical, between “cyberspace” and “meat”.

Neuromancer presents a future containing two aesthetically opposed technological visions. The first is represented by Case, the cyberspace jockey hooked on navigating the abstract information spaces of The Matrix. The second is embodied by Molly, an augmented mercenary who uses physical prowess and weaponized body modifications to hurt people and break into places.

In order to execute the heists that make up most of Neuromancer’s plot, Case and Molly must work together, coordinating his digital intrusions with her physical breakings-and-enterings. During these heists they are limited to an extremely asymmetrical form of communication. Case can access Molly’s complete sensorium, but can only communicate a single bit of information to her.

On reading Neuromancer today, this dynamic feels all too familiar. We constantly navigate the tension between the physical and the digital in a state of continuous partial attention. We try to walk down the street while sending text messages or looking up GPS directions. We mix focused work with a stream of instant message and social media conversations. We dive into the sudden and remote intimacy of seeing a family member’s face appear on FaceTime or Google Hangout.

(Note: “Case and Molly” is not a commercial project. It is a game design meant to explore a set of interaction ideas. It was produced as a project for an MIT Media Lab class Science Fiction to Science Fabrication. The code is available under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 license for educational purposes. Please do not use it to violate William Gibson’s intellectual property.)

Gameplay

“Case and Molly” uses the mechanics and aesthetics of Neuromancer’s account of cyberspace/meatspace coordination to explore this dynamic. It’s a game for two people: “Case” and “Molly”. Together and under time pressure they must navigate Molly through a physical space using information that is only available to Case. Case can see Molly’s point of view in 3D but can only communicate to her by flipping a single bit: a screen that’s either red or green.

Case and Molly headset view

Case is embedded in today’s best equivalent of Gibsonian cyberspace: an Oculus Rift VR unit. He oscillates between seeing Molly’s point of view and playing an abstract geometric puzzle game.

Case and Molly: stereo phone rig

Molly carries today’s version of a mobile “SimStim unit” for broadcasting her point of view and “a readout chipped into her optic nerve”: three smartphones. Two of the phones act as a pair of stereo cameras, streaming her point of view back to Case in 3D. The third phone (not shown here) substitutes for her heads-up display, showing the game clock and a single bit of information from Case.

Case and Molly: Molly turn

The game proceeds in alternating turns. During a Molly turn, Case sees Molly’s point of view in 3D, overlaid with a series of turn-by-turn instructions for where she needs to go. He can toggle the color of her “readout” display between red and green by clicking the mouse. He can also hear her voice. Within 30 seconds, Molly attempts to advance as far as possible, prompting Case for a single bit of direction over the voice connection. Before the end of that 30 second period, Molly has to stop at a safe point, prompting Case to type in the number of a room along the way. If time runs out before Case enters a room number, they lose. When Case enters a room number, Molly stays put and they enter a Case turn.

Case and Molly: Case turn

During his turn, Case is thrust into an abstract informational puzzle that stands in for the world of Cyberspace. In this prototype, the puzzle consists of a series of cubes arranged in 3D space. When clicked, each cube blinks a fixed number of times. Case’s task is to sort the cubes by the number of blinks within 60 seconds. He can cycle through them and look around by turning his head. If he completes the puzzle within 60 seconds they return to a Molly turn and continue towards the objective. If not, they lose.

At the top of this post is a video showing a run where Case and Molly make it through a Molly turn and a Case turn before failing on the second Molly turn.

Play Testing and Similarities and Differences from Neuromancer

In play testing the game and prototyping its constituent technology I found ways in which the experience resonated with Gibson’s account and others in which it radically diverged.

One of the strongest resonances was the dissonance between the virtual reality experience and being thrust into someone else’s point of view. In Neuromancer, Gibson describes Case’s first experience of “switching” into Molly’s subjective experience, as broadcast by a newly installed “SimStim” unit:

The abrupt jolt into other flesh. Matrix gone, a wave of sound and color…For a few frightened seconds he fought helplessly to control her body. Then he willed himself into passivity, became the passenger behind her eyes.

This dual description of sensory richness and panicked helplessness closely matches what it feels like to see someone else’s point of view in 3D. In Molly mode, the game takes the view from each of two iPhones aligned into a stereo pair and streams them into each eye of the Oculus Rift. The resulting 3D illusion is surprisingly effective. When I first got it working, I had a lab mate carry the pair of iPhones around, placing me into different points of view. I found myself gripping the arms of my chair, white-knuckled as he flew the camera over furniture and through obstacles around the room. In conventional VR applications, the Oculus works by head tracking, making the motions of your head control the direction of a pair of cameras within the virtual scene. Losing that control, having your head turned for you, and having your actual head movements do nothing is extremely disorienting.

Gibson also describes the intimacy of this kind of link, as in this exchange where Molly speaks aloud to Case while he rides along with her sensorium:

“How you doing, Case?” He heard the words and felt her form them. She slid a hand into her jacket, a fingertip circling a nipple under warm silk. The sensation made him catch his breath. She laughed. But the link was one-way. He had no way to reply.

While it’s not nearly as intimate as touch, the audio that streamed from “Molly”’s phone rig to “Case” in the game provided an echo of this same experience. Since Molly holds the phones closely and moves through a crowded public space, she speaks in a whisper, which stays close in Case’s ears even as she moves ever further away in space.

Even in simpler forms, this Case-Molly coordination can be interesting. Here’s a video from an early prototype where we try to coordinate the selection of a book using only the live camera feed and the single red/green bit.

Case and Molly: First Coordination Prototype from Greg Borenstein on Vimeo.

One major aspect of the experience that diverged from Gibson’s vision is the experience of “cyberspace”. The essence of this classic idea is that visualizing complex data in immersive graphical form makes it easier to navigate. Here’s Gibson’s classic definition:

Cyberspace. A consensual hallucination experienced daily by billions of legitimate operators…A graphic representation of data abstracted from the banks of every computer in the human system. Unthinkable complexity. Lines of light ranged in the non space of the mind, clusters and constellations of data. Like city lights, receding…

Throughout Neuromancer, Gibson emphasizes the fluency achieved by Case and other cyberspace jockeys, the flow state enabled by their spacial navigation of the Matrix. Here’s a passage from the first heist:

He flipped back. His program had reached the fifth gate. He watched as his icebreaker strobed and shifted in front of him, only faintly aware of his hands playing across the deck, making minor adjustments. Translucent planes of color shuffled like a trick deck. Take a card, he thought, any card. The gate blurred past. He laughed. The Sense/Net ice had accepted his entry as a routine transfer from the consortium’s Los Angeles complex. He was inside.

My experience of playing as Case in the game could not have been more opposed to this. Rather than a smooth flow state, the virtual reality interface and the rapid switches to and from Molly’s POV left me cognitively overwhelmed. The first time we successfully completed a Molly turn, I found I couldn’t solve the puzzle because I’d essentially lost the ability to count. Even though I’d designed the puzzle and played it dozens of times in the course of implementing it, I failed because I couldn’t stay focused enough to retain the number of blinks of each cube and where they should fit in the sorting. This effect was way worse than the common distractions of email, Twitter, texts, and IM many of us live with in today’s real computing environments.

Further, using a mouse and a keyboard while wearing a VR helmet is surprisingly challenging itself. Even though I am a very experienced touch-typist and am quite confident using a computer trackpad, I found that when presented with contradictory information about what was in front of me by the VR display, I struggled with basic input tasks like keeping my fingers on the right keys and mousing confidently.

Here you can see a video of an early run where I lost the Case puzzle because of these difficulties:

Case and Molly prototype: first Case fail from Greg Borenstein on Vimeo.

Technical Implementation

Case and Molly game diagram

Lacking an Ono Sendai and a mobile SimStim unit, I built this Case and Molly prototype with a hodgepodge of contemporary technologies. Airbeam Pro was essential for the video streaming. I ran their iOS app on both iPhones which turned each one into an IP camera. I then ran their desktop client which captured the feeds from both cameras and published them to Syphon, an amazingly-useful OSX utility for sharing GPU memory across multiple applications for synced real time graphics. I then used Syphon’s plugin for the Unity3D game engine to display the video feeds inside the game.

I built the game logic for both the Case and Molly modes in Unity using the standard Oculus Rift integration plugin. The only clever element involved was placing the Plane displaying the Syphon texture from each separate camera into its own Layer within Unity so the left and right cameras could be set to not see the overlapping layer from the other eye.

To communicate back from Case to Molly, I used the Websockets-Sharp plugin for Unity to send messages to a Node.js server running on Nodejitsu, the only Node host I could find that supported websockets rather than just socket.io. My Node app then broadcasts JSON with the button state (i.e. whether Case is sending a “red” or “green” message) as well as the game clock to a static web page on a third phone, which Molly also carries.

The code for all of this can be found here: case-and-molly on Github.

Special thanks to Kate Tibbetts for playing Molly to my Case throughout the building of this game and for her endlessly useful feedback.

Introducing 2H2K: Previz for the Second Half of the 21st Century
http://urbanhonking.com/ideasfordozens/2013/10/03/introducing-2h2k-previz-for-the-second-half-of-the-21st-century/
Thu, 03 Oct 2013

I’m excited to announce a new collaborative project with my friend, John Powers, called “2H2K”. 2H2K brings together our shared interests in science fiction, urbanism, futurism, cinema, and visual effects into a multimedia art project that imagines what life could be like in the second half of the 21st century.

We’ve structured the project to proceed as if we were developing and doing the pre-production for an imagined sci-fi movie. We’re using fiction, drawing, sculpture, collage, comics, conversation, technical research, speculative design, and interactive technology to explore the cultural and human effects of the big changes coming to our world in the 21st century. As John set out in his introduction to the project, these changes include slowing population growth, mass migration of people into cities, and technological transformations in the value and organization of labor.

John and I have been discussing the project and working on it in preliminary forms for the past six months. This week, John began posting the first of the planned 12 short stories he’s writing to kick us off. They’re organized around each of the twelve months of the year 2050. Read the first three here (along with John’s introductions to them):

In this post, I’ll provide some background on my own interests that lead me to the project, how I came to approach John about it, and some of the areas that I’ve been thinking about so far.

From Star Wars to Jedi

My earliest memory of television comes from when I was three. The image turns out to be part of From Star Wars to Jedi: The Making of a Saga, a television documentary that aired in the manic run-up to the release of Return of the Jedi.

The memory starts with a shot from Empire: an Imperial Walker in the snow. Everything about the shot looks perfect, just like the movie. The Walker is frozen mid-stride on a snowy plain in front of faraway jagged hills under a pale sky of puffy clouds. And then, suddenly, a huge man emerges from beneath the horizon. He’s bigger than the Walker. Much bigger. He dwarfs it. After a moment’s consideration, he reaches in to adjust it.

Phil Tippett animating an Imperial Walker

Something about this image hit me hard. In remembering it since, two ideas have wrapped themselves around the memory. The first one is about scale. The magical, transporting images of these movies were made out of stuff. Small stuff. Bits of real things that were made and manipulated by people. They’d formed these bits into a model of another world that they could look down into and change and work.

The second idea was about people. My god, this was someone’s job! To get in there amongst the stuff and get your hands on it. They got to live in these unfinished worlds with all their raw edges. They got to see them while they were still part of the real world, before the camera came in with its mercilessly abstracting rectangle and hid all the supports and jigs and armatures and mechanisms, leaving just a stump, dead and ready for display.

The man in that image, of course, was Phil Tippett, the brilliant stop-motion artist responsible for much of the magic in the original trilogy. His job, it turned out, was called “visual effects”. That moment kicked off a life-long love of visual effects for me – a love not just of the effects the field could achieve, but of that always fragmentary world of objects, of all of those supports and mechanisms, that lay behind it.

Even more than the visual effects movies themselves, I love their behind-the-scenes and in-progress artifacts: plates with partially-rendered creatures, bits of sets bristling with equipment, un-chromakeyed green screens, etc. This is “world building” not just as an act of imagination, but as the palpable pushing of atoms and bits.

Alien set favela

Approaching John

With this start (and an early adulthood that included constant reading of science fiction, an art education studying under an unclassifiable post-minimalist painter, and a stint covering urban planning in Portland), I may have been the single perfect reader for John Powers’ essay, Star Wars: A New Heap. The piece hit me like a lightning bolt. It combined the politics of urbanism, the philosophy of minimalist art, and the material texture of Star Wars’ visual effects into a coherent aesthetic and political world view. For me, it provided the unique pleasure only available when the external world conspires to combine several of your own seemingly disparate interests in a way that reveals their interrelations. John, via Robert Smithson, had put a name on what drew me to behind-the-scenes images: the power of the “discrete stage”.

When I moved to New York years later, I sought John out and we became friends. Earlier this year I came to him with a proposal for a collaborative project. I’d been thinking about one of the challenges of a discrete stage/behind-the-scenes aesthetic: you need a final artifact to head towards, a show whose scenes you can be behind.

In searching for a sci-fi story to tell, I’d struck on an image: one of John’s sculptures, scaled up to the size of a building and towering over the New York skyline (as in the collage at the top of this section). Once I imagined it, my head started filling up with questions. Was it a building? If so what had happened to the world that had pushed architecture to such radical extremes? What if it wasn’t a building? What other kind of technological process or cultural entity could have made it and for what purpose? Was it even physical? What if it wasn’t present at all, but some kind of Modernist augmented reality fantasy meant to spice up a post-Thingpunk world?

powers_in_city_mockup

I didn’t have answers to these questions, but they felt like speculations that could generate stories so I decided to bring the idea to John. I showed him the image and asked him: what would make the world like this?

John had already shown himself to be a handy (re)writer of science fiction with his re-invention of the unfortunate Ridley Scott Alien prequel, Prometheus (re)Bound. And we’d had some really interesting conversations about technology and aesthetics at Robotlife, a meetup organized by Joanne McNeil and Molly Steenson in response to the New Aesthetic.

We started the process with a few wide-ranging conversations about the future. We talked about climate change, population growth, the explosion of cities, changes in the lives of artists, the future of technologies like cameras, 3D printing, robotics, and artificial intelligence. As John set off to write stories, I started bringing together background research, design speculation, drawings, and sketches. I found myself imagining the future through sketches for products, imagined Wikipedia articles, and dreams of images made by impossible cameras. As the project continues I’ll be posting artifacts of those in the forms of drawings, 3D models, comics, collages, prototypes, etc.

These artifacts will sometimes illustrate John’s stories, sometimes explore ideas from our conversations that didn’t make it into them directly, and sometimes expand our imagined future beyond them. They’re an attempt to do science fiction with objects, images, text, and code.

]]>
http://urbanhonking.com/ideasfordozens/2013/10/03/introducing-2h2k-previz-for-the-second-half-of-the-21st-century/feed/ 0
Announcing OpenCV for Processing http://urbanhonking.com/ideasfordozens/2013/07/10/announcing-opencv-for-processing/ http://urbanhonking.com/ideasfordozens/2013/07/10/announcing-opencv-for-processing/#comments Wed, 10 Jul 2013 20:09:02 +0000 http://urbanhonking.com/ideasfordozens/?p=713 Continue reading ]]> I’m proud to announce the release of OpenCV for Processing, a computer vision library for Processing based on OpenCV.

You can download it here.

The goal of the library is to make it incredibly easy to get started with computer vision, to make it easy to experiment with the most common computer vision tools, and to make the full power of OpenCV’s API available to more advanced users. OpenCV for Processing is based on the official OpenCV Java bindings. Therefore, in addition to a suite of friendly functions for all the basics, you can also do anything that OpenCV can do.
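To give a sense of what those friendly functions look like in practice, here’s roughly what a minimal face detection sketch comes down to. This is a simplified sketch in the spirit of the bundled FaceDetection.pde example, not the shipped code verbatim, and the image filename is a placeholder:

```java
import gab.opencv.*;
import java.awt.Rectangle;

OpenCV opencv;
PImage img;

void setup() {
  size(800, 600);
  img = loadImage("portrait.jpg");                 // placeholder image in the sketch's data folder
  opencv = new OpenCV(this, img);                  // wrap the image for analysis
  opencv.loadCascade(OpenCV.CASCADE_FRONTALFACE);  // built-in frontal face cascade
}

void draw() {
  image(img, 0, 0);
  noFill();
  stroke(0, 255, 0);
  strokeWeight(3);
  for (Rectangle face : opencv.detect()) {         // one Rectangle per detected face
    rect(face.x, face.y, face.width, face.height);
  }
}
```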

The library ships with 20+ examples demonstrating its use for everything from face detection:

Screen Shot 2013-04-08 at 1.22.18 PM

(Code here: FaceDetection.pde)

to contour finding:

contours with polygon approximations

(Code here: FindContours.pde)

to background subtraction:

Background Subtraction

(Code here: BackgroundSubtraction.pde)

to depth from stereo:

Screen Shot 2013-04-12 at 2.27.30 AM

(Code here: DepthFromStereo.pde)

and AR marker detection:

Screen Shot 2013-04-12 at 12.20.17 AM

(Code here: MarkerDetection.pde)

So far, OpenCV for Processing has been tested on 64-bit Mac OSX (Lion and Mountain Lion, specifically) as well as 32-bit Windows 7 and 64- and 32-bit Linux (thanks Arturo Castro). Android support hopefully coming soon (pull requests welcome).

It’s already been used in the software for Kinograph, Matt Epler’s DIY film scanner.

OpenCV for Processing was made possible by the generous support of the Processing Foundation and O’Reilly Media. I’ve received invaluable guidance along the way from Dan Shiffman, Andres Colubri, Kyle McDonald, and Golan Levin. A lot of the library’s style was inspired by Kyle McDonald’s excellent ofxCv.

A Book!

While the documentation for OpenCV for Processing may look slim at the moment, I’m working on remedying that in a big way. I’m currently under contract with O’Reilly to write an introduction to computer vision, which will act both as comprehensive documentation for OpenCV for Processing and as a general introduction to the field.

I’ve already begun work on the book and I’m really excited about it. It will be available through Atlas, O’Reilly’s new online learning environment. As befits a book about computer vision, it’ll make extensive use of multimedia and interaction. I’m also proud to announce that I’ve worked with O’Reilly to ensure that the book will be Creative Commons licensed from its inception. It will live on Github and accept contributions and corrections from the community. Watch this repo for details.

Why a New OpenCV Library for Processing?

Previously, there have been two OpenCV libraries for Processing, both of them French in origin.

There’s the venerable Ubaa.net library by Atelier Hypermedia. This library was based on OpenCV 1.0 and hasn’t been updated in quite a while. It never made the jump to Processing 2.0.

There’s also JavacvPro, which is based on JavaCV, a widely used Java wrapper for OpenCV. While I’ve used JavacvPro successfully in projects before, it has a number of shortcomings. It requires its user to build OpenCV from source, which is a major stumbling block, especially for the typical Processing user. OpenCV for Processing, on the other hand, bundles OpenCV so it installs like any other Processing library. While JavacvPro uses a relatively recent version of OpenCV, it is written in an older style, using OpenCV classes that require manual memory management. The result is that JavacvPro leaks memory and has some other erratic runtime behaviors. OpenCV for Processing uses the official Java API, which only provides access to modern memory-managed OpenCV structures. Hence, it benefits from the memory correctness and efficiency of the OpenCV developers (who are much smarter than I could ever hope to be) and doesn’t have (known) memory leaks.

Finally, JavacvPro depends on JavaCV, which slows the rate at which it keeps up with changes in the OpenCV API, and also makes it impossible for end-users to benefit from the huge amount of OpenCV documentation and support available online. Users of OpenCV for Processing can simply open the official OpenCV javadocs and start calling functions.

Caveats and Concerns

UPDATE: This section describes a problem that was present when this library was released in July of 2013. As of Processing 2.0.3 (circa Fall 2013) this problem is fixed and OpenCV for Processing should work fine with any subsequent version of Processing.

OpenCV for Processing is currently at version 0.4. It most certainly has bugs and could use serious improvement. Please find these bugs and tell me about them!

The most significant known problem is, thankfully, a temporary one that most affects users on Macs with Retina displays attempting to process video.

In the official release of Processing 2.0, the Capture and Movie libraries don’t provide access to the pixels[] array in the OpenGL-based renderers (P2D and P3D). This is a temporary stop-gap condition that will be fixed in the next release of Processing (hopefully coming in the next few weeks).

On non-retina machines, you can fix the problem by switching to the JAVA2D renderer. However, that renderer doesn’t work on Retina Macs. If you’re on a Retina Mac, you have two options: you can build Processing from source or download the older 2.0b8 version.
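For reference, switching renderers just means changing the third argument to size(). A minimal capture sketch using the workaround might look something like this – a sketch of the idea, not code taken from the library’s examples:

```java
import processing.video.*;
import gab.opencv.*;

Capture cam;
OpenCV opencv;

void setup() {
  size(640, 480, JAVA2D);            // software renderer, so pixels[] is available in Processing 2.0
  cam = new Capture(this, 640, 480);
  opencv = new OpenCV(this, 640, 480);
  cam.start();
}

void draw() {
  if (cam.available()) {
    cam.read();
    opencv.loadImage(cam);           // hand the current frame to OpenCV
  }
  image(cam, 0, 0);
}
```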

Hopefully all of this will be fixed soon due to Andres’s amazingness and we can forget about it.

Enjoy playing with OpenCV for Processing and be sure to show me what you build!

]]>
http://urbanhonking.com/ideasfordozens/2013/07/10/announcing-opencv-for-processing/feed/ 128
iPads in Space: Star Trek’s Internet-Free Future http://urbanhonking.com/ideasfordozens/2013/05/14/ipads-in-space-star-treks-internet-free-future/ http://urbanhonking.com/ideasfordozens/2013/05/14/ipads-in-space-star-treks-internet-free-future/#comments Tue, 14 May 2013 20:23:13 +0000 http://urbanhonking.com/ideasfordozens/?p=698 Continue reading ]]> “A good science fiction story should be able to predict not the automobile but the traffic jam” – Frederik Pohl

I’ve been watching Star Trek: Deep Space Nine recently for the first time in 20 years. I have vague memories of the pilot from its original airing when I was in middle school, mostly of marveling at its digital effects, which were shocking to see on TV in that era. This time around, however, what hit me was something different: the characters on the show are constantly handing each other iPads.

Here’s a characteristic example, some anonymous yellow shirt giving Worf a report:

Giving someone a document

The iPads seem to be the method of choice for delivering reports, important documents, and programs.

In another episode, Little Green Men, O’Brien and Bashir give a young Ferengi heading off to Star Fleet Academy a gift: “It’s not just a guidebook! It’s a completely interactive program detailing Earth’s customs, culture, history, geography.”

A gift of a guidebook to Earth

And this iPad-handing wasn’t just a DS9 phenomenon. It happened across the entire Star Trek franchise at that time. Here’s an example from the 1996 movie Star Trek: First Contact. The overworked Picard’s desk is overflowing with them.

After doing some research, I learned that in the Star Trek universe, these things are called PADDs for Personal Access Display Device. And they’ve actually been around since the original series episode The Man Trap.

Obviously, PADDs are physically very similar to iPads. At first, they seem like another example of Star Trek’s track record as a predictor of futuristic devices. The most famous example of this is the communicators from the original series, which Martin Cooper, inventor of the mobile phone, cited as an inspiration.

But, as I watched the PADDs circulate around the show, I slowly realized that they’re not actually used like iPads at all. In fact, they’re more like fancy pieces of paper. Individual PADDs correspond to specific documents like the Earth guidebook shown above. To give someone a document, people carry PADDs around and then leave them with the new owner of the document.

Further, the existence of PADDs and incredibly powerful computers seems to have in no way transformed the way citizens of the 24th century consume or distribute culture. A Deep Space Nine episode, The Visitor, centers on Jake Sisko’s career as an author. Here’s what his books look like:

Books on Deep Space Nine

If the books are digital documents with digital covers, why do they each have their own piece of hardware? Why don’t individual PADDs store millions of books?

Further, much of the plot of the episode turns on Jake’s success getting published. The publishing industry of the late 24th century seems in no way disrupted or altered by the existence of digital technologies.

From a 2013 point of view, these uses seem completely inside out. Each PADD is bound to an individual document rather than a person or location. This is a universe where it’s easier to copy physical objects (in a replicator) than digital ones.

After thinking about this a bit I realized the problem: they don’t have the Internet!

During the run of The Next Generation (1987–1994) and Deep Space Nine (1993–1998), the Internet wasn’t part of most people’s lives. The Next Generation averaged 9 or 10 million viewers per season, or about 3 times the total number of US Internet users at the time (1.2% of the US population had Internet access in 1991). Hotmail launched on July 4, 1996, two years after The Next Generation went off the air. Google launched in 1998, as Deep Space Nine was winding down. Email, search, and the web itself were only starting to be part of large numbers of people’s lives by the late 90s as DS9 spiraled towards cancellation.

Obviously, the Internet existed before this time and I’d bet a disproportionate number of the writers of TNG and DS9 were on it. But the usage patterns that have emerged with culture-wide adoption weren’t in place yet. And they clearly went unimagined by Star Trek’s creative team.

The entire communications model on these shows is based on phone calls and radio. Everything is realtime. They have subspace communications, which is basically faster-than-light radio transmission.

Even those vaunted communicators are basically just fancy CB radios. You have to have a live connection to the other side or you can’t send a message, a limitation used constantly in the plots of individual episodes.

They use live video chat regularly (as the original series did), presumably over subspace.

But they don’t seem to have any forms of remote asynchronous communication or collaboration. They don’t use text messages. Scientists are constantly physically visiting various facilities in order to access their data.

Unlike “subspace communications”, the Internet is not a technique for transmitting information through space; it’s a scheme for organizing its transmission, regardless of medium. As Cory Doctorow has said, the Internet is a “machine for copying”. That’s why the prevalence of these PADDs seems so absurd to a modern eye. In any future that includes the Internet, digital documents will always be more ubiquitous than the physical devices for displaying them. If you have the ability to send the data required to replicate a new PADD to display your document, how much easier must it inevitably be to just send the document where it needed to go in the first place?

In fact, iPads are feasible and desirable exactly because of the patterns of information transmission created by the Internet. We chiefly use them to consume downloaded media, to read from and post to communication networks like Twitter and Facebook, to send and receive email, to browse the web. Without net access, an iPad would be a Newton, a technology whose lifespan coincidentally corresponds almost exactly with the run of TNG and DS9.

Star Trek may have imagined the physical form of the iPad, but they didn’t imagine such a form’s dependence on the much larger and more meaningful change represented by the Internet. Hence, their portrayal of tablet computing ends up looking chiefly decorative in just the way of a lot of science fiction design, reading as “space paper” and “space books” rather than anything truly new.

This brings me back to the topic of my recent post on Thingpunk. The real mistake here, again, is believing that the physical shape of technology is always the futuristic bit, that by predicting the form of devices Star Trek had captured something important about the future. Instead, even in the absence of transporters and replicators, the invisible network that gets those reports and interactive guidebooks onto our PADDs has re-arranged our society in a thousand ways that Star Trek never imagined.

]]>
http://urbanhonking.com/ideasfordozens/2013/05/14/ipads-in-space-star-treks-internet-free-future/feed/ 9
On Thingpunk http://urbanhonking.com/ideasfordozens/2013/05/07/on-thingpunk/ http://urbanhonking.com/ideasfordozens/2013/05/07/on-thingpunk/#comments Tue, 07 May 2013 15:31:54 +0000 http://urbanhonking.com/ideasfordozens/?p=686 Continue reading ]]>

Today, @kellan and I coined a word for the opposite of the New Aesthetic: Thingpunk, the fetishizing of the stubbornly non-digital.

— Greg Borenstein (@atduskgreg) July 24, 2012

The symbols of the divine initially show up at the trash stratum.
— Philip K. Dick

Increasingly, it feels like we live in a kind of Colonial Williamsburg for the 20th century. With the introduction of networked digital devices we’ve gone through an epochal technological transformation, but it hasn’t much changed the design of our physical stuff. We’re like historical reenactors hiding our digital watches under bloused sleeves to keep from breaking period. We hide our personal satellite computers in our woodsman’s beards and flannels.

Is this a problem? Is the digital revolution incomplete until it visibly transforms our built environment? Is the form of our physical stuff a meaningful yardstick for progress?

British designer Russell Davies, of Newspaper Club and the Really Interesting Group, thinks so. Back in 2010, Davies wrote a lament for the lack of “futureness” in the physical stuff that populates our lives:

Every hep shop seems to be full of tweeds and leather and carefully authentic bits of restrained artisanal fashion. I think most of Shoreditch would be wandering around in a leather apron if it could. With pipe and beard and rickets. Every new coffee shop and organic foodery seems to be the same. Wood, brushed metal, bits of knackered toys on shelves. And blackboards. Everywhere there’s blackboards.

Davies has an expectation that the physical environment should be futuristic:

Cafes used to be models of the future. Shiny and modern and pushy. Fashion used to be the same – space age fabrics, bizarre concoctions. Trainers used to look like they’d been transported in from another dimension, now they look like they were found in an estate sale.

Davies worries that the Steampunk aesthetic of our physical things (all that brass, blackboard, and leather) is evidence of a fundamental conservatism in our design culture. But in focusing on physical stuff as the primary place to look for signs of the future, he ends up advocating a different, deeper kind of conservatism that I’ve taken to calling “Thingpunk”.

Thingpunk is a deep bias in design thinking that sees physical products and the built environment as the most important venues for design and innovation even as we enter a world that’s increasingly digital. It has roots in the history of design as a discipline over the last 100+ years, the relative stagnation of digital technology in the social media era, and “Tumblr Modernism”: a fetish for Modernist style as it appears in images, divorced from its built and political reality.

“Thingpunk”, as a term, came out of a conversation I had with Kellan Elliott-McCrea last year at Etsy. It is meant to be understood by analogy to Steampunk. Where Steampunk displaces 19th century styling onto 21st century products and spaces, Thingpunk attempts to continue the 20th century obsession with physical objects into a 21st century permeated by digital and network technologies. Thingpunk worries about the design of physical stuff above all else. Even when engaging with digital technologies, Thingpunk is primarily concerned not with their effect on our digital lives, but in how they will transform physical products and the built environment.

Less Than 100% Physical

The biggest technological change in our lifetimes is the rise of networked digital devices. Before the last 30 years or so, for most of us, no part of our daily experience took place through computers or networks. Now, at least some portion does, often quite a significant part. Since our days and our lives didn’t get any longer during that time, this new digital portion of our experience necessarily displaced physical (non-digital) experiences to at least some extent.

The core experience of what’s new about the digital is its non-physicality, the disembodied imaginary space it creates in our minds. This idea dates back to the origins of Gibsonian cyberspace:

Cyberspace is the “place” where a telephone conversation appears to occur. Not inside your actual phone, the plastic device on your desk. Not inside the other person’s phone, in some other city. The place between the phones.

Invoking “cyberspace” may sound hopelessly old-fashioned. But regardless of that term being rendered retro by 90s overuse, the problem it expresses is still a pressing concern. Just this week Quinn Norton, noted chronicler of “decentralized networked organisms” such as Occupy and Anonymous, vividly described the challenge of writing compellingly of contemporary life:

There is an aesthetic crisis in writing, which is this: how do we write emotionally of scenes involving computers? How do we make concrete, or at least reconstructable in the minds of our readers, the terrible, true passions that cross telephony lines? Right now my field must tackle describing a world where falling in love, going to war and filling out tax forms looks the same; it looks like typing.

The digital non-space of the net didn’t turn out to be a visualization. No cubes of glowing information or 3D avatars. That itself was a fantasy of the continued primacy of the physical. Instead, our lives are shaped by the new aesthetic and personal experiences that actually happen in this digital non-space built of typing: the scrolling micro-updates through which we do both our social grooming and our collective experience of profound events, the emails and Facebook messages through which we conduct our courtships, affairs, and feuds, the alternatingly personal and random images from around the world that stream through our pocket satellite-connected supercomputers.

A common Thingpunk response to articulating this non-physical quality of digital experience is to refocus on the objects and buildings that make up the physical infrastructure of the net. From Andrew Blum’s Tubes to James Bridle on the architecture of data centers, these accounts tend to have a slightly conspiratorial tone, as if they were revealing a secret truth hidden from us by the deception of our digital experiences. But while we should certainly pay attention to the massive construction projects being driven by the importance of networks, like the $1.5 billion fiber cable through the Arctic, these physical portions of the network are not more real than the mental and inter-personal experiences that happen through them.

And it is exactly those latter experiences that most of today’s designers actually work on, with, and through rather than these physical mega-infrastructures.

Design Turns Like a Steamship

Interactive digital design has only been around for about 30 years and for half that time it was practiced solely by a tiny handful of designers at the few companies with the resources to ship operating systems or boxed software. The real explosion of GUI design as a major fraction of design as a discipline began in the late 90s with the rise of the web and (later) mobile applications.

Fifteen years of thinking about websites can’t overcome the past 100+ years of design as a tradition of thinking about and through physical things, especially when so much of design on the web is what design professionals would condescendingly call “vernacular”, i.e. made by amateurs. The towering figures of pre-digital design, from the Arts and Crafts movement through the Bauhaus to the work of Charles and Ray Eames, still shape design’s critical vocabulary, educational objectives, and work methods.

Where the tools for making websites and mobile apps differ from those for making furniture and appliances, the ideas transmitted by this tradition become an increasingly bad match for today’s design work. The malleability of code, the distributed nature of collaboration, and the importance of math are just the first three of the many profound sources of this mismatch. Each of them is key to the craft of digital design and at best completely outside the scope of the pre-digital tradition.

Further, this design tradition preaches a set of values that’s powerfully at odds with lived digital reality.

Despite their differences, pre-digital design movements are united in the qualities of experience they promise: authenticity, presence, realness, permanence, beauty, depth. These are essentially spiritual virtues that people have hungered after in different forms throughout modern history.

Digital technology is endlessly criticized for failing to provide these virtues, for being artificial, false, disposable, ugly, superficial, and shallow. Ironically, nearly identical arguments were made at the start of the Industrial Revolution against machine-made objects as detached from the human and spiritual virtues of handicrafts, arguments which the discourse of modern design spent much of its history trying to overcome. This historical echo is often audible in the Maker rhetoric around 3D printing and “the Internet of Things”: that they represent a return to something more authentic and personal than the digital. This move is most obviously visible with the Maker obsession with “faires” and hackerspaces, venues for in-person sociability, which is represented as obviously more spiritually nourishing than its remote digital equivalent.

The problem with the persistence of these traditional values is that they prevent us from addressing the most pressing design questions of the digital era:

How can we create these forms of beauty and fulfill this promise of authenticity within the large and growing portions of our lives that are lived digitally? Or, conversely, can we learn to move past these older ideas of value, to embrace the transience and changeability offered by the digital as virtues in themselves?

Thus far, instead of approaching these (extremely difficult) questions directly, traditional design thinking has led us to avoid them by trying to make our digital things more like physical things (building in artificial scarcity, designing them skeuomorphically, etc.) and by treating the digital as a supplemental add-on to primarily physical devices and experiences (the Internet of Things, digital fabrication).

The Great Social Media Technology Stagnation

And meanwhile our digital technologies have stagnated.

While there are a lot of reasons for this stagnation, one I’d like to highlight here is the role of social media. Building a technology that lets technologists and designers feel (and act) like celebrities is dangerously fascinating. Creating Yet Another Social Media startup or web framework will get you a lot of social attention, tens or hundreds of thousands of followers, maybe, which as social creatures we’re addicted to for evolutionary reasons. It’s like the ancient instinct that tells us to eat every fatty and sugary food within reach, which may have been a good plan when we never knew when the tribe would next bring down a buffalo, but doesn’t work as well in the industrial food landscape.

The result of this Junk Food Technology has been that digital technologies, and especially the web, have degraded into an endless series of elaborations on social media, making physical technologies seem more innovative by comparison.

But there are still lots of real hard important things to be done on the web and in digital technologies more generally, many of them arising from the profound design questions mentioned in the last section:

  • Taking the seemingly endless pile of technological wonders produced by cutting edge computer science research and making them into culture.
  • Doing more with the super-computer satellite camera sensor platforms we constantly carry with us (more than using them as clients for reading social media).
  • Figuring out how to teach each other and do new research without digging ourselves under mountains of debt.
  • Making media that moves people in 30 second chunks when consumed out of context.
  • Telling emotional stories through the strange lives of bots and pseudonymous Twitter writing.
  • Breaking out of our bubbles to find empathy with far-flung people less like us around the world.

It’s by wrestling with these problems (and many others like them) that we’ll define the appropriate values that should drive design in a digital era, not by trying to shoehorn the older era’s values into our new digital venues.

A Fetish for 20th Century Modernism: Do Big Things vs. Fuck Yeah Brutalism

To conclude, I want to return to Davies’ dream of a design futurism that would visibly transform our cafes and neighborhoods.

One of the chief dangers of a futurism that’s centered on the built environment is that it lives in the shadow of 20th Century Modernism, the high church of the belief that changing the visual style of the built environment was inseparable from radical transformations in how we live our lives. Modernism was a project of gigantic scale with huge ambitions, from transforming our politico-economic systems to remaking our infrastructure and physical environment. Its legacy is extremely mixed: it changed the way we live substantially, in ways that are sometimes quite troubling.

If you are committed to expressing the future through physical things, if you are going to speak in a Modernist language of transforming the built environment, what will your relationship be to that legacy? Do you want to transform the world with huge projects? Or is that ambition just another fetish for a historical style (of raw concrete, shiny metals, and polished glass instead of blackboards and brass)?

An example of the former is Neal Stephenson’s Hieroglyph Project. Stephenson wants to push science fiction authors to tell stories that can inspire the pursuit of new Modernist-scale dreams. Personally, he wants to build a 2km-tall tower to make it cheaper to put things into space.

To this, I say: fuck yeah. We need these big dreams to try to dynamite us out of our incrementalism (in both physical and digital innovation). If Neal and his buddies can do it, then I’d love to see them take the scale of Modernist ambition and prove that it can be done without the attendant de-humanizing that led us to reject Modernism in the 20th Century.

The latter relationship to Modernism, though, is much more common. The design world is full of fetish material for 20th Century Modernism as a lifestyle, especially in interior design and minimalist magazines like Dwell and about a billion Tumblrs.

The worst offender, to my mind, is Fuck Yeah Brutalism, which posts a parade of pictures and drawings of Brutalist architecture (like this drawing of a proposed Seward Park Extension from 1970) and has over 100,000 followers.

This kind of pixel-deep appreciation treats Modernism as a sexy design style that looks pretty on websites, completely divorcing it from its huge, and often extremely troubling, human and political effects.

For three years, I lived across the street from the Riis Houses and the Lillian Wald Houses in Alphabet City, Manhattan:

They are what Brutalist architecture and Modernist planning often became in practice, a vertical filing cabinet for the city’s poorest and least politically powerful populations whose maintenance has been visibly abandoned by the city.


It’s easy to fetishize Brutalist buildings when you don’t have to live in them. On the other hand, when the same Brutalist style is translated into the digital spaces we daily inhabit, it becomes a source of endless whinging. Facebook, for example, is Brutalist social media. It reproduces much the same relationship with its users as the Riis Houses and their ilk do with their residents: focusing on control and integration into the high-level planning scheme rather than individual life and the “ballet of a good blog comment thread”, to paraphrase Jane Jacobs.

The divide between these two ways of adapting Modernism into the digital age powerfully illustrates the threat of Thingpunk. Its real danger lies in its superficiality, its mistaking of the transformation of surface style for evidence of systemic change.

Thanks to Rune Madsen and Jorge Just for feedback on a draft of this.

]]>
http://urbanhonking.com/ideasfordozens/2013/05/07/on-thingpunk/feed/ 4
Making Photomosaics in Processing http://urbanhonking.com/ideasfordozens/2013/01/14/making-photomosaics-in-processing/ http://urbanhonking.com/ideasfordozens/2013/01/14/making-photomosaics-in-processing/#comments Mon, 14 Jan 2013 00:28:10 +0000 http://urbanhonking.com/ideasfordozens/?p=674 Continue reading ]]> This past Friday, Tom Henderson tweeted me a question:

Upon further questioning, Tom pointed to some inspirations for the type of thing he wanted to try to make: Sergio Albiac’s collages

Sergio Albiac - You Are Not The News

Lewis Walsh’s typographical collages

Lewis Walsh

…and Dada collage (like this example from Hannah Höch):

Cut with the Kitchen knife

Having gotten a sense of what he was going for, I suggested that Processing might be a good place to start, mainly because of how easy it makes it to work with images and how many resources there are out there for learning it. (Though I did suggest checking out Zajal as well since he already knows some Ruby.)

Further, I offered to put together a Processing example of “computational collage” specifically to help. While there are a lot of great texts out there for getting started with Processing (I especially recommend Dan Shiffman’s Learning Processing), it can be really helpful to have an in-depth example that’s approximately in the aesthetic direction in which you’re trying to proceed. While such examples might be a lot more complex and therefore much more difficult to read through, they can demonstrate how someone with more experience might approach the overall problem and also point out a lot of little practical tips and tricks that will come in handy as you proceed.

So, after a bit of thinking about it, I decided to write a Processing sketch that produces photomosaics. A photomosaic reproduces a single large image by combining many other smaller images. The smaller images act as the “pixels” that make up the larger image, their individual colors blending in with their neighbors to produce the overall image.

Here’s an example of the effect, produced by the sketch I ended up creating for Tom:

Nuclear Fine Photomosaic

Check out the larger size to see the individual pictures that go into it.

Here’s another example based on a picture I took of some friends’ faces:

Corrie and Benji Photomosaic

Original size.

For the rest of this post, I’ll walk through the Processing (and bit of Ruby) code I used to create this photomosaic. I’ll explain the overall way it works and point out some of the parts that could be re-usable for other projects of this sort (loading images from a directory, dividing up an image into a grid, finding the average color of an image, etc.). At the end, I’ll suggest some ways I’d proceed if I wanted to produce more work in this aesthetic of “computational collage”.

A Note of Warning

This post is far longer and more detailed than your usual “tutorial”. That is intentional. I wanted to give Tom (and anyone else in a similar position) not just some code he could use to create an effect, but a sense of how I think through a problem like this – and also a solid introduction to some conceptual tools that will be useful to him in doing work in and around this area. I hope that the experience is a little like riding along in my brain as a kind of homunculus – but maybe a little better organized than that. This is exactly the kind of thing I wish people had done when I was first starting out, so I thought I’d give it a shot to see if it’s useful to anyone else.

Overall Plan

Let’s start by talking about the overall plan: how I approached the problem of making a sketch that produced photomosaics. After thinking about how photomosaics work for a little while (and looking at some), I realized the basic plan was going to look something like this:

  • Download a bunch of images from somewhere to act as the pixels.
  • Process a source image into a grid, calculating the average brightness of each square.
  • Go through each square in this grid and find one of the downloaded images that can substitute for it in the photomosaic.
  • Draw the downloaded images in the right positions and at the right sizes.

In thinking through this plan, I’d made some immediate decisions/assumptions. The biggest one: I knew the photomosaics were going to be black and white and that I’d mainly use black and white images as my downloaded images. This choice radically simplified the process of matching a portion of the original image with the downloaded images – it’s much easier to compare images along a single axis (brightness) than along the three that are necessary to capture color (red, green, blue or hue, saturation, value). Also, aesthetically, most of Tom’s example images were black and white so that seemed like a nice trade-off.

After a first section in which I explain how to use some Ruby code to download a bunch of images, in the rest of the sections, I’ll mainly describe the thinking behind how I approached accomplishing each of the stages in Processing. The goal is to give you an overall sense of the structure and purpose of the code rather than to go through every detail. To complement that, I’ve also posted a heavily-commented version of the photomosaic sketch that walks through all of the implementation details. I highly recommend reading through that as well to get a full understanding. I’ve embedded a gist of that code at the bottom of this post.

Downloading Images

The first step in making a photomosaic is to download all the images that are going to act as our tiles – the individual images that will stand in for the different grays in the original image. So, what we need is a bunch of black and white images with different levels of brightness ranging from pale white to dark black.

By far the easiest way to get these images is to download them from Flickr. Flickr has a great, rich API, which has been around for quite a long time. Hence there are libraries in tons of different languages for accessing its API, searching for images, and downloading them.

Even more conveniently, this is a task I’ve done lots of times before, so I already had my own little Ruby script sitting around that handles the job. Since Tom had mentioned he knew some Ruby, this seemed like the perfect solution. You can get my Ruby script here: flickr_downloader.rb. To use this script you’ll have to go through a number of steps to authorize it with Flickr.

  • Apply for API access
  • Enter the API key and shared secret they give you in the appropriate place in the flickr_downloader.rb script.

Now you need permission to log in as a particular user. This is done using an authentication process called “oauth”. It is surprisingly complicated, even for the relatively simple thing we want to do here. For our purposes, we’ll break oauth down into two steps:

  • Give our code permission to login as us on Flickr.
  • Capture the resulting token and token secret for reuse later.

This example from the flickraw gem will take you through the process of giving our code permission to log in to flickr: auth.rb. Download it and run it. It will guide you through the process of generating an oauth url, visiting Flickr, and giving permission to your code.

At the end of that process, be sure to capture the token and token secret that script will spit out. Once you’ve got those, go back to our flickr_downloader.rb script and paste them in the appropriate places marked ACCESS_TOKEN and ACCESS_SECRET.

Now the last step is to select a group to download photos from. I simply searched for “black and white flickr group” and picked the first one that came up: Black and White. Once you’ve found a group, grab its group id from the URL. This will look something like “16978849@N00” and it’s what you need for the API to access the group’s images. When you’ve got the group id, stick it in the flickr_downloader.rb script and you’re ready to run it.

Make sure you have a directory called “images” next to the flickr_downloader.rb script – that’s where it wants to put the images it downloads. Start it running and watch the images start coming down.

Process the Source Image into a Grid

Processing photo collage for Mathpunk

Now that we’ve got the images that will populate each of our mosaic’s tiles, the next step is to process the source image to determine which parts of it should be represented by which of our tile images.

When you look at the finished sketch, you’ll see that the code that does this job actually comes at the end. In the process of creating the sketch, however, it was one of the first things I did – while I was still thinking about the best way to match downloaded images to each part of the source image – and it was a very early version of the sketch that produced the screenshot above. This kind of change is very common when working through a problem like this: you dive into one part because you have an idea for how to proceed, regardless of whether that will end up being the first piece of the code in the final version.

Creating this grid of solid shades of gray consisted of two main components:

  1. Loop through the rows and columns of a grid and copy out just the portion of the original image within each cell.
  2. Pass these sub-images to a function that calculates the average brightness of an image.

First I defined the granularity of the grid: the number of rows and columns I wanted to break the original image up into. Based on this number, I could figure out how big each cell would be: just divide the width and height of the source image by how many cells you wanted to split each side into.

Once I knew those numbers, I could create a nested for-loop that would iterate through every column in every row of the image while keeping track of the x- and y-coordinates of each cell. With this information in hand, I used Processing’s copy() function to copy each cell’s pixels into its own small image so that I could calculate its average brightness.

See the drawPhotomosaic() function in the full Processing code below for a detailed description of this.

I implemented a separate function to calculate the average brightness of each of these sub-images. I knew I’d need this function again when processing the downloaded tile candidates: I’d want to find their brightness as well so I could match them with these cells. See the aveBrightness() function in the Processing code for the details of how to find the average brightness of an image.
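The gist of that function is just a running sum over pixels[]. Here’s a simplified sketch of the idea (the aveBrightness() in the full code may differ slightly in its details):

```java
// Average brightness of an image, from 0 (black) to 255 (white).
float aveBrightness(PImage img) {
  img.loadPixels();                        // make sure pixels[] is populated
  float total = 0;
  for (int i = 0; i < img.pixels.length; i++) {
    total += brightness(img.pixels[i]);    // brightness() ranges 0-255 in the default color mode
  }
  return total / img.pixels.length;
}
```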

In my original experiments with this, I simply drew a solid rectangle in place of each of these cells. Once I’d calculated the average brightness of that part of the source image, I set fill() to the corresponding gray and drew a rectangle with rect() using the x- and y-coordinates I’d just calculated. Later, after I’d figured out how to match the tile images to these brightness values, it was easy to draw the tile images at the same coordinates as these rectangles: the call to rect() simply got swapped for a call to image().
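Putting the last few paragraphs together, the grayscale-grid version of the loop looks roughly like this. It’s a simplified sketch, not the actual drawPhotomosaic() code: the source filename is a placeholder, and it assumes the aveBrightness() helper sketched above is pasted into the same sketch:

```java
int cellsPerSide = 64;                       // granularity of the grid
PImage source;

void setup() {
  size(800, 800);                            // assumes source.jpg is roughly this size
  source = loadImage("source.jpg");          // placeholder source image in the data folder
  noStroke();
  noLoop();                                  // one pass is enough
}

void draw() {
  int cellWidth  = source.width  / cellsPerSide;
  int cellHeight = source.height / cellsPerSide;

  for (int row = 0; row < cellsPerSide; row++) {
    for (int col = 0; col < cellsPerSide; col++) {
      int x = col * cellWidth;               // top-left corner of this cell
      int y = row * cellHeight;

      // copy just this cell of the source image into its own small PImage
      PImage cell = createImage(cellWidth, cellHeight, RGB);
      cell.copy(source, x, y, cellWidth, cellHeight, 0, 0, cellWidth, cellHeight);

      float gray = aveBrightness(cell);      // helper sketched above
      fill(gray);                            // grayscale preview; for the mosaic,
      rect(x, y, cellWidth, cellHeight);     // swap this rect() for an image() call
    }
  }
}
```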

Matching Tile Images

In many ways, this is the core of the photomosaic process. In order to replace our original image with the many images we downloaded, we need a way to match each cell in the original image to one of them.

Before settling on the final technique, I experimented with a few different ways of accomplishing this. Each of them had a different aesthetic effect and different performance characteristics (i.e. each took a different amount of time to create the photomosaic, and that time grew at different rates depending on different attributes).

For example, early on, it occurred to me that the grid of grayscale cells (as shown in the screenshot above) didn’t look very different if I used all 256 possible shades of gray or if I limited it to just 16 shades. This seemed promising because it meant that instead of having to use (and therefore download and process) hundreds of tile images, I could potentially use a much smaller number, i.e. as few as 8 or 16.

So, my first approach was to divide the possible range of 256 grayscale values into large “bins”. To do 16 shades of gray, for example, each bin would cover 16 adjacent grayscale values. Then I started loading the downloaded tile images, checking which of the 16 bins each one fit into, and moving on if I already had an image in that bin. The goal was to select just 16 images that together covered the full range of values in the original image.
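In code, the binning came down to little more than an integer division. A sketch of the idea (not the abandoned code itself, and again leaning on the aveBrightness() helper from above; NUM_BINS and binFor() are illustrative names):

```java
int NUM_BINS = 16;                    // 16 target shades of gray
int BIN_SIZE = 256 / NUM_BINS;        // each bin spans 16 adjacent grayscale values

// Which bin does a candidate tile image fall into? 0 = darkest, 15 = lightest.
int binFor(PImage tile) {
  return constrain(int(aveBrightness(tile) / BIN_SIZE), 0, NUM_BINS - 1);
}
```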

However, when actually running this approach, I found that it was surprisingly hard to find images for all of the bins. Most of my tile images had similar brightnesses. So while I’d fill five or six of the middle bins immediately, the sketch would churn through a huge number of images while failing to fill the most extreme bins.

I eventually did manage to produce a few photomosaics using this method:

Processing photo collage for Mathpunk

However, I decided to abandon it since it required a really large set of tile images to search through – and then didn’t use 98 percent of them – and also created a distracting visual texture by repeating each tile image over and over (which could be a nice effect in some circumstances).

After trying a few other similar approaches, it eventually occurred to me: instead of starting with a fixed set of grayscale colors as my “palette”, I should just organize the actual tile images I had on hand so that I could pick the best one available to match each cell in the source image.

Once I’d had that revelation, things proceeded pretty quickly. I realized that in order to implement this idea, I needed to be able to sort all of the tile images by their brightness. Then I could select the right image to match each cell in the source image based on its position in that sorted list: if I needed a nearly black image, I could grab one from the front of the list; if I needed one near full white, I could grab one from the end; and so forth for everything in between. The image I grabbed to correspond to a full black pixel might not be all-black itself (in fact it almost definitely wouldn’t be – who posts all-black images to Flickr?), but it would be the best match I could get given the set of tile images I’d downloaded.

In order to make my tile images sortable, I had to build a class to wrap them. This class would hold both the information necessary to load and display the images (i.e. their paths) and their average brightness – calculated using the same aveBrightness() function I’d already written. Then, once I had one of these objects for each of my tile images, I could simply sort them by their brightness score and I’d have everything I needed to select the right image for each cell in the source image.
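A stripped-down sketch of that structure might look like the following. The real PixelImage and PixelImageComparator classes in the full code carry more information and detail, so treat the Tile class and the tileFor() helper here as an outline rather than the actual code:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;

// Minimal stand-in for the tile wrapper: just a file path and a brightness score.
class Tile {
  String path;
  float brightness;
  Tile(String path, float brightness) {
    this.path = path;
    this.brightness = brightness;
  }
}

ArrayList<Tile> tiles = new ArrayList<Tile>();

// After measuring every downloaded image with aveBrightness(),
// sort the list from darkest to lightest.
void sortTiles() {
  Collections.sort(tiles, new Comparator<Tile>() {
    public int compare(Tile a, Tile b) {
      return Float.compare(a.brightness, b.brightness);
    }
  });
}

// Map a cell's average brightness (0-255) to a position in the sorted list:
// dark cells pull tiles from the front, bright cells from the back.
Tile tileFor(float cellBrightness) {
  int index = int(map(cellBrightness, 0, 255, 0, tiles.size() - 1));
  return tiles.get(index);
}
```

With something like this in place, the grayscale preview from the grid sketch above turns into a mosaic by loading the image at tileFor(gray).path and drawing it with image() at the same cell coordinates (caching the loaded PImages rather than reloading them for every cell would be the obvious next step).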

The code to accomplish this makes up most of the sketch. See the PixelImage class, the PixelImageComparator class, and most of the setup() function in the full sketch for details. I’ve written lots of comments there walking you through all of the ins and outs.

Once it was in place, my sketch started producing pretty nice photomosaics, like this one based on Tom’s twitter icon:

Mathpunk Photomosaic

(View at original size.)

Though I found the result worked especially well with relatively high-contrast source images – like the black and white portrait I posted above or this one below, based on an ink drawing of mine. I think this is because the tiles cover only a limited range of grays. Hence, images that depend for their legibility on fine distinctions among grays can end up looking a little muddled.

Running people drawing as photomosaic

Future Improvements

At this point, I’m pretty happy with how the photomosaic sketch came out. I think its results are aesthetically nice and fit into the “computational collage” category that Tom set out to start with. I also think the code covers a lot of the bases that you’d need in doing almost any kind of work in this general area: loading images from a directory, processing a source image, laying things out in a grid, etc.

That said, there are obvious improvements that could be made as next steps starting from this code:

  • Use tile pictures that are conceptually related to the source image. To accomplish this I’d probably start by digging more into the Flickr API to make the downloader pick images based on search terms or person tags – or possibly I’d add some OpenCV to detect faces in images…
  • Vary the size of the individual images in the grid. While the uniformity of the grid is nice for making the image as clear as possible, it would be compositionally more interesting (and more collage-like) to have the size of the images vary more, as Tom’s original references demonstrate. For a more advanced version you could even try breaking up the rectangular shape of each of the source images (Processing’s mask() function would be a good place to start here).
  • Another obvious place to go would be to add color. To do this you’d need a different measure of similarity between each cell and the tile images. And you’d need one that wouldn’t involve searching through all of the tile images to match each cell. I’d think about extending the sorting technique we’re using in this version. If you figured out a way to translate each color into a single number in some way that was perceptually meaningful, you could use the same sorting technique to find the closest available tile. Or, you could treat the tile images as red, green, and blue pixels and then combine three of them in close proximity (and at appropriate levels of color intensity) to produce the average color of any cell in the image.
  • One aspect of Tom’s references not covered here is the use of typography. Rune Madsen’s Printing Code syllabus is an amazing resource for a lot of computational design and composition tasks in Processing and his section on typography would be especially useful for working in this direction.
  • Finally, one way to break off of the grid that structures so much of this code would be to explore Voronoi stippling. This is a technique for converting a grayscale image into a series of dots of different weights that represent the darkness of each region in a natural way, much like a stippled drawing created by hand. Evil Mad Scientist Laboratories recently wrote an extensive post about their weighted voronoi stippling Processing implementation to create art for their Egg Bot machine. They generously provide Processing code for the technique, which would make an interesting and convenient starting point.

]]>
http://urbanhonking.com/ideasfordozens/2013/01/14/making-photomosaics-in-processing/feed/ 2
Winning the New Aesthetic Death Match http://urbanhonking.com/ideasfordozens/2012/06/01/winning-the-new-aesthetic-death-match/ http://urbanhonking.com/ideasfordozens/2012/06/01/winning-the-new-aesthetic-death-match/#comments Fri, 01 Jun 2012 03:22:30 +0000 http://urbanhonking.com/ideasfordozens/?p=642 Continue reading ]]> Yesterday I participated in the Flux Factory New Aesthetic Death Match, a lively public debate that the art space hosted. My fellow debaters were Kyle McDonald, Molly Steenson, and Carla Gannis. Molly and Kyle I already knew well, but Carla I hadn’t had the pleasure of meeting until just before the debate last night.

The debate was structured as a kind of 1980s MTV take on traditional Oxford debating society rules: there were strict timed statement and rebuttal structures and a winner voted at the end. There were also smoke machines and “smack downs” – and a surprisingly large audience, with something like three times as many people as chairs.

As panelists we were actually quite friendly, so it was perhaps good, at least for the audience’s amusement, that the rules were in place to ensure some conflict. The result was a stimulating and lively conversation that actually managed to touch on some of the deeper issues with the New Aesthetic. I was impressed by much of what my fellow panelists said. It’s surpassingly difficult to be coherent and entertaining off the cuff and under a ticking clock.

I’m also proud to say that at the end of the night, I was chosen the winner by audience applause.

It’s impossible to sum up all the points that were made, but I quite liked this trio of tweets by Marius Watz this morning summing things up:

Marius Watz NAFF tweets

There’s not, as far as I know, video of the event online anywhere. So the best documentation I can provide is my opening statement, which was requested to take up one minute and kicked off the night. I scrawled it in my notebook on my way out to Long Island City and read it over this video (the full text is below):


For the first forty years of their existence we thought of technologies like full text search, image processing, and large scale data analysis as components in a grand project to build an artificial humanlike intelligence.

During this time these technologies did not work very well.

In the last 15 years they’ve started working better. We have Google search, Facebook face detection, and high frequency trading systems.

More and more of our daily lives are lived through computer screens and the network services on them. Hence a huge amount of our visual, emotional, and social experiences take place in the context of these algorithmic artifacts, these digital things interacting with each other a billion times a second. Like the slinky on the treadmill here, they take on a kind of life of their own, a life none of their human makers explicitly chose.

Our struggle to understand that life, and to learn to engage with it in our artistic and design practices, is the heart of the New Aesthetic.

This quick statement summarized other things I’ve said at more length here, here, and here.

AI Unbundled
http://urbanhonking.com/ideasfordozens/2012/04/15/ai-unbundled/ Sun, 15 Apr 2012 20:09:53 +0000

Shakey (1966-1972), Stanford Research Institute's mobile AI platform, and the Google Street View car.

The project of Artificial Intelligence has undergone a radical unbundling. Many of its sub-disciplines, such as computer vision, machine learning, and natural language processing, have become real technologies that permeate our world. However, the overall metaphor of an artificial human-like intelligence has failed. We are currently struggling to replace that metaphor with new ways of understanding these technologies as they are actually deployed.

At the end of the 1966 spring term, Seymour Papert, a professor in MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL), initiated the Summer Vision Project, "an attempt to use our summer workers effectively in the construction of a significant part of a visual system" for computers. The problems Papert expected his students to overcome included "pattern recognition", "figure-ground analysis", "region description", and "object identification".

Papert had assigned a group of graduate students the task of solving computer vision as a summer homework assignment. He thought computer vision would make a good summer project because, unlike many other problems in the field of AI, “it can be segmented into sub-problems which allow individuals to work independently”. In other words, unlike “general intelligence”, “machine creativity”, and the other high-level problems in the AI program, computer vision seemed tractable.

More than forty-five years later, computer vision is a major sub-discipline of computer science, with dozens of journals, hundreds of active researchers, and thousands of published papers. It's a field that's made substantial breakthroughs, particularly in the last few years. Many of its results are actively deployed in products you encounter every day, from Facebook's face tagging to the Microsoft Kinect. But I doubt any of today's researchers would call any of the problems Papert set for his grad students "solved".

Papert and his graduate students were part of an Artificial Intelligence group within CSAIL led by John McCarthy and Marvin Minsky. McCarthy defined the group's mission as "getting a computer to do things which, when done by people, are said to involve intelligence". In practice, they translated this goal into a set of computer science disciplines such as computer vision, natural language processing, machine learning, document search, text analysis, and robotic navigation and manipulation.

Over the last generation, each of these disciplines underwent an arc of development similar to computer vision's: decades of slow, painstaking progress, punctuated by rapid growth sometime in the last twenty years, resulting in increasingly practical adoption and acculturation. However, as they developed, they showed no tendency to become more like McCarthy and Minsky's vision of AI. Instead they accumulated conventional human and cultural uses. Shakey became the Google Street View car and begat 9-eyes. The semantic web became Twitter and Facebook and begat @dogsdoingthings. Machine learning became Bayesian spam filtering and begat Flarf poetry.

Now, looking back on them as mature disciplines, there's little trace of their AI parentage to be seen in these fields. None of them seems to be on the verge of some Singularitarian breakthrough. Each of them is part of an ongoing historical process of technical and cultural co-evolution. Certainly these fields' cultural and technological developments overlap and relate, and there's a growing sense of them as some kind of new cultural zeitgeist, but, as Bruce Sterling has said, AI feels like "a bad metaphor" for them as a whole. While these technologies had their birth in the AI project, the signature themes of AI — "planning", "general intelligence", "machine creativity", etc. — don't do much to describe the way we experience them in their daily deployment.

What we need now is a new set of mental models and design procedures that address these technologies as they actually exist. We need a way to think of them as real objects that shape our world (in both its social and inanimate components) rather than as incomplete predecessors to some always-receding AI vision.

We should see Shakey (and its cousin, the SAIL cart, shown here) as the predecessor not to the Terminator but to Google's self-driving car.

Terminator mouth analysis

Rather than personifying these seeing-machines, embodying them as big burly Republican-governor types, we should try to imagine how they'll change our roads, both for blind people like Steve Mahan here and for all of the street signs, concrete embankments, orange traffic cones, and overpasses out there.

As I've written elsewhere, I believe that the New Aesthetic is the rumblings of us beginning to do just this: to think through these new technologies outside of their AI framing, with close attention to their impact on other objects as well as on ourselves. Projects like Adam Harvey's CV Dazzle are replacing the AI understanding of computer vision embodied by the Terminator HUD with one based on the actual internal processes of face detection algorithms.
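To make that concrete, here is a minimal sketch of what running one of those face detection algorithms actually looks like from code. It assumes the OpenCV for Processing library (gab.opencv) is installed, and the filename "portrait.jpg" is a placeholder; it is an illustration of the general process rather than anything from the original post. The detector is simply hunting for the coarse light-and-dark rectangle patterns of a frontal face, which is exactly what CV Dazzle's makeup and hair designs set out to break up.

// A minimal face detection sketch in Processing, assuming the OpenCV for
// Processing library (gab.opencv) is installed via the Contribution Manager.
// "portrait.jpg" is a placeholder filename.
import gab.opencv.*;
import java.awt.Rectangle;

OpenCV opencv;
PImage img;

void setup() {
  size(640, 480);
  img = loadImage("portrait.jpg");
  img.resize(width, height);

  // Wrap the image and load the frontal-face Haar cascade. The cascade
  // slides a window over the image at many scales, testing for the
  // characteristic light/dark regions of eyes, nose bridge, and cheeks.
  opencv = new OpenCV(this, img);
  opencv.loadCascade(OpenCV.CASCADE_FRONTALFACE);

  // Draw a box around every region the cascade accepts as a face.
  image(img, 0, 0);
  noFill();
  stroke(0, 255, 0);
  strokeWeight(2);
  for (Rectangle face : opencv.detect()) {
    rect(face.x, face.y, face.width, face.height);
  }
}

Even this toy version makes the point: there is no "seeing" in any humanlike sense, just a cascade of rectangle comparisons returning bounding boxes.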

Rather than trying to imagine how computers will eventually think, we’ve started to examine how they currently compute. The “Clink. Clank. Think.” of the famous Time Magazine cover of IBM’s Thomas Watson is becoming “Sensor. Pixel. Print.”
