A Menagerie of CV Markers

This week, the net’s been exploding with responses to James Bridle’s work on the New Aesthetic. Bruce Sterling lit the fuse for this particular conflagration with his Essay on the New Aesthetic in Wired.

My own response, published in The Creator’s Project on Friday, was called What It’s Like to be a 21st Century Thing. I tried to put NA in the context of Object-Oriented Ontology, arguing that NA “consists of visual artifacts we make to help us imagine the inner lives of our digital objects and also of the visual representations produced by our digital objects as a kind of pidgin language between their inaccessible inner lives and ours.” This is an approach I’m excited about and plan to flesh out more here soon.

Today, though, I want to engage in a bit of OOO ontography and close-looking as a way of responding to what I thought was one of the more interesting takes on Sterling’s essay.

In his post Why the New Aesthetic isn’t about 8bit retro, the Robot Readable World, computer vision and pirates, Rev Dan Catt tries to address the 8-bit quality of much New Aesthetic visual work. Specifically, he’s trying to answer a criticism of NA as retro, a throwback to “the colors and 8 bit graphics of the 80s” as Tom Coates put it.

For Catt that resemblance comes from the primitive state of computer vision today. “Computer vision isn’t very advanced, to exist with machines in the real world we need to mark up the world to help them see”, he says. In other words, the current limitations of computer vision algorithms require intentionally designed bold blocky 8-bit graphics for them to function. And therefore the markers we design to meet this requirement end up looking like primitive computer graphics, which resulted from similar technical limitations in the systems that produced them. As Catt says, “put another way, current computer vision can probably ‘see’ computer graphics from around 20–30 years ago.”

In a conversation about this idea, Kyle McDonald argued that Catt’s taking the comparison too far. While there is a functional comparison between the current state of computer vision and the state of computer graphics in the 80s, the actual markers we’re using in CV work today don’t much resemble 8-bit graphics aesthetically.

To explore this idea, Kyle and I decided to put together a collection of as many different kinds of markers as we could think of, along with links to the algorithms and processes that create and track them (though I’m sure there are many we’ve missed – more contributions are welcome in the comments). It was our hope that such a collection might widen the New Aesthetic visual vocabulary by adding new ingredients, as well as focus some attention on the actual computational techniques used to create and track these images. Since so many of us were raised looking at 8-bit video games and graphics, I think it helps to look at the markers themselves in all their surprising variety rather than simply filing them away with Pitfall Harry’s rope, Mario’s mushroom, and Donkey Kong’s barrel, which we already know so well.

So, what do real CV markers actually look like? Browse the images and links below to see for yourself, but I’ll make a few quick general characterizations. There is a lot of high-contrast black and white, as well as stark geometry that emphasizes edges. However, the grid that characterizes 8-bit images and games is almost never kept fully intact. Most of the marker designs are specifically trying to defeat repetition in favor of identifying a few specific features. Curves and circles are nearly as common as squares and grids.

I’d love to collect more technical links about the tracking techniques associated with each of these kinds of markers. So jump in with the comments if you’ve got suggestions.

OpenCV calibration checker pattern for homography

opencv checkerboard

Reactivision

reactivision

(Original paper: Improved Topological Fiducial Tracking in the reacTIVision System)

Graphtracker

Graphtracker

(Original paper: Graphtracker: A topology projection invariant optical tracker)

Rune Tags

Rune Tags

(Original paper: RUNE-Tag: a High Accuracy Fiducial Marker with Strong Occlusion Resilience)

Corner detection for calibration

corner detection

Dot tracking markers

dot trackers

Traditional bar codes

bar codes

Stacked bar code

stacked bar code

Data Matrix 2D

Data Matrix 2D

Text EZCode

Text EZCode

Data Glyphs

Data Glyphs

QR codes

qr code

Custom QR codes

custom qr codes
custom qr rabbit

Microsoft tags aka High Capacity Color Barcodes

microsoft tags

Maxi Codes

Maxi Codes

Short Codes

Short codes

Different flavors of Fiducial Markers

fiducial markers
fiducial marker
Ftag
fiducial

9-Point Landmark

9-point landmark

Cantags

Cantag

AR tracking marker for After Effects

AR tracking marker

ARTag markers

ar toolkit tracking markers

Retro-reflective motion capture markers

motion capture markers

Hybrid marker approaches

hybrid
Posted in Opinion | Leave a comment

Machine Pareidolia: Hello Little Fella Meets FaceTracker

In a recent post on the BERG blog, Gardens and Zoos, Matt Jones explored a series of ideas for designing personality and life into technology products. One of the most compelling of these takes advantage of pareidolia, the natural human inclination to see faces everywhere around us.


Jones’s slide introducing pareidolia.

Jones advocates designing faces into new technology products as a way of making them more approachable, using pareidolia to give products personality and humanize them without climbing all the way down into the Uncanny Valley. He even runs a Flickr group collecting images of pareidolia-inducing objects: Hello Little Fella!

Lately I’ve been thinking a lot about faces. I’ve had mine scanned and turned into a digital puppet. I’ve been working extensively with face tracking, building a series of experiments and prototypes with Kyle McDonald’s ofxFaceTracker, an OpenFrameworks frontend to Jason Saragih’s excellent FaceTracker project. Most publicly so far, I demonstrated that FaceTracker can track hand-drawn faces.

Using FaceTracker OSC to draw in Processing

Accessing FaceTracker data in Processing.
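
If you want to try something similar yourself, here’s a minimal sketch of the receiving side, assuming a FaceOSC-style tracker broadcasting over OSC and the oscP5 library for Processing. The port number and message addresses are assumptions based on FaceOSC’s defaults, so check them against whatever tracker build you’re running:

import oscP5.*;

OscP5 oscP5;
boolean found = false;
float posX, posY, faceScale;

void setup() {
  size(640, 480);
  // FaceOSC broadcasts on port 8338 by default (adjust to your setup)
  oscP5 = new OscP5(this, 8338);
}

void draw() {
  background(255);
  if (found) {
    // draw a crude stand-in face at the tracked position and scale
    noFill();
    ellipse(posX, posY, faceScale * 10, faceScale * 10);
  }
}

void oscEvent(OscMessage m) {
  if (m.checkAddrPattern("/found")) {
    found = (m.get(0).intValue() == 1);
  } else if (m.checkAddrPattern("/pose/position")) {
    posX = m.get(0).floatValue();
    posY = m.get(1).floatValue();
  } else if (m.checkAddrPattern("/pose/scale")) {
    faceScale = m.get(0).floatValue();
  }
}

From there it’s just ordinary Processing drawing code: swap the ellipse for whatever you want to puppet with the tracked pose.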

Facial recognition techniques give computers their own flavor of pareidolia. In addition to responding to actual human faces, facial recognition systems, just like the human vision system, sometimes produce false positives, latching onto some set of features in the image as matching their model of a face. But rather than arising from the millions of years of evolution that shaped human vision, their pareidolia is based on the details of their algorithms and the vicissitudes of the training data they’ve been exposed to.

Their pareidolia is different from ours. Different things trigger it.

Face In The Window

Face in the Window. FaceTracker seeing a face in a window at CMU’s Studio for Creative Inquiry during Art && Code.

After reading Jones’s post, I came up with an experiment designed to explore this difference. I decided to run all of the images from the Hello Little Fella Flickr group through FaceTracker and record the result. These images induce pareidolia in us, but would they do the same to the machine?

Using the Flickr API, I pulled down 681 images from the group. I whipped up an OpenFrameworks app that loaded each image and passed it to FaceTracker for detection, saving an image of the resulting face if it was detected. The result was that FaceTracker detected a face in 50 of the images, or about 7%.
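
For anyone curious what that loop looks like, here’s a rough sketch of the same batch idea in Processing rather than OpenFrameworks. It is not the code I ran: it stands in a stock Haar-cascade detector from the OpenCV for Processing library where the original app used ofxFaceTracker, it only prints its results instead of saving images, and the folder name is a placeholder for wherever you’ve stashed the downloaded photos:

import gab.opencv.*;
import java.awt.Rectangle;
import java.io.File;

void setup() {
  // folder of images pulled down from the Flickr group (placeholder path)
  File dir = new File(dataPath("hello_little_fella"));
  int total = 0;
  int hits = 0;

  for (File f : dir.listFiles()) {
    if (!f.getName().toLowerCase().endsWith(".jpg")) continue;
    total++;
    PImage img = loadImage(f.getAbsolutePath());

    // run a frontal-face Haar cascade over the whole image
    OpenCV opencv = new OpenCV(this, img);
    opencv.loadCascade(OpenCV.CASCADE_FRONTALFACE);
    Rectangle[] faces = opencv.detect();

    if (faces.length > 0) {
      hits++;
      println(f.getName() + ": face at " + faces[0].x + ", " + faces[0].y);
    }
  }
  println(hits + " of " + total + " images triggered the detector");
  exit();
}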

When I looked through the results I found that they broke down into three different categories in terms of how the face detected by the software related to the face that a person would see in the photo: agreement, near agreement, and totally other. Each of these categories reveals a different possible relationship between the human vision system and the software vision system. Significantly, I also found that I had a different emotional reaction to each of these types of results. I think the spectrum of possibilities outlined by these three categories is one we’re going to see a lot as we find ourselves surrounded by more and more designed objects that are embedded with computer vision. At the end of this post I’ll share some ideas about the repercussions this might have for the design of the Robot-Readable World, both for the robots themselves and the things we create for them to look at.

But first a little more about each of the categories.

Agreement

Agreement happens when the face tracking system detects exactly the part of the scene that originally induced pareidolia in the photographer, inspiring them to take the photo in the first place. In many ways these are the most satisfying results. They give you the confirming feeling that YES it saw just what I saw. Here are some results that show Agreement:

450

320

281

This one is rather good. I hadn’t really even been able to see the face in this cookie until the app showed it to me.

508

I think this one is especially exciting because there’s an inductive implication that it could see all of these:

201

50

One major ingredient of Agreement seems to be a clearly defined boundary around the prospective face’s features. I discovered something similar when experimenting with getting FaceTracker to see hand-drawn faces.

Near Agreement

The next category is Near Agreement. Near Agreement takes place when some — but not all — facial features the algorithm picks out match those a human eye would see.

For example, here’s a case where it sees the same eyes as I do, but we disagree about the nose and mouth.

28

I see the black hole there as the mouth of the little fella. The algorithm sees that as his nose and the shift in the reflection below that as the mouth.

When these kinds of Near Agreements occur I find myself going through a quick series of emotions. Excitement: it sees it! Let down: oh, but that’s not quite it. Empathy: you were so close; just a little to the left, I see where you went wrong…

662

Got the mouth right, but the eyes were just a little too far out of reach:

633

The back of this truck I actually find quite compelling. I think the original photographer was thinking of arrows at the top as the eyes and the circular extrusion as the border of the face. But now, having seen the face that the algorithm detected, I can actually see that face more clearly than the one I think the photographer intended.

468

369

181

Totally Other

This last category is the one I find the most fascinating. Sometimes FaceTracker would detect a face in a part of the image totally separate from the face the image was intended to capture. Something in that part of the image, frequently an undifferentiated stretch of surface or a bit of seemingly meaningless detail, triggered the system’s pattern for a face.

These elicit the most complex emotional response of all. It starts off with “huh?”, a sense of mystification about what the algorithm could be responding to. Then there’s a kind of aesthetic of the glitch. “Oh it’s a screw up, how funny and slightly troubling”. But then finally, the more of these I saw, the more the effect started to feel truly other: like a coherent, but alien idea of what faces were. It made me wonder what I was missing. “What is it seeing there?” It’s a feeling akin to having a conversation with someone who’s gradually losing interest in what you’re saying and starting to scan the room over your shoulder.

445

438

436

29

You can see the rest of the 50 photos in my Machine Pareidolia set on Flickr.

So what can we learn from these results? Let’s return to Mr. Jones for a moment. He explained his interest in human pareidolia thusly:

One of the prime materials we work with as interaction designers is human perception. We try to design things that work to take advantage of its particular capabilities and peculiarities.

As designers of the Robot-Readable World we need to have a similar sense of the capabilities and peculiarities of this new computational perception. Hopefully this experiment can give us some sense of the texture of that perception, an idea of how much of its circle overlaps with ours in the Venn diagram of vision systems and how the non-overlapping parts look and behave.

Human-machine Venn diagram

Posted in Art | 8 Comments

26 Books in 2011

Last year, I read 43 books, a relatively high annual total for me. This was largely due to spending so much time that year working on a stop-motion animated music video, which led to a huge amount of audio book listening. This year, I read much less. The two main factors in this falloff were my busy last semester at ITP and the fact that I spent much of the second half of the year writing a book. The total for this year came out to 26 books, plus eight additional comics, an area I’ve started dabbling in due to the influence of Matt Jones and Jack Schulze of BERG London, whom I had the pleasure of meeting this year.

Looking at the list, the topics of this year’s books closely resemble last year’s, with sci-fi and special-effects behind-the-scenes accounts making up the lion’s share. Of these, I especially wanted to point out The Gone-Away World by Nick Harkaway, which I just finished recently. It’s a great weird mix of post-apocalyptic sci-fi, coming-of-age college novel, and Tarantino-esque madcap kung-fu, but somehow darker and more moving than that description makes it sound. There are also a few tech/business history books: the Steve Jobs bio, Steven Levy on Google, The Toyota Way, and The Gun by CJ Chivers, which is an excellent history of the AK-47 and one of the best books on design I’ve ever read.

Here are the comics I read this year (I would link to these too, but, weirdly enough, I have no idea of the best place to acquire them online, having, amazingly, bought nearly all of them in person at “stores” like Forbidden Planet and St. Marks Comics):

  • SVK by Warren Ellis
  • Invincible Iron Man: The Five Nightmares by Matt Fraction
  • Invincible Iron Man: Extremis by Warren Ellis
  • Transmetropolitan Vol 1 by Warren Ellis
  • Planetary Vol 1 by Warren Ellis
  • The Punisher: Born by Garth Ennis
  • The Punisher MAX, Vol 1 by Garth Ennis
  • Usagi Yojimbo Book 2: Samurai by Stan Sakai
Posted in Opinion | Leave a comment

A Personal Fabrication Nightmare

Just received the following story from my friend Devin Chalmers. I asked for his permission to publish it because I think it is telling and disturbingly likely to come true.

I had a personal fabrication nightmare last night. I’d just gotten off a roller coaster, and at the photo booth where you can get commemorative prints of your shit-your-pants face they had just gotten a whole 3D printing/lasercutter workflow set up. I was overwhelmed by the choices of materials and patterns: the sample book was like 40 pages long. They could do steins, shot glasses, brass plaques, 3D and 2.5D scene reconstructions, six different sorts of wood, marquetry, choices of how to define figure and ground—it was all very confusing. I came back after an hour to let the crowd die down and I still couldn’t decide what the best way to physicalize my roller coaster adventure would be. I awoke still anxious.

Posted in Opinion | Leave a comment

Announcing ofxaddons.com, a directory of OpenFrameworks extensions

At Art && Code 3D a few weeks back I met James George. We immediately found we had a lot in common, kicking off a wide-ranging conversation about everything from miniature worlds to Portland food carts to ways of making the OpenFrameworks community more accessible. On this last topic, we even conceived a project: a website that searches Github for OpenFrameworks addons written by the community and indexes them for easier discovery. Today, I’m proud to announce the launch of exactly that site: ofxaddons.com.

The site features nearly 300 addons that we’ve divided into 13 categories: Animation, Bridges, Computer Vision, Graphics, GUI, Hardware Interface, iOS, Physics, Sound, Typography, Utilities, Video/Camera, and Web/Networking. We’ve also put together a how-to guide on creating your own addons. That guide includes standards for how to structure an addon so it is easy to install and will work smoothly for all users of OpenFrameworks. It’s based on the emerging standards coming out of the community of addon authors.

While categorizing them, James and I came across a bunch of really remarkable addons. In the rest of this post, I want to highlight a few of the addons that most struck us.

ofxGrabCam

ofxGrabCam by Elliot Woods provides an intuitive interactive camera for 3D apps. It was inspired by the camera in Google Sketchup: it uses the z-buffer to automatically select the object that’s under your mouse when you click as the center of your translations and rotations. Here’s a video Elliot made showing it in action:

And here’s Elliot’s full write-up. Rumor on the street is that this might make it into OF core in a future version, so check it out now.

ofxGifEncoder and ofxGifDecoder

Both by Jesus Gollonet, this pair of libraries lets you create and parse animated GIFs. ofxGifEncoder does the creating and ofxGifDecoder does the parsing. You can create GIFs programmatically to look however you want. The animated GIF above shows an awesome glitch I achieved recently while screwing up some pixel math on one of the sample OF videos.

FUGIFs is an app that uses ofxGifEncoder to automatically turn video files into animated GIFs. Sounds like it was made by a frustrated designer of animated Flash banners. Useful.

ofxGts

ofxGts is an addon from Karl D.D. Willis that wraps the GNU Triangulated Surface Library (GTS), a useful set of tools for dealing with 3D surfaces. GTS can add vertices to meshes to make them smoother (as shown in the horse model illustrated above), it can simplify models, it can decompose models into triangle strips, etc., etc.

Karl’s version of the addon seems to have some compatibility issues with OF 007 so James put together a fork that fixes those: obviousjim/ofxGts. Merge that pull request Karl!

ofxKyonyu: Kinect Breast Enlarger

This addon by novogrammer was too absurd not to share. It seems (the site, and most of the documentation and code comments, are in Japanese) to use the Kinect to enlarge the breasts of people it detects. I’m sure this will get reused in tons of projects.

ofxSoftKeyboard

ofxSoftKeyboard

Here’s a great addon that could have a lot of application in accessibility and kiosk work: ofxSoftKeyboard by Lensley. This addon provides an onscreen software keyboard that generates key events when the user clicks (or taps, etc.) on a key. It works well and they’ve already accepted James’ pull request updating it to full OF 007 compatibility!

ofxUeye

Last, but not least, we’ve got this addon, which provides an interface to the GigE uEye SE, a small form-factor Gigabit Ethernet camera that looks really useful. It’s Windows-only at the moment so we haven’t been able to actually run it, but it seems quite well put together.

That’s just a sampling of all of the great addons that are available. If you browse around the site for just a few minutes I’ll bet you’ll be amazed at what you find. In fact, I bet, like me, you’ll immediately think of three project ideas just from seeing what kinds of cool things are possible.

Posted in Opinion | Tagged | Leave a comment

Streaming Kinect skeleton data to the web with Node.js

This past weekend, I had the honor of participating in Art && Code 3D, a conference hosted by Golan Levin of the CMU Studio for Creative Inquiry about DIY 3D sensing. It was, much as Matt Jones predicted, “Woodstock for the robot-readable world”. I gave two talks at the conference, but those aren’t what I want to talk about now (I’ll have a post with a report on those shortly). For the week before the start of the actual conference, Golan invited a group of technologists to come work collaboratively on 3D sensing projects in an intensive atmosphere, “conference as laboratory” as he called it. This group included Diederick Huijbers, Elliot Woods, Joel Gethin Lewis, Josh Blake, James George, Kyle McDonald, Matt Mets, Kyle Machulis, Zach Lieberman, Nick Fox-Gieg, and a few others. It was truly a rockstar lineup and they took on a bunch of hard and interesting projects that have been out there in 3D sensing and made all kinds of impressive progress.

One of the projects this group executed was a system for streaming the depth data from the Kinect to the web in real time. This let as many as a thousand people watch some of the conference talks in a 3D interface rendered in their web browser while they were going on. An anaglyphic option was available for those with red-blue glasses.

I was inspired by this truly epic hack to take a shot at an idea I’ve had for a while now: streaming the skeleton data from the Kinect to the browser. As you can see from the video at the top of this post, today I got that working. I’ll spend the bulk of this post explaining some of the technical details involved, but first I want to talk about why I’m interested in this problem.

As I’ve learned more and more about the making of Avatar, one innovation among many has struck me most. The majority of the performances for the movie were recorded using a motion capture system. The actors would perform on a nearly empty motion capture stage: just them, the director, and a few technicians. After they had successful takes, the actors left the stage, the motion capture data was edited, and James Cameron, the director, returned. Cameron was then able to play the perfect, edited performances back over and over, ad infinitum, as he chose angles using a tablet device that let him position a virtual camera around the virtual actors. The actors performed without the distractions of a camera on a nearly black box set. The director could work for 18 hours on a single scene without having to worry about the actors getting tired or screwing up takes. The performance of the scene and the rendering of it into shots had been completely decoupled.

I think this decoupling is very promising for future creative filmmaking environments. I can imagine an online collaborative community triangulated between a massively multiplayer game, an open source project, and a traditional film crew, where some people contribute scripts, some contribute motion capture recorded performances of scenes, others build 3D characters, models, and environments, still others light and frame these for cameras, and still others edit and arrange the final result. Together they produce an interlocking network of aesthetic choices and contributions that yields not a single coherent work, but a mesh of creative experiences and outputs. Where current films resemble a giant shrink-wrapped piece of proprietary software, this new world would look more like Github: a constantly shifting graph of contributions and related evolving projects.

The first step towards this networked participatory filmmaking model is an application that allows remote real time motion capture performance. This hack is a prototype of that application. Here’s a diagram of its architecture:

Skelestreamer architecture diagram

The source for all of the components is available on Github: Skelestreamer. In explaining the architecture, I’ll start from the Kinect and work my way towards the browser.

Kinect, OpenNI, and Processing

The Processing sketch starts by accessing the Kinect depth image and the OpenNI skeleton data using SimpleOpenNI, the excellent library I’m using throughout my book. The sketch waits for the user to calibrate. Once the user has calibrated, it begins capturing the position of each of the user’s 15 joints into a custom class designed for the purpose. The sketch then sticks these objects into a queue, which is consumed by a separate thread. This separate thread takes items out of the queue, serializes them to JSON, and sends them to the server over a persistent socket connection that was created at the time the user was calibrated and we began streaming. This background thread and queue is a hedge against the possibility of latency in the streaming process. Right now, since I’ve been running everything on one computer, I haven’t seen any latency; the queue nearly always runs empty. I’m curious to see if this level of throughput will continue once the sketch needs to stream to a remote server rather than simply over localhost.
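
To make that concrete, here’s a stripped-down sketch of the Processing side. Consider it an illustration rather than the actual Skelestreamer code: the calibration callbacks and the background queue/thread are omitted, the JSON is built by hand and written straight to a raw socket from draw() (the real sketch POSTs it over HTTP, as noted below), and the host, port, and field names are all assumptions:

import SimpleOpenNI.*;
import java.net.Socket;
import java.io.PrintWriter;

SimpleOpenNI context;
PrintWriter out;

// the 15 joints OpenNI tracks, in a fixed order
int[] jointIds = {
  SimpleOpenNI.SKEL_HEAD, SimpleOpenNI.SKEL_NECK, SimpleOpenNI.SKEL_TORSO,
  SimpleOpenNI.SKEL_LEFT_SHOULDER, SimpleOpenNI.SKEL_LEFT_ELBOW, SimpleOpenNI.SKEL_LEFT_HAND,
  SimpleOpenNI.SKEL_RIGHT_SHOULDER, SimpleOpenNI.SKEL_RIGHT_ELBOW, SimpleOpenNI.SKEL_RIGHT_HAND,
  SimpleOpenNI.SKEL_LEFT_HIP, SimpleOpenNI.SKEL_LEFT_KNEE, SimpleOpenNI.SKEL_LEFT_FOOT,
  SimpleOpenNI.SKEL_RIGHT_HIP, SimpleOpenNI.SKEL_RIGHT_KNEE, SimpleOpenNI.SKEL_RIGHT_FOOT
};

void setup() {
  size(640, 480);
  context = new SimpleOpenNI(this);
  context.enableDepth();
  // older SimpleOpenNI versions take a skeleton profile; newer ones use enableUser() with no arguments
  context.enableUser(SimpleOpenNI.SKEL_PROFILE_ALL);
  try {
    // point this at wherever your server is listening (host and port are assumptions)
    Socket socket = new Socket("localhost", 8000);
    out = new PrintWriter(socket.getOutputStream(), true);
  } catch (Exception e) {
    println("couldn't connect to server: " + e);
  }
}

void draw() {
  context.update();
  image(context.depthImage(), 0, 0);

  int userId = 1; // assume the first calibrated user gets id 1
  if (out != null && context.isTrackingSkeleton(userId)) {
    // build one JSON object per frame: {"j0":[x,y,z], "j1":[x,y,z], ...}
    StringBuilder json = new StringBuilder("{");
    PVector joint = new PVector();
    for (int i = 0; i < jointIds.length; i++) {
      context.getJointPositionSkeleton(userId, jointIds[i], joint);
      json.append("\"j" + i + "\":[" + joint.x + "," + joint.y + "," + joint.z + "]");
      if (i < jointIds.length - 1) json.append(",");
    }
    json.append("}");
    out.println(json); // newline-delimited JSON, one skeleton per frame
  }
}

In the real app this serialization happens on its own thread, fed by the queue, so a slow network can never stall the draw loop.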

Note, many people have asked about the postdata library my code uses to POST the JSON to the web server. That was an experimental library that was never properly released. It has been superseded by Rune Madsen’s HTTProcessing library. I’d welcome a pull request that got this repo working with that library.

Node.js and Socket.io

The server’s only job is to accept the stream from the Processing sketch and forward it on to any browsers that connect and ask for the data. In theory I thought this would be a perfect job for Node.js, and it turned out I was right. This is my first experience with Node, and while I’m not sure I’d want to build a conventional CRUD-y web app in it, it was a joy to work with for this kind of socket plumbing. The Node app has two components: one of them listens on a custom port to accept the streaming JSON data from the Processing sketch. The other component accepts connections on port 80 from browsers. These connections are made using Socket.io. Socket.io is a protocol meant to provide a cross-browser socket API on top of the rapidly evolving state of adoption of the Web Sockets spec. It includes both a Node library and a client JavaScript library, both of which speak the Socket.io protocol transparently, making socket communication between browsers and Node almost embarrassingly easy. Once a browser has connected, Node begins streaming the JSON from Processing to it. Node acts like a simple t-connector in a pipe, taking the stream from one place and splitting it out to many.

Three.js

At this point, we’ve got a real time stream of skeleton data arriving in the browser: 45 floats representing the x-, y-, and z-components of 15 joint vectors, arriving 30 times a second. In order to display this data I needed a 3D graphics library for JavaScript. After the Art && Coders’ success with Three.js, I decided to give it a shot myself. I started from a basic Three.js example and was easily able to modify it to create one sphere for each of the 15 joints. I then used the streaming data arriving from Socket.io to update the position of each sphere as appropriate in the Three.js render function. Pointing the camera at the torso joint brought the skeleton into view and I was off to the races. Three.js is extremely rich and I’ve barely scratched the surface here, but it was relatively straightforward to build this simple application.

Conclusion

In general I’m skeptical of the browser as a platform for rich graphical applications. I think a lot of the time building these kinds of apps in the browser has mainly novelty appeal, adding levels of abstraction that hurt performance and coding clarity without contributing much to the user experience. However, since I explicitly want to explore the possibilities of collaborative social graphics production and animation, the browser seems a natural platform. That said, I’m also excited to experiment with Unity3D as a potential rich client environment for this idea. There’s ample reason to have a diversity of clients for an application like this where different users will have different levels of engagement, skills, comfort, resources, and roles. The streaming architecture demonstrated here will act as a vital glue binding these diverse clients together.

One next step I’m exploring that should be straightforward is the process of sending the stream of joint positions to CouchDB as they pass through Node on the way to the browser. This will automatically make the app into a recorder as well as streaming server. My good friend Chris Anderson was instrumental in helping me get up and running with Node and has been pointing me in the right direction for this Couch integration.

Interested in these ideas? You can help! I’d especially love to work with someone with advanced Three.js skills who can help me figure out things like model importing and rigging. Let’s put some flesh on those skeletons…

Posted in kinect | 15 Comments

Making Things See Available for Early Release

I’m proud to announce that my book, Making Things See: 3D Vision with Kinect, Processing, and Arduino, is now available from O’Reilly. You can buy the book through O’Reilly’s Early Release program here. The Early Release program lets us get the book out to you while O’Reilly’s still editing and designing it and I’m still finishing up the last chapters. If you buy it now, you’ll get the preface and the first two chapters immediately; then you’ll be notified as additional chapters are finished and you’ll be able to download them for free until you have the final book. This way you get immediate access to the book and I get your early feedback to help me find mistakes and improve it before final publication.

So, what’s in these first two chapters? Chapter One provides an in-depth explanation of how the Kinect works and where it came from. It covers how the Kinect records the distance of the objects and people in front of it using an infrared projector and camera. It also explains the history of the open source efforts that made it possible to work with the Kinect in creative coding environments like Processing. After this technical introduction, the chapter includes interviews with artists and technologists who do inspiring work with the Kinect: Kyle McDonald, Robert Hodgin, Elliot Woods, blablablab, Nicolas Burrus, Oliver Kreylos, Alejandro Crawford, and Phil Torrone and Limor Fried of Adafruit. The idea for this section of the book was suggested to me by Zach Lieberman and it’s ended up being one of my favorites. Each one of the people I interviewed had a different set of interests and abilities that led them to the Kinect and they’ve each used it in a radically different way. From Adafruit’s work initiating the project to create open drivers to Oliver Kreylos’s integration of the Kinect into his cutting edge virtual reality research to Alejandro Crawford’s use of the Kinect to create live visuals for the band MGMT, they each explore a different aspect of the creative possibilities unlocked by this new technology. Their diversity shows just how broad an impact affordable depth cameras could have going forward.

Chapter Two begins the real work of learning to make interactive programs with the Kinect. It walks you through installing the SimpleOpenNI library for Processing and then shows you how to use that to access the depth image from the Kinect. We explore all kinds of aspects of the depth image and then use it to create a series of projects ranging from a virtual tape measure to a Minority Report-style app that lets you move photos around by waving your hands. Since the book as a whole is designed to be accessible to beginner programmers (and to help them “level up” to more advanced graphical skills), the examples in this chapter are all covered clearly and thoroughly to make sure that you understand fundamentals like how to loop through the pixels in an image.
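
If you’re curious where that chapter starts, here’s roughly its “hello world” (a quick sketch from memory, not an excerpt from the book): show the Kinect’s depth image on screen and read the distance, in millimeters, of whatever is at the center of the frame.

import SimpleOpenNI.*;

SimpleOpenNI kinect;

void setup() {
  size(640, 480);
  kinect = new SimpleOpenNI(this);
  kinect.enableDepth(); // turn on the depth camera
}

void draw() {
  kinect.update();
  // grayscale visualization of the depth data
  image(kinect.depthImage(), 0, 0);

  // the raw depth map holds one distance value, in millimeters, per pixel
  int[] depthValues = kinect.depthMap();
  int centerIndex = 320 + 240 * 640; // pixel at the center of the 640x480 image
  println("distance at center: " + depthValues[centerIndex] + " mm");
}

Everything from the virtual tape measure to the Minority Report hand-waving builds on loops over that same depth array.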

I’m looking forward to more chapters coming out in the coming weeks, including the next two on working with point clouds and using the skeleton data. I’m currently working closely with Brian Jepson, my editor at O’Reilly, as well as Dan Shiffman (an ITP professor and the author of the first Kinect library for Processing) and Max Rheiner (an artist and lecturer at Zurich University and the author of SimpleOpenNI) to prepare them for publication. I can’t thank Brian, Dan, and Max enough for their help on this project.

I’m also excited to see what O’Reilly’s design team comes up with for a cover. The one pictured above is temporary. As soon as these new chapters (or the new cover) are available, I’ll announce it here.

Enjoy the book! And please let me know your thoughts and comments so I can improve it during this Early Release period.

Posted in kinect | 7 Comments

Techniques of the Observer

Last night at ITP’s Theory Club (a group that meets bi-weekly to discourse on abstract topics of interest), I gave a presentation on Jonathan Crary’s Techniques of the Observer. I called the talk Techniques of the Observer: Vision and Technology from the Camera Obscura to OpenGL. It was based on one portion of the proposal for a Platform Studies book on OpenGL I wrote over the summer. In Techniques of the Observer, Crary proposes a technique for characterizing a historical period’s ideas about vision by looking at its optical technologies and the metaphors they embody. The Camera Obscura tells you a lot about the Renaissance’s objective and universal geometric world view. Stereographs, phenakistoscopes, and film, all from the Modern era, couldn’t be more different from the Camera Obscura: they build the image inside the user’s mind using tricks of perception, hacks of the user’s sensorium. These resonate with a Modern world view of a series of independent subjectivities bound together into a consensual democracy.

In that earlier blog post and in this talk I set out to extend this way of thinking to cover contemporary computer-generated imagery. For the last 20 or so years, our most contemporary images have been the product of computer simulations designed to emulate an objective Renaissance perspective, but to convert it into something fungible enough to become interactive and, when we want it, fantastical. And now, right now, we’re beginning to connect a new set of powerful artificial eyes to this simulation. We’re introducing something like the Reality Effect, but to an inhuman mind’s eye. I think this combination explains some of the new Sensor Vernacular aesthetic that many of us have been struggling to put our fingers on. It comprises the first works of a new regime of vision struggling to be born.

I’ve uploaded my slides to Speaker Deck, a great new service that actually makes the process of uploading and viewing slide decks online simple and pleasurable. Here they are:

Posted in Opinion | Leave a comment

Announcing Drift

I’m proud to announce the launch of Drift, a text editor for iPad I’ve made with Devin Chalmers. Drift is a simple text editor that stores your documents as GitHub Gists so that they’re always backed up and easily shared. Drift also makes it simple to collaborate with other GitHub users on anything from a TODO list to an essay. You can search for gists created by other Github users, create your own copy with your own changes, and then share a link to your version. Plus, since all gists are git repos, you can always browse your history of changes and even restore old versions. Or you can use the app completely anonymously without ever creating a Github account, and Drift will remember all of your gist URLs for you. We hope that it’s great for active Github users and simple for everyone.

Drift is available in the App Store now for $1.99. We’d love to hear your feedback on it.

I built the original prototype for a desktop version of Drift using MacRuby back in the summer of 2009. Then, last summer, Devin and I started talking about reviving the project. In a startlingly short time, Devin had built an iPad version and we’d won an honorable mention at iPadDevCamp 2010. Around the start of this year we began working on cleaning Drift up for submission to the App Store. We commissioned a logo from the excellent Rune Madsen and we did a few rounds of UI polishing. Over the summer we deemed Drift ready to go and submitted it to the App Store.

It was rejected. Repeatedly. Through five rounds of resubmission over more than two months. Over the course of these months we exchanged extensive phone calls and repeated emails with Apple’s App Store appeals board to try to discover why they were rejecting Drift. We eventually understood their full reasoning and were able to alter our app to gain approval. What we learned in the process will be relevant to many other app developers, especially those interested in building on top of existing web service APIs. Devin has written up a great comprehensive post explaining the situation: Stuck in the Middle with Users: Apple, Apps, Appeals, & Appeasement: the Story of Drift. I highly recommend you read the whole thing, but here’s the gist:

App Store guideline 11.13 requires that any purchases initiated from within an app go through Apple’s in-app purchasing mechanism. Apps are forbidden from linking out to an external website for the user to make “purchases or subscriptions”. Drift was rejected because Apple interpreted our links to Github.com as an upsell to the paying Github service. This is true even though a free Github account is perfectly adequate for all of the API features used in Drift.

And here’s the disturbing take away message for other app developers: any link out to an external site that has a pay service can potentially be rejected under the rubric of 11.13. This could prevent the creation of a lot of apps. Mashing up APIs and building clever clients is at the heart of contemporary programming culture. As Devin puts it:

One of the great App Store gold rushes was Twitter clients. What makes our undertaking, a Gist client, different from theirs? Well, largely it’s that GitHub has a business model, while Twitter doesn’t. Think of that: if Twitter charged a buck a month for their service, instead of aggregating your sentiments to sell to OmniCom and co. to turn into failed viral marketing campaigns, Loren Brichter might still be quietly churning out the most polished iPhone apps in the world; people might meet Craig Hockenberry and exclaim “oh, the Icon Factory! It’s so cool to meet a graphic designer at WWDC.”

Posted in Permanent Maintenance | 3 Comments

Today

Today.

Today I saw for the first time a video (embedding disabled) that demonstrates a new technique that uses functional magnetic resonance imaging to detect a person’s brain activity and, from that, reproduce the visual image that person is seeing in real time. If you watch the video, the images on the left represent what the person was seeing; the images on the right represent what the system was able to reconstruct from the live brain scan. This could realistically be expected to work with dreams as well. Technology that (it must accurately be said) reads minds. (You’ll note that our brains really like faces.)

Today CERN announced that it may have detected particles moving faster than the speed of light: neutrinos sent from CERN appeared to arrive at the OPERA detector (Oscillation Project with Emulsion-tRacking Apparatus), under Italy’s Gran Sasso mountain, faster than light would have. If the measurement is verified, it would directly contradict Einstein’s Special Theory of Relativity and would throw much of the understanding of the universe painstakingly built by 20th-century physics into doubt.

Today I spent most of my day printing out plastic objects using a small 3D printer that sits on my desk at school. I created many of these objects using a cheap toy 3D scanner.

Frequently, in my field, I have experiences that feel “futuristic”: new gadgets, gizmos, and experiences that come my way and that will one day be ubiquitous. But today was different. Everything that happened felt part of some new world. Not alien bits protruding in, but a whole new fabric. Still struggling mightily to make sense, even to itself, but having little relationship to the 20th century other than as history.

Posted in Opinion | Leave a comment