Streaming Kinect skeleton data to the web with Node.js (October 27, 2011)

This past weekend, I had the honor of participating in Art && Code 3D, a conference hosted by Golan Levin of the CMU Studio for Creative Inquiry about DIY 3D sensing. It was, much as Matt Jones predicted, “Woodstock for the robot-readable world”. I gave two talks at the conference, but those aren’t what I want to talk about now (I’ll have a post with a report on those shortly). For the week before the start of the actual conference, Golan invited a group of technologists to come work collaboratively on 3D sensing projects in an intensive atmosphere, “conference as laboratory” as he called it. This group included Diederick Huijbers, Elliot Woods, Joel Gethin Lewis, Josh Blake, James George, Kyle McDonald, Matt Mets, Kyle Machulis, Zach Lieberman, Nick Fox-Gieg, and a few others. It was truly a rockstar lineup, and they took on a bunch of hard and interesting problems that have been out there in 3D sensing and made all kinds of impressive progress.

One of the projects this group executed was a system for streaming the depth data from the Kinect to the web in real time. This let as many as a thousand people watch some of the conference talks in a 3D interface rendered in their web browser while they were going on. An anaglyphic option was available for those with red-blue glasses.

I was inspired by this truly epic hack to take a shot at an idea I’ve had for a while now: streaming the skeleton data from the Kinect to the browser. As you can see from the video at the top of this post, today I got that working. I’ll spend the bulk of this post explaining some of the technical details involved, but first I want to talk about why I’m interested in this problem.

As I’ve learned more and more about the making of Avatar, one innovation among the many struck me most. The majority of the performances for the movie were recorded using a motion capture system. The actors would perform on a nearly empty motion capture stage: just them, the director, and a few technicians. After they had successful takes, the actors left the stage, the motion capture data was edited, and James Cameron, the director, returned. Cameron was then able to play the perfect, edited performances back over and over as he chose angles using a tablet device that let him position a virtual camera around the virtual actors. The actors performed without the distractions of a camera on a nearly black box set. The director could work for 18 hours on a single scene without having to worry about the actors getting tired or screwing up takes. The performance of the scene and the rendering of it into shots had been completely decoupled.

I think this decoupling is very promising for future creative filmmaking environments. I can imagine an online collaborative community triangulated between a massively multiplayer game, an open source project, and a traditional film crew, where some people contribute scripts, some contribute motion capture recorded performances of scenes, others build 3D characters, models, and environments, still others light and frame these for cameras, and still others edit and arrange the final result. Together they produce an interlocking network of aesthetic choices and contributions that yields not a single coherent work, but a mesh of creative experiences and outputs. Where current films resemble a giant shrink-wrapped piece of proprietary software, this new world would look more like Github, a constantly shifting graph of contributions and related evolving projects.

The first step towards this networked participatory filmmaking model is an application that allows remote real time motion capture performance. This hack is a prototype of that application. Here’s a diagram of its architecture:

Skelestreamer architecture diagram

The source for all of the components is available on Github: Skelestreamer. In explaining the architecture, I’ll start from the Kinect and work my way towards the browser.

Kinect, OpenNI, and Processing

The Processing sketch starts by accessing the Kinect depth image and the OpenNI skeleton data using SimpleOpenNI, the excellent library I’m using throughout my book. The sketch waits for the user to calibrate. Once the user has calibrated, it begins capturing the position of each of the user’s 15 joints into a custom class designed for the purpose. The sketch then sticks these objects into a queue, which is consumed by a separate thread. That thread takes items out of the queue, serializes them to JSON, and sends them to the server over a persistent socket connection created when the user calibrated and streaming began. This background thread and queue are a hedge against the possibility of latency in the streaming process. So far, with everything running on one computer, I haven’t seen any latency; the queue nearly always runs empty. I’m curious to see whether this level of throughput will hold up once the sketch needs to stream to a remote server rather than simply over localhost.
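
To make that pattern concrete, here is a minimal sketch of the capture-queue-stream loop in Processing-flavored Java. It is an illustration only, not the actual Skelestreamer source: the class names, the JSON field layout, and the way the socket writer is wrapped are all assumptions.

```java
// Illustrative sketch of the queue-and-background-thread pattern described
// above. Names and the JSON layout are assumptions, not the real Skelestreamer code.
import java.util.concurrent.LinkedBlockingQueue;

// One captured skeleton frame: a timestamp plus joint positions keyed by name.
class JointFrame {
  long timestamp;
  HashMap<String, PVector> joints = new HashMap<String, PVector>();

  String toJson() {
    StringBuilder sb = new StringBuilder("{\"t\":" + timestamp);
    for (String name : joints.keySet()) {
      PVector p = joints.get(name);
      sb.append(",\"" + name + "\":[" + p.x + "," + p.y + "," + p.z + "]");
    }
    return sb.append("}").toString();
  }
}

LinkedBlockingQueue<JointFrame> frameQueue = new LinkedBlockingQueue<JointFrame>();

// Called from draw() once per frame while the user is calibrated:
// copy the joint positions into a frame and hand it to the background thread.
void enqueueFrame(HashMap<String, PVector> currentJoints) {
  JointFrame f = new JointFrame();
  f.timestamp = millis();
  f.joints.putAll(currentJoints);
  frameQueue.offer(f);
}

// The background thread drains the queue and writes each frame to the
// persistent socket, so network latency never blocks the draw loop.
class StreamerThread extends Thread {
  java.io.PrintWriter out; // wraps the socket's output stream

  StreamerThread(java.io.PrintWriter out) {
    this.out = out;
  }

  public void run() {
    try {
      while (true) {
        JointFrame f = frameQueue.take(); // blocks until a frame is available
        out.println(f.toJson());
        out.flush();
      }
    } catch (InterruptedException e) {
      // the sketch is shutting down
    }
  }
}
```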

Note, many people have asked about the postdata library my code uses to POST the JSON to the web server. That was an experimental library that was never properly released. It has been superseded by Rune Madsen’s HTTProcessing library. I’d welcome a pull request that got this repo working with that library.

Node.js and Socket.io

The server’s only job is to accept the stream from the Processing sketch and forward it on to any browsers that connect and ask for the data. I thought this would be a perfect job for Node.js, and it turned out I was right. This is my first experience with Node, and while I’m not sure I’d want to build a conventional CRUD-y web app in it, it was a joy to work with for this kind of socket plumbing. The Node app has two components: one listens on a custom port to accept the streaming JSON data from the Processing sketch; the other accepts connections on port 80 from browsers. These connections are made using Socket.io. Socket.io is a protocol meant to provide a cross-browser socket API on top of the rapidly evolving state of adoption of the Web Sockets Spec. It includes both a Node library and a client javascript library, both of which speak the Socket.io protocol transparently, making socket communication between browsers and Node almost embarrassingly easy. Once a browser has connected, Node begins streaming the JSON from Processing to it. Node acts like a simple t-connector in a pipe, taking the stream from one place and splitting it out to many.

Three.js

At this point, we’ve got a real time stream of skeleton data arriving in the browser: 45 floats representing the x-, y-, and z-components of 15 joint vectors, arriving 30 times a second. In order to display this data I needed a 3D graphics library for javascript. After the Art && Coders’ success with Three.js, I decided to give it a shot myself. I started from a basic Three.js example and was easily able to modify it to create one sphere for each of the 15 joints. I then used the streaming data arriving from Socket.io to update the position of each sphere as appropriate in the Three.js render function. Pointing the camera at the torso joint brought the skeleton into view and I was off to the races. Three.js is extremely rich and I’ve barely scratched the surface here, but it was relatively straightforward to build this simple application.

Conclusion

In general I’m skeptical of the browser as a platform for rich graphical applications. I think a lot of the time building these kinds of apps in the browser has mainly novelty appeal, adding levels of abstraction that hurt performance and coding clarity without contributing much to the user experience. However, since I explicitly want to explore the possibilities of collaborative social graphics production and animation, the browser seems a natural platform. That said, I’m also excited to experiment with Unity3D as a potential rich client environment for this idea. There’s ample reason to have a diversity of clients for an application like this where different users will have different levels of engagement, skills, comfort, resources, and roles. The streaming architecture demonstrated here will act as a vital glue binding these diverse clients together.

One next step I’m exploring that should be straightforward is the process of sending the stream of joint positions to CouchDB as they pass through Node on the way to the browser. This will automatically make the app into a recorder as well as streaming server. My good friend Chris Anderson was instrumental in helping me get up and running with Node and has been pointing me in the right direction for this Couch integration.

Interested in these ideas? You can help! I’d especially love to work with someone with advanced Three.js skills who can help me figure out things like model importing and rigging. Let’s put some flesh on those skeletons…

Making Things See Available for Early Release (October 8, 2011)

I’m proud to announce that my book, Making Things See: 3D Vision with Kinect, Processing, and Arduino, is now available from O’Reilly. You can buy the book through O’Reilly’s Early Release program here. The Early Release program lets us get the book out to you while O’Reilly is still editing and designing it and I’m still finishing up the last chapters. If you buy it now, you’ll get the preface and the first two chapters immediately; then you’ll be notified as additional chapters are finished and can download them for free until you have the final book. This way you get immediate access to the book, and I get your early feedback to help me find mistakes and improve it before final publication.

So, what’s in these first two chapters? Chapter One provides an in-depth explanation of how the Kinect works and where it came from. It covers how the Kinect records the distance of the objects and people in front of it using an infrared projector and camera. It also explains the history of the open source efforts that made it possible to work with the Kinect in creative coding environments like Processing. After this technical introduction, the chapter includes interviews with artists and technologists who do inspiring work with the Kinect: Kyle McDonald, Robert Hodgin, Elliot Woods, blablablab, Nicolas Burrus, Oliver Kreylos, Alejandro Crawford, and Phil Torrone and Limor Fried of Adafruit. The idea for this section of the book was suggested to me by Zach Lieberman and it’s ended up being one of my favorites. Each of the people I interviewed had a different set of interests and abilities that led them to the Kinect, and they’ve each used it in a radically different way. From Adafruit’s work initiating the project to create open drivers, to Oliver Kreylos’s integration of the Kinect into his cutting edge virtual reality research, to Alejandro Crawford’s use of the Kinect to create live visuals for the band MGMT, they each explore a different aspect of the creative possibilities unlocked by this new technology. Their diversity shows just how broad an impact affordable depth cameras could have going forward.

Chapter Two begins the real work of learning to make interactive programs with the Kinect. It walks you through installing the SimpleOpenNI library for Processing and then shows you how to use that to access the depth image from the Kinect. We explore all kinds of aspects of the depth image and then use it to create a series of projects ranging from a virtual tape measure to a Minority Report-style app that lets you move photos around by waving your hands. Since the book as a whole is designed to be accessible to beginner programmers (and to help them “level up” to more advanced graphical skills), the examples in this chapter are all covered clearly and thoroughly to make sure that you understand fundamentals like how to loop through the pixels in an image.

I’m looking forward to more chapters coming out in the coming weeks, including the next two on working with point clouds and using the skeleton data. I’m currently working closely with Brian Jepson, my editor at O’Reilly, as well as Dan Shiffman (an ITP professor and the author of the first Kinect library for Processing) and Max Rheiner (an artist and lecturer at Zurich University and the author of SimpleOpenNI) to prepare them for publication. I can’t thank Brian, Dan, and Max enough for their help on this project.

I’m also excited to see what O’Reilly’s design team comes up with for a cover. The one pictured above is temporary. As soon as these new chapters (or the new cover) are available, I’ll announce it here.

Enjoy the book! And please let me know your thoughts and comments so I can improve it during this Early Release period.

Back to Work No Matter What: 10 Things I’ve Learned While Writing a Technical Book for O’Reilly (July 17, 2011)

I’m rapidly approaching the midway point in writing my book. Writing a book is hard. I love to write and am excited about the topic. Some days I wake up excited and can barely wait to get to work; I reach my target word count without feeling the effort. But other days it’s a battle to even get started, and every paragraph requires a conscious act of will not to stop and check twitter or go for a walk outside. Either way, when the day is done the next one still starts from zero, with 1500 words to write and none written.

Somewhere in the last month I hit a stride that has given me the beginnings of a sense of confidence that I will be able to finish on time and with a text that I am proud of. I’m currently preparing for the digital Early Release of the book, which should happen by the end of the month, a big landmark that I find both exciting and terrifying. I thought I’d mark the occasion by writing down a little bit of what I’ve learned about the process of writing.

I make no claim that these ten tips will apply to anyone else, but identifying them and trying to stick by them has helped me. And obviously my tips here are somewhat tied in with writing the kind of technical book that I’m working on and would be much less relevant for a novel or other more creative project.

  1. Write every day. It gets easier and it makes the spreadsheet happy. (I’ve been using a spreadsheet to track my progress and project my completion date based on work done so far.)
  2. Every day starts as pulling teeth and then gets easier after 500 words or so. Each 500 words is easier than the last.
  3. Outlining is easier than writing. If you’re stuck, outline what comes next.
  4. Writing code is easier than outlining. If you don’t know the structure, write the code.
  5. Making illustrations is easier than writing code. If you don’t know what code to write, make illustrations or screen caps from existing code.
  6. Don’t start from a dead stop. Read, edit, and refine the previous few paragraphs to get a running start.
  7. If you’re writing sucky sentences, keep going; you can fix them later. Also, they’ll get better as you warm up.
  8. When in doubt, make sentences shorter. They will be easier to write and read.
  9. Reading good writers makes me write better. This includes writers in radically different genres from my own (DFW) and similar ones (Shiffman).
  10. Give yourself regular positive feedback. I count words as I go to see how much I’ve accomplished.

A note of thanks: throughout this process I’ve found the Back to Work podcast with Merlin Mann and Dan Benjamin to be…I want to say “inspiring”, but that’s exactly the wrong word. What I’ve found useful about the show is how it knocks down the process of working towards your goals from the pedestal of inspiration to the ground level of actually working every day, going from having dreams of writing a book to being a guy who types in a text file five hours a day no matter what. I especially recommend Episode 21: Assistant to the Regional Monkey and the recent Episode 23: Failure is ALWAYS an Option. The first of those does a great job talking about how every day you have to start from scratch, forgiving yourself when you miss a day and not getting too full of yourself when you have a solid week of productivity. The second one speaks eloquently of the dangers of taking on a big project (like writing a book) as a “side project”. Dan and Merlin talked about the danger of not fully committing to a project like this. For my part I found these two topics to be closely related. I’ve found that a big part of being fully committed to the project is to forgive myself for failures — days I don’t write at all, days I don’t write as much as I want, sections of the book I don’t write as well as I know I could. The commitment has to be a commitment to keep going despite these failures along the way.

And I’m sure I’ll have plenty more of those failures in the second half of writing this book. But I will write it regardless.

Two Kinect talks: Open Source Bridge and ITP Camp (June 30, 2011)

In the last couple of weeks, I’ve given a couple of public presentations about the Kinect. This post collects the relevant links, media, and follow-up for those talks. The first talk, last week, was in Portland, Oregon at Open Source Bridge. It was a collaboration with Devin Chalmers, my longtime co-conspirator. We designed our talk to be as much like a circus as possible. We titled it Control Emacs with Your Beard: the All-Singing All-Dancing Intro to Hacking the Kinect.


Devin demonstrates controlling Emacs with his “beard”.

Our first demo was, as promised in our talk title, an app that let you control Emacs with your “beard”. This app included the ability to launch Emacs by putting on a fake beard, to generate all kinds of very impressive-looking C code by waving your hands in front of you (demonstrated above), and to quit Emacs by removing your fake beard. Our second app sent your browser tabs to the gladiator arena. It let you spare or execute (close) each one by giving a Caesar-esque thumbs up or thumbs down gesture. To get you in the mood for killing, it also played a clip from Gladiator each time you executed a tab.

Both of these apps used the Java Robot library to issue keystrokes and fire off terminal commands. It’s an incredibly helpful library for controlling any GUI app on your machine. All our code (and Keynote) is available here: github/osb-kinect. Anyone working on assistive tech (or other kinds of alternative input to the computer) with gestural interfaces should get to know Robot well.
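
For anyone who hasn’t used Robot, here’s a tiny, self-contained example of the kind of thing it does. This is an illustration in the spirit of the demos above, not code from the osb-kinect repo, and the Emacs-launching command is a macOS-specific assumption.

```java
import java.awt.Robot;
import java.awt.event.KeyEvent;

public class GestureKeys {
  public static void main(String[] args) throws Exception {
    Robot robot = new Robot();

    // Type a lowercase 'q' by pressing and releasing its key code.
    robot.keyPress(KeyEvent.VK_Q);
    robot.keyRelease(KeyEvent.VK_Q);

    // A gesture handler might do something like this on a "thumbs down":
    // close the frontmost browser tab with Cmd+W (Ctrl+W on other platforms).
    robot.keyPress(KeyEvent.VK_META);
    robot.keyPress(KeyEvent.VK_W);
    robot.keyRelease(KeyEvent.VK_W);
    robot.keyRelease(KeyEvent.VK_META);

    // Robot pairs well with Runtime.exec() for firing off terminal
    // commands, e.g. launching an app when a pose is recognized (macOS here).
    Runtime.getRuntime().exec(new String[] { "open", "-a", "Emacs" });
  }
}
```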

In addition to these live demos, we also covered other things you can do with the Kinect like 3D printing. I passed around the Makerbot-printed head of Kevin Kelly that I made at FOO camp:


Kevin Kelly with a tiny 3D printed version of his own head.

We also showed Nicolas Burrus’s Kinect RGB Demo app, which does all kinds of neat things like scene reconstruction:


Me making absurd gestures in front of a reconstructed image of the room

Tonight I taught a class at ITP Camp about building gestural interfaces with the Kinect in Processing. It had some overlap with the Open Source Bridge talk. In addition to telling the story of the Kinect’s evolution, I showed some of the details of working with SimpleOpenNI’s skeleton API. I wrote two apps based on measuring the distance between the user’s hands. The first one simply displayed the distance between the hands in pixels on the screen. The second one used that distance to scale an image up and down, and the location of one of the hands to position that image: a typical Minority Report-style interaction.
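
Here’s a condensed sketch of the core loop of those two demos, using SimpleOpenNI’s skeleton API. It’s abbreviated for illustration: the calibration callbacks are omitted, the user ID is hard-coded, and the mapping range is a made-up value, so treat it as a sketch of the idea rather than the exact class code.

```java
import SimpleOpenNI.*;

SimpleOpenNI context;
PImage photo;

void setup() {
  size(640, 480);
  context = new SimpleOpenNI(this);
  context.enableDepth();
  context.enableUser(); // older SimpleOpenNI versions take a skeleton profile argument
  photo = loadImage("photo.jpg");
}

void draw() {
  context.update();
  background(0);

  int userId = 1; // assume the first calibrated user
  if (context.isTrackingSkeleton(userId)) {
    // Get both hands in real-world coordinates...
    PVector leftHand = new PVector();
    PVector rightHand = new PVector();
    context.getJointPositionSkeleton(userId, SimpleOpenNI.SKEL_LEFT_HAND, leftHand);
    context.getJointPositionSkeleton(userId, SimpleOpenNI.SKEL_RIGHT_HAND, rightHand);

    // ...then project them into screen space so the distance reads in pixels.
    PVector leftOnScreen = new PVector();
    PVector rightOnScreen = new PVector();
    context.convertRealWorldToProjective(leftHand, leftOnScreen);
    context.convertRealWorldToProjective(rightHand, rightOnScreen);

    // The single number that drives the whole interaction.
    float handDistance = dist(leftOnScreen.x, leftOnScreen.y, rightOnScreen.x, rightOnScreen.y);

    // App 1: just display the number.
    fill(255);
    text(int(handDistance) + " px", 20, 40);

    // App 2: scale the image with the distance, position it with one hand.
    float w = map(handDistance, 50, 600, 50, width); // made-up input range
    image(photo, rightOnScreen.x, rightOnScreen.y, w, w * photo.height / photo.width);
  }
}
```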

The key point was this: all you really need to make something interactive in a way that the user can viscerally understand is a single number that tightly corresponds to what the user is doing. With just that, ITP types can make all kinds of cool interactive apps.

The class was full of clever people who asked all kinds of interesting questions and had interesting ideas for ways to apply this stuff. I came away with a bunch of ideas for the book, which is helpful because I’m going to be starting the skeleton tracking chapter soon.

Of course, all of the code for this project is online in the ITP Camp Kinect repo on Github. That repo includes all of the code I showed as well as a copy of my Keynote presentation.

Into The Matrix: Proposal for a Platform Studies Approach to OpenGL (June 27, 2011)

In the last few years, new media professors Ian Bogost (Georgia Tech) and Nick Montfort (MIT) have set out to advance a new approach to the study of computing. Bogost and Montfort call this approach Platform Studies:

“Platform Studies investigates the relationships between the hardware and software design of computing systems and the creative works produced on those systems.”

The goal of Platform Studies is to close the distance between the thirty thousand foot view of cultural studies and the ant’s eye view of much existing computer history. Scholars from a cultural studies background tend to stay remote from the technical details of computing systems while much computer history tends to get lost in those details, missing the wider interpretative opportunities.

Bogost and Montfort want to launch an approach that’s based “in being technically rigorous and in deeply investigating computing systems in their interactions with creativity, expression, and culture.” They demonstrated this approach themselves with the kickoff book in the Platform Studies series for MIT Press: Racing the Beam: The Atari Video Computer System. That book starts by introducing the hardware design of the Atari and how it evolved in relationship to the available options at the time. They then construct a comprehensive description of the affordances that this system provided to game designers. The rest of the book is a history of the VCS platform told through a series of close analyses of games and how their creators co-evolved the games’ cultural footprints with their understanding of how to work with, around, and through the Atari’s technical affordances.

Bogost and Montfort have put out the call for additional books in the Platform Studies series. Their topic wish list includes a wide variety of platforms from Unix to the Game Boy to the iPhone. In this post, I would like to propose an addition to this list: OpenGL. In addition to arguing for OpenGL as an important candidate for inclusion in the series, I would also like to present a sketch of what a Platform Studies approach to OpenGL might look like.

According to Wikipedia, OpenGL “is a standard specification defining a cross-language, cross-platform API for writing applications that produce 2D and 3D computer graphics.” This dry description belies the fact that OpenGL has been at the center of the evolution of computer graphics for more than 20 years. It has been the venue for a series of negotiations that have redefined visuality for the digital age.

In the introduction to his seminal study, Techniques of the Observer: On Vision and Modernity in the 19th Century, Jonathan Crary describes the introduction of computer graphics as “a transformation in the nature of visuality probably more profound than the break that separates medieval imagery from Renaissance perspective”. Crary’s study itself tells the story of the transformation of vision enacted by 19th century visual technology and practices. However, he recognized that, as he was writing in the early 1990s, yet another equally significant remodeling of vision was underway towards the “fabricated visual spaces” of computer graphics. Crary described this change as “a sweeping reconfiguration of relations between an observing subject and modes of representation that effectively nullifies most of the culturally established meanings of the term observer and representation.”

I propose that the framework Crary laid out in his analysis of the emergence of modern visual culture can act as a guide in understanding this more recent digital turn. In this proposal, I will summarize Crary’s analysis of the emergence of modern visual culture and try to posit an analogous description of the contemporary digital visual regime of which OpenGL is the foundation. In doing so, I will constantly seek to point out how such a description could be supported by close analysis of OpenGL as a computing platform and to answer the two core questions that Crary poses of any transformation of vision: “What forms or modes are being left behind?” and “What are the elements of continuity that link contemporary imagery with older organizations of the visual?” Due to the nature of OpenGL, this analysis will constantly take technical, visual, and social forms.

As a platform, OpenGL has played stage to two stories that are quintessential to the development of much 21st century computing. It has been the site of a process of industry standardization and it represents an attempt to model the real world in a computational environment. Under close scrutiny, both of these stories reveal themselves to be tales of negotiation between multiple parties and along multiple axes. These stories are enacted on top of OpenGL as what Crary calls the “social surface” that drives changes in vision:

“Whether perception or vision actually change is irrelevant, for they have no autonomous history. What changes are the plural forces and rules composing the field in which perception occurs. And what determines vision at any given historical moment is not some deep structure, economic base, or world view, but rather the functioning of a collective assemblage of disparate parts on a single social surface.”

As the Wikipedia entry emphasized, OpenGL is a platform for industry standardization. It arose out of the late 80s and early 90s, when a series of competing companies (notably Silicon Graphics, Sun Microsystems, Hewlett-Packard, and IBM) each brought incompatible 3D hardware systems to market. Each of these systems was accompanied by its own graphics programming API that took advantage of that hardware’s particular capabilities. Out of a series of competitive stratagems and developments, OpenGL emerged as a standard, backed by Silicon Graphics, the market leader.

The history of its creation and governance was a process of negotiating both these market convolutions and the increasing interdependence of these graphics programming APIs with the hardware on which they executed. An understanding of the forces at play in this history is necessary to comprehend the compromises represented by OpenGL today and how they shape the contemporary hardware and software industries. Further, OpenGL is not a static, complete system, but rather one undergoing continuous development and evolution. A comprehensive account of this history would supply the backstory that shapes these developments and help the reader understand the tensions and politics that structure the current discourse about how OpenGL should change in the future, a topic I will return to at the end of this proposal.

The OpenGL software API co-evolved with the specialized graphics hardware that computer vendors introduced to execute it efficiently. These Graphics Processing Units (GPUs) were added to computers to make common graphical programming tasks faster as part of the competition between hardware vendors. In the process, the vendors built assumptions and concepts from OpenGL into these specialized graphics cards in order to improve the performance of OpenGL-based applications on their systems. And, simultaneously, the constraints and affordances of this new graphics hardware influenced the development of new OpenGL APIs and software capabilities. Through this process, the GPU evolved to be highly distinct from the existing Central Processing Units (CPUs) on which all modern computing had previously taken place. The GPU became highly tailored to the parallel processing of large matrices of floating point numbers, the fundamental computing technique underlying high-level GPU features such as texture mapping, rendering, and coordinate transformations. As GPUs became more performant and added more features, they became more and more important to OpenGL programming, and the boundary where execution moves between the CPU and the GPU became one of the central features of the OpenGL programming model.

OpenGL is a kind of pidgin language built up between programmers and the computer. It negotiates between the programmers’ mental model of physical space and visuality and the data structures and functional operations which the graphics hardware is tuned to work with. In the course of its evolution it has shaped and transformed both sides of this negotiation. I have pointed to some ways in which computer hardware evolved in the course of OpenGL’s development, but what about the other side of the negotiation? What about cultural representations of space and visuality? In order to answer these questions I need to both articulate the regime of space and vision embedded in OpenGL’s programming model and also to situate that regime in a historical context, to contrast it with earlier modes of visuality. In order to achieve these goals, I’ll begin by summarizing Crary’s account of the emergence of modern visual culture in the 19th century. I believe this account will both provide historical background as well as a vocabulary for describing the OpenGL vision regime itself.

In Techniques of the Observer, Crary describes the transition between the Renaissance regime of vision and the modern one by contrasting the camera obscura with the stereograph. In the Renaissance, Crary argues, the camera obscura was both an actual technical apparatus and a model for “how observation leads to truthful inferences about the world”. By entering into its “chamber”, the camera obscura allowed a viewer to separate himself from the world and view it objectively and completely. But, simultaneously, the flat image formed by the camera obscura was finite and comprehensible. This relation was made possible by the Renaissance regime of “geometrical optics”, where space obeyed well-known rigid rules. By employing these rules, the camera obscura could become, in Crary’s words, an “objective ground of visual truth”, a canvas on which perfect images of the world would necessarily form in obeisance to the universal rules of geometry.

In contrast to this Renaissance mode of vision, the stereograph represented a radically different modern visuality. Unlike the camera obscura’s “geometrical optics”, the stereograph and its fellow 19th century optical devices were designed to take advantage of the “physiological optics” of the human eye and vision system. Instead of situating their image objectively in a rule-based world, they constructed illusions using eccentricities of the human sensorium itself. Techniques like persistence of vision and stereography manipulate the biology of the human perception system to create an image that only exists within the individual viewer’s eye. For Crary, this change moves visuality from the “objective ground” of the camera obscura to possess a new “mobility and exchangability” within the 19th century individual. Being located within the body, this regime also made vision regulatable and governable by the manipulation and control of that body, and Crary spends a significant portion of Techniques of the Observer teasing out the political implications of this change.

But what of the contemporary digital mode of vision? If interactive computer graphics built with OpenGL are the contemporary equivalent of the Renaissance camera obscura or 19th century stereography, what mode of vision do they embody?

OpenGL enacts a simulation of the rational Renaissance perspective within the virtual environment of the computer. The process of producing an image with OpenGL involves generating a mathematical description of the full three dimensional world that you want to depict and then rendering that world into a single image. OpenGL contains within itself the camera obscura, its image, and the world outside its walls. OpenGL programmers begin by describing objects in the world using geometric terms such as points and shapes in space. They then apply transformations and scaling to this geometry in absolute and relative spatial coordinates. They proceed to annotate these shapes with color, texture, and lighting information. They describe the position of a virtual camera within the three dimensional scene to capture it into a two dimensional image. And finally they animate all of these properties and make them responsive to user interaction.
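
As a concrete illustration of that sequence (an aside, not part of the scholarly argument), Processing’s P3D renderer, which sits on top of OpenGL, lets the whole workflow fit in a few lines: describe geometry, transform it, light it, and frame it with a virtual camera. This is a sketch of the pipeline’s stages, not OpenGL C code.

```java
float angle = 0;

void setup() {
  size(800, 600, P3D); // P3D is Processing's OpenGL-backed renderer
}

void draw() {
  background(0);

  // Lighting information: an ambient term plus a directional source.
  ambientLight(40, 40, 40);
  directionalLight(255, 255, 255, -0.5, 0.5, -1);

  // The virtual camera: eye position, look-at point, and up vector.
  camera(0, -200, 600, 0, 0, 0, 0, 1, 0);

  // Transformations applied to the scene before rendering.
  rotateY(angle);
  angle += 0.01;

  // Geometry: points in space assembled into a shape, annotated with color.
  fill(180, 60, 60);
  beginShape(QUADS);
  vertex(-100, 0, -100);
  vertex( 100, 0, -100);
  vertex( 100, 0,  100);
  vertex(-100, 0,  100);
  endShape();

  // A solid object sitting on the quad, inheriting the same transforms.
  translate(0, -50, 0);
  fill(60, 120, 200);
  box(100);
}
```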

To extend Crary’s history, where the camera obscura embodied a “geometric optics” and the stereograph a “physiological optics”, OpenGL employs a “symbolic optics”. It produces a rule-based simulation of the Renaissance geometric world, but leaves that simulation inside the virtual realm of the computer, keeping it as matrices of vertices on the GPU rather than presuming it to be the world itself. OpenGL acknowledges its system is a simulation, but we undergo a process of “suspension of simulation” to operate within its rules (both as programmers and as users of games, etc. built on the system). According to Crary, modern vision “encompasses an autonomous perception severed from any system”. OpenGL embodies the Renaissance system and imbues it with new authority. It builds this system’s metaphors and logics into its frameworks. We agree to this suspension because the system enforces the rules of a Renaissance camera obscura-style objective world, but one that is fungible and controllable.

The Matrix is the perfect metaphor for this “symbolic optics”. In addition to being a popular metaphor for a reconfigurable reality that exists virtually within a computer, the matrix is also the core symbolic representation within OpenGL. OpenGL transmutes our description of objects and their properties into a series of matrices whose values can then be manipulated according to the rules of the simulation. Since OpenGL’s programming model embeds the “geometric optics” of the Renaissance within it, this simulation is not infinitely fungible. It possesses a grain that runs toward a set of “realistic” representational results, and attempting to go against that grain requires working outside the system’s assumptions. However, the recent history of OpenGL has seen an evolution towards making the system itself programmable, loosening these restrictions by giving programmers the ability to reprogram parts of its default pipeline themselves in the form of “shaders”. I’ll return to this topic in more detail at the end of this proposal.

To illustrate these “symbolic optics”, I would conduct a close analysis of various components of the OpenGL programming model in order to examine how they embed Renaissance-style “geometric optics” within OpenGL’s “fabricated visual spaces”. For example, OpenGL’s lighting model, with its distinction between ambient, diffuse, and specular forms of light and material properties, would bear close analysis. Similarly, I’d look closely at OpenGL’s various mechanisms for representing perspective, from the depth buffer to its various blending modes and fog implementation. Both of these topics, light and distance, have a rich literature in the history of visuality that would make for a powerful launching point for this analysis of OpenGL.

To conclude this proposal, I want to discuss two topics that look forward to how OpenGL will change in the future both in terms of its ever-widening cultural application and the immediate roadmap for the evolution of the core platform.

Recently, Matt Jones of British design firm Berg London and James Bridle of the Really Interesting Group have been tracking an aesthetic movement that they’ve been struggling to describe. In his post introducing the idea, The New Aesthetic, Bridle describes this as “a new aesthetic of the future” based on seeing “the technologies we actually have with a new wonder”. In his piece, Sensor-Vernacular, Jones describes it as “an aesthetic born of the grain of seeing/computation. Of computer-vision, of 3d-printing; of optimised, algorithmic sensor sweeps and compression artefacts. Of LIDAR and laser-speckle. Of the gaze of another nature on ours.”

What both Jones and Bridle are describing is the introduction of a “photographic” trace of the non-digital world into the matrix space of computer graphics. Where previously the geometry represented by OpenGL’s “symbolic optics” was entirely specified by designers and programmers working within its explicit affordances, the invention of 3D scanners and sensors allows for the introduction of geometry that is derived “directly” from the world. The result is imagery that feels made up of OpenGL’s symbols (these are clearly textured three dimensional meshes with lighting) but in a configuration different from what human authors have previously made with those symbols. However, these images also feel dramatically distinct from traditional photographic representation, as the translation into OpenGL’s symbolic optics is not transparent; instead it reconfigures the image along lines recognizable from games, simulations, special effects, and the other cultural objects previously produced on the OpenGL platform. The “photography effect” that witnessed the transition from the Renaissance mode of vision to the modern becomes a “Kinect effect”.

A full-length platform studies account of OpenGL should include analyses of some of these Sensor-Vernacular images. A particularly good candidate subject would be Robert Hodgin’s Body Dysmorphic Disorder, a realtime software video project that used the Kinect’s depth image to distort the artist’s own body. Hodgin has discussed the technical implementation of the project in depth and has even put much of the source code for the project online.

Finally, I want to discuss the most recent set of changes to OpenGL as a platform in order to position them within the framework I’ve established here and sketch some ideas of what issues might be in play as they develop.

Much of the OpenGL system as I have referred to it here assumes the use of the “fixed-function pipeline”. The fixed-function pipeline represents the default way in which OpenGL transforms user-specified three dimensional geometry into pixel-based two dimensional images. Until recently, in fact, the fixed-function pipeline was the only rendering route available within OpenGL. However, around 2004, with the introduction of the OpenGL 2.0 specification, OpenGL began to make parts of the rendering pipeline itself programmable. Instead of simply abiding by the logic of simulation embedded in the fixed-function pipeline, programmers began to be able to write special programs, called “shaders”, that manipulated the GPU directly. These programs provided major performance improvements, dramatically widened the range of visual effects that could be achieved, and placed programmers in more direct contact with the highly parallel matrix-oriented architecture of the GPU.

Since their introduction, shaders have gradually transitioned from the edge of the OpenGL universe to its center. New types of shaders, such as geometry and tessellation shaders, have been added that allow programmers to manipulate not just superficial features of the image’s final appearance but to control how the system generates the geometry itself. Further, in the most recent versions of the OpenGL standard (versions 4.0 and 4.1), the procedural, non-shader approach has been removed entirely.

How will this change alter OpenGL’s “symbolic optics”? Will the move towards shaders remove the limits of the fixed-function pipeline that enforced OpenGL’s rule-based simulation logic or will that logic be re-inscribed in this new programming model? Either way how will the move to shaders alter the affordances and restrictions of the OpenGL platform?

To answer these questions, a platform studies approach to OpenGL would have to include an analysis of the shader programming model: how it provides different aesthetic opportunities than the procedural model, and how those differences have shaped the work made with OpenGL as well as the programming culture around it. Further, this analysis, which began with a discussion of standards when looking at the emergence of OpenGL, would have to return to that topic when looking at the platform’s present prospects and conditions in order to explain how the shader model became central to the OpenGL spec and what that means for the future of the platform as a whole.

That concludes my proposal for a platform studies approach to OpenGL. I’d be curious to hear from people more experienced in both OpenGL and Platform Studies as to what they think of this approach. And if anyone wants to collaborate in taking on this project, I’d be glad to discuss it.

On the Future and Poetry of the Calibration Pose (June 7, 2011)

Interesting stuff here from Tom Armitage on a subject that’s been much on my mind lately: Waving at the Machines.

How does a robot-readable world change human behaviour?

[…]

How long before, rather than waving, or shaking hands, we greet each other with a calibration pose?

Which may sound absurd, but consider a business meeting of the future:

I go to your office to meet you. I enter the boardroom, greet you with the T-shaped pose: as well as saying hello to you, I’m saying hello to the various depth-cameras on the ceiling that’ll track me in 3D space. That lets me control my Powerpoint 2014 presentation on your computer/projector with motion and gesture controls. It probably also lets one of your corporate psychologists watch my body language as we discuss deals, watching for nerves, tension. It might also take a 3D recording of me to play back to colleagues unable to make the meeting. Your calibration pose isn’t strictly necessary for the machine – you’ve probably identified yourself to it before I arrive – so it just serves as formal politeness for me.

Nice little piece of gesture recognition sci-fi/design fiction here looking at how the knowledge that we’re surrounded by depth sensors and pose recognition systems may alter human behavior and custom.

I’m fascinated by (and deeply share) people’s fixation on the calibration pose. It comes up over and over again as people have their first exposure to the Kinect.

The use of this particular pose to calibrate gesture recognition systems seems to have originated in security procedure where it’s known as the “submission pose”, but in the academic computer science literature it tends to get referred to by the much drier “Psi pose”.

On the one hand this calibration pose is comforting because it represents a definable moment of interaction with the sensor system. Instead of simply being tracked invisibly it gives us the illusion that our submission to that kind of tracking must be conscious — that if we don’t assume the calibration pose then we can’t be tracked.

On the other hand, we find the pose disturbing because it brings the Kinect’s military and security heritage to the surface. The only other times we stand in the submissive pose are while we’re passing through security checkpoints at airports or the like or, even more vividly, when we’re being held at gunpoint. Intellectually we may know that the core technology of the Kinect came from military and security research funding in the last decade’s war on terror. When the Kinect first launched, Matt Webb captured this reality vividly in a tweet:

“WW2 and ballistics gave us digital computers. Cold War decentralisation gave us the Internet. Terrorism and mass surveillance: Kinect.”

However, it’s one thing to know abstractly about this intellectual provenance and it’s another thing to have to undergo a physical activity whose origins are so obviously in violent dominance rituals every time we want to play a game or develop a new clever hack.

I think that it’s the simultaneous co-existence of these two feelings, the oscillation between them, that makes the existence of the calibration pose so fascinating for people. We can’t quite keep them in our minds at the same time. In the world we know, they should belong to two very different spheres; hence their simultaneous co-existence must be a sign of some significant change in the world, a tickle telling us our model of things needs updating.

Technically speaking, the necessity of the pose is already rapidly fading. It turns out that pose tracking software can record a calibration sample from a single person and then use it to obviate the need for future subjects to actually perform the pose themselves. This works so long as those people are of relatively similar body types to the person who performed the original calibration.

I wonder if the use of the calibration pose will fade to the point where it becomes retro, included only by nostalgic programmers who want to recreate that old 11-bit flavor of early depth cameras in their apps. Will we eventually learn to accommodate ourselves to a world where we’re invisibly tracked and take it for granted? Will the pose fall away in favor of new metaphors and protocols that are native to the new interface world slowly coming into existence?

Or, conversely, maybe we’ll keep calibration around because of its value as a social signifier like the host in Armitage’s story who goes through the calibration pose as part of a greeting ritual even though it’s not necessary for tracking. Will it sink into custom and protocol because of its semantic value in a way that preserves it even after it loses its technical utility?

Either way, it’s a post-Kinect world from here on in.

Using the Kinect to Assess Involuntary Motion at Health 2.0 (February 24, 2011)

This past weekend I participated in a Health 2.0 developer challenge in Boston. The event was a one-day hack-a-thon hosted by Health 2.0 and O’Reilly at a Microsoft building in Cambridge. I was invited to give a brief presentation about using the Kinect for motion tracking and then to help out any groups that wanted to use the Kinect in their project. At the end of the day, the projects were judged by a panel of experts and one group was named the winner.

More about the competition in a minute, but first here were the slides from my presentation:

I concluded the presentation by demonstrating the skeleton tracking application I blogged about recently.

After the presentations, the participants self-assembled into groups. One of the groups was talking about trying to build an early warning system for seizures. They asked me to talk with them about using the Kinect in their project. Unfortunately, the application seemed difficult, if not impossible, to build with the Kinect: if you pointed a Kinect at a person at their desk all day long, how would you differentiate a seizure from a routine event like them dropping their pen and suddenly bending down to pick it up?

We had a brief discussion about the limitations and capabilities of the Kinect, and they pondered the possibilities. After some brainstorming, the group ended up talking to a fellow conference participant, Dan Karlin, a doctoral candidate in psychiatry who works in a clinic. Dan explained to us that there are neuromuscular diseases that lead to involuntary muscle motion. These diseases include Parkinson’s, but many also occur as side effects of psychiatric drugs.

Dan told us about a standard test for evaluating these kinds of involuntary motion: the Abnormal Involuntary Movement Scale (AIMS). To conduct an AIMS exam, the patient is instructed to sit down with his hands between his knees and stay as still as possible. The doctor then observes the patient’s involuntary motions. The amount and type of motion (particularly whether or not it is symmetrical) can then act as a tracking indicator for the progression of various neuromuscular disorders.

In practice, this test is underperformed. For many patients on psychiatric drugs, their doctors are supposed to perform the test at every meeting, but many doctors do not, and even for those who do, the long delays between patient visits make the data gathered from these tests of limited utility.

Doctor Dan (as we’d begun calling him by this point) said that an automated test that the patient could perform at home would be a great improvement to doctors’ abilities to track the progression of these kinds of diseases, potentially leading to better drug choices and the avoidance of advanced permanent debilitating side effects.

On hearing this idea, I thought it would make a perfect application for the Kinect. The AIMS test involves a patient sitting in a known posture. Scoring it only requires tracking the movement of four joints in space (the knees and the hands). The basic idea is extremely simple: add up the total motion of those four joints in all three dimensions and then score that total against a pre-computed expectation or a weighted average of the patient’s prior scores. To me, having already gotten started with the Kinect skeleton tracking data, it seemed like a perfect amount of work to achieve in a single day: record the movement of the four joints over a fixed amount of time, then display the results.
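
In code, the scoring idea is only a few lines. What follows is a simplified sketch of it, not the actual tremor_assessment.pde: the joint list matches the test description, but the threshold constants are placeholders standing in for the values Doctor Dan tuned by hand.

```java
import SimpleOpenNI.*;

// Accumulate frame-to-frame motion of the four tracked joints over the
// test window, then grade the total against thresholds.
int[] trackedJoints = {
  SimpleOpenNI.SKEL_LEFT_HAND, SimpleOpenNI.SKEL_RIGHT_HAND,
  SimpleOpenNI.SKEL_LEFT_KNEE, SimpleOpenNI.SKEL_RIGHT_KNEE
};

PVector[] lastPositions = new PVector[4];
float totalMotion = 0; // summed displacement, in millimeters of real-world space

// Called once per frame during the ten-second test.
void scoreFrame(SimpleOpenNI context, int userId) {
  for (int i = 0; i < trackedJoints.length; i++) {
    PVector current = new PVector();
    context.getJointPositionSkeleton(userId, trackedJoints[i], current);
    if (lastPositions[i] != null) {
      // Movement of this joint since the last frame, across all three axes.
      totalMotion += PVector.dist(current, lastPositions[i]);
    }
    lastPositions[i] = current;
  }
}

// Grade the accumulated score once the window is over.
color gradeScore(float score) {
  float yellowThreshold = 400; // placeholder, not the app's tuned constant
  float redThreshold = 900;    // placeholder, not the app's tuned constant
  if (score < yellowThreshold) return color(0, 255, 0);   // steady
  if (score < redThreshold) return color(255, 255, 0);    // moderate motion
  return color(255, 0, 0);                                // pronounced tremor
}
```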

So we set out to build it. I joined the team and we split into two groups. Johnny Hujol and I worked on the Processing sketch, Claus Becker and Greg Kust worked on the presentation, and Doctor Dan consulted with both teams.

A few hours later we had a working prototype. Our app tracked the joints for ten seconds, displayed a score to the user along with a progress bar, and graded the score red, yellow, or green based on constant values determined in testing by Doctor Dan.

Here’s a look at the app running with me demonstrating the amount of motion necessary to achieve a yellow score:

Here’s an example of a more dramatic tremor, which pushes the score into red:

And a “normal” or steady example is here: Triangle Tremor Assessment Score Green

The UI is obviously very primitive, but the application was enough to prove the soundness of the idea of using the Kinect to automate this kind of psychiatric test. You can see the code for this app here: tremor_assessment.pde.

Here’s the presentation we gave at the end of the day for the judges. It explains the medical details a lot better than I have in this post.

And, lo and behold, we won the competition! Our group will continue to develop the app going forward and will travel to San Diego in March to present at Health 2.0’s Spring conference.

I found this whole experience to be extremely inspiring. I met enthusiastic domain experts who knew about an important problem I’d never heard of that was a great match for a new technology I’d just learned how to use. They knew just how to take what the technology could do and shape it into something that could really make a difference. They have energy and enthusiasm for advancing the project and making it real.

For my part, I was able to use the new skills I’ve acquired at ITP to shape the design goals of the project into something that could be achieved with the resources available. I was able to quickly iterate from a barely-there prototype to something that really addressed the problem at hand with very few detours into unnecessary technical dead ends. I was able to work closely with a group of people with radically different levels of technical expertise but deep experience in their own domains in order to build something together that none of us could have made alone.

I’ll post more details about the project as it progresses and as we prepare for Health 2.0.
