
The Mythical Man-Finger

A few weeks ago, I wrote a post called “Life on the Command Line,” in which I noted that I have almost completely stopped using graphical tools on my computer. Since it was a blog post, I went further and made the observation that the command line is:

faster, easier to understand, easier to integrate, more scalable, more portable, more sustainable, more consistent, and many, many times more flexible than even the most well-thought-out graphical apps.

I also suggested that this is a “wonderful way to work” — and implied that most people would find their computational lives immeasurably improved if they would switch. I tried to explain that in most respects, I am an ordinary user (in that ninety percent of what I do on the computer is the same as what everyone else does).

The comments that followed were mostly as I imagined they’d be. Most began by offering some appreciation for what I was trying to do, but in the end, accused me of gravely misunderstanding people’s relationships with computers. Aimée Morrison wrote:

I’ve just read (yet) a(nother) book on design, and you, my friend, are the elite user. The expert. The most hardcore of the hardcore. I might be (well, I am) an explorer, the next category of user. Together, your people and my people make up a tiny fraction of all people. […] The vast majority of people just want things to work, and to not harass them with too many options or require too much learning. I admire that kind of pragmatism, actually.

Nathan Kelber:

Most people see computers like appliances. You might use an oven to bake a cake or a toaster to make toast. You use a computer to check email or write documents (not files!). They want a toaster that toasts to the right color of brown. They don’t care if it could make waffles, because they want to make toast. The easier that is, the better. This is the philosophy of Steve Jobs: “It just works” or it works “automatically.” Why would you want to know how it works? It’s doing that work so you don’t have to waste time doing it yourself.

This is the common liturgy of user interface design. “Most people” want it to “just work.”

I don’t disagree with that at all. But there’s a logical leap being made here that I think is dead wrong — namely, that graphical user interfaces, as such, are in a better position to make things “just work” than textual interfaces of the sort typified by the command line.

The gui revolution entailed a radical shift in the “haptic” characteristics of human-computer interaction. Before the gui, you interacted with a given application by learning its textual language. Electronic card catalogs worked this way, accounting applications worked this way, games worked this way. The application would have a domain-specific language you needed to learn in order to use it, and so you would learn that “language.” This nearly always meant learning some vocabulary and some syntax, though it usually didn’t amount to anything like programming (let alone learning a natural language). At the same time, it often inherited some of the features of both coding and natural language — namely, the ability to combine things in novel ways in order to accomplish some specialized task. Of course, there were bad textual languages and good ones, “intuitive” interfaces and non-intuitive interfaces, just as there are today.

The gui was, of course, about graphical applications, but it was more fundamentally about the mouse. The mouse essentially replaced language with the index finger. If you wanted to do something, you pointed at it. To this day, there are some applications that are hard to imagine under any other regime. If I want my cell phone to tell me the weather, I tap my index finger on the icon with the cloud on it, and up pops the weather. It just works.

This is, without any doubt, one of the most elegant ideas in the history of computing. Most of us, on encountering it for the first time, understood its power immediately. In fact, I remember the very first time I saw one. The person demoing it for me kept saying, “Look, all you do is point and click!”

But of course, very few applications on a computer system are so simple. If I want to listen to a song, I presumably just click on it. But the song is in my music folder, which is in my home directory, which is on a drive labeled “Computer.” Easy enough, I suppose.

But really, not very easy at all. In order to do that, I have to already understand a “language” with a quite involved syntax. Folders are “nested,” and they reside on “drives” (which are like folders, but different). Folders will remain open until you close them. And the language of folders has different dialects — the way you move through folders when you are using a menu is different from the language you use when you click on them directly. Sometimes you go down, but sometimes sideways. It’s all just point and click. And some sliding. And some selecting.

Now, we could design a system that allows you to type:

play "Teenage Dream"

and it would go find that song and play it.
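
To give a sense of how little machinery that would take, here is a minimal sketch in Python. Everything in it is my own guesswork rather than anything proposed in this post: I assume the music lives under ~/Music, and that the (real) command-line player mpv is installed.

#!/usr/bin/env python3
# play.py: a toy sketch of the hypothetical "play" command.
# Assumptions (mine, not the post's): music lives under ~/Music,
# and the mpv command-line player is installed.
import subprocess
import sys
from pathlib import Path

MUSIC_DIR = Path.home() / "Music"          # assumed location
AUDIO = {".mp3", ".flac", ".ogg", ".m4a"}  # assumed formats

def find_song(title):
    """Return the first audio file whose name contains the title."""
    for path in sorted(MUSIC_DIR.rglob("*")):
        if path.suffix.lower() in AUDIO and title.lower() in path.stem.lower():
            return path
    return None

if __name__ == "__main__":
    title = " ".join(sys.argv[1:])
    song = find_song(title)
    if song is None:
        sys.exit('No match for "%s"' % title)
    subprocess.run(["mpv", "--no-video", str(song)])

Invoked as play "Teenage Dream", it walks the folder tree, takes the first filename match, and hands it to the player. That is the whole trick.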

What is most interesting about such a system, however, is that it almost immediately suggests several fairly trivial improvements. For example, we could give “play” some additional commands that would make it easy to do common things:

play last

(would play the last song “play” played)

play random

(would choose a random song among a list of music files on the system)

In fact, dozens suggest themselves:

play repeat

play 10 random

play softer

play random country
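
Continuing the sketch above, and with the same caveat (every name and behavior here is my invention, not a specification), those variants fall out of a small dispatch function:

import random

history = []   # paths played so far, most recent last

def play_file(path, extra_args=()):
    if path is None:
        return
    history.append(path)
    subprocess.run(["mpv", "--no-video", *extra_args, str(path)])

def all_songs(genre=None):
    # Assumes (again, my invention) that songs are filed in per-genre folders.
    return [p for p in MUSIC_DIR.rglob("*")
            if p.suffix.lower() in AUDIO
            and (genre is None or genre.lower() in str(p.parent).lower())]

def dispatch(words):
    """Map 'last', 'repeat', 'softer', '10 random country', etc. to actions."""
    if words[:1] == ["last"] and history:
        play_file(history[-1])
    elif words[:1] == ["repeat"] and history:
        play_file(history[-1], extra_args=["--loop-file=inf"])  # mpv's loop flag
    elif words[:1] == ["softer"] and history:
        play_file(history[-1], extra_args=["--volume=50"])      # mpv's volume flag
    elif len(words) > 1 and words[0].isdigit() and words[1] == "random":
        songs = all_songs(" ".join(words[2:]) or None)
        for p in random.sample(songs, min(int(words[0]), len(songs))):
            play_file(p)
    elif words[:1] == ["random"]:
        songs = all_songs(" ".join(words[1:]) or None)
        if songs:
            play_file(random.choice(songs))
    else:
        play_file(find_song(" ".join(words)))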


Of course, the invocation of the “play” command might open some kind of environment in which you can just say things directly:

repeat last, softer

10 random country, then "Teenage Dream"


show artist
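
That environment, too, is only a few lines. Here is a hedged sketch, nothing more than a read-eval loop over the dispatch() function from the previous sketch, with commas treated as clause separators so that a line like "repeat last, softer" becomes two commands:

def repl():
    """The 'play' environment: read a line, run each clause, repeat."""
    while True:
        try:
            line = input("play> ").strip()
        except EOFError:
            break
        if line in ("quit", "exit"):
            break
        for clause in line.split(","):
            if clause.strip():
                dispatch(clause.split())

A fuller grammar would add verbs like show, but the shape of the thing is already visible.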

I’m typing off the top of my head here, but it seems to me that a system like this is simpler in every way (many, many unix commands work more or less like this, though that is not really my point here). There is a language — and that language has to be learned — but it’s able to leverage our ordinary experience of language in ways that make things considerably less unintuitive for the uninitiated.

The power of this representation, though, goes much further. More difficult things can build upon the general pattern that evolves from the syntax:

create playlist "country plus katy" from last 11

show playlists
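
And, still hedging, a guess at how those two commands might ride on the same machinery. The in-memory dictionary is my simplification; a real system would persist playlists to disk:

playlists = {}   # name -> list of song paths

def create_playlist(name, count):
    """Snapshot the last `count` songs from the play history."""
    playlists[name] = list(history[-count:])

def show_playlists():
    for name, songs in playlists.items():
        print("%s (%d songs)" % (name, len(songs)))

So create playlist "country plus katy" from last 11 would parse down to create_playlist("country plus katy", 11): one more clause in the grammar rather than a new thing to learn.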

Harder, but are we really ramping up the difficulty considerably over what we can achieve with a gui? The gui version of that last command isn’t hard, but it’s…fussier. It no longer “just works.” It’s pointing, and clicking, and pointing, and clicking. And it’s probably not quite like the other operations. The language version adds a couple of flourishes to the general pattern, but does so in a way that is easily learned (and can now be used in dozens of syntactically similar contexts).

Now, perhaps you’ve gotten this far and you have a dozen objections to what I’m saying. “Now I have to learn all these commands!” is one, though I would point out that you have to learn all these clicking and selection patterns (often with meta keys) to do it with the gui. “Yeah, but really complicated things are going to require searching through manuals.” Maybe, but you already do that in the gui, and very often you have to do that to accomplish some of the simpler tasks above. “This is just dull looking!” A matter of taste, perhaps, though nothing about my system precludes graphics. “show album cover” might display the album cover in all its glory. “show lava-lamp” might amuse you for a few moments.

But here’s one thing that I think is a truly nonsensical objection: “Systems like this are for power users.” How? Why?

Ever play Pictionary? Pictionary is a brilliant game. You get a card that describes some concept — say, “return address.” You have to communicate that idea to other people, but here’s the thing: you can only use your index finger (extended, McLuhan-style, with a pencil). Before long, people are laughing. “That’s not an envelope!” “Yes! Look, that’s a house, and that’s a letter, and that’s an arrow!” Much laughter ensues.

In the real world, we’d say, “I’m thinking of a return address.”

I realize the analogy is a bit strained, but my point is simply this: the idea that language is for power users and pictures and index fingers are for those poor besotted fools who just want toast in the morning is an extremely retrograde idea from which we should strive to emancipate ourselves.

The problem with the user categorization narrative is that it uses “it just works” as a cover for saying “people aren’t capable” — for implying that millions of natural language speakers would be too intimidated by languages orders of magnitude smaller than the ones they use effortlessly every single day. It imagines that if we can find the right user-interface “metaphor,” everything will click.

I think such metaphors are rare in computing. The much-vaunted mouse is often celebrated as an “intuitive” device, despite the fact that not one single object in the entire natural world behaves the same way. The blur of nonsensical metaphors (desktops with wallpaper, control panels with hammers, etc.) has been amply discussed by others. Much research in human-computer interface design has been devoted to getting the metaphors right. But after decades, we seem to have made very little progress on this front (choosing, instead, to naturalize the “language” of these things as if they were the most natural things in the world). Perhaps it is time to reconsider whether the tools we’ve chosen (pictures and index fingers) simply don’t lend themselves to easy metaphorization in many, if not most, domains.

What I find distressing about modern user interface design is not that it strives to create better metaphors, but that it has radically limited itself to a very constrained set of ideas about what is possible. Whatever system we’re designing, we design within the realm of pictures and index fingers. We argue endlessly over color, and screen regions, and steps, and flow, but the ground truth of our efforts doesn’t change. Ideas about using more than one finger (“gestures!”) or using your whole body (“Kinect!”) are greeted as thunderous breakthroughs. Think of the possibilities!

Well, we should think of the possibilities. But we should also question whether our irrefragable ground assumptions are correct: commands are for geeks, mice are for moms.

Because if we can do that — if we can just open ourselves once again to the idea of language-based interfaces — we just might make meaningful progress on designing command languages that leverage all the things that language is good at (and that pictures are distressingly bad at): speed, “cross-app” integration, scale, portability, sustainability, consistency, flexibility. Right now, anyone who even proposes to work in this area is just an expert who lacks the self-awareness to realize their expertise — a hard-core geek who only knows how to design things for other hard-core geeks.

I suppose I am a hard-core geek. But I do know one thing: All of my users speak and write. And when language fails them, they start pointing fingers.

[Update: I summarize and respond to some of the many excellent comments in “The Man-Finger Aftermath.”]
