Courtesy of New Scientist Magazine
By Duncan Graham-Rowe
THE YOUNGSTER EYES ME suspiciously as I enter the room, its gaze following as I cross the
floor. Then after a while, it loses interest and turns back to its toy dinosaur. But then
I never was any good with kids.
When Rodney Brooks set out to build a humanoid robot with the intelligence of a
two-year-old child, he didn't realize what he was letting himself in for. Six years later,
he and his team at the Massachusetts Institute of Technology have transformed themselves
from artificial intelligence (AI) experts into the most unlikely bunch of developmental
psychologists and nannies.
Colorful toys litter the labs, and much of their time is spent playing with and
entertaining their charges. This is because Cog and its alter ego Kismet are the first of
a new type of robot designed to behave in the same way as small children. If you want to
create a robot with the intelligence of a two-year-old, Brooks reasoned, the best approach
was to give it the innate abilities of a newborn and let it develop.
Motors and software may control Cog and Kismet, but it's difficult to believe that this is
all that makes them tick. Cog may be only a head, arms and torso, but as you watch it
explore the world you can't help feeling that something profound is going on. It moves
smoothly like a creature rather than with the abrupt, mechanical movements of a machine.
And its eyes dart from one object to the next, with its head slowly bringing up the rear,
as though it were human.
Kismet is even more convincing. It may be just a head, but it has more interesting facial
features than its larger cousin, making its moods easier to interpret. Complete with
eyelids, ears and newly acquired lips, it has the appearance of something young and cute,
and reacts to events with an impressive repertoire of doe-eyed expressions ranging from
surprise and interest to sadness and anxiety.
One of the unexpected findings from the project so far is that the appearance of these
human-like actions and reactions is not just an optional extra. The robots, like children,
will not develop unless their caregivers read more into their behavior than is actually
there.
The idea of building a robot with human intelligence, even that of a child, is not only
ambitious, it's highly unconventional. Most AI researchers confine themselves to
recreating a single sense, such as vision, or simple behaviors--not the whole shebang.
"The hard-core roboticists are not comfortable with our work because we are not doing
the same sorts of things," says Brooks. If it weren't for the fact that Cog does what it's supposed to, he says, his team would have been written off long ago.
But then Brooks has always been controversial. In his work on robot insects in the 1980s,
he rejected the idea of a central "brain" and showed how intelligent behaviors
could emerge from cooperation between a number of simple, independent systems. Each leg of
Genghis, a six-legged robot, for example, had its own simple controller, and walking
"emerged" by timing the actions of the controllers.
Brooks also pioneered the idea of increasing the complexity of robot behavior by building
up a hierarchy of simple systems. When "whiskers" fitted to Genghis detected an
obstacle, they modulated the signaling between the legs in such a way that the robot
walked over the obstacle as though it "knew" what was before it.
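To make the layered-control idea concrete, here is a minimal Python sketch of how walking and obstacle-climbing could emerge from independent per-leg controllers plus a modulating "whisker" layer. The class names, timing rule and numbers are illustrative assumptions, not Genghis's actual code.

```python
# Illustrative sketch of decentralized, layered control in the spirit of
# Brooks' approach. Nothing here is taken from the real Genghis software.

class LegController:
    """Each leg runs its own trivial cycle: swing forward, then push back."""
    def __init__(self, phase_offset):
        self.phase = phase_offset       # staggering the legs lets a gait emerge
        self.lift_height = 1.0          # higher layers can modulate this

    def step(self, t):
        # Alternate between swing and stance purely on local timing.
        if (t + self.phase) % 2 == 0:
            return ("swing", self.lift_height)
        return ("stance", 0.0)


class WhiskerLayer:
    """A higher layer that never commands the legs directly, only modulates them."""
    def __init__(self, legs):
        self.legs = legs

    def on_obstacle(self, detected):
        # When the whiskers report an obstacle, raise the swing height so the
        # robot steps over it as though it "knew" what was in front of it.
        for leg in self.legs:
            leg.lift_height = 2.5 if detected else 1.0


legs = [LegController(phase_offset=i % 2) for i in range(6)]
whiskers = WhiskerLayer(legs)

whiskers.on_obstacle(True)              # obstacle ahead: swings get higher
for t in range(4):
    print([leg.step(t) for leg in legs])
```

No single controller holds a plan of the whole gait; whatever "knowledge" the robot appears to have lives in the way the layers interact.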
His approach broke the traditional mold of AI, which tended to treat intelligence as a
problem that could be encoded in rules, and hence software. Give a robot a software
replica of its world to refer to, and a set of instructions for negotiating that world,
and it would appear to act intelligently. By contrast, Brooks argued that robots need no
internal description of the world: for him, intelligent behaviors emerge only when a robot
exists in the world and interacts with it.
With the Cog project, Brooks wants to take his ideas further. "We're trying to build
a Commander Data," he says with a smile. He knows it's a lofty aim: the android from Star Trek: The Next Generation is smarter by far than the average life form.
Four eyes
At present, Cog is still a collection of isolated computer-controlled systems. For eyes,
it has four cameras, two for peripheral vision and two for high-resolution, narrow
viewing, which all move in their sockets in the same way as human eyes. It also has a
movable head, neck, arms and gripper-hands. In addition, Cog has an auditory system that
gives it enough information to know where sounds come from, a basic sense of balance and
the rudiments of a sense of touch. Most of its motors come with position sensors to let
Cog know where the rest of its "body" is, and strain and temperature gauges so
it doesn't overload anything.
These are the basic systems--like the legs and whiskers of Genghis--which, when connected to each other, will allow Cog to display intelligent behaviors. Already, it can
find faces, tell if a person is looking at it, detect movement, copy head gestures such as
nodding, and even play with a Slinky. "We also do some very basic auditory stream
segregation, so you can get the sound of my voice away from the fans in the
background," says Brian Scassellati, a cognitive scientist and one of Cog's principal
architects.
What makes Cog so different from its insect predecessors is that Brooks and his team are
not simply trying to build an increasingly complex system and watch what emerges: they are
trying to produce specific human behaviors.
Consider looking somebody in the eye. Cog does this in a series of steps. First, it
notices that someone is present by detecting motion in its peripheral vision. Then it
checks whether the moving object has a face, using an algorithm that matches patterns of
light and shade to a template made from shadows cast by faces in different lighting
conditions and orientations. Once it finds a face, Cog's eyes, then its head, turn towards
the face and it matches the peripheral image to the high-resolution image. This done, a
second template picks out the eyes within the image of the face. The behavior appears
lifelike and the robot looks you in the eye in real time, without needing massive
computing power.
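In outline, the sequence might look something like the following Python sketch; the helper functions and camera methods are stand-ins for Cog's actual vision routines, which the article does not describe in code.

```python
# Hypothetical outline of the eye-contact pipeline described above. The names
# (detect_motion, matches_face_template, find_eyes, Camera methods) are
# illustrative stand-ins, not Cog's real interfaces.

def detect_motion(wide_frame):
    """Return the region of the peripheral image where something moved, or None."""
    ...

def matches_face_template(region):
    """Compare light/shade patterns in the region against stored face templates."""
    ...

def find_eyes(foveal_frame):
    """Apply a second template to locate the eyes within the high-resolution face image."""
    ...

def look_in_the_eye(camera):
    region = detect_motion(camera.peripheral())       # 1. notice that someone is there
    if region is None or not matches_face_template(region):
        return None                                   # 2. the moving thing isn't a face
    camera.saccade_to(region)                         # 3. eyes, then head, turn to the face
    return find_eyes(camera.foveal())                 # 4. pick out the eyes in the close-up
```

Because each stage is simple, the whole loop can run in real time, which is how Cog manages the behavior without massive computing power.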
In line with Brooks' original ideas, all Cog's actions and abilities are discrete and
built in hierarchies. The result is that new behaviors, such as detecting a waving hand or
recognizing individual faces, can be built on top without redesigning the existing
systems.
Still, for the most part, Cog's abilities are purely reactive. If it is ever to fulfill Brooks's dream of learning like a child, Cog is going to need a memory, and so far the team has not tackled that problem (see "No past, no future," below). The
researchers have, however, begun to look at one aspect of learning that they originally
took for granted, but which has proved to be crucial to Cog's development--social
interaction.
As infants we learn gradually, expanding our abilities as our carers slowly increase the
complexity of the tasks they set us. This incremental approach relies heavily on social
interaction. It had always been part of the plan to make Cog communicate with people, but
its designers hadn't counted on how complicated a process this is. The robot must be able
not only to understand the intentions of its carer but also to impart its own intentions.
This might be straightforward but for the fact that Cog doesn't have any intentions or
know how to express them--which makes it similar to very young babies.
Cynthia Breazeal, principal researcher on Kismet, has wrestled with this problem. The
solution, she believes, is that young babies are great at giving the impression that there
is more going on inside them than there really is. Developmental psychologists argue that
young babies are capable of showing only a few, innate expressions. Yet parents assume
from the beginning that their babies behave in the way they do for meaningful reasons.
Parents interpret facial and vocal expressions as indicators of how a baby is
feeling--so if a child is content, parents tend to maintain their level of interaction,
but if the child appears uninterested or upset they may intensify the interaction or
change it to regain the baby's attention. From the consistency of their parents' reactions, children learn how to manipulate their parents and so gain attention. This puts them in an ideal position to carry on learning.
These ideas are embodied in Cog's cousin. "Kismet takes advantage of the way we are
programmed to interact with small children," says Breazeal. It was designed to
emotionally blackmail people. If you don't do what it wants, it scowls or looks sad; if you please it, it rewards you with a smile or a look of interest.
Kismet is actually a platform for testing behavioral systems before installing them in
Cog. It has a vision system similar to Cog's, able to detect motion and faces. But Kismet
also has what Breazeal calls a motivational system, which tries to keep the robot in a
happy, interested state. This consists of a collection of drives, such as the urges to be
social and stimulated, and behaviors that satisfy those drives. The
intensities of the drives, which dictate the expression on Kismet's face, increase if they
are not satisfied and decrease when the appropriate behaviors are operating.
So, if Kismet is left alone, the intensity of its social drive increases. This makes it
look sad, communicating to anyone passing that it craves attention, and activating a
behavior called socialize. As soon as it detects a face, it fixes on it and begins to
socialize--sadness changes to happiness or interest and the intensity of its social drive
begins to fall. If, however, the interaction is too intense and the social drive is pushed
too far, the robot becomes overwhelmed and gives a look of displeasure.
Similar events take place with Kismet's "stimulation" drive, which is
counter-balanced by a behavior called play. At present, Kismet's favorite object is its
toy inchworm. If left alone, the robot looks sad and its play behavior is activated, but
bounce the worm and it starts to look happy and the desire to be stimulated begins to
fall. If you bounce the worm too fast, however, Kismet looks disgusted. And if you carry
on, then disgust changes to anger, or it may simply close its eyes and sleep to allow its
"brain" to catch up. In an added refinement, if you repeat the same movement
repeatedly then Kismet gets bored and looks sad again.
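Pieced together from the description above, Kismet's motivational loop might be sketched like this in Python; the drive names, thresholds and update rule are invented for illustration, not the team's implementation.

```python
# Illustrative sketch of a Kismet-style drive, based only on the behavior
# described in the article. Rates and thresholds are made-up example values.

class Drive:
    def __init__(self, name):
        self.name = name
        self.intensity = 0.0            # grows while unsatisfied, shrinks while satisfied

    def update(self, satisfied, rate=0.25):
        self.intensity += -rate if satisfied else rate
        self.intensity = max(0.0, min(1.0, self.intensity))

def expression(drive, overstimulated=False, bored=False):
    # Map the drive's state onto the face, as the article describes.
    if overstimulated:
        return "displeasure"            # interaction pushed the drive too far
    if bored or drive.intensity > 0.6:
        return "sad"                    # left alone, or the same movement over and over
    return "happy/interested"           # the drive is being satisfied

social = Drive("social")
for t in range(5):
    someone_present = t >= 3            # a face appears after a while
    social.update(satisfied=someone_present)
    print(t, round(social.intensity, 2), expression(social))
```

The printout traces the loop the article describes: the drive's intensity climbs while Kismet is ignored until its face turns sad, then falls back towards "happy/interested" once someone shows up.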
So humans watching Kismet's expressions take them as signs of how it is
"feeling" and modify their behavior to restore it to a contented state--just as
they would with a child. Eventually, its happy and interested expressions will signify
that it is absorbing information at the optimum rate. So, Kismet's expressions will
regulate people's actions to let it learn at that pace.
The team is now working on a system that will allow Kismet to respond to the inflection in
people's voices, so that its emotional state can be changed by auditory as well as visual
stimuli. And soon, says Scassellati, they will introduce a vocal system so it can babble
like a baby. This will let Kismet attract people's attention even when they're not looking
at it, and let them know if it is happy or sad by cooing or crying. Eventually, with some
form of software capable of learning, the vocal system could play a part in helping Kismet
to develop language.
To some, Breazeal's approach might seem pointless: after all, what could bouncing a toy
inchworm teach a robot about the world? The same, however, could be argued of infants, and
yet it clearly works for them. This sort of exercise teaches infants more about themselves
than their environment, such as how to reach for an object.
It also gives them the opportunity to forge links between different sensory perceptions.
Children might learn, for example, to associate neural signals for bright colors with
those for movement, thereby realizing that the bright thing and the moving thing they are
seeing are one and the same. Breazeal and the team hope to see Cog learning in the same
way.
Such basic skills are essential for children because they pave the way for more complex
tasks later. To appreciate this, take the seemingly simple job of understanding what
someone is referring to when they point at an object. Only other apes and dolphins are
able to grasp that there's something "over there" worth looking at, and then
find the object of interest. This is one form of a skill, called joint attention, which
Cog needs to have because so much social interaction depends upon it, says Scassellati.
He hopes to give Cog this ability by following the idea that joint attention is built of
simpler skills. For infants to grasp the meaning of pointing, they first have to develop
the skill of knowing when someone is looking at them, and then learn to follow that
person's gaze and finger. Evidence for this composite nature comes from observing child
development.
While most three-month-olds have developed eye contact, it is not until nine months that
they follow another person's gaze, and 18 months that they follow someone's gaze out of
their field of view. And before they can single out the object being pointed at, it seems
that infants first go through a phase of following the person's gaze and finger, and
fixating on the first object their eyes see. Cog already has some of the basic skills
needed for joint attention, such as the ability to maintain eye contact. It can also learn
to point at an object. "Once we have enough of these rudimentary social pieces then
we can actually start to learn from people," says Scassellati.
And this is really the crucial point about the Cog project. It is not about wiring the robot up and turning it on, but rather about gradually increasing Cog's skills and watching as the richness and complexity of its behavior increases, hoping that one day "intelligence" will emerge.
In the next few months the team plans to carry out psychological tests to see if Kismet
can manipulate its carers as well as children manipulate theirs. A number of new heads are
also on the way, including one that will transfer Kismet's abilities to Cog, which should
make it appear more like a child than ever.
Already, the robots seem so human that there's a strong temptation to think of them as male or female, even though the Cog team has worked hard to keep its charges sexless. Still, if the researchers are ever to achieve their goal of making a robot that
interacts without making people feel uncomfortable, then perhaps they need to address this
issue. We humans, after all, are either one thing or the other.
BOX: No past, no future
OF THE PROBLEMS still to be faced by Rodney Brooks and his team at MIT, some of the
thorniest involve memory. A robot built using the traditional techniques of AI would have
a model of the world built into its software controls. So when it saw something red, the
robot would "know" it was red because its software would tell it so. Likewise,
if it needed to remember something--the position of an object, say--an instruction would
tell it to do so.
Cog has no such model or instructions because the aim is to let it choose what to remember.
So how will it make this decision? Or know that it's seeing red? And how will the concept
of redness be represented in the robot's memory? "In traditional AI you would never
think of this as being a problem," says Brooks's colleague, Brian Scassellati. But
for Cog these are major stumbling blocks that the team has yet to address.
Other things that we take for granted become real difficulties for Cog. Without a sense of
time, for example, Cog will not be able to order its thoughts or know the difference
between past and present. "There are so many problems," Scassellati sighs.
But then, a few years ago, social interaction seemed a colossal problem. And today, the
team's approach of breaking behaviors down into simpler steps is starting to pay
dividends. The question is whether storing and retrieving memories will fall to the same
approach.