Research Agendas
The computer is a strange machine: brain amplifier, number cruncher, image manipulator. Researchers and artists seem fascinated by challenging its limits; they are intent
on seeing how far they can extend its capabilities. Sometimes this quest means trying
to enable computers to manifest skills that are quite unremarkable for humans but extraordinary for a machine, for example, understanding spoken words or extracting the
meaning of a children’s story. Or it might mean developing skills that are beyond
human capabilities, for example, being able to instantly analyze a database composed
of millions of records in thirty different ways. The research agendas to extend capabilities
and reach are critical elements of this era’s cultural history. This activity in think tanks
and worldwide labs is the flow that must be a source for current and future artistic
activity.
Inputs—Systems Recognizing Speech, Gestures, Faces, Objects, Motion,
Touch, Emotions, and Biological Signals
Speech Recognition
How can a computer understand the words of human speech? Commercial speaker-independent
recognition products are already available. The task of understanding the
meaning of speech is much more difficult and still challenges researchers. Extensions
include the development of “auditory consciousness” and “auditory scene analysis,”
which will allow systems to track multiple human speakers in complex sound environments and identify their relative physical locations. Other research seeks to track speaker
changes, topic changes, and changes in emphasis. “Meeting capture” and “speaker segmentation” will enable the scanning and analysis of a record of a complex sound event, such
as a meeting, in order to systematically summarize the event and reconstruct the flow
of the conversation by speaker and the thematic thread, and allow automatic browsing
and gisting. Reportedly, the CIA has a system that can monitor thousands of phone
calls simultaneously, listening for specific key phrases. A project called Net Sound is
attempting to extract the underlying acoustic structure of sounds in hopes of finding
algorithmic representations, like vector or PostScript representations of images, which
would allow more efficient network transmissions. “Emotional computing” projects seek
to enable computers to analyze the emotional content of speech via attributes such as to create musical performance augmentation systems that can work in both professional
and casual contexts.