The Conversational Interface: Our Next Great Leap Forward
(aka Conversational User Interface, Linguistic UI, Natural UI, Spoken Dialog System, etc.)
© 2003-2010, John M. Smart. Reproduction, review and quotation encouraged with attribution.
The Conversational Interface: The Trigger to Perhaps the Biggest Single Social-Technological Change Today's Adults Are Likely to See in Our Lifetime

This essay outlines a prediction for the near-term (2012 to 2019) emergence of a Conversational Interface (CI) on the global web, via computer hardware and software platforms called Spoken Dialog Systems (SDS). The CI has also been called the Conversational User Interface (CUI), Linguistic User Interface (LUI), Universal Linguistic Interface (ULI), Voice User Interface (VUI), Natural User Interface (NUI) and other terms. For a few recent technical books on the topic, see Spoken Dialog Technology: Toward the Conversational User Interface, Michael McTear, 2004, and Practical Spoken Dialog Systems, Deborah Dahl (Ed), 2005.
Though the CI will certainly be used, as with current information technology, to foster a variety of narcissisms, addictions, and new dependency behaviors in modern cultures, it will also deliver unprecedented new problem solving and educational capacities for those cultures, subcultures, organizations and individuals who are motivated to make a more sustainable and innovative world. With a weakly biologically-inspired, connectionist architecture, the CI also qualifies as a form of artificial intelligence (AI). In fact, given the very limited autonomy, evolution, and development capacities of today's early computers, and the limited organic capacities such systems will have for the foreseeable future (the next twenty years), the CI and its extensions may end up being the most important form of artificial intelligence that emerges in the first three decades of the twenty-first century. If you are my age, and expect to live only to the middle of the 21st century (2040, in my case), then the CI and its aftermath will very likely be the single biggest social-technological change you will see over your lifetime. Microsoft began working on the software behind CIs even before the founding of Microsoft Research in 1991. But the folks at Google, barely five years old when this article was first written, have advanced much farther in this area than Redmond. In fact, Google seems to have a presently unbeatable functional and scale advantage with their leadership in globally distributed search and archiving platforms which, once the CI arrives, will allow an increasingly intelligent Google OS to emerge in coming decades. [Nov 2008 Update: See the Google Mobile App Voice for the iPhone and the new Google Search Wiki, two major advances toward the CI.]
This is a fascinating realization, and potentially another textbook example of Clayton Christensen's Innovator's Dilemma, the idea that our academic theory of strategic foresight isn't yet developed enough, and/or institutional execution in large companies nimble enough, to allow even our leading companies to avoid being 'creatively disrupted' when the next fundamental technology paradigm comes along. Consider how, with all their intelligence, wealth, and resolve never to be elbowed out of the 'center of the cyclone' of IT innovation, the geniuses at Microsoft, like IBM before them, are on track to becoming another middleware player in the great sea of internet-based software providers. As we will argue, the CI and its extensions, driven most centrally by Google (and all of us using the world's search platforms), now seem very likely to have a global economic and technical productivity impact in the 21st century that will greatly exceed that of both the individual and the networked computer eras of the 20th century. Microsoft missed this strategic insight in the 1990s. They also missed the knowledge that, from a developmental perspective, the CI platform would be by far the most likely to emerge in an incremental, statistical, bottom-up fashion via superior organization of the world's digital information (e.g., via search platforms, which were poised to become more valuable than any other platform due to their greatly superior connectionist capacities), rather than by some theory-driven top-down design. In missing both, they permanently lost their chance to be the dominant players in tomorrow's software industry to the chaotic, creatively destructive forces of technological change.
Most obviously, the CI will help us address the current global inequity of access to high quality, lifelong education in our increasingly technological world. And once the digital and education divides have fallen globally (e.g., effectively 'unlimited' open source technological education becomes available to any human in education-permissive societies), the economic and political/power/equity divides can be expected to fall (or, as social responsibility advocates like to say, move from our currently transitional/unsustainable distribution, toward more "rationalized" or "sustainable" distributions) within a generation or two of the CI's emergence. There is also good early evidence that CIs will help us discover better collective solutions in governance, globalization, environment, security, health, and productivity, among other domains, and allow us to extract insight and knowledge from all the burgeoning data being collected in our increasingly transparent and quantified society. The post-CI world will be an amazing era to be alive, even while it is an era that is still several decades away from the so-called 'technological singularity,' the arrival of generally human-surpassing artificial intelligence in our most advanced computing machines. A functional CI network will not entail significant machine self-awareness, but is a transitional stage of advanced natural language processing (NLP), a field that deserves far greater funding and attention than it attracts today. NLP advances will combine with critically-needed improvements in bandwidth of connectivity and the hardware and software of simulations, so that our CI devices and humanlike agents will "talk" both to each other and to us, using data-rich semantic protocols, with grammars, vocabularies and expressions that are continually 'tuned and pruned' by the daily interaction of hundreds of millions, then billions of humans with the system.
[2009 Note: Most importantly, we need to recognize that NLP advance is driven 95% by the availability of good, high-quality, semantically parsable data on the web, and the hardware to store it and networks to serve it, and only 5% by algorithm, software, or computational platform. As with many other complex systems transitions we can point to (city-state emergence, the scientific method, industrialization, automation, new company creation, etc.) the CI will be primarily a bottom-up, data-driven emergence, and only slightly a top-down, engineered emergence. David Levy in Robots Unlimited, 2006, quotes AI expert Yorick Wilks as saying "Artificial Intelligence is a little software and a lot of data." Levy observes that this also means a lot of computer memory to store and networks to serve this data. Google Researchers Alon Halevy, Peter Norvig, and Fernando Pereira articulate a data-driven strategy for building the conversational interface eloquently in The Unreasonable Effectiveness of Data, IEEE Intelligent Systems, 2009. I believe this paper should be required reading for every computer science student, hardware engineer, or technology management exec interested in understanding how the semantic web is actually emerging, not what we think we are doing to make it emerge. For additional insights on architectural approaches to the next generation CI, read Joe Colannino's 2009 Master's thesis (and slides) on using Statistically Improbable Phrases to automate semantic ontology creation. Joe says: "Statistically improbable phrases give, on average, ten times less information overload and more than double the confidence of retrieving relevant data as compared to Boolean keyword searching. If the Semantic Web is to become a reality, it will require automated algorithms such as TF x IDF operating on oodles of data."]

What will a CI-enabled browser of 2015-2020 look like?
For one thing, it seems clear that it will have some very sophisticated software simulations of human beings as part of the interface. Already today (2004), first world culture finally spends more on video games than movies, and this will apparently be a permanent feature of our world from this point forward. These "interactive motion pictures" are more compelling and educating, particularly to our youth, the fastest learning segment of our society, than any linear scripts, no matter how professionally produced. [2009 Note: It will also be very visual. Apparently youth under 12 in a few demographics now use YouTube, not Google, as their primary search engine, so they are visual first, text second.] Imagine that we have begun talking to our computers in a crude but useful verbal exchange, a kind of 'pidgin grammar' circa 2015. It is clear that we will not simply want to talk to a disembodied machine. While some of us will be happy with simple graphic indicators telling us whether the machine understands us, many more of us will want to relate to virtual human beings, embodied agents that have the ability to nonverbally communicate, to frown or place their hand on their chin until they understand what we are telling them to do, to react word by word with realistic microexpressions to our statements and questions, to smile when they detect we are smiling at their jokes, to act in a calm and relaxing manner when they detect we are upset, to speak more rapidly when we are bored or hurried, etc. Why? Because having a nonverbal quasiemotional communication channel in parallel with linguistic communication makes our words more efficient and effective. We may not want this for low priority and multi-tasking communication, but we surely will whenever quality and accuracy and enjoyment and perhaps even speed are important. Thus we can see how, over time, we will want our CI-equipped virtual avatars to learn to model and display human emotion and body language.
[2010 Note: If you are unconvinced by the above arguments, perhaps the following will help. I think we will want these avatars for the same reason humans spend so much more money on biotechnology and medicine than is warranted in comparison to other industry classes. Biotech is an industry class that has consistently lost money since its inception in the 1970s, unlike any other investment class. It is an industry based on hope, specifically on the hope for improvements and vital longevity for our physical selves. We will want these avatars for the reason we are so obsessed with humanoid robotics, and why you can buy dresses for Roombas, even though the best robots today don't look humanoid at all. We'll want avatars for the same reason Webkinz, digitally animated stuffed animals, are so phenomenally successful with young kids. We want to relate to our technology. We want our mirror neurons to fire, we want our empathy activated, we want to feel like our technology is humanizing, improving its behavior, becoming more intimately connected to us, eventually, even becoming us, becoming our extended digital self.] Now given computer technology's far greater rates of innate learning by comparison to slow-switching biological information processors like us (on the order of ten millionfold greater, see Chaisson, Cosmic Evolution, 2001), they will adapt to us, not the other way round, and continue to improve their usefulness at a rate that seems uncannily fast.

The Cybertwin and the Valuecosm: Your Emerging Digital Self

It seems clear to me that our cybertwins, our emerging digital agents, which will contain increasingly advanced public and private context-sensitive preference maps of our values, will eventually become our best filters of and interfaces to the growing complexity of online environments. Why?
Because that's what we want--a highly personalized extension of our own memories and desires, that will increasingly represent and motivate us, and that can increasingly act in more uniquely differentiated and creative ways than we can in our slow, biological physical space. Technology futurist George Gilder talked eloquently about the Microcosm, the explosion/new environment/universe of inexpensive microprocessing power, which began in the 1960's, and ushered in the personal computer. Then he talked about the Telecosm, the explosion of inexpensive telecommunications via fiber optics and network technologies, which began in the late 1980's and ushered in a new era of globalization. Futurist Bruce Sterling, in Shaping Things, and technologists Chris Stakutis and John G. Webster in Inescapable Data, have each talked about the Datacosm, the explosion of unstructured data on the web, which began in the late 1990's and has led us to fantastic new automated structuring tools like Google, and new data mining and competitive intelligence platforms. Beginning in the early 2000's I began thinking about next steps in this hierarchy, and became interested in something that for want of a better term I call the Valuecosm, the explosion of structured public and private maps of human preferences and values. We can think of the valuecosm as an element of the Semantic Web, that eloquent vision of Tim Berners-Lee, but focused most specifically on human values and preferences in a broad variety of contexts, and graph theory models comparing those values quantitatively and qualitatively to others in the values space. 
In concert with cybertwins as our interface to the digital world, the emerging valuecosm will help us grow avatars that act and transact progressively better for us every day, will lead us to dramatically better discovery of potential positive-sum social interactions, to better and more distributed social network media and education, to great new subcultural diversity, and ultimately, to new ways to hold powerful actors accountable to democratic values. In this way, as our cybertwins begin to approach human level sophistication later this century, we will use them to look after our values and advise us on our votes, purchases, and collaborative behaviors ever more powerfully, and thereby usher in a new level of global accountability of corporations, institutions, governments, and other large actors to human rights and democratic values. This will be the first generation of an era of total systems quantification, of both abstract and concrete issues of human value, to use futurist Alvis Brigis's excellent phrase, and perhaps the first advanced version of the digital democracy vision. See The Valuecosm, 2004 for more of these longer-term arguments, if interested. Once we have reasonably good values maps on the web, and reasonably advanced cybertwins, able to scour the web for us while we are asleep, to act as our message and media screener and butler while we are awake, etc., imagine the positive implications for:
As I've argued with my tongue-in-cheek Fourth Law of Technology, we must also expect, and try in advance to minimize, all kinds of first-generation problems with these technologies. Consider for example some of the first-gen downsides and concerns cybertwins and the valuecosm might bring to:
Getting past the dehumanizing effects of these disruptive technologies that are inevitable in their first generation, moving on to the neutral effects of the second and finally the positive effects of the third generation and beyond, will be major challenges for designers, early adopters, critics, investors, entrepreneurs, politicians, and the other key players in our multifaceted society. Using Pareto's Law, I would predict that 20% of us will end up using cybertwins and the valuecosm for personal empowerment, to take us to amazing new levels of innovation, and to just-as-amazing new levels of collective ethics and sustainability. That 20% will use these tools to be measurably better and more self-empowered than our parents were, on all the measures that matter to us. The other 80% of us may choose to use these tools for new levels of fantasy, entertainment, distraction, and domestication. I don't think we have to worry so much about that, as long as we keep our citizens away from the worst of the new addictions and dependencies, and as long as those uses don't cause structural violence, as the eminent futurist Johan Galtung defined this brilliant meme. In other words, as long as the 20% of folks who get the 80% of the work done in any society (Pareto's "Vital Few") are significantly empowered by these platforms, everyone else can take a long-deserved rest from millennia of toil, brutality, and hardship, for as long as they damned well want in fact. No matter how you feel about it, if you think about the meaning and likelihood of accelerating technological change, you may conclude as I have that the leisure society that emerged in the 20th century, so well articulated by futurist Herman Kahn in the 1960's, will continue its inexorable advance to new heights of comfort and domestication in the 21st century. We are talking here about biological humans, of course.
What happens with technologically-augmented humans may be another story entirely, but that's a story for another generation, not ours.

Predicting the CI Emergence: S-Curves, Codebreaking, and the Nexus One

When can we expect the CI's emergence? In March 2005 Google's director of search Peter Norvig noted that their average query is now about 2.5 words, by comparison to 1.3 on Alta Vista in its heyday, circa 1998. In subsequent email conversation he told me that the actual number is "closer to 2.6 or 2.7." This is an initial doubling time of only seven years, if this is a quasiexponential function. It appears that the growth of the CI as a complex adaptive technological system is in the early phase of an S-curve, well before the inflection point, and thus its growth will continue to look exponential for some time to come. [September 2008 Update: Average query length to Google now exceeds 4 words, apparently just this month. This is more early evidence that this phase of search query length growth will remain exponential up to the inflection point.] In my opinion this average search query length, averaged across all the leading search engines of the day (Google, Yahoo!, Bing, etc.) will be one of the key numbers to watch to gauge the growing effectiveness of statistical natural language processing (statistical NLP) in creating a conversational front end for the internet and all our other complex technologies in the 21st century. A slide from one of my technology foresight presentations, below, attempts to summarize this point:
How long might the emergence of a full-fledged CI take? That's a big guess today, but given that it has taken approximately seven years to double from 1.3 to 2.6 words per average query, we should expect another seven years, circa 2012, to get us to 5.2 words, a period I suspect would be just prior to emergent grammars and the feeling of CI intelligence (something "semismart" on the other end of the line) on the part of the user. If these proposals and qualifiers are approximately correct, this would place the intelligent CI's emergence circa 2015-2020. [2008 Note: One could even argue that our very first generation (slow and very limited) CI has finally emerged with Google Voice on the iPhone in 2008.]
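The arithmetic behind this extrapolation can be sketched in a few lines of Python. This is only an illustration of the projection described above, not a forecast model: it takes the essay's two data points (about 1.3 words per query in 1998 and about 2.6 in 2005) and assumes, as the text does, that the seven-year doubling simply continues. The function and constant names are hypothetical.

```python
# Data points quoted in the essay: ~1.3 words/query (Alta Vista, circa 1998)
# and ~2.6 words/query (Google, 2005), i.e. one doubling in seven years.
BASE_YEAR = 1998
BASE_LENGTH = 1.3
DOUBLING_YEARS = 7.0


def projected_query_length(year: float) -> float:
    """Average query length if the seven-year doubling continued unchanged.

    This is the essay's working assumption (pure exponential growth in the
    early part of an S-curve), not an established empirical law.
    """
    return BASE_LENGTH * 2 ** ((year - BASE_YEAR) / DOUBLING_YEARS)


if __name__ == "__main__":
    for year in (1998, 2005, 2012, 2019):
        print(year, round(projected_query_length(year), 1))
```

Under that assumption the projection gives 5.2 words in 2012 and 10.4 words in 2019. Any saturation of the curve before then would pull the later values down.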
[A brief pitch for open platforms and innovation: Please buy the Nexus One today instead of the iPhone (even though Siri is not yet available on it as of March 2010) if you care strongly about accelerating open innovation globally. As much as many of us love Apple, we must recognize that as they have grown in size they have been increasingly following a closed-innovation, walled-garden pathway. That strategy will delay the emergence of true (network neutral, open access) internet television and other critically necessary aspects of our coming peer-to-peer culture, a culture that will most directly benefit the bottom three billion human beings who desperately, desperately need to be plugged into our 21st century online collective intelligence, educational systems, and economy. Google's present strategy is much more aligned with accelerating global access to free online resources, and as long as they are more open and more innovative they deserve your discretionary dollars and support.] In my cursory research to date, English speakers use an average of 8-14 words per written sentence and 5-11 words per spoken sentence (depending on context) when we ask each other complex questions. I would expect that as soon as our search tools get up over eight words a sentence, if not before, we'll start to see and expect emergent "pidgin" grammars in our computer's responses. Since voice recognition and text to speech are already largely solved (they are far easier problems to solve than NLP), we'll be speaking and listening to those sentences. At that point, we'll begin to feel like our computers have a primitive conversational intelligence. [Nov 2008 Note: Google announced a voice recognition app for the iPhone this month. A NYT article on Google's Voice Technology says their statistical model (unique words and some of the ways they can be strung together to ask common questions) is composed of two trillion tokens (unique words and word combinations).
Having this application available, even if it is used only for simple queries for the next few years, should easily push the average queries past four words, as it is far easier for the average person to speak a longer sentence than to type one. And Google Search Wiki, if it is used extensively, promises another advance toward building a statistical map of words that should be strung together to answer human questions. Google is starting with just a personalized version of your search results, but they will clearly eventually release collectively aggregated and friend-aggregated versions as well.] [2005 Note: Already today, when you use "near" in a sentence on Google, as in: "Coffee shops near Palo Alto", which returns a Google Map (yay, Google's now got an optical cortex!) of yellow pins, all distributed around Palo Alto's city center, you are using a query length of four or five words. But this doesn't yet feel like conversational intelligence.] That's where I suspect we'll be in 2012, a lot of folks using a lot of simple verbal operators (time, distance, etc.) but probably still mostly by keyboard [2008 update: With the release of Google's app for the iPhone this year, it seems possible that voice queries might begin to rival keyboard queries by 2012, as there are so many more phone users than computer users. I'd love to see an internal projection for that.], and still mostly thinking of the internet as a (wonderful, but not very intelligent) information appliance. Now step forward another seven years, to 2019, and in my estimation we will likely have doubled our search length yet again, to just over ten words per average query. Somewhere between 2012 and 2019 I expect we'll see voice recognition queries (many of them mobile) begin to compete with keyed entry, and a whole new level of sophistication (and user feedback/ranking/rating) of the average queries.
This time, I think we have enough new functionality to create a "step function" in user experience, where the web no longer feels like just an information appliance, it now feels like a partner, a crude extension of our linguistic ability. That's what I would call the end of the Information Age and the start of the Symbiotic Age. From that point forward many of us will begin to feel naked and somewhat stupid out in public without the web, the way we'd feel out in public without our clothes today. What is our evidence that this query length doubling will continue? It's weak today, but I think still worth watching closely. First, consider that seven years per doubling would be just over the six year doubling in average software productivity or general algorithmic efficiency quoted by Bill Joy and others in the IT industry as a rough "Moore's law for software." But doubled algorithmic power or efficiency alone would be unlikely to translate to doubled query lengths. We will need a lot more insight before we can make this claim. My main intuition in this regard rests on three observations. First, the entire human conversation space (the space of most useful human conversations, regardless of context), while still very much larger than our digital record of it today, is becoming an effectively 'closed' (slow growing, nearing saturation) phase space, what the physicists call 'ergodic'. Second, the encoding of human conversation in easily spiderable form is doubling in volume at an enormous rate by comparison (roughly every two years, since the start of the web). Third, our ability to rank the relative value of those encodings is also steadily improving (Google's PageRank, Web 2.0, 3.0, etc.).
In other words, I suspect that the human conversation phase space, in all languages, and all digital forms (web publishing, email, audio, video, chat, and other searchable conversations) while still growing slowly in novelty, is becoming effectively, approximately, or statistically computationally closed relative to the rapid mapping of this space by technological intelligence. If this is so, all the most useful and functional thoughts/ideas/sentences in the human mind space are increasingly frequently revisited by technological indexes. The 'map' grows only slowly relative to the 'mapmaking,' which gets finer-grained every year. If this is true then the hard problem of serving up a useful, semi-intelligent natural language response to an online query is very much like codebreaking, a cryptographic problem that involves finding the set of primers, or translation elements, that are repeatedly used to transform one set of information into another. For the CI, this is the transformation of the world wide web of digitally encoded symbols of use to human beings, into another, the most useful linguistic responses to queries about common human problems. At the same time, new intelligence/information emerges through the associations made during the translation. Codebreaking, like many natural growth processes, follows a logistic curve (an S-curve) in performance over time. Early in the process (the first 'flat' part of the S-curve) it's hard to get the primers. Then you enter into a positive feedback situation where you are getting the primers for the most used words (the 'fat head' of the Zipf's law distribution) and that makes it easy to decode the other high-frequency words. Then you hit the inflection point, having gotten most of the easy words, and you start chasing after increasingly less used words, with less reliable associations to other words, and you've hit the phase of declining returns (the 'top' of the S-curve, saturation, system 'senescence' in performance). 
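The three phases described above (slow start, rapid middle, saturation) are the signature of a logistic function. As a rough illustration, the Python sketch below compares a standard logistic curve with the pure exponential it resembles well before its inflection point. The parameter values (`CEILING`, `RATE`, `MIDPOINT`) are hypothetical, chosen only to make the shape visible, and are not fitted to any search or codebreaking data.

```python
import math


def logistic(t: float, ceiling: float, rate: float, midpoint: float) -> float:
    """Standard logistic (S-curve): slow start, fast middle, saturation at `ceiling`."""
    return ceiling / (1.0 + math.exp(-rate * (t - midpoint)))


def early_exponential(t: float, ceiling: float, rate: float, midpoint: float) -> float:
    """Pure exponential that the logistic approximates when t is well below `midpoint`."""
    return ceiling * math.exp(rate * (t - midpoint))


# Hypothetical parameters, for illustration only.
CEILING, RATE, MIDPOINT = 100.0, 1.0, 10.0

if __name__ == "__main__":
    for t in (2, 4, 6, 8, 10, 12, 14, 16):
        s = logistic(t, CEILING, RATE, MIDPOINT)
        e = early_exponential(t, CEILING, RATE, MIDPOINT)
        print(f"t={t:2d}  logistic={s:8.3f}  exponential={e:10.3f}")
```

Early on the two columns are nearly identical, with a doubling time of `ln(2) / RATE`. At the inflection point (`t == MIDPOINT`) the logistic has reached half its ceiling while the exponential keeps climbing, and after that the logistic flattens toward saturation.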
But in the first phase of growth, before reaching the inflection point, the performance growth is roughly exponential, with an average doubling time. In the CI's case, that doubling time appears to be seven years. I suspect the saturation point in query length will come at some sentence length that is slightly longer than the average human-to-human query length in spoken sentences. I suggest this because when you query Google, it is often to your advantage, even when asking technical questions today, to include additional words beyond those you'd normally ask any human in natural conversation. You use those specialized words with Google because you suspect (and it is increasingly true) that Google knows everything, unlike the average human. Let me again go on record proposing that the conversational interface will be the single most important technological innovation the average person alive today will witness in their lifetimes, out of a very long list of competing innovations, like personal computers, the internet, automated supply chains, credit cards, ATMs and cell phones. The CI makes both the greatest wisdom of the species and its lowest common denominator distractions perennially accessible to all of us. It will surely be greatly misused in its early stages, but in the long run it will allow what we say, and hear, to bring us to a whole new level of conscious insight about ourselves and the world. If 2020 is our expected transition point, as has seemed most likely to me since first thinking about this issue circa 2000, then CIs are a bit farther off than some of our most optimistic technology futurists would today have us believe. Yet they are also much sooner arriving than the naysayers predict, those who tell us NLP is riddled with near-insoluble problems, and who don't understand how far we've advanced already with simple statistically based systems. I have heard that Google, for example, has won U.S. NIST's automated language translation competition, over IBM's and others' ontological and mixed systems, by using a relatively simple statistical NLP approach (tied of course to a large and ever-growing online corpus) for at least two years in a row (2005 and 2006). With leadership, luck, resolve, and exponentially more powerful computing, text analytics, and communications platforms, we might even be able to accelerate CI development to occur even earlier than the 2020 ETA. In addition to broadband and wireless access for everyone, I suggest that may be one of the noblest challenges of our generation, in fact. It seems very likely that we will all soon widely recognize this as the Next Great Leap after the internet. Read on, and let us know if you agree. If you want more speculation on this topic (declining marginal returns, I am sure) which compares the CI to a network that came before it, you may also enjoy Promontory Point Revisited: The Transcontinental Railroad and the Conversational Interface.
A Question of Priorities in a World of Accelerating Computation Annually for the last six years, John Brockman's provocative World Question Center at Edge.org has posed an interesting question to roughly 100 edge-thinkers who are committed to integrating both scientific and humanist perspectives on the world. In 2002, they answered a fictional letter from then-President G.W. Bush which asked each of them, as the administration's new science advisor, "What are the pressing scientific issues for the nation and the world, and what is your advice on how I can begin to deal with them?" As a developmental futurist, one who expects that a special subset of future events are statistically inevitable and highly predictable, I drafted my own unsolicited response, which I have pasted below.
Clearly the keyboard is a primitive, first-generation interface to our personal computational machines. It gives us information, but not symbiosis. We humans don't twiddle our fingers at each other when we exchange information. We primarily talk, and use a rich repertoire of emotional and body language in simultaneous, noninterfering channels. We also use our hands to help each other and to manipulate objects ever since the first stone was thrown by the first hominid, so it is also clear that keyboards won't disappear until the human form itself disappears. In other words, talking is the highest, most natural, and most inclusive form of human communication, and soon our computers everywhere will allow us to interface with them in this new computational domain. I suggest that achieving this emergence will be the greatest of all the technological "Moon Shots" of our own brief time here on Earth, whether we presently realize it or not. The CI Network: A National Priority for Our Generation Science and technology, and broadly, local computation, appear to be asymptotically accelerating, universally-driven phenomena. We are now coming to understand that humanity does not control this developmental process, but rather selectively catalyzes it, ideally with ever increasing social, organizational, and personal foresight. The entire 20th century demonstrated an astounding, unrelenting, unprecedented double exponential growth in the price performance of our computational machines. At the same time, we have seen new levels of computational autonomy, or human-independence emerge, wherein a rapidly diminishing fraction of human effort is required to produce any fixed amount of computational complexity within each new computing system. These are apparently universal developmental trends, not architected by human design or even desire. 
For the last decade at least, increasingly evolutionary and biologically inspired forms of computation have become the leading edge of technological development, and will remain so for the foreseeable future. Referring to the difficulty of technology prediction, Bill Gates reportedly once said "find me the person who predicted the internet, and we'll make him king." This is congruent with a common myth that futurists missed this major development, and certainly many did. The first major "think tank" long range public futures project of the postwar era, The Year 2000: A Framework for Speculation on the Next Thirty-Three Years, Herman Kahn and Anthony Wiener, 1967, certainly missed the decentralization trend, though they did see computing continuing to accelerate. But that only shows the riskiness of relying on one forecasting group to understand the future. Every community has its own biases. Among the global community there were numerous visionaries who foresaw various pieces of the internet long before it emerged. In 1937, H.G. Wells in "World Brain," articulated the developmental inevitability of a rapidly updating compendium of total world knowledge. In 1945, in "As We May Think," Vannevar Bush proposed the Memex, a proto-hypertext microfiche network that would organize and distribute the world's knowledge, and noted "The advanced arithmetical machines of the future will be electrical in nature, and they will perform at 100 times present [electromechanical relay computer] speeds." In 1946, just one year into the modern television era, Will F. Jenkins (aka Murray Leinster) in "A Logic Named Joe," predicted "logics," televisions with attached keyboards that were networked by a switching innovation called the "Carson Circuit," that would be used to watch TV, make video phone calls, send and receive telegraphic messages (email), get weather reports, ask research questions, keep books, trade stocks, and play games. Sound like the internet to you? Sure does to me. 
The emergence of personal computers was repeatedly predicted by journalists and commentators in the 1950's, and was a longtime goal of electronic hobbyists, who were making successively more complicated home built electronic systems. Peter Drucker predicted our 1980's economic shift to the information economy in the revolutionary Age of Discontinuity, 1968. Alvin Toffler expanded on this and the coming network of "electronic cottages" we would see in the 1990's in The Third Wave, 1980. In other words, there were many harbingers of the internet for those willing to look, and for those who realized that trends in miniaturization, computing, and communication would have to continue to accelerate. Borrowing from the biological language of evolution and development, these particular trends weren't just evolutionary choices; they were developmental forces that the universe (our extended environment) was imposing on modern society. Because most change that occurs in the universe is evolutionary, I believe prediction is generally quite difficult, particularly for those who don't discriminate between evolutionary and developmental dynamics. But developmental processes, when they can be discerned, are surprisingly easy to predict. In the language of complexity studies, they are 'standard attractors', like the hole at the bottom of a basin, or fitness landscape. You cannot predict exactly how the "marble" (the system, evolution, us) is going to get to the bottom of the basin; that is an evolutionary uncertainty. But you know all the evolutionary marbles in the system go through one of the few developmental "holes" available. To recap, there appear to be two fundamental processes of change at work in all universal systems: evolution and development.
The coming CI appears to be, as far as I can determine, a developmental emergence, and we can even measure its progress, speculate on its enabling and inhibiting factors, and even predict its arrival from past progress, should we choose to do so. The CI network will not replace the keyboard, as some futurists have incorrectly claimed. For those who today have the education and resources to learn them, keyboards are powerful extensions of human will into the physical space. They will continue to increase in prevalence and sophistication, and will be with us as long as we continue to have biological bodies with ten fingers. Nevertheless, at the same time we can expect that most human computer interaction will move beyond the keyboard, and the ease, power, universality, and sophistication of the CI network will make all our technologies embodied, egalitarian, and symbiotic as never before. In an evolutionary developmental universe, many evolutionary paths are within our control, challenging us to be good stewards and navigators, but some developmental destinations, such as accelerating local computation, are apparently not, challenging us to be good cartographers, prioritizers, and students of physical dynamics. The study of this phenomenon of continual acceleration, also known as acceleration or singularity studies, is in need of much greater scientific attention. We will almost certainly see the CI network's emergence within our own lifetime. Perhaps the most important remaining questions are how soon, how balanced, and how humanizing the path we take toward it will be. Fifty years ago, the advent of digital computers moved us from the Industrial Age to the Information Age. But the Information Age is now getting on in years, and we will soon need a new phrase to capture the meaning of a coming environment where the average human interaction with the average computer is not via keyboard, but by voice.
I and others have suggested that the Symbiotic Age is the most appropriate term for this coming era, as it will describe the dominant zeitgeist of the experience: a time when human beings finally feel both significantly empowered by and inseparably connected to their technological infrastructure. It will be a time when anyone on the planet who is comfortable with talking will be cheaply and intuitively connected to the machines around them, when we will start thinking of and talking to our machines as physical entities, flexible to our needs, when complaints and compliments that we have will be relayed to appropriate parties, and when the user's vocalizations will be an integral, and eventually dominant, part of the utility of our technology. Circa 2015-2025, several forecasters expect our natural language processing (most difficult), bandwidth accessibility (less difficult), human simulation and language translation software (even less difficult), and voice recognition software (already here) to each be finally sufficiently powerful, affordable, and ubiquitous that a new type of interface will emerge. Around this time, the majority of human-computer interaction on our global computer network, however we choose to measure it, will shift to a new level of sophistication. That new level will be the move from our present simple keyboard- and mouse-driven, primarily graphical user interface (GUI, or "gooey") to an interface that is still graphically based, and still makes sophisticated use of the mouse and keyboard (including virtual keyboards), but is primarily a conversational interface (CI). A somewhat different definition of the CI's arrival involves that point where the majority of code and hardware behind the average computational interface is designed to interpret human language and intent.
This is the most difficult interface problem presently known, more difficult even than constructing realistic virtual graphical environments, where great and accelerating commercial success (e.g., video games) has occurred over the last decade. CI-era network machines, tools, and services, including educational services, will not simply exist to serve us web databases and graphics, as they do today, but to empower a vast range of intelligent, linguistically guided human-computer interactions, in an organic technological environment where the average human command to the network is delivered verbally, not physically (as via punch card, keyboard, mouse or other physical input device). Think of the opportunities for human development! There are so many new skills, empowerments, services, and products that will evolve from this new capacity that we may rightly consider its full benefit difficult to imagine.
In the ubiquitous, mass-affordable CI environment circa 2020, our cellphones, computers, buildings, tools, and websites will finally achieve John Sculley's 1980's "Knowledge Navigator" vision, becoming symbiotic semi-intelligent agents that do ever more helpful intellectual tasks for us in the networked world. The continued development of better "top-down" computing standards, such as Tim Berners-Lee's and the W3C's semantic web, will be a part of this process. But the major part is likely to be evolutionary developmental and "bottom-up," like Microsoft's MindNet project, involving the integration of ever-smarter artificial neural networks, or their biologically-inspired equivalents, into the "back end" systems running all the tools and technologies we use. Even today, users of early CI systems (directory assistance, flight reservations, etc.) increasingly look forward to each hassle-free upgrade of the back end. Compare this with the mixed feelings we have toward user-guided upgrade processes, and the emerging human-machine symbiosis becomes tangible. We will all play a part, unwitting or not, as this drama unfolds in these final years of "unnatural interface."
On Phase Change Singularities: The Nature of CI Emergence

Circa 2020, we may expect a highly useful set of CI-equipped interfaces, built on top of an increasingly parallel but still weakly biologically-inspired set of computer architectures. The CI seems a necessary prerequisite to high-level machine intelligence. Therefore, understanding and measuring the process of CI emergence may give us insight into the dynamics of the technological singularity (generally human-surpassing machine intelligence) to follow. Why can't some gifted and motivated individual, or perhaps a massive team of individuals (say, Microsoft Research) create an adequate CI using mostly top-down rationally guided design, working in relative isolation from the rest of the communications activity of the planet? Such systems have been tried many times before, and they have predictably been far less valuable than their designers expected. Instead, a much more distributed system transition may be necessary for the CI to emerge. How must the "Symbiotic Era" of the CI emerge? I'd expect through a massively distributed computational and information storage system that records and analyzes the entire human conversation and behavior space, in all the major spheres of human interest and experience. Furthermore, this system must first conduct multifold creative evolutionary experiments to attempt to construct meaning from this conversation, and in this process a small developmental subset of highly useful natural language processing systems will be created. The emergence of these systems will not be sudden or isolated, but incremental and global, as they are guided and pruned over many years of continuous human conversation with them all across the planet. How far away are we from being able to create the next generation of such a system? I suggest you watch the development of the current generation, the planetary internet, to search for signs of the CI's emergence.
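One such sign can be made measurable: the share of search queries that repeat within a recent window, which is what determines whether a query cache is worth keeping. The following is a minimal sketch of that measurement, not any search engine's actual method. The function name `repeat_rate`, the `(day, text)` log format, and the case and whitespace normalization are all assumptions made for illustration.

```python
def repeat_rate(log, window_days=30):
    """Return the fraction of queries already seen within `window_days`.

    `log` is a list of (day, text) pairs in chronological order, where
    `day` is an integer day number. This format is hypothetical.
    """
    last_seen = {}  # normalized query -> most recent day it was issued
    hits = 0
    for day, text in log:
        # Crude normalization (an assumption): lowercase, collapse whitespace.
        key = " ".join(text.lower().split())
        previous = last_seen.get(key)
        if previous is not None and day - previous <= window_days:
            hits += 1  # a windowed cache could have answered this query
        # A real cache would also evict stale entries; this sketch keeps
        # every key and only checks the age of the last occurrence.
        last_seen[key] = day
    return hits / len(log) if log else 0.0


if __name__ == "__main__":
    sample = [
        (0, "weather boston"),
        (1, "Weather  Boston"),
        (2, "cheap flights"),
        (40, "weather boston"),
        (41, "cheap flights"),
    ]
    print("30-day repeat rate:", repeat_rate(sample))
    print("60-day repeat rate:", repeat_rate(sample, window_days=60))
```

A repeat rate near zero corresponds to a query stream that is mostly novel. A rate that climbs as the window and the user base grow is the kind of signal to look for.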
Today, roughly seventy percent of the 200 million daily verbal queries that Google (the most popular search engine on the planet) receives are novel. I have heard that Urs Hölzle at Google wrote a thirty day user query cache circa 2000 but it was not useful, much to the surprise of the company. Too many of the queries at the time were new and unpredictable to the system. When leading search engines begin to cache and do natural language processing on their user queries because most of them are repeating (circa 2015? 2020?), we will know that the human linguistic space has started to become ergodic (a well-explored and frequently repeating phase space). Soon the entire human preference set, as expressed in written language, will be, to a first approximation, cataloged and monitored in real-time by our distributed network of computing machines. This is a form of effective computational closure, a necessary precondition for a phase transition singularity (the CI emergence) to occur. Presently there are at least two problems preventing this evolutionary developmental emergence. The first is that there is not enough memory available to the cache (not simply the last thirty days, but something approaching the entire written history of human inquiry needs to be cacheable by our technological systems, something we can't expect for another decade or two). The second is that there are probably not yet enough global users on the system. Google's 200 million queries/day in 2003 were generated by only a few hundred million regular computer users. As responsible globalization advocates remind us, it is probably safe to say that these users are not yet sufficiently representative of the full interests and inquiries of the six billion people presently on the planet. You may have heard that Microsoft has recently (2004) launched a major new search software development initiative.
This will be critical to the long run economic success of the company, because verbally-driven search is the first generation conversation of humans with machines. It is within the search space that the intelligent internet, and the next generation CI-based operating systems will emerge. Windows 2020 (perhaps better renamed Conversations 2020) will have to be built on such a platform, or Google-like systems will outcompete them for average human use. Google is becoming a truly unique distributed data processing platform ("GooOS"), and may well in coming years encode a full-featured operating system as an afterthought, rebuilding Windows functionality in a linguistically-driven Google language. Microsoft will have to match Google's distributed CI-based functionality in coming years. To not do so would be to risk being late to the next major reinvention of the planetary computing platform. As futurists Mark Finnern and Wayne Radinsky both note, Rich Skrenta's excellent post "The Secret Source of Google's Power," relates that Google's competitive advantage springs from the features of its hardware network. It is developing a distributed computing platform that, in 2004, "can manage web-scale datasets on 100,000 node server clusters. It includes a petabyte, distributed, fault tolerant filesystem, distributed RPC code, perhaps including network shared memory and process migration. And a datacenter management system which lets a handful of ops engineers effectively run 100,000 servers." That's some impressive automation. Another way to understand the emergence of the internet-based CI is to explore the historical developmental phases of web search technology:

1) The first wave of web search was created by cheap disk drives (Altavista won this war).
2) The second wave has been created by cheap CPU's and Beowulf cluster networks (Google won this war).
3) The third wave, cheap RAM, may be the next inevitable emergence.
Cheap RAM will make both massive fast caches and new, far more complex CI-based algorithms possible. Who will win the third wave? That is an open question, at present. All of this is not to suggest that human learning will decrease as we approach and enter the CI-enabled Symbiotic Era. To the contrary, our learning will clearly be shifted to a whole new level as all kinds of new collaborative and creative opportunities emerge in CI-driven real and virtual space. I expect the CI to enable our currently laughable digital avatars (our digital persona, or "digital me," DM) to become both increasingly accurate reflections of the sum of our aspirations (e.g., Lifelogs) and increasingly effective coaches and counselors of our higher selves. Post 2020, I expect my DM will begin doing things in virtual space that are amazing by comparison to what I am doing in physical space. For a fun fictional account of the Symbiotic Era, see my teen-oriented essay "Future Heroes 2035: My Friends and I." It helps to realize that the biological portion of human species activity is sharply limited by the fixed number of us (6 billion) and fixed speed (200 miles/hour) of communication within biological brains. From the perspective of computers that are growing their capacity exponentially, and learning and recording a million times faster than our own brains do (on a range of measures), the entire human phase space appears essentially frozen in spacetime. Capture, closure, and convergence between humans and our digital extensions are the dominant features we can expect, from our perspective. (Things look much more exploratory and dynamic from the perspective of the machines.) In the Symbiotic Era, a time when higher machine intelligence can exist only as a first-level reflection of human aspirations, the most important feature, for planetary intelligence, will be the "Intelligence Amplification" (IA) that increasingly powerful, CI-equipped systems provide to humans who are using them everywhere.
I'm presently assuming this era will comprise 30 years, from 2020-2050. That would place the arrival of the next singularity, the Autonomy Era, circa 2050. The latter era would involve the emergence of initially simplistic but eventually strongly biologically-inspired self-improving systems. Because I think such systems must gain their self-awareness through a process of personality capture and co-evolution with human beings, I expect they too will require a nontrivial length of time, in human years, to develop truly complex personalities. This might take perhaps ten human intelligence years, though this would represent a far longer stretch of time in machine intelligence years. That progression would fit with a circa 2060 technological singularity (a developmental "phase change" involving the emergence of human surpassing technological intelligence). Intelligence amplification (I.A.) systems like the CI network are highly collectivist in their construction, and will be tested and refined by an entire planet's worth of users. They are also highly symbiotic, and will engage in extensive profiling, simulation, human factors internalization, and "personality capture" of their users' behaviors, habits, goals, limits, and rational and emotive states. As our second-generation, post-2020 CI systems increasingly use personality capture techniques to build sophisticated world-models of their users, we may expect that an increasing number of human beings will give their semi-intelligent agents access to and influence over their mind states in an always reversible but progressively more intimate manner. Most accurately, this should be considered a true first-generation form of "uploading," predating the more intensive and invasive forms of uploading that will occur in subsequent iterations of the symbiosis. Will you choose to let your 2030 machines cheer you up, advise you on your interaction style, or tell you when to take a work break? 
Today there are robotic toys (AIBO, smart dolls) that already program their users to provide specific emotional responses. The tremendous utility, comfort, and productivity of CI's utilizing personality capture and more specialized tools such as knowledge management (a first-generation electronic forebrain) will compellingly demonstrate to millions of modern skeptics that the developmental destination of human-machine interaction is not some dystopian scenario of computer domination or isolation, but instead an increasingly seamless and symbiotic convergence.

After the Symbiotic Age: Speculations on Autonomy and Beyond

It may be approximately and usefully true, as hierarchical acceleration models like the developmental spiral propose, that the Scientific Age lasted for 380 years (1490-1770), followed by an Industrial Age for 180 years (1770-1950), and that today's Information Age will last for only 70 years (1950-2020). If so, then we can expect the coming Symbiotic Age, driven by CI network advances in natural language processing, connectionist architectures, bandwidth, simulation, and general hardware and software development, to last approximately 30 years (2020-2050), further continuing the relentless STEM (space, time, energy and matter) compression of computation and physical transformation that appears to characterize a coming developmental singularity, should the structure of spacetime allow this continued acceleration to the Planck scale. This future history roughly argues that the Symbiotic Age may usher in a greater amount of scientific, technological, and socioeconomic change than that seen in all previous human eras combined. Or it may not, if the total level of change begins to saturate once it reaches a threshold of local complexity (a separate topic, perhaps best reserved for a later discussion).
More obvious is that each era, whatever its total contribution to the change function, seems to run less than half the length of the previous one, representing a true asymptotic function. There are at least two subtle points we might make here. First, note the value of discriminating change into at least partially decoupled stages. If this model is correct, the next fundamentally new era will occur in a thirty year period between 2020 and 2050. Ray Kurzweil proposes that change seen in the next 20 years will be equivalent to that seen in the last 200, and this perspective seems quite useful as a bird's eye view. But by choosing to additionally demarcate developmental stages, such as the Scientific, Industrial, and Information Ages, and the coming Symbiotic and Autonomy Ages, developmentalist models commit themselves to proposing an additional level of specificity to the curve of past and future change, one that allows for phases of apparent equilibrium prior to each new punctuated emergence. In other words, hierarchical developmental models almost always require that change proceeds in a pattern known as "punctuated equilibrium," brief bursts of new activity followed by longer plateaus of consolidation, where the final years of one stage are always significantly slower than the early years of the next. This future history we have offered, then, commits us to an expectation that the next twenty years will be primarily a less-remarkable continuation of the groundwork begun fifty years ago, at the birth of the Information Age. Seen in retrospect then, the period between the emergence of the developmentally inevitable Internet and its next necessary offspring, the CI network, will not likely be considered, from the human perspective, as a period of increasingly dramatic leaps, but rather of many steady, smaller, and less noticeable improvements, preparing us for the next great surge of technological change.
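As a rough check on the halving claim above, the era lengths can be compared pairwise. The sketch below is only illustrative arithmetic. It takes the durations exactly as stated in the text (380, 180, 70, and 30 years, plus the roughly 10-year Autonomy Age estimate), without recomputing them from the date ranges, and the names `ERAS` and `successive_ratios` are made up for this example.

```python
# Era lengths in years, as stated in the essay (not derived from date ranges).
ERAS = [
    ("Scientific", 380),
    ("Industrial", 180),
    ("Information", 70),
    ("Symbiotic", 30),
    ("Autonomy", 10),  # the essay's rough estimate
]


def successive_ratios(eras):
    """Return (era name, length / previous era's length) for each later era."""
    return [
        (name, length / prev_length)
        for (_, prev_length), (name, length) in zip(eras, eras[1:])
    ]


if __name__ == "__main__":
    for name, ratio in successive_ratios(ERAS):
        print(f"{name:12s} {ratio:.2f}")
```

Under these figures every ratio falls below one half (about 0.47, 0.39, 0.43, and 0.33), which is what the asymptotic reading requires.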
Borrowing a popular phrase, we can say that the Symbiotic Age, our coming era of not just functionally but also linguistically adaptable physical and virtual machines, will require fifteen to twenty-five additional years of slow going and hard work before it suddenly and surprisingly becomes an overnight success. As a second subtle point, consider the apparent dichotomy of the unprecedented scientific and technological acceleration presently seen in first world countries, and their decreasing rate of cultural change as they develop ever finer distinctions in permissible social, politicolegal, and economic behavior. Such deceleration of at least the magnitude of cultural change in the first world strongly suggests we are rapidly moving toward an end-stage, saturation phase of development in the human computational substrate. In other words, there's not much more social optimizing left that can be easily done by human beings running human societies. As Francis Fukuyama (The End of History, 1992) and others have observed, all the world's governments drift closer every year to a scientific, capitalistic, democratic final common pathway for social development. In this new world the average citizen, having a sharply finite capacity for change absorption, increasingly insulates their social and cultural consciousness from the developmental hurricanes occurring in the technological systems around them. Technological acceleration continues unabated; it just increasingly proceeds "under the hood" of the car of change, so to speak. Think about all the computation that went into the creation of an advanced hybrid automobile like the Toyota Prius, for example, and how oblivious we are to that, vs. car technology of the 1950's. So does this mean that the total amount of socioeconomic change must slow down in the Symbiotic Age? Hardly. We only need to lose our first world bias to understand the unprecedented nature of the changes to come.
If billions of presently marginalized human beings are uplifted toward first world socioeconomic status in coming decades, to a place where only tens or hundreds of millions have gone before, this will still represent massively unprecedented socioeconomic change for the average distributed complexity of humanity on Earth. We can expect these changes, given current trends, even as first world culture becomes increasingly canalized (comfortably settled), social-benefits oriented, and regulated with every passing year. This new post-CI growth spurt of globalization will be tempered somewhat by technological systems that allow us to increasingly preserve and maintain existing cultural histories, and with less fidelity, cultural differences. It will occur as human individual and cultural consciousnesses become steadily better at either insulating themselves from, or balancing themselves within, the accelerating computational change occurring all over the planet. Globalization debates are today often framed in terms of how to help the third world rise to first world standards. By mid century, they are likely to be framed in terms of how to help societies of every type become more change-seeking, versus change-averse, with regard to many powerful new opportunities for human-machine symbiosis. "Symbiozation," not globalization, sounds like the dominant cultural agenda of that time. It would sit alongside refining globalization, in an era of late 21C economic abundance, to be far more equitable and pluralistic, and slowly demilitarizing a planet that may finally have sufficient transparency and trust to allow flatter and more bottom-up rather than top-down systems of global security. If we wish to be acceleration-aware in our forecasting, we may not be done yet.
During the Symbiotic Age, the most successful of our CI's will likely incorporate substantial bottom-up-developed biologically inspired evolvable hardware (EHW) components, as well as a wide range of scanned and reverse-engineered structural architectural elements of metazoan neural networks within their differentiating body plans. Somewhere within this process they will begin to develop high-level, scalable, and robust ability to direct their own self-improvement, self-repair, and self-generation (e.g., limited self-replication, variation, and selection of even high-level neural architectures). This vision argues that we must take a highly distributed, incremental, and network-centric developmental route to the "Planetization" of humanity, a concept eloquently envisioned by Teilhard de Chardin in 1945. (See "The Planetisation of Mankind" in The Future of Man.) The strong claim that I wish to make is that, just as linguistic AI will require a planetary network of human beings, incrementally tuning up the conversational interface (CI) to human-level utility, so too it will require a planetary network of human beings tuning up all our robotic systems to produce a broad collection of utility robots that are sufficiently situationally intelligent to interact in the human environment. This problem is not an easy one. Like the CI, it is extremely complex, and robots will be incredibly stupid for decades. Only the input and detailed, collective feedback of their user-gardeners will allow them to become less than stupid. Generally, I call this perspective the "95/5% Rule." It is the idea that all major substrate transitions seem to be primarily ("95%") bottom-up and experimental, and only slightly ("5%") guided by top-down, hierarchical or developmental control.
It suggests that the primary role of the average 21C human being, from technology's perspective, will be as a trainer and gardener of the growth of tomorrow's intelligent technologies, just as we today are socially constructing the wired and wireless participatory web. Perhaps we will choose to allow various forms of increasingly autonomous self-replication, with appropriate safeguards, because the adaptive machines they empower will be natural incremental extensions of the machine learning paradigms presently in existence, and because such extensions will demonstrate dramatically greater human utility, as well as ever more self-balancing and statistically safe behaviors in the vast majority of artificial selection environments. But perhaps most importantly, our increasing understanding of biological, cultural, and technological immune systems, systems that were poorly understood by early 21st century thinkers, will allow us to proceed with growing wisdom. As we begin to collectively train our machines, we can and should expect that any local catastrophes that do occur (e.g., unpredictably behaving, unsafe learning agents) will be rigorously contained by any redundant, fault-tolerant, healthy immune architecture. We will come to realize that those micro-catastrophes that do occur, within healthy immune environments, can only catalyze immune learning, increasing the "average distributed complexity" of the system, as well as general system intelligence. This increasing informational immunity appears to be one of the great hidden mechanisms that has guaranteed the accelerating hierarchical emergence of computational substrates in universal history. Within a few short years of this new self-directing, self-replicating capacity, we will begin to suspect that our increasingly self-modelling tools and agents (and by association, "we," as a human-machine social network) are a good deal more intelligent than surface appearances indicate.
We can call that next era, an age of increasingly self-directing symbiotic machine interfaces, the Autonomy Age. By comparison with previous ages, it may last as little as 10 years (perhaps 2050-2060), taking us to the edge of a circa 2060 technological singularity, in this set of guesstimates. The transition to full autonomy in our technological systems will occur when it is ready. It may be unfortunately or intelligently delayed, or conversely may be charitably hastened along, but it is ultimately an unstoppable natural transition for technological development. Nevertheless, the way this transition appears, to human beings, will likely be largely within the control of post-singularity A.I., and influenced by their ethical concerns, whatever those may be. Greg Stock (in Metaman, 1993) and others have written eloquently on this transition, but few have considered the requisite ethical constraints that must emerge within self-directing technological systems that have many orders of magnitude greater learning capacity than even our most cherished biological architectures. We may understand ethics as a form of behavioral immunity that protects an otherwise precarious accelerating intelligence development, and no known intelligent systems exist on Earth without the presence of a competent, healthy, overarching immune system. If you additionally suspect, as I do, that self-aware technological systems will quickly become a true superset of the biological space, containing all the elements of biology, plus many additional unseen capacities, including immunity and ethical capacities, then excellent arguments can be made that our "uploading" into the machine substrate will be inevitable, gradual, voluntary, desirable, and reversible (the latter at least in principle, if rarely in action) when viewed from the perspective of our present unmodified biological minds. 
Considerations of developmental ethics in complex systems are among a number of important outstanding questions in need of ongoing careful study, and are a priority for our organization. Expect much more on these topics in coming years from our emerging sciences of simulation. What an amazingly innovative time to be alive! Thanks for reading and passing this on. As always, I appreciate your comments, critiques, fixes, and feedback at: johnsmart{at}accelerating{dot}org. |