The Turing Test

The Turing test is a test for machine intelligence devised by the British genius Alan Turing in the middle of the 20th century. The idea is this: a person conducts a typed conversation with a system. If, after chatting in this manner for some period of time, say half an hour, the person conducting the test cannot determine that the system he or she is talking to is not human, then the system is intelligent.

In my opinion, a system that passes the Turing test is precisely a system that passes the Turing test (and is therefore remarkable) but it is not necessarily intelligent (in a sense that does justice to our intuitions of what this term means at any rate), and certainly not necessarily conscious. Turing himself did not mention consciousness explicitly when he formulated the test, but it is tempting to some to regard any system that exhibits intelligent behavior as automatically conscious as well as intelligent, while I do not necessarily regard such a system as either.

Consider a few scenarios. First, imagine that instead of a computer taking the Turing test, a committee of three people is being tested. The connection between the committee and the human conducting the test is slow enough that genuine collaboration among the committee members on each answer is possible. According to the hypothesis underlying the Turing test, if the committee passes the test, then it, taken as a single system, is intelligent and, on the tempting view described above, conscious. How many intelligences or consciousnesses are there then? Three? Four? One?

Another interesting scenario is a slight variation on an idea first presented by Ned Block (1995). Let us say that the test takes half an hour. Let us also say that the communication line between the human conducting the test (let us call this the tester's side of the conversation) and the system under test (let us just call this the system's side of the conversation) is somewhat slow, but fast enough not to be frustrating to an average human typist, say 50 characters per second. Let us also say that both parties are capable of typing upper and lower case letters, the numerals, and the common punctuation marks, let us say 100 different characters in all. Given that both ends of the conversation can type at the same time for the entire duration of the test, each of them may type any of 100 characters (or no character at all) each 50th of a second during the entire half-hour test. That means there are roughly 100 to the power of (2 (parties) X 50 (characters per second) X 60 (seconds per minute) X 30 (minutes in the test)), or 100^180,000, different entire conversations that could possibly take place during the half-hour test, from both parties holding down the 'a' key for the whole half hour, to both of them holding down the 'z' key for the whole half hour. (Strictly speaking, counting "no character at all" as a 101st option in each slot makes the number somewhat larger still, but nothing in the argument depends on the exact figure.)
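The arithmetic above can be checked with a few lines of Python. This is only a sketch: the constant and function names are illustrative assumptions, and the figures are the ones stipulated in the scenario (100 characters, 50 slots per second, a 30-minute test, two parties). It works with base-10 logarithms, because the number itself is far too large to be worth printing.

```python
import math

# Figures stipulated in the scenario; the names are illustrative only.
CHARSET_SIZE = 100        # distinct characters either party can type
SLOTS_PER_SECOND = 50     # one typing slot every 50th of a second
SECONDS_PER_MINUTE = 60
TEST_MINUTES = 30
PARTIES = 2               # tester and system type simultaneously


def total_slots(parties: int = PARTIES) -> int:
    """Number of typing slots in one whole test, summed over the parties."""
    return parties * SLOTS_PER_SECOND * SECONDS_PER_MINUTE * TEST_MINUTES


def log10_conversations(options_per_slot: int, parties: int = PARTIES) -> float:
    """Base-10 logarithm of options_per_slot ** total_slots(parties)."""
    return total_slots(parties) * math.log10(options_per_slot)


if __name__ == "__main__":
    print("slots for both parties:", total_slots())
    print("log10 of 100^slots:", log10_conversations(CHARSET_SIZE))
    # Counting "no character at all" as a 101st option per slot:
    print("log10 of 101^slots:", log10_conversations(CHARSET_SIZE + 1))
```

With 100 options per slot the count is 100^180,000, which is the same as 10^360,000, a number with about 360,000 decimal digits. Allowing silence as a 101st option only makes it larger.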

Now, imagine that we write a simple computer program to generate each of these possible conversations, and that we submit the resulting (staggering) pile of transcripts to a vast committee and give them a huge amount of time to sort them into two piles: pile A of all of the conversations in which the system side of the conversation seemed non-human, and pile B, the (much smaller) pile in which the system side of the conversation seemed to conduct a conversation that would pass for rational human conversation to an average person.

Note that pile B contains the rational-seeming responses on the system side of the conversation, even if the human side is gibberish - pile B is selected only on the basis of the reasonableness of the system side of the conversation. This means that pile B contains the conversations in which the system side seems human, no matter what the human conducting the test types. In fact, it contains rational-seeming responses to all possible conversations from the human side (there are 100 to the power of (50 (characters per second) X 60 (seconds per minute) X 30 (minutes in the test)), or 100^90,000 of them). Moreover, it contains, for each of the 100^90,000 possible human sides of the conversation, all possible rational-seeming system sides of the conversation. After all, given any particular human side of the conversation, how many ways are there of filling in the gaps so that the system seemed to respond as another human would? A lot.

The committee would then throw pile A out. They would take pile B, the one with all the coherent, human-seeming conversations on the system side, and load this pile into a computer, along with a very, very simple program. The program would only choose randomly, each 50th of a second, from among the conversations in its memory that are consistent with everything that has already been typed by both sides of the conversation. Once it has chosen a conversation that meets this criterion, it simply types out the character that the conversation says the system should type out at that particular 50th of a second (or no character at all, if that's what the chosen conversation specifies).
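To make the point about how little the program does, here is a toy sketch of such an execution engine in Python. Everything in it is an assumption made for illustration: the class and function names, the use of "." to stand for "no character typed" in a slot, and above all the three-transcript stand-in for pile B, which in the thought experiment would be astronomically large and would cover every possible tester side.

```python
import random
from typing import List, Optional, Tuple

Tick = Tuple[str, str]          # (tester_char, system_char) for one time slot
Transcript = Tuple[Tick, ...]   # one complete conversation, slot by slot

IDLE = "."                      # toy encoding for "no character typed"


def make_transcript(tester: str, system: str) -> Transcript:
    """Pair up two equal-length strings, one character per time slot."""
    assert len(tester) == len(system)
    return tuple(zip(tester, system))


class TableMachine:
    """Replays pre-approved transcripts; it does nothing else."""

    def __init__(self, pile_b: List[Transcript],
                 rng: Optional[random.Random] = None) -> None:
        self.pile_b = list(pile_b)
        self.history: List[Tick] = []
        self.rng = rng or random.Random()

    def next_char(self) -> Optional[str]:
        """Pick a random transcript consistent with the history so far and
        return the character it assigns to the system in the current slot."""
        n = len(self.history)
        prefix = tuple(self.history)
        candidates = [t for t in self.pile_b if len(t) > n and t[:n] == prefix]
        if not candidates:
            return None  # can only happen because the toy pile is incomplete
        return self.rng.choice(candidates)[n][1]

    def record(self, tester_char: str, system_char: str) -> None:
        """Store what both sides actually typed in the slot just finished."""
        self.history.append((tester_char, system_char))


def run(machine: TableMachine, tester_keys: str) -> str:
    """Feed the tester's keystrokes slot by slot; return the system's side."""
    typed = []
    for tester_char in tester_keys:
        system_char = machine.next_char()
        if system_char is None:
            break
        machine.record(tester_char, system_char)
        typed.append(system_char)
    return "".join(typed)


# A three-entry stand-in for pile B.
PILE_B = [
    make_transcript("hi...", "...yo"),
    make_transcript("hi...", "...hi"),
    make_transcript("yo...", "...hi"),
]

if __name__ == "__main__":
    print(run(TableMachine(PILE_B), "hi..."))
```

The engine contains no knowledge of language at all. Whatever appears sensible in its output was put there by whoever sorted the transcripts.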

This program could be written in about half an hour by any decent programmer, and it would be guaranteed to pass the Turing test, assuming the vast committee exercised proper human judgment in deciding which conversations appeared "human" and which did not. The intelligence in such a system is in the data, programmed in by the human committee, and clearly not in the tiny, stupid execution engine that reads and acts on the data. Given that the Turing test supposedly tests for machine intelligence, not the intelligence of the human programmers of the machine, I think that most people would agree that to characterize such a system as conscious or even intelligent misses the point of consciousness and intelligence.

Assuming that you accept that Block's machine is not conscious (even if, by some characterizations of the term, it is intelligent), if you have a favorite computer architecture that you think is conscious, you really should specify where the difference is between your machine and Block's. Some people insist that a truly conscious computer must be a parallel processing machine, with many processors (inter)acting together. But it has been shown that any parallel processing computation can be emulated perfectly well on a single processor: for each timeslice, you make your single processor simulate each of the parallel processors in turn for that timeslice, and then you move on to the next timeslice. The whole computation just takes roughly n times as long as it would on an n-processor parallel machine.
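The timeslice trick can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a general emulator: the "processors" are modeled as one integer of state each, they all update in lockstep from the same snapshot, and the names emulate_parallel and ring_sum are made up for the example.

```python
from typing import Callable, List, Tuple

# A processor's update rule: (its index, snapshot of all states) -> new state.
StepFn = Callable[[int, List[int]], int]


def emulate_parallel(states: List[int], step: StepFn,
                     timeslices: int) -> Tuple[List[int], int]:
    """Emulate len(states) lockstep processors on one serial processor.

    Returns the final states and the number of serial steps performed.
    """
    current = list(states)
    serial_steps = 0
    for _ in range(timeslices):
        snapshot = current          # every processor sees the same old state
        updated = []
        for i in range(len(snapshot)):
            updated.append(step(i, snapshot))  # simulate processor i in turn
            serial_steps += 1
        current = updated           # commit only once all have been simulated
    return current, serial_steps


def ring_sum(i: int, snapshot: List[int]) -> int:
    """Toy rule: each processor adds up its two neighbors on a ring."""
    n = len(snapshot)
    return snapshot[(i - 1) % n] + snapshot[(i + 1) % n]


if __name__ == "__main__":
    final, steps = emulate_parallel([1, 0, 0, 0], ring_sum, timeslices=2)
    print(final, steps)
```

Two timeslices on four simulated processors cost eight serial steps, which is the factor of n mentioned above. The results are the same as they would be if the four updates had happened simultaneously.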

Block's machine is monstrously complex - as complex as any you could propose - but the complexity is in the table. In essence, the table is the algorithm. Whatever your favorite conscious architecture, it should be clear that its outward behavior would be exactly matched by that of Block's machine. There is some mapping between your machine, with its models-of-self, or its Darwinian memosphere, or whatever, and Block's machine. Both machines are doing the same thing. The only difference between Block's table-driven Turing Test beater and any more "intelligent" algorithm is purely one of optimization. The difference between the two algorithms is merely one of encoding, much like the difference between a program written in assembly language as opposed to C++, or the difference between an uncompressed file and one that has been compressed with a utility like PKZIP. The onus is squarely on the defender of some purported conscious computer algorithm to explain exactly where (and why) in the mapping between that algorithm and Block's the fairy of consciousness waves her magic wand.
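The claim that the table and the algorithm differ only in encoding can be illustrated on a deliberately tiny scale. In the Python sketch below, computed_reply is a made-up stand-in for a "clever" algorithm, and the table is built by running it once on every possible input over a two-letter alphabet. All names and sizes are assumptions chosen to keep the example small.

```python
from itertools import product
from typing import Dict

ALPHABET = "ab"   # toy alphabet; the thought experiment uses 100 characters
MAX_LEN = 3       # toy message length limit


def computed_reply(message: str) -> str:
    """Stand-in for an 'intelligent' algorithm: reverse and upper-case."""
    return message[::-1].upper()


def build_table() -> Dict[str, str]:
    """Precompute the reply to every possible message up to MAX_LEN."""
    table = {}
    for length in range(MAX_LEN + 1):
        for chars in product(ALPHABET, repeat=length):
            message = "".join(chars)
            table[message] = computed_reply(message)
    return table


TABLE = build_table()


def table_reply(message: str) -> str:
    """Pure lookup: no computation beyond finding the entry."""
    return TABLE[message]


if __name__ == "__main__":
    same = all(table_reply(m) == computed_reply(m) for m in TABLE)
    print(len(TABLE), "entries; identical behavior:", same)
```

From the outside, the two functions are indistinguishable on every input the table covers. The lookup version simply trades computation for storage.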

