
Chapter 8 - Logic

Logic is the primary way that humans reason, but it has its limits.

Up to now, we have talked about some of the basic principles of morality. Morality is the knowledge of good and bad. The remainder of this book will discuss ethics - the methods by which moral decisions are made.

A recurrent debate in the field of ethics has been to what degree, if any, moral decisions are subject to logical reason. Reason is the way that people make decisions. The formalization of these methods of reasoning is the study of logic. In contrast to the application of logical reasoning to matters of good and evil, there is the emotional response to our moral choices. The emotions appear not to follow the laws of logic, creating a dynamic of their own, a different method of reasoning.

This chapter will go into some detail about logic and formal methods of computation. Some of the more advanced results in logic, discoveries of the last century, have an important bearing on the power and limits of logic.

What is logic? Like the lever and inclined plane to physics, it is a simple and fundamental piece of machinery for the intellect. Most intellectual processes are reducible to logic just as many dissimilar machines are reducible to the inclined plane. But just as machines can be especially tailored to optimally perform certain tasks, there are different logics and formal calculi, each fulfilling a different purpose. And just as there are other machines besides an inclined plane, logic is only one of the tools that the intellect possesses.

Logic essentially is a calculus of discourse - it is a way of talking about things. Logic is defined by its rules of syntax and rules of inference. Syntax rules define the structure of the sentences. They provide the method of determining if a statement is well-formed; whether the sentence structure has no errors. The syntax can also determine the structure of the overall discourse and it defines a space of sentences.

Logic as we know it was first systematized by Aristotle. He provided a detailed analysis of logical syllogisms. To give a hoary example, found in standard logic textbooks such as Copi's:

All humans are mortal.
Socrates is human.
Therefore Socrates is mortal.

Aristotle showed that when the concepts in the sentences are replaced by abstract symbols along with their logical connectives, we can analyze logical arguments in and of themselves without recourse to the specific facts at hand. This work was later extended in the nineteenth and twentieth centuries by people such as Boole, Frege and Russell and Whitehead.

Logic requires a formal language that maps statements to some notion of truth and falsity. Besides the simple logical connectives such as and (&), or (∨), implication (→), negation (~) and equality (=), there is also a need for a definition of logical predicates - properties of individual objects and their interrelationships that are true for some selection of individual objects and some properties but false for others. Such formal languages can be represented as a Herbrand universe, where various interpretations of the objects and properties form different models of the formal system.

The first step in logical reasoning is to formalize the situation at hand in terms of the applicable objects and the predicates that describe their interrelationships. This can be a source of disagreement. Some aspects of definitions and their use will be discussed in the next chapter. But the formalization of a statement of fact into a logical notation requires a clear and consistent use of terminology and a clear understanding of the assumptions that are made when people convert real world facts into logical notation.

After the statements are defined, logic then provides rules of inference. The inferential rules show how to derive conclusions from axioms and premises. For example, if A is true and B is false, then A OR B (A∨B) can be inferred, but A AND B (A&B) cannot. If B must be true whenever A is true, but B can be either true or false if A is not true, then A implies B: A→B. If both A→B and B→A, then A if and only if B, written A IFF B (A↔B).
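
These connectives can be checked mechanically. A minimal sketch in Python (the uppercase function names are my own shorthand, not standard notation):

```python
# The logical connectives as Boolean functions.
def NOT(a):
    return not a

def AND(a, b):
    return a and b

def OR(a, b):
    return a or b

def IMPLIES(a, b):
    return (not a) or b   # A -> B fails only when A is true and B is false

def IFF(a, b):
    return IMPLIES(a, b) and IMPLIES(b, a)

# From the text: with A true and B false, A OR B can be inferred,
# but A AND B cannot.
A, B = True, False
print(OR(A, B), AND(A, B))  # True False
```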

There are two different forms of logic, the propositional logic and the predicate logic. Propositional logic is the rules and methods of reasoning for basic propositions, made up of simple symbols A, B, C, ... connected by AND, OR, IMPLIES, IFF and NOT. The rules of propositional logic are relatively simple. The situation becomes more complex in the predicate calculus. There, the goal is to reason over universes of objects, finding out what is true for all individuals x, or if there exists at least one y. This leads to the addition of predicates such as Ax and Byz, for example, where A may or may not be true for all x and only certain pairs of values assigned to variables y and z can have the relationship B. Then we need to consider universal and existential quantification: ∀x Hx → ∃y (Hy & x≠y & Myx), which under some Herbrand universe can be interpreted to mean: For all objects x, if x is a human (Hx) then there exists a y, where y is also human (Hy), the humans x and y are not identical (x≠y), and y is the mother of x (Myx).
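
One way to see what such a quantified formula asserts is to evaluate it over a small finite model. The sketch below uses a toy universe of my own invention (the names and the mother_of table are illustrative only, chosen so the formula happens to hold):

```python
# A toy universe: a set of objects, a set of humans, and a mother relation.
objects = {"socrates", "phaenarete", "rock"}
human = {"socrates", "phaenarete"}
mother_of = {"socrates": "phaenarete", "phaenarete": "socrates"}  # toy data

# For all x: Hx -> exists y (Hy & x != y & Myx)
def formula_holds():
    return all(
        any(y in human and y != x and mother_of.get(x) == y for y in objects)
        for x in objects if x in human
    )

print(formula_holds())  # True in this toy model
```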

The truth or falsity of logical statements is usually determined in light of certain premises: Given that A implies B and B implies C, it must be true that A implies C. A→B, B→C ⊢ A→C

This form of presenting a logical argument can be packaged together by making a conjunction of the premises and implying the conclusion: the assumption that A implies B and that B implies C are both true further implies that A implies C is true. ((A→B)&(B→C))→(A→C).

For a logical argument to be true, it must be true under every model that serves as an interpretation of the objects and predicates. There are a number of sets of methodical rules that can be applied to determine if a logical argument is valid. One of the simplest is to package the logical argument as a single proposition. If the logical argument is valid, then this statement is a tautology: there can be no assignment of truth or falsity to the propositions A, B and C under which the statement is false.
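
The tautology claim can be verified by brute force: enumerate every assignment of truth values to A, B and C and confirm that the packaged statement is never false. A minimal sketch:

```python
from itertools import product

def implies(a, b):
    return (not a) or b

# ((A->B) & (B->C)) -> (A->C) must hold under every truth assignment.
def is_tautology():
    return all(
        implies(implies(a, b) and implies(b, c), implies(a, c))
        for a, b, c in product([True, False], repeat=3)
    )

print(is_tautology())  # True: the statement is a tautology
```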

It is actually simpler to negate the statement and show that every assignment of truth or falsity to the negated statement leads to a contradiction. We can represent all these assignments through a device known as a 'truth tree', where statements connected by an AND are replaced by listing the statements one below the other, and statements connected by an OR are fanned out side by side, forming a tree. See Richard C. Jeffrey's 'Formal Logic: Its Scope and Limits' for a detailed description.

For example, assume the negation: that ((A→B)&(B→C))→(A→C) is false. Now if X→Y is false, it must be true that X is true and Y is false. Conversely, if X→Y is true, then either X is false or Y is true.

So, assuming ((AB)&(BC))(AC) is false, it must be true that AB is true, and BC is true, and AC is false - that is ~(AC)

We write these cases this way:

A→B
B→C
~(A→C)

If AB is true, then either A is false or B is true. We replace AB by two paths, one containing ~A and the other containing B: BC ~(AC) ^ ~A B

Similarly, we replace B→C:

   ~(A→C)
    /    \
  ~B      C
  / \    / \
~A   B ~A   B
     X

The 'X' marks a branch of the tree that contains the two contradictory propositions ~B and B.

Finally, for AC to be false, then both A must be true and C false. Replace ~(AC) by these two statements:

      A
     ~C
    /    \
  ~B      C
  / \    / \
~A   B ~A   B
 X   X  X   X

Now all four branches have a contradiction: in the first branch, A and ~A, in the second ~B and B, in the third A and ~A again, and in the fourth ~C and C.

All propositional syllogisms can be proven valid or invalid this way: list the premises and the negation of the conclusion, then apply these rules to expand the argument into a tree. The syllogism is valid if and only if every single branch has a contradiction.

The truth tree method is extended to the predicate calculus with universal and existential quantification. If a universally quantified statement is false, then its existentially quantified negation is true: ~∀x (...Px...) iff ∃x ~(...Px...). A similar rule holds for existential quantification.

The goal for using the truth tree method is to replace each universally and existentially quantified statement by a statement where the quantified variables x,y,z are replaced by constants a,b,c. The process is defined as follows:

If an existentially quantified statement ∃x(...Px...) appears in the truth tree, then for each branch of the tree, replace it by (...Pa...), where a is a constant that has not yet appeared in that branch of the tree.

If a universally quantified statement ∀x(...Px...) appears in the truth tree, then for any constant a, b, c, etc. that appears in the path of the tree, the statements (...Pa...), (...Pb...), (...Pc...) can be added to the tree. If there are no constants in that path already, choose a constant a. Do not remove the universally quantified statement, though, as it can be used again as new constants are added.

There are also a couple of rules for handling equality (x=y). The first is that any branch that contains the statement ~(a=a) for some constant a contains a contradiction. The second is that if a positive statement a=b appears in a branch along with a statement (...b...), then the statement (...a...), with a substituted for b, can be added to that branch.

The Godel Completeness Theorem shows that with a systematic application of these rules, any true theorem of the predicate calculus can be proven. That is why it is referred to as the predicate calculus - calculus means a process of reasoning by formal symbolic computation. A major aspect of logic is that the procedure is a recipe that, when followed as given, will always yield the same results. This shows that formal logic is a model of effective computation - a mechanical, cookbook method that can find every true statement. An effective computational procedure is fully determined - there is no appeal to magic or a supernatural force.

Two other models of effective computation are Turing Machines and the General Recursive Functions. A Turing machine consists of a head positioned over a potentially infinite tape of symbols. The head can be in any of a finite number of states. Each position on the tape can hold one of a finite number of symbols, or a blank, and the head can read and write those symbols. The program for the Turing Machine is a state table, where the rows of the table are labeled with the indexes of the states that the Turing Machine can be in and the columns of the table are headed by the symbols that can be written on the tape, including the blank. If the Turing Machine is in state i and reads symbol a, then a lookup in the table shows the next action the Turing Machine takes: it can overwrite the symbol a with a new symbol b, it can move Left or Right, and it can go into a new state j. The table entry is the triple <b,M,j>, where M is either L or R. It is acceptable for a and b to be the same, or i and j to be the same, and either a or b or both can be a blank. The tape is presumed to begin with only a finite number of non-blank symbols written on it. There is a unique initial state for a given Turing Machine, state 0. If the Turing Machine goes back to the start state at any point of its computation, it is said to have halted. If it never reenters the start state then it never halts. If the Turing Machine halts, then it accepts the input sequence. To give an example, if the number 19 is given as the binary sequence 10011 and Turing Machine TM(x) halts if and only if x is a prime number, then TM(10011) halts. Turing Machines can also be considered as functions. When the Turing Machine halts on input x, inspect the symbols left on the tape. This can be considered as the output y; thus TM(x)=y.
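
The state-table mechanism can be sketched in a few lines of Python. The representation below (a dict keyed by state and symbol, and the convention that re-entering state 0 means halting) is my own shorthand for the scheme just described, and the example machine is an invented toy:

```python
# A minimal Turing Machine simulator. The table maps
# (state, symbol) -> (new_symbol, move, new_state); ' ' stands for the blank.
def run(table, tape_str, max_steps=10_000):
    tape = dict(enumerate(tape_str))
    state, pos = 0, 0
    for _ in range(max_steps):
        new_sym, move, new_state = table[(state, tape.get(pos, ' '))]
        tape[pos] = new_sym
        pos += 1 if move == 'R' else -1
        state = new_state
        if state == 0:                 # re-entering the start state means halt
            return True, tape
    return False, tape                 # step budget exhausted: treated as non-halting

# A toy machine: scan right over a string of a's, halting at the first blank.
scan = {
    (0, 'a'): ('a', 'R', 1),
    (1, 'a'): ('a', 'R', 1),
    (1, ' '): (' ', 'R', 0),
}
print(run(scan, "aaa")[0])  # True
```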

It is possible to consider a nondeterministic Turing Machine that can have more than one alternative move from a given pair of state and symbol. In that case there is an exponentially growing set of sequences of states and tape symbols that can arise from a single input. In this case, we express these sequences as a branching tree of alternatives. If one branch halts, the Turing Machine accepts the input. If the Turing Machine is considered to be a function, choose the shortest halting sequence and look at what is left on the tape at the end of that sequence for the output.

Due to the simplicity of the Turing Machine, the rules for computing the truth of a logical statement can be rather involved, since it takes a lot of these simple instructions to do anything of significance. Even simple arithmetical operations such as adding one to a binary number can take a handful of states to express. To take a simple example, a nondeterministic Turing Machine can determine if there is an assignment of truth or falsity to a set of Boolean statements in conjunctive normal form (equations of OR's linked by AND's). A representative set of equations is: (A∨B) & (~A∨~C∨~D) & (B∨~C) & (C∨D). This is satisfied by the truth assignments A=T, B=F, C=F and D=T.

To make the construction of the Turing Machine simple, we shall assume that the input has a very simple syntax. If there are n statements and m variables in total, then the input is n statements each m characters in length, with each statement separated by a '#' from the previous one. Character i of statement j is an 'X' if variable i does not appear in statement j, a 'P' if variable i appears but is not negated, and an 'N' if it is negated. So (A∨B) & (~A∨~C∨~D) & (B∨~C) & (C∨D) is given on the input tape to the Turing Machine as ...#PPXX#NXNN#XPNX#XXPP... where '.' represents a spot on the tape that is blank.
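
A deterministic program can simulate the nondeterministic machine's guess by simply trying every assignment. The sketch below parses the '#'-separated X/P/N encoding just described and reports satisfiability (the function names are mine):

```python
from itertools import product

# Parse the '#'-separated X/P/N encoding into a list of clause strings.
def parse(tape):
    return [clause for clause in tape.strip('.').split('#') if clause]

# Try every truth assignment: a deterministic stand-in for the guess
# that the nondeterministic machine makes in its one nondeterministic state.
def satisfiable(tape):
    clauses = parse(tape)
    m = len(clauses[0])                      # number of variables
    return any(
        all(any((c == 'P' and assign[i]) or (c == 'N' and not assign[i])
                for i, c in enumerate(clause))
            for clause in clauses)
        for assign in product([True, False], repeat=m)
    )

# (A v B) & (~A v ~C v ~D) & (B v ~C) & (C v D)
print(satisfiable("...#PPXX#NXNN#XPNX#XXPP..."))  # True
```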

The Nondeterministic Turing Machine has 10 states, numbered 0 (the start state) through 9 (a reject state). The tape begins with the symbols {#,X,P,N} written on it, and the Turing Machine can also read and write the symbols {a,b,c,d,e,f,T,S}. To save space in the following table, if a set of productions starts in the same state n, goes into the same state m and moves in the same direction M, for a set of symbols {x,y,z}, where the machine replaces x by r, y by s and z by t, then I will write the three productions n,x → m,r,M and n,y → m,s,M and n,z → m,t,M as the contraction n,{x,y,z} → m,{r,s,t},M. The only nondeterministic state is state 1. The state 1 productions will be written out without a contraction.

Here is the full Turing Machine:
0,# → 1,#,R (i.e. the Machine in the start state 0 reads the leftmost # and goes into state 1 and moves right, leaving # unchanged)
1,X → 1,a,R
1,X → 1,b,R
1,P → 1,c,R
1,P → 1,d,R
1,N → 1,e,R
1,N → 1,f,R
1,# → 2,#,L
2,{a,b,c,d,e,f} → 2,{a,b,c,d,e,f},L
2,{#,T,S} → 3,{#,T,S},R
3,{a,b,c,d,e,f} → 4,{S,S,T,S,S,T},R
4,{a,b,c,d,e,f,#} → 4,{a,b,c,d,e,f,#},R
5,{a,b,c,d,e,f,#} → 5,{a,b,c,d,e,f,#},R
4,{X,P,N} → 2,{a,c,e},L
5,{X,P,N} → 2,{b,d,f},L
4,. → 6,.,L
5,. → 6,.,L
6,{a,b,c,d,e,f} → 6,{S,S,T,S,S,T},L
6,{#,S,T} → 6,{#,S,T},R
6,. → 7,.,L
7,S → 7,S,L
7,T → 8,S,L
7,# → 9,#,L
8,{S,T} → 8,{S,T},L
8,# → 7,#,L
9,{#,S,T,.} → 9,{#,S,T,.},L
7,. → 0,.,L (halt state)

Another useful formal system is what came to be known as the primitive recursive functions. (The following description of recursive functions is adapted from Rogers' Theory of Recursive Functions and Effective Computability):

The primitive recursive functions are a set of functions satisfying the following rules:
(1) all constant functions f(x1,...,xn)=c
(2) the successor function f(x)=x+1
(3) all identity functions f(x1,...,xn)=xi
(4) if f is a primitive recursive function of m variables and g1,...,gm are primitive recursive functions of n variables, then the composition of these functions is a primitive recursive function: h(x1,...,xn)=f(g1(x1,...,xn),...,gm(x1,...,xn))
(5) if f is a primitive recursive function of n+1 variables and g is a primitive recursive function of n-1 variables, then h is a primitive recursive function, where h(0,x2,...,xn)=g(x2,...,xn) and h(y+1,x2,...,xn)=f(y,h(y,x2,...,xn),x2,...,xn)
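
Rules (1) through (5) can be exercised directly. The sketch below implements the recursion scheme of rule (5) as a higher-order Python function (simplified to one recursion variable and one parameter, for illustration) and uses it to build addition and multiplication from the successor function:

```python
# Rule (5), primitive recursion, as a higher-order function:
#   h(0, x)   = g(x)
#   h(y+1, x) = f(y, h(y, x), x)
def prim_rec(g, f):
    def h(y, x):
        return g(x) if y == 0 else f(y - 1, h(y - 1, x), x)
    return h

succ = lambda x: x + 1                      # rule (2), the successor

add = prim_rec(lambda x: x,                 # add(0, x)   = x
               lambda y, r, x: succ(r))     # add(y+1, x) = succ(add(y, x))
mul = prim_rec(lambda x: 0,                 # mul(0, x)   = 0
               lambda y, r, x: add(r, x))   # mul(y+1, x) = add(mul(y, x), x)

print(add(3, 4), mul(3, 4))  # 7 12
```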

Although the primitive recursive functions seem to be general, it was found that they were not general enough to express Ackermann's function:
A(0,0,y) = y
A(0,x+1,y) = A(0,x,y)+1
A(1,0,y) = 0
A(z+2,0,y) = 1
A(z+1,x+1,y) = A(z,A(z+1,x,y),y)

This function is a generalized exponential, where the first argument is an index into a class of functions of the second two arguments:
A(0,x,y) = y+x
A(1,x,y) = y*x
A(2,x,y) = y^x
...
A(z+1,x,y) = the result of applying y to itself x-1 times under the z operation
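
Transcribing the five equations directly into Python confirms this table for small arguments (the function grows far too quickly to evaluate for large ones):

```python
# The three-argument Ackermann function, transcribed from the equations above.
def A(z, x, y):
    if z == 0:
        return y + x                 # A(0,0,y)=y and A(0,x+1,y)=A(0,x,y)+1
    if x == 0:
        return 0 if z == 1 else 1    # A(1,0,y)=0 and A(z+2,0,y)=1
    return A(z - 1, A(z, x - 1, y), y)

print(A(0, 3, 4), A(1, 3, 4), A(2, 3, 4))  # 7 12 64
```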

The fact that clearly defined functions existed that could not be defined as primitive recursive functions led to their being classified as primitive and a more general class of functions defined instead. The General Recursive functions are defined by Kleene as a computation of a finite sequence of equations. The definition of a function P is a set of recursive equations. A computation is a finite sequence of equations, beginning with P, where each equation after P is obtained from the preceding equations either by the substitution of a numerical expression for a variable symbol throughout an equation, or by the use of one equation to substitute "equals for equals" at any occurrence in a second equation, or by the evaluation of an instance of the successor function lambda x[x+1]. In P, we allow auxiliary function symbols in addition to the main function symbol in whose evaluation we are interested. Thus the set of equations
f(0) = 0
g(x) = f(x)+1
f(x+1) = g(x)+1
with f as the main function and g as an auxiliary function symbol computes the function lambda x[2x]. Although this definition allows for ambiguity in choosing the computation sequence, the procedure can be made effective and unambiguous by defining a way of enumerating all acceptable sequences and choosing the first complete sequence in the enumeration that comes along.
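
The equation set can be transcribed directly into Python, with the substitution steps carried out by ordinary function calls:

```python
# The equation set f(0)=0, g(x)=f(x)+1, f(x+1)=g(x)+1, transcribed directly.
def g(x):
    return f(x) + 1

def f(x):
    return 0 if x == 0 else g(x - 1) + 1

print([f(x) for x in range(5)])  # [0, 2, 4, 6, 8]
```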

It is also possible to express the General Recursive functions as a simple extension of the primitive recursive functions by the mu theorem. If f(x,z1,...,zn) is a partial function with n+1 arguments, then the mu function µxf is the partial function with arguments (z1,...,zn) that returns as its answer the smallest value x such that f(0,z1,...,zn), f(1,z1,...,zn), f(2,z1,...,zn), ..., f(x,z1,...,zn) all exist (are defined) and f(x,z1,...,zn)=0. If no such x exists, then µxf is not defined for the case z1,...,zn.
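
The mu operator is an unbounded search, which is exactly what makes the resulting functions partial. A minimal sketch (the example predicate is an illustrative choice of f, not anything from the text):

```python
# The mu operator: unbounded search for the least x with f(x, z1,...,zn) == 0.
# If no such x exists the loop never ends, so the resulting function is partial.
def mu(f, *zs):
    x = 0
    while f(x, *zs) != 0:
        x += 1
    return x

# Illustrative f: zero exactly when x*x >= z, so mu finds the least such x.
print(mu(lambda x, z: 0 if x * x >= z else 1, 10))  # 4
```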

The mu theorem is expressed as follows. Given any general recursive function g of n variables that can be computed by Turing Machine TMi, it can be defined as the mu function applied to a primitive recursive function f: g(x1,...,xn)=µz[z=<y,t> and v=<x1,...,xn> and f(i,y,t,v)=0], that is, g(x1,...,xn) equals z, where the value z encodes the pair of numbers y and t, and the value v encodes the numbers x1,...,xn written on the input tape of TMi.

There have been a number of models of effective computation defined besides formal logic:
General Recursive functions
Church's lambda calculus: lambda x [...x...]
Post's correspondence problem
Turing machines

Each one of these has been shown to be equivalent to the others. That is to say, the elementary computational steps taken in each of these methods of computation can be simulated step by step in any of the others.

For example, the predicate calculus of formal logic can be converted into a Turing Machine. Any logical statement can be built up out of the connectives &, ∨, →, ~, = and parentheses (), where any number of variables and predicates can be represented using numbers: x111 is the 3rd variable symbol and P111111 is the 6th predicate symbol. A single space can be used between statements. This is the form in which the logical statements can be written on the tape. The Turing Machine then produces the proof using the well-known rules of inference that a person would use.

It is also possible to express any General Recursive function as a Turing Machine. It can also be proved that a single step of a Turing Machine can be expressed as a primitive recursive function. From this we can define a primitive recursive function f(i,y,t,x) that returns 0 if and only if TMi(x)=y in t steps.

Because Turing machines are so simple, they often form the basis of the canonical (or standard) definition of this essential set of effective procedures. Each Turing Machine, for example, can be assigned a number, known as its Godel Number. The method of doing that is simple: since a Turing Machine is defined by its state table and each table entry is the triple <b,M,j>, each entry in the Turing Machine's state table is a five-tuple <i,a,b,M,j>. Now each individual state i and each unique tape symbol a can be assigned a number. The two movements L and R can be assigned the numbers 1 and 2. Then the five-tuple is five numbers <i,a,b,M,j>. Each one of these tuples can be converted into a sequence of 0's and 1's of the following form: 'i' number of 1's followed by a 0, followed by 'a' number of 1's followed by a 0, followed by 'b' number of 1's followed by a 0, followed by either 10 for M=L or 110 for M=R followed by a 0, followed by 'j' number of 1's followed by a 0. Stringing together all of these five-tuples gives a very large unique binary number for each Turing Machine.
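
The encoding of a single five-tuple can be written out directly (the helper names are mine):

```python
# Encode one state-table entry <i, a, b, M, j> in the unary scheme above,
# then string entries together to form the machine's Godel number.
def encode_entry(i, a, b, M, j):
    move = '10' if M == 'L' else '110'
    return ('1' * i + '0' + '1' * a + '0' + '1' * b + '0'
            + move + '0' + '1' * j + '0')

def godel_number(entries):
    return ''.join(encode_entry(*e) for e in entries)

print(encode_entry(1, 1, 1, 'L', 1))  # 10101010010
```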

Now, given any Turing Machine converted into a binary number, assume that symbol number 1 is the blank and symbol number 2 is the letter a. Then let us consider all of the inputs to the Turing Machine that consist only of strings of a's - the rest of the tape is blank. The Turing Machine is started in state 0 and runs until it reaches state 0 again, at which point it halts and stops running. Otherwise, it goes into an infinite loop, or it writes an infinitely long string of symbols onto the tape, and never halts.

We can now define sets of numbers in the following way: given Turing Machine i, the Recursively Enumerable set Wi is the set of all numbers x such that, given an input tape with x letter a's written on it, Turing Machine i starts in state 0 and halts. If there are only blanks on the tape, then the number x is zero, so zero can be included in the set Wi as well.

Therefore we have the following equivalences:
First order predicate calculus is equivalent to Turing Machines.
Turing Machines are equivalent to the General Recursive functions.
Turing Machines are equivalent to the Recursively Enumerable sets.
Thus first order predicate calculus is equivalent to the Recursively Enumerable sets.

[Effective procedures]

The notion of an effective procedure can be represented in any of the equivalent models. Moreover, every class of effective computations proposed so far has been shown to be either some transformation of the class of Recursively Enumerable sets, or some proper subset of this class.

These proofs that different types of models for effective procedures are equivalent have led to the claim known as Church's Thesis: there is an essential set of effective procedures that can be represented in any number of equivalent ways. Any class of functions that is greater than this essential set must have some non-effective, non-cookbook step in their computation that cannot be reduced to a step-by-step deterministic method.

So logic is just one of the methods of performing effective reasoning. It is, though, the most common and useful of these methods. Originally formalized by Aristotle and further refined by Boole and Frege, logic is still today the most important tool for arriving at truth.

All computer programming languages are effective procedures, and some employ these formalisms as an idiom. One language in particular, Prolog, is actually composed of logical statements, where a computation follows the process of logical deduction. Another is Lisp, which is styled more like the partial recursive functions.

[Logic has limits]

Logic is a very important tool for reasoning. But it has its limits. As a practical matter, there is nothing special in the operations of logic themselves. These operations form a concise cookbook of instructions. What makes reason special comes from the objects of contemplation - the concepts in the real world that get cast into the logical syllogism as axioms, and the meanings attached to the predicates and terms.

This leads to the first limit. Using only the real world as the source of the concepts we subject to logical manipulation limits the range of our logical conclusions.

To quote from Alfred North Whitehead (From Religion in the Making): "Any proof which commences with the consideration of the character of the actual world cannot rise above the actuality of this world. ... By considering the world we can find all the factors required by the total metaphysical situation; but we cannot discover anything not included in this totality of actual fact, and yet explanatory of it."

Therefore, since there are no observable infinities in the real world, whole branches of mathematics that rely on transfinite concepts cannot be derived from logical reasoning applied to real world objects alone. This may not seem very limiting, except that differential and integral calculus are in this category, and a large part of modern technology is built on reasoning based on calculus. So a large number of very practical applications rest on the postulate that certain ideas which are not observable nonetheless have some ideal existence.

A second shortcoming is that the relation between logic as a tool and the real world as described by logic is murky. One problem is that the representations of the real world that logic leads us to can distort our understanding of the world. For example, in recent generations, it has been fashionable to consider the real world as quantized - made up of discrete individual units - and therefore amenable to direct representation as logical constants. This seems to make discreteness a real characteristic of nature, but it could just be an artifact of the way we represent reality.

A representation is partly a matter of convenience. The original use of logic was to express geometrical arguments, where, although the objects being manipulated - lines, points and circles - were discrete unities, they could not be decomposed at any fundamental level into quanta. But even here, these representations led to trouble - they gave rise to the paradoxes of Parmenides and Zeno, such as Achilles and the tortoise. It took the creation of the calculus to resolve these paradoxes adequately.

A third shortcoming is that the map is not the territory. To be usable, logic can only abstract a limited set of aspects of reality at any given time. This means that an infinite number of aspects must be left out. Giving the driving directions to the Empire State Building and back is not equivalent to actually going there. The directions abstract one dimension of the experience - the way of making the trip - and do not discuss the view, for example. You could add that dimension, and others, but then the dimensionality becomes infinite and you cannot even begin to reason about it.

[Logic is hard to apply to a deep level]

Another limit to logic is that it is hard to apply rigorously. According to Hartley Rogers, humans seem to be limited to three alternations of quantifiers - a syllogism containing a proposition of the form ∀x∃y∀z∃v Pxyzv seems to be beyond the capabilities of most people.

This incapacity to analyze logical syllogisms with too much complexity is another strong reason why humans deal with the emergent properties of things instead of breaking them down to their basic components. As an example, take a living organism. Presuming no supernatural aspects, an organism can be completely defined, for all intents and purposes, by the location and arrangement of its chemical components in space and time. This is what logical analysis does to any item of discourse - it breaks it down into its parts.

It is mostly practical for simple things. Logical theorems work on things like proteins, but not on human beings represented as assemblages of chemical molecules. With our ability to experience emergent properties, what we perceive with our senses is more real than what we perceive with reason.

The final limit to logic is that logic is just a means of discourse. Logic does not impart meaning to what one writes, any more than a paintbrush imparts meaning to a canvas. The meaning is imparted in the identification and interpretation of the properties used in the logical syllogism. Logic itself works for any interpretation given in some Herbrand Universe. The wonder is that such a simple tool works at all, and can be applied to so much. But it is just one tool, which is why we sometimes forget that there are others.

[Logic is not the only tool]

What alternatives are there to logic? The most obvious is action: pure action, without analysis. Although certain aspects of action at certain times can be expressed in logic, the totality of action cannot be expressed in logic. A second alternative is the recognition of gestalts - experiencing things as an unanalyzed totality. This is the passive side of action.

A further alternative to logic is storytelling and other art forms. The arts can be reduced to logic, but this would be an error. In logic, a story is anecdotal evidence - usually considered a problem. But in the artistic sense the anecdote reveals more than a logical analysis does - casting it into logic loses the essence.

Related to both action and storytelling is learning by example. This is a combination of both. The Taoists such as Chuang Tzu knew that the essence of doing is not reducible to analysis. This fact is still understood in education. A chemistry textbook can never take the place of a chemistry lab experiment. Even a computer simulation runs the risk, through its limited representation of the events it is simulating, of missing important aspects that are not built into the simulation.

Despite its limitations, though, logic works wonders. For example, at the level of DNA and proteins, the discrete logical formalism works admirably for understanding the basic building blocks of life. This bottom-up approach can be used for many aspects of understanding, as long as we recognize its limits.

[RE sets and the Incompleteness Theorem]

To understand modern logic, it is necessary to understand the notion of the incompleteness of logic. We shall demonstrate this by discussing the logical paradoxes.

The most ancient of the logical paradoxes is the Liar Paradox, attributed to the philosopher Epimenides:
1. Epimenides is a Cretan.
2. Epimenides says 'All Cretans are liars.'

If we assume that a liar always lies, then if all Cretans are liars and Epimenides is a Cretan, we can conclude that Epimenides is a liar, thus this statement is a lie. Actually, then, the opposite is a true statement: 'some Cretans sometimes tell the truth.'

As originally presented, the pair of statements does not lead to a paradox. But we can sharpen the statement so that the paradox is obvious. Eubulides' Paradox is the statement 'This statement is false.' Assume the statement is true. Then it follows that the statement is false - a contradiction. On the other hand, assume the statement is false. Then we can conclude that the statement's being false is false - i.e., the statement is true. Also a contradiction.

There are a number of self-referential paradoxes that can be easily expressed in mathematics. A classic is the Russell Paradox. Consider sets of elements. Sets of sets are known as classes. Consider the class of all sets that do not contain themselves as a member. Call this class S. If S contains itself as a member, then by definition S does not contain itself - a contradiction. But if S does not contain S as a member, then S is a member of S by definition - also a contradiction.

[Axiom sets for religion and morality]

In the greatest proof in formal logic in the twentieth century, Kurt Godel was able to formalize Eubulides' Paradox within the axioms of arithmetic and the formal notion of general recursive functions. As mentioned before, the equivalence of the general recursive functions and formal logic means that if you are able to assign a number x to every possible logical statement, there is a function fn(x) that returns 1 if and only if the logical statement x is true. This function is given the number n (called a Godel number) in some enumeration of all of the general recursive functions. This function fn must be sound - that is, if fn(x)=1, then logical statement x must be true. Godel first proved the Completeness theorem: the function fn exists and can be given a particular formal description (that is, its Godel number can be explicitly stated) such that for every logical statement x that is true, fn(x)=1. It is complete for all true statements.

The function is partial recursive, though. That means that there are some cases where fn(y) does not compute a value at all. One way of thinking about it is to consider the function fn represented as a Turing machine, TMm, where TMm, given input x on its tape, will, if x is true, go back into state q0 with its tape containing only a single symbol 1, the rest blank. Therefore fn(x)=1 iff TMm(x)=1. But there will be some logical statements y where TMm just goes on and on without ever arriving back at state q0 - it does not halt. Of course, this does not happen all the time. There are many cases y where it halts with a tape containing only blanks, or containing some tape symbols other than a single 1. That is, fn(y)!=1 iff TMm(y)!=1. In those cases y is provably false.

The main result was Godel's Incompleteness Theorem. The Incompleteness theorem constructed a logical statement with number a, which states the following: if fn(a)=v then v!=1. That is to say, if the function fn(a) returns a value, then that value is not 1. Assume, then, that logical statement a is true. By the Completeness theorem, fn(a)=1. But that means a is false, because it states the opposite. So assume instead that statement a is false. Since fn is sound, we thus have two alternatives: either fn(a) does not return a value, or fn(a)=v with v!=1.

Assume, then, that fn(a) converges to a value v with v!=1. But that is exactly what statement a claims - that if fn(a) returns a value, the value is not 1 - so a is true, contradicting the assumption that it is false. The only remaining possibility is that fn(a) does not return an answer at all. In that case a's claim is vacuously satisfied: the statement is true, yet fn cannot prove it. Therefore, the proof method expressed by function fn is incomplete.

Actually, every such proof method must be incomplete, since there was no special characteristic of the procedure fn that made it fail in the case of statement a. Every formal method fy that is sound must have a true logical statement z such that fy(z) does not return a value. Turing later showed a related result: there is no Turing Machine TMx such that for every value y, if Turing Machine TMy, presented with its own number y on the tape, halts and returns a number - any number at all - then TMx(y) halts and returns 1, and otherwise TMx(y) halts and returns a value v!=1.

The proof just defines a Turing Machine TMy that calls TMx as a subroutine. Given some input z, TMy simulates TMx on a simulated input of z. If the simulation of TMx halts with a 1 on the simulated tape, then TMy goes into an infinite loop of some kind - the details don't matter as long as it never reaches the halt state. If the simulation of TMx halts with any value other than 1 on the simulated tape, TMy cleans up the tape, overwriting it with blanks, and halts.

Now consider what happens if TMy is given the value y as an input. It passes the value y to the simulated TMx and watches what happens. If TMx halts with a 1 on the tape, then by the definition of TMx, this means that TMy with input y halts in some finite time. But by the definition of TMy, since TMx halted with a 1, TMy then goes into an infinite loop. So TMx should never have halted with a 1 in the first place. On the other hand, if the simulation of TMx halts on input y with something other than a 1, then by the definition of TMx, TMy on input y goes into an infinite loop. But actually, since TMy observed the simulation of TMx halt with a value other than 1, it cleaned up the tape and halted. So TMx could not have returned a value other than 1 and been correct.
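The diagonal trap in this argument can be sketched in Python. The sketch and its names (the candidate halting predictor, the contrarian program) are invented for illustration: given any candidate predictor, we can build a program that consults the predictor about itself and does the opposite, so the candidate must be wrong somewhere.

```python
def make_contrarian(candidate_halts):
    """Build a program that defeats the given halting predictor,
    playing the role of TMy relative to TMx."""
    def contrarian(prog):
        if candidate_halts(prog, prog):
            while True:     # predicted to halt, so loop forever
                pass
        return 1            # predicted to loop, so halt at once
    return contrarian

# Take a (necessarily flawed) candidate that predicts "never halts".
candidate = lambda prog, arg: False
contrarian = make_contrarian(candidate)

# The candidate claims contrarian(contrarian) never halts...
assert candidate(contrarian, contrarian) is False
# ...yet it halts immediately and returns 1: the prediction is wrong.
# The same construction defeats every possible candidate predictor.
assert contrarian(contrarian) == 1
```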

Therefore, in this case, TMx must not halt. This is certainly permissible. Machines such as TMx can be constructed such that for all z where TMz(z) halts in some finite time, TMx will halt with a 1 on its tape, and for many cases where TMz(z) never halts, TMx can halt with a value not equal to 1. But there must be an infinite number of cases where TMz(z) never halts and TMx(z) never halts either. At the very least, these include the infinite number of TMy machines that simulate TMx in one of the infinitely many possible ways and then do the opposite.

A further result related to the Incompleteness Theorem is the Recursion Theorem. The Recursion Theorem in its most basic form is a technical fixed-point theorem. It says that for any way of defining the family of effective procedures - logic, general recursive functions, Turing Machines or any other method - there exists some function fx in that formalism that, if presented with its own representation in that formalism (that is, its own number x), will recognize itself and return that value x. That is, it is a fixed point, where fx(x)=x. This is a formal notion of what it means to be conscious of oneself on a very rudimentary level. In fact, some of the most successful computer viruses spread this way, by getting access to their own program text and copying it to other computers.
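A quine - a program that prints its own source text - is a concrete counterpart of this kind of self-reference. Here is a minimal Python sketch; running the generated program reproduces the program itself, character for character.

```python
import io
import contextlib

# A classic quine built from a template applied to itself.
code = 'code = %r\nprint(code %% code)'
source = code % code

# Execute the program and capture what it prints.
buf = io.StringIO()
with contextlib.redirect_stdout(buf):
    exec(source)

# The program's output is exactly its own source text.
assert buf.getvalue() == source + '\n'
```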

A related result is Rice's theorem: any property of the functions computed by effective procedures is either trivial or undecidable. A property is trivial if it is true of all of them or false for all of them. Assume instead that the property is nontrivial - black for some functions and white for others - and assume that the property can be effectively decided in every case. Now pick two functions, one white and the other black. Given any machine TMz and input z, we can construct a machine that behaves like the black function if TMz(z) halts and like the white one otherwise; deciding the color of this constructed machine would tell us whether TMz(z) halts. This supposedly decidable property would then allow us to solve the halting problem - a contradiction.

The Incompleteness Theorem and its related results have profound implications for morality. They show that you cannot logically prove all true moral statements, especially moral problems that involve consciousness as an important feature or contain self-referential moral statements related to the generational tradeoff presented in an earlier chapter.

For example, let us assume that there is a group that claims that, using their holy book, you can rationally decide the maximum state of happiness attainable for each person. Well, perhaps in an ideal world such a methodology could be applied to everyone, and everyone would achieve their optimal level of happiness. The attainment of these levels may require an allocation of scarce resources that results in some tradeoffs. For example, it is unlikely that everyone who wants to live in Hawaii can do so - or if they did, Hawaii would lose the character that makes it so desirable.

But this perfect philosophy is incomplete in the face of evil. Assume that there exists a person x whose goal is to prevent the maximum happiness of some other person y. If the existence of this perfect philosophy is generally known, then person x can avail themselves of that philosophy, determine what would make some person y happiest, and if it involves some scarce resource, move to acquire that resource before person y, thereby thwarting them. This would then make that ultimate state unattainable. Now person y, anticipating such a setback, could also use the same philosophy to determine the nature of their ultimate theoretical happiness and what would endanger its realization. They then have the possibility of readjusting their goals to something more attainable. But person x could also anticipate this readjustment and account for it also. This kind of infinite regress is what makes the Halting Problem unsolvable.

Note that all that is required to end up in this untenable state is a philosophical system that claims to be effectively deterministic and universal, finite resources that put limits on the process, and at least one actor who is trying to accomplish a negative. Since the second and third conditions are quite common in real life, this undermines many claims to universal certainty in philosophical systems. A consequence of the Incompleteness theorem and its corollaries is that a sacred book cannot be used as an axiom system and be considered divinely perfect at the same time. This means that in general one cannot work from a holy book as if it were an axiom set. There will always be cases where the knowledge in the book is incomplete or incorrect (that is, unsound).

This fallibility means that all sacred books are written by fallible and incomplete men. They can point to or indicate a perfect or divine nature, but they can never embody it.

There is no perfect and objective measure of well-being, since that would solve the halting problem. It also means that we take our axioms on faith - they cannot be completely proved. This objection can be removed by claiming a metalogic: a methodology from which we build our own system. Such a system must be open to error and interpretation and in a continual state of development. It is also true that there is no foundation we all agree on - we all start from an arbitrary place. We hypothesize an axiom set and then learn from our mistakes and refine our axioms. But there is no single absolute morality - each moral code is relative to our own situation.

[Learning and immune sets]

This means that a moral code is incomplete, error-prone, and subject to revision. This naturally leads to taking a look at the formal properties of learning. I will discuss these properties through the use of Kolmogorov complexity.

Learning can be formalized by considering the problem of learning to be one of being presented with a set of data and trying to find a theory that fits that data. Although most interesting problems are open-ended - the number of potential cases to learn is unlimited - the information presented at any given time is always finite. That is, it can be written down on a sheet of paper. Let us put this data into a computer-readable form. Then the expression of the data set is reduced to bits and bytes. It becomes a finite binary sequence - a list of zeros and ones - of some finite length.

It is of most interest that the theory learned to explain the data be effective. It should be capable of being expressed as a logical expression, or better yet, as a Turing Machine that computes the data set. Since we are not concerning ourselves with function pairs, but instead with a finite binary sequence of digits that is the computer representation of the data set, we can consider an acceptable theory to explain the data to be a Turing Machine that, beginning with an empty tape, runs until it halts and leaves the sequence representing the data set on the tape. We will also reduce this Turing Machine to a state table represented in computer memory as another sequence of zeros and ones.

One of the most basic rules of learning is Ockham's razor - when deciding between two theories that each explain the data, choose the simpler. Leaving aside the question of the computing time of two competing Turing Machines, we will simply prefer the Turing Machine with the shortest state table over any other Turing Machine whose state table is longer.

Therefore, in general, learning problems reduce to the search for the smallest Turing Machine that computes the data set that you wish to learn. When systematically searching through a space of models, such as Turing Machines, it is obvious that if a smaller model fits, it is easier to find. There is not necessarily a correlation between the size of the data set and the size of the model that explains it. This means that certain data represents phenomena that are easy to learn - that is, there is a small Turing Machine to compute the data set - due to inherent regularities in the data. For example, a sequence of a million ones is very easy to learn. All it requires is a Turing Machine that, starting with a blank tape, writes the number one million as a counter on the tape, and then performs the following loop: go to the end of the tape and write a one, then come back and subtract one from the counter, until the counter reaches zero. At that point, erase the counter and halt, leaving a million ones on the tape. Similarly, the first million digits of pi can be computed by a Turing Machine that is not much more complicated. Instead of writing a one at each step, it writes the next digit of pi.
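A short Python sketch makes the point about the million ones: the 'theory' is a program a few characters long, while the data it reproduces is a million characters long.

```python
import io
import contextlib

program = "print('1' * 10**6)"    # the whole theory: 19 characters
data = '1' * 10**6                # the data set: a million characters

# The explanation is vastly shorter than what it explains...
assert len(program) < 25
assert len(data) == 1_000_000

# ...and running it reproduces the data exactly.
buf = io.StringIO()
with contextlib.redirect_stdout(buf):
    exec(program)
assert buf.getvalue() == data + '\n'
```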

Per Martin-Lof showed that something hard to learn is a data set whose digit sequence passes all effective tests of randomness. In effect, it is a random number. A random number is defined to be a data sequence where the shortest representation of a Turing Machine that computes the sequence is as long as the sequence itself, up to a constant. Note that every data sequence has a corresponding Turing Machine that computes it: the Turing Machine that simply has the digit sequence written into its state table. For an arbitrary sequence of one hundred zero and one digits, this Turing Machine has one hundred states. At state fifty, it writes the fiftieth digit of the sequence, moves one square to the right, and goes into state fifty-one. At state one hundred it simply halts. So every data set has a theory to explain it, although many data sets are random. That is, the shortest expression of the data is simply the listing of the data in another form: as a state table for a Turing Machine.

It is easy to see that virtually all data sets are random sequences. For every data sequence length n, there are 2^n data sequences of that length. For example, for length 3, there are 2*2*2, or 8, data sequences. There are about as many Turing Machine descriptions shorter than this length - the sum is 2^n - 1. But of these shorter Turing Machines, most are copies of other Turing Machines of the same length or shorter: redundant machines that just renumber the states of the copies or contain dummy states that don't do anything. So most data sequences are random.
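A practical stand-in for this counting argument is a general-purpose compressor: a compressed size is a rough upper bound on the length of the shortest description. In the Python sketch below (zlib is merely an approximation of the ideal shortest Turing Machine), a regular sequence shrinks to almost nothing while a typical random sequence does not shrink at all.

```python
import os
import zlib

N = 100_000

# A highly regular sequence has a tiny description...
regular = b'1' * N
assert len(zlib.compress(regular)) < 1_000

# ...but a typical (random) sequence is incompressible: the best the
# compressor can do is essentially restate the data verbatim.
random_data = os.urandom(N)
assert len(zlib.compress(random_data)) > N - 100
```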

Learning at its simplest is a brute-force generate-and-test method that tries to find the simplest model to fit the data. That is how evolution 'learned' how to live in more and more inhospitable environments, such as on land, in the air or in the arctic. Although there sometimes are techniques, such as drawing analogies to other phenomena, that can be used to shorten learning times, in those cases where we are learning something truly novel we have to fall back on this kind of exhaustive search to discover something truly new.

Kolmogorov complexity is the study of the complexity of data sets in terms of the shortest algorithm to represent them. It may seem artificial to equate learning with finding the shortest Turing Machine to compute a data set, but it actually is a reasonable paradigm for learning. Church's Thesis - the belief that all effective methods have essentially the same computational power - means that we can use Turing Machines as a method of representing any other kind of effective procedure. That is, we are using Turing Machines as a way of representing the concepts used in some other arbitrary formalism. Of course, it is also possible to use a formalism that simply names the data set but does not try to compute it, but this type of formalism is of limited usefulness. Learning by and large provides a method of effectively working with the information at hand, and this usually implies some sort of procedure. So the representation of learning as finding a Turing Machine to compute the data set is an adequate, if abstract, way of thinking about learning.

One surprising result that falls out of Kolmogorov Complexity is that for any given set of axioms there are only a finite number of cases for which it is possible to prove that we have the shortest explanation for the data. The proof goes like this: assume that there are an infinite number of provable theorems of the form 'TMx is the shortest representation for data set y'. Since formal logic has a set of simple rules, it is always possible, for any axiom set, to create a machine that prints out all of the true theorems of those axioms - and, from this one, a machine that prints out only the theorems of that form. From this machine create a machine TMa that, given a number z, enumerates the z'th theorem 'TMx is the shortest representation for data set y' and then runs a simulation of TMx, producing the value y: TMa(z)=y. Since not all x's and y's are included in this listing, the sizes of the machines TMx grow faster than their enumeration index z. So eventually, every such TMx is larger than the fixed size of TMa plus the size of the index z, and TMa with input z is a shorter representation for data set y than TMx itself - contradicting the theorem that TMx was shortest. If we have the shortest explanation for the data, then this explanation is irreducibly complex. Thus irreducible complexity is provable only for a finite number of cases.

Because of this argument, the set of pairs where TMx is the shortest representation for data set y cannot be computed by any effective procedure. It is what is known as an immune set: an infinite set such that every attempt to calculate the set effectively will be in error in an infinite number of places. In a profound sense, this set is unknowable.

Although this set is immune, and thus incomputable, it is easily learnable in the limit. This is done in a very simple way that mirrors the process of scientific discovery by an ever-expanding group of scientists. Given some data set that we wish to explain, we simply farm out more and more candidate theories to an ever larger group of people, each of whom tries the theory assigned to them until they get an answer. Some in this group are assigned a theory, represented as a Turing Machine, that does not halt; they will wait futilely for an answer. Others will arrive at the wrong output for their theory and must try another. But among those that match, we simply use Ockham's razor and choose the simplest. We just never know for certain when we have found the best theory, since one of our candidate models, shorter than the current best, may simply never halt.

These theoretical results discuss how to find a theory that expresses a finite data set. But learning is mostly prized for its ability to generalize. When the finite data set is only the list of phenomena we have seen so far out of an unlimited and unbounded number of cases, the best theory is the one that successfully predicts the specifics of any future phenomena based on what we have seen already. It is not so much the accuracy of the model in summarizing the past as how it prepares us going forward.

It is generally known that we learn from our mistakes. Too often, our success in applying what we learn does not prepare us better; it only gives us a false complacency that dissipates when we come across something completely original. Seen in this way, learning is like managing to walk down a dark hallway. We bounce off the walls when we learn from our mistakes, and this teaches us to aim for the center as our mistakes become rarer and rarer. In a universe whose ultimate reality is immune to any final theory of everything, we may find that it is not possible to come up with a completely satisfactory theory to predict a particular set of phenomena. The best that we may do is to come up with some procedure to identify and avoid our mistakes. In that case, the best that we can do to learn proper behavior is to delimit the bounds of the acceptable by setting the boundaries that mark off the space of the unacceptable. This gives a deeper meaning to Justice Potter Stewart's oft-quoted dictum that he may not be able to define pornography, but 'I know it when I see it.'

[FOOTNOTE: {Movie Day at the Supreme Court "I Know It When I See It" by Judith Silver, Esq.} In 1964, Justice Potter Stewart tried to explain "hard-core" pornography (legally synonymous with obscenity): "It is possible to read the Court's opinion in Roth v. United States and Alberts v. California, 354 U.S. 476, in a variety of ways. In saying this, I imply no criticism of the Court, which in those cases was faced with the task of trying to define what may be indefinable. I have reached the conclusion, which I think is confirmed at least by negative implications in the Court's decisions since Roth and Alberts, that under the First and Fourteenth Amendments criminal laws in this area are constitutionally limited to hard-core pornography. I shall not today attempt further to define the kinds of material I understand to be embraced within that shorthand description, and perhaps I could never succeed in intelligibly doing so. But I know it when I see it, and the motion picture involved in this case is not that." {Jacobellis v. Ohio, 378 U.S. 184, 197 (1964)} END FOOTNOTE]

This may be the essence of learning proper behavior in society. Because the limits of human behavior are constantly expanding as the possibilities available to us are expanding, the best we can expect is to be able to quantify what is unacceptable. That is, to know what is good and beautiful may not be possible - the best we can hope for is to have seen enough pornography to know it when we see it again. As for the rest, it is acceptable if it does not cross our boundaries.

Besides learning how to predict behavior, we are also learning in the presence of noise. It is seldom that the things we are called upon to learn are expressed as a perfect set of data tables, otherwise probability and statistics would not have become such important tools. Even the most fundamental phenomena of physics, chemistry and astronomy have to deal with noise in the data.

The existence of noise and randomness presents us with the question: can reality be reasonably expressed as a set of integers? After all, we have reduced formal logic and computation down to numbers, which, although unlimited in number, are each finitely bounded and perfect. There does not seem to be any justification for this. Yet it does seem to work. We have learned how to function effectively in the world with the help of computers that convert the world down to bits and bytes, expressing reality in practical ways that allow us some sense of mastery.

Turing's original paper on the Halting Problem addresses this problem. As he says, 'the justification lies in the fact that the human memory is necessarily limited.' That is to say, the functioning of the human mind requires the representation and manipulation of information in terms of a finite number of elements, each of which can be in one of a finite number of states: 'We may compare a man in the process of computing a real number to a machine which is only capable of a finite number of conditions q1, q2, ..., qR.' This means that even if a quantity is in some sense unlimited in the degree of its specification, for the purposes of human computation its expression must be truncated at some finite place before human cognition can begin to work on it.

[Overfitting is inevitable]

Although we can use numbers, with their deterministic character, to represent a world built out of random processes, randomness does incorporate itself into our attempts to learn about the world. This is the phenomenon of overfitting. If data is presented to a learning system and that data contains a certain amount of randomness, then there is a tendency for the learning system to see some of that randomness as a pattern and incorporate that pattern into the model. This happens for any kind of learning system, not just humans. Computers will overfit noise into their models, and so will animals. It has been observed for laboratory mice that have been put in Skinner Boxes, where they are given a reward for pushing a lever. If the reinforcement is random - that is, the reward does not come every time they push the lever but only every second, third or fourth time or so - they will incorporate random behaviors, such as turning to the left before pushing the lever or other such motions. This is like the stories of baseball players wearing their 'lucky socks', or the mistaken belief of basketball players that they sometimes develop 'hot hands'. Some methods of prediction are nothing but overfitting, with no meaningful patterns - such as the field of astrology, which claims to have empirical models of future behavior built on the assumed influence of distant planets on the person.
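Overfitting can be demonstrated in a few lines of Python. The setup is hypothetical and of my own construction: the true rule is y = 2x plus noise, the overfit 'theory' memorizes every noisy training observation, and the short theory ignores the noise. The memorizer is perfect on the past and useless on the future.

```python
import random

random.seed(1)
noise = lambda: random.gauss(0, 1)

# Observations of the true rule y = 2x, corrupted by noise.
train = [(x, 2 * x + noise()) for x in range(10)]
test = [(x, 2 * x + noise()) for x in range(10, 20)]

# Overfit theory: memorize every training pair, noise and all.
table = dict(train)
memorizer = lambda x: table.get(x, 0.0)

# Short theory: the underlying regularity, with the noise left out.
simple = lambda x: 2 * x

def sq_err(model, data):
    return sum((model(x) - y) ** 2 for x, y in data)

assert sq_err(memorizer, train) == 0.0                  # perfect on the past
assert sq_err(simple, test) < sq_err(memorizer, test)   # better on the future
```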

Although instances of overfitting can be refuted, they are impossible to avoid. One problem is that humans always look for explanations, and they will be satisfied with bogus ones if there is nothing better. A case in point was the attribution of stomach ulcers to diet and emotional tension. This theory was not refuted on its lack of merit alone; it took the substitution of an alternative model - infection by H. pylori - before the previous model was finally discredited. We seem to prefer an inadequate explanation to none at all.

[The world is immune]

But this may be the ultimate nature of reality. The universe itself is just as likely to be expressible as an immune set as it is to be expressed as an effective procedure. It does appear that nature is amenable to our explanations, though. Newton's Laws of Motion and the laws of chemistry, for example, have given us great control over the world around us. But the immune nature of the universe may present itself as an onion with an infinite number of layers. Once we peel back one layer and express it in the closed form of a natural law, this merely reveals a new layer, a fine structure that was not fully explained by the previous model. In that way, Newton's laws were supplanted by Einstein's, and in the future a new model will correct those in turn.

We are probably not even starting from the outer layer to begin with. Outside all the layers of explanation we have so far, there is a new level of generality that extends our known theories to larger domains, and that may in turn lead to an even larger generality. Since the world is immune, there is likely no Theory of Everything that will last more than a generation. Because every effective model is infinitely wrong, each model eventually yields a new body of exceptions that need explaining, providing a new layer of generality or a new level of subtlety to our calculations.

Some evidence for the indeterminacy of the universe comes from the nondeterministic processes of quantum theory. Although this is usually thought of as a completely random process, it may be the case that the universe is deterministic but immune. This would make these random events random in the sense of Kolmogorov Complexity, in that their expression cannot be summarized in a neat theory. They are not so much random events as events incapable of being summarized.

In an immune universe, moral decision making is also immune. This happens in one of two ways. First, as we learn more and more about the world, our theories get more and more complicated, and so do our capabilities. This means that we shall never come to an ultimate answer in determining what is the good. What looks to one generation like an obvious determination that choice A is preferred over choice B may, to a later generation that sees the world with a deeper level of sophistication, be true only in the most obvious cases. This is not to say that future generations will overturn what we know now to be right and wrong, any more than they will say that 'every action has an equal and opposite reaction' is false. But they will be able to point out that in certain cases there is a subtlety to the situation, so that what looked like an obvious choice of one thing over another does not really apply, because in this case a previously undiscovered law applies.

Besides the immunity of the outside world to explanation as a single closed set of laws, we ourselves are probably immune also. This means that there is an underlying indeterminacy to our actions. This indeterminacy can arise even if we were completely deterministic machines, expressible as recursively enumerable functions, as long as we are able to tap into the indeterminacy of an immune universe. This indeterminacy is sometimes mistakenly considered to be the basis of free will, but it is nothing more than an underlying unpredictability. In point of fact, we tend as a general rule to strive to eliminate these random processes from our decision making. They tend to be a hindrance more than they are an essential part of how we think and act.


This indeterminacy can lead to a mystical view of the universe, even in the face of the efficacy of rational processes. Mysticism and logicism are not incompatible - it is perfectly reasonable to be a rational mystic.

The notion of mysticism comes from a word meaning hidden. There are two ways mysticism enters. The first way is via consciousness - the self-awareness expressed in results such as the Godel Incompleteness Theorem and the Recursion Theorem. The second way is via the inherent complexity of an immune universe. The capability of the universe to achieve self-awareness allows creatures to arise that learn and grow in understanding - an understanding that is always outstripped, since this very understanding results in further learning. The second level of mysticism leads to an infinite depth of levels of complexity: each new level of understanding reveals more of the complexity of the universe, but always leaves an infinite depth of still hidden complexity waiting to be revealed.