Axiom of Choice, the Zorn’s Lemma, and the Foundations of Mathematics: Set Theory Part I

Mathematics has long been built on the sturdy foundation of intuition. From counting pebbles to solving complex equations, mathematicians have relied on their inner compass to navigate the abstract landscapes of numbers and shapes. But as the 19th century dawned, something intriguing and unsettling began to occur – mathematics, the bedrock of human knowledge, was on the brink of a seismic shift.

This piece goes into how this happened, what changed - and its consequences. Though I might not know a lot of mathematics, I firmly believe that this is one of the most interesting (while being accessible) parts of mathematics.

All of this begins with arguably the most famous paradox in mathematics, logic, and the like - the Russell’s paradox.

Russell’s Paradox

Let \(A\) be the set of all sets that do not contain themselves. Does \(A\) contain itself?

\[ A = \{x : x \not\in x\} \text{. Is } A \in A? \]

Well, if \(A \in A\) then by the definition of \(A\), we get \(A \not \in A\). But if \(A \not\in A\) then \(A \in A\). Poof.

This called for a need for formalism - intuition was just not enough. Here came Axiomatic Set Theory - one of the most famous systems of Axiomatic Set Theory is the Zermelo Fraenkel set theory. (Do click the link and read the axioms - they are fun! And everything makes sense at the end.)

Well, how does the Russell’s paradox get resolved? One of the axioms deals with this head-on.

Axiom Schema of Specification / Axiom of Restricted Comprehension

Given a parent set \(z\) we can construct a subset \(S \subseteq z\) such that

\[ S = \{x \in z : \varphi(x)\} \]

Although this defines how we can create subsets from a parent set, it DOESN’T talk about creating sets of elements that follow a formula. Infact, we will need a sort of \(\text{``set of all sets: X"}\) to talk about creating sets out of thin air.

However this is NOT a set (In \(\text{ZF}\)) . How do we prove this? Well, if \(X\) is the set of all sets, then \(P(X) \in X\) which means there exists an injective function from \(f:P(X)\to X\). But we know by Theorem 3 that \(P(X) >_{c} X\). This is a Class.

Axiom of Choice

Let \(I\) be an index set with a family of sets indexed on this index set \(\{A_i\}_{i \in I}\). Then there exists a choice function \(f:I \to A = \cup_{i \in I} A_i\) such that \(f(i) \in A_i \ \forall i \in I\).

This makes “intuitive” sense and is something that “should” be true. However, this is an independent statement in \(\text{ZF}\), which is to say - we CANNOT prove this by ONLY using the axioms of \(\text{ZF}\). There is an equivalent formulation of this axiom (which is used more often)

Zorn’s Lemma

If \(P\) is a nonempty partially ordered set such that every chain has an upper bound in \(P\) then \(P\) has a maximal element.

There are quite a few words that need to be unpacked.

Definitions

A binary relation \(R\) on a set \(P\) is a subset \(R\) of \(P \times P\), \((x,y) \in R\) and is often expressed as \(xRy\).
A binary relation \(R\) is a partial order if
1. Reflexivity: \(xRx\), Transitivity: \(xRy, yRz \implies xRz\).
2. Antisymmetry: \(xRy, yRx \implies xRx\).
A partial order is called a linear order if any two elements are comparable.
Let \((P, \leq)\) be a poset (partially ordered set), a subset \(C \subseteq P\) is called a chain if \(C\) has a linear order.
Let \(A \subseteq P\), an upper bound for \(A\) is an element \(x \in P\) such that \(y \leq x, \ \forall y \in A\).
An \(x \in P\) is called a maximal element if \(\not\exists \ y\neq x : x \leq y\).
A linearly ordered set is called well ordered if every nonempty subset has a first element.

Zorn and Axiom of Choice are the SAME!

We now conclude by proving that \(\text{Zorn}\iff\text{AoC}\).

Theorem 1 (\(\text{Zorn} \implies \text{AoC}\)) \[ \text{Zorn's Lemma} \ \implies \text{Axiom of Choice} \]

(This proof was communicated to me by PSC. Sir at ISIK)

Consider the following poset

\[ P = \{(J, f) : J \subseteq I, f : J \to A\text{ such that }f(j) \in A_j, \forall j \in J\} \]

with the partial order:

\[ (J_1, f_1) \preceq (J_2, f_2) \text{ if } J_1 \subseteq J_2 \text{ and } f_2\Bigg|_{J_1} = f_1 \text{ ie. } f_2(x) = f_1(x), \forall x \in J_1 \]

then any chain \(C = \{(J_\alpha, f_\alpha) : \alpha \in Y\}\), has the trivial upperbound \((\cup J_\alpha, f) : f\Bigg |_{\alpha} = f_\alpha, \forall \alpha \in Y\). By Zorn’s Lemma, there exists a maximal element \(f:J \to A\), if \(J = I\) , we are done. Otherwise, pick \(j \in I \setminus J\) since \(A_j\) is nonempty, pick \(x \in A_j\) and consider \(f^{*}:J \cup \{j\} \to A\) such that \(f(j) = x, f^{*}\Bigg|_{J} = f\), which contradicts maximality of \(f\). \(\blacksquare\)

Theorem 2 (\(\text{AoC} \implies \text{Zorn}\)) \[ \text{Axiom of Choice} \implies \text{Zorn's Lemma} \]

This is definitely trickier to prove compared to the previous implication. I use transfinite induction and ordinals here (but if you don’t know about them I will probably make a post on them. Till then, check out Napkin pg. 801.)

Let \((P, \leq)\) be the poset, where every chain has an upperbound. Assume there is no maximal element. Let \(x_0\) be any element in \(P\). By assumption \(\exists x_1\) s.t. \(x_0 \leq x_1\). But again, by assumption, \(\exists x_2\) s.t. \(x_0 \leq x_1 \leq x_2\). Intuitively we create a chain \(x_0 \leq x_1 \leq x_2 \ \cdots\) which has an upper bound \(x_\omega\), but since this isn’t the maximal element again - I get \(x_0 \leq x_1 \leq x_2 \leq \cdots \leq x_\omega \leq x_{\omega+1} \leq \cdots\). We must “eventually” run out.

Now to formalize this: for any element \(x \in P\) consider the set \(S_x = \{y \in P : x \lneq y\}\). Clearly all \(S_x\) are nonempty, because otherwise there is a maximal element. By AoC, there exists a choice function \(f:P \to \cup_{x \in P} S_x\), such that \(f(x) \in S_x \ \forall x \in P\). Using transfinite induction:

Step 1: \(x_0 \in P\)

Take \(x_1 = f(x_0)\). Then \(x_0 \leq x_1\). Similarly, obtain \(C_{i < \omega} = x_0 \leq x_1 \leq \cdots\).
Step 2: Let \(\lambda\) be a limit ordinal. Defining \(x_\lambda = u(C_{i < \lambda})\) can be used to extend the chain.

Consider the upper bound of this chain \(x = u(C_{i < \lambda})\). Clearly, \(x\) cannot be in \(C\), because otherwise, \(\exists k\) s.t. \(C \ni x_k \geq x = x_{k-1}\) which forces \(x = x_k = f(x)\) which is not possible by the definition of \(f\). So, define \(x_{\lambda} = x\).

Then - keep doing this. So, you end up with a chain \(C = x_0 \leq x_1 \leq \cdots \leq x_\omega \leq x_{\omega + 1} \leq \cdots \leq x_{\omega + \omega} \leq \cdots\).

To finish we need magic: Hartogs’s theorem.

Hartogs’s Theorem

Given any set \(X\), there exists an ordinal \(\alpha\) such that there is no injection \(f:\alpha \to X\).

Consider the class of ordinals \(\alpha = \{\beta \in \text{Ord} \ | \ \exists f:\beta \hookrightarrow X\}\). Using the power set axiom twice - we get that \(\mathcal{P}(X \times X)\) is a set. Consider the class \(W\) of all reflexive well orderings of subsets of \(X\), since this is a definable subclass of \(\mathcal{P}(X \times X)\), it is a set by Schema of Specification.

The class of all order types of well-orderings in \(W\) is a set by the axiom schema of replacement: \((\operatorname{Domain}(w), w) \cong (\beta, \leq)\) where \(w\) is a well-ordering on \(\operatorname{Domain}(w)\) which gets mapped to the order type \((\beta, \leq)\).

Now we can see that \(W\), a set, is precisely \(\alpha\). Since every well-ordering of a subset of \(X\) defines an injective function from \(\beta \hookrightarrow X\) where \(\beta\) is some ordinal. And the other side is true as well. To finish: \(\alpha\) is a transitive set of ordinals, which means it is an ordinal itself. Ofcourse, then, \(\alpha\) is the first ordinal with no injection \(\blacksquare\).

So, the chain has to eventually end. We are done.

Footnotes

Theorem 3 (Cantor’s Diagonal Argument) \[X <_{c} P(X) \ \forall \ X\]

Assume for the sake of contradiction \(P(X) =_{c} X\) (where \(=_{c}\) means that there exists a bijection between the sets). Thus, there is a surjection \(f:X \to P(X)\). Consider

\[ A = \{x \in X: x \not\in f(x)\} \]

But since \(f\) is a surjection, \(\exists x_0 \ f(x_0) = A\) then

\[ x_0 \in A \iff x_0 \not\in A. \]

a contradiction. (This is pretty much the same idea as the famous proof of real numbers being uncountable.)