This will be a very non-traditional introduction to the ideas behind category theory. It will essentially be a slice through model theory (presented in a more programmer-friendly manner) with an unusual organization. Of course the end result will be ***SPOILER ALERT*** it was category theory all along. A secret decoder ring will be provided at the end. This approach is inspired by the notion of an internal logic/language and by Vaughn Pratt’s paper The Yoneda Lemma Without Category Theory.
I want to be very clear, though. This is not meant to be an analogy or an example or a guide for intuition. This is category theory. It is simply presented in a different manner.
Dan Doel pointed me at some draft lecture notes by Mike Shulman that seem very much in the spirit of this blog post (albeit aimed at an audience familiar with category theory): Categorical Logic from a Categorical Point of View. A theory in my sense corresponds to a category presentation (Definition 1.7.1) as defined by these lecture notes. Its oft-mentioned Appendix A will also look very familiar.
The first concept we’ll need is that of a theory. If you’ve ever implemented an interpreter for even the simplest language, then most of what follows modulo some terminological differences should be both familiar and very basic. If you are familiar with algebraic semantics, then that is exactly what is happening here only restricting to unary (but multi-sorted) algebraic theories.
For us, a theory, |\mathcal{T}|, is a collection of sorts, a collection of (unary) function symbols^{1}, and a collection of equations. Each function symbol has an input sort and an output sort which we’ll call the source and target of the function symbol. We’ll write |\mathtt{f} : A \to B| to say that |\mathtt{f}| is a function symbol with source |A| and target |B|. We define |\mathrm{src}(\mathtt{f}) \equiv A| and |\mathrm{tgt}(\mathtt{f}) \equiv B|. Sorts and function symbols are just symbols. Something is a sort if it is in the collection of sorts. Nothing else is required. A function symbol is not a function, it’s just a, possibly structured, name. Later, we’ll map those names to functions, but the same name may be mapped to different functions. In programming terms, a theory defines an interface or signature. We’ll write |\mathbf{sort}(\mathcal{T})| for the collection of sorts of |\mathcal{T}| and |\mathbf{fun}(\mathcal{T})| for the collection of function symbols.
A (raw) term in a theory is either a variable labelled by a sort, |\mathbf{x}_A|, or it’s a function symbol applied to a term, |\mathtt{f}(t)|, such that the sort of the term |t| matches the source of |\mathtt{f}|. The sort or target of a term is the sort of the variable if it’s a variable or the target of the outermost function symbol. The source of a term is the sort of the innermost variable. In fact, all terms are just sequences of function symbol applications to a variable, so there will always be exactly one variable. All this is to say the expressions need to be “well-typed” in the obvious way. Given a theory with two function symbols |\mathtt{f} : A \to B| and |\mathtt{g} : B \to A|, |\mathbf{x}_A|, |\mathbf{x}_B| , |\mathtt{f}(\mathbf{x}_A)|, and |\mathtt{f}(\mathtt{g}(\mathtt{f}(\mathbf{x}_A)))| are all examples of terms. |\mathtt{f}(\mathbf{x}_B)| and |\mathtt{f}(\mathtt{f}(\mathbf{x}_A))| are not terms because they are not “well-typed”, and |\mathtt{f}| by itself is not a term simply because it doesn’t match the syntax. Using Haskell syntax, we can define a data type representing this syntax if we ignore the sorting:
data Term = Var Sort | Apply FunctionSymbol Term
Using GADTs, we could capture the sorting constraints as well:
data Term (s :: Sort) (t :: Sort) where
Var :: Term t t
Apply :: FunctionSymbol x t -> Term s x -> Term s t
An important operation on terms is substitution. Given a term |t_1| with source |A| and a term |t_2| with target |A| we define the substitution of |t_2| into |t_1|, written |t_1[\mathbf{x}_A \mapsto t_2]|, as:
If |t_1 = \mathbf{x}_A| then |\mathbf{x}_A[\mathbf{x}_A \mapsto t_2] \equiv t_2|.
If |t_1 = \mathtt{f}(t)| then |\mathtt{f}(t)[\mathbf{x}_A \mapsto t_2] \equiv \mathtt{f}(t[\mathbf{x}_A \mapsto t_2])|.
Using the theory from before, we have:
\[\mathtt{f}(\mathbf{x}_A)[\mathbf{x}_A \mapsto \mathtt{g}(\mathbf{x}_B)] = \mathtt{f}(\mathtt{g}(\mathbf{x}_B))\]
As a shorthand, for arbitrary terms |t_1| and |t_2|, |t_1(t_2)| will mean |t_1[\mathbf{x}_{\mathrm{src}(t_1)} \mapsto t_2]|.
Finally, equations^{2}. An equation is a pair of terms with equal source and target, for example, |\langle \mathtt{f}(\mathtt{g}(\mathbf{x}_B)), \mathbf{x}_B \rangle|. The idea is that we want to identify these two terms. To do this we quotient the set of terms by the congruence generated by these pairs, i.e. by the reflexive-, symmetric-, transitive-closure of the relation generated by the equations which further satisfies “if |s_1 \approx t_1| and |s_2 \approx t_2| then |s_1(s_2) \approx t_1(t_2)|”. From now on, by “terms” I’ll mean this quotient with “raw terms” referring to the unquotiented version. This means that when we say “|\mathtt{f}(\mathtt{g}(\mathbf{x}_B)) = \mathbf{x}_B|”, we really mean the two terms are congruent with respect to the congruence generated by the equations. We’ll write |\mathcal{T}(A, B)| for the collection of terms, in this sense, with source |A| and target |B|. To make things look a little bit more normal, I’ll write |s \approx t| as a synonym for |\langle s, t \rangle| when the intent is that the pair represents a given equation.
Expanding the theory from before, we get the theory of isomorphisms, |\mathcal{T}_{\cong}|, consisting of two sorts, |A| and |B|, two function symbols, |\mathtt{f}| and |\mathtt{g}|, and two equations |\mathtt{f}(\mathtt{g}(\mathbf{x}_B)) \approx \mathbf{x}_B| and |\mathtt{g}(\mathtt{f}(\mathbf{x}_A)) \approx \mathbf{x}_A|. The equations lead to equalities like |\mathtt{f}(\mathtt{g}(\mathtt{f}(\mathbf{x}_A))) = \mathtt{f}(\mathbf{x}_A)|. In fact, it doesn’t take much work to show that this theory only has four distinct terms: |\mathbf{x}_A|, |\mathbf{x}_B|, |\mathtt{f}(\mathbf{x}_A)|, and |\mathtt{g}(\mathbf{x}_B)|.
In traditional model theory or universal algebra we tend to focus on multi-ary operations, i.e. function symbols that can take multiple inputs. By restricting ourselves to only unary function symbols, we expose a duality. For every theory |\mathcal{T}|, we have the opposite theory, |\mathcal{T}^{op}| defined by using the same sorts and function symbols but swapping the source and target of the function symbols which also requires rewriting the terms in the equations. The rewriting on terms is the obvious thing, e.g. if |\mathtt{f} : A \to B|, |\mathtt{g} : B \to C|, and |\mathtt{h} : C \to D|, then the term in |\mathcal{T}|, |\mathtt{h}(\mathtt{g}(\mathtt{f}(\mathbf{x}_A)))| would become the term |\mathtt{f}(\mathtt{g}(\mathtt{h}(\mathbf{x}_D)))| in |\mathcal{T}^{op}|. From this it should be clear that |(\mathcal{T}^{op})^{op} = \mathcal{T}|.
Given two theories |\mathcal{T}_1| and |\mathcal{T}_2| we can form a new theory |\mathcal{T}_1 \times \mathcal{T}_2| called the product theory of |\mathcal{T}_1| and |\mathcal{T}_2|. The sorts of this theory are pairs of sorts from |\mathcal{T}_1| and |\mathcal{T}_2|. The collection of function symbols is the disjoint union |\mathbf{fun}(\mathcal{T}_1) \times \mathbf{sort}(\mathcal{T}_2) + \mathbf{sort}(\mathcal{T}_1) \times \mathbf{fun}(\mathcal{T}_2)|. A disjoint union is like Haskell’s Either
type. Here we’ll write |\mathtt{inl}| and |\mathtt{inr}| for the left and right injections respectively. |\mathtt{inl}| takes a function symbol from |\mathcal{T}_1| and a sort from |\mathcal{T}_2| and produces a function symbol of |\mathcal{T}_1 \times \mathcal{T}_2| and similarly for |\mathtt{inr}|. If |\mathtt{f} : A \to B| in |\mathcal{T}_1| and |C| is a sort of |\mathcal{T}_2|, then |\mathtt{inl}(f, C) : (A, C) \to (B, C)| and similarly for |\mathtt{inr}|.
The collection of equations for |\mathcal{T}_1 \times \mathcal{T}_2| consists of the following:
The above is probably unreadable. If you work through it, you can show that every term of |\mathcal{T}_1 \times \mathcal{T}_2| is equivalent to a pair of terms |(t_1, t_2)| where |t_1| is a term in |\mathcal{T}_1| and |t_2| is a term in |\mathcal{T}_2|. Using this equivalence, the first bullet is seen to be saying that if |l = r| in |\mathcal{T}_1| and |C| is a sort in |\mathcal{T}_2| then |(l, \mathbf{x}_C) = (r, \mathbf{x}_C)| in |\mathcal{T}_1 \times \mathcal{T}_2|. The second is similar. The third then states
\[(t_1, \mathbf{x}_C)((\mathbf{x}_A, t_2)(\mathbf{x}_{(A, C)})) = (t_1, t_2)(\mathbf{x}_{(A, C)}) = (\mathbf{x}_A, t_2)((t_1, \mathbf{x}_C)(\mathbf{x}_{(A, C)}))\]
To establish the equivalence between terms of |\mathcal{T}_1 \times \mathcal{T}_2| and pairs of terms from |\mathcal{T}_1| and |\mathcal{T}_2|, we use the third bullet to move all the |\mathtt{inl}|s outward at which point we’ll have a sequence of |\mathcal{T}_1| function symbols followed by a sequence of |\mathcal{T}_2| function symbols each corresponding to term.
The above might seem a bit round about. An alternative approach would be to define the function symbols of |\mathcal{T}_1 \times \mathcal{T}_2| to be all pairs of all the terms from |\mathcal{T}_1| and |\mathcal{T}_2|. The problem with this approach is that it leads to an explosion in the number of function symbols and equations required. In particular, it easily produces an infinitude of function symbols and equations even when provided with theories that only have a finite number of sorts, function symbols, and equations.
As a concrete and useful example, consider the theory |\mathcal{T}_\mathbb{N}| consisting of a single sort, |0|, a single function symbol, |\mathtt{s}|, and no equations. This theory has a term for each natural number, |n|, corresponding to |n| applications of |\mathtt{s}|. Now let’s articulate |\mathcal{T}_\mathbb{N} \times \mathcal{T}_\mathbb{N}|. It has one sort, |(0, 0)|, two function symbols, |\mathtt{inl}(\mathtt{s}, 0)| and |\mathtt{inr}(0, \mathtt{s})|, and it has one equation, |\mathtt{inl}(\mathtt{s}, 0)(\mathtt{inr}(0, \mathtt{s})(\mathbf{x}_{(0, 0)})) \approx \mathtt{inr}(0, \mathtt{s})(\mathtt{inl}(\mathtt{s}, 0)(\mathbf{x}_{(0, 0)}))|. Unsurprisingly, the terms of this theory correspond to pairs of natural numbers. If we had used the alternative definition, we’d have had an infinite number of function symbols and an infinite number of equations.
Nevertheless, for clarity I will typically write a term of a product theory as a pair of terms.
As a relatively easy exercise — easier than the above — you can formulate and define the disjoint sum of two theories |\mathcal{T}_1 + \mathcal{T}_2|. The idea is that every term of |\mathcal{T}_1 + \mathcal{T}_2| corresponds to either a term of |\mathcal{T}_1| or a term of |\mathcal{T}_2|. Don’t forget to define what happens to the equations.
Related to these, we have the theory |\mathcal{T}_{\mathbf{1}}|, which consists of one sort and no function symbols or equations, and |\mathcal{T}_{\mathbf{0}}| which consists of no sorts and thus no possibility for function symbols or equations. |\mathcal{T}_{\mathbf{1}}| has exactly one term while |\mathcal{T}_{\mathbf{0}}| has no terms.
Sometimes we’d like to talk about function symbols whose source is in one theory and target is in another. As a simple example, that we’ll explore in more depth later, we may want function symbols whose sources are in a product theory. This would let us consider terms with multiple inputs.
The natural way to achieve this is to simply make a new theory that contains sorts from both theories plus the new function symbols. A collage, |\mathcal{K}|, from a theory |\mathcal{T}_1| to |\mathcal{T}_2|, written |\mathcal{K} : \mathcal{T}_1 \nrightarrow \mathcal{T}_2|, is a theory whose collection of sorts is the disjoint union of the sorts of |\mathcal{T}_1| and |\mathcal{T}_2|. The function symbols of |\mathcal{K}| consist for each function symbol |\mathtt{f} : A \to B| in |\mathcal{T}_1|, a function symbol |\mathtt{inl}(\mathtt{f}) : \mathtt{inl}(A) \to \mathtt{inl}(B)|, and similarly for function symbols from |\mathcal{T}_2|. Equations from |\mathcal{T}_1| and |\mathcal{T}_2| are likewise taken and lifted appropriately, i.e. |\mathtt{f}| is replaced with |\mathtt{inl}(\mathtt{f})| or |\mathtt{inr}(\mathtt{f})| as appropriate. Additional function symbols of the form |k : \mathtt{inl}(A) \to \mathtt{inr}(Z)| where |A| is a sort of |\mathcal{T}_1| and |Z| is a sort of |\mathcal{T}_2|, and potentially additional equations involving these function symbols, may be given. (If no additional function symobls are given, then this is exactly the disjoint sum of |\mathcal{T}_1| and |\mathcal{T}_2|.) These additional function symbols and equations are what differentiate two collages that have the same source and target theories. Note, there are no function symbols |\mathtt{inr}(Z) \to \mathtt{inl}(A)|, i.e. where |Z| is in |\mathcal{T}_2| and |A| is in |\mathcal{T}_1|. That is, there are no function symbols going the “other way”. To avoid clutter, I’ll typically assume that the sorts and function symbols of |\mathcal{T}_1| and |\mathcal{T}_2| are disjoint already, and dispense with the |\mathtt{inl}|s and |\mathtt{inr}|s.
Summarizing, we have |\mathcal{K}(\mathtt{inl}(A), \mathtt{inl}(B)) \cong \mathcal{T}_1(A, B)|, |\mathcal{K}(\mathtt{inr}(Y), \mathtt{inr}(Z)) \cong \mathcal{T}_2(Y, Z)|, and |\mathcal{K}(\mathtt{inr}(Z), \mathtt{inl}(A)) = \varnothing| for all |A|, |B|, |Y|, and |Z|. |\mathcal{K}(\mathtt{inl}(A), \mathtt{inr}(Z))| for any |A| and |Z| is arbitrary generated. To distinguish them, I’ll call the function symbols that go from one theory to another bridges. More generally, an arbitrary term that has it’s source in one theory and target in another will be described as a bridging term.
Here’s a somewhat silly example. Consider |\mathcal{K}_{+} : \mathcal{T}_\mathbb{N} \times \mathcal{T}_\mathbb{N} \nrightarrow \mathcal{T}_\mathbb{N}| that has one bridge |\mathtt{add} : (0, 0) \to 0| with the equations |\mathtt{add}(\mathtt{inl}(\mathtt{s}, 0)(\mathbf{x}_{(0, 0)})) \approx \mathtt{s}(\mathtt{add}(\mathbf{x}_{(0, 0)}))| and |\mathtt{add}(\mathtt{inr}(0, \mathtt{s})(\mathbf{x}_{(0, 0)})) \approx \mathtt{s}(\mathtt{add}(\mathbf{x}_{(0, 0)}))|.
More usefully, if a bit degenerately, every theory induces a collage in the following way. Given a theory |\mathcal{T}|, we can build the collage |\mathcal{K}_\mathcal{T} : \mathcal{T} \nrightarrow \mathcal{T}| where the bridges consist of the following. For each sort, |A|, of |\mathcal{T}|, we have the following bridge: |\mathtt{id}_A : \mathtt{inl}(A) \to \mathtt{inr}(A)|. Then, for every function symbol, |\mathtt{f} : A \to B| in |\mathcal{T}|, we have the following equation: |\mathtt{inl}(\mathtt{f})(\mathtt{id}_A(\mathbf{x}_{\mathtt{inl}(A)})) \approx \mathtt{id}_B(\mathtt{inr}(\mathtt{f})(\mathbf{x}_{\mathtt{inl}(A)}))|. We have |\mathcal{K}_\mathcal{T}(\mathtt{inl}(A), \mathtt{inr}(B)) \cong \mathcal{T}(A, B)|.
You can think of a bridging term in a collage as a sequence of function symbols partitioned into two parts by a bridge. Naturally, we might consider partitioning into more than two parts by having more than one bridge. It’s easy to generalize the definition of collage to combine an arbitrary number of theories, but I’ll take a different, probably worse, route. Given collages |\mathcal{K}_1 : \mathcal{T}_1 \nrightarrow \mathcal{T}_2| and |\mathcal{K}_2 : \mathcal{T}_2 \nrightarrow \mathcal{T}_3|, we can make the collage |\mathcal{K}_2 \circ \mathcal{K}_1 : \mathcal{T}_1 \nrightarrow \mathcal{T}_3| by defining its bridges to be triples of a bridge of |\mathcal{K}_1|, |k_1 : A_1 \to A_2|, a term, |t : A_2 \to B_2| of |\mathcal{T}_2|, and a bridge of |\mathcal{K}_2|, |k_2 : B_2 \to B_3| which altogether will be a bridge of |\mathcal{K}_2 \circ \mathcal{K}_1| going from |A_1 \to B_3|. These triples essentially represent a term like |k_2(t(k_1(\mathbf{x}_{A_1})))|. With this intuition we can formulate the equations. For each equation |t’(k_1(t_1)) \approx s’(k’_1(s_1))| where |k_1| and |k’_1| are bridges of |\mathcal{K}_1|, we have for every bridge |k_2| of |\mathcal{K}_2| and term |t| of the appropriate sorts |(k_2, t(t’(\mathbf{x})), k_1)(t_1) \approx (k_2, t(s’(\mathbf{x})), k’_1)(s_1)| and similarly for equations involving the bridges of |\mathcal{K}_2|.
This composition is associative… almost. Furthermore, the collages generated by theories, |\mathcal{K}_\mathcal{T}|, behave like identities to this composition… almost. It turns out these statements are true, but only up to isomorphism of theories. That is, |(\mathcal{K}_3 \circ \mathcal{K}_2) \circ \mathcal{K}_1 \cong \mathcal{K}_3 \circ (\mathcal{K}_2 \circ \mathcal{K}_1)| but is not equal.
To talk about isomorphism of theories we need the notion of…
An interpretation of a theory gives meaning to the syntax of a theory. There are two nearly identical notions of interpretation for us: interpretation (into sets) and interpretation into a theory. I’ll define them in parallel. An interpretation (into a theory), |\mathcal{I}|, is a mapping, written |[\! [-]\!]^\mathcal{I}| though the superscript will often be omitted, which maps sorts to sets (sorts) and function symbols to functions (terms). The mapping satisfies:
|[\! [\mathrm{src}(f)]\!] = \mathrm{src}([\! [f]\!])| and |[\! [\mathrm{tgt}(f)]\!] = \mathrm{tgt}([\! [f]\!])| where |\mathrm{src}| and |\mathrm{tgt}| on the right are the domain and codomain operations for an interpretation.
We extend the mapping to a mapping on terms via:
- |[\! [\mathbf{x}_A]\!] = x \mapsto x|, i.e. the identity function, or, for interpretation into a theory, |[\! [\mathbf{x}_A]\!] = \mathbf{x}_{[\! [A]\!]}|
- |[\! [\mathtt{f}(t)]\!] = [\! [\mathtt{f}]\!] \circ [\! [t]\!]| or, for interpretation into a theory, |[\! [\mathtt{f}(t)]\!] = [\! [\mathtt{f}]\!] ([\! [t]\!])|
and we require that for any equation of the theory, |l \approx r|, |[\! [l]\!] = [\! [r]\!]|. (Technically, this is implicitly required for the extension of the mapping to terms to be well-defined, but it’s clearer to state it explicitly.) I’ll write |\mathcal{I} : \mathcal{T} \to \mathbf{Set}| when |\mathcal{I}| is an interpretation of |\mathcal{T}| into sets, and |\mathcal{I}’ : \mathcal{T}_1 \to \mathcal{T}_2| when |\mathcal{I}’| is an interpretation of |\mathcal{T}_1| into |\mathcal{T}_2|.
An interpretation of the theory of isomorphisms produces a bijection between two specified sets. Spelling out a simple example where |\mathbb{B}| is the set of booleans:
- |[\! [A]\!] \equiv \mathbb{B}|
- |[\! [B]\!] \equiv \mathbb{B}|
- |[\! [\mathtt{f}]\!] \equiv x \mapsto \lnot x|
- |[\! [\mathtt{g}]\!] \equiv x \mapsto \lnot x|
plus the proof |\lnot \lnot x = x|.
As another simple example, we can interpret the theory of isomorphisms into itself slightly non-trivially.
- |[\! [A]\!] \equiv B|
- |[\! [B]\!] \equiv A|
- |[\! [\mathtt{f}]\!] \equiv \mathtt{g}(\mathbf{x}_B)|
- |[\! [\mathtt{g}]\!] \equiv \mathtt{f}(\mathbf{x}_A)|
As an (easy) exercise, you should define |\pi_1 : \mathcal{T}_1 \times \mathcal{T}_2 \to \mathcal{T}_1| and similarly |\pi_2|. If you defined |\mathcal{T}_1 + \mathcal{T}_2| before, you should define |\iota_1 : \mathcal{T}_1 \to \mathcal{T}_1 + \mathcal{T}_2| and similarly for |\iota_2|. As another easy exercise, show that an interpretation of |\mathcal{T}_{\cong}| is a bijection. In Haskell, an interpretation of |\mathcal{T}_\mathbb{N}| would effectively be foldNat
. Something very interesting happens when you consider what an interpretation of the collage generated by a theory, |\mathcal{K}_\mathcal{T}|, is. Spell it out. In a different vein, you can show that a collage |\mathcal{K} : \mathcal{T}_1 \nrightarrow \mathcal{T}_2| and an interpretation |\mathcal{T}_1^{op} \times \mathcal{T}_2 \to \mathbf{Set}| are essentially the same thing in the sense that each gives rise to the other.
Two theories are isomorphic if there exists interpretations |\mathcal{I}_1 : \mathcal{T}_1 \to \mathcal{T}_2| and |\mathcal{I}_2 : \mathcal{T}_2 \to \mathcal{T}_1| such that |[\! [[\! [A]\!]^{\mathcal{I}_1}]\!]^{\mathcal{I}_2} = A| and vice versa, and similarly for function symbols. In other words, each is interpretable in the other, and if you go from one interpretation and then back, you end up where you started. Yet another way to say this is that there is a one-to-one correspondence between sorts and terms of each theory, and this correspondence respects substitution.
As a crucially important example, the set of terms, |\mathcal{T}(A, B)|, can be extended to an interpretation. In particular, for each sort |A|, |\mathcal{T}(A, {-}) : \mathcal{T} \to \mathbf{Set}|. It’s action on function symbols is the following:
\[[\! [\mathtt{f}]\!]^{\mathcal{T}(A, {-})} \equiv t \mapsto \mathtt{f}(t)\]
We have, dually, |\mathcal{T}({-}, A) : \mathcal{T}^{op} \to \mathbf{Set}| with the following action:
\[[\! [\mathtt{f}]\!]^{\mathcal{T}({-}, A)} \equiv t \mapsto t(\mathtt{f}(\mathbf{x}_B))\]
We can abstract from both parameters making |\mathcal{T}({-}, {=}) : \mathcal{T}^{op} \times \mathcal{T} \to \mathbf{Set}| which, by an early exercise, can be shown to correspond with the collage |\mathcal{K}_\mathcal{T}|.
Via an abuse of notation, I’ll identify |\mathcal{T}^{op}(A, {-})| with |\mathcal{T}({-}, A)|, though technically we only have an isomorphism between the interpretations, and to talk about isomorphisms between interpretations we need the notion of…
The theories we’ve presented are (multi-sorted) universal algebra theories. Universal algebra allows us to specify a general notion of “homomorphism” that generalizes monoid homomorphism or group homomorphism or ring homomorphism or lattice homomorphism.
In universal algebra, the algebraic theory of groups consists of a single sort, a nullary operation, |1|, a binary operation, |\cdot|, a unary operation, |\mathtt{inv}|, and some equations which are unimportant for us. Operations correspond to our function symbols except that they’re are not restricted to being unary. A particular group is a particular interpretation of the algebraic theory of groups, i.e. it is a set and three functions into the set. A group homomorphism then is a function between those two groups, i.e. between the two interpretations, that preserves the operations. In a traditional presentation this would look like the following:
Say |\alpha : G \to K| is a group homomorphism from the group |G| to the group |K| and |g, h \in G| then:
- |\alpha(1_G) = 1_K|
- |\alpha(g \cdot_G h) = \alpha(g) \cdot_K \alpha(h)|
- |\alpha(\mathtt{inv}_G(g)) = \mathtt{inv}_K(\alpha(g))|
Using something more akin to our notation, it would look like:
- |\alpha([\! [1]\!]^G) = [\! [1]\!]^K|
- |\alpha([\! [\cdot]\!]^G(g,h)) = [\! [\cdot]\!]^K(\alpha(g), \alpha(h))|
- |\alpha([\! [\mathtt{inv}]\!]^G(g)) = [\! [\mathtt{inv}]\!]^K(\alpha(g))|
The |\mathtt{inv}| case is the most relevant for us as it is unary. However, for us, a function symbol |\mathtt{f}| may have a different source and target and so we made need a different function on each side of the equation. E.g. for |\mathtt{f} : A \to B|, |\alpha : \mathcal{I}_1 \to \mathcal{I}_2|, and |a \in [\! [A]\!]^{\mathcal{I}_1}| we’d have:
\[\alpha_B([\! [\mathtt{f}]\!]^{\mathcal{I}_1}(a)) = [\! [\mathtt{f}]\!]^{\mathcal{I}_2}(\alpha_A(a))\]
So a homomorphism |\alpha : \mathcal{I}_1 \to \mathcal{I}_2 : \mathcal{T} \to \mathbf{Set}| is a family of functions, one for each sort of |\mathcal{T}|, that satisfies the above equation for every function symbol of |\mathcal{T}|. We call the individual functions making up |\alpha| components of |\alpha|, and we have |\alpha_A : [\! [A]\!]^{\mathcal{I}_1} \to [\! [A]\!]^{\mathcal{I}_2}|. The definition for an interpretation into a theory, |\mathcal{T}_2|, is identical except the components of |\alpha| are terms of |\mathcal{T}_2| and |a| can be replaced with |\mathbf{x}_{[\! [A]\!]^{\mathcal{I}_1}}|. Two interpretations are isomorphic if we have homomorphism |\alpha : \mathcal{I}_1 \to \mathcal{I}_2| such that each component is a bijection. This is the same as requiring a homomorphism |\beta : \mathcal{I}_2 \to \mathcal{I}_1| such that for each |A|, |\alpha_A(\beta_A(x)) = x| and |\beta_A(\alpha_A(x)) = x|. A similar statement can be made for interpretations into theories, just replace |x| with |\mathbf{x}_{[\! [A]\!]}|.
Another way to look at homomorphisms is via collages. A homomorphism |\alpha : \mathcal{I}_1 \to \mathcal{I}_2 : \mathcal{T} \to \mathbf{Set}| gives rise to an interpretation of the collage |\mathcal{K}_\mathcal{T}|. The interpretation |\mathcal{I}_\alpha : \mathcal{K}_\mathcal{T} \to \mathbf{Set}| is defined by:
- |[\! [\mathtt{inl}(A)]\!]^{\mathcal{I}_\alpha} \equiv [\! [A]\!]^{\mathcal{I}_1}|
- |[\! [\mathtt{inr}(A)]\!]^{\mathcal{I}_\alpha} \equiv [\! [A]\!]^{\mathcal{I}_2}|
- |[\! [\mathtt{inl}(\mathtt{f})]\!]^{\mathcal{I}_\alpha} \equiv [\! [\mathtt{f}]\!]^{\mathcal{I}_1}|
- |[\! [\mathtt{inr}(\mathtt{f})]\!]^{\mathcal{I}_\alpha} \equiv [\! [\mathtt{f}]\!]^{\mathcal{I}_2}|
- |[\! [\mathtt{id}_A]\!]^{\mathcal{I}_\alpha} \equiv \alpha_A|
The homomorphism law guarantees that it satisfies the equation on |\mathtt{id}|. Conversely, given an interpretation of |\mathcal{K}_\mathcal{T}|, we have the homomorphism, |[\! [\mathtt{id}]\!] : [\! [\mathtt{inl}({-})]\!] \to [\! [\mathtt{inr}({-})]\!] : \mathcal{T} \to \mathbf{Set}|. and the equation on |\mathtt{id}| is exactly the homomorphism law.
Consider a homomorphism |\alpha : \mathcal{T}(A, {-}) \to \mathcal{I}|. The |\alpha| needs to satisfy for every sort |B| and |C|, every function symbol |\mathtt{f} : C \to D|, and every term |t : B \to C|:
\[\alpha_D(\mathtt{f}(t)) = [\! [\mathtt{f}]\!]^\mathcal{I}(\alpha_C(t))\]
Looking at this equation, the possibility of viewing it as a recursive “definition” leaps out suggesting that the action of |\alpha| is completely determined by it’s action on the variables. Something like this, for example:
\[\alpha_D(\mathtt{f}(\mathtt{g}(\mathtt{h}(\mathbf{x}_A)))) = [\! [\mathtt{f}]\!] (\alpha_C(\mathtt{g}(\mathtt{h}(\mathbf{x}_A)))) = [\! [\mathtt{f}]\!] ([\! [\mathtt{g}]\!] (\alpha_B(\mathtt{h}(\mathbf{x}_A)))) = [\! [\mathtt{f}]\!] ([\! [\mathtt{g}]\!] ([\! [\mathtt{h}]\!] (\alpha_A(\mathbf{x}_A))))\]
We can easily establish that there’s a one-to-one correspondence between the set of homomorphisms |\mathcal{T}(A, {-}) \to \mathcal{I}| and the elements of the set |[\! [A]\!]^\mathcal{I}|. Given a homomorphism, |\alpha|, we get an element of |[\! [A]\!]^\mathcal{I}| via |\alpha_A(\mathbf{x}_A)|. Inversely, given an element |a \in [\! [A]\!]^\mathcal{I}|, we can define a homomorphism |a^{*}| via:
- |a_D^{*}(\mathtt{f}(t)) \equiv [\! [\mathtt{f}]\!]^\mathcal{I}(a_C^{*}(t))|
- |a_A^{*}(\mathbf{x}_A) \equiv a|
which clearly satisfies the condition on homomorphisms by definition. It’s easy to verify that |(\alpha_A(\mathbf{x}_A))^{*} = \alpha| and immediately true that |a^{*}(\mathbf{x}_A) = a| establishing the bijection.
We can state something stronger. Given any homomorphism |\alpha : \mathcal{T}(A, {-}) \to \mathcal{I}| and any function symbol |\mathtt{g} : A \to X|, we can make a new homomorphism |\alpha \cdot \mathtt{g} : \mathcal{T}(X, {-}) \to \mathcal{I}| via the following definition:
\[(\alpha \cdot \mathtt{g})(t) = \alpha(t(\mathtt{g}(\mathbf{x}_A)))\]
Verifying that this is a homomorphism is straightforward:
\[(\alpha \cdot \mathtt{g})(\mathtt{f}(t)) = \alpha(\mathtt{f}(t(\mathtt{g}(\mathbf{x}_A)))) = [\! [\mathtt{f}]\!] (\alpha(t(\mathtt{g}(\mathbf{x}_A)))) = [\! [\mathtt{f}]\!] ((\alpha \cdot \mathtt{g})(t))\]
and like any homomorphism of this form, as we’ve just established, it is completely determined by it’s action on variables, namely |(\alpha \cdot \mathtt{g})_A(\mathbf{x}_A) = \alpha_X(\mathtt{g}(\mathbf{x}_A)) = [\! [\mathtt{g}]\!] (\alpha_A(\mathbf{x}_A))|. In particular, if |\alpha = a^{*}|, then we have |a^{*} \cdot \mathtt{g} = ([\! [\mathtt{g}]\!] (a))^{*}|. Together these facts establish that we have an interpretation |\mathcal{Y} : \mathcal{T} \to \mathbf{Set}| such that |[\! [A]\!]^\mathcal{Y} \equiv (\mathcal{T}(A, {-}) \to \mathcal{I})|, the set of homomorphisms, and |[\! [\mathtt{g}]\!]^\mathcal{Y}(\alpha) \equiv \alpha \cdot \mathtt{g}|. The work we did before established that we have homomorphisms |({-})(\mathbf{x}) : \mathcal{Y} \to \mathcal{I}| and |({-})^{*} : \mathcal{I} \to \mathcal{Y}| that are inverses. This is true for all theories and all interpretations as at no point did we use any particular facts about them. This statement is the (dual form of the) Yoneda lemma. To get the usual form simply replace |\mathcal{T}| with |\mathcal{T}^{op}|. A particularly important and useful case (so useful it’s usually used tacitly) occurs when we choose |\mathcal{I} = \mathcal{T}(B,{-})|, we get |(\mathcal{T}(A, {-}) \to \mathcal{T}(B, {-})) \cong \mathcal{T}(B, A)| or, choosing |\mathcal{T}^{op}| everywhere, |(\mathcal{T}({-}, A) \to \mathcal{T}({-}, B)) \cong \mathcal{T}(A, B)| which states that a term from |A| to |B| is equivalent to a homomorphism from |\mathcal{T}({-}, A)| to |\mathcal{T}({-}, B)|.
There is another result, dual in a different way, called the co-Yoneda lemma. It turns out it is a corollary of the fact that for a collage |\mathcal{K} : \mathcal{T}_1 \nrightarrow \mathcal{T}_2|, |\mathcal{K}_{\mathcal{T}_2} \circ \mathcal{K} \cong \mathcal{K}| and the dual is just the composition the other way. To get (closer to) the precise result, we need to be able to turn an interpretation into a collage. Given an interpretation, |\mathcal{I} : \mathcal{T} \to \mathbf{Set}|, we can define a collage |\mathcal{K}_\mathcal{I} : \mathcal{T}_\mathbf{1} \nrightarrow \mathcal{T}| whose bridges from |1 \to A| are the elements of |[\! [A]\!]^\mathcal{I}|. Given this, the co-Yoneda lemma is the special case, |\mathcal{K}_\mathcal{T} \circ \mathcal{K}_\mathcal{I} \cong \mathcal{K}_\mathcal{I}|.
Note, that the Yoneda and co-Yoneda lemmas only apply to interpretations into sets as |\mathcal{Y}| involves the set of homomorphisms.
The Yoneda lemma suggests that the interpretations |\mathcal{T}(A, {-})| and |\mathcal{T}({-}, A)| are particularly important and this will be borne out as we continue.
We call an interpretation, |\mathcal{I} : \mathcal{T}^{op} \to \mathbf{Set}| representable if |\mathcal{I} \cong \mathcal{T}({-}, X)| for some sort |X|. We then say that |X| represents |\mathcal{I}|. What this states is that every term of sort |X| corresponds to an element in one of the sets that make up |\mathcal{I}|, and these transform appropriately. There’s clearly a particularly important element, namely the image of |\mathbf{x}_X| which corresponds to an element in |[\! [X]\!]^\mathcal{I}|. This element is called the universal element. The dual concept is, for |\mathcal{I} : \mathcal{T} \to \mathbf{Set}|, |\mathcal{I}| is co-representable if |\mathcal{I} \cong \mathcal{T}(X, {-})|. We will also say |X| represents |\mathcal{I}| in this case as it actually does when we view |\mathcal{I}| as an interpretation of |(\mathcal{T}^{op})^{op}|.
As a rather liberating exercise, you should establish the following result called parameterized representability. Assume we have theories |\mathcal{T}_1| and |\mathcal{T}_2|, and a family of sorts of |\mathcal{T}_2|, |X|, and a family of interpretations of |\mathcal{T}_2^{op}|, |\mathcal{I}|, both indexed by sorts of |\mathcal{T}_1|, such that for each |A \in \mathbf{sort}(\mathcal{T}_1)|, |X_A| represents |\mathcal{I}_A|, i.e. |\mathcal{I}_A \cong \mathcal{T}_2({-}, X_A)|. Given all this, then there is a unique interpretation |\mathcal{X} : \mathcal{T}_1 \to \mathcal{T}_2| and |\mathcal{I} : \mathcal{T}_1 \times \mathcal{T}_2^{op} \to \mathbf{Set}| where |[\! [A]\!]^{\mathcal{X}} \equiv X_A| and |[\! [(A, B)]\!]^\mathcal{I} \equiv [\! [B]\!]^{\mathcal{I}_A}| such that |\mathcal{I} \cong \mathcal{T}_2({=},[\! [-]\!]^\mathcal{X})|. To be a bit more clear, the right hand side means |(A, B) \mapsto \mathcal{T}_2(B, [\! [A]\!]^\mathcal{X})|. Simply by choosing |\mathcal{T}_1| to be a product of multiple theories, we can generalize this result to an arbitrary number of parameters. What makes this result liberating is that we just don’t need to worry about the parameters, they will automatically transform homomorphically. As a technical warning though, since two interpretations may have the same action on sorts but a different action on function symbols, if the family |X_A| was derived from an interpretation |\mathcal{J}|, i.e. |X_A \equiv [\! [A]\!]^\mathcal{J}|, it may not be the case that |\mathcal{X} = \mathcal{J}|.
Let’s look at some examples.
As a not-so-special case of representability, we can consider |\mathcal{I} \equiv \mathcal{K}(\mathtt{inl}({-}), \mathtt{inr}(Z))| where |\mathcal{K} : \mathcal{T}_1 \nrightarrow \mathcal{T}_2|. Saying that |A| represents |\mathcal{I}| in this case is saying that bridging terms of sort |\mathtt{inr}(Z)|, i.e. sort |Z| in |\mathcal{T}_2|, in |\mathcal{K}|, correspond to terms of sort |A| in |\mathcal{T}_1|. We’ll call the universal element of this representation the universal bridge (though technically it may be a bridging term, not a bridge). Let’s write |\varepsilon| for this universal bridge. What representability states in this case is given any bridging term |k| of sort |Z|, there exists a unique term |\lceil k \rceil| of sort |A| such that |k = \varepsilon(\lceil k \rceil)|. If we have an interpretation |\mathcal{X} : \mathcal{T}_2 \to \mathcal{T}_1| such that |[\! [Z]\!]^\mathcal{X}| represents |\mathcal{K}(\mathtt{inl}({-}), \mathtt{inr}(Z))| for each sort |Z| of |\mathcal{T}_2| we say we have a right representation of |\mathcal{K}|. Note, that the universal bridges become a family |\varepsilon_Z : [\! [Z]\!]^\mathcal{X} \to Z|. Similarly, if |\mathcal{K}(\mathtt{inl}(A), \mathtt{inr}({-}))| is co-representable for each |A|, we say we have a left representation of |\mathcal{K}|. The co-universal bridge is then a bridging term |\eta_A : A \to [\! [A]\!]| such that for any bridging term |k| with source |A|, there exists a unique term |\lfloor k \rfloor| in |\mathcal{T}_2| such that |k = \lfloor k \rfloor(\eta_A)|. For reference, we’ll call these equations universal properties of the left/right representation. Parameterized representability implies that a left/right representation is essentially unique.
\[ \xymatrix @!=8em { A \ar@2{->}[d]_{\eta_A} \ar@2{->}[dr]^k & \\ [\! [A]\!] \ar[r]_{\lfloor k\rfloor} & Z } \qquad \xymatrix @!=8em { A \ar[r]^{\lceil k\rceil} \ar@2{->}[dr]_k & [\! [Z]\!] \ar@2{->}[d]^{\varepsilon_Z} \\ & Z }\]
Define |\mathcal{I}_\mathbf{1}| via |[\! [A]\!]^{\mathcal{I}_\mathbf{1}} \equiv \mathbf{1}| where |\mathbf{1}| is some one element set. |[\! [\mathtt{f}]\!]^{\mathcal{I}_\mathbf{1}}| is the identity function for all function symbols |\mathtt{f}|. We’ll say a theory |\mathcal{T}| has a unit sort or has a terminal sort if there is a sort that we’ll also call |\mathbf{1}| that represents |\mathcal{I}_\mathbf{1}|. Spelling out what that means, we first note that there is nothing notable about the universal element as it’s the only element. However, writing the homomorphism |! : \mathcal{I}_\mathbf{1} \to \mathcal{T}({-}, \mathbf{1})| and noting that since there’s only one element of |[\! [A]\!]^{\mathcal{I}_\mathbf{1}}| we can, with a slight abuse of notation, also write the term |!| picks out as |!| which gives the equation:
|!_B(\mathtt{g}(t)) = {!_A(t)}| for any function symbol |\mathtt{g} : A \to B| and term, |t|, of sort |A|, note |!_A : A \to \mathbf{1}|.
This equation states what the isomorphism also fairly directly states: there is exactly one term of sort |\mathbf{1}| from any sort |A|, namely |!_A(\mathbf{x}_A)|. The dual notion is called a void sort or an initial sort and will usually be notated |\mathbf{0}|, the analog of |!| will be written as |0|. The resulting equation is:
|\mathtt{f}(0_A) = 0_B| for any function symbol |\mathtt{f} : A \to B|, note |0_A : \mathbf{0} \to A|.
For the next example, I’ll leverage collages. Consider the collage |\mathcal{K}_2 : \mathcal{T} \nrightarrow \mathcal{T} \times \mathcal{T}| whose bridges from |A \to (B, C)| consist of pairs of terms |t_1 : A \to B| and |t_2 : A \to C|. |\mathcal{T}| has pairs if |\mathcal{K}_2| has a right representation. We’ll write |(B, C) \mapsto B \times C| for the representing interpretation’s action on sorts. We’ll write the universal bridge as |(\mathtt{fst}(\mathbf{x}_{B \times C}), \mathtt{snd}(\mathbf{x}_{B \times C}))|. The universal property then looks like |(\mathtt{fst}(\mathbf{x}_{B \times C}), \mathtt{snd}(\mathbf{x}_{B \times C}))(\langle t_1, t_2 \rangle) = (t_1, t_2)| where |\langle t_1, t_2 \rangle : A \to B \times C| is the unique term induced by the bridge |(t_1, t_2)|. The universal property implies the following equations:
- |\langle \mathtt{fst}(\mathbf{x}_{B \times C}), \mathtt{snd}(\mathbf{x}_{B \times C})) = \mathbf{x}_{B \times C}|
- |\mathtt{fst}(\langle t_1, t_2 \rangle) = t_1|
- |\mathtt{snd}(\langle t_1, t_2 \rangle) = t_2|
\[ \xymatrix @!=8em { A \ar[r]^{\langle t_1, t_2 \rangle} \ar@2{->}[dr]_{(t_1, t_2)} & B \times C \ar@2{->}[d]^{(\mathtt{fst}, \mathtt{snd})} \\ & (B, C) }\]
One aspect of note is regardless of whether |\mathcal{K}_2| has a right representation, i.e. regardless of whether |\mathcal{T}| has pairs, it always has a left representation. The co-universal bridge is |(\mathbf{x}_A, \mathbf{x}_A)| and the unique term |\lfloor{t_1, t_2}\rfloor| is |\mathtt{inl}(t_1, \mathbf{x}_A)(\mathtt{inr}(\mathbf{x}_A, t_2)(\mathbf{x}_{(A,A)}))|.
\[ \xymatrix @!=8em { A \ar@2{->}[d]_{(\mathbf{x}_A, \mathbf{x}_A)} \ar@2{->}[dr]^{(t_1, t_2)} & \\ (A, A) \ar[r]_{\lfloor (t_1, t_2)\rfloor} & (B, C) }\]
Define an interpretation |\Delta : \mathcal{T} \to \mathcal{T} \times \mathcal{T}| so that |[\! [A]\!]^\Delta \equiv (A,A)| and similarly for function symbols. |\Delta| left represents |\mathcal{K}_2|. If the interpretation |(B,C) \mapsto B \times C| right represents |\mathcal{K}_2|, then we say we have an adjunction between |\Delta| and |({-} \times {=})|, written |\Delta \dashv ({-} \times {=})|, and that |\Delta| is left adjoint to |({-} \times {=})|, and conversely |({-} \times {=})| is right adjoint |\Delta|.
\[ \xymatrix @!=8em { A \ar@2{->}[d]_{(\mathbf{x}_A, \mathbf{x}_A)} \ar[r]^{\langle t_1, t_2\rangle} \ar@2{->}[dr]_{(t_1, t_2)} & B \times C \ar@2{->}[d]^{(\mathtt{fst}, \mathtt{snd})}\\ [\! [A]\!]^\Delta \ar[r]_{\lfloor (t_1, t_2)\rfloor} & (B, C) }\]
More generally, whenever we have the situation |\mathcal{T}_1([\! [-]\!]^{\mathcal{I}_1}, {=}) \cong \mathcal{T}_2({-}, [\! [=]\!]^{\mathcal{I}_2})| we say that |\mathcal{I}_1 : \mathcal{T}_2 \to \mathcal{T}_1| is left adjoint to |\mathcal{I}_2 : \mathcal{T}_1 \to \mathcal{T}_2| or conversely that |\mathcal{I}_2| is right adjoint to |\mathcal{I}_1|. We call this arrangement an adjunction and write |\mathcal{I}_1 \dashv \mathcal{I}_2|. Note that we will always have this situation if |\mathcal{I}_1| left represents and |\mathcal{I}_2| right represents the same collage. As we noted above, parameterized representability actually determines one adjoint given (its action on sorts and) the other adjoint. With this we can show that adjoints are unique up to isomorphism, that is, given two left adjoints to an interpretation, they must be isomorphic. Similarly for right adjoints. This means that stating something is a left or right adjoint to some other known interpretation essentially completely characterizes it. One issue with adjunctions is that they tend to be wholesale. Let’s say the pair sort |A \times B| existed but no other pair sorts existed, then the (no longer parameterized) representability approach would work just fine, but the adjunction would no longer exist.
Here’s a few of exercises using this. First, a moderately challenging one (until you catch the pattern): spell out the details to the left adjoint to |\Delta|. We say a theory has sums and write those sums as |A + B| if |({-} + {=}) \dashv \Delta|. Recast void and unit sorts using adjunctions and/or left/right representations. As a terminological note, we say a theory has finite products if it has unit sorts and pairs. Similarly, a theory has finite sums or has finite coproducts if it has void sorts and sums. An even more challenging exercise is the following: a theory has exponentials if it has pairs and for every sort |A|, |(A \times {-}) \dashv (A \Rightarrow {-})| (note, parameterized representability applies to |A|). Spell out the equations characterizing |A \Rightarrow B|.
Finite products start to lift us off the ground. So far the theories we’ve been working with have been extremely basic: a language with only unary functions, all terms being just a sequence of applications of function symbols. It shouldn’t be underestimated though. It’s more than enough to do monoid and group theory. A good amount of graph theory can be done with just this. And obviously we were able to establish several general results assuming only this structure. Nevertheless, while we can talk about specific groups, say, we can’t talk about the theory of groups. Finite products change this.
A theory with finite products allows us to talk about multi-ary function symbols and terms by considering unary function symbols from products. This allows us to do all of universal algebra. For example, the theory of groups, |\mathcal{T}_{\mathbf{Grp}}|, consists of a sort |S| and all it’s products which we’ll abbreviate as |S^n| with |S^0 \equiv \mathbf{1}| and |S^{n+1} \equiv S \times S^n|. It has three function symbols |\mathtt{e} : \mathbf{1} \to S|, |\mathtt{m} : S^2 \to S|, and |\mathtt{i} : S \to S| plus the ones that having finite products requires. In fact, instead of just heaping an infinite number of sorts and function symbols into our theory — and we haven’t even gotten to equations — let’s define a compact set of data from which we can generate all this data.
A signature, |\Sigma|, consists of a collection of sorts, |\sigma|, a collection of multi-ary function symbols, and a collection of equations. Equations still remain pairs of terms, but we need to now extend our definition of terms for this context. A term (in a signature) is either a variable, |\mathbf{x}_i^{[A_0,A_1,\dots,A_n]}| where |A_i| are sorts and |0 \leq i \leq n|, the operators |\mathtt{fst}| or |\mathtt{snd}| applied to a term, the unit term written |\langle\rangle^A| with sort |A|, a pair of terms written |\langle t_1, t_2 \rangle|, or the (arity correct) application of a multi-ary function symbol to a series of terms, e.g. |\mathtt{f}(t_1, t_2, t_3)|. As a Haskell data declaration, it might look like:
data SigTerm
= SigVar [Sort] Int
| Fst SigTerm
| Snd SigTerm
| Unit Sort
| Pair SigTerm SigTerm
| SigApply FunctionSymbol [SigTerm]
At this point, sorting (i.e. typing) the terms is no longer trivial, though it is still pretty straightforward. Sorts are either |\mathbf{1}|, or |A \times B| for sorts |A| and |B|, or a sort |A \in \sigma|. The source of function symbols or terms are lists of sorts.
- |\mathbf{x}_i^{[A_0, A_1, \dots, A_n]} : [A_0, A_1, \dots, A_n] \to A_i|
- |\langle\rangle^A : [A] \to \mathbf{1}|
- |\langle t_1, t_2 \rangle : \bar S \to T_1 \times T_2| where |t_i : \bar S \to T_i|
- |\mathtt{fst}(t) : \bar S \to T_1| where |t : \bar S \to T_1 \times T_2|
- |\mathtt{snd}(t) : \bar S \to T_2| where |t : \bar S \to T_1 \times T_2|
- |\mathtt{f}(t_1, \dots, t_n) : \bar S \to T| where |t_i : \bar S \to T_i| and |\mathtt{f} : [T_1,\dots,T_n] \to T|
The point of a signature was to represent a theory so we can compile a term of a signature into a term of a theory with finite products. The theory generated from a signature |\Sigma| has the same sorts as |\Sigma|. The equations will be equations of |\Sigma|, with the terms compiled as will be described momentarily, plus for every pair of sorts the equations that describe pairs and the equations for |!|. Finally, we need to describe how to take a term of the signature and make a function symbol of the theory, but before we do that we need to explain how to convert those sources of the terms which are lists. That’s just a conversion to right nested pairs, |[A_0,\dots,A_n] \mapsto A_0 \times ({\dots} \times (A_n \times \mathbf{1}) \dots )|. The compilation of a term |t|, which we’ll write as |\mathcal{C}[t]|, is defined as follows:
- |\mathcal{C}[\mathbf{x}_i^{[A_0, A_1, \dots, A_n]}] = \mathtt{snd}^i(\mathtt{fst}(\mathbf{x}_{A_i \times(\dots)}))| where |\mathtt{snd}^i| means the |i|-fold application of |\mathtt{snd}|
- |\mathcal{C}[\langle\rangle^A] = {!_A}|
- |\mathcal{C}[\langle t_1, t_2 \rangle] = \langle \mathcal{C}[t_1], \mathcal{C}[t_2] \rangle|
- |\mathcal{C}[\mathtt{fst}(t)] = \mathtt{fst}(\mathcal{C}[t])|
- |\mathcal{C}[\mathtt{snd}(t)] = \mathtt{snd}(\mathcal{C}[t])|
- |\mathcal{C}[\mathtt{f}(t_1, \dots, t_n)] = \mathtt{f}(\langle \mathcal{C}[t_1], \langle \dots , \langle \mathcal{C}[t_n], ! \rangle \dots \rangle \rangle)|
As you may have noticed, the generated theory will have an infinite number of sorts, an infinite number of function symbols, and an infinite number of equations no matter what the signature is — even an empty one! Having an infinite number of things isn’t a problem as long as we can algorithmically describe them and this is what the signature provides. Of course, if you’re a (typical) mathematician you nominally don’t care about an algorithmic description. Besides being compact, signatures present a nicer term language. The theories are like a core or assembly language. We could define a slightly nicer variation where we keep a context and manage named variables leading to terms-in-context like:
\[x:A, y:B \vdash \mathtt{f}(x, x, y)\]
which is
\[\mathtt{f}(\mathbf{x}_0^{[A,B]}, \mathbf{x}_0^{[A,B]}, \mathbf{x}_1^{[A,B]})\]
for our current term language for signatures. Of course, compilation will be (slightly) trickier for the nicer language.
The benefit of having compiled the signature to a theory, in addition to being able to reuse the results we’ve established for theories, is we only need to define operations on the theory, which is simpler since we only need to deal with pairs and unary function symbols. One example of this is we’d like to extend our notion of interpretation to one that respects the structure of the signature, and we can do that by defining an interpretation of theories that respects finite products.
A finite product preserving interpretation (into a finite product theory), |\mathcal{I}|, is an interpretation (into a finite product theory) that additionally satisfies:
- |[\! [\mathbf{1}]\!]^\mathcal{I} \approx \mathbf{1}|
- |[\! [A \times B]\!]^\mathcal{I} \approx [\! [A]\!]^\mathcal{I} \times [\! [B]\!]^\mathcal{I}|
- |[\! [!_A]\!]^\mathcal{I} = {!_{[\! [A]\!]^\mathcal{I}}}|
- |[\! [\mathtt{fst}(t)]\!]^\mathcal{I} = \mathtt{fst}([\! [t]\!]^\mathcal{I})|
- |[\! [\mathtt{snd}(t)]\!]^\mathcal{I} = \mathtt{snd}([\! [t]\!]^\mathcal{I})|
- |[\! [\langle t_1, t_2 \rangle]\!]^\mathcal{I} = \langle [\! [t_1]\!]^\mathcal{I}, [\! [t_2]\!]^\mathcal{I} \rangle|
where, for |\mathbf{Set}|, |\mathbf{1} \equiv \{\{\}\}|, |\times| is the cartesian product, |\mathtt{fst}| and |\mathtt{snd}| are the projections, |!_A \equiv x \mapsto \{\}|, and |\langle f, g \rangle \equiv x \mapsto \langle f(x), g(x) \rangle|.
With signatures, we can return to our theory, now signature, of groups. |\Sigma_{\mathbf{Grp}}| has a single sort |S|, three function symbols |\mathtt{e} : [\mathbf{1}] \to S|, |\mathtt{i} : [S] \to S|, and |\mathtt{m} : [S, S] \to S|, with the following equations:
- |\mathtt{m}(\mathtt{e}(\langle\rangle^S), \mathbf{x}_0^S) \approx \mathbf{x}_0^S|
- |\mathtt{m}(\mathtt{i}(\mathbf{x}_0^S), \mathbf{x}_0^S) \approx \mathtt{e}(\langle\rangle^S)|
- |\mathtt{m}(\mathtt{m}(\mathbf{x}_0^{[S,S,S]}, \mathbf{x}_1^{[S,S,S]}), \mathbf{x}_2^{[S,S,S]}) \approx \mathtt{m}(\mathbf{x}_0^{[S,S,S]}, \mathtt{m}(\mathbf{x}_1^{[S,S,S]}, \mathbf{x}_2^{[S,S,S]}))|
or using the nicer syntax:
- |x:S \vdash \mathtt{m}(\mathtt{e}(), x) \approx x|
- |x:S \vdash \mathtt{m}(\mathtt{i}(x), x) \approx \mathtt{e}()|
- |x:S, y:S, z:S \vdash \mathtt{m}(\mathtt{m}(x, y), z) \approx \mathtt{m}(x, \mathtt{m}(y, z))|
An actual group is then just a finite product preserving interpretation of (the theory generated by) this signature. All of universal algebra and much of abstract algebra can be formulated this way.
We can consider additionally assuming that our theory has exponentials. I left articulating exactly what that means as an exercise, but the upshot is we have the following two operations:
For any term |t : A \times B \to C|, we have the term |\mathtt{curry}(t) : A \to C^B|. We also have the homomorphism |\mathtt{app}_{AB} : B^A \times A \to B|. They satisfy:
- |\mathtt{curry}(\mathtt{app}(\mathbf{x}_{B^A \times A})) = \mathbf{x}_{B^A}|
- |\mathtt{app}(\langle \mathtt{curry}(t_1), t_2 \rangle) = t_1(\langle \mathbf{x}_A, t_2 \rangle)| where |t_1 : A \times B \to C| and |t_2 : A \to B|.
We can view these, together with the the product operations, as combinators, and it turns out we can compile the simply typed lambda calculus into the above theory. This is exactly what the Categorical Abstract Machine did. The “Caml” in “O’Caml” stands for “Categorical Abstract Machine Language”, though O’Caml no longer uses the CAM. Conversely, every term of the theory can be expressed as a simply typed lambda term. This means we can view the simply typed lambda calculus as just a different presentation of the theory.
At this point, this presentation of category theory starts to connect to the mainstream categorical literature on universal algebra, internal languages, sketches, and internal logic. This page gives a synopsis of the relationship between type theory and category theory. For some reason, it is unusual to talk about the internal language of a plain category, but that is exactly what we’ve done here.
I haven’t talked about finite limits or colimits beyond products and coproducts, nor have I talked about even the infinitary versions of products and coproducts, let alone arbitrary limits and colimits. These can be handled the same way as products and coproducts. Formulating a language like signatures or the simply typed lambda calculus is a bit more complicated, but not that hard. I may make a follow-up article covering this among other things. I also have a side project (don’t hold your breath), that implements the internal language of a category with finite limits. The result looks roughly like a simple version of an algebraic specification language like the OBJ family. The RING
theory described in the Maude manual gives an idea of what it would look like. In fact, here’s an example of the current actual syntax I’m using.^{3}
theory Categories
type O
type A
given src : A -> O
given tgt : A -> O
given id : O -> A
satisfying o:O | src (id o) = o, tgt (id o) = o
given c : { f:A, g:A | src f = tgt g } -> A
satisfying (f, g):{ f:A, g:A | src f = tgt g }
| tgt (c (f, g)) = tgt f, src (c (f, g)) = src g
satisfying "left unit" (o, f):{ o:O, f:A | tgt f = o }
| c (id o, f) = f
satisfying "right unit" (o, f):{ o:O, f:A | src f = o }
| c (f, id o) = f
satisfying "associativity" (f, g, h):{ f:A, g:A, h:A | src f = tgt g, src g = tgt h }
| c (c (f, g), h) = c (f, c (g, h))
endtheory
It turns out this is a particularly interesting spot in the design space. The fact that the theory of theories with finite limits is itself a theory with finite limits has interesting consequences. It is still relatively weak though. For example, it’s not possible to describe the theory of fields in this language.
There are other directions one could go. For example, the internal logic of monoidal categories is (a fragment of) ordered linear logic. You can cross this bridge either way. You can look at different languages and consider what categorical structure is needed to support the features of the language, or you can add features to the category and see how that impacts the internal language. The relationship is similar to the source language and a core/intermediate language in a compiler, e.g. GHC Haskell and System Fc.
If you’ve looked at category theory at all, you can probably make most of the connections without me telling you. The table below outlines the mapping, but there are some subtleties. First, as a somewhat technical detail, my definition of a theory corresponds to a small category, i.e. a category which has a set of objects and a set of arrows. For more programmer types, you should think of “set” as Set
in Agda, i.e. similar to the *
kind in Haskell. Usually “category” means “locally small category” which may have a proper class of objects and between any two objects a set of arrows (though the union of all those sets may be a proper class). Again, for programmers, the distinction between “class” and “set” is basically the difference between Set
and Set1
in Agda.^{4} To make my definition of theory closer to this, all that is necessary is instead of having a set of function symbols, have a family of sets indexed by pairs of objects. Here’s what a partial definition in Agda of the two scenarios would look like:
-- Small category (the definition I used)
record SmallCategory : Set1 where
field
: Set
objects : Set
arrows : arrows -> objects
src : arrows -> objects
tgt ...
-- Locally small category
record LocallySmallCategory : Set2 where
field
: Set1
objects : objects -> objects -> Set
hom ...
-- Different presentation of a small category
record SmallCategory' : Set1 where
field
: Set
objects : objects -> objects -> Set
hom ...
The benefit of the notion of locally small category is that Set
itself is a locally small category. The distinction I was making between interpretations into theories and interpretations into Set was due to the fact that Set wasn’t a theory. If I used a definition theory corresponding to a locally small category, I could have combined the notions of interpretation by making Set a theory. The notion of a small category, though, is still useful. Also, an interpretation into Set corresponds to the usual notion of a model or semantics, while interpretations into other theories was a less emphasized concept in traditional model theory and universal algebra.
A less technical and more significant difference is that my definition of a theory doesn’t correspond to a category, but rather to a presentation of a category, from which a category can be generated. The analog of arrows in a category is terms, not function symbols. This is a bit more natural route from the model theory/universal algebra/programming side. Similarly, having an explicit collection of equations, rather than just an equivalence relation on terms is part of the presentation of the category but not part of the category itself.
model theory | category theory |
---|---|
sort | object |
term | arrow |
function symbol | generating arrow |
theory | presentation of a (small) category |
collage | collage, cograph of a profunctor |
bridge | heteromorphism |
signature | presentation of a (small) category with finite products |
interpretation into sets, aka models | a functor into Set, a (co)presheaf |
interpretation into a theory | functor |
homomorphism | natural transformation |
simply typed lambda calculus (with products) | a cartesian closed category |
In some ways I’ve stopped just when things were about to get good. I may do a follow-up to elaborate on this good stuff. Some examples are: if I expand the definition so that Set becomes a “theory”, then interpretations also form such a “theory”, and these are often what we’re really interested in. The category of finite-product preserving interpretations of the theory of groups essentially is the category of groups. In fact, universal algebra is, in categorical terms, just the study of categories with finite products and finite-product preserving functors from them, particularly into Set. It’s easy to generalize this in many directions. It’s also easy to make very general definitions, like a general definition of a free algebraic structure. In general, we’re usually more interested in the interpretations of a theory than the theory itself.
While I often do advocate thinking in terms of internal languages of categories, I’m not sure that it is a preferable perspective for the very basics of category theory. Nevertheless, there are a few reasons for why I wrote this. First, this very syntactical approach is, I think, more accessible to someone coming from a programming background. From this view, a category is a very simple programming language. Adding structure to the category corresponds to adding features to this programming language. Interpretations are denotational semantics.
Another aspect about this presentation that is quite different is the use and emphasis on collages. Collages correspond to profunctors, a crucially important and enabling concept that is rarely covered in categorical introductions. The characterization of profunctors as collages in Vaughn Pratt’s paper (not using that name) was one of the things I enjoyed about that paper and part of what prompted me to start writing this. In earlier drafts of this article, I was unable to incorporate collages in a meaningful way as I was trying to start from profunctors. This approach just didn’t add value. Collages just looked like a bizarre curio and weren’t integrated into the narrative at all. For other reasons, though, I ended up revisiting the idea of a heteromorphism. My (fairly superficial) opinion is that once you have the notion of functors and natural transformations, adding the notion of heteromorphisms has a low power-to-weight ratio, though it does make some things a bit nicer. Nevertheless, in thinking of how best to fit them into this context, it was clear that collages provided the perfect mechanism (which isn’t a big surprise), and the result works rather nicely. When I realized a fact that can be cryptically but compactly represented as |\mathcal{K}_\mathcal{T} \simeq \mathbb{I} \times \mathcal{T}| where |\mathbb{I}| is the interval category, i.e. two objects with a single arrow joining them, I realized that this is actually an interesting perspective. Since most of this article was written at that point, I wove collages into the narrative replacing some things. If, though, I had started with this perspective from the beginning I suspect I would have made a significantly different article, though the latter sections would likely be similar.
It’s actually better to organize this as a family of collections of function symbols indexed by pairs of sorts.↩︎
Instead of having equations that generate an equivalence relation on (raw) terms, we could simply require an equivalence relation on (raw) terms be directly provided.↩︎
Collaging is actually quite natural in this context. I already intend to support one theory importing another. A collage is just a theory that imports two others and then adds function symbols between them.↩︎
For programmers familiar with Agda, at least, if you haven’t made this connection, this might help you understand and appreciate what a “class” is versus a “set” and what “size issues” are, which is typically handled extremely vaguely in a lot of the literature.↩︎