In this post we look at the idea that matrices can be viewed as functions. Doing so puts some of the later ideas in context and lets us reuse existing concepts (injectivity, surjectivity, and bijectivity) when thinking about matrices.

In this post I'd like to be explicit about an idea which may or may not be covered by your linear algebra material. I find that keeping this idea in mind can really help motivate the ways in which we talk about matrices and why we care about looking for inverses. The idea is summed up as follows:

An \(m \times n\) matrix \(\AA\) times a vector \(\xx\) can be thought of as a function of that vector which produces a new vector, i.e., \(\AA : \mathbb{R}^n \to \mathbb{R}^m\).
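To see this concretely, here's a small sketch in Python with numpy (my choice here; the particular matrix and vector are made up for illustration). A \(2 \times 3\) matrix acts as a function from \(\mathbb{R}^3\) to \(\mathbb{R}^2\):

```python
import numpy as np

# A 2x3 matrix maps vectors in R^3 to vectors in R^2.
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 3.0]])

x = np.array([1.0, 1.0, 1.0])  # x lives in R^3

y = A @ x      # "applying" the function A to x
print(y)       # [3. 4.] -- a vector in R^2
print(y.shape) # (2,)
```

The input had three components and the output has two, exactly as \(\AA : \mathbb{R}^3 \to \mathbb{R}^2\) suggests.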

If matrices are functions, then matrix multiplication (i.e., successive application of a matrix) is analogous to function composition:

\[ \begin{align*} \AA\BB\CC &= \AA \circ \BB \circ \CC \\ \AA\BB\CC\xx &= \AA(\BB(\CC(\xx))). \end{align*} \]

And just as function composition is associative, so is matrix multiplication:

\[ \begin{align*} f \circ (g \circ h) &= (f \circ g) \circ h; \\ \AA(\BB\CC) &= (\AA\BB)\CC. \end{align*} \]
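Both facts are easy to check numerically. In this sketch the matrices are random (sizes chosen so the products are defined), and `@` is numpy's matrix-multiplication operator:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 3))
B = rng.standard_normal((3, 4))
C = rng.standard_normal((4, 5))
x = rng.standard_normal(5)

# Successive application of C, then B, then A ...
composed = A @ (B @ (C @ x))
# ... equals applying the single product matrix ABC:
product = (A @ B @ C) @ x
assert np.allclose(composed, product)

# Associativity: how we group the products doesn't matter.
assert np.allclose(A @ (B @ C), (A @ B) @ C)
```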

And lastly, the square identity matrix \(\II\) is just the identity function for the appropriate vector domain.

We spend a lot of time trying to determine whether matrices are invertible. We learn quick ways to spot the features that make inversion impossible for a given matrix, and a process for finding the inverse when it exists. Why do we care about inverses?

If we're given an equation \(f(x) = y\) and we're asked to solve for \(x\), the process of finding \(x\) essentially leads us to find the inverse of \(f\), written \(f^{-1}\), such that if \(f(x) = y\), then \(x = f^{-1}(y)\). If we have such a function \(f^{-1}\), then in the future if we're asked to find some \(x\) given a \(y\), we can just apply the inverse to compute \(x\) directly rather than algebraically manipulating \(f(x) = y\) until we can solve for \(x\). The same principle applies for matrices.

In previous posts we looked at the process for solving a system of equations in the form \(\AA\xx = \bb\). The process is a bit laborious and can be computationally expensive if \(\AA\) has many rows and columns. But if we think of matrices as functions, then we should end up with

\[ \begin{align*} \AA(\xx) &= \bb; \\ \AA^{-1}(\bb) &= \xx. \end{align*} \]

That would make solving such systems straightforward, and we could reuse \(\AA^{-1}\) instead of repeating the elimination process for every \(\AA\xx = \bb\). But not all functions are invertible, and the same is true for matrices; a function is invertible if and only if it is both *surjective* and *injective*. Now we'll look at these properties in the context of matrices.
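Here's what that reuse looks like in numpy, with a small invertible matrix made up for illustration. We compute \(\AA^{-1}\) once and then solve for several right-hand sides with a single multiplication each:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])  # an invertible 2x2 matrix (for illustration)

A_inv = np.linalg.inv(A)    # compute the inverse once...

# ...then solve A x = b for many right-hand sides by one multiplication each.
for b in (np.array([1.0, 0.0]),
          np.array([0.0, 1.0]),
          np.array([5.0, 5.0])):
    x = A_inv @ b
    assert np.allclose(A @ x, b)  # x really does solve the system
```

(As an aside: in numerical practice it's generally preferred to use `np.linalg.solve` or a matrix factorization rather than forming the explicit inverse, for accuracy reasons, but the conceptual point is the same.)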

A function is *surjective* if the function maps inputs to all elements of its *codomain* without leaving any out. An example of a surjective function is \(f : \mathbb{R} \to \mathbb{R}\) with \(f(x) = x + 1\): the entirety of the codomain \(\mathbb{R}\) is covered by \(f\), i.e., there are no values of \(y\) in \(\mathbb{R}\) that don't have some corresponding \(x\) in \(\mathbb{R}\). An example of a function which is *not* surjective is \(f : \mathbb{R} \to \mathbb{R}\) with \(f(x) = |x|\). There are infinitely many values \(y\) in \(\mathbb{R}\) that have no corresponding \(x\): all \(y < 0\).

A surjective \(m \times n\) matrix is one whose outputs cover the entirety of \(\mathbb{R}^m\) as its input ranges over all \(n\)-vectors. An example of such a matrix is the \(n \times n\) identity matrix \(\II_n\): any vector in \(\mathbb{R}^n\) can be expressed as a linear combination of the columns of \(\II_n\), so we can find some input vector \(\xx\) to which it corresponds. (Easily: \(\II\xx = \xx\) for any \(\xx\).)

An example of a matrix which is *not* surjective is one whose columns span only a proper subspace of \(\mathbb{R}^m\), such as a line or a plane, e.g.,

\[ \AA = \mat{1 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1}. \]

No matter the input \(\xx\), the above matrix \(\AA\) will yield only vectors in the \(x\)-\(z\) plane. Vectors such as \(\bb = \rxyz{1}{2}{1}\) have no corresponding \(\xx\).
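We can verify this numerically: whatever input we feed to this \(\AA\), the second component of the output is always exactly zero, so no \(\bb\) with a nonzero second component is ever reached. A quick numpy check:

```python
import numpy as np

A = np.array([[1.0, 1.0, 0.0],
              [0.0, 0.0, 0.0],
              [0.0, 0.0, 1.0]])

# The middle row of A is all zeros, so the second component of A @ x
# is zero for every input x -- outputs stay in the x-z plane.
rng = np.random.default_rng(1)
for _ in range(5):
    x = rng.standard_normal(3)
    assert (A @ x)[1] == 0.0

# So a target like b = (1, 2, 1), with a nonzero second component,
# has no corresponding x: A is not surjective.
```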

A function is *injective* if it maps each input \(x\) to a *unique* value \(y\), i.e., \(f(x_1) = f(x_2) \implies x_1 = x_2\).

An injective \(m \times n\) matrix \(\AA\) behaves the same way: for any result vector \(\bb\), there is *at most* a single \(\xx\) such that \(\AA\xx = \bb\).

An example of an injective matrix is \(\II_n\): for any right-hand side \(\bb\) there is exactly one \(\xx\) that maps to it. An example of a matrix which is *not* injective is, again, \(\AA\) given above in the surjective case: an infinitude of vectors \(\xx = \rxyz{a}{-a}{1}\), one for each \(a\), map to the same vector \(\bb = \rxyz{0}{0}{1}\).
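Again this is easy to confirm with numpy: many distinct inputs of the form \((a, -a, 1)\) all land on the same output.

```python
import numpy as np

A = np.array([[1.0, 1.0, 0.0],
              [0.0, 0.0, 0.0],
              [0.0, 0.0, 1.0]])

# Every vector (a, -a, 1) maps to the same output (0, 0, 1),
# because the first row sums a and -a to zero.
for a in (0.0, 1.0, -2.5, 100.0):
    x = np.array([a, -a, 1.0])
    assert np.allclose(A @ x, [0.0, 0.0, 1.0])
```

Distinct inputs, identical output: \(\AA\) is not injective.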

If a function is both injective and surjective, it can be inverted. So it is with matrices: all of the codomain \(\mathbb{R}^m\) is covered, and every vector in \(\mathbb{R}^m\) has a unique corresponding vector in \(\mathbb{R}^n\). (This can only happen when \(m = n\), which is why only square matrices can be invertible.)

A lot of material in linear algebra texts deals with the various ways in which matrices fail to be invertible. But once we have the inverse \(\AA^{-1}\) of a matrix \(\AA\), it's trivial to solve a system of equations involving \(\AA\). Given \(\AA\xx = \bb\), we can just multiply \(\AA^{-1}\bb\) to obtain \(\xx\).

Now that we have drawn some parallels between matrices and functions, we should be able to better understand upcoming material. Next time I'll resume the series with coverage of elimination as matrix multiplication.