Reference: John Preskill’s notes on *Quantum Information and Computation*.

In quantum mechanics, a *state* is a complete description of a physical system. Mathematically, it is given by a ray (an equivalence class of vectors) in *Hilbert space*, $\mathcal{H}$, which is a vector space endowed with an inner product, and which is complete with respect to the norm induced by the latter. Simply put, Hilbert space is the abstract vector space in which quantum states “live”.

Hilbert spaces can be real or complex, finite- or infinite-dimensional; for definiteness we’ll assume complex, finite-dimensional Hilbert spaces here. The inner product is simply a map that associates an element of the field to pairs of elements in the vector space; in this case, $\langle\cdot|\cdot\rangle: \mathcal{H}\times\mathcal{H}\to\mathbb{C}$. In Dirac’s bra-ket notation, vectors (states) in $\mathcal{H}$ are denoted $|\psi\rangle$, and dual vectors (operators, which act as linear functionals on the states) are denoted $\langle\psi|$. The properties of the inner product may then be written as follows:

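- Positivity: $\langle\psi|\psi\rangle \geq 0$, with equality iff $|\psi\rangle = 0$.
- Linearity: $\langle\phi|\left(a|\psi_1\rangle + b|\psi_2\rangle\right) = a\langle\phi|\psi_1\rangle + b\langle\phi|\psi_2\rangle$.
- Skew symmetry: $\langle\phi|\psi\rangle = \langle\psi|\phi\rangle^*$.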

The inner product induces a norm, $\|\psi\| = \langle\psi|\psi\rangle^{1/2}$, which defines the distance between states in $\mathcal{H}$. Any inner product space is a metric space with respect to this distance function, and is also known as a pre-Hilbert space. The aforementioned completeness criterion is what elevates a pre-Hilbert space to a Hilbert space: a pre-Hilbert space is complete if every Cauchy sequence converges with respect to the norm to an element in the space (intuitively, there are no “missing points”). The completeness criterion is important for infinite-dimensional Hilbert spaces, where it ensures the convergence of eigenfunction expansions that one encounters in, e.g., Fourier analysis.

Note that we are free to choose the normalization $\langle\psi|\psi\rangle = 1$, since this merely amounts to choosing a representative of the equivalence class of vectors that differ by a nonzero complex scalar. In this sense, both $|\psi\rangle$ and $e^{i\alpha}|\psi\rangle$ represent the same state; only relative phase changes between states in a superposition are physically meaningful.

Although states are the basic mathematical objects in this formalism, we never measure them. Rather, we measure *observables*, which are self-adjoint (a.k.a. Hermitian) operators that act as linear maps on states, $A: \mathcal{H}\to\mathcal{H}$. Such an operator has a spectral representation, meaning that its eigenstates form a complete orthonormal basis in $\mathcal{H}$. This allows us to write an observable as

$$A = \sum_n a_n P_n \tag{1}$$

where $P_n$ is the orthogonal projection onto the space of eigenvectors with eigenvalue $a_n$ (such orthogonal projections can be proven to exist for complete inner product spaces, e.g., $\mathcal{H}$). In the simple case where $a_n$ is non-degenerate, $P_n$ is the projection onto the corresponding eigenvector, $P_n = |n\rangle\langle n|$. Of course, given the unit normalization above, the projection operators satisfy $P_n P_m = \delta_{nm} P_n$ and $P_n^\dagger = P_n$. (Note that the spectral theorem is more subtle for unbounded operators in infinite-dimensional spaces, but that will not concern us here).

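The numerical result of a measurement in quantum mechanics is given by the eigenvalue of the observable in question, $a_n$. This implies that the system must, at the instant of measurement, be in an eigenstate of $A$ with the measured eigenvalue. If the quantum state immediately prior to a measurement is $|\psi\rangle$, then the outcome $a_n$ is obtained with probability

$$\mathrm{Prob}(a_n) = \left\| P_n|\psi\rangle \right\|^2 = \langle\psi|P_n|\psi\rangle \tag{2}$$

and the normalized quantum state with eigenvalue $a_n$ is therefore

$$\frac{P_n|\psi\rangle}{\langle\psi|P_n|\psi\rangle^{1/2}} \tag{3}$$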

This is the point at which probability notoriously enters the picture; we’ll have more to say about this later. Note that, since the system is now in an eigenstate, immediately repeating the measurement will yield the same eigenvalue with probability 1.
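As a quick numerical sanity check of (2) and (3), here is a minimal sketch assuming NumPy. The helper name `measure_probabilities` and the choice of $\sigma_x$ measured on a spin-up state along $z$ are illustrative assumptions, not part of Preskill's notes.

```python
import numpy as np

def measure_probabilities(A, psi, decimals=10):
    """Return {eigenvalue: (probability, post-measurement state)} for a Hermitian A."""
    psi = psi / np.linalg.norm(psi)            # pick the unit-norm representative of the ray
    evals, evecs = np.linalg.eigh(A)           # eigh assumes A is Hermitian
    outcomes = {}
    for a in np.unique(np.round(evals, decimals)):   # group degenerate eigenvalues
        cols = evecs[:, np.isclose(evals, a)]
        P = cols @ cols.conj().T               # orthogonal projector onto the eigenspace
        prob = np.vdot(psi, P @ psi).real      # <psi|P|psi>
        post = P @ psi / np.sqrt(prob) if prob > 1e-12 else None
        outcomes[float(a)] = (prob, post)
    return outcomes

# Illustrative choice: measure sigma_x on the spin-up state along z.
sigma_x = np.array([[0, 1], [1, 0]], dtype=complex)
up_z = np.array([1, 0], dtype=complex)
outcomes = measure_probabilities(sigma_x, up_z)
for a, (prob, post) in outcomes.items():
    print(a, prob, post)
```

Feeding a post-measurement state back into `measure_probabilities` returns the same eigenvalue with probability 1, which is the repeatability statement above.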

The fact that the measurement process appears to induce such decisiveness on the part of the state leads to the notion of “wave function collapse”, which is a horribly misleading oversimplification to which we will return. Suffice it to say that wave functions don’t collapse, but we’ve a bit more math to cover before the reality can be made precise.

So much for states and observables. What about dynamics? As in classical mechanics, the Hamiltonian $H$ is the generator of time translations, and its expectation value gives the energy of the state. The latter is a measurable quantity, which implies that in order to be a well-defined physical observable, the Hamiltonian operator must be self-adjoint, $H = H^\dagger$. By Stone’s theorem, the exponential of $i$ times a self-adjoint operator is unitary; thus if $H = H^\dagger$, then $U = e^{-iHt}$ (in units with $\hbar = 1$) is a bounded linear operator on $\mathcal{H}$ that satisfies $U^\dagger U = 1$. Mathematically, this is why time evolution in quantum mechanics is unitary. Physically, this is simply the statement that time evolution preserves the inner product; i.e., that probabilities continue to sum to 1 (since, under time-evolution by a Hermitian operator $H$, $\langle\psi|\psi\rangle \to \langle\psi|U^\dagger U|\psi\rangle = \langle\psi|\psi\rangle$). Note that here we’re implicitly assuming that $H$ is time-independent, in order to write the time translation operator as the exponential thereof. For time-dependent cases, one can still show $U^\dagger U = 1$ perturbatively in $t$; it’s just less elegant.
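The unitarity claim is easy to check numerically. The sketch below assumes NumPy, units with $\hbar = 1$, and a randomly generated Hermitian matrix standing in for the Hamiltonian; the helper `evolve` is a hypothetical name, and it builds the exponential from the spectral decomposition rather than calling a library matrix exponential.

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
H = (M + M.conj().T) / 2                       # random Hermitian matrix (illustrative stand-in)

def evolve(H, t):
    """U(t) = exp(-i H t) for time-independent Hermitian H, via H = V diag(E) V^dagger."""
    E, V = np.linalg.eigh(H)
    return (V * np.exp(-1j * E * t)) @ V.conj().T   # scales column k of V by exp(-i E_k t)

U = evolve(H, 0.7)
psi = rng.normal(size=4) + 1j * rng.normal(size=4)
psi /= np.linalg.norm(psi)

unitary = np.allclose(U.conj().T @ U, np.eye(4))    # U^dagger U = 1
norm_after = np.linalg.norm(U @ psi)                # norm is preserved under evolution
print(unitary, norm_after)
```

Because $H$ is time-independent here, the evolution operators also compose as `evolve(H, s) @ evolve(H, t) == evolve(H, s + t)` up to rounding.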

Given such an operator $H$, the evolution of a state over some finite interval is unitary, and may be written

$$|\psi(t')\rangle = U(t',t)\,|\psi(t)\rangle = e^{-iH(t'-t)}\,|\psi(t)\rangle \tag{4}$$

where in the second equality we’ve assumed $H$ to be time-independent. If we then consider an infinitesimal transformation $t' = t + \mathrm{d}t$ and expand both the left- and right-hand sides to first order, we have

$$|\psi(t)\rangle + \mathrm{d}t\,\frac{\mathrm{d}}{\mathrm{d}t}|\psi(t)\rangle = \left(1 - iH\,\mathrm{d}t\right)|\psi(t)\rangle \tag{5}$$

Comparing terms at linear order, we recognize the Schrödinger equation,

$$\frac{\mathrm{d}}{\mathrm{d}t}|\psi(t)\rangle = -iH\,|\psi(t)\rangle \tag{6}$$

which describes the evolution of states in the *Schrödinger picture*, wherein states are time-dependent while operators (including observables) are constants. This is precisely the opposite of the *Heisenberg picture*, wherein operators carry the time-dependence while states are constant. The two pictures are related by a change of basis, analogous to the relation between active and passive transformations. A third picture, the *interaction picture*, is often later introduced as a rather ham-fisted compromise between these two; it forms a fantastically successful premise for perturbation theory, but it doesn’t actually exist (see Haag’s theorem).

Note that unitary evolution, as encapsulated in the Schrödinger equation, is entirely *deterministic*: specification of an initial state allows us to predict the state at any future time. But as described above, measurement is *probabilistic*: despite our infinite ability to predict future states, we cannot make definite predictions about measurement outcomes. One of the deepest (and most controversial) aspects of quantum mechanics is how deterministic evolution can nonetheless lead to probabilistic outcomes. Preskill quite aptly refers to this juxtaposition as a “disconcerting dualism”, and we shall return to it below.

Another interesting observation is that according to the Schrödinger equation, quantum mechanical evolution is linear, in contrast to the non-linear evolution often encountered in classical theories. This is tied up with the issue of probability above: probability theory is fundamentally linear. But the connection isn’t quite so straightforward.

As an aside, the linearity of quantum mechanics is why quantum chaos is so subtle. Naïvely, quantum systems should be incapable of supporting chaos, since small perturbations to the initial state don’t wildly change the evolution in the case of linear dynamics. However, two states which are close in Hilbert space can nonetheless yield wildly different measurements. Quantum chaos has important implications in a number of areas, particularly holography and black holes; but that’s a subject for another post.

So, to summarize, we’ve seen that in quantum mechanics, states are vectors in Hilbert space, observables are Hermitian operators, symmetries are unitary operators, and measurements are orthogonal projections.

Now here’s the kicker: everything we’ve said so far applies only to a single, isolated system. This is an idealization that simply does not exist. Another way to phrase this is that the formulation above only holds if applied to *the entire universe*. Even ignoring the issue of how one would make a measurement in such a scenario, this is clearly not a realistic description. In fact, it’s frankly wrong: in general (that is, when considering subsystems) states are *not* rays, measurements are *not* orthogonal projections, and evolution is *not* unitary!

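The simplest extension of the above is to consider a bipartite system, the Hilbert space for which is a tensor product of the Hilbert spaces of the constituents, $\mathcal{H} = \mathcal{H}_A\otimes\mathcal{H}_B$. Given an orthonormal basis $\{|i\rangle_A\}$ for $\mathcal{H}_A$ and $\{|\mu\rangle_B\}$ for $\mathcal{H}_B$, an arbitrary *pure state* of $\mathcal{H}_A\otimes\mathcal{H}_B$ can be expanded as

$$|\psi\rangle_{AB} = \sum_{i,\mu} a_{i\mu}\,|i\rangle_A\otimes|\mu\rangle_B \tag{7}$$

where, by normalization, the coefficients satisfy $\sum_{i,\mu}|a_{i\mu}|^2 = 1$. We’ve referred to this as a pure state in contrast to a *mixed state*; the former correspond to rays in the total Hilbert space, while the latter do not. This is the first crucial correction alluded to above.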

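Let us now consider an observable that acts only on subsystem $A$, $M_A\otimes 1_B$. Its expectation value is

$$\langle M_A\rangle = {}_{AB}\langle\psi|\,M_A\otimes 1_B\,|\psi\rangle_{AB} = \sum_{i,j,\mu} a_{j\mu}^*\,a_{i\mu}\;{}_A\langle j|M_A|i\rangle_A = \mathrm{tr}\left(M_A\,\rho_A\right) \tag{8}$$

where we’ve introduced the *reduced density matrix*

$$\rho_A = \mathrm{tr}_B\left(|\psi\rangle_{AB}\,{}_{AB}\langle\psi|\right) \tag{9}$$

In contrast to the trace, which is a scalar-valued function given by the sum of eigenvalues, the partial trace w.r.t. $B$ is an operator-valued function given by summing over the basis elements of $\mathcal{H}_B$:

$$\rho_A = \sum_\mu {}_B\langle\mu|\psi\rangle_{AB}\,{}_{AB}\langle\psi|\mu\rangle_B = \sum_{i,j,\mu} a_{i\mu}\,a_{j\mu}^*\,|i\rangle_A\,{}_A\langle j| \tag{10}$$

The elegant expression for the expectation value above then follows by the cyclic property of the trace.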
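The partial trace is compact to write down in terms of the coefficient matrix of the bipartite state. The following is a minimal sketch assuming NumPy, where the array `a[i, mu]` holds the expansion coefficients, and the function name and the Bell-like example state are illustrative assumptions. It compares the expectation value computed in the full Hilbert space with the trace against the reduced density matrix.

```python
import numpy as np

def reduced_density_matrix_A(a):
    """rho_A for |psi> = sum_{i,mu} a[i, mu] |i>_A |mu>_B, obtained by tracing out B."""
    a = a / np.linalg.norm(a)                  # enforce sum |a[i, mu]|^2 = 1
    return a @ a.conj().T                      # (rho_A)_{ij} = sum_mu a[i, mu] * conj(a[j, mu])

# Illustrative state: (|00> + |11>)/sqrt(2)
a = np.array([[1, 0], [0, 1]], dtype=complex) / np.sqrt(2)
rho_A = reduced_density_matrix_A(a)

# Expectation value of sigma_z acting on A alone, computed two ways.
sigma_z = np.diag([1.0, -1.0]).astype(complex)
psi = a.reshape(-1)                            # row-major flattening: index = i * dim_B + mu
full = np.vdot(psi, np.kron(sigma_z, np.eye(2)) @ psi).real   # <psi| M_A (x) 1_B |psi>
reduced = np.trace(sigma_z @ rho_A).real                      # tr(M_A rho_A)
print(rho_A)
print(full, reduced)
```

The row-major flattening matters: `np.kron(M_A, 1_B)` acts on the $A$ index only if the $A$ index is the slow one, which is what `a.reshape(-1)` produces.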

The reduced density matrix will play a central role in what follows, so it’s worth elaborating the properties that follow from the definition above (in particular the explicit form (10)):

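- Hermiticity: $\rho_A = \rho_A^\dagger$.
- Non-negativity (of its eigenvalues): $\forall\,|\phi\rangle_A \in \mathcal{H}_A$, ${}_A\langle\phi|\rho_A|\phi\rangle_A = \sum_\mu\left|\sum_i a_{i\mu}\,{}_A\langle\phi|i\rangle_A\right|^2 \geq 0$.
- Unit trace: $\mathrm{tr}\,\rho_A = \sum_{i,\mu}|a_{i\mu}|^2 = 1$ (since $|\psi\rangle_{AB}$ is normalized).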

As mentioned above, *pure states* are rays in Hilbert space, but *mixed states* are not. However, both are described by a reduced density matrix, which therefore provides a suitably general definition of quantum states. In the case of a pure state, $\rho = |\psi\rangle\langle\psi|$, which is the projection operator onto the state (that is, onto the one-dimensional space spanned by $|\psi\rangle$). The density matrix for a pure state is therefore idempotent, $\rho^2 = \rho$. In contrast, for a general (mixed) state in the diagonal basis $\{|\psi_a\rangle\}$,

$$\rho = \sum_a p_a\,|\psi_a\rangle\langle\psi_a| \tag{11}$$

where the eigenvalues satisfy $0 < p_a \leq 1$ and $\sum_a p_a = 1$. It follows that a pure state has only a single non-zero eigenvalue, which must be 1, while a mixed state contains two or more terms in the sum (and $\rho^2 \neq \rho$).
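A convenient numerical diagnostic for this distinction is the purity $\mathrm{tr}\,\rho^2$, which equals 1 exactly when $\rho^2 = \rho$ and is strictly less than 1 otherwise. The sketch below assumes NumPy; the function name `purity` and the two example states are illustrative choices.

```python
import numpy as np

def purity(rho):
    """tr(rho^2): equals 1 for a pure state and is strictly less than 1 for a mixed state."""
    return np.trace(rho @ rho).real

psi = np.array([1, 1j], dtype=complex) / np.sqrt(2)
rho_pure = np.outer(psi, psi.conj())               # |psi><psi|, a rank-one projector
rho_mixed = np.diag([0.75, 0.25]).astype(complex)  # two non-zero eigenvalues summing to 1

for name, rho in [("pure", rho_pure), ("mixed", rho_mixed)]:
    idempotent = np.allclose(rho @ rho, rho)
    print(name, purity(rho), idempotent)
```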

As alluded to above, in a *coherent* superposition of states, the relative phase is physically meaningful (i.e., observable). This is merely a consequence of the linearity of the Schrödinger equation: any linear combination of solutions is also a solution. In contrast, the mixed state $\rho$ is an *incoherent* superposition of eigenstates $\{|\psi_a\rangle\}$, meaning that the relative phases are experimentally unobservable. This gives rise to the concept of *entanglement*: when two systems $A$ and $B$ interact, they become entangled (i.e., correlated). This destroys the coherence of the original states such that some of the phases in the superposition become inaccessible if we measure $A$ alone. Henceforth we will reserve the unqualified “superposition” to refer to the former case.

We should note that probability again enters this updated picture when we consider that the expectation value of any observable $M$ acting on the subsystem described by $\rho$ is

$$\langle M\rangle = \mathrm{tr}\left(M\rho\right) = \sum_a p_a\,\langle\psi_a|M|\psi_a\rangle \tag{12}$$

which leads to the interpretation of $\rho$ as describing a statistical *ensemble* of pure states $|\psi_a\rangle$, each of which occurs with probability $p_a$. But we’re not quite ready to address the associated interpretive questions just yet.

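As a concrete example, consider the spin state

$$|\!\uparrow_z\rangle = \frac{1}{\sqrt{2}}\left(|\!\uparrow_x\rangle + |\!\downarrow_x\rangle\right) \tag{13}$$

which is a (coherent) superposition of spins along the $x$-axis. Measuring the spin along the $x$-axis will result in $|\!\uparrow_x\rangle$ or $|\!\downarrow_x\rangle$ with probability $1/2$ each; e.g., from (2):

$$\mathrm{Prob}(\uparrow_x) = \left|\langle\uparrow_x\!|\!\uparrow_z\rangle\right|^2 = \frac{1}{2} \tag{14}$$

In contrast, the ensemble in which each of these states occurs with this probability is

$$\rho = \frac{1}{2}\left(|\!\uparrow_x\rangle\langle\uparrow_x\!| + |\!\downarrow_x\rangle\langle\downarrow_x\!|\right) = \frac{1}{2}\,\mathbb{1} \tag{15}$$

But since the identity is invariant under a unitary change of basis ($U\mathbb{1}U^\dagger = \mathbb{1}$), we can obtain the state along an arbitrary axis $\hat{n}$ by applying a suitable unitary transformation to $|\!\uparrow_x\rangle$ without changing the right-hand side. As a consequence, measuring the spin along *any* axis yields a completely random result:

$$\mathrm{Prob}(\uparrow_{\hat{n}}) = \mathrm{tr}\left(|\!\uparrow_{\hat{n}}\rangle\langle\uparrow_{\hat{n}}\!|\,\rho\right) = \frac{1}{2} \tag{16}$$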

In other words, we obtain spin up or down with equal probability, regardless of what we do. This is a reflection of the fact that the relative phases in a superposition are observable, but those in an ensemble are not. A mixed state can thus be thought of as an ensemble of pure states in many different ways, all of which are experimentally indistinguishable. (As an aside, further clarity on these relationships can be gained by studying the *Bloch sphere*, which I shall not digress upon here).
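The contrast between the superposition and the ensemble can be made concrete with a few lines of NumPy. This sketch rests on two assumptions not spelled out in the text: the coherent superposition is taken to be the spin-up state along $z$ (an equal superposition of up and down along $x$), and spin-up along a general axis is parametrized in the standard Bloch-sphere form $\cos(\theta/2)|\!\uparrow_z\rangle + e^{i\phi}\sin(\theta/2)|\!\downarrow_z\rangle$. The function names are hypothetical.

```python
import numpy as np

up_z = np.array([1, 0], dtype=complex)

def spin_up_along(theta, phi):
    """Spin-up state along the axis with polar angle theta and azimuth phi, in the z-basis."""
    return np.array([np.cos(theta / 2), np.exp(1j * phi) * np.sin(theta / 2)])

def prob_up(rho, theta, phi):
    """Probability of finding spin up along (theta, phi): tr(|n><n| rho) = <n|rho|n>."""
    n = spin_up_along(theta, phi)
    return np.vdot(n, rho @ n).real

rho_pure = np.outer(up_z, up_z.conj())         # coherent superposition of up_x and down_x
rho_mixed = np.eye(2, dtype=complex) / 2       # 50/50 ensemble of up_x and down_x

axes = [(0.0, 0.0), (np.pi / 2, 0.0), (np.pi / 3, 1.0)]   # z-axis, x-axis, a generic axis
for theta, phi in axes:
    print(theta, phi, prob_up(rho_pure, theta, phi), prob_up(rho_mixed, theta, phi))
```

Both states give probability $1/2$ along $x$, but only the ensemble gives $1/2$ along every axis; the pure state gives $\cos^2(\theta/2)$, which is how the relative phase shows up experimentally.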

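A bipartite pure state can be expressed in a standard form, which is often very useful. One begins by observing that an arbitrary state may be expanded as

$$|\psi\rangle_{AB} = \sum_{i,\mu} a_{i\mu}\,|i\rangle_A|\mu\rangle_B = \sum_i |i\rangle_A|\tilde{i}\rangle_B \tag{17}$$

where $\{|i\rangle_A\}$ and $\{|\mu\rangle_B\}$ are the orthonormal bases defined in (7), and in the second equality we’ve defined a new basis $|\tilde{i}\rangle_B = \sum_\mu a_{i\mu}|\mu\rangle_B$. *A priori*, $\{|\tilde{i}\rangle_B\}$ need not be orthonormal. However, since $\{|i\rangle_A\}$ is, we are free to choose it such that $\rho_A$ is diagonal (cf. (11)), in which case we can write the reduced density matrix that describes subsystem $A$ alone as

$$\rho_A = \sum_i p_i\,|i\rangle_A\,{}_A\langle i| \tag{18}$$

However, by definition (9), this is also equivalent to tracing out system $B$,

$$\rho_A = \mathrm{tr}_B\left(|\psi\rangle_{AB}\,{}_{AB}\langle\psi|\right) = \mathrm{tr}_B\Big(\sum_{i,j}|i\rangle_A\,{}_A\langle j|\otimes|\tilde{i}\rangle_B\,{}_B\langle\tilde{j}|\Big) = \sum_{i,j}{}_B\langle\tilde{j}|\tilde{i}\rangle_B\,|i\rangle_A\,{}_A\langle j| \tag{19}$$

And therefore, it must be the case that

$${}_B\langle\tilde{j}|\tilde{i}\rangle_B = p_i\,\delta_{ij} \tag{20}$$

i.e., the new basis is orthogonal after all! Furthermore, by simply rescaling the vectors $|\tilde{i}\rangle_B$ by $p_i^{-1/2}$, i.e., defining $|i'\rangle_B = p_i^{-1/2}|\tilde{i}\rangle_B$ for $p_i \neq 0$, we find that we can express the bipartite state (17) as

$$|\psi\rangle_{AB} = \sum_i \sqrt{p_i}\,|i\rangle_A|i'\rangle_B \tag{21}$$

which is the *Schmidt decomposition* of the bipartite pure state $|\psi\rangle_{AB}$ in terms of a particular orthonormal basis of $\mathcal{H}_A\otimes\mathcal{H}_B$. Note that our derivation was completely general; any bipartite pure state can be expressed in this form, though of course the particular orthonormal basis employed will depend on the state (that is, we can’t simultaneously expand $|\psi\rangle_{AB}$ and $|\varphi\rangle_{AB}$ using the same orthonormal basis for $\mathcal{H}_A\otimes\mathcal{H}_B$).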

Observe that by tracing over one of the Hilbert spaces in (21), we find that both $\rho_A$ and $\rho_B$ have the same nonzero eigenvalues, e.g.,

$$\rho_B = \mathrm{tr}_A\left(|\psi\rangle_{AB}\,{}_{AB}\langle\psi|\right) = \sum_i p_i\,|i'\rangle_B\,{}_B\langle i'| \tag{22}$$

though since the dimensions of $\mathcal{H}_A$ and $\mathcal{H}_B$ need not be equal, the number of zero eigenvalues can still differ. If $\rho_A$ and $\rho_B$ have no degenerate non-zero eigenvalues, then they uniquely determine the Schmidt decomposition: one can diagonalize the reduced density matrices, and then pair up eigenstates with the same eigenvalue to determine (21). (There is still the potential for ambiguity in the basis if either $\rho_A$ or $\rho_B$ individually has degenerate eigenvalues: to wit, which $|i\rangle_A$ gets paired with which $|i'\rangle_B$).

The Schmidt decomposition is useful for characterizing whether pure states are *separable* or *entangled* (for mixed states, the situation is more subtle). In particular, the bipartite pure state $|\psi\rangle_{AB}$ above is separable iff there is only one non-zero Schmidt coefficient $\sqrt{p_i}$. Otherwise, the state is entangled. If all the Schmidt coefficients are equal (and non-zero), then the state is *maximally entangled*. On account of this classification, it is common to associate a *Schmidt number* to the state $|\psi\rangle_{AB}$, which is the number of non-zero eigenvalues (equivalently, the number of terms) in the decomposition (note that this implies the Schmidt number is a positive integer). Thus a pure state is separable iff its Schmidt number is 1. In this case we can write it as a direct (tensor) product of states in $\mathcal{H}_A$ and $\mathcal{H}_B$, which further implies that $\rho_A$ and $\rho_B$ are each pure. In contrast, an entangled state, with Schmidt number greater than 1, has no such direct product expression, in which case $\rho_A$ and $\rho_B$ are mixed.
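Numerically, the Schmidt decomposition of a bipartite pure state is just the singular value decomposition of its coefficient matrix: the singular values are the Schmidt coefficients, and the number of non-zero ones is the Schmidt number. The sketch below assumes NumPy; the function name `schmidt`, the tolerance, and the two example states are illustrative assumptions.

```python
import numpy as np

def schmidt(a, tol=1e-12):
    """Schmidt data for |psi> = sum_{i,mu} a[i, mu] |i>_A |mu>_B.

    Returns (s, U, Vh) such that |psi> = sum_k s[k] |u_k>_A |v_k>_B, where |u_k> has
    components U[:, k], |v_k> has components Vh[k, :], and s[k] > 0.
    """
    a = a / np.linalg.norm(a)                      # normalize the state
    U, s, Vh = np.linalg.svd(a, full_matrices=False)
    keep = s > tol                                 # discard numerically zero coefficients
    return s[keep], U[:, keep], Vh[keep, :]

product = np.outer([1, 1j], [1, 0, -1])            # separable state in C^2 (x) C^3
bell = np.array([[1, 0], [0, 1]], dtype=complex)   # (|00> + |11>)/sqrt(2), up to normalization

for name, a in [("product", product), ("bell", bell)]:
    s, U, Vh = schmidt(a)
    print(name, "Schmidt number:", len(s), "eigenvalues of rho_A:", s**2)
```

The squared singular values coincide with the non-zero eigenvalues shared by the two reduced density matrices, so the same routine also serves as a check on that statement.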

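For bipartite pure states, entanglement is quantified by the *von Neumann entropy* of the reduced density matrix, $S(\rho_A) = -\mathrm{tr}\left(\rho_A\ln\rho_A\right)$. Entanglement entropy is a tremendously rich topic in itself, to say nothing of its connections to other areas of physics, and thus we defer further discussion elsewhere.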
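For orientation only, here is a minimal sketch of the von Neumann entropy $S = -\sum_i p_i\ln p_i$ of the reduced density matrix of a bipartite pure state, assuming NumPy and the natural logarithm. The function name, the tolerance, and the example states are illustrative assumptions.

```python
import numpy as np

def entanglement_entropy(a, tol=1e-12):
    """Von Neumann entropy of rho_A for |psi> = sum_{i,mu} a[i, mu] |i>_A |mu>_B."""
    a = a / np.linalg.norm(a)
    p = np.linalg.svd(a, compute_uv=False) ** 2    # non-zero eigenvalues of rho_A
    p = p[p > tol]                                 # convention: 0 ln 0 = 0
    return float(-np.sum(p * np.log(p)))

product = np.outer([1, 1j], [1, 0, -1])            # separable: Schmidt number 1
bell = np.array([[1, 0], [0, 1]], dtype=complex)   # maximally entangled pair of qubits

print(entanglement_entropy(product), entanglement_entropy(bell))
```

A separable state gives zero entropy, while the maximally entangled two-qubit state gives $\ln 2$, the largest value available for a two-dimensional subsystem.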

To summarize, it is only for idealized, isolated systems (i.e., the entire universe) that quantum states may be described by rays in Hilbert space. In reality, since we always deal with subsystems, states are given by (reduced) density matrices defined by tracing out the complement of the Hilbert space under consideration. (The Hilbert spaces themselves are associated with spatial regions, and thus what we mean is that we trace over all degrees of freedom localized in the complement of our subregion. As mentioned elsewhere, however, this is generally still too naïve).

It remains to justify our earlier claim that generic measurements are not orthogonal projections, and that evolution is not unitary. In the course of doing so, we shall resolve Preskill’s “disconcerting dualism” between determinism and probability, and explain why the notion of wave-function collapse is an illusion. But this will require us to develop slightly beyond the basic mathematical machinery above, and as such we partition the discussion into Part 2.