Information geometry (part 1/3)

Information geometry is a rather interesting fusion of statistics and differential geometry, in which a statistical model is endowed with the structure of a Riemannian manifold. Each point on the manifold corresponds to a probability distribution function, and the metric is governed by the underlying properties thereof. It may have many interesting applications to (quantum) information theory, complexity, machine learning, and theoretical neuroscience, among other fields. The canonical reference is Methods of Information Geometry by Shun-ichi Amari and Hiroshi Nagaoka, originally published in Japanese in 1993, and translated into English in 2000. The English version also contains several new topics/sections, and thus should probably be considered as a “second edition” (incidentally, any page numbers given below refer to this version). This is a beautiful book, which develops the concepts in a concise yet pedagogical manner. The posts in this three-part series are merely the notes I made while studying it, and aren’t intended to provide more than a brief summary of what I deemed the salient aspects.

Chapter 1 provides a self-contained introduction to differential geometry which — as the authors amusingly allude — is far more accessible (dare I say practical) than the typical mathematician’s treatise on the subject. Since this is familiar material, I’ll skip ahead to chapter 2, and simply recall necessary formulas/notation as needed.

Chapter 2 introduces the basic geometric structure of statistical models. First, some notation: a probability distribution is a function {p:\mathcal{X}\rightarrow\mathbb{R}} which satisfies

\displaystyle p(x)\geq0\quad\forall x\in\mathcal{X}\quad\mathrm{and}\quad\int\!\mathrm{d} x\,p(x)=1~, \ \ \ \ \ (1)

where {\mathcal{X}=\mathbb{R}^n}. If {\mathcal{X}} is a discrete set (which may have either finite or countably infinite cardinality), then the integral is instead a sum. Each {p} may be parametrized by {n} real-valued variables {\xi=[\xi^i]=[\xi^1,\ldots,\xi^n]}, such that the family {S} of probability distributions on {\mathcal{X}} is

\displaystyle S=\{p_\xi=p(x;\xi)\,|\,\xi=[\xi^i]\in\Xi\}~, \ \ \ \ \ (2)

where {\Xi\subset\mathbb{R}^n} and {\xi\mapsto p_\xi} is injective (so that the reverse mapping, from probability distributions {p} to the coordinates {\xi}, is a function). Such an {S} is referred to as an {n}-dimensional statistical model, or simply model on {\mathcal{X}}, sometimes abbreviated {S=\{p_\xi\}}. Note that I referred to {\xi} as the coordinates above: we will assume that, for each fixed {x}, the map {\Xi\rightarrow\mathbb{R}} given by {\xi\mapsto p(x;\xi)} is {C^\infty}, so that we may take derivatives with respect to the parameters, e.g., {\partial_ip(x;\xi)}, where {\partial_i\equiv\frac{\partial}{\partial\xi^i}}. Additionally, we assume that the order of differentiation and integration may be freely exchanged; an important consequence of this is that

\displaystyle \int\!\mathrm{d} x\,\partial_i p(x;\xi)=\partial_i\!\int\!\mathrm{d} x\,p(x;\xi)=\partial_i1=0~, \ \ \ \ \ (3)

which we will use below. Finally, we assume that the support of the distribution {\mathrm{supp}(p)\equiv\{x\,|\,p(x)>0\}} is independent of (that is, does not vary with) {\xi}, and hence may redefine {\mathcal{X}=\mathrm{supp}(p)} for simplicity. Thus, the model {S} is a subset of

\displaystyle \mathcal{P}(\mathcal{X})\equiv\left\{p:\mathcal{X}\rightarrow\mathbb{R}\,\bigg|\,p(x)>0\;\;\forall x\in\mathcal{X},\;\;\int\!\mathrm{d} x\,p(x)=1\right\}~. \ \ \ \ \ (4)

Considering parametrizations which are {C^\infty} diffeomorphic to one another to be equivalent then elevates {S} to a statistical manifold. (In this context, we shall sometimes conflate the distribution {p_\xi} with the coordinate {\xi}, and speak of the “point {\xi}”, etc).

As a concrete example, consider the normal distribution:

\displaystyle p(x;\xi)=\frac{1}{\sqrt{2\pi}\sigma}\exp\left[-\frac{\left(x-\mu\right)^2}{2\sigma^2}\right]~. \ \ \ \ \ (5)

In this case,

\displaystyle \mathcal{X}=\mathbb{R}~,\;\;n=2,\;\;\xi=[\mu,\sigma],\;\;\Xi=\{[\mu,\sigma]\,|\,\mu\in\mathbb{R},\;\sigma\in\mathbb{R}^+\}~. \ \ \ \ \ (6)

Other examples are given on page 27.

Fisher information metric

Now, given a model {S}, the Fisher information matrix of {S} at a point {\xi} is the {n\times n} matrix {G(\xi)=[g_{ij}(\xi)]}, with elements

\displaystyle g_{ij}(\xi)\equiv E_\xi[\partial_i\ell_\xi\partial_j\ell_\xi] =\int\!\mathrm{d} x\,\partial_i\ell(x;\xi)\partial_j\ell(x;\xi)p(x;\xi)~, \ \ \ \ \ (7)

where

\displaystyle \ell_\xi(x)=\ell(x;\xi)\equiv\ln p(x;\xi)~, \ \ \ \ \ (8)

and the expectation {E_\xi} with respect to the distribution {p_\xi} is defined as

\displaystyle E_\xi[f]\equiv\int\!\mathrm{d} x\,f(x)p(x;\xi)~. \ \ \ \ \ (9)

Though the integral above diverges for some models, we shall assume that {g_{ij}(\xi)} is finite {\forall\xi,i,j}, and furthermore that {g_{ij}:\Xi\rightarrow\mathbb{R}} is {C^\infty}. Note that {g_{ij}=g_{ji}} (i.e., {G} is symmetric). Additionally, while {G} is in general positive semidefinite, we shall assume positive definiteness, which is equivalent to requiring that {\{\partial_1 p_\xi,\ldots,\partial_n p_\xi\}} are linearly independent functions on {\mathcal{X}}. Lastly, observe that eq. (3) may be written

\displaystyle E_\xi[\partial_i\ell_\xi]=0~. \ \ \ \ \ (10)

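Differentiating this once more with respect to {\xi^j} (again exchanging differentiation and integration, and using {\partial_jp=p\,\partial_j\ell}) allows us to express the matrix elements in a useful alternative form: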

\displaystyle g_{ij}(\xi)=-E_\xi[\partial_i\partial_j\ell_\xi]~. \ \ \ \ \ (11)
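
As a quick numerical sanity check, the two expressions (7) and (11) can be compared for the normal distribution (5) in the coordinates {\xi=[\mu,\sigma]}. The sketch below is only illustrative: the function name `normal_fisher`, the grid parameters, and the test point are my own choices, and the integrals are approximated by a plain Riemann sum on a wide grid.

```python
import numpy as np

def normal_fisher(mu, sigma, half_width=12.0, num=200_001):
    """Approximate eqs. (10), (7) and (11) for N(mu, sigma^2), xi = [mu, sigma]."""
    x = np.linspace(mu - half_width * sigma, mu + half_width * sigma, num)
    dx = x[1] - x[0]
    z = (x - mu) / sigma
    p = np.exp(-0.5 * z**2) / (np.sqrt(2.0 * np.pi) * sigma)

    # Score functions d_i ell, with ell = ln p (derivatives taken by hand).
    score = np.stack([z / sigma, (z**2 - 1.0) / sigma])
    # Second derivatives d_i d_j ell, shape (2, 2, num).
    hess = np.array([
        [-np.ones_like(z), -2.0 * z],
        [-2.0 * z, 1.0 - 3.0 * z**2],
    ]) / sigma**2

    mean_score = np.sum(score * p, axis=1) * dx                  # eq. (10)
    g_outer = np.einsum("ix,jx,x->ij", score, score, p) * dx     # eq. (7)
    g_hess = -np.einsum("ijx,x->ij", hess, p) * dx               # eq. (11)
    return mean_score, g_outer, g_hess

mean_score, g_outer, g_hess = normal_fisher(mu=0.3, sigma=1.7)
print(mean_score)
print(g_outer)
print(g_hess)
```

Up to quadrature error, both matrices should agree with each other and with the standard closed form {\mathrm{diag}(1/\sigma^2,\,2/\sigma^2)}, while the mean score should vanish.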

The reason for the particular definition of the Fisher matrix above is that it provides a Riemannian metric on {S}. Recall that a Riemannian metric {g:p\mapsto\langle\cdot,\cdot\rangle_p} is a {(0,2)}-tensor (that is, it assigns to each point {p\in S} an inner product on the tangent space {T_pS}) which is bilinear, symmetric, and positive definite (meaning {\langle X,X\rangle\geq0}, with equality iff {X=0}). In fact, in the natural basis of local coordinates {[\xi^i]}, the Riemannian metric is uniquely determined by

\displaystyle g_{ij}=\langle\partial_i,\partial_j\rangle~, \ \ \ \ \ (12)

whence {g} is called the Fisher metric. Since it’s positive definite, we may define the inverse metric {g^{-1}} corresponding to {G^{-1}(\xi)} such that {g^{ij}g_{jk}=\delta^i_{~k}}. Note that {g} is invariant under coordinate transformations, since its components transform covariantly,

\displaystyle \xi\rightarrow\tilde\xi\;\implies\; g_{ij}\rightarrow\tilde g_{ij}=\frac{\partial\xi^k}{\partial\tilde\xi^i}\frac{\partial\xi^l}{\partial\tilde\xi^j}g_{kl}~, \ \ \ \ \ (13)

and hence we may write

\displaystyle \langle X,Y\rangle_\xi=E_\xi[(X\ell)(Y\ell)]\quad\forall X,Y\in T_\xi S~. \ \ \ \ \ (14)

Recalling that {||V||^2=\langle V,V\rangle_p=g_{ij}(p)V^iV^j}, the length of a curve {\gamma:[a,b]\rightarrow S} with respect to {g} is then

\displaystyle ||\gamma||=\int_a^b\!\mathrm{d} t\left|\frac{\mathrm{d}\gamma}{\mathrm{d} t}\right| =\int_a^b\!\mathrm{d} t\sqrt{g_{ij}\dot\gamma^i\dot\gamma^j}~. \ \ \ \ \ (15)
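
To make (15) concrete, here is a minimal sketch that evaluates the length of two simple curves in the normal family. It assumes the standard closed form {G=\mathrm{diag}(1/\sigma^2,\,2/\sigma^2)} of the Fisher matrix in the coordinates {[\mu,\sigma]}, which is not derived in the text; the helper names `fisher_metric_normal` and `curve_length` are hypothetical.

```python
import numpy as np

def fisher_metric_normal(xi):
    """Closed-form Fisher matrix of N(mu, sigma^2) in coordinates [mu, sigma] (assumed)."""
    _, sigma = xi
    return np.diag([1.0 / sigma**2, 2.0 / sigma**2])

def curve_length(gamma, a, b, num=20_001):
    """Approximate eq. (15) for a curve gamma: [a, b] -> (mu, sigma)."""
    t = np.linspace(a, b, num)
    pts = np.array([gamma(s) for s in t])        # shape (num, 2)
    vel = np.gradient(pts, t, axis=0)            # finite-difference velocity
    speed = np.array([
        np.sqrt(v @ fisher_metric_normal(p) @ v) for p, v in zip(pts, vel)
    ])
    # Trapezoidal rule, written out by hand.
    return float(np.sum(0.5 * (speed[1:] + speed[:-1]) * np.diff(t)))

# Vertical line: mu fixed, sigma running from 1 to 3.
length_sigma = curve_length(lambda t: (0.0, t), 1.0, 3.0)
# Horizontal line at sigma = 2: mu running from 0 to 1.
length_mu = curve_length(lambda t: (t, 2.0), 0.0, 1.0)
print(length_sigma, length_mu)
```

For these two curves the integral can be done by hand, giving {\sqrt{2}\ln 3} and {1/2} respectively, so the numerical values can be compared against those. Note that these are lengths of particular curves, not geodesic distances.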

But before we can properly speak of curves and distances, we must define a connection, which provides a means of parallel transporting vectors along the curve.


In particular, we will be concerned with comparing elements of the tangent bundle, and hence we require a relation between the tangent spaces at different points in {S}, i.e., an affine connection. To that end, let {S=\{p_\xi\}} be an {n}-dimensional model, and define the {n^3} functions {\Gamma_{ij,k}^{(\alpha)}} which map each point {\xi} to

\displaystyle \left(\Gamma_{ij,k}^{(\alpha)}\right)_\xi\equiv E_\xi\left[\left(\partial_i\partial_j\ell_\xi+\frac{1-\alpha}{2}\partial_i\ell_\xi\partial_j\ell_\xi\right)\left(\partial_k\ell_\xi\right)\right]~, \ \ \ \ \ (16)

where {\alpha\in\mathbb{R}}. This defines an affine connection {\nabla^{(\alpha)}} on {S} via

\displaystyle \langle\nabla_{\partial_i}^{(\alpha)}\partial_j,\partial_k\rangle=\Gamma_{ij,k}^{(\alpha)}~, \ \ \ \ \ (17)

where {g=\langle\cdot,\cdot\rangle} is the Fisher metric introduced above. {\nabla^{(\alpha)}} is called the {\alpha}-connection, and accordingly terms like {\alpha}-flat, {\alpha}-affine, {\alpha}-parallel, etc. denote the corresponding notions with respect to this connection.

Let us pause to review some of the terminology from differential geometry above. Recall that the covariant derivative {\nabla} may be expressed in local coordinates as

\displaystyle \nabla_XY=X^i\left(\partial_iY^k+Y^j\Gamma_{ij}^{~~k}\right)\partial_k~, \ \ \ \ \ (18)

where {X=X^i\partial_i}, {Y=Y^i\partial_i} are vectors in the tangent space. If these are basis vectors such that {X^i=Y^i=1}, then {\nabla_{\partial_i}\partial_j=\Gamma_{ij}^{~~k}\partial_k}. The vector {Y} is said to be parallel with respect to the connection {\nabla} if {\nabla Y=0}, i.e., {\nabla_XY=0\;\;\forall X\in TS}; equivalently, in local coordinates,

\displaystyle \partial_iY^k+Y^j\Gamma_{ij}^{~~k}=0~. \ \ \ \ \ (19)

If all basis vectors are parallel with respect to a coordinate system {[\xi^i]}, then the latter is an affine coordinate system for {\nabla}. Connections which admit such an affine parametrization are called flat (equivalently, one says that {S} is flat with respect to {\nabla}).

Now, with respect to a Riemannian metric {g}, one defines {\Gamma_{ij,k}} as above, namely {\Gamma_{ij,k}=\langle\nabla_{\partial_i}\partial_j,\partial_k\rangle=\Gamma_{ij}^{~~l}g_{lk}}. Note that this defines a symmetric connection, i.e., {\Gamma_{ij,k}=\Gamma_{ji,k}}. If, in addition, {\nabla} satisfies

\displaystyle Z\langle X,Y\rangle=\langle\nabla_ZX,Y\rangle+\langle X,\nabla_ZY\rangle\;\; \forall X,Y,Z\in TS~, \ \ \ \ \ (20)

then {\nabla} is a metric connection with respect to {g}. (This is basically a statement about linearity, since affine transformations are linear). This implies that

\displaystyle \partial_kg_{ij}=\Gamma_{ki,j}+\Gamma_{kj,i}~. \ \ \ \ \ (21)

In other words, under a metric connection, parallel transport of two vectors preserves their inner product, hence the significance of such connections in Riemannian geometry. A connection which is both metric and symmetric is called Riemannian; for a given {g} it is unique (the Levi-Civita connection), though a manifold generically admits infinitely many metrics, and hence infinitely many such connections. However, the natural connections on statistical manifolds are generically non-metric! Indeed, since

\displaystyle \begin{aligned} \partial_kg_{ij}&=\partial_kE_\xi[\partial_i\ell_\xi\partial_j\ell_\xi]\\ &=E_\xi[\left(\partial_k\partial_i\ell_\xi\right)\left(\partial_j\ell_\xi\right)]+E_\xi[\left(\partial_i\ell_\xi\right)\left(\partial_k\partial_j\ell_\xi\right)]+E_\xi[\left(\partial_i\ell_\xi\right)\left(\partial_j\ell_\xi\right)\left(\partial_k\ell_\xi\right)]\\ &=\Gamma_{ki,j}^{(0)}+\Gamma_{kj,i}^{(0)}~, \end{aligned} \ \ \ \ \ (22)

only the special case {\alpha=0} defines a Riemannian connection {\nabla^{(0)}} with respect to the Fisher metric (though observe that {\nabla^{(\alpha)}} is symmetric for any value of {\alpha}). While this may seem strange from a physics perspective, where preserving the inner product is of prime importance, there’s nothing mathematically pathological about it. Indeed, the more relevant condition — which we’ll see below — is that every point on the manifold have an interpretation as a probability distribution.
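
The claim that only {\alpha=0} is metric can be checked numerically for the normal family. The sketch below evaluates (16) by a Riemann sum and compares the right-hand side of (21) against {\partial_kg_{ij}}, where the latter is taken from the standard closed form {G=\mathrm{diag}(1/\sigma^2,\,2/\sigma^2)} (an assumption, not derived in the text). All function names and the test point are hypothetical choices of mine.

```python
import numpy as np

def normal_alpha_connection(mu, sigma, alpha, half_width=12.0, num=200_001):
    """Return (Gamma, T): Gamma[i, j, k] approximates eq. (16), T[i, j, k] = E[d_i l d_j l d_k l]."""
    x = np.linspace(mu - half_width * sigma, mu + half_width * sigma, num)
    dx = x[1] - x[0]
    z = (x - mu) / sigma
    p = np.exp(-0.5 * z**2) / (np.sqrt(2.0 * np.pi) * sigma)
    score = np.stack([z / sigma, (z**2 - 1.0) / sigma])   # d_i ell
    hess = np.array([                                      # d_i d_j ell
        [-np.ones_like(z), -2.0 * z],
        [-2.0 * z, 1.0 - 3.0 * z**2],
    ]) / sigma**2
    A = np.einsum("ijx,kx,x->ijk", hess, score, p) * dx
    T = np.einsum("ix,jx,kx,x->ijk", score, score, score, p) * dx
    return A + 0.5 * (1.0 - alpha) * T, T

mu, sigma = 0.3, 1.7
# d_k g_ij, indexed as dg[k, i, j]; only the sigma-derivative (k = 1) is non-zero.
dg = np.zeros((2, 2, 2))
dg[1] = np.diag([-2.0 / sigma**3, -4.0 / sigma**3])

def compatibility_defect(alpha):
    """Largest violation of eq. (21) over all index combinations."""
    Gamma, _ = normal_alpha_connection(mu, sigma, alpha)
    rhs = Gamma + np.transpose(Gamma, (0, 2, 1))   # Gamma_{ki,j} + Gamma_{kj,i}
    return float(np.max(np.abs(dg - rhs)))

print(compatibility_defect(0.0))   # quadrature-level error only
print(compatibility_defect(1.0))   # non-zero: the 1-connection is not metric
```

From (16), the defect for general {\alpha} is {\alpha T_{ijk}}, with {T_{ijk}=E_\xi[\partial_i\ell_\xi\partial_j\ell_\xi\partial_k\ell_\xi]}, so it vanishes identically only at {\alpha=0}.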

Two neat relationships between different {\alpha}-connections are worth noting. First, for any {\beta\neq\alpha}, we have

\displaystyle \Gamma_{ij,k}^{(\beta)}=\Gamma_{ij,k}^{(\alpha)}+\frac{\alpha-\beta}{2}T_{ijk}~, \ \ \ \ \ (23)

where {T_{ijk}} (not to be confused with the unrelated torsion tensor) is a covariant symmetric tensor which maps a point {\xi} to

\displaystyle \left(T_{ijk}\right)_\xi\equiv E_\xi[\partial_i\ell_\xi\partial_j\ell_\xi\partial_k\ell_\xi]~. \ \ \ \ \ (24)

Second, the {\alpha}-connection may be decomposed as

\displaystyle \nabla^{(\alpha)}=(1-\alpha)\nabla^{(0)}+\alpha\nabla^{(1)} =\frac{1+\alpha}{2}\nabla^{(1)}+\frac{1-\alpha}{2}\nabla^{(-1)}~. \ \ \ \ \ (25)

Within this infinite class of connections, {\nabla^{(\pm1)}} play a central role in information geometry, and are closely related to an interesting duality structure on the geometry of {\mathcal{P}\left(\mathcal{X}\right)}. We shall give a low-level introduction to the relevant representations of {S} here, and postpone a more elegant derivation based on different embeddings of {\mathcal{P}} in {\mathbb{R}^{\mathcal{X}}} to the next post. In particular, we’ll define the so-called exponential and mixture families, which are intimately related to the {1}- and {(-1)}-connections, respectively.

Exponential families

Suppose that an {n}-dimensional model {S=\{p_\theta\,|\,\theta\in\Theta\}} can be expressed in terms of {n\!+\!1} functions {\{C,F_1,\ldots,F_n\}} on {\mathcal{X}} and a function {\psi} on {\Theta} as

\displaystyle p(x;\theta)=\exp\left[C(x)+\theta^iF_i(x)-\psi(\theta)\right]~, \ \ \ \ \ (26)

where we’ve employed Einstein’s summation notation for the sum over {i} from {1} to {n}. Then {S} is an exponential family, and {[\theta^i]} are its natural or canonical parameters. The normalization condition {\int\mathrm{d} xp(x;\theta)=1} implies that

\displaystyle \psi(\theta)=\log\int\!\mathrm{d} x\,\exp\left[C(x)+\theta^iF_i(x)\right]~. \ \ \ \ \ (27)

This provides a parametrization {\theta\mapsto p_\theta}, which is {1:1} if and only if the functions {\{1,F_1,\ldots,F_n\}} are linearly independent (which we shall assume henceforth). Many important probabilistic models fall into this class, including all those referenced on page 27 above. The normal distribution (5), for instance, yields

\displaystyle \begin{aligned} C(x)=0~,\;\;F_1(x)=x~,\;\;F_2(x)&=x^2~,\;\;\theta^1=\frac{\mu}{\sigma^2}~,\;\;\theta^2=-\frac{1}{2\sigma^2}~,\\ \psi(\theta)=\frac{\mu^2}{2\sigma^2}+&\ln\left(\sqrt{2\pi}\sigma\right)~. \end{aligned} \ \ \ \ \ (28)

The canonical coordinates {[\theta^i]} are natural insofar as they provide a {1}-affine coordinate system, with respect to which {S} is {1}-flat. To see this, observe that

\displaystyle \partial_i\ell(x;\theta)=F_i(x)-\partial_i\psi(\theta)\;\;\implies\;\; \partial_i\partial_j\ell(x;\theta)=-\partial_i\partial_j\psi(\theta)~, \ \ \ \ \ (29)

where one should keep in mind that {\partial_i} denotes the derivative with respect to {\theta^i}, not {x}! This implies that

\displaystyle \left(\Gamma_{ij,k}^{(1)}\right)_\theta=E_\theta[\left(\partial_i\partial_j\ell_\theta\right)\left(\partial_k\ell_\theta\right)] =-\partial_i\partial_j\psi(\theta)E_\theta[\partial_k\ell_\theta]=0~. \ \ \ \ \ (30)

Therefore, exponential families admit a canonical parametrization in terms of a {1}-affine coordinate system {[\theta^i]}, with respect to which {S} is {1}-flat. The associated affine connection is called the exponential connection, and is denoted {\nabla^{(e)}\equiv\nabla^{(1)}}.
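
The identification (28) for the normal distribution can be checked numerically. The sketch below compares the closed-form {\psi(\theta)} with the log-normalizer (27), and also checks the relation {E_\theta[F_i]=\partial_i\psi}, which follows from combining (10) with (29). Function names, the grid, the step size, and the test point are illustrative assumptions.

```python
import numpy as np

def psi_closed_form(theta):
    """psi from eq. (28), with mu and sigma^2 rewritten in terms of theta."""
    t1, t2 = theta
    sigma2 = -1.0 / (2.0 * t2)
    mu = t1 * sigma2
    return mu**2 / (2.0 * sigma2) + 0.5 * np.log(2.0 * np.pi * sigma2)

def psi_numeric(theta, x):
    """psi from eq. (27) with C = 0, F_1 = x, F_2 = x^2, via a Riemann sum on the grid x."""
    t1, t2 = theta
    dx = x[1] - x[0]
    return np.log(np.sum(np.exp(t1 * x + t2 * x**2)) * dx)

mu, sigma = 0.3, 1.7
theta = np.array([mu / sigma**2, -1.0 / (2.0 * sigma**2)])
x = np.linspace(mu - 12.0 * sigma, mu + 12.0 * sigma, 200_001)
dx = x[1] - x[0]

# Expectations of F_1 = x and F_2 = x^2 under p(x; theta), eq. (26).
p = np.exp(theta[0] * x + theta[1] * x**2 - psi_closed_form(theta))
mean_F = np.array([np.sum(x * p), np.sum(x**2 * p)]) * dx

# Central finite differences for d_i psi.
eps = 1e-5
grad_psi = np.array([
    (psi_closed_form(theta + eps * e) - psi_closed_form(theta - eps * e)) / (2.0 * eps)
    for e in np.eye(2)
])
print(psi_numeric(theta, x), psi_closed_form(theta))
print(mean_F, grad_psi)
```

Here {E_\theta[F_1]=\mu} and {E_\theta[F_2]=\mu^2+\sigma^2}, so the gradient of {\psi} reproduces the first two moments.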

Mixture families

Now consider the case in which {S} can be expressed as

\displaystyle p(x;\theta)=C(x)+\theta^iF_i(x)~, \ \ \ \ \ (31)

i.e., {S} is an affine subspace of {\mathcal{P}(\mathcal{X})}. In this case {S} is called a mixture family, with mixture parameters {[\theta^i]}. Note that {\mathcal{P}(\mathcal{X})} itself is a mixture family if {\mathcal{X}} is finite. The name arises from the fact that elements in this family admit a representative form as a mixture of {n\!+\!1} probability distributions {\{p_0,p_1,\ldots,p_n\}},

\displaystyle p(x;\theta)=\theta^ip_i(x)+\left(1-\sum_{i=1}^n\theta^i\right)p_0(x) =p_0(x)+\sum_{i=1}^n\theta^i\left[p_i(x)-p_0(x)\right]~, \ \ \ \ \ (32)

(i.e., {C(x)=p_0(x)} and {F_i(x)=p_i(x)-p_0(x)}), where {\theta^i>0} and {\sum_i\theta^i<1}. For a mixture family, we have

\displaystyle \partial_i\ell(x;\theta)=\frac{F_i(x)}{p(x;\theta)}\;\;\implies\;\; \partial_i\partial_j\ell(x;\theta)=-\frac{F_i(x)F_j(x)}{p(x;\theta)^2}~, \ \ \ \ \ (33)

which implies that

\displaystyle \partial_i\partial_j\ell+\partial_i\ell\partial_j\ell=0\;\;\implies\;\; \Gamma_{ij,k}^{(-1)}=0~. \ \ \ \ \ (34)

Therefore, a mixture family admits a parametrization in terms of a {(-1)}-affine coordinate system {[\theta^i]}, with respect to which {S} is {(-1)}-flat. The associated affine connection is called the mixture connection, denoted {\nabla^{(m)}\equiv\nabla^{(-1)}}.
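
As an illustration of (34), the sketch below builds a small mixture family on a four-point sample space (so that integrals become sums) and evaluates (16) for {\alpha=\pm1}. The three component distributions and the value of {\theta} are arbitrary choices of mine, as is the function name `alpha_connection`.

```python
import numpy as np

# Three hypothetical distributions p_0, p_1, p_2 on a 4-point sample space.
p0 = np.array([0.1, 0.2, 0.3, 0.4])
p1 = np.array([0.4, 0.3, 0.2, 0.1])
p2 = np.array([0.7, 0.1, 0.1, 0.1])
C = p0                            # C(x) = p_0(x)
F = np.stack([p1 - p0, p2 - p0])  # F_i(x) = p_i(x) - p_0(x)

def alpha_connection(theta, alpha):
    """Gamma[i, j, k] from eq. (16) for the mixture family of eq. (31)."""
    p = C + theta @ F                                # eq. (31)
    score = F / p                                    # d_i ell, eq. (33)
    hess = -np.einsum("ix,jx->ijx", F, F) / p**2     # d_i d_j ell, eq. (33)
    A = np.einsum("ijx,kx,x->ijk", hess, score, p)
    T = np.einsum("ix,jx,kx,x->ijk", score, score, score, p)
    return A + 0.5 * (1.0 - alpha) * T

theta = np.array([0.2, 0.3])
gamma_m = alpha_connection(theta, alpha=-1.0)   # mixture connection
gamma_e = alpha_connection(theta, alpha=+1.0)   # exponential connection
print(np.max(np.abs(gamma_m)))
print(gamma_e[0, 0, 0])
```

The {(-1)}-coefficients vanish to machine precision in the {\theta} coordinates, while the {1}-coefficients do not, so {[\theta^i]} is affine for {\nabla^{(-1)}} but not for {\nabla^{(1)}}.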

In the next post, when we discuss the geometrical structure in more detail, we shall see that {\nabla^{(\pm1)}} are dual connections, which has many interesting consequences.

Why is Fisher special?

As noted above, a given manifold admits infinitely many distinct Riemannian metrics and affine connections. However, a statistical manifold {S} has the property that every point is a probability distribution, which singles out the Fisher metric and {\alpha}-connection as unique. To formalize this notion, we must first introduce the concept of a sufficient statistic.

Let {F:\mathcal{X}\rightarrow\mathcal{Y}} be a map which takes random variables {X} to {Y=F(X)}. Given the distribution {p(x;\xi)} of {X}, this results in the distribution {q(y;\xi)} on {Y}. We then define

\displaystyle r(x;\xi)=\frac{p(x;\xi)}{q\left(F(x);\xi\right)}~,\quad p(x|y;\xi)=r(x;\xi)\delta_{F(x)}(y)~,\quad \mathrm{Pr}(A|y;\xi)=\int_A\!\mathrm{d} x\,p(x|y;\xi)~, \ \ \ \ \ (35)

where {A\subset\mathcal{X}}, and {\delta_{F(x)}(y)} is the delta function at the point {F(x)}, such that {\forall B\subset\mathcal{Y}},

\displaystyle \int_{A\cap F^{-1}(B)}\!\mathrm{d} x\,p(x;\xi) =\int_A\int_B\!\mathrm{d} x\mathrm{d} y\,r(x;\xi)q(y;\xi)\delta_{F(x)}(y) =\int_B\!\mathrm{d} y\,\mathrm{Pr}(A|y;\xi)q(y;\xi)~. \ \ \ \ \ (36)

In other words, the delta function picks out the value of {x} such that {F(x)=y}. The above implies that {\mathrm{Pr}(A|y;\xi)} is the conditional probability of the event {\{X\in A\}}, given {Y=y} (cf. the familiar definition {P(X|Y)=P(X\cap Y)/P(Y)}). If {r(x;\xi)} is independent of {\xi}, then {F} is called a sufficient statistic for {S}. In this case, we may write

\displaystyle p(x;\xi)=q\left(F(x);\xi\right)r(x)~, \ \ \ \ \ (37)

i.e., the dependence of {p} on {\xi} is entirely encoded in the distribution {q}. Therefore, treating {p} as the unknown distribution, whose parameter {\xi} one wishes to estimate, it suffices to know only the value {Y=F(x)}, hence the name. Formally, one says that {F} is a sufficient statistic if and only if there exist functions {s:\mathcal{Y}\times\Xi\rightarrow\mathbb{R}} and {t:\mathcal{X}\rightarrow\mathbb{R}} such that

\displaystyle p(x;\xi)=s\left(F(x);\xi\right)t(x)\qquad\forall x,\xi~. \ \ \ \ \ (38)

The significance of this lies in the fact that the Fisher information metric satisfies a monotonicity relation under a generic map {F}. This is detailed in Theorem 2.1 of Amari & Nagaoka, which states that given {S=\{p(x;\xi)\}} with Fisher metric {G(\xi)}, and induced model {S_F\equiv\{q(y;\xi)\}} with matrix {G_F(\xi)}, the difference {\Delta G(\xi)\equiv G(\xi)-G_F(\xi)} is positive semidefinite, i.e., {G_F(\xi)\leq G(\xi)}, with equality if and only if {F} is a sufficient statistic. Otherwise, for generic maps, the “information loss” {\Delta G(\xi)=[\Delta g_{ij}(\xi)]} that results from summarizing the data {x} in {y=F(x)} is given by

\displaystyle \Delta g_{ij}(\xi)=E_\xi[\partial_i\ln r(X;\xi)\partial_j\ln r(X;\xi)]~, \ \ \ \ \ (39)

which can be expressed in terms of the covariance with respect to the conditional distribution {p(x|y;\xi)}. This theorem will be important later, when we discuss relative entropy.
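
A toy example may help fix ideas. Take {X=(X_1,X_2)} to be two independent Bernoulli bits with common parameter {\xi} (my own choice of model, not one from the book). The sum {X_1+X_2} is a sufficient statistic, whereas keeping only {X_1} discards information. The sketch below computes the one-parameter Fisher information of the original and induced models, using a central finite difference for {\partial\ell}; the function names are hypothetical.

```python
import numpy as np
from itertools import product

def joint(xi):
    """p(x; xi) for two independent Bernoulli(xi) bits, x = (x1, x2)."""
    return {x: xi**sum(x) * (1.0 - xi)**(2 - sum(x)) for x in product((0, 1), repeat=2)}

def pushforward(p, F):
    """Distribution q(y; xi) of Y = F(X) induced by p."""
    q = {}
    for x, px in p.items():
        q[F(x)] = q.get(F(x), 0.0) + px
    return q

def fisher_1d(model, xi, eps=1e-5):
    """E[(d ell / d xi)^2] for a one-parameter discrete model, via central differences."""
    p, lo, hi = model(xi), model(xi - eps), model(xi + eps)
    return sum(p[k] * ((np.log(hi[k]) - np.log(lo[k])) / (2.0 * eps))**2 for k in p)

xi = 0.3
G = fisher_1d(joint, xi)
G_sum = fisher_1d(lambda s: pushforward(joint(s), sum), xi)                # F(x) = x1 + x2
G_first = fisher_1d(lambda s: pushforward(joint(s), lambda x: x[0]), xi)   # F(x) = x1
print(G, G_sum, G_first)
```

In this example {G=2/[\xi(1-\xi)]}; the sufficient statistic reproduces it, while keeping only the first bit halves it, consistent with {G_F(\xi)\leq G(\xi)}.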

Now, if {F} is a sufficient statistic, then (37) implies that {\partial_i\ln p(x;\xi)=\partial_i\ln q\left(F(x);\xi\right)}. But this implies that {g_{ij}}, and by extension {\Gamma_{ij,k}^{(\alpha)}}, are the same for both {S} and {S_F}. Therefore the Fisher metric and {\alpha}-connection are invariant with respect to the sufficient statistic {F}. In the language above, this implies that there is no information loss associated with describing the original distribution {p} by {q}, i.e., that information is preserved under {F}. Formally, this invariance is codified by the following two equations:

\displaystyle \begin{aligned} \langle X,Y\rangle_p&=\langle\lambda_*(X),\lambda_*(Y)\rangle_{\lambda(p)}^{'}~,\\ \lambda_*\left(\nabla_X^{(\alpha)}Y\right)&={\nabla'}_{\lambda_*(X)}^{(\alpha)}\lambda_*(Y)~, \end{aligned} \ \ \ \ \ (40)

{\forall\;X,Y\in TS}, where the prime denotes the object on {S_F}, {\lambda} is the diffeomorphism from {S} onto {S_F} given by {\lambda(p_\xi)=q_\xi}, and the pushforward {\lambda_*:TS\rightarrow TS_F} is defined by {\left(\lambda_*(X)\right)_{\lambda(p)}=(\mathrm{d}\lambda)_p(X_p)}.

The salient feature of the Fisher metric and {\alpha}-connection is that they are uniquely characterized by this invariance! This is the thrust of Chentsov’s theorem (Theorem 2.6 in Amari & Nagaoka). Strictly speaking, the proof of this theorem relies on finiteness of {\mathcal{X}}, but, depending on the level of rigour one demands, it is possible to extend this to infinite models via a limiting procedure in which one considers increasingly fine-grained subsets of {\mathcal{X}}. A similar subtlety will arise in our more geometrical treatment of dual structures in the next post. I’m honestly unsure how serious this issue is, but it’s worth bearing in mind that the mathematical basis is less solid for infinite {\mathcal{X}}, and may require a more rigorous functional analytic approach.

Disjoint representations and particle ontology

There is a beautiful paper by Clifton and Halvorson [1], which discusses the ontology of particles in quantum field theory using the famous example of Minkowski vs. Rindler quantizations of a free bosonic field. What is especially nice about this paper is that it contains the clearest exposition of the algebraic approach (AQFT), in particular the Gelfand-Naimark-Segal (GNS) construction, I’ve ever encountered. This framework enables them to make the discussion of particles, and physical observables in general, very precise, to wit: that the Minkowski and Rindler vacua induce disjoint GNS representations of the Weyl algebra.

Now, while I prefer to avoid excessive rigor (or rigor as summum bonum), there is truth in Halvorson’s claim [2] that AQFT “is a particularly apt tool for studying the foundations of QFT.” Hence this post will attempt to summarize (and/or copy verbatim) those essential aspects of the GNS construction, as detailed in section 2 of [1], which are necessary to address the question of inequivalent field quantizations. A more thorough introduction to AQFT is, alas, an undertaking for another post.

We begin by introducing the Weyl algebra. This is essentially a more formal/rigorous way of formulating the canonical commutation relations. Consider a (for the moment, finite) classical system with {n} degrees of freedom, which has {2n}-dimensional phase space {S}. Each point in {S} is described by a pair of vectors {\mathbf{a},\mathbf{b}\in\mathbb{R}^n}, whose components parametrize the position and momentum of the system via the canonical variables

\displaystyle x(\mathbf{a})=a^ix_i~,\qquad p(\mathbf{b})=b^ip_i~. \ \ \ \ \ (1)

To quantize the system, we elevate the position and momentum variables to operators on some Hilbert space, and impose the canonical commutation relations

\displaystyle [x(\mathbf{a}),p(\mathbf{b})]=i(\mathbf{a}\cdot\mathbf{b})\mathbf{1}~,\qquad [x(\mathbf{a}),x(\mathbf{a}')]=[p(\mathbf{b}),p(\mathbf{b}')]=0~, \ \ \ \ \ (2)

where {\mathbf{1}} is the identity operator. Of course, the phrase “elevate to operators on Hilbert space” is precisely the sort of cavalier attitude to which mathematical physicists object; and while we theoretical physicists can usually get away with simply dismissing tedious questions about boundedness, representations, and whatnot, in this case we must be (significantly) more precise about what such a procedure entails.

To that end, observe that one can introduce two {n}-parameter families of unitary operators

\displaystyle U(\mathbf{a})\equiv e^{ix(\mathbf{a})}~,\qquad V(\mathbf{b})\equiv e^{ip(\mathbf{b})}~, \ \ \ \ \ (3)

whereupon one can show that the canonical commutation relations are formally equivalent to

\displaystyle \begin{gathered} U(\mathbf{a}) U(\mathbf{a}')=U(\mathbf{a}+\mathbf{a}')~,\qquad V(\mathbf{b}) V(\mathbf{b}')=V(\mathbf{b}+\mathbf{b}')~,\\ U(\mathbf{a}) V(\mathbf{b})=e^{i(\mathbf{a}\cdot\mathbf{b})}V(\mathbf{b}) U(\mathbf{a})~. \end{gathered} \ \ \ \ \ (4)

These are known as the Weyl form of the canonical commutation relations. (As noted in [1], there are some irregular representations in which this equivalence does not rigorously hold; but it’s solid for the standard Schrödinger representation implied above, so this will not concern us).

One nice feature of this language is that one can put position and momentum degrees of freedom on the same footing by introducing the composite Weyl operator

\displaystyle W(\mathbf{a},\mathbf{b})\equiv e^{i(\mathbf{a}\cdot\mathbf{b})/2}V(\mathbf{b})U(\mathbf{a})~, \ \ \ \ \ (5)

whereupon the Weyl form of the canonical commutation relations may be encapsulated in the multiplication rule

\displaystyle W(\mathbf{a},\mathbf{b})W(\mathbf{a}',\mathbf{b}')=e^{-i\sigma[(\mathbf{a},\mathbf{b}),(\mathbf{a}',\mathbf{b}')]/2}W(\mathbf{a}+\mathbf{a}',\mathbf{b}+\mathbf{b}')~, \ \ \ \ \ (6)

where {\sigma[(\mathbf{a},\mathbf{b}),(\mathbf{a}',\mathbf{b}')]\equiv\mathbf{a}'\cdot\mathbf{b}-\mathbf{a}\cdot\mathbf{b}'} is none other than the familiar symplectic form on {S}. For completeness, one further defines

\displaystyle W(\mathbf{a},\mathbf{b})^*\equiv W(-\mathbf{a},-\mathbf{b})= e^{-i(\mathbf{a}\cdot\mathbf{b})/2}U(-\mathbf{a})V(-\mathbf{b})~. \ \ \ \ \ (7)

The point is then that any representation of the Weyl operators {W(\mathbf{a},\mathbf{b})} on a Hilbert space {\mathcal{H}} (more on what this means below) gives rise to a representation of the Weyl form of the canonical commutation relations, and vice-versa.
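
For the record, the multiplication rule (6) follows in a few lines from the definition (5) and the Weyl relations (4), taken exactly as written above. The derivation below is my own filling-in of the intermediate steps: one commutes {U(\mathbf{a})} past {V(\mathbf{b}')} using the last relation in (4), merges the resulting products, and then re-expresses {V(\mathbf{b}+\mathbf{b}')U(\mathbf{a}+\mathbf{a}')} in terms of {W(\mathbf{a}+\mathbf{a}',\mathbf{b}+\mathbf{b}')}.

```latex
\begin{aligned}
W(\mathbf{a},\mathbf{b})W(\mathbf{a}',\mathbf{b}')
&=e^{i(\mathbf{a}\cdot\mathbf{b}+\mathbf{a}'\cdot\mathbf{b}')/2}\,
  V(\mathbf{b})\,U(\mathbf{a})V(\mathbf{b}')\,U(\mathbf{a}')\\
&=e^{i(\mathbf{a}\cdot\mathbf{b}+\mathbf{a}'\cdot\mathbf{b}')/2}\,
  e^{i\mathbf{a}\cdot\mathbf{b}'}\,
  V(\mathbf{b}+\mathbf{b}')\,U(\mathbf{a}+\mathbf{a}')\\
&=e^{i(\mathbf{a}\cdot\mathbf{b}+\mathbf{a}'\cdot\mathbf{b}')/2}\,
  e^{i\mathbf{a}\cdot\mathbf{b}'}\,
  e^{-i(\mathbf{a}+\mathbf{a}')\cdot(\mathbf{b}+\mathbf{b}')/2}\,
  W(\mathbf{a}+\mathbf{a}',\mathbf{b}+\mathbf{b}')\\
&=e^{-i(\mathbf{a}'\cdot\mathbf{b}-\mathbf{a}\cdot\mathbf{b}')/2}\,
  W(\mathbf{a}+\mathbf{a}',\mathbf{b}+\mathbf{b}')~.
\end{aligned}
```

The exponent in the last line is {-i\sigma/2} with {\sigma} as defined below (6), as claimed.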

We can now drop the restriction to finite-dimensional spaces, and let the phase space {S} be an arbitrary infinite-dimensional vector space equipped with a symplectic form. Then the family {\{W_\pi(f):f\in S\}} of unitary operators acting in some Hilbert space {\mathcal{H}_\pi} satisfies the Weyl relations iff

\displaystyle \begin{gathered} W_\pi(f)W_\pi(g)=e^{-i\sigma(f,g)/2}W_\pi(f+g)~,\qquad\forall f,g\in S~,\\ W_\pi(f)^*=W_\pi(-f)~,\qquad\forall f\in S~. \end{gathered} \ \ \ \ \ (8)

The (self-adjoint) observables of the system are then obtained by taking arbitrary linear combinations of Weyl operators.

In the above expressions, the subscript {\pi} denotes the representation. The idea behind a representation is to make an abstract algebra more concrete by representing the elements thereof as matrices; i.e., a representation reduces an abstract algebra to a linear algebra, which is often more practical to work with. For example, the Pauli matrices provide a convenient representation of the Lie algebra {\mathfrak{su}(2)}. In the present context, the representation determines the Hilbert space, i.e., the particle (and, arguably, physical) content of the theory itself. We will return to this important point momentarily, but first we must introduce a bit more machinery.

Trigger warning: this is about to become tedious, involving the introduction of various topologies and the like. I promise it’s important.

Let {\mathcal{F}} be the set of bounded operators (specifically, linear combinations of Weyl operators) acting on {\mathcal{H}_\pi}. One then says that a bounded operator {A} can be uniformly approximated by operators in {\mathcal{F}} iff

\displaystyle \forall\epsilon>0~,~\exists\tilde A\in\mathcal{F}\,:\; ||(A-\tilde A)x||<\epsilon\;\;\forall~\mathrm{unit}~\mathrm{vectors}~x\in\mathcal{H}_\pi~. \ \ \ \ \ (9)

(Note that this essentially imposes the uniform topology, in the sense of uniform convergence of {\tilde A} to {A}, hence the name). Now let {\mathcal{W}_\pi} (not to be confused with non-scripty {W_\pi}!) denote the set of all bounded operators on {\mathcal{H}_\pi} that can be uniformly approximated by elements in {\mathcal{F}}. {\mathcal{W}_\pi} is the {C^*}-algebra generated by the Weyl operators {\{W_\pi(f)\}}. It is important to note that this is only a subalgebra of the algebra of all bounded operators on {\mathcal{H}_\pi}, denoted {\mathcal{B}(\mathcal{H}_\pi)}; specifically, it is a subalgebra which is uniformly closed, and closed under adjoints {A\mapsto A^*}.

Now, suppose that we have two systems of Weyl operators generated by {\{W_\pi(f)\}} and {\{W_\phi(f)\}}, which act on {\mathcal{H}_\pi} and {\mathcal{H}_\phi}, respectively, and denote the associated {C^*}-algebras by {\mathcal{W}_\pi}, {\mathcal{W}_\phi}. A bijective mapping {\alpha:\mathcal{W}_\pi\rightarrow\mathcal{W}_\phi} is called a {*}-isomorphism iff {\alpha} is linear, multiplicative, and commutes with the adjoint operation. This sets the stage for the following important result:

Theorem 1 (Uniqueness theorem)
{\exists} a {*}-isomorphism {\alpha:\mathcal{W}_\pi\!\rightarrow\!\mathcal{W}_\phi} such that {\alpha\left(W_\pi(f)\right)\!=\!W_\phi(f)\;\forall f\in S}.

This means that the {C^*}-algebra generated by any system of Weyl operators is in fact representation-independent, and hence we may refer to this abstract algebra simply as the Weyl algebra {\mathcal{W}}. (Implicitly, we mean the Weyl algebra over {(S,\sigma)}, denoted {\mathcal{W}[S,\sigma]}, but we may suppress the arguments henceforth without confusion). Thus the problem of defining the Hilbert space amounts to choosing a representation {(\pi,\mathcal{H}_\pi)} of the Weyl algebra, i.e., the map {\pi:\mathcal{W}\rightarrow\mathcal{B}(\mathcal{H}_\pi)}. We will frequently denote this representation by {\pi(\mathcal{W})}.

In principle, since the Weyl algebra is representation-independent, one could refuse to choose a representation and instead proceed purely abstractly (e.g., defining states as positive normalized linear functionals on {\mathcal{W}}, describing dynamics in terms of a one-parameter group of automorphisms, etc). But representations are more powerful than mere convenience alone would suggest. In particular, the abstract Weyl algebra does not contain unbounded operators, many of which are of physical significance—for example, the total energy, the position & momentum observables in field theory, as well as the total number operator. However, via the introduction of the weak topology below, a representation can be used to extend the observables of the system beyond those contained in the abstract Weyl algebra itself.

Given {\mathcal{F}} as above, one says that a bounded operator {A} is weakly approximated by elements of {\mathcal{F}} iff

\displaystyle \forall\epsilon>0~\mathrm{and}~\forall x\in\mathcal{H}~,~\exists\tilde A\in\mathcal{F}~:~ \left|\langle x,Ax\rangle-\langle x,\tilde Ax\rangle\right|<\epsilon~. \ \ \ \ \ (10)

The important thing to note here is that unlike uniform approximation above, weak approximation requires one to select a representation (in order to evaluate the inner product), and hence has no abstract (representation-independent) counterpart. By von Neumann’s double commutant theorem, the set of bounded operators that can be weakly approximated by elements of {\pi(\mathcal{W})} is {\pi(\mathcal{W})''}, the von Neumann algebra generated by {\pi(\mathcal{W})}. Note that {\pi(\mathcal{W})\subseteq\pi(\mathcal{W})''}, since the latter is the weak closure of the former.

So far so good, but {\pi(\mathcal{W})''} still contains only bounded operators. The final step is to associate unbounded observables with {\pi(\mathcal{W})''} via their spectral projections. Namely, one says that an arbitrary (possibly unbounded) self-adjoint operator {A} on {\mathcal{H}_\pi} is affiliated with {\pi(\mathcal{W})''} iff all of {A}’s spectral projections lie in {\pi(\mathcal{W})''}. The reason we had to first extend to the von Neumann algebra instead of doing this with the representation {\pi(\mathcal{W})} itself is that {C^*}-algebras do not contain non-trivial projections of their self-adjoint members. In other words, if we want to include unbounded operators, we need to work with the weak closure of the {C^*}-algebra {\pi(\mathcal{W})} (hence my promise above that topology would be important).

Now here’s the kicker, which foreshadows the ontological question underlying this post:

Theorem 2 (Non-uniqueness theorem)
There exist representations {\pi,\phi} of {\mathcal{W}[S,\sigma]} for which there is no {*}-isomorphism {\alpha} from {\pi(\mathcal{W})''} to {\phi(\mathcal{W})''} such that {\alpha\left(W_\pi(f)\right)\!=\!W_\phi(f)\;\forall f\in S}.

Thus the price of extending the set of observables to those affiliated with the von Neumann algebra {\pi(\mathcal{W})''} (that is, including unbounded operators) is the loss of uniqueness. In particular, this occurs when {\pi} and {\phi} are disjoint representations, which therefore leads to physically inequivalent Hilbert spaces! This is precisely what happens in the Minkowski vs. Rindler vacua.

In discussing the conceptual significance of “physically inequivalent” representations, it is necessary to distinguish various mathematical notions of equivalence. This will enable us to define the notion of disjoint representations, the importance of which should be obvious from the title of this post. First however, we must introduce two related concepts: irreducibility and factoriality.

A representation {\pi(\mathcal{W})} is irreducible iff no non-trivial subspace of {\mathcal{H}_\pi} is invariant under the action of all operators in {\pi(\mathcal{W})}. Since an invariant subspace exists iff the projection onto it commutes with {\pi(\mathcal{W})}, irreducibility implies {\pi(\mathcal{W})''=\mathcal{B}(\mathcal{H}_\pi)}. A representation {\phi(\mathcal{W})} is factorial iff the associated von Neumann algebra is a factor, meaning it has trivial center—that is, the only operators in {\phi(\mathcal{W})''} which commute with all other operators are proportional to the identity. (Incidentally, note that this supports the familiar QFT notion that the only non-trivial operator that commutes with all local operators is the identity). Furthermore, since {\mathcal{B}(\mathcal{H}_\pi)} is a factor, the fact that {\pi} is irreducible implies that it is also factorial.

We may now proceed to introduce the following sequence of equivalences: unitarily equivalent {\implies} quasi-equivalent {\implies} weakly equivalent. Two representations {\pi} and {\phi} are unitarily equivalent iff there exists a unitary operator {U} that maps {\mathcal{H}_\pi} isometrically onto {\mathcal{H}_\phi}, such that

\displaystyle U\phi(A)U^{-1}=\pi(A)\;\;\forall A\in\mathcal{W}~. \ \ \ \ \ (11)

The slightly weaker notion of quasi-equivalence is most concisely stated as the existence of a {*}-isomorphism {\alpha} from {\phi(\mathcal{W})''} onto {\pi(\mathcal{W})''} such that {\alpha\left(\phi(A)\right)=\pi(A)\;\forall A\in\mathcal{W}} (cf. Theorem 2). Unitary equivalence is then simply the special case in which {\alpha} is implemented by a unitary operator. If both representations are irreducible, then quasi-equivalence also implies unitary equivalence. At the opposite extreme are disjoint representations: in particular, two factorial representations that are not quasi-equivalent are disjoint.

Before proceeding to weak equivalence, it is helpful to recast the above in terms of states. Abstractly, a state of a {C^*}-algebra {\pi(\mathcal{W})} is simply a positive normalized linear functional {\omega}. It turns out that some (but not all!) of these abstract states correspond to the familiar density operators from quantum theory; we denote these so-called normal states by

\displaystyle \omega_\rho(A)\equiv\mathrm{tr}\left(\rho A\right)~,\;\;\forall A\in\pi(\mathcal{W})~. \ \ \ \ \ (12)

The subset of normal states is called the folium of the representation {\pi}, and is denoted {\frak{F}(\pi)}; i.e., {\omega\in\frak{F}(\pi)} iff there exists a density operator {\rho} acting on {\mathcal{H}_\pi} such that

\displaystyle \omega(A)=\mathrm{tr}\!\left(\rho\,\pi(A)\right)~,\;\;\forall A\in\mathcal{W}~. \ \ \ \ \ (13)

The two forms of equivalence introduced above can then be restated more intuitively as follows: {\pi} and {\phi} are quasi-equivalent iff {\frak{F}(\pi)=\frak{F}(\phi)}, and disjoint iff {\frak{F}(\pi)\cap\frak{F}(\phi)=\emptyset}. That is, two representations are disjoint iff they have no normal states in common (in which case, all normal states in one are orthogonal to those in the other).

Now, given the folium {\frak{F}(\pi)} of {\pi(\mathcal{W})}, one says that an abstract state {\omega} in {\mathcal{W}} can be weak{^*} approximated by states in {\frak{F}(\pi)} iff

\displaystyle \forall\epsilon>0~\mathrm{and}~\forall\{A_i\in\mathcal{W}:i=1,\ldots,n\}~,~\exists\omega'\in\frak{F}(\pi)\,:\, |\omega(A_i)-\omega'(A_i)|<\epsilon~,\;\;i=1,\ldots,n~. \ \ \ \ \ (14)

If all states in {\frak{F}(\pi)} can be weak{^*} approximated by states in {\frak{F}(\phi)} (note that this implies the converse), then {\pi} and {\phi} are weakly equivalent.

We are now prepared to state the following important results:

Theorem 3 (Stone-von Neumann uniqueness theorem)
When {S} is finite-dimensional, every regular representation of the Weyl algebra {\mathcal{W}[S,\sigma]} is quasi-equivalent to the Schrödinger representation.

A regular representation {\pi} is one in which the map {t\in\mathbb{R}\mapsto\pi\left(W(tf)\right)\;\forall f\in S} is weakly continuous, which by Stone’s theorem (not to be confused with the above) guarantees the existence of unbounded self-adjoint operators {\{\Phi(f):f\in S\}} on {\mathcal{H}_\pi} with {\pi\left(W(tf)\right)=e^{i\Phi(f)t}}, which are in turn affiliated with {\pi(\mathcal{W})''}. This is the mechanism by which one recovers the usual Schrödinger representation of canonical position and momentum operators on Hilbert space. The Stone-von Neumann theorem is basically the statement that this process is unique: thanks to quasi-equivalence, any classical theory with finitely many degrees of freedom will yield the same quantum mechanical theory. Note the crucial qualifier “finite”: this theorem does not hold in quantum field theory (where {S} is infinite-dimensional). In other words, the failure of the Stone-von Neumann theorem in QFT is what allows one to have disjoint representations, and hence opens the door to the ontological puzzle of inequivalent field quantizations.
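The mechanism in Stone's theorem can be checked numerically in a finite-dimensional toy setting. The sketch below is only an analogy under stated assumptions: the generator `Phi` is a random 4×4 Hermitian matrix (so it is bounded, and none of the domain subtleties of the genuine unbounded {\Phi(f)} arise), and the names `U` and `Phi_recovered` are hypothetical. It verifies that {U(t)=e^{i\Phi t}} is a one-parameter unitary group, and that the generator is recovered by differentiating at {t=0}.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
Phi = (X + X.conj().T) / 2                # Hermitian "field operator" (toy assumption)

evals, evecs = np.linalg.eigh(Phi)

def U(t):
    """U(t) = exp(i * Phi * t), built from the spectral decomposition of Phi."""
    return (evecs * np.exp(1j * evals * t)) @ evecs.conj().T

s, t = 0.3, 1.1
group_law_ok = np.allclose(U(s) @ U(t), U(s + t))
unitary_ok = np.allclose(U(t).conj().T @ U(t), np.eye(4))

# Recover the generator from the group: Phi = -i dU/dt at t = 0 (central difference).
h = 1e-4
Phi_recovered = (U(h) - U(-h)) / (2j * h)
generator_ok = np.allclose(Phi_recovered, Phi, atol=1e-5)
print(group_law_ok, unitary_ok, generator_ok)
```

In the infinite-dimensional case the derivative only exists on a dense domain, which is exactly why regularity (weak continuity in {t}) has to be assumed rather than obtained for free.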

However, this isn’t to say that disjoint representations are entirely incompatible, as indicated by the following theorem:

Theorem 4 (Fell’s theorem)
Let {\pi} be a faithful representation of a {C^*}-algebra {\mathcal{A}}. Then every abstract state of {\mathcal{A}} can be weak{^*} approximated by states in {\frak{F}(\pi)}.

In other words, all representations of {\mathcal{W}} are at least weakly equivalent (the Weyl algebra is simple, so all of its representations are faithful). Finally, we state the GNS theorem as the grand conclusion to this post:

Theorem 5 (Gelfand-Naimark-Segal theorem)
Any abstract state {\omega} of a {C^*}-algebra {\mathcal{A}} admits a unique (up to unitary equivalence) representation {\left(\pi_\omega,\mathcal{H}_\omega\right)} and a vector {\Omega_\omega\in\mathcal{H}_\omega} such that

\displaystyle \omega(A)=\langle\Omega_\omega,\pi_\omega(A)\Omega_\omega\rangle~,\;\;\forall A\in\mathcal{A}~, \ \ \ \ \ (15)

and the set {\{\pi_\omega(A)\Omega_\omega:A\in\mathcal{A}\}} is dense in {\mathcal{H}_\omega}. Furthermore, {\pi_\omega} is irreducible iff {\omega} is pure.

The triple {\left(\pi_\omega,\mathcal{H}_\omega,\Omega_\omega\right)} is referred to as the GNS representation of {\mathcal{A}} induced by {\omega}, and {\Omega_\omega} is a cyclic vector for this representation. (Note that {\Omega_\omega} is therefore separating for the commutant {\pi_\omega(\mathcal{A})'}, and furnishes a vector representative of the state {\omega}, e.g., the vacuum). One consequence of this theorem is that, even if {\omega\notin\frak{F}(\pi)}, there always exists some representation {\phi} with {\omega\in\frak{F}(\phi)}; i.e., every state is normal in some representation.
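For intuition, the GNS construction can be carried out explicitly when {\mathcal{A}} is the algebra of {n\times n} matrices and {\omega(A)=\mathrm{tr}(\rho A)}. The following is a minimal sketch under those assumptions (the function names `gns` and `cyclic_dim` are hypothetical): the Hilbert space is the space of {n\times n} matrices with the Hilbert-Schmidt inner product, {\pi_\omega(A)} acts by left multiplication, and {\Omega_\omega=\rho^{1/2}}. It checks (15), and compares the dimension of the cyclic subspace for a pure and a mixed state.

```python
import numpy as np

def gns(rho):
    """GNS data for omega(A) = tr(rho A) on the n x n matrix algebra."""
    n = rho.shape[0]
    evals, evecs = np.linalg.eigh(rho)
    sqrt_rho = (evecs * np.sqrt(np.clip(evals, 0.0, None))) @ evecs.conj().T
    Omega = sqrt_rho.reshape(-1)          # row-major vectorization of rho^{1/2}

    def pi(A):
        # Left multiplication X -> A X; with row-major vec, vec(A X) = (A kron 1) vec(X).
        return np.kron(A, np.eye(n))

    return pi, Omega

def cyclic_dim(rho):
    """Dimension of span{pi(A) Omega}, with A running over the matrix units E_ij."""
    n = rho.shape[0]
    pi, Omega = gns(rho)
    vecs = []
    for i in range(n):
        for j in range(n):
            E = np.zeros((n, n))
            E[i, j] = 1.0
            vecs.append(pi(E) @ Omega)
    return int(np.linalg.matrix_rank(np.array(vecs), tol=1e-10))

rho_pure = np.array([[1.0, 0.0], [0.0, 0.0]])
rho_mixed = np.diag([0.7, 0.3])
A = np.array([[1.0, 2.0 - 1.0j], [0.5j, -1.0]])

pi, Omega = gns(rho_mixed)
lhs = np.vdot(Omega, pi(A) @ Omega)       # <Omega, pi(A) Omega>
rhs = np.trace(rho_mixed @ A)             # omega(A)
print(np.isclose(lhs, rhs), cyclic_dim(rho_pure), cyclic_dim(rho_mixed))
```

For the pure state the cyclic subspace has dimension {n=2}, on which the matrix algebra acts irreducibly; for the full-rank mixed state it has dimension {n^2=4}, and the representation is reducible (right multiplication commutes with it), in line with the last statement of the theorem.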

Thus, the precise statement of inequivalence of the Minkowski and Rindler vacua is that these induce disjoint GNS representations for the Weyl algebra. To see this in detail, let alone address the physical implications, is the subject of another post.


  1. R. Clifton and H. Halvorson, “Are Rindler quanta real? Inequivalent particle concepts in quantum field theory.” Brit. J. Phil. Sci. 52 (2001) 417-470, arXiv:quant-ph/0008030.
  2. H. Halvorson and M. Mueger, “Algebraic Quantum Field Theory.” arXiv:math-ph/0602036.

The Reeh-Schlieder Theorem

The Reeh-Schlieder theorem is perhaps the most notorious result in algebraic quantum field theory, simultaneously one of the least intuitive and most fundamental.

Denote by {\mathcal{H}_0} the vacuum sector of Hilbert space, which consists of all states that can be created from the vacuum {\Omega} by local field operators. Note that {\mathcal{H}_0} is not necessarily the full Hilbert space {\mathcal{H}}, which can generically contain other superselection sectors. By “local”, we mean that a field operator {\phi\left( x^\mu\right)} is smeared against some smooth function {f} with finite spacetime support to create the smeared operator {\phi_f=\int\mathrm{d}^{D}x f(x)\phi(x)}, where {D} is the spacetime dimension and {x=(t,\mathbf{x})}. The smearing ensures that the states have finite norm and are thus well-defined members of Hilbert space. Hence states of the form

\displaystyle |\psi_\mathbf{f}\rangle=\phi_{f_1}\ldots\phi_{f_n}|\Omega\rangle \ \ \ \ \ (1)

are sufficient to generate {\mathcal{H}_0}, i.e., any state in {\mathcal{H}_0} can be approximated arbitrarily well by linear combinations of {\psi_\mathbf{f}}. This defines the vacuum sector of the theory.

Classically, one formulates the initial data for a theory on a Cauchy (i.e., complete spacelike) hypersurface {\Sigma}. Quantum mechanically, this is the “time-slice axiom” of QFT [1], and encodes the physical expectation that there exists a dynamical law that enables one to compute fields at arbitrary time given the set of fields at some time slice (e.g., if {\Sigma} is taken to be the surface {t=0}, one could take the functions {f} to have support in some open neighborhood {\mathcal{U}} of {\Sigma}, say {|t|<\epsilon} for some small positive {\epsilon\in\mathbb{R}}). (As a side-comment for holography, note that there are no Cauchy slices in AdS, since data can always sneak in from infinity! This follows from the fact that the boundary is timelike rather than null or spacelike. Hence while any spacelike slice does cover the whole space, there nevertheless exist null geodesics from timelike infinity that do not intersect any point thereupon. However, the Cauchy problem can still be made well-posed within the Einstein static universe, which only covers half the spacetime).

Now consider the vacuum sector on {\Sigma}, that is, the space of states (1) with the support of {f_i} in {\mathcal{U}}. The Reeh-Schlieder theorem (RS) is the remarkable statement that even if we restrict to an arbitrarily small open set {\sigma\subset\Sigma}, and the support of the functions {f_i} to the corresponding neighborhood {\mathcal{U}_\sigma} of {\sigma} in spacetime, the states {\psi_\mathbf{f}} still suffice to generate the vacuum sector {\mathcal{H}_0} of the full theory! This is often phrased as the statement that by acting on the vacuum with operators localized within some small region — say, the room in which you’re reading this — we can create the Moon!

Witten’s proof [2] of this is by contradiction. In particular, if RS were false, then there would exist a state {|\chi\rangle} orthogonal to all {|\psi_\mathbf{f}\rangle} with all {f_i} supported in {\mathcal{U}_\sigma},

\displaystyle \langle\chi|\psi_\mathbf{f}\rangle=0~. \ \ \ \ \ (2)

This equality holds for all functions {f_i} iff it holds without smearing, i.e.,

\displaystyle \langle\chi|\phi(x_1)\ldots\phi(x_n)|\Omega\rangle=0~,\qquad\forall x_i\in\mathcal{U}_\sigma~. \ \ \ \ \ (3)

While this second formula is simpler to deal with, the matrix element of a product of such exactly local fields has singularities given by a function of the {x_i}, and hence it must be rigorously interpreted as a distribution as in (2). We will not bother to restate Witten’s elegant proof here; it can be found in section 2.2 of [2]. The basic idea is to show that if (3) holds for all {x_i\in\mathcal{U}_\sigma}, then it actually holds for all {x_i} in Minkowski spacetime {M_D}. But this implies that {\chi} must vanish, by the definition of the vacuum sector (i.e., the states {\psi_\mathbf{f}} are dense in {\mathcal{H}_0}). Hence only the zero vector is orthogonal to all states created from the vacuum by local operators supported in {\mathcal{U}_\sigma}; i.e., such states are also dense in {\mathcal{H}_0}.

A relevant question is to what extent this relies on the full spacetime being Minkowski, as opposed to some more general spacetime (e.g., AdS). Witten’s proof assumes the existence of a Hamiltonian which annihilates the vacuum state, {H|\Omega\rangle=0}, and is bounded from below by 0 (for holomorphicity arguments). (In fact this first assumption can be weakened: one needs only an energy-momentum operator {P^\mu} such that {\exp\{ic\cdot P\}} is a bounded operator on Hilbert space, where {c} is a general {D}-vector, so that {\exp\{ic\cdot P\}|\Omega\rangle} varies holomorphically with the components {c_i}). One then uses the fact that any point in {M_D} can be reached by zig-zagging back and forth in different timelike directions, and hence the points {x_i} can be shifted anywhere outside {\mathcal{U}_\sigma}. Thus the problem in extending RS to arbitrary globally hyperbolic{^{**}} spacetimes {M} is two-fold: in curved spacetimes, there is no natural analog of the vacuum state, and no natural translation generators {P^\mu}. Nonetheless one expects that an analog of RS should apply; see for example Ian Morrison’s proposed adaptation to AdS [3].

{{}^{**}}Note the restriction to globally hyperbolic spacetimes is intended to ensure we can still define a Cauchy surface. As mentioned above, this doesn’t technically hold in AdS, but the spirit of this requirement — namely, ensuring causality, and avoiding pathologies like closed timelike curves and naked singularities — survives intact, so we’ll follow the masses and consider AdS to be an honorary member of this family.

The Reeh-Schlieder theorem appears disturbing, and has deep consequences for the notion of locality in field theory (more on this later). However, there’s an important caveat involving unitarity here, which we can illustrate following the example from Witten [2]. Consider {\varsigma\subset\Sigma} to be the region of spacetime, spacelike separated from {\sigma}, in which we wish to create the Moon. Define the operator {\mathcal{M}\in\mathcal{U}_\varsigma} to have expectation value 0 in states which do not contain the Moon in region {\varsigma}, and 1 in states that do. Hence

\displaystyle \langle\Omega|\mathcal{M}|\Omega\rangle=0~. \ \ \ \ \ (4)

However, RS implies that states in {\mathcal{U}_\sigma} are dense in {\mathcal{H}_0}, i.e., there exists an operator {a\in\mathcal{U}_\sigma} such that {a\Omega} approximates the state in which {\mathcal{U}_\varsigma} contains the Moon arbitrarily well,

\displaystyle \langle a\Omega|\mathcal{M}|a\Omega\rangle= \langle\Omega|a^\dagger\mathcal{M} a|\Omega\rangle=1~. \ \ \ \ \ (5)

Now, since {a^\dagger\in\mathcal{U}_\sigma}, and {\mathcal{M}} is supported in the spacelike separated region {\mathcal{U}_\varsigma}, these operators commute, and hence (5) becomes

\displaystyle \langle\Omega|\mathcal{M} a^\dagger a|\Omega\rangle=1~. \ \ \ \ \ (6)

If {a} were unitary, then the fact that {a^\dagger a=1} would imply a contradiction between (4) and (6). But RS does not guarantee the existence of a unitary operator in {\mathcal{U}_\sigma} that will create the Moon in the (in principle arbitrarily distant) region {\mathcal{U}_\varsigma}, merely that there exists some operator that will do this. And indeed, we conclude from the above that it is not possible to perform a unitary transformation with support in {\mathcal{U}_\sigma} that effects any change in observables in a spacelike separated region {\mathcal{U}_\varsigma}, since such operators would satisfy {\langle a\Omega|\mathcal{M}|a\Omega\rangle=\langle\Omega|\mathcal{M}|\Omega\rangle}. Note that this example highlights an important relationship between causality and unitarity; we shall comment on this again below.

So, it is not possible for a physical (read: unitary) operation to affect measurements in spacelike separated regions. Rather, the takeaway message of RS is that there are correlations in the vacuum between spacelike separated operators, even in free field theory. In the example above, these manifest in the fact that

\displaystyle \langle\Omega|\mathcal{M} a^\dagger a|\Omega\rangle\neq \langle\Omega|\mathcal{M}|\Omega\rangle\langle\Omega|a^\dagger a|\Omega\rangle~, \ \ \ \ \ (7)

which follows from (4) and (6).
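The logic of the Moon example can be mimicked with two qubits, which may help to see where unitarity enters. This is only a toy sketch under assumptions that are not in Witten's argument: the "vacuum" is replaced by the entangled state {\cos\theta|00\rangle+\sin\theta|11\rangle}, the first qubit plays the role of {\mathcal{U}_\sigma}, the second that of {\mathcal{U}_\varsigma}, and the "Moon" operator is the projector {|1\rangle\langle1|} on the second qubit. In finite dimensions the vacuum expectation value in (4) is then small ({\sin^2\theta}) rather than exactly zero.

```python
import numpy as np

theta = 0.1
I2 = np.eye(2)

# "Vacuum": coefficient matrix C[i, j] of |i>_A |j>_B, flattened row-major.
C = np.diag([np.cos(theta), np.sin(theta)])
omega = C.reshape(4)

# "Moon" projector, acting only on the distant qubit B.
M = np.kron(I2, np.diag([0.0, 1.0]))

def expval(op, psi):
    """Normalized expectation value <psi|op|psi> / <psi|psi>."""
    return float(np.real(np.vdot(psi, op @ psi) / np.vdot(psi, psi)))

# Target state |0>_A |1>_B (B certainly contains the "Moon").
T = np.array([[0.0, 1.0], [0.0, 0.0]])
# (a kron 1) acting on omega has coefficient matrix a @ C, so solve a @ C = T.
a = T @ np.linalg.inv(C)
moon_state = np.kron(a, I2) @ omega

# A local unitary on A (rotation by an arbitrary angle), for comparison.
c, s = np.cos(0.7), np.sin(0.7)
u = np.array([[c, -s], [s, c]])

m_vac = expval(M, omega)                          # sin(theta)**2, small
m_moon = expval(M, moon_state)                    # 1: the Moon has been created
m_unitary = expval(M, np.kron(u, I2) @ omega)     # unchanged by local unitaries

ada = np.kron(a.conj().T @ a, I2)
lhs = float(np.real(np.vdot(omega, M @ ada @ omega)))        # analog of (6)
rhs = m_vac * float(np.real(np.vdot(omega, ada @ omega)))    # product of expectations
print(m_vac, m_moon, m_unitary, lhs, rhs)
```

The operator `a` that does the job is far from unitary (its nonzero entry grows like {1/\sin\theta}), no local unitary changes the expectation value of {\mathcal{M}}, and the failure of factorization in (7) shows up as `lhs` being 1 while `rhs` is {\sin^2\theta}.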

Of course, it’s common knowledge that there are no diffeomorphism-invariant local operators in quantum gravity, and even in gauge theory one already runs into trouble trying to define local operators or factorizing the Hilbert space. But RS highlights the fact that, even in free field theory, Hilbert space still doesn’t factorize! This is why the entanglement entropy of a subregion in free field theory is both infinite and universal (that is, the UV divergence is the same in any state as it is in vacuum): the divergence is not a property of any particular state, but of the fact that {\mathcal{H}\neq\mathcal{H}_\mathcal{U}\otimes\mathcal{H}_{\bar{\mathcal{U}}}}.

Note also that the above example highlights the subtle yet crucial distinction between the related concepts of locality and causality in quantum field theory. The latter is defined as the vanishing of commutators outside the lightcone, and remains intact insofar as we cannot alter the state in a spacelike separated region by acting with any physical operator. But the concept of locality is badly broken: we can localize free-field operators with appropriately chosen smearing functions, but vacuum correlations imply that the corresponding states are only local in an approximate sense.

Thus one must exercise care in speaking about the localization of states, as distinct from operators, in this context. If by “localized state” one means a state created by operators whose support is restricted to some finite subregion {\mathcal{U}_\sigma}, then RS clearly renders this concept untenable, since we can equally well create this state by acting with operators in {\mathcal{U}_\varsigma} instead (at the cost of unitarity). Similarly, asserting that a localized state is one whose expectation value is non-zero only within some finite region {\sigma} for all operators is equivalent to the claim that {\langle\psi|\!\mathcal{O}\!|\psi\rangle=0\;\;\forall\mathcal{O}\in\mathcal{U}_\varsigma}. But this contradicts the theorem, since we can create any state in {\sigma} by acting in the spacelike separated region {\varsigma}, and thereby reproduce these non-zero expectation values beyond the circumscribed region. For example, let {\psi} be a “localized state” with non-zero expectation value for operators in {\mathcal{U}_\sigma},

\displaystyle \langle\psi|\mathcal{O}_\sigma|\psi\rangle\neq0~, \ \ \ \ \ (8)

but vanishing expectation value for all operators in the spacelike separated region {\mathcal{U}_\varsigma},

\displaystyle \langle\psi|\mathcal{O}_\varsigma|\psi\rangle=0~. \ \ \ \ \ (9)

However, RS implies that operators in {\mathcal{U}_\varsigma} suffice to generate the full (vacuum sector of the) Hilbert space of the theory. Hence there exists an operator {b\in\mathcal{U}_\varsigma} such that the state {b\psi} approximates the state with non-vanishing expectation value arbitrarily well, that is:

\displaystyle \langle b\psi|\mathcal{O}_\sigma|b\psi\rangle= \langle\psi|\mathcal{O}_\sigma b^\dagger b|\psi\rangle= \langle\psi|\mathcal{O}_b|\psi\rangle \neq0~, \ \ \ \ \ (10)

where the operator {\mathcal{O}_b\equiv\mathcal{O}_\sigma b^\dagger b} does not belong to {\mathcal{U}_\sigma}, which contradicts the assertion in (9).

The caveat about unitarity notwithstanding, RS has some interesting implications. For example, the entanglement of the vacuum is what allowed [4] to map the support of precursor states to within a given boundary subregion (dual to a Rindler wedge in the bulk), and hence RS may have implications for bulk reconstruction in this context. It also sheds light on the problem of precursors in AdS/CFT, and for the encoding of bulk information within holographic shadows in general: namely, that this acausal information is encoded in non-unitary operators in the CFT. If correct, this implies that only a limited form of complete bulk reconstruction can ever succeed: RS seems to strengthen the claim that the CFT knows about the entire spacetime via entanglement, but any measurement a boundary observer performs will only be sensitive to timelike or null bulk data (since spacelike data is encoded in non-unitary operators, which are not observables). (EDIT 2021-07-19: I’ve learned a bit since writing this post; see [5], or my later post on the black hole interior for more).


[1] R. Haag, “Local Quantum Physics: Fields, Particles, Algebras.” 1992.

[2] E. Witten, “Notes on Some Entanglement Properties of Quantum Field Theory,” arXiv:1803.04993 [hep-th].

[3]  I. A. Morrison, “Boundary-to-bulk maps for AdS causal wedges and the Reeh-Schlieder property in holography,” JHEP 05 (2014) 053, arXiv:1403.3426 [hep-th].

[4]  B. Freivogel, R. Jefferson, and L. Kabir, “Precursors, Gauge Invariance, and Quantum Error Correction in AdS/CFT,” JHEP 04 (2016) 119, arXiv:1602.04811 [hep-th].

[5] R. Jefferson, “Comments on black hole interiors and modular inclusions,” SciPost Physics 6 no. 4, (Apr, 2019), arXiv:1811.08900 [hep-th].


General covariance, diffeomorphism invariance, and background independence

My attempts to understand the significance of diffeomorphism invariance in general relativity have been hampered by the confusion surrounding active vs. passive transformations, invariance vs. (general) covariance, background independence, etc. This post comprises my ambitious attempt to settle the matter once and for all.

Let’s start with invariance vs. covariance, which is relatively straightforward. A quantity is invariant under a transformation if it remains unchanged; that is, if {F} is a functional of the fields {\phi}, and we make the transformation {\phi\rightarrow\phi'}, then {F[\phi]=F[\phi']} means that {F} is invariant under this transformation. Effectively, invariant quantities transform as scalars.

Covariance is the invariance of the form of physical laws under a given transformation. General covariance is a slight extension of this, in which their form is invariant under arbitrary (differentiable) coordinate transformations. For example, the action of a real scalar field {\phi} is invariant under a Lorentz transformation, while the Klein-Gordon equation is Lorentz covariant (meaning that if {\phi} satisfies the equation of motion, then so will {\phi'}).

On to passive vs. active transformations. A passive transformation is merely a change of coordinates. In the case of the Lorentz group, which takes {x^\mu} to {x'^\mu=\Lambda^\mu_{~\nu}x^\nu}, we define {\phi'(x)=\phi(x')=\phi(\Lambda x)}. In other words, we think of {\phi} and {\phi'} as the same field configuration, such that the new function in the original coordinates is the same as the original function in the new coordinates.

Active transformations, though ostensibly more abstract, are formally easier to understand. Consider two manifolds {M} and {N}, respectively equipped with coordinate charts {x} and {x'}. Let {\phi: M\rightarrow\mathbb{R}}, and {\phi':N\rightarrow\mathbb{R}}. Now consider a diffeomorphism {f:M\rightarrow N}. Then the original field {\phi} is related to the transformed field {\phi'} via the pullback, {\phi(x)=(f^*\phi')(x)\equiv(\phi'\circ f)(x)}. In general of course, {\phi(x)} and {\phi'(x')} may map to different points in {\mathbb{R}}. But if we demand {\phi'(x')=\phi(x)}, one can see from this picture that this imposes that the new field configuration (on the new manifold) nonetheless maps to the same point in {\mathbb{R}}. (Incidentally, the fact that this image doesn’t work for “passive diffeomorphisms” leads me to think that they don’t exist, i.e., that a diffeomorphism is “active” by definition. In part for this reason, we shall henceforth take the unqualified “diffeomorphism” to mean an active diffeomorphism, and relegate “passive diffeomorphism” to “coordinate transformation”).

In short, {\phi'(x)=\phi(x')=\phi(\Lambda x)} is a passive (Lorentz) transformation, while {\phi'(x')=\phi(x)\implies\phi'(x)=\phi(\Lambda^{-1}x)} is an active (Lorentz) transformation. The former amounts to a mere coordinate redefinition, while the latter specifies an entirely new field configuration; this answer [1] on Stack Exchange contains a helpful illustration of the difference (see also [2]).
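The two conventions are easy to mix up, so here is a minimal numerical sketch of the difference. The setup is an assumption chosen purely for illustration: a boost with velocity {v=0.6} in {1\!+\!1} dimensions (in one sign convention), and a field configuration `phi` given by a bump centred on a point `x0` (the bump itself is not Lorentz invariant; it is just some configuration). All names are hypothetical.

```python
import numpy as np

v = 0.6
gamma = 1.0 / np.sqrt(1.0 - v**2)
Lam = np.array([[gamma, gamma * v],
                [gamma * v, gamma]])      # boost acting on x = (t, x)
Lam_inv = np.linalg.inv(Lam)

x0 = np.array([0.0, 1.0])                 # centre of the bump

def phi(x):
    """Original field configuration: a bump peaked at x0."""
    return float(np.exp(-np.sum((np.asarray(x) - x0) ** 2)))

def phi_passive(x):
    """Passive transformation: phi'(x) = phi(Lambda x)."""
    return phi(Lam @ x)

def phi_active(x):
    """Active transformation: phi'(x) = phi(Lambda^{-1} x)."""
    return phi(Lam_inv @ x)

print(phi_active(Lam @ x0))       # the bump has been carried to Lambda x0
print(phi_passive(Lam_inv @ x0))  # same bump, relabelled: peak sits at Lambda^{-1} x0
print(phi_active(x0))             # the actively transformed field no longer peaks at x0
```

In the active case the bump itself is moved to {\Lambda x_0}, i.e., {\phi'(x')=\phi(x)}; in the passive case nothing has moved, and only the label of the point at which the peak sits has changed.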

In practical terms, the distinction between passive and active transformations amounts to a choice of convention. But when discussing diffeomorphism invariance, general covariance, or background independence in the context of general relativity, the distinction is important, and the failure to accord it due care can lead to a great deal of confusion. In particular, the salient feature of (active) diffeomorphisms is that they generate new metrics, while (passive) coordinate transformations merely re-express the original metric in new terms. To highlight the difference, consider the wave equation in curved spacetime,

\displaystyle \left( g^{\mu\nu}\nabla_\mu\nabla_\nu+\xi R\right)\phi(x)=0~, \ \ \ \ \ (1)

where {\xi} is the coupling to the Ricci scalar {R} (minimal coupling corresponds to {\xi=0}). Obviously, a coordinate change will preserve solutions to this equation: they’ll simply be mapped from one coordinate system to another. It is therefore generally covariant. But it is not diffeomorphism invariant, since a diffeomorphism changes the metric {g}. In contrast, in Einstein’s equations

\displaystyle R_{\mu\nu}-\frac{1}{2}Rg_{\mu\nu}+\Lambda g_{\mu\nu}=\frac{8\pi G}{c^4}T_{\mu\nu} \ \ \ \ \ (2)

the metric is the very thing we’re solving for, so (2) is diffeomorphism invariant by construction.
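The statement that an (active) diffeomorphism generically produces a new metric can be illustrated with the pullback {(f^*g)_{\mu\nu}=J^\rho_{~\mu}J^\sigma_{~\nu}g_{\rho\sigma}}, where {J} is the Jacobian of {f}. The sketch below makes simplifying assumptions that are not in the text: the maps are linear (so {J} is a constant matrix), the spacetime is flat and {1\!+\!1}-dimensional, and the function name `pullback` is hypothetical. A boost is an isometry and returns the same metric, while a shear does not.

```python
import numpy as np

eta = np.diag([-1.0, 1.0])                # flat metric in 1+1 dimensions

def pullback(metric, jac):
    """Pullback of a constant metric under a linear map with Jacobian jac: J^T g J."""
    return jac.T @ metric @ jac

v = 0.6
gamma = 1.0 / np.sqrt(1.0 - v**2)
boost = np.array([[gamma, gamma * v],
                  [gamma * v, gamma]])    # an isometry of eta
shear = np.array([[1.0, 0.5],
                  [0.0, 1.0]])            # a diffeomorphism that is not an isometry

g_boost = pullback(eta, boost)
g_shear = pullback(eta, shear)
print(np.allclose(g_boost, eta))          # the boost leaves the metric unchanged
print(g_shear)                            # the shear yields a different metric
```

With a fixed background metric, only the isometries act as symmetries of an equation like (1); when the metric is itself a dynamical variable, as in (2), the new metric is simply another solution.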

When Einstein first introduced GR, he emphasized the background independence (“no prior geometry”) under the guise of general covariance. But as alluded above, all laws of physics, properly formulated, are generally covariant! Writing the wave equation in Cartesian or spherical coordinates does not alter its content. Thus to emphasize general covariance as the defining or special feature of GR is both misleading and rather void of content. (Misner, Thorne, & Wheeler’s classic textbook suggests that at the time, mathematics was not sufficiently advanced to properly distinguish background independence from coordinate independence, so Einstein’s choice of phrasing is only confusing in retrospect). Rather, the special feature of GR is that it is background independent: in contrast to the wave equation above, where the metric plays the role of a fixed background (spoiling diff invariance in the process), in Einstein’s equations the metric is a dynamical variable. This is what is meant by “no prior geometry”.

However, lest we be misled by the above example, it is important to emphasize that diffeomorphism invariance is not the same as background independence, even though the two appear hand-in-hand in Einstein’s equations. To illustrate this, compare the action for Maxwell’s electromagnetism,

\displaystyle S_{M}[A]=-\frac{1}{4}\int\mathrm{d}^4x\sqrt{-g}F_{\mu\nu}F^{\mu\nu}~, \ \ \ \ \ (3)

with the Einstein-Hilbert action

\displaystyle S_{EH}[g]=\frac{1}{16\pi G}\int\mathrm{d}^4x\sqrt{-g}R^{\mu\nu}g_{\mu\nu}~. \ \ \ \ \ (4)

(Example taken from [3]). Both of these depend on the metric tensor {g_{\mu\nu}(x)}, and are manifestly covariant (i.e., all metric indices are contracted). But in the former, the metric appears as a fixed background, and only the electromagnetic potential {A_\mu(x)} appears as an argument of the functional {S_M[A]}. In contrast, in the Einstein-Hilbert action {S_{EH}[g]}, the metric is a dynamical variable, and thus the background is a solution to the equations rather than something given externally at the outset. Thus diffeomorphism invariance simply means that the manifold on which the theory is formulated is irrelevant (modulo isomorphisms) to the underlying physics (or, to take the passive view, that we can choose any coordinate patch we like), while background independence is the stronger statement that the manifold itself is not fixed a priori. And this is what makes general relativity special.

P.s. Note that reference [2], which states that diffeomorphisms do not generally map geodesics to geodesics, is misleadingly phrased. A diffeomorphism {f:M\rightarrow M} will certainly map geodesics {\gamma} for some metric {g} on {M} to geodesics {f\circ\gamma} for the new metric {(f^{-1})^*g} (i.e., the pushforward of {g} under {f}). What the author means is that, except in the special case where the diffeomorphism is an isometry (i.e., {f^*g=g}; note isometry {\neq} isomorphism!), {f\circ\gamma} will not be a geodesic for the original metric {g}.


Decoherence with holography

I recently read an interesting paper [1] that uses holography to study decoherence in strongly-coupled systems. It relies on the fact that, in the case of a linear coupling between the subsystem and the environment, the Feynman-Vernon influence functional — which encapsulates the influence of the environment on the subsystem — can be viewed as the generating function for Schwinger-Keldysh propagators—that is, nonequilibrium Green’s functions. The latter are well-known objects of study in AdS/CFT, and can be computed on the gravity side for strongly-coupled field theories. Unfortunately, there’s a fatal flaw in the connection to holography that invalidates the specific application of this path-integral formalism in [1]. But the Feynman-Vernon/Schwinger-Keldysh approach itself is generally valid and particularly elegant. And in fact, there may be a means of salvaging the holographic hopes of [1]. Hence we shall proceed to discuss the analysis, and return to comment on the stumbling block at the end.

Let us very briefly summarize the notion of decoherence before we begin. The evolution of the entire system (i.e., universe) is of course unitary, but an initially pure subsystem can evolve to a mixed state via interaction with its complement, the environment. The effect of the latter is to select from the Hilbert space of the subsystem a basis of states which are most stable against further environmental perturbation. These are called pointer states, and correspond to the classical solutions in the limit {\hbar\rightarrow0}. The quantum coherence between these pointer states is encoded in the off-diagonal elements of the reduced density matrix of the subsystem, and thus, loosely speaking, the disappearance of these off-diagonal elements characterizes the decoherence process.

(As a technical footnote, the adjective “loosely” is related to the qualifier “most” above: as noted in [1], the disappearance of the off-diagonal elements is a basis-dependent statement. But since the pointer states are not exactly stable away from the classical limit, they are not invariant under the decoherence process. Thus one needs a more careful mathematical characterization of decoherence, but we can safely ignore this technicality for the moment).

The reduced density matrix for the subsystem is defined in the usual manner, namely by tracing out the degrees of freedom of the environment:

\displaystyle \rho_\mathrm{sys}(t)=\mathrm{tr}_\mathrm{env}\rho(t)~, \ \ \ \ \ (1)

where {\rho} is the density matrix of the total system, which evolves unitarily according to some Hamiltonian {H},

\displaystyle \rho(t)=e^{-iH(t-t_i)}\rho(t_i)e^{iH(t-t_i)}~. \ \ \ \ \ (2)

One can prepare the subsystem to be in an initially pure state, with

\displaystyle \rho(t_i)=\rho_\mathrm{sys}(t_i)\otimes\rho_\mathrm{env}(t_i)~, \ \ \ \ \ (3)

whereupon decoherence evolves {\rho_\mathrm{sys}} from a quantum to a classical state. This is the process we wish to study.
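To make the role of the partial trace in (1) concrete, here is a minimal Python sketch of a toy model that is not taken from [1]: a single system qubit prepared in an equal superposition, and a single environment qubit that is assumed to have been rotated by an angle theta only when the system is in its second basis state. The function names and the choice of interaction are assumptions made purely for illustration.

```python
import math

def entangled_state(theta):
    """Amplitudes psi[s][e] of (|0>|e_0> + |1>|e_theta>)/sqrt(2).

    Toy assumption: the interaction has rotated the environment qubit by
    theta, conditioned on the system qubit being in state |1>.
    """
    norm = 1.0 / math.sqrt(2.0)
    return [[norm, 0.0],
            [norm * math.cos(theta), norm * math.sin(theta)]]

def reduced_density_matrix(psi):
    """Trace out the environment: rho_sys[s][t] = sum_e psi[s][e] * conj(psi[t][e])."""
    size = len(psi)
    return [[sum(psi[s][e] * complex(psi[t][e]).conjugate()
                 for e in range(len(psi[s])))
             for t in range(size)] for s in range(size)]

def purity(rho):
    """tr(rho^2): equals 1 for a pure state and 1/2 for a maximally mixed qubit."""
    size = len(rho)
    return sum(rho[s][t] * rho[t][s] for s in range(size) for t in range(size)).real

if __name__ == "__main__":
    for theta in (0.0, math.pi / 4, math.pi / 2):
        rho = reduced_density_matrix(entangled_state(theta))
        print(f"theta={theta:.3f}  |off-diagonal|={abs(rho[0][1]):.3f}  purity={purity(rho):.3f}")
```

In this toy model the off-diagonal element of the reduced density matrix is cos(theta)/2 and the purity is (1+cos^2(theta))/2, so the coherence disappears precisely when the two environment states become orthogonal, even though the total state remains pure throughout.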

Since we shall work in the path-integral formalism, we need a Lagrangian, which we take to be of the form

\displaystyle \mathcal{L}[\phi,\chi]=\mathcal{L}_\mathrm{sys}[\phi]+\mathcal{L}_\mathrm{env}[\chi]+\mathcal{L}_\mathrm{int}[\phi,\chi]~, \ \ \ \ \ (4)

where {\phi} and {\chi} denote the degrees of freedom of the subsystem and the environment, respectively, which are coupled only through the interaction term {\mathcal{L}_\mathrm{int}}. The authors of [1] consider a strictly linear coupling of the form

\displaystyle \mathcal{L}_\mathrm{int}[\phi,\chi]=g\phi f[\chi]~, \ \ \ \ \ (5)

where {f[\chi]} is an arbitrary functional of the fields {\chi}, and {g} is a dimensionful coupling (in {(d\!+\!1)}-dimensional Euclidean space, {[\phi]=(d\!-\!1)/2}, so {[g]} will depend on the precise form of {f[\chi]}). Unfortunately, as we will see shortly, the linear form of the coupling is actually necessary in order to view the probe field as a source term in the influence functional. This will result in us having to put in certain other terms by hand. It is, as far as I know, an open question as to whether it’s possible to extend the formalism to allow more general couplings. However, despite this drawback, a powerful aspect of this approach is that the environment Lagrangian is entirely arbitrary (in principle, at least; practice is, as usual, another matter entirely). The idea is then that the subsystem itself can be used as a probe to study decoherence due to different environments. Accordingly, [1] considers the simple Lagrangian

\displaystyle \mathcal{L}_\mathrm{sys}[\phi]=-\frac{1}{2}\left(\partial_\mu\phi\right)^2-\frac{1}{2}\Omega^2\phi^2~. \ \ \ \ \ (6)

The aforementioned influence functional, which we will define precisely below, can then “be regarded as the probe’s effective action, which is obtained after the environmental degrees of freedom {\chi} are integrated out” [1]. This is the basic idea behind the Feynman-Vernon formalism.

To proceed, we must re-express the evolution equation (2) in the language of path integrals. The time-evolution operators {e^{\pm iHt}} then require that this integral is computed along a closed path, namely the Keldysh contour (basically, this amounts to a Lorentzian timefold in an otherwise Euclidean contour, thereby allowing one to insert unitary operators). For concreteness and convenience, [1] considers the thermostatic case {\rho_\mathrm{env}=e^{-\beta H_\mathrm{env}}}, so that Euclidean time runs from {t_i} to {t_i-i\beta}. Hence the contour first moves along the Lorentzian timefold from {t_i} to {t_f} and back, and then continues in Euclidean to {t_i-i\beta}.

With this picture in mind, we wish to compute the reduced density matrix {\rho_\mathrm{sys}} at {t_f}. Specifically, consider the amplitude

\displaystyle \langle\bar\phi_+|\rho_\mathrm{sys}(t_f)|\bar\phi_-\rangle =\int\mathrm{d}\bar\chi\langle\bar\phi_+\bar\chi|\rho(t_f)|\bar\phi_-\bar\chi\rangle~, \ \ \ \ \ (7)

where we have used (1), with the bar denoting the values of the fields at {t=t_f}. By inserting two resolutions of the identity, and using the evolution equation (2) with (3), this becomes

\displaystyle \begin{aligned} \langle\bar\phi_+|&\rho_\mathrm{sys}(t_f)|\bar\phi_-\rangle =\int\mathrm{d}\bar\chi\mathrm{d}\tilde\chi_+\mathrm{d}\tilde\chi_-\mathrm{d}\tilde\phi_+\mathrm{d}\tilde\phi_-\\ &\times\langle\bar\phi_+\bar\chi|e^{-iH(t_f-t_i)}|\tilde\phi_+\tilde\chi_+\rangle \langle\tilde\phi_+\tilde\chi_+|\rho_\mathrm{sys}(t_i)\otimes\rho_\mathrm{env}(t_i)|\tilde\phi_-\tilde\chi_-\rangle \langle\tilde\phi_-\tilde\chi_-|e^{iH(t_f-t_i)}|\bar\phi_-\bar\chi\rangle~. \end{aligned} \ \ \ \ \ (8)

The reason for rewriting it in this way is that we can now isolate the propagators corresponding to the two (forward and backward) legs of the Lorentzian timefold:

\displaystyle \begin{aligned} \langle\bar\phi_+\bar\chi|e^{-iH(t_f-t_i)}|\tilde\phi_+\tilde\chi_+\rangle &=\int_{\tilde\phi_+,\tilde\chi_+}^{\bar\phi_+,\bar\chi_+}\mathcal{D}\phi_+\mathcal{D}\chi_+\,e^{i\int_{t_i}^{t_f}\mathrm{d} t\mathcal{L}[\phi_+,\chi_+]}\,,\\ \langle\tilde\phi_-\tilde\chi_-|e^{iH(t_f-t_i)}|\bar\phi_-\bar\chi\rangle &=\int_{\tilde\phi_-,\tilde\chi_-}^{\bar\phi_-,\bar\chi_-}\mathcal{D}\phi_-\mathcal{D}\chi_-\,e^{-i\int_{t_i}^{t_f}\mathrm{d} t\mathcal{L}[\phi_-,\chi_-]}~. \end{aligned} \ \ \ \ \ (9)

Inserting these path-integral representations back into (8), and using the fact that the correlator {\langle\rho_\mathrm{sys}\otimes\rho_\mathrm{env}\rangle} factorizes, we have

\displaystyle \begin{aligned} \langle\bar\phi_+|&\rho_\mathrm{sys}(t_f)|\bar\phi_-\rangle =\int\mathrm{d}\bar\chi\mathrm{d}\tilde\chi_+\mathrm{d}\tilde\chi_-\mathrm{d}\tilde\phi_+\mathrm{d}\tilde\phi_- \int_{\tilde\phi_+,\tilde\chi_+}^{\bar\phi_+,\bar\chi_+}\mathcal{D}\phi_+\mathcal{D}\chi_+ \int_{\tilde\phi_-,\tilde\chi_-}^{\bar\phi_-,\bar\chi_-}\mathcal{D}\phi_-\mathcal{D}\chi_-\\ &\times e^{i\int_{t_i}^{t_f}\mathrm{d} t\left(\mathcal{L}[\phi_+,\chi_+]-\mathcal{L}[\phi_-,\chi_-]\right)} \langle\tilde\chi_+|\rho_\mathrm{env}(t_i)|\tilde\chi_-\rangle\langle\tilde\phi_+|\rho_\mathrm{sys}(t_i)|\tilde\phi_-\rangle~. \end{aligned} \ \ \ \ \ (10)

Now, as mentioned above, the central feature of the Feynman-Vernon formalism is that it packages all information about the effects of the environment into the so-called influence functional, which we define as

\displaystyle \begin{aligned} \mathcal{F}[\phi_+,\phi_-]\equiv &\int\mathrm{d}\bar\chi\mathrm{d}\tilde\chi_+\mathrm{d}\tilde\chi_-\int_{\tilde\chi_+,\tilde\chi_-}^{\bar\chi_+,\bar\chi_-}\mathcal{D}\chi_+\mathcal{D}\chi_-\\ &\times e^{i\int_{t_i}^{t_f}\mathrm{d} t\left(\mathcal{L}_\mathrm{env}[\chi_+]-\mathcal{L}_\mathrm{env}[\chi_-]+\mathcal{L}_\mathrm{int}[\phi_+,\chi_+]-\mathcal{L}_\mathrm{int}[\phi_-,\chi_-]\right)} \langle\tilde\chi_+|\rho_\mathrm{env}(t_i)|\tilde\chi_-\rangle~, \end{aligned} \ \ \ \ \ (11)

where we have used (4). This is then used to define the propagation function

\displaystyle J[\bar\phi_+,\bar\phi_-;t_f|\tilde \phi_+,\tilde\phi_-;t_i]\equiv\int_{\tilde\phi_+,\tilde\phi_-}^{\bar\phi_+,\bar\phi_-}\mathcal{D}\phi_+\mathcal{D}\phi_- \,e^{i\int_{t_i}^{t_f}\mathrm{d} t\left(\mathcal{L}_\mathrm{sys}[\phi_+]-\mathcal{L}_\mathrm{sys}[\phi_-]\right)}\mathcal{F}[\phi_+,\phi_-]~, \ \ \ \ \ (12)

which describes the evolution of the subsystem. With this in hand, we can neatly express the correlator (10) as

\displaystyle \langle\bar\phi_+|\rho_\mathrm{sys}(t_f)|\bar\phi_-\rangle= \int\mathrm{d}\tilde\phi_+\mathrm{d}\tilde\phi_- J[\bar\phi_+,\bar\phi_-;t_f|\tilde \phi_+,\tilde\phi_-;t_i] \langle\tilde\phi_+|\rho_\mathrm{sys}(t_i)|\tilde\phi_-\rangle~. \ \ \ \ \ (13)

Thus the main outstanding task is the computation of the influence functional (11). In general of course, this is prohibitively difficult; but here is where the linear form of the interaction (5) comes into play. First, observe that we can express the influence functional more compactly as an expectation value with respect to the initial density matrix of the environment, {\rho_\mathrm{env}(t_i)}:

\displaystyle \mathcal{F}[\phi_+,\phi_-]=\langle\mathcal{T}_\mathcal{K}e^{i\int_\mathcal{K}\mathcal{L}_\mathrm{int}[\phi,\chi]}\rangle_\mathrm{env} =\langle\mathcal{T}_\mathcal{K}e^{ig\int_\mathcal{K}\phi f[\chi]}\rangle_\mathrm{env}~, \ \ \ \ \ (14)

where {\mathcal{T}_\mathcal{K}} is the path-ordering symbol with respect to the Keldysh contour, and

\displaystyle \langle\ldots\rangle_\mathrm{env}\equiv\mathrm{tr}_\mathrm{env}\left[\rho_\mathrm{env}(t_i)\ldots\right]~. \ \ \ \ \ (15)

Note that, as foreshadowed above, this is tantamount to tracing out the environmental degrees of freedom {\chi}. Such a trace would normally be written

\displaystyle \mathrm{tr}_\mathrm{env}\left(\mathcal{O}\right)=\int\!\mathrm{d}\bar\chi\,\langle\bar\chi|\mathcal{O}|\bar\chi\rangle \ \ \ \ \ (16)

for some operator {\mathcal{O}}. In this case however, we have to take into account the Keldysh contour that runs forward to {t_f} and back. Thus, denoting fields on the forward and backward legs with the subscripts {+} and {-}, respectively, what we really want is something like {\langle\tilde\chi_+|\mathcal{O}|\tilde\chi_-\rangle}. The trace then instructs us to integrate over all possible paths that interpolate between the two, subject to the boundary condition that {\bar\chi_+=\bar\chi_-=\bar\chi} at {t=t_f}; hence

\displaystyle \mathrm{tr}_\mathrm{env}\left(\mathcal{O}\right)=\int\!\mathrm{d}\bar\chi \mathrm{d}\tilde\chi_+\mathrm{d}\tilde\chi_-\int_{\tilde\chi_+,\tilde\chi_-}^{\bar\chi_+,\bar\chi_-}\mathcal{D}\chi_+\mathcal{D}\chi_- \langle\tilde\chi_+|\mathcal{O}|\tilde\chi_-\rangle~, \ \ \ \ \ (17)

which provides the link to (11). Now, given the form (14), we can view the probe field {\phi} as a source for the environmental fields {\chi}, whereupon this expression is precisely the generating function for the Schwinger-Keldysh propagator (i.e., real-time finite-temperature Green’s functions), defined as

\displaystyle G_{ss'}(x_1,x_2)\equiv-i\langle\mathcal{T}_\mathcal{K}\mathcal{O}_s(x_1)\mathcal{O}_{s'}(x_2)\rangle_\mathrm{env} =-i\frac{\delta^2\ln\mathcal{F}[\phi_+,\phi_-]}{\delta\phi_s(x_1)\delta\phi_{s'}(x_2)}\bigg|_{\phi=0}~, \ \ \ \ \ (18)

where {s=\pm}.

As an aside, note that the natural log appears because the Schwinger-Keldysh Green’s function is given by the connected 2-point function. That is, recall that given the partition function {Z[J]} (where we’re temporarily reverting to standard QFT notation in which {J} denotes the source), which generates both connected and disconnected Feynman diagrams via

\displaystyle \langle\mathcal{O}(x_1)\ldots\mathcal{O}(x_n)\rangle=\frac{1}{i^n}\frac{\delta^nZ[J]}{\delta J(x_1)\ldots\delta J(x_n)}\bigg|_{J=0}~, \ \ \ \ \ (19)

the generating functional {W[J]} for the connected {n}-point function is related (in the present, Euclidean convention) by

\displaystyle Z[J]=e^{-W[J]}\;\;\implies\;\;W[J]=-\ln Z[J]~. \ \ \ \ \ (20)

When working with the effective action instead of the action, only connected diagrams contribute—hence one uses {W} rather than {Z}, and directly computes the connected {n}-point function via

\displaystyle \langle\mathcal{O}(x_1)\ldots\mathcal{O}(x_n)\rangle_c=\frac{1}{i^n}\frac{\delta^nW[J]}{\delta J(x_1)\ldots\delta J(x_n)}\bigg|_{J=0}~. \ \ \ \ \ (21)

In the present case, since we’re after the Green’s function, we set {n=2} and multiply by {-i} to fix conventions:

\displaystyle G(x_1,x_2)=-i\langle\mathcal{O}(x_1)\mathcal{O}(x_2)\rangle_c=-i\frac{1}{i^2}\frac{\delta^2W[J]}{\delta J(x_1)\delta J(x_2)}\bigg|_{J=0} =-i\frac{\delta^2\ln Z[J]}{\delta J(x_1)\delta J(x_2)}\bigg|_{J=0}~, \ \ \ \ \ (22)

which gives (18) upon identifying {Z[J]} with {\mathcal{F}[\phi]}. I believe this is what the authors mean when they say that the influence functional may be regarded as an effective action for the system (having integrated out the environment degrees of freedom), though strictly speaking this is misleading: first, because obviously partition functions are not actions, and second, because the effective action is given by {W=-\ln\mathcal{F}} rather than {\mathcal{F}} itself.
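The distinction between {Z} and {\ln Z} can be checked in a zero-dimensional toy model. The following Python sketch is only an illustration under stated assumptions: the “field” is a single random variable with a made-up discrete distribution, the source is real (so none of the factors of {i} or the sign in {W=-\ln Z} from the conventions above appear), and derivatives are taken by finite differences. All names are hypothetical.

```python
import math

# Hypothetical toy data: values of the "field" and their probabilities.
OUTCOMES = [0.0, 1.0, 3.0]
WEIGHTS = [0.5, 0.3, 0.2]

def partition_function(source):
    """Toy Z[J]: the average of exp(J * x) over the discrete distribution."""
    return sum(w * math.exp(source * x) for x, w in zip(OUTCOMES, WEIGHTS))

def log_partition_function(source):
    """Toy ln Z[J], the generator of connected moments (cumulants)."""
    return math.log(partition_function(source))

def second_derivative(func, at=0.0, step=1e-3):
    """Central finite-difference estimate of the second derivative of func."""
    return (func(at + step) - 2.0 * func(at) + func(at - step)) / step**2

mean = sum(w * x for x, w in zip(OUTCOMES, WEIGHTS))
second_moment = sum(w * x * x for x, w in zip(OUTCOMES, WEIGHTS))

full_two_point = second_derivative(partition_function)           # approximates <x^2>
connected_two_point = second_derivative(log_partition_function)  # approximates <x^2> - <x>^2
```

Differentiating the toy {Z} twice returns the full second moment, which includes the disconnected piece {\langle x\rangle^2}, while differentiating {\ln Z} returns only the variance. This is the zero-dimensional analogue of the statement that the logarithm isolates the connected 2-point function.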

Equation (18) thus provides the connection between the Feynman-Vernon and Schwinger-Keldysh formalisms. And while it clearly enables one to compute the latter given the former, reference [1] is interested in the opposite scenario: given the Schwinger-Keldysh propagator {G}, which we can compute for strongly-coupled environments via AdS/CFT, how do we extract the influence functional {\mathcal{F}} that describes the decoherence effect of this environment on the subsystem?

For irrelevant couplings ({g\!\ll\!1}), this is feasible if we approximate the influence functional to quadratic order. This is more clearly explained in [2] (beware however that this reference uses the term “influence functional” to mean {-i\ln\mathcal{F}}; they also appear to use a different sign convention. I suspect reference [1] is atypical in both regards, and that the notation in [2] is actually more standard). The basic idea is that for small coupling, we may evaluate the path integral perturbatively, and exponentiate the result to obtain the effective action. We first expand (14) (suppressing the time-ordering symbol for compactness):

\displaystyle \begin{aligned} \langle e^{ig\int_\mathcal{K}\phi f[\chi]}\rangle_\mathrm{env} &=\,1\,+\,ig\int_\mathcal{K}\!\mathrm{d}^{d+1} x\,\phi(x)\langle f[\chi(x)]\rangle_\mathrm{env}\\ &+\,\frac{(ig)^2}{2}\int_\mathcal{K}\!\mathrm{d}^{d+1}x_1\!\mathrm{d}^{d+1} x_2\,\phi(x_1)\phi(x_2)\langle f[\chi(x_1)]f[\chi(x_2)]\rangle_\mathrm{env} \,+\,\ldots~. \end{aligned} \ \ \ \ \ (23)

This is the standard expansion of the exponential of the connected correlation function mentioned above, and hence we identify everything but the leading unit term on the r.h.s. with {W} (up to factors of {i}, depending on convention). Upon adding an appropriate counterterm to kill the tadpole (i.e., imposing that the vev of the 1-pt function vanishes, {\langle f[\chi]\rangle=0}), the effective action contains only the quadratic contribution (up to this order), and we may therefore write

\displaystyle \begin{aligned} -i\ln\mathcal{F}[\phi_+,\phi_-]&\approx-\frac{g^2}{2}\int\mathrm{d}^{d+1}x\mathrm{d}^{d+1}x'\sum_{s,s'}\mathrm{sgn}(ss')\phi_s(x)G_{ss'}(x-x')\phi_{s'}(x')\\ &=-g^2\int\mathrm{d}^{d+1}x\mathrm{d}^{d+1}x'\left[\Delta(x)G_R(x-x')\Sigma(x')-\frac{i}{2}\Delta(x)G_\mathrm{sym}(x-x')\Delta(x')\right]~, \end{aligned} \ \ \ \ \ (24)

where on the second line we have switched to the so-called “relative-average” basis

\displaystyle \Delta\equiv\phi_+-\phi_-~,\qquad \Sigma\equiv\frac{1}{2}\left(\phi_++\phi_-\right)~, \ \ \ \ \ (25)

and re-expressed the Schwinger-Keldysh Green’s functions {G_{ss'}} in terms of the standard advanced, retarded, and symmetric Green’s functions,

\displaystyle \begin{aligned} G_A(x_1,x_2)&=i\Theta(t_2-t_1)\langle[\mathcal{O}(x_1),\mathcal{O}(x_2)]\rangle_\mathrm{env}~,\\ G_R(x_1,x_2)&=i\Theta(t_1-t_2)\langle[\mathcal{O}(x_2),\mathcal{O}(x_1)]\rangle_\mathrm{env}~,\\ G_\mathrm{sym}(x_1,x_2)&=\frac{1}{2}\langle\{\mathcal{O}(x_1),\mathcal{O}(x_2)\}\rangle_\mathrm{env}~, \end{aligned} \ \ \ \ \ (26)

which are related to the components {G_{ss'}} by

\displaystyle G_A=G_{++}-G_{-+}~,\qquad G_R=G_{++}-G_{+-}~,\qquad G_\mathrm{sym}=\frac{i}{2}\left( G_{++}+G_{--}\right)~. \ \ \ \ \ (27)

Note that by definition, {G_{++}+G_{--}=G_{+-}+G_{-+}} and {G_A(x_1,x_2)=G_R(x_2,x_1)}.
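As a sanity check on (26), (27), and the change of basis in (24), the following Python sketch builds the four components {G_{ss'}} for a toy environment and compares the two quadratic forms on a small time grid. Everything specific here is an assumption for illustration and does not come from [1]: the environment is a thermal two-level system with {\mathcal{O}=\sigma_x}, operators on the backward ({-}) leg are taken to stand to the left of those on the forward ({+}) leg under contour ordering, the integrals are replaced by sums, and the common prefactor {-g^2} is dropped from both sides.

```python
import cmath
import math
import random

def wightman(t, omega=1.0, beta=2.0):
    """<O(t) O(0)> for O = sigma_x in a thermal two-level system (toy environment)."""
    p_excited = 1.0 / (1.0 + math.exp(beta * omega))
    p_ground = 1.0 - p_excited
    return p_ground * cmath.exp(-1j * omega * t) + p_excited * cmath.exp(1j * omega * t)

def keldysh_components(t1, t2):
    """G_{ss'}(t1, t2) = -i <T_K O_s(t1) O_s'(t2)> on the forward (+) and backward (-) legs."""
    w12, w21 = wightman(t1 - t2), wightman(t2 - t1)  # <O(t1)O(t2)> and <O(t2)O(t1)>
    time_ordered = w12 if t1 >= t2 else w21
    anti_time_ordered = w21 if t1 >= t2 else w12
    return {"++": -1j * time_ordered, "+-": -1j * w21,
            "-+": -1j * w12, "--": -1j * anti_time_ordered}

def quadratic_forms(num_times=6, spacing=0.3, seed=0):
    """Return both lines of (24), without the factor -g^2, as sums over a time grid."""
    rng = random.Random(seed)
    times = [j * spacing for j in range(num_times)]
    phi_plus = [rng.uniform(-1, 1) for _ in times]
    phi_minus = [rng.uniform(-1, 1) for _ in times]
    delta = [p - m for p, m in zip(phi_plus, phi_minus)]
    sigma = [0.5 * (p + m) for p, m in zip(phi_plus, phi_minus)]
    branch_basis = relative_average = 0j
    for j, t1 in enumerate(times):
        for k, t2 in enumerate(times):
            g = keldysh_components(t1, t2)
            g_ret = g["++"] - g["+-"]             # retarded function, as in (27)
            g_sym = 0.5j * (g["++"] + g["--"])    # symmetric function, as in (27)
            branch_basis += 0.5 * (phi_plus[j] * g["++"] * phi_plus[k]
                                   - phi_plus[j] * g["+-"] * phi_minus[k]
                                   - phi_minus[j] * g["-+"] * phi_plus[k]
                                   + phi_minus[j] * g["--"] * phi_minus[k])
            relative_average += (delta[j] * g_ret * sigma[k]
                                 - 0.5j * delta[j] * g_sym * delta[k])
    return branch_basis, relative_average
```

Under these assumptions the two returned numbers agree to rounding error, and the identity {G_{++}+G_{--}=G_{+-}+G_{-+}} holds for any pair of times. Note that the agreement of the cross terms relies on summing over the full grid together with {G_A(x_1,x_2)=G_R(x_2,x_1)}.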

In principle, one can use (18) to study decoherence for general environments, but further simplifications are obtained by restricting to the thermostatic case mentioned above, where {\rho_\mathrm{env}=e^{-\beta H_\mathrm{env}}}. In this case the trace in (15) becomes a thermal average, and the corresponding path integral runs from {t_i} to {t_i-i\beta} (after the Keldysh contour, i.e., the Lorentzian timefold from {t_i} to {t_f} and back). At thermal equilibrium, the Green’s functions become periodic in imaginary time; that is, they satisfy the KMS condition

\displaystyle G_{+-}(t-i\beta,\mathbf{x})=G_{-+}(t,\mathbf{x})~. \ \ \ \ \ (28)

In momentum space, this translates into the relation

\displaystyle G_\mathrm{sym}(\omega)=-[1+2n(\omega)]\mathrm{Im}G_R(\omega)~, \qquad\mathrm{where}\qquad n(\omega)=\frac{1}{e^{\beta\omega}-1}~. \ \ \ \ \ (29)

Here {n(\omega)} is simply the thermal distribution function. The utility of this relation is that it allows us to express (18) entirely in terms of {G_R}, and therefore knowledge of the environment’s retarded Green’s function completely determines the dynamics of {\rho_\mathrm{sys}}.
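The thermal prefactor in (29) can be rewritten as {1+2n(\omega)=\coth(\beta\omega/2)}, which makes its limits transparent. The short Python sketch below simply checks this identity and the two limits numerically; the function names are hypothetical and the sample values are arbitrary.

```python
import math

def bose_einstein(omega, beta):
    """Thermal occupation number n(omega) = 1 / (exp(beta * omega) - 1)."""
    return 1.0 / math.expm1(beta * omega)

def thermal_factor(omega, beta):
    """The prefactor 1 + 2 n(omega) appearing in (29)."""
    return 1.0 + 2.0 * bose_einstein(omega, beta)

def coth(x):
    """Hyperbolic cotangent, defined for x != 0."""
    return 1.0 / math.tanh(x)

if __name__ == "__main__":
    for beta_omega in (1e-3, 0.5, 2.0, 50.0):
        print(beta_omega, thermal_factor(beta_omega, 1.0), coth(beta_omega / 2.0))
```

At high temperature ({\beta\omega\ll1}) the factor approaches {2/(\beta\omega)}, so the symmetric correlator is dominated by thermal fluctuations, while at low temperature ({\beta\omega\gg1}) it approaches {1}, leaving only the vacuum contribution.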

As alluded to above, (18) is a rather powerful formula, since it applies for arbitrary environments. Traditionally however, computing the retarded Green’s function is intractable except in the case of free theories. The novel aspect of [1] is to use holography to compute {G_R} for strongly-coupled theories; if this were valid, it would enable the study of decoherence for a whole new class of systems. The calculation is quite technical, and we refer the interested reader to the paper for details; here we will only sketch the most basic overview.

First, the authors show that the propagation function {J} has the correct semi-classical limit, namely that it yields the Langevin equation for quantum Brownian motion of the center of mass (in this case, for the “average” field {\Sigma}). This is achieved by observing that, in the relative-average basis {\{\Delta,\Sigma\}}, {\Delta} can be thought of as a light field that encodes the random fluctuations around the heavy center-of-mass field {\Sigma}. Integrating out the fast degree of freedom {\Delta} then leads to the classical equation of motion, the Langevin equation, for the slow field {\Sigma}.

The authors then proceed to derive the master equation for {\rho_\mathrm{sys}}. This entails explicitly evaluating the path integral {J} in the relative-average basis, and then computing its time-derivative. One can then deduce the differential equation describing the dynamics of {\rho_\mathrm{sys}} from (13).

In some sense, the master equation for {\rho_\mathrm{sys}} is all one needs to describe the decoherence process. However, as mentioned above, the disappearance of the off-diagonal elements of the reduced density matrix is not a rigorous characterization of decoherence, since the basis of pointer states is itself evolving (away from the classical limit, {\hbar\rightarrow0}). (There is also the subtlety that since {\phi} is a (continuous) field, {\rho_\mathrm{sys}} is an infinite-dimensional matrix; but this can be circumvented by suitably coarse-graining the system). Thus, instead of examining the off-diagonal elements of {\rho_\mathrm{sys}}, a better, basis-independent probe of decoherence is given by the Wigner function {\mathcal{W}(\Sigma,\rho,t)}, which is the Fourier transform of the density matrix {\rho(\Sigma,\Delta,t)} with respect to the fast field {\Delta}. In other words, the Wigner function is the quantum analogue of the distribution function over phase space. The key feature is that in general, {\mathcal{W}} is not positive-definite, but becomes so once the subsystem decoheres to a classical state. The negative contributions to {\mathcal{W}} therefore parameterize the degree of decoherence, and their rate of disappearance allows one to define the decoherence timescale.
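The link between negativity of the Wigner function and coherence can be illustrated with a standard single-particle toy example, which is not the field-theoretic setup of [1]. The Python sketch below uses the closed-form Wigner function of a superposition of two unit-width Gaussian wavepackets centred at {\pm a} (with {\hbar=1}), and models decoherence, as an assumption, by multiplying the off-diagonal (interference) term by a factor `coherence` between 0 and 1. The function and parameter names are hypothetical.

```python
import math

def cat_wigner(x, p, half_separation=2.0, coherence=1.0):
    """Wigner function of two Gaussian packets at x = +a and x = -a.

    coherence = 1 is the pure superposition; coherence = 0 is the classical
    mixture in which the off-diagonal terms have been removed entirely.
    """
    a = half_separation
    norm = 2.0 * math.pi * (1.0 + coherence * math.exp(-a * a))
    packets = math.exp(-(x - a) ** 2 - p * p) + math.exp(-(x + a) ** 2 - p * p)
    interference = 2.0 * coherence * math.exp(-x * x - p * p) * math.cos(2.0 * a * p)
    return (packets + interference) / norm

def most_negative_value(coherence, half_separation=2.0, points=81, extent=4.0):
    """Scan a square phase-space grid and return the smallest value found."""
    step = 2.0 * extent / (points - 1)
    grid = [-extent + i * step for i in range(points)]
    return min(cat_wigner(x, p, half_separation, coherence) for x in grid for p in grid)

if __name__ == "__main__":
    for coherence in (1.0, 0.5, 0.0):
        print(coherence, most_negative_value(coherence))
```

For the pure superposition the function dips below zero between the two packets (for instance at {x=0}, {p=\pi/(2a)}), whereas with the interference term removed it is a sum of two positive Gaussians. In this toy model, the depth of the negative region therefore tracks how much coherence remains.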

Interestingly, as the authors of [1] observe, Rényi entropies — which are scalar quantities — also provide a basis-independent characterization of decoherence. Since the entanglement entropy is zero for a pure state and maximized for a completely mixed state, and decoherence evolves the subsystem from the former to the latter, entanglement or Rényi entropies are a natural way of tracking this process. This also allows the authors to compare the timescales for decoherence with those obtained in the study of local quantum quenches.

Unfortunately, as mentioned at the beginning of this post, there’s a basic flaw in the connection to holography that invalidates the application of this beautiful formalism. Specifically, the key observation of [1] was that the influence functional (14) has precisely the same form as the partition function in the extrapolate dictionary. They therefore identify the subsystem field with the source, and the environment field with the corresponding CFT operator. But the source isn’t a dynamical field on the boundary, and hence the splitting of the Lagrangian (4) does not make sense in this context. Another way to say this is that the (bulk) source and (boundary) operator are dual operators, and hence do not interact in the same Hilbert space. However, this implicitly assumes the standard choice of boundary conditions in holography, and it’s not clear whether a more sophisticated treatment would enable one to salvage this approach. [I am grateful to Billy Cottrell for discussions on this issue]. Given the fundamental importance of decoherence and (to?) holography, and the otherwise general elegance of the Feynman-Vernon approach, this is an interesting open question.


[1] S.-H. Ho, W. Li, F.-L. Lin, and B. Ning, “Quantum Decoherence with Holography,” JHEP 01 (2014) 170, arXiv:1309.5855 [hep-th].

[2] D. Boyanovsky, K. Davey, and C. M. Ho, “Particle abundance in a thermal plasma: Quantum kinetics vs. Boltzmann equation,” Phys. Rev. D71 (2005) 023523, arXiv:hep-ph/0411042 [hep-ph].


General relativity or gravitons?

Question: How is the existence of a graviton consistent with the GR paradigm of gravity as a purely geometrical effect?

Answer: Ontologically, it’s not! Gravitons are predicated on a quantum field-theoretic formulation of gravity, while spacetime curvature is the corresponding classical description. By analogy, the electromagnetic force may be alternatively described in terms of the exchange of virtual bosons (in QFT), or in terms of electromagnetic waves (in classical electromagnetism); these are fundamentally different paradigms, but are epistemically consistent in the sense that the former (quantum electrodynamics) reduces to the latter (classical electrodynamics) in the appropriate limit.

That said, it is possible to show that the classical electromagnetic and gravitational forces must correspond — in the language of Lorentz-invariant quantum particle (as opposed to field) theory — to the transmission of a massless virtual particle with helicity {\pm1} and {\pm2}, respectively. In particular, as shown in a beautiful paper by Boulware and Deser in 1975 [1], “a quantum particle description of local (non-cosmological) gravitational phenomena necessarily leads to a classical limit which is just a metric theory of gravity. [If] only helicity {\pm2} gravitons are included, the theory is precisely Einstein’s general relativity…” This implies that Einstein’s theory enjoys a sort of quantum uniqueness (at least at tree level: it is entirely possible that the high-frequency behaviour of gravitons differs substantially from the (experimentally probed) low-energy regime of effective field theory).

The remarkable aspect of this correspondence is that one sees the emergence of a metric theory from a non-geometrical, flat-space formulation. Perhaps this will shed light on the notion of “emergent spacetime” from other non-geometrical precepts (namely, entanglement)?

To begin, consider the description of the world entirely in terms of S-matrix elements (or rather, the generalizations thereof necessitated by zero-mass particles, i.e., soft theorems). Observation of a force then implies the existence of a mediating particle whose exchange produces it. Since the effective potential for the exchange of a massive particle is {V\sim e^{-mr}/r}, the experimental {1/r} gravitational potential implies that the graviton must be massless, at least to within experimental accuracy.
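To see how the {1/r} form constrains the mass, one can compare the Yukawa potential with its massless limit. The Python sketch below works in natural units ({\hbar=c=1}), drops the overall coupling, and uses a hypothetical helper `mass_bound` to invert the exponential; it is only meant to illustrate the scaling, and is not how experimental bounds on the graviton mass are actually derived.

```python
import math

def yukawa_potential(r, mass):
    """Magnitude of the static potential exp(-m r)/r from exchanging a particle of mass m."""
    return math.exp(-mass * r) / r

def mass_bound(r, tolerance):
    """Largest mass for which the potential at distance r lies within `tolerance` of 1/r."""
    return -math.log(1.0 - tolerance) / r

if __name__ == "__main__":
    for distance in (1.0, 1e3, 1e6):
        # The allowed mass shrinks inversely with the distance probed.
        print(distance, mass_bound(distance, tolerance=0.01))
```

The bound scales as {1/r}: the larger the distance over which a pure {1/r} law is observed to a given accuracy, the smaller the mass must be, which is the sense of “massless to within experimental accuracy”.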

Establishing the spin is more subtle. It must be an integer, since the Pauli exclusion principle prevents any virtual particle obeying Fermi-Dirac statistics from conspiring in sufficient numbers to produce a classical force. (The keyword here is “virtual”. Spin-{1/2} electrons, for example, are not force carriers in this paradigm; that role belongs to the integer-spin bosons). We can also rule out spin {1}, since a vector exchange would result in repulsion between like charges (masses, in this case).

(As an aside, the term “vector boson” for particles of spin {1} arises from the fact that in quantum field theory, the component of a massive spin-{1} particle’s spin along any axis can take one of three values: {0}, {\pm\hbar}. Thus the dimension of the space of spin states is the same as that of a vector in three-dimensional space, and in fact these states can be shown to form a representation of SO(3), the group of rotations, or equivalently of its double cover SU(2)).
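The three-dimensional spin-{1} representation mentioned in this aside can be written down explicitly. The Python sketch below uses the standard spin-{1} matrices in the basis of {J_z} eigenstates (in units of {\hbar}) and checks the angular momentum algebra {[J_x,J_y]=iJ_z} and the Casimir {J^2=s(s+1)=2} with plain nested lists; the helper names are hypothetical.

```python
import math

ROOT_HALF = 1.0 / math.sqrt(2.0)

# Standard spin-1 matrices in the basis of J_z eigenstates (+1, 0, -1), in units of hbar.
J_X = [[0, ROOT_HALF, 0], [ROOT_HALF, 0, ROOT_HALF], [0, ROOT_HALF, 0]]
J_Y = [[0, -1j * ROOT_HALF, 0], [1j * ROOT_HALF, 0, -1j * ROOT_HALF], [0, 1j * ROOT_HALF, 0]]
J_Z = [[1, 0, 0], [0, 0, 0], [0, 0, -1]]

def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def commutator(a, b):
    """[a, b] = a b - b a."""
    ab, ba = matmul(a, b), matmul(b, a)
    return [[ab[i][j] - ba[i][j] for j in range(len(a))] for i in range(len(a))]

def scaled(factor, a):
    """Multiply every entry of a matrix by a number."""
    return [[factor * entry for entry in row] for row in a]

def max_difference(a, b):
    """Largest entrywise deviation between two matrices."""
    return max(abs(a[i][j] - b[i][j]) for i in range(len(a)) for j in range(len(a)))

def casimir():
    """J_x^2 + J_y^2 + J_z^2, which should equal s(s+1) = 2 times the identity."""
    squares = [matmul(m, m) for m in (J_X, J_Y, J_Z)]
    return [[sum(sq[i][j] for sq in squares) for j in range(3)] for i in range(3)]
```

The diagonal entries of `J_Z` are the three allowed spin projections, and the commutation relations are the same as those satisfied by the generators of rotations acting on ordinary three-component vectors.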

Ruling out a scalar particle — that is, spin 0 — can be done by considering the bending of light by a gravitational field. In particular, as shown in Boulware and Deser’s paper [1], the scattering angle for a photon interacting with a massive object (such as the sun) via a scalar graviton depends on both the momentum of the photon and its polarization. But experiments reveal no such dependence.

Finally, the possibility of spin greater than 2 was quashed by Weinberg’s 1964 paper [2], though we shall not repeat the arguments here. The graviton must therefore be a spin-2 particle. Furthermore, one can show that a finite-range exchange is untenable on both experimental and theoretical grounds [3,4]. We therefore conclude that the gravitational force corresponds, in the special relativistic scattering paradigm, to the exchange of massless spin-2 virtual bosons with infinite range.

The above is essentially the argument that classical (Einstein) gravity implies the existence of a massless spin-2 virtual particle. What about the other direction, namely that this graviton uniquely leads to Einstein gravity in the classical limit?

Unfortunately this direction is substantially more technical, so we will only summarize the argument here. The basic idea is that the virtual exchange of helicity {\pm2} gravitons is governed by second-rank tensor vertices, which correspond to matrix elements of the stress-energy tensor of a local field theory. In particular, Boulware and Deser show [1] that the graviton can only couple (to other particles as well as itself) via a conserved stress-energy tensor; otherwise, the graviton is free (i.e., it couples to nothing).

While Boulware and Deser’s analysis is only at tree level, it suffices to show that in the low-frequency limit, Einstein gravity follows uniquely from special relativistic scattering theory combined with a few observational constraints. Philosophically, this implies that one can view Einstein’s theory — and by extension, the intuitively-pleasing conception of gravity as the curvature of spacetime — as a phenomenological theory for macroscopic interactions. It is by no means necessary that this same geometrical interpretation continue to hold at small distances and times—such as at the Compton wavelength of a given interaction, where quantum effects are expected to be relevant. And while we have yet to understand the UV nature of gravity, the above picture of low-energy, geometrical gravity as a purely phenomenological descriptor, however much it may baffle one’s intuition, is tantalizingly in line with the emergent spacetime paradigm.


[1] D. G. Boulware and S. Deser, “Classical General Relativity Derived from Quantum Gravity,” Annals Phys. 89 (1975) 193.

[2] S. Weinberg, “Photons and Gravitons in S-Matrix Theory: Derivation of Charge Conservation and Equality of Gravitational and Inertial Mass,” Phys. Rev. 135 (1964) B1049–B1056.

[3] H. van Dam and M. J. G. Veltman, “Massive and massless Yang-Mills and gravitational fields,” Nucl. Phys. B22 (1970) 397–411.

[4] D. G. Boulware and S. Deser, “Can gravitation have a finite range?,” Phys. Rev. D6 (1972) 3368–3382.



Consider a scalar field in {3+1} dimensions with the standard decomposition into creation and annihilation operators,

\displaystyle \phi(x)=\int\frac{\mathrm{d}^3k}{(2\pi)^32\omega}\left( a_ke^{-ik\cdot x}+a_k^\dagger e^{ik\cdot x}\right)~. \ \ \ \ \ (1)

Then the commutation relations between the creation/annihilation operators,

\displaystyle \left[a_k,a_{k'}^\dagger\right]=2\omega(2\pi)^3\delta^3\left(\mathbf{k}-\mathbf{k}'\right)~, \;\;\; \left[a_k,a_{k'}\right]= \left[a_k^\dagger,a_{k'}^\dagger\right]= 0~, \ \ \ \ \ (2)

where {\omega^2=\mathbf{k}^2+m^2}, imply exact commutativity of spacelike separated fields:

\displaystyle \begin{aligned} \left[\phi(x),\phi(y)\right]&=\int\frac{\mathrm{d}^3k\mathrm{d}^3k'}{(2\pi)^64\omega\omega'} \left( a_ka_{k'}^\dagger e^{-ikx+ik'y}+a_k^\dagger a_{k'} e^{ikx-ik'y} -a_{k'}a_{k}^\dagger e^{-ik'y+ikx}-a_{k'}^\dagger a_{k} e^{ik'y-ikx}\right)\\ &=\int\frac{\mathrm{d}^3k\mathrm{d}^3k'}{(2\pi)^64\omega\omega'}\left(\left[a_k,a_{k'}^\dagger\right]e^{-ikx+ik'y}-\left[a_{k'},a_k^\dagger\right]e^{ikx-ik'y}\right)\\ &=\int\frac{\mathrm{d}^3k}{(2\pi)^32\omega}\left( e^{-ik(x-y)}-e^{ik(x-y)}\right)=0~, \end{aligned} \ \ \ \ \ (3)

where the last step follows from the fact that since {(x-y)^2<0}, there exists a continuous Lorentz transformation that interchanges the order of events; in particular, this allows us to take {(x-y)\rightarrow-(x-y)} in one of the two terms, whereupon the commutator vanishes. A similar argument can be made for commutators of the form {\left[\phi^\dagger(x),\phi(y)\right]}.

The vanishing of the commutator between spacelike separated observables is what is technically meant by microcausality in QFT. The prefix “micro” is to distinguish this quantum concept from macrocausality, which refers to our classical notion that no effects propagate faster than light. Since the latter is almost always taken for granted in such discussions, one often takes the unqualified “causality” to refer to the former, but we shall refrain from following this convention here for the sake of exactness.

Note that the correlation function {\left<\phi^\dagger(x)\phi(y)\right>} does not vanish, even for {(x-y)^2<0}. This does not imply any superluminal violations of causality. Rather, this is simply the statement that the fields share some small correlation; i.e., that their past lightcones overlap. (Ignoring such issues as cosmological expansion, in flat space this will always be true if one goes back far enough). In other words, it is the commutator, and not the correlator, that provides the correct diagnostic of microcausality in field theories.

Ironically, the exact microcausality of fields (inherently nonlocal entities defined on the whole space) does not extend to particles (the local excitations thereof). Consider the number operator density {\mathcal{N}(x)}, defined via

\displaystyle N_V\equiv\int_V\mathrm{d}^3k\,a_k^\dagger a_k=\int\mathrm{d}^3x\,\mathcal{N}(\mathbf{x},t)~, \ \ \ \ \ (4)

where {N_V} is the number operator that counts the number of particles in a state within a given spatial region {V}. It is then straightforward to show (see, e.g., Anthony Duncan’s The Conceptual Framework of Quantum Field Theory, 2012, sec. 6.5) that

\displaystyle \left[\mathcal{N}(\mathbf{x},t),\mathcal{N}(\mathbf{y},t)\right]\neq0~, \ \ \ \ \ (5)

and therefore measurements of the number of particles in two non-overlapping volumes, {N_{V_1}} and {N_{V_2}} (where {\mathbf{x}\in V_1} and {\mathbf{y}\in V_2}), will exhibit an interference that falls off exponentially with the minimum spacelike separation between {V_1} and {V_2}. In other words, a one-particle state can be localized with respect to the number density operator only with an energy density that falls off exponentially at a rate determined by the Compton wavelength, i.e., {\sim e^{-2m|x|}}. This is why it is meaningless to speak of “particles” below their Compton wavelength. Incidentally, lest one worry, the above is a uniquely relativistic phenomenon: one recovers {\left[N_{V_1},N_{V_2}\right]=0} in the non-relativistic limit, {c\rightarrow\infty}. (In fact, the discussion is even more subtle: technically speaking, number operators localized within finite subregions do not exist).

As should now be apparent, the concepts of causality and locality are intimately linked, but technically distinct. In particular, the impossibility of exactly localizing particles reflects the fact that one cannot localize physical attributes (e.g., energy, momentum, charge) at a dimensionless spacetime point. Indeed, the point-like nature of elementary particles is merely a statement about their interaction (via a Hamiltonian which multiplies them at the same spacetime point), and is therefore epistemic. At best, one can localize particles as wavepackets. This is sufficient to ensure clustering: the factorization of the S-matrix for groups of separated particles such that, in the limit that the separation distance goes to {\infty}, the total scattering amplitude approaches the product of the independent scattering amplitudes of the particles in each region. Ultimately, this is what we observe in the laboratory, and as such one could consider this an operational definition of locality. But it is microcausality that ensures the analyticity of the S-matrix (via the singularity structure, which inhibits future interactions from influencing the past) on which the covariant formulation of clustering rests.


A brief history of firewalls

Black hole thermodynamics

In 1973, Jacob Bekenstein observed that black holes must be endowed with an entropy in order to preserve the second law of thermodynamics; otherwise, one could decrease the entropy of the universe by simply throwing subsystems with high entropy (e.g., a hot cup of coffee, this blog post) into a black hole. At face value, this is an intuitive proposal: since the information about the degrees of freedom that comprise the hypothetical subsystem would then be hidden behind the event horizon, it makes sense to count them among the microstates of the black hole.

The unintuitive twist (the first of many!) comes from the realization that this naïve bookkeeping is not at all how black holes operate. The entropy of familiar systems scales with the volume thereof, {S\!\sim\!V}, which is consistent with simply counting the obvious (particulate) degrees of freedom in the examples above. Black hole entropy, in stark contrast, scales with the area of the event horizon, {S\!\sim\!A}. Bekenstein’s original motivation for this proposal hinged largely on Hawking’s 1971 result that the surface area of a black hole cannot decrease in any classical process (the so-called “area theorem”). This led Bekenstein to propose an analogy between black holes and statistical thermodynamics, which has since been enshrined in the laws of black hole thermodynamics for stationary black holes:

  • Zeroth Law: The surface gravity, {\kappa}, on the horizon is constant. This implies that surface gravity is analogous to temperature.
  • First Law: For a stationary Kerr-Newman black hole, the change in energy under small perturbations is given by

    \displaystyle \mathrm{d} E=\frac{\kappa}{8\pi}\mathrm{d} A+\Omega \mathrm{d} J+\Phi\mathrm{d} Q~. \ \ \ \ \ (1)

    This is the statement of energy conservation, where the first term on the r.h.s. is identified with {T\mathrm{d} S}.

  • Second Law: Assuming the weak energy condition holds, the horizon area is non-decreasing,

    \displaystyle \frac{\mathrm{d} A}{\mathrm{d} t}\geq0~. \ \ \ \ \ (2)

    This is the aforementioned area theorem, and corresponds (under the instigating observation of Bekenstein above) to the statement that the entropy never decreases.

  • Third Law: It is not possible to form a black hole with vanishing surface gravity,

    \displaystyle \kappa>0~. \ \ \ \ \ (3)

    The third law of ordinary, statistical thermodynamics is essentially the statement that a system at absolute zero must be in the state with minimum possible energy. In the usual example of a perfect crystal, this is assumed to be comprised of a single eigenstate, hence the entropy vanishes. The corresponding example here is an extremal black hole, which has {\kappa=0}.
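
As a quick sanity check of the first law (1), here is a minimal numerical sketch for the simplest case, a Schwarzschild black hole with {J=Q=0}. It assumes the standard values {A=16\pi M^2} and {\kappa=1/(4M)} in units {G=c=1}; the function names are my own, purely for illustration.

```python
import math

def horizon_area(M):
    """Horizon area A = 16*pi*M^2 of a Schwarzschild black hole (G = c = 1)."""
    return 16.0 * math.pi * M**2

def surface_gravity(M):
    """Surface gravity kappa = 1/(4M) of a Schwarzschild black hole (G = c = 1)."""
    return 1.0 / (4.0 * M)

def first_law_residual(M, dM=1e-6):
    """Return dE - kappa/(8*pi) * dA for a small mass change dM, with J = Q = 0."""
    dA = horizon_area(M + dM) - horizon_area(M)
    return dM - surface_gravity(M) / (8.0 * math.pi) * dA

for M in (1.0, 10.0, 100.0):
    # The residual is second order in dM, so it should be tiny compared to dM.
    print(M, first_law_residual(M))
```

The residual is of order {\mathrm{d}M^2/M}, as expected from a finite-difference check of a differential relation.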


However, despite the apparent necessity of ascribing to black holes an entropy proportional to {A}, thus far black hole thermodynamics is little more than an analogy: classically, black holes do not radiate (hence the name), and therefore have zero temperature and consequently zero thermodynamic entropy. Indeed, Bekenstein’s original proposal explicitly views the entropy in an information-theoretic — as opposed to thermodynamic — sense, i.e., as the Shannon entropy measuring the inaccessibility of the internal microstates of a system. General relativity ensures that these degrees of freedom are forever isolated from the external universe, hence an external observer can never extract information, and thus the entropy of the black hole must be non-decreasing. It is worth emphasizing however that, at least at the classical level, this entropy is properly regarded as referring to the equivalence class of black holes with the same mass, charge, and angular momentum, rather than to the temperature of any single black hole.

The situation changed the following year, when Hawking showed that, quantum mechanically, black holes do radiate, with temperature

\displaystyle T=\frac{\kappa}{2\pi}=\frac{1}{8\pi M}~, \ \ \ \ \ (4)

and entropy

\displaystyle S=\frac{A}{4\ell_P^2}~, \ \ \ \ \ (5)

where we have explicitly included the Planck length, {\ell_P=\sqrt{\hbar G/c^3}}, in the latter formula lest the reader be disturbed by the mismatch in dimensions between {S} and {A}. The existence of Hawking radiation implies that black holes can evaporate, and thus their surface area {A} can in fact decrease. (In other words, the aforementioned area theorem was a purely classical statement: quantum mechanical effects violate the weak energy condition, a key assumption of the theorem.) This requires a modification of the second law, to the effect that the total entropy of the black hole (still identified with its horizon area) plus the entropy of the Hawking radiation is non-decreasing. This is referred to as the generalized second law.
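
To get a feel for the numbers, the following sketch evaluates (4) and (5) with all units restored, for a Schwarzschild black hole of one solar mass. The SI constants are rounded reference values and the choice of mass is only an illustrative input; the function names are hypothetical.

```python
import math

# Rounded SI constants; the solar mass is just an illustrative input.
HBAR = 1.054571817e-34   # J s
C = 2.99792458e8         # m / s
G = 6.67430e-11          # m^3 / (kg s^2)
K_B = 1.380649e-23       # J / K
M_SUN = 1.98847e30       # kg

def hawking_temperature(mass_kg):
    """T = hbar c^3 / (8 pi G M k_B), i.e. eq. (4) with units restored."""
    return HBAR * C**3 / (8.0 * math.pi * G * mass_kg * K_B)

def bekenstein_hawking_entropy(mass_kg):
    """S / k_B = A / (4 l_P^2) for a Schwarzschild horizon, i.e. eq. (5)."""
    r_s = 2.0 * G * mass_kg / C**2          # Schwarzschild radius
    area = 4.0 * math.pi * r_s**2           # horizon area
    planck_length_sq = HBAR * G / C**3      # l_P^2
    return area / (4.0 * planck_length_sq)

print("T   [K]:", hawking_temperature(M_SUN))
print("S / k_B:", bekenstein_hawking_entropy(M_SUN))
```

For a solar mass this gives a temperature of roughly {6\times10^{-8}} K, far below that of the cosmic microwave background, and an entropy of order {10^{77}} in units of {k_B}.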

With Hawking’s discovery that black holes are not completely black after all, black hole thermodynamics went from epistemic to ontic in one fell swoop. The precise nature of the Hawking radiation itself, however, remains muddled to this day.

The vast interpretational quagmire surrounding Hawking radiation is due in no small part to the fact that there are a multitude of seemingly distinct derivations thereof. Hawking’s original 1975 calculation considers a black hole that forms from collapse. The mode expansions of a scalar field at past and future null infinity differ on account of the difference in vacuum state, namely the Minkowski and Schwarzschild vacua, respectively. One can express the latter in terms of the former by means of a Bogoliubov transformation, which results in a thermal expectation value for the outgoing modes. (More technically, the initial Minkowski vacuum {|0_M\rangle} corresponds to the Kruskal or Hartle-Hawking vacuum {|0_K\rangle}, while the final Schwarzschild vacuum {|0_S\rangle} is analogous to Rindler space {|0_R\rangle}. While the Kruskal modes are defined on the entire manifold, a Rindler observer, who has access to only the exterior spacetime, will perceive a thermal vacuum corresponding to tracing out the degrees of freedom behind the horizon. This is the mechanism that underlies the Unruh effect.)

However, the collapsing geometry is in fact entirely incidental to the radiation, as Hawking himself observed. Rather, it is the presence of the event horizon that is the key feature. Indeed, it is straightforward to show that an accelerating observer in Minkowski space observes a thermal spectrum associated with the Rindler horizon, which well-approximates the near-horizon region of a large Schwarzschild black hole.

The centrality of horizons in this context is elegantly demonstrated by the 1977 paper by Gibbons and Hawking, in which they compute the entropy of a black hole from what is essentially a purely geometrical argument. (In contrast to the usual jargon, here I mean “geometry” as distinct from “classical gravity”, since the presence of {\hbar} in the path integral technically places us beyond the domain of the latter). The basic idea is to compute the path integral for the black hole by Wick rotating to Euclidean signature, in which the geometry pinches off smoothly at the horizon. This corresponds to the fixed point of the {U(1)} symmetry, which we obtain by periodically identifying Euclidean time to avoid a conical deficit. The contribution from the fixed point dominates the path integral {Z}; and since {Z} is also the partition function, a simple thermodynamic argument allows one to derive an expression for the entropy in terms of the leading saddle-point, which yields precisely the above, well-known result (5).

The information paradox

The fact that black holes radiate has shattering implications, which Hawking was swift to point out in his subsequent work. Suppose that we form a black hole by collapsing some matter distribution in an initially pure state. After the black hole has completely evaporated, we are left with radiation in a thermal state, which is by definition mixed. But the transformation from a pure state to a mixed state violates unitarity, a fundamental principle of quantum mechanics necessary to ensure conservation of probabilities. In other words, non-unitary evolution would imply that information is lost in the process, which quantum mechanics forbids. Thus it appears that the very quantum mechanical laws which give rise to Hawking radiation are violated as a result! This is the substance of the black hole information paradox. As we shall see, it provides perhaps the first hints that our conception of locality may require modification.

It is illuminating to contrast this situation with the apparently pure-to-thermal evolution of normal matter upon incineration, say a burning lump of coal. Supposing this to be in an initially pure state, the final state again involves a thermal bath of radiation, with the apparent loss of information that implies. But we do not concern ourselves with unitarity-violating barbecues. The reason is that subtle correlations between early and late radiation conspire to preserve the purity of the total system. It is only in coarse-graining (or tracing out whatever fraction of coal remains at a given stage) that we perceive a thermal state. It may be impossible to actually recover this information in practice, but in principle, the laws of quantum mechanics survive intact—that is, a sufficiently powerful computer could do it.

The essential difference between the coal and the black hole is that the former has no horizon. Early “Hawking” modes are entangled with modes inside the coal, which can — via their interactions with other interior modes — imprint this information on the late radiation. In contrast, the presence of a horizon imposes a very specific entanglement structure on the modes that prevents those behind the horizon from transmitting the information in any obvious manner. This follows from the fact that the Minkowski vacuum is in some sense an infinitely entangled state: the correlation function between local field excitations at spacelike-separated points {A} and {B} will diverge as {A\rightarrow B}. We can make this more precise by considering the Rindler decomposition of the vacuum,

\displaystyle |0\rangle=\frac{1}{\sqrt{Z}}\sum_ie^{-\pi\omega_i}|i\rangle_L|i'\rangle_R~, \ \ \ \ \ (6)

where {Z} is the Euclidean path integral with no insertions, and the relation between the basis vectors for the left (L) and right (R) wedges is {|i\rangle_L=\Theta^\dagger|i'\rangle_R}, where {\Theta} is the CPT operator. (This is an antiunitary operator that exists in all QFTs, whose action on a scalar field {\Phi} is {\Theta^\dagger\Phi(t,x,\mathbf{y})\Theta=\Phi^\dagger(-t,-x,\mathbf{y})}). Now consider decomposing a free scalar field into modes of definite boost energy {\omega} ({-\omega}) in the right (left) Rindler wedge. Then the vacuum state can be equivalently written as a product state over all modes:

\displaystyle |0\rangle=\bigotimes_{\omega,k}\sqrt{1-e^{-2\pi\omega}}\sum_ne^{-\pi\omega n}|n\rangle_{L\omega(-k)}|n\rangle_{R\omega k}~. \ \ \ \ \ (7)

This pairwise entanglement between modes across the horizon is ultimately what prevents the modes from sharing their entanglement as in the lump of coal.
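
To make the thermal character of (7) concrete, here is a small sketch for a single mode of boost energy {\omega}. Tracing out the left wedge leaves a diagonal density matrix with weights {(1-e^{-2\pi\omega})e^{-2\pi\omega n}}, so the mean occupation number should match the Planck distribution at temperature {1/2\pi}. The truncation at `n_max` quanta and the function names are my own choices for illustration.

```python
import numpy as np

def rindler_mode_probabilities(omega, n_max=200):
    """Schmidt weights of eq. (7) for one mode, truncated at n_max quanta."""
    n = np.arange(n_max + 1)
    return (1.0 - np.exp(-2.0 * np.pi * omega)) * np.exp(-2.0 * np.pi * omega * n)

def mean_occupation(omega, n_max=200):
    """<n> in the right wedge after tracing out the left wedge."""
    p = rindler_mode_probabilities(omega, n_max)
    return float(np.sum(np.arange(n_max + 1) * p))

def entanglement_entropy(omega, n_max=200):
    """Von Neumann entropy (in nats) of the reduced state of one wedge."""
    p = rindler_mode_probabilities(omega, n_max)
    p = p[p > 0]  # drop weights that underflow to zero before taking the log
    return float(-np.sum(p * np.log(p)))

for omega in (0.05, 0.2, 1.0):
    planck = 1.0 / np.expm1(2.0 * np.pi * omega)  # Bose-Einstein at T = 1/(2 pi)
    print(omega, mean_occupation(omega), planck, entanglement_entropy(omega))
```

The entanglement entropy per mode grows as {\omega\rightarrow0}, which is one way of seeing the "infinitely entangled" nature of the vacuum mentioned above.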

On this point, an important clarification bears mentioning: while the pairwise entangled modes are a characteristic feature of horizons, the popular conception of Hawking radiation as pairwise entangled particles is misleading: “a cartoon Hawking invented to explain his paper to children”, as Harlow quotes Susskind. The wavelength of the modes is of order {M^{-1}}, the size of the black hole, and thus the particle interpretation breaks down long before one reaches the horizon. It is therefore meaningless to speak of the radiation as being localized in this manner. (Just to be clear, this of course does not imply that an infalling observer won’t see particles as usual in her own reference frame, as per the equivalence principle. It is merely the blueshifting of Hawking modes back from infinity that is ill-defined; the associated divergence is simply the statement that, from the perspective of an external observer, time appears to stop at the horizon.) The related question of where, precisely, the Hawking radiation originates has not been settled, though the evidence suggests that the adjective “precisely” may lose out to nonlocality as well.

Despite these difficulties, there have been several attempts to reconcile the apparent information loss by black holes by appealing to subtle correlations in the Hawking radiation. And indeed, in this regard it is worth emphasizing an oft-misstated point, namely that the radiation is not exactly thermal in the technical sense of the term—meaning, possessing a Planckian spectrum. Lower bounds on deviations from thermality can be derived from greybody factors, as well as from adiabatic and phase space constraints. The appearance of (exact) thermality in certain calculations of the radiation spectrum (e.g., Hawking’s original work) stems from the fact that the Hartle-Hawking state presupposes that the black hole is in thermal equilibrium with the radiation, in which case one inevitably recovers a perfect black body spectrum. That said, the spectrum is thermal to a very good approximation, so we shall follow the conventional abuse of terminology and continue to use “thermal” in the colloquial sense, i.e., in reference to a highly mixed state with an approximately, rather than exactly, Planckian spectrum.

Even allowing for small deviations from exact thermality, it has been argued that subtle correlations in the Hawking radiation are insufficient to restore unitarity, and that these would instead have to constitute an {O(1)} correction, which would destroy the very semiclassical physics they were intended to save. But the possibility of encoding information in such a manner has not been ruled out. In fact, arguments from holography — more specifically the AdS/CFT correspondence — indicate that unitarity is indeed preserved, and consequently the belief that the information is somehow encoded in the Hawking radiation is currently the most popular position.

An alternative proposal is that the evaporation process halts with a Planck-scale remnant, which contains all the information necessary to purify the radiation. However, whether remnants actually possess such an information capacity has been called into question. Furthermore, even if the issue of unitarity could be resolved (or rather, sidestepped) in this manner, it would require an object on the order of {10^{-35}~m} to contain an (in principle) infinite number of internal states! This hardly seems a reasonable resolution, and remnants are generally disfavored for these and other reasons. That said, it is worth commenting that once the black hole approaches the Planck scale, semi-classical gravity breaks down, and a full theory of quantum gravity is needed to specify what happens in the final moments of a black hole’s life.

A somewhat more fanciful possibility is to suppose that the black hole gives rise to another universe, such that unitarity is preserved in the total system (that is, the resulting multiverse). However, information would still be lost from the perspective of outside observers. (Certain models suggest that when making measurements on an ensemble, the loss of information to the baby universes is not observable; however, this does not appear to resolve the paradox when restricted (as we are) to a single parent universe.) Additionally, there is ongoing debate as to whether evolution to a mixed state (or in this case, to a state defined on a non-Cauchy surface) violates conservation of energy. (Note that in this context we are considering the evolution of the entire system, as opposed to subsystems, from pure to mixed. The latter is a benign and fundamental feature of quantum mechanics known as decoherence). In any case, this possibility would seem qua definitione beyond observable verification. And as we shall see below, holography provides stronger arguments against black holes acting as “information sinks”, and thus we leave this option aside as well.

Black hole complementarity

All three of the proposed solutions (or rather, classes of solutions) above suffer drawbacks that, as of yet, have prevented a satisfactory resolution from emerging. However, in the early 1990s, Susskind, Thorlacius, and Uglum argued that there is in fact no contradiction, due to what they termed black hole complementarity (BHC). Building on earlier ideas by ’t Hooft, they proposed what is essentially a radical deviation from locality, whereby the same information is observed in different locations by complementary observers. The adjective here denotes the key restriction that these observers are unable to communicate; both measurements are then equally valid, since the contradiction between them could only be observed by transmitting and comparing. (This is not to say that the information is in two places simultaneously, since that would violate the no-cloning principle. Rather, “complementarity” refers to the fundamental feature of quantum mechanics whereby non-commuting observables cannot be simultaneously measured, the most famous examples of which are the canonical position and momentum operators.)

The postulates of BHC are as follows:

  1. Unitarity: Black hole formation and evaporation is described by a unitary S-matrix within the context of standard quantum field theory.
  2. EFT: Physics outside the horizon is described to a good approximation by effective field theory.
  3. Thermodynamics: To an external observer, the black hole appears to be a quantum system with discrete energy levels, and the dimension of the subspace is given by {e^S}.
  4. Equivalence principle: A freely falling observer experiences “no drama”, i.e., no substantial deviation from the predictions of general relativity, when crossing the horizon of a large black hole.

Postulates 1 and 3 follow from the usual demands of quantum mechanics and black hole thermodynamics, respectively, as described above. Postulates 2 and 4 essentially follow from the fact that the horizon of a large black hole is a region of low curvature, and (insofar as event horizons are global constructs) its presence is not revealed by any local invariant. (However, there are large nonlocal invariants, in particular a large relative boost. In standard quantum field theory, only large local invariants can lead to a breakdown. But highly boosted strings behave differently than point particles, and some recent work has investigated string scattering near the horizon as a means of probing the possible breakdown of locality in effective field theory.) Indeed, the Earth could be falling through the event horizon of a sufficiently large black hole at this very moment; according to the equivalence principle, we’d be unable to tell. In other words, while new physics, specifically a theory of quantum gravity, is obviously needed for the Planck-scale region near the singularity, one fully expects that semi-classical physics remains valid on (large) horizon scales.

The upshot of BHC is that an observer who remains outside the black hole perceives a hot membrane at the horizon which radiates information, while an infalling observer encounters nothing out of the ordinary as she falls through. The former sees unitary evolution but cannot verify the apparent loss of the equivalence principle, while the situation for the latter is precisely reversed. (Note that BHC does not contradict the relativistic law that physics is the same in all reference frames, but merely asserts that the description of events in frames “separated by a large boost parameter” may differ).

It is instructive to ask what prevents the external observer from jumping into the black hole at some later time in order to compare her observations with those of the earlier infaller. If possible, this would violate the no-cloning principle and thereby render BHC invalid. However, the external observer must wait until after the Page time before she can collect any information. If she then attempts to receive an illegal quantum copy from the earlier infaller by subsequently diving into the hole, the message must be sent with more energy than the entire black hole itself contains—otherwise, she’ll hit the singularity first. Thus it appears that a careful balance of factors conspires to keep the two frames of reference complementary in the above sense.

BHC is not as far-fetched as it initially sounds. Indeed, the idea that one should only endow observable quantities with ontic status is not only central to relativity, but a core tenet of science in general—that which cannot be measured (that is, does not interact with the physical universe) cannot be meaningfully said to exist. Nonetheless, BHC does entail a significant departure from standard quantum mechanics with regards to the interpretation of the Hilbert space on a Cauchy slice that crosses into the interior of the black hole in such a way as to intersect both “copies” of the information. In particular, the question is whether a global Hilbert space can be meaningfully said to exist on these “nice slices”.

If one posits a global Hilbert space, it must be the case that spacelike operators — specifically those in the interior and exterior — no longer commute. Otherwise, an observer whose causal past includes both regions would be able to measure them simultaneously. In this case, one preserves the usual formulation of quantum mechanics, except that locality is broken in such a manner as to make the same piece of information appear differently to different observers—specifically, observers who are complementary in the above sense. This is sometimes referred to as the weak interpretation of BHC, in contrast to the alternative below. As we shall see, this interpretation is morally in line with AdS/CFT, which also presumes quantum mechanics (i.e., the existence of a single, global Hilbert space) but is fundamentally nonlocal or holographic in nature.

Alternatively, one can deny the existence of such a global Hilbert space. In this so-called strong interpretation of BHC, the interior and exterior observers have their own separate Hilbert spaces, with some suitable matching conditions on the boundary (namely, the horizon). This preserves locality in the sense that spacelike observables commute as expected within each Hilbert space, but it is unclear whether it is possible to formulate a consistent set of matching conditions. (For example, insofar as horizons are global properties of the spacetime, the matching conditions would need to be defined nonlocally in time). Additionally, as Polchinski has noted, this interpretation still constitutes a “weakening” of local quantum field theory, since it makes the Hilbert space structure subordinate to the causal structure. (This is the inverse of the standard formulation of QFT, wherein locality or “microcausality” is seen to emerge from quantum mechanics in conjunction with special relativity and the clustering property (i.e., factorization of the S-matrix)).

Firewalls: the paradox reloaded

Until recently, BHC was generally regarded as the de facto (albeit perhaps not entirely satisfactory) solution to the information paradox. In 2013, however, Almheiri, Marolf, Polchinski, and Sully (AMPS) argued that the postulates of BHC are in fact mutually inconsistent. This rekindled the information paradox with a vengeance, and the modern, as yet unresolved version is known as the firewall paradox.

The AMPS argument can be crudely summarized as follows: smoothness of the horizon — i.e., the equivalence principle — requires that a given Hawking mode {H} and its interior partner {P} be maximally entangled, as discussed above (more generally, the exterior mode is purified by its interior partner), while purity of the final radiation — i.e., unitarity — requires that {H} be maximally entangled with the earlier radiation {R}. But this violates the monogamy of quantum entanglement, and thus it appears that at least one of the assumptions must be modified. AMPS chose the equivalence principle as the least egregious sacrifice. This would imply that an infalling observer indeed encounters the hot membrane perceived by her external collaborator—and is completely incinerated; hence the name “firewall”.
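
The monogamy constraint at the heart of the argument can be illustrated with a toy model in which {H}, {P}, and {R} are each a single qubit. This is only a sketch of the extreme cases, not a proof and certainly not a model of a black hole: if {H} and {P} form a Bell pair, the mutual information between {H} and {R} vanishes, and vice versa. The helper names are hypothetical.

```python
import numpy as np

def von_neumann_entropy(rho):
    """Entropy in bits of a density matrix."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]  # discard numerically zero eigenvalues
    return float(-np.sum(evals * np.log2(evals)))

def reduced_density_matrix(psi, keep, n_qubits=3):
    """Trace out every qubit not listed in `keep` from the pure state psi."""
    psi = psi.reshape([2] * n_qubits)
    traced = [q for q in range(n_qubits) if q not in keep]
    # Contract psi with its conjugate over the traced-out qubits.
    rho = np.tensordot(psi, psi.conj(), axes=(traced, traced))
    dim = 2 ** len(keep)
    return rho.reshape(dim, dim)

def mutual_information(psi, a, b):
    """I(a:b) = S(a) + S(b) - S(ab) in bits, for single qubits a < b."""
    s_a = von_neumann_entropy(reduced_density_matrix(psi, [a]))
    s_b = von_neumann_entropy(reduced_density_matrix(psi, [b]))
    s_ab = von_neumann_entropy(reduced_density_matrix(psi, [a, b]))
    return s_a + s_b - s_ab

H, P, R = 0, 1, 2  # qubit ordering used throughout
bell = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2.0)  # (|00> + |11>)/sqrt(2)
zero = np.array([1.0, 0.0])

# Smooth horizon: H maximally entangled with P, R in a product state.
psi_smooth = np.kron(bell, zero)
# Pure radiation: H maximally entangled with R, P in a product state.
psi_unitary = np.einsum("hr,p->hpr", bell.reshape(2, 2), zero).reshape(8)

for name, psi in (("smooth", psi_smooth), ("unitary", psi_unitary)):
    print(name, mutual_information(psi, H, P), mutual_information(psi, H, R))
```

In each case one of the two mutual informations saturates at 2 bits while the other is zero; no state of the three qubits makes both maximal at once.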

Note that the old argument that saved BHC — namely that it is impossible for an external observer to perform a measurement, and then dive in and obtain an illegal quantum copy of the result — does not save us here. The reason is that this argument is based on information recovery, which requires that we wait until after the Page time before jumping in. But as AMPS pointed out, we needn’t wait that long simply to uncover a problem. Instead, the external observer can make a measurement of a single Hawking mode at some early time, which must eventually be entangled with the late radiation if the final state is to be pure. She can then immediately jump in and capture the entanglement between this later mode and its interior partner, thus violating the no-cloning principle.

The aftermath of AMPS was considerable. As of this writing, less than 5 years after their paper’s appearance, it has received nearly 700 citations. Despite this effort however, the Firewall remains unextinguished. Operationally, there are reasons to believe that the problem is purely academic, insofar as no observer would actually perceive such a violation. For example, the first paper I wrote during my PhD casts doubt on whether all the ingredients for the paradox fit within a single observer’s causal patch. But such operational arguments feel somehow unsatisfactory, insofar as they do not shed light on how unitarity is preserved in principle—that is, how the requisite information escapes, as AdS/CFT implies it must.

One perspective is that our concept of locality will require modification, such as in the so-called non-violent nonlocality proposals of Giddings and collaborators. In fact, locality is not the only tenet that appears in need of reassessment. As we’ve mentioned before, nearly every discussion in this context involves an assumption about the entanglement structure at the horizon, namely that the Hilbert space factorizes into a tensor product structure. This fails in gauge theory, let alone gravity, and a more rigorous approach to field theory suggests that it’s not even valid for non-interacting QFTs. The deepening connections between entanglement and spacetime geometry uncovered in recent years may shed light on this issue, and it is one to which we hope to return.


Publication note: this post is essentially the first chapter of my unpublished PhD thesis, minus the bibliography (with apologies to the many deserving authors therein). A few bits, mostly around the beginning, resurfaced in my submission to the Black Hole Initiative’s 2018 essay competition, which can be found here.





The Renormalization Group

Why is the universe comprehensible? Wigner referred to this as the “unreasonable effectiveness of mathematics in the natural sciences”, and even Einstein is famous for writing, in 1936, that “[t]he eternal mystery of the world is its comprehensibility… The fact that it is comprehensible is a miracle.” At a philosophical level however, this follows inexorably from the fact that reality is subject to certain inescapable features. To wit, the universe is comprehensible because it is logical; it cannot be otherwise (double entendre very much intended!).

At the level of physics however, this is not enough, for nowhere is it a priori guaranteed that a complete UV description of the world — namely, a full theory of quantum gravity — is not necessary to understand our low-energy lives. This does not prevent a sufficiently intelligent observer from comprehending the world in principle, of course (nor restore profundity to the equivalence class of misguided philosophy represented above), but it does pose a formidable problem in practice. The simple argument from separation of scales, in other words, cannot be so blithely accepted. Enter the renormalization group.

One of the first puzzles one encounters in QFT is the problem of UV and IR divergences. The latter arises due to the infinite number of degrees of freedom integrated over all space. For free field theory in 3+1 dimensions for example, the expectation value of the Hamiltonian has a divergence of the form

\displaystyle \left<0\right|H\left|0\right>=\int\mathrm{d}^3\mathbf{k}\omega\delta^3(0)\left<0|0\right>=\int\mathrm{d}^3\mathbf{k}\omega\int\frac{\mathrm{d}^3\mathbf{x}}{(2\pi)^3}e^{i0\mathbf{x}}=\int\frac{\mathrm{d}^3\mathbf{k}}{(2\pi)^3}\omega V~, \ \ \ \ \ (1)

where {V} is the (infinite!) spatial volume and {m^2\!=\!\omega^2\!-\!\mathbf{k}^2} (and we have assumed the normalization {[a_k,a_{k'}^\dagger]=(2\pi)^32\omega\delta^3\left(\mathbf{k}-\mathbf{k}'\right)}). This is clearly divergent: the energy of the ground state oscillators {\omega} integrated over all space is infinite. But since only energy differences are measurable, we simply renormalize away this infinite factor by demanding that all operators be normal ordered. In flat space, one can think of this as simply subtracting off the infinite vacuum divergence (though in curved spacetime one cannot be so cavalier). Alternatively, one can put the system in a box of finite size {L} (i.e., {V=L^3}), and consider the limit {L\rightarrow\infty}. In this case, {L} acts as an IR cutoff on the modes—excitations whose energy is below the cutoff scale {L^{-1}} simply don’t fit.

The UV divergences, in contrast, are not so easily tamed. Unlike the IR divergence, which we traced back to the zero-point energy of the vacuum, the UV divergences pose serious problems for the finiteness of the perturbative expansion. For even in a process in which all external momenta are small, momentum conservation at each vertex still allows arbitrarily high momenta to circulate in loops. Thus, even the first-order correction to the scalar propagator in, say, {\phi^4} theory would seem to depend on the details of arbitrarily high-energy physics.
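
As a rough illustration of this sensitivity, consider the loop integral that controls the first-order propagator correction, {\int\!\mathrm{d}^4k/(2\pi)^4\,(k^2+m^2)^{-1}}, evaluated in Euclidean signature with a hard momentum cutoff. The sketch below omits the coupling and symmetry factor, and the cutoff scheme is an assumption made purely for illustration; it compares a numerical radial integral against the closed form {\left[\Lambda^2-m^2\ln(1+\Lambda^2/m^2)\right]/16\pi^2}.

```python
import numpy as np

def tadpole_numeric(cutoff, mass, n_points=200_001):
    """Trapezoid-rule estimate of int_{|k|<cutoff} d^4k/(2 pi)^4 1/(k^2 + m^2)."""
    k = np.linspace(0.0, cutoff, n_points)
    # In four Euclidean dimensions d^4k = 2 pi^2 k^3 dk after the angular integral.
    integrand = 2.0 * np.pi**2 * k**3 / (k**2 + mass**2) / (2.0 * np.pi) ** 4
    dk = k[1] - k[0]
    return float(dk * (np.sum(integrand) - 0.5 * (integrand[0] + integrand[-1])))

def tadpole_closed_form(cutoff, mass):
    """Closed form of the same integral with a hard cutoff."""
    return (cutoff**2 - mass**2 * np.log1p(cutoff**2 / mass**2)) / (16.0 * np.pi**2)

mass = 1.0
for cutoff in (10.0, 100.0, 1000.0):
    print(cutoff, tadpole_numeric(cutoff, mass), tadpole_closed_form(cutoff, mass))
```

The result grows quadratically with the cutoff, no matter how small the external momenta are.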

Consistent management of these UV divergences is accomplished in field theory via the dual framework of regularization and renormalization. The latter gives rise to the renormalization group (RG), which serves as our quantum field-theoretic understanding of why it is possible to do physics — i.e., to understand the world — at all.

Regularization is the process of removing divergences by some systematic procedure, for example the imposition of a cutoff as in the IR regularization above. Subsequently, renormalization is performed to adjust the parameters accordingly; the original, bare parameters are infinite, while the renormalized parameters correspond to what one actually measures at a particular (finite) energy scale. Of course, the final result should not depend on the details of the regularization scheme. Thus for example, if {\epsilon} is a position-space UV regulator, then a sensible regularization procedure requires that the theory has a well-defined limit as {\epsilon\rightarrow0}. In most cases, terms like {1/\epsilon} spoil this behaviour, in which case renormalization is used to relate the regularized expressions to observed values, essentially by accounting for self-interactions (crudely speaking, it “renormalizes” the couplings, in the colloquial sense of the word). The existence of such a well-defined limit, and the independence of the final result from the regulator, are highly non-trivial facts, and indeed may be thought of as a concrete manifestation of the miracle of comprehensibility perceived by Einstein above. These facts stem from universality (as in, the universality of dynamical systems), to which the RG flow lends a precise meaning, as we shall see.

The explanation for universality, as well as the most elegant formulation of the combined regularization and renormalization procedure, is the Wilsonian RG. One begins with the generating functional,

\displaystyle Z[J]=\int\mathcal{D}\phi e^{-S_E[\phi]+\int\mathrm{d}^dxJ(x)\phi(x)}~, \ \ \ \ \ (2)

where {S_E} is the Euclidean action, and imposes a momentum-space cutoff {\Lambda} such that

\displaystyle Z[J]=\int\mathcal{D}\phi_{|k|<\Lambda} e^{-S_E^{\mathrm{eff}}[\phi;\Lambda]+\int\mathrm{d}^dxJ(x)\phi(x)}~, \ \ \ \ \ (3)


where the measure is restricted to modes below the cutoff,

\displaystyle \mathcal{D}\phi_{|k|<\Lambda}=\prod_{|k|<\Lambda}\mathrm{d}\phi(k)~, \ \ \ \ \ (4)

and the Wilsonian effective action is

\displaystyle e^{-S_E^{\mathrm{eff}}[\phi;\Lambda]}=\int\mathcal{D}\phi_{|k|>\Lambda}e^{-S_E[\phi]}~. \ \ \ \ \ (5)

The path integral now includes only modes with {|k|<\Lambda}, while all the modes above this scale have been integrated out in the effective action. This is the manner in which the twin goals of regularization and renormalization are achieved. The cutoff removes the UV divergence from the path integral, but since physical quantities cannot depend on {\Lambda}, the couplings are rescaled in the effective action in such a way as to cancel out any explicit dependence. In other words, integrating out the UV modes imposes a {\Lambda}-dependence on the couplings, which is quantified by the beta function:

\displaystyle \beta(g)\equiv\frac{\partial g}{\partial\ln\Lambda}=\Lambda\frac{\partial g}{\partial\Lambda}~. \ \ \ \ \ (6)

As an aside, note that unlike in canonical approaches such as cutoff or dimensional regularization, where the cutoff is merely a computational tool, in the Wilsonian approach the cutoff corresponds to a physical scale. In condensed matter systems for example, it corresponds to the lattice spacing, which provides a natural UV regulator. In high-energy systems, it corresponds to the energy scale at which the effective action breaks down.

The dependence of the couplings on the energy scale given by the beta function is known as the running of the couplings. To understand this, consider what happens as we lower the cutoff from {\Lambda} to {b\Lambda}, with {b<1}. Clearly, Fourier modes {\phi(k)} with {b\Lambda<|k|<\Lambda} will be integrated out in the effective action {S_E^\mathrm{eff}[\phi;b\Lambda]}, while the path integral measure now runs over {|k|<b\Lambda}. Note that we also set {J(k)=0} for {|k|>b\Lambda}. At this level, one can see that the process of integrating out high-energy degrees of freedom is associative: a subsequent reduction to {b_2\Lambda} with {b_2<b<1} from our current point is equivalent to directly integrating out all modes above {b_2\Lambda}. Furthermore, while modes above the cutoff scale are no longer explicitly present in the effective action, their physics is encoded in the renormalization of the parameters. The beta function above is essentially a description of how these parameters change as we lower the cutoff scale, progressively integrating out more and more high-energy modes in the process.

Note that since we integrate out degrees of freedom in the course of flowing from the UV to the IR, Wilsonian renormalization is in fact a form of coarse graining, and entails an irreversible loss of information. In this sense, the renormalization group is really only half a group (strictly, a semigroup): once one flows to the IR, one can’t flow back. The key is that this loss of information is undetectable to any low-energy observer: the renormalization prescription is designed so that correlation functions below the cutoff are preserved. Hence the name effective field theory to describe the resulting theory, which is effectively valid up to the cutoff, but not beyond.

Actually computing the beta function requires a calculation in perturbation theory, in particular the isolation of the aforementioned loop divergences. To take the well-known example of {\phi^4} theory, the Feynman rules lead to the following one-loop correction to the 2-pt function:

\displaystyle \Gamma^{(2)}_\mathrm{1~loop}=-\frac{g}{2}\int\frac{\mathrm{d}^dp}{(2\pi)^d}\frac{1}{p^2+m^2-i\delta} \ \ \ \ \ (7)

where {\Gamma^{(n)}_\mathrm{1~loop}} is the generator of one-particle irreducible (1PI) {n}-point correlation functions (note that this is unfortunately sometimes also called the “effective action”, not to be confused with {S^\mathrm{eff}}), written here in Euclidean signature. Now, according to the RG prescription above, modes above the cutoff {\Lambda} are integrated out, so for {d=4} we would adjust the integration measure as

\displaystyle \int\mathrm{d}^4p\rightarrow2\pi^2\int_0^\Lambda p^3\mathrm{d} p~, \ \ \ \ \ (8)

where the {2\pi^2} comes from the volume of the {S^3}. However, while this particular integral can be readily evaluated within this restricted momentum range (see, for example, David Skinner’s Advanced QFT notes, chapter 6, “Perturbative Renormalization”), most expressions are much less tractable. (Consider, by way of analogy, that the simplicity of Gaussian integrals depends crucially on the infinite domain of integration). For this reason, in practice one usually resorts to other calculational methods, most commonly dimensional regularization. Let us see how this works for the case at hand.

The integral above can be performed for general {d} to yield

\displaystyle \Gamma^{(2)}_\mathrm{1~loop}=-\frac{g}{2}\frac{\Gamma\left(1-\frac{d}{2}\right)}{(4\pi)^{d/2}}\left(m^2\right)^{(d-2)/2}~, \ \ \ \ \ (9)

which is clearly divergent for {d=4}. Dimensional regularization deals with this by instead working in dimension {d=4-\epsilon}, and then considering the limit {\epsilon\rightarrow0}. By expanding in this limit, one can isolate the singular and non-singular contributions as

\displaystyle \Gamma^{(2)}_\mathrm{1~loop}=\frac{1}{2}\frac{g}{16\pi^2}m^2\left(\frac{2}{\epsilon}+1-\ln m^2\right)\left( e^{-\gamma}4\pi\right)^{\epsilon/2}~, \ \ \ \ \ (10)

where {\gamma} is the Euler-Mascheroni constant. Neglecting the finite parts, we therefore have

\displaystyle \Gamma^{(2)}_\mathrm{1~loop, div}=\frac{g}{16\pi^2}m^2\ln\Lambda~, \ \ \ \ \ (11)

where, for future purposes, we’ve replaced the simple pole in {\epsilon} with {\ln\Lambda} (this is not an obvious replacement, but follows by comparing the result here with what one would have obtained in the cutoff prescription).
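As a rough numerical sanity check on the two ingredients just used, the following Python sketch compares {\Gamma(-1+\epsilon/2)} with its pole-plus-constant approximation, and evaluates the cutoff form of the loop integral from (8) directly. The function names, the Simpson-rule quadrature, and the sample values of {m} and {\Lambda} are illustrative assumptions rather than anything taken from the references.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def gamma_near_pole(eps):
    """Exact value of Gamma(1 - d/2) at d = 4 - eps, i.e. Gamma(-1 + eps/2)."""
    return math.gamma(-1.0 + eps / 2.0)


def gamma_leading_terms(eps):
    """Pole plus constant term of Gamma(-1 + eps/2) for small eps."""
    return -(2.0 / eps + 1.0 - EULER_GAMMA)


def cutoff_integral_numeric(cutoff, m, steps=20000):
    """Simpson estimate of (2 pi^2 / (2 pi)^4) * int_0^cutoff p^3 / (p^2 + m^2) dp."""
    if steps % 2:
        steps += 1  # Simpson's rule needs an even number of intervals
    h = cutoff / steps

    def integrand(p):
        return p ** 3 / (p * p + m * m)

    total = integrand(0.0) + integrand(cutoff)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * integrand(k * h)
    return total * h / 3.0 / (8.0 * math.pi ** 2)


def cutoff_integral_exact(cutoff, m):
    """Closed form of the same integral: a quadratic piece plus a logarithm."""
    log_term = m * m * math.log(1.0 + cutoff ** 2 / (m * m))
    return (cutoff ** 2 - log_term) / (16.0 * math.pi ** 2)


if __name__ == "__main__":
    for eps in (1e-1, 1e-2, 1e-3):
        print(eps, gamma_near_pole(eps), gamma_leading_terms(eps))
    print(cutoff_integral_numeric(50.0, 1.0), cutoff_integral_exact(50.0, 1.0))
```

The closed form makes the structure of the cutoff result visible: a power-law piece proportional to {\Lambda^2}, and a logarithm multiplying {m^2}, which is the piece that plays the role of the pole in {\epsilon}.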

The reason we only identify the beta function with the divergent piece is because, as stated above, we want to identify a function that describes the running of the couplings as we dial the energy scale. Furthermore, we want a description that does not depend on the details of our regularization scheme. With this in mind, observe that there are three kinds of terms in (10). The convergent (constant) terms have no dependence on the cutoff, and therefore won’t tell us anything about the running of the couplings. Terms that diverge as {\Lambda^p} to some positive power {p}, meanwhile, essentially give the relationship between the bare couplings and the renormalized values at a particular scale, but don’t tell us anything about the flow between scales. The log divergence is therefore the only interesting term in this regard: it isn’t associated with any particular scale (said differently, it receives contributions from all scales), and is insensitive to any scheme-dependent behaviour (such as exhibited in the constant terms). Before we proceed, it is worth noting that while dimensional regularization does provide a useful tool for regulating individual loop integrals over the full range {|p|\!\in\![0,\infty)}, it does not guarantee finiteness of the path integral as in the Wilsonian approach (where the UV regime is simply absent). Additionally, while the Wilsonian cutoff {\Lambda} has a physical interpretation as the energy scale, the non-integer status of the dimensions has no such ontic value. Nevertheless, it is of great practical convenience, particularly in gauge theories.

Now, to continue our search for the beta function, let us repeat the above for the 1-loop contribution to the 4-point function,

\displaystyle \Gamma^{(4)}_\mathrm{1~loop}=\frac{g^2}{2}\int\frac{\mathrm{d}^dp}{(2\pi)^d}\frac{1}{p^2+m^2-i\delta}\frac{1}{(p-q)^2+m^2-i\delta} \ \ \ \ \ (12)

which occurs at order {g^2}. The integral in this case is more involved, but can be managed with the use of Feynman’s trick,

\displaystyle \frac{1}{AB}=\int_0^1\frac{\mathrm{d} x}{\left( xA+(1-x)B\right)^2}~, \ \ \ \ \ (13)

which is explained in, e.g., Jim Cline’s Advanced QFT notes. After a fair amount of work, dimensional regularization eventually yields

\displaystyle \Gamma^{(4)}_\mathrm{1~loop,div}=\frac{3g^2}{16\pi^2}\ln\Lambda~, \ \ \ \ \ (14)

for the singular contribution.
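Feynman’s trick (13) is easy to check numerically for positive {A} and {B}. The sketch below is only an illustration: the function name and the use of Simpson’s rule are assumptions, and it verifies the identity for sample numbers rather than performing the loop integral itself.

```python
def feynman_parameter_integral(a, b, steps=2000):
    """Simpson estimate of int_0^1 dx / (x*a + (1 - x)*b)**2 for a, b > 0."""
    if steps % 2:
        steps += 1  # Simpson's rule needs an even number of intervals
    h = 1.0 / steps

    def integrand(x):
        return 1.0 / (x * a + (1.0 - x) * b) ** 2

    total = integrand(0.0) + integrand(1.0)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * integrand(k * h)
    return total * h / 3.0


if __name__ == "__main__":
    a, b = 2.0, 5.0
    # The two numbers printed should agree to high precision.
    print(feynman_parameter_integral(a, b), 1.0 / (a * b))
```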

Having regularized the divergences in (11) and (14), we must now renormalize the Lagrangian appropriately. The simplest means of doing so is the minimal subtraction scheme, wherein the counterterms cancel only the divergent pieces, here at the one-loop level. The idea is to define a bare Lagrangian

\displaystyle \mathcal{L}_\mathrm{bare}=\mathcal{L}+\mathcal{L}_\mathrm{ct}~, \ \ \ \ \ (15)

where {\mathcal{L}} is the original Lagrangian, and {\mathcal{L}_\mathrm{ct}} is the Lagrangian consisting of counterterms to compensate for divergences. In the present example of {\phi^4} theory,

\displaystyle \mathcal{L}_\mathrm{bare}=-\frac{1}{2}\partial^\mu\phi_0\partial_\mu\phi_0-\frac{m_0^2}{2}\phi_0^2-\frac{g_0}{4!}\phi_0^4~, \ \ \ \ \ (16)

where the subscript zero denotes that the fields and couplings are bare quantities. These do not depend on {\Lambda}, but include the contributions from divergences and are therefore infinite. This is in contrast to the renormalized quantities (without subscript) implicit on the r.h.s. of (15). We shall determine the precise relationship between the bare and renormalized parameters below; it relies on the fact that finite values for the latter are obtained through the introduction of the Lagrangian of counterterms,

\displaystyle \mathcal{L}_\mathrm{ct}=-\frac{A}{2}\partial^\mu\phi\partial_\mu\phi-\frac{B}{2}\phi^2-\frac{C}{4!}\phi^4~, \ \ \ \ \ (17)

where the coefficients {A}, {B}, and {C}, are fixed by our considerations above by observing that, at tree level, {\mathcal{L}_\mathrm{ct}} yields additional contributions to the vertex functions of the form

\displaystyle \Gamma^{(2)}_\mathrm{tree,ct}=-Ap^2-B~,\;\;\; \Gamma^{(4)}_\mathrm{tree,ct}=-C~. \ \ \ \ \ (18)

Therefore, if we wish to cancel the divergences (11) and (14), we must define

\displaystyle A\equiv0~,\;\;\;B\equiv\frac{gm^2}{16\pi^2}\ln\Lambda~,\;\;\;C\equiv\frac{3g^2}{16\pi^2}\ln\Lambda~, \ \ \ \ \ (19)

so that, to one-loop level, the total contribution {\Gamma^{(n)}_\mathrm{tree,ct}+\Gamma^{(n)}_\mathrm{1~loop,div}} no longer has a divergence as {\Lambda\rightarrow\infty}. (In the language of dimensional regularization, we’ve removed the simple pole at {\epsilon=0}).

Of course, had we gone beyond one-loop order to consider higher loops, additional counterterms would be required (for example, {A} becomes non-trivial at 2-loops). But it turns out that the same basic idea can be performed systematically to all orders in perturbation theory. This is the subject of multiplicative renormalization, which consists in showing that all infinities can be reabsorbed in a finite number of coupling constants (including masses). Finite results are then obtained in the infinite cutoff limit, {\Lambda\rightarrow\infty} (equivalently {\epsilon\rightarrow0} in the dimensional approach), which corresponds to including the full UV regime of the original theory. Theories in which this program can be carried out successfully are called renormalizable. (In fact, renormalizability implies that all counterterms are of the same form as those in the original Lagrangian, which we already assumed in (17)). In contrast, theories requiring an infinite number of counterterms are non-renormalizable, gravity being the most notorious example.

To relate the bare quantities ({\phi_0}, {m_0}, {g_0}) to the renormalized quantities ({\phi}, {m}, {g}) requires one more piece of information, namely the behaviour of the kinetic term as we change the energy scale. Recalling the Wilsonian approach above, this term is no different than the others in that it receives quantum corrections as we integrate out UV modes. We thus define the field renormalization factor {Z_\phi} (not to be confused with the partition function {Z}), which depends on {\Lambda} such that {\phi_0=Z_\phi^{1/2}\phi}. (In fact, in the Wilsonian approach, this is sometimes labeled {Z_\Lambda}, to reflect the fact that at a new scale {\Lambda'}, renormalizing the field requires a different factor {Z_{\Lambda'}\neq Z_\Lambda}. But here we only care about the final result; again, the RG is associative). Now, comparing {\mathcal{L}_\mathrm{bare}} and {\mathcal{L}_\mathrm{ct}}, we have

\displaystyle Z_\phi=1+A~,\;\;\;m_0^2=\frac{m^2+B}{Z_\phi}=\frac{m^2+\delta m^2}{Z_\phi}~,\;\;\;g_0=\frac{g+C}{Z_\phi^2}=\frac{g+\delta g}{Z_\phi^2}~. \ \ \ \ \ (20)

At this level, these expressions already describe how the couplings vary, but this is made more concrete in the renormalization group equation (a.k.a. the Callan-Symanzik equation), which we now derive.

First, observe that the field renormalization factor implies that the bare and renormalized {n}-point correlators are related by an overall scaling,

\displaystyle \Gamma^{(n)}_0\left( p_1,\ldots,p_n\right)=Z_\phi^{-n/2}\Gamma^{(n)}\left( p_1,\ldots,p_n\right)~, \ \ \ \ \ (21)

which simply follows from the fact that there are {n} fields in the correlation function, i.e.,

\displaystyle \left<\phi(x_1)\ldots\phi(x_n)\right>\sim\int\mathcal{D}\phi e^{-S_E}\phi(x_1)\ldots\phi(x_n)~. \ \ \ \ \ (22)

Since {\Gamma^{(n)}_0} is independent of {\Lambda}, it should remain unchanged under RG, hence

\displaystyle 0=\Lambda\frac{\mathrm{d}\Gamma^{(n)}_0}{\mathrm{d}\Lambda}=\Lambda\frac{\mathrm{d}}{\mathrm{d}\Lambda}\left( Z_\phi^{-n/2}\Gamma^{(n)}\right) =\Lambda\left(\frac{\mathrm{d} Z_\phi^{-n/2}}{\mathrm{d}\Lambda}\Gamma^{(n)}+Z_\phi^{-n/2}\frac{\mathrm{d}\Gamma^{(n)}}{\mathrm{d}\Lambda}\right)~. \ \ \ \ \ (23)

And therefore, by the chain rule,

\displaystyle 0=\Lambda Z_\phi^{-n/2}\left(\frac{-n}{2Z_\phi}\frac{\partial Z_\phi}{\partial\Lambda}+\frac{\partial}{\partial\Lambda}+\frac{\partial g}{\partial\Lambda}\frac{\partial}{\partial g}+\frac{\partial m}{\partial\Lambda}\frac{\partial}{\partial m}\right)\Gamma^{(n)}~. \ \ \ \ \ (24)

Multiplying through by {Z_\phi^{n/2}}, we obtain the aforementioned RG equation,

\displaystyle 0=\left(\Lambda\frac{\partial}{\partial\Lambda}+\beta(g)\frac{\partial}{\partial g}+m\gamma_m\frac{\partial}{\partial m}-n\gamma\right)\Gamma^{(n)}~, \ \ \ \ \ (25)

where we have defined

\displaystyle \beta(g)\equiv\Lambda\frac{\partial g}{\partial\Lambda}~,\;\;\; \gamma_m\equiv\frac{\Lambda}{m}\frac{\partial m}{\partial\Lambda}~,\;\;\; \gamma\equiv\frac{\Lambda}{2Z_\phi}\frac{\partial Z_\phi}{\partial\Lambda}~. \ \ \ \ \ (26)

The first of these is the promised beta function, while {\gamma_m} and {\gamma} are the anomalous dimensions of the mass and field, respectively. The name stems from the fact that the renormalized correlation function behaves as if the field scaled with mass dimension {(d-2)/2+\gamma} rather than {(d-2)/2}; similarly for {\gamma_m}. Both of these can be viewed as beta functions for the mass and kinetic terms. The former, after all, is fundamentally no different. Indeed, it is a basic exercise to show that if one treats the mass as an interaction term and sums the resulting Feynman diagrams that contribute to the 2-point function to all orders in {m}, one recovers the usual (massive) propagator.

As stated above, the beta function describes the running of the coupling as we flow from the UV to the IR. Operators that are suppressed as we flow into the IR are called irrelevant. Conversely, operators that become increasingly important are called relevant. At the border of these two regimes are operators which are unaffected by the energy scale, which are called marginal. (The terminology can be remembered by asking which operators are relevant, in the colloquial sense, for everyday life (i.e., low-energy physics)). Modulo one significant caveat that we’ll mention shortly, this behaviour can be read off directly from the Lagrangian. Since the action must be dimensionless, and the measure {\mathrm{d}^dx} has mass dimension {-d} (and {[\partial_\mu]=1}), the parameters in the {\phi^4} Lagrangian must have

\displaystyle [\phi]=\frac{d-2}{2}~,\;\;\;[m]=1~,\;\;\;[g]=4-d~. \ \ \ \ \ (27)

However, the perturbative expansion relies on setting, e.g., {g\ll1}, which makes no sense if {g} is dimensionful. More generally, consider a coupling {g_n} multiplying an operator of mass dimension {n} (e.g., {\phi^n} in {d=4}), so that {[g_n]=d-n}. Then {n<d} implies that the correct dimensionless parameter is {g_n/\Lambda^{d-n}}, and thus {g_n} controls an interaction that becomes increasingly important at low energies, {\Lambda^{d-n}\ll g_n}. In this case the operator is relevant. Conversely, {n>d} implies that we perform an expansion in {g_n\Lambda^{n-d}}, in which case the interaction becomes less and less important as {\Lambda} becomes small; hence, irrelevant. The marginal case is {n\!=\!d}, in which case we really can expand in {g_n}, since it’s already dimensionless.
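The classical power counting above can be sketched in a few lines of Python. This is a minimal illustration for monomials {\phi^k} only, using the engineering dimension {[\phi]=(d-2)/2}; the function names are hypothetical, and quantum corrections (anomalous dimensions) are deliberately ignored.

```python
from fractions import Fraction


def coupling_dimension(power, d):
    """Classical mass dimension of the coupling multiplying phi**power in d dimensions."""
    field_dimension = Fraction(d - 2, 2)          # [phi] = (d - 2)/2
    operator_dimension = power * field_dimension  # dimension of phi**power
    return Fraction(d) - operator_dimension       # the action must be dimensionless


def classify(power, d):
    """Label phi**power as relevant, marginal or irrelevant by power counting alone."""
    dimension = coupling_dimension(power, d)
    if dimension > 0:
        return "relevant"
    if dimension < 0:
        return "irrelevant"
    return "marginal"


if __name__ == "__main__":
    for d in (3, 4, 6):
        for power in (2, 4, 6):
            print(d, power, coupling_dimension(power, d), classify(power, d))
```

Note that for the mass term the coupling in question is {m^2}, so the function returns 2 for `power=2` in any {d}, in agreement with {[m]=1}.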

The aforementioned caveat is that quantum corrections can modify the RG behaviour of the coupling. In the example above, the classical mass dimension of the field in {\phi^4} theory is modified by the anomalous dimension {\gamma}. In particular, one must watch out for marginal operators that become either marginally relevant or marginally irrelevant under RG. Such operators actually play an important role in phenomenology. More generally, the fact that operators can mix under RG flow is important in, for example, the possible emergence of gauge fields.

The space of theories is infinite-dimensional, with coordinates given by the set of all possible couplings in the effective action, which implies the existence of an infinite number of irrelevant operators. In contrast, since each additional field or derivative increases the dimension of an operator, there are only finitely many relevant operators (and typically very few). We define the critical surface to be the infinite-dimensional space in the UV where all relevant operators vanish (which has finite codimension, for the reason just stated). As we flow to the IR (which we might accomplish by perturbing away from the critical surface by the introduction of some relevant operator(s)), we follow a trajectory through the space of theories until we reach a critical or fixed point, where all beta functions vanish. If the beta function has a zero at {g_*}, and is positive for {g<g_*}, then {g\rightarrow g_*} as {\Lambda\rightarrow\infty}. In this case, {g_*} is a UV fixed point. Alternatively, if {\beta(g)<0} for {g<g_*}, then {g\rightarrow0} as {\Lambda\rightarrow\infty}; since we dialed the energy in the same direction, it’s still a UV fixed point, but the vanishing of the coupling implies that the theory becomes asymptotically free, a feature that characterizes certain non-abelian gauge theories, notably QCD. An IR fixed point, in obvious contrast, is obtained from a similar analysis with {\Lambda\rightarrow0} (though asymptotic freedom uniquely refers to the UV case).

Another case that deserves mention is the possibility that the solution of the beta function equation diverges at some finite {\Lambda} (a Landau pole). This is an obvious pathology, since it implies that the coupling constant (i.e., the interaction strength) becomes infinite at a finite energy scale. But in fact, this is precisely what happens in our beloved {\phi^4} theory, and is a common feature of theories which are not asymptotically free, such as QED. One possible solution to this is that the fully renormalized coupling actually goes to zero as we take the cutoff scale to infinity. The proposed mechanism by which this quantum triviality comes about is via vacuum fluctuations (essentially, corrections from the self-energy of the field), which completely screen the interaction in the absence of a cutoff. (This is sometimes referred to as charge screening, in analogy with electrodynamics). The alternative is to suppose that the perturbative expansion simply breaks down at strong coupling, since the pathology appears at one- or two-loop level, in which case non-perturbative methods must be used to address the issue, such as in lattice gauge theory. We don’t normally concern ourselves with triviality in {\phi^4} theory, because the energy scale at which it occurs is inaccessibly high. However, field theories involving only a scalar Higgs boson in four dimensions also suffer from quantum triviality, but at a scale that may be accessible to the LHC; the possible inconsistency of such theories is an open area of research.
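To see the pathology concretely, one can integrate the flow numerically. The sketch below assumes the standard one-loop result {\beta(g)=3g^2/(16\pi^2)} for {\phi^4} theory in {d=4}, which has the same coefficient as (14) but was not derived explicitly above; the function names, the RK4 integrator, and the initial value of {g} are illustrative choices.

```python
import math

# Assumed one-loop coefficient: beta(g) = 3 g^2 / (16 pi^2).
ONE_LOOP_COEFF = 3.0 / (16.0 * math.pi ** 2)


def beta(g):
    return ONE_LOOP_COEFF * g * g


def run_coupling_rk4(g0, t_final, steps=10000):
    """Integrate dg/dt = beta(g) from t = 0 to t_final, where t = ln(Lambda/Lambda_0)."""
    h = t_final / steps
    g = g0
    for _ in range(steps):
        k1 = beta(g)
        k2 = beta(g + 0.5 * h * k1)
        k3 = beta(g + 0.5 * h * k2)
        k4 = beta(g + h * k3)
        g += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return g


def run_coupling_exact(g0, t):
    """Closed-form solution of the same flow, valid for t below the pole."""
    return g0 / (1.0 - ONE_LOOP_COEFF * g0 * t)


def landau_pole(g0):
    """Value of t = ln(Lambda/Lambda_0) at which the one-loop coupling diverges."""
    return 1.0 / (ONE_LOOP_COEFF * g0)


if __name__ == "__main__":
    g0 = 1.0
    for t in (-10.0, 0.0, 10.0, 40.0, 50.0):
        print(t, run_coupling_rk4(g0, t), run_coupling_exact(g0, t))
    print("pole at t =", landau_pole(g0))
```

Flowing towards the IR (negative {t}) the coupling shrinks, while towards the UV it grows without bound at a finite scale. Of course, the one-loop approximation cannot be trusted once {g} is large, which is the second possibility mentioned above.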

There is of course a great deal more to be said about the RG; see for example Skinner’s explanation for how it vastly simplifies the Feynman diagram expansion, or Polchinski’s interpretation of RG as a form of heat flow. There’s also the interesting fact that the RG flow entails a shift in the vacuum energy (Skinner, page 39), which suggests both complications for renormalization in curved spacetime as well as a tantalizing hint towards the emergence of the radial direction in AdS/CFT—namely, holography as RG flow, a fascinating research direction in its own right.


Measurement and evolution

In an earlier post, we sketched the basic mathematical description of quantum mechanics, culminating in the general description of quantum states as (reduced) density matrices. We also claimed that generic measurements are not orthogonal projections, and evolution is not unitary. We shall here expand upon the aforementioned infrastructure to explain these statements, resolving some unanswered questions in the process. We shall again draw from Preskill’s Quantum Information and Computation course notes, as well as a lecture given by Mario Flory on POVMs and superoperators.

The naïve picture is that, as a consequence of Schmidt decomposition, one can write the density matrix for a mixed state as an ensemble of orthogonal pure states, the eigenvalues of which are interpreted as the probability of their occurring. When we measure the system, we project onto one of these eigenstates, hence the notion of measurements as orthogonal projections. And indeed this works fine for isolated systems; but as explained previously, this is an idealization. The problem that demands a more generalized notion of measurement is that an orthogonal measurement in a tensor product {\mathcal{H}_A\otimes\mathcal{H}_B} is not necessarily orthogonal if we restrict to subsystem {A} alone.

Let us first make the notion of orthogonal projections a bit more precise, following von Neumann’s treatment thereof. To perform a measurement of an observable {M}, we couple the system to some classical pointer variable that we can actually observe, in the literal sense of the word. In particular, we assume that the pointer is sufficiently heavy that the spreading of its wavepacket can be neglected during the measurement process (it is classical, after all). The Hamiltonian describing the interaction of the pointer with the system is then approximated by {H=\lambda MP}, where {\lambda} is the coupling between the pointer’s momentum {P} and the observable under study. The time evolution operator is therefore

\displaystyle U(t)=\mathrm{exp}\left(-i\lambda tMP\right)=\sum_i\left|i\right>\mathrm{exp}\left(-i\lambda t M_iP\right)\left<i\right|~, \ \ \ \ \ (1)

where in the second equality we’ve expanded {M} in the diagonal basis, {M=\sum_i\left|i\right>M_i\left<i\right|}. (Note that we are implicitly assuming that either {\left[M,H_0\right]=0}, where {H_0} is the original, unperturbed Hamiltonian, or that the measurement occurs so quickly that free evolution of the system can be neglected throughout. We’re also suppressing hats/bold-print on the operators, since this is clear from context).

Since {P=-i\partial_x} is the generator of translations for the pointer, it shifts the position-space wavepacket thereof by some amount {x_0}: {e^{-ix_0P}\psi(x)=\psi(x-x_0)}. Thus, if the system is initially in a superposition of {M} eigenstates unentangled with the state of the pointer {\left|\psi(x)\right>}, then after time {t} it will evolve to

\displaystyle U(t)\left(\sum_i\alpha_i\left|i\right>\otimes\left|\psi(x)\right>\right) =\sum_i\alpha_i\left|i\right>\otimes\left|\psi\left( x-\lambda tM_i\right)\right>~. \ \ \ \ \ (2)

Now the position of the pointer is correlated with the value of the observable {M}. Thus, provided the pointer’s wavepacket is sufficiently narrow such that we can resolve all values of {M_i} (namely, {\Delta x\lesssim\lambda t\Delta M_i}, which can be guaranteed by making the pointer sufficiently massive since {\Delta x\gtrsim1/\Delta p=(mv)^{-1}}), observing that the position of the pointer has shifted by {\lambda tM_i} is tantamount to measuring the eigenstate {\left|i\right>}, which occurs with probability {\left|\alpha_i\right|^2}. In this manner, the initial state of the quantum system, call it {\left|\phi\right>}, is projected to {\left|i\right>} with probability {\left|\left<i|\phi\right>\right|^2}. This is von Neumann’s model of orthogonal measurement, which involves so-called projection-valued measures, or PVMs.

Of course, in principle the measurement process could project out some superposition of eigenstates, rather than a single eigenstate of {M} as in the above example. Indeed, if we can couple any observable to a pointer, then we can perform any orthogonal projection in Hilbert space. Thus to formulate the above more generally, consider a set of mutually orthogonal projection operators {P_a} such that {\sum_aP_a=1}. Carrying out the measurement procedure above takes the initial (pure) state {\left|\phi\right>\left<\phi\right|} to

\displaystyle \frac{P_a\left|\phi\right>\left<\phi\right|P_a}{\left<\phi|P_a|\phi\right>} \ \ \ \ \ (3)

with probability

\displaystyle \mathrm{Prob}(a)=\left<\phi|P_a|\phi\right>~, \ \ \ \ \ (4)

as usual.
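For concreteness, here is a minimal Python sketch of (3) and (4) for a single qubit, written for state vectors rather than density matrices. The helper names and the sample state are hypothetical; the point is only that a second application of the same projector reproduces the first outcome with certainty.

```python
import math


def apply(op, vec):
    """Matrix-vector product for nested-list matrices."""
    return [sum(op[r][c] * vec[c] for c in range(len(vec))) for r in range(len(op))]


def inner(u, v):
    """Hilbert-space inner product <u|v> (conjugate-linear in u)."""
    return sum(complex(x).conjugate() * y for x, y in zip(u, v))


def measure_outcome(projector, state):
    """Return (Prob(a), normalized post-measurement state) for the projector P_a."""
    projected = apply(projector, state)
    prob = inner(state, projected).real  # <phi|P_a|phi>
    if prob == 0.0:
        return 0.0, None
    norm = math.sqrt(prob)
    return prob, [x / norm for x in projected]


# Projectors onto the computational basis of a qubit; they sum to the identity.
P0 = [[1, 0], [0, 0]]
P1 = [[0, 0], [0, 1]]

# Hypothetical normalized state |phi> = 0.6|0> + 0.8i|1>.
phi = [0.6, 0.8j]

prob0, post0 = measure_outcome(P0, phi)
prob1, post1 = measure_outcome(P1, phi)

# Repeating the measurement on the post-measurement state gives the same outcome.
prob1_again, _ = measure_outcome(P1, post1)

if __name__ == "__main__":
    print(prob0, prob1, prob1_again)
```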

Thus far we have been referring to measurements on a single isolated Hilbert space, for which PVMs suffice. But in practice we only ever deal with subsystems, for which our concept of measurement must be suitably extended. As we shall see, the relevant entities for the job are positive operator valued measures, or POVMs. The key difference between a POVM and a PVM is that the latter are a subset of the former for which the eigenstates are orthogonal by construction.

Mathematically, a POVM is a measure (basically, a partition of unity) whose values are non-negative self-adjoint operators on Hilbert space. That is, denoting the set of operators that comprise the POVM by {\{F_a\}}, it has the properties {F_a=F_a^\dagger}, {\left<\psi|F_a|\psi\right>\geq0}, and {\sum_aF_a=1}, where {\left|\psi\right>\in\mathcal{H}}. The idea is that a POVM element {F_a} is assigned to every possible measurement result such that {\left<\psi|F_a|\psi\right>=\mathrm{Prob}(a)} (hence the requirement that these sum to 1).

Given the positivity of the operators {F_a}, there exists a (not necessarily unique) set of so-called measurement operators {\{M_a\}} such that {F_a=M_a^\dagger M_a}. Introducing these operators allows one to express the state immediately after measurement in the usual manner:

\displaystyle \left|\psi_a\right>=\frac{M_a\left|\psi\right>}{\left<\psi\right|M_a^\dagger M_a\left|\psi\right>^{1/2}}~. \ \ \ \ \ (5)

Note that this expression has precisely the same form as that given for PVMs above, to which it reduces when {M_a=P_a}. The difference here is that in the case of a POVM, repeated measurement will not necessarily yield the same result. This is because unlike the {P_a}, which are idempotent orthogonal projection operators, the {F_a} are not projectors in general, and hence the state after measurement need not lie in a single orthogonal eigenstate. The PVM {\{P_a\}}, which is used in decomposing an observable {A=\sum_aa_aP_a}, corresponds to the special case of a POVM with {F_a=P_a\left(=M_a\right)}.
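A standard illustration of a POVM that is not a PVM is the so-called trine measurement on a qubit, with three elements {F_a=\tfrac{2}{3}\left|\psi_a\right>\left<\psi_a\right|}. The sketch below (helper names are hypothetical, and the states are chosen real to keep the arithmetic simple) checks completeness and shows that the elements are neither idempotent nor mutually orthogonal.

```python
import math


def outer(u, v):
    """|u><v| for real vectors."""
    return [[x * y for y in v] for x in u]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def scale(c, m):
    return [[c * x for x in row] for row in m]


def expectation(op, vec):
    """<vec|op|vec> for a real vector."""
    size = len(vec)
    return sum(vec[i] * op[i][j] * vec[j] for i in range(size) for j in range(size))


# Three real qubit states whose Bloch vectors are 120 degrees apart.
trine_states = [[math.cos(a * math.pi / 3), math.sin(a * math.pi / 3)] for a in range(3)]

# POVM elements F_a = (2/3)|psi_a><psi_a|: Hermitian and non-negative by construction.
povm = [scale(2.0 / 3.0, outer(psi, psi)) for psi in trine_states]

# Completeness: the three elements sum to the 2x2 identity.
total = [[sum(f[i][j] for f in povm) for j in range(2)] for i in range(2)]

# Outcome probabilities <psi|F_a|psi> for the state |0>.
probs = [expectation(f, [1.0, 0.0]) for f in povm]

# Unlike projectors, F_a F_a != F_a and F_a F_b != 0 for a != b.
f0_squared = matmul(povm[0], povm[0])
f0_f1 = matmul(povm[0], povm[1])

if __name__ == "__main__":
    print(total, probs, f0_squared, f0_f1)
```

Note that there are three outcomes on a two-dimensional Hilbert space, which is impossible for mutually orthogonal projectors.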

To elaborate on this slightly further, let us take the familiar example of a tensor product space {\mathcal{H}=\mathcal{H}_A\otimes\mathcal{H}_B}, containing an initial state {\rho_{AB}=\rho_A\otimes\rho_B} and a PVM given by {\{P_a\}}. We now wish to restrict our attention to {\mathcal{H}_A}, so we define a new set of operators {\{F_a\}} acting thereupon that faithfully reproduces the outcome labeled by index {a} of a measurement on {\mathcal{H}}, namely:

\displaystyle \mathrm{Prob}(a)=\mathrm{tr}\left( P_a\rho_{AB}\right)=\mathrm{tr}_A\left(\mathrm{tr}_B\left( P_a\rho_{AB}\right)\right)\equiv\mathrm{tr}_A\left({F_a\rho_A}\right)~. \ \ \ \ \ (6)

We may obtain an explicit expression for {F_a} by writing this expression in component form. Recall that a reduced density matrix can be written in terms of basis vectors as

\displaystyle \rho_A=\mathrm{tr}_B\left(\left|\psi\right>\left<\psi\right|\right)=\sum_{ijm}a_{mj}^*a_{ij}\left|i\right>_{A~A}\left<m\right|~. \ \ \ \ \ (7)

Since {j} is a dummy index, this requires two indices when written in matrix notation, {\left(\rho_A\right)_{im}}. This implies that four indices will label the tensor product {\rho_{AB}=\rho_A\otimes\rho_B}. The quantity {F_a\rho_A} therefore carries two free indices (since {F_a} is a map from {\mathcal{H}_A\rightarrow\mathcal{H}_A}), and similarly {P_a\rho_{AB}} carries four, all of which will be summed over when taking the appropriate traces. Hence the above expression, in component form, is

\displaystyle \begin{aligned} \sum_{ijmn}\left( P_a\right)_{nj,mi}\left(\rho_A\right)_{ij}&\left(\rho_B\right)_{mn}=\sum_{ij}\left( F_a\right)_{ji}\left(\rho_A\right)_{ij}\\ \implies\left( F_a\right)_{ji}=&\sum_{mn}\left( P_a\right)_{nj,mi}\left(\rho_B\right)_{mn}~, \end{aligned} \ \ \ \ \ (8)

where {\{\left|i\right>\}}, {\{\left|j\right>\}} and {\{\left|m\right>\}}, {\{\left|n\right>\}} are orthonormal bases for {\mathcal{H}_A} and {\mathcal{H}_B}, respectively. With this expression for {F_a} in hand, one can show (see, e.g., Preskill p87) that the {F_a} do indeed satisfy the properties claimed for them above, namely Hermiticity, positivity (non-negativity), and completeness {\left(\sum_aF_a=I_A\right)}. As we have emphasized however, they are not necessarily orthogonal, which is again the crucial difference between POVMs and PVMs. Indeed, the number of {F_a} is limited by the dimension of the total Hilbert space {\mathcal{H}}, which may be arbitrarily greater than that of {\mathcal{H}_A}.
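The construction can be made explicit for two qubits. The following sketch takes the PVM on {\mathcal{H}_A\otimes\mathcal{H}_B} to be the projectors onto the four Bell states and sets {\rho_B=\left|0\right>\left<0\right|}; these choices, the helper names, the sample {\rho_A}, and the index convention (the pair {(i,m)} of {A} and {B} indices maps to row {2i+m}) are all assumptions made for illustration.

```python
import math


def outer(u, v):
    """|u><v| for real vectors."""
    return [[x * y for y in v] for x in u]


def kron(a, b):
    """Tensor product of 2x2 matrices, with index convention (i, m) -> 2*i + m."""
    return [[a[i][j] * b[m][n] for j in range(2) for n in range(2)]
            for i in range(2) for m in range(2)]


def trace_product(a, b):
    """tr(a b) for square matrices of equal size."""
    size = len(a)
    return sum(a[r][c] * b[c][r] for r in range(size) for c in range(size))


def reduce_to_a(p_ab, rho_b):
    """(F_a)_{ij} = sum_{m,n} (P_a)_{(i,m),(j,n)} (rho_B)_{nm}, an operator on H_A."""
    return [[sum(p_ab[2 * i + m][2 * j + n] * rho_b[n][m]
                 for m in range(2) for n in range(2))
             for j in range(2)] for i in range(2)]


s = 1.0 / math.sqrt(2.0)
# Bell states in the basis |00>, |01>, |10>, |11>.
bell_basis = [[s, 0, 0, s], [s, 0, 0, -s], [0, s, s, 0], [0, s, -s, 0]]
pvm = [outer(v, v) for v in bell_basis]

rho_b = [[1.0, 0.0], [0.0, 0.0]]  # subsystem B in |0><0|
rho_a = [[0.7, 0.0], [0.0, 0.3]]  # hypothetical state of subsystem A

povm = [reduce_to_a(p, rho_b) for p in pvm]

# The same probabilities computed on the full space and on A alone.
probs_ab = [trace_product(p, kron(rho_a, rho_b)) for p in pvm]
probs_a = [trace_product(f, rho_a) for f in povm]

# Completeness on H_A.
total = [[sum(f[i][j] for f in povm) for j in range(2)] for i in range(2)]

if __name__ == "__main__":
    print(povm)
    print(probs_ab, probs_a, total)
```

In this example one obtains four operators on a two-dimensional space, each equal to half a projector, so they are complete but not idempotent.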

As one might have expected given that POVMs act on subspaces, a POVM can be lifted to a PVM by expanding the Hilbert space of the former and performing the latter in the resulting superspace. This is the content of Neimark’s (sometimes transliterated from the Cyrillic “Наймарк” as “Neumark”) theorem. Note that the converse also holds: any PVM on a Hilbert space reduces to a POVM on any subspace thereof. This means that one can realize a POVM as a PVM on an enlarged Hilbert space, which allows one to obtain the correct measurement probabilities (by which we mean, the relative weights in the ensemble; see below) by performing orthogonal projections. Conversely, an orthogonal measurement of a bipartite system {\mathcal{H}_A\otimes\mathcal{H}_B} may be a nonorthogonal POVM on {A} alone.

In addition to the crucial role they play in measurement, POVMs are useful for formulating a suitable generalization of evolution that applies to subsystems. By way of example, suppose the initial state in {\mathcal{H}=\mathcal{H}_A\otimes\mathcal{H}_B} is given by {\rho_{AB}=\rho_A\otimes\left|0\right>_{BB}\left<0\right|}. Since evolution of the total bipartite system is unitary, it is described by the action of a unitary operator {U_{AB}},

\displaystyle U_{AB}\left(\rho_A\otimes\left|0\right>_{BB}\left<0\right|\right) U_{AB}^\dagger~, \ \ \ \ \ (9)

whereupon the density matrix of subsystem {A} is

\displaystyle \rho'_A=\mathrm{tr}_B\left( U_{AB}\left(\rho_A\otimes\left|0\right>_{BB}\left<0\right|\right) U_{AB}^\dagger\right) =\sum_n{}_B\left<n\right|U_{AB}\left|0\right>_B\rho_A{}_B\left<0\right| U_{AB}^\dagger\left|n\right>_B~, \ \ \ \ \ (10)

where {\{\left|n\right>\}} is an orthonormal basis for {\mathcal{H}_{B}}, and {{}_B\left<n\right|U_{AB}\left|0\right>_B\equiv M_n} is an operator acting on {\mathcal{H}_{A}}. Note that it follows from the unitarity of {U_{AB}} that

\displaystyle \sum_nM_n^\dagger M_n =\sum_n{}_B\left<0\right|U_{AB}^\dagger\left|n\right>_{BB}\left<n\right| U_{AB}\left|0\right>_B ={}_B\left<0\right|U_{AB}^\dagger U_{AB}\left|0\right>_B =I_A~. \ \ \ \ \ (11)

We may thus express {\rho'_A} succinctly as

\displaystyle \rho'_A=\sum_nM_n\rho_A M_n^\dagger\equiv\$\left(\rho_A\right)~, \ \ \ \ \ (12)

where {\$} is a linear map that takes density matrices to density matrices (linear operators to linear operators). Such a map, when the above property of {M_n} is satisfied, is called a superoperator, which we’ve written here in the so-called operator sum or Kraus representation. The operator sum representation of a given superoperator {\$} is not unique, since performing the trace over {\mathcal{H}_B} in a different basis would lead to different measurement operators {N_i}. However, any two operator sum representations of the same superoperator are related by a unitary change of basis, e.g., {N_i=U_{in}M_n} (in other words, the {M_n} may be thought of as a particular choice of the {E_a} considered above).
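The construction in (10) through (12) is easy to check numerically. The following is a rough sketch under stated assumptions: the dimensions, the random unitary standing in for {U_{AB}}, and the input state are arbitrary choices of mine, not anything from Preskill. It extracts {M_n={}_B\left<n\right|U_{AB}\left|0\right>_B} by slicing, verifies (11), compares (12) against the explicit partial trace in (10), and confirms that a unitary mixing {N_i=U_{in}M_n} of the Kraus operators yields the same map.

```python
import numpy as np

rng = np.random.default_rng(1)
dA, dB = 2, 3  # illustrative dimensions (an assumption)


def random_unitary(d):
    """Unitary from the QR decomposition of a random complex matrix."""
    Z = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    Q, _ = np.linalg.qr(Z)
    return Q


U = random_unitary(dA * dB)  # stands in for U_AB

# View U with indices (a_out, b_out, a_in, b_in); M_n = <n|_B U |0>_B.
U4 = U.reshape(dA, dB, dA, dB)
M = [U4[:, n, :, 0] for n in range(dB)]

# Eq. (11): sum_n M_n^dagger M_n = I_A.
completeness = sum(M_n.conj().T @ M_n for M_n in M)

rho_A = np.array([[0.6, 0.2j], [-0.2j, 0.4]])  # assumed input density matrix
zero_B = np.zeros((dB, dB))
zero_B[0, 0] = 1.0                             # |0><0| on B

# Eq. (12): operator sum representation.
rho_kraus = sum(M_n @ rho_A @ M_n.conj().T for M_n in M)

# Eq. (10): evolve the full system, then trace out B.
full = U @ np.kron(rho_A, zero_B) @ U.conj().T
rho_trace = np.einsum("imjm->ij", full.reshape(dA, dB, dA, dB))

# A unitary change of basis on the Kraus index gives the same superoperator.
V = random_unitary(dB)
N = [sum(V[i, n] * M[n] for n in range(dB)) for i in range(dB)]
rho_mixed = sum(N_i @ rho_A @ N_i.conj().T for N_i in N)
```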

The mapping {\$:\rho\rightarrow\rho'} inherits the usual properties from {\rho}: it is Hermitian, positive, and trace-preserving ({\mathrm{tr}\rho'=1} if {\mathrm{tr}\rho=1}). But these are not quite sufficient to ensure that our bipartite system evolves unitarily. The basic reason is that we are limiting our attention to subsystem {A}, and have no guarantee that there does not exist an uncoupled system {B} that evolves in such a manner as to screw things up. To amend this, we demand that {\$_A} instead satisfy complete positivity: given any extension of {\mathcal{H}_A} to {\mathcal{H}_A\otimes\mathcal{H}_B}, {\$_A} is completely positive in {\mathcal{H}_A} if {\$_A\otimes I_B} is positive for all such extensions. For an example of the necessity of this requirement, see Preskill p97-98 for an exposition of the transposition operator, {T:\rho\rightarrow\rho^T}, which is a positive operator that is not completely positive.
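The transposition example can be made concrete in a few lines. This is only a sketch with my own choice of states: a single-qubit density matrix to show that {T} is positive, and the maximally entangled state {\left(\left|00\right>+\left|11\right>\right)/\sqrt{2}} to show that {T_A\otimes I_B} is not.

```python
import numpy as np

# Transposition on a single qubit preserves the spectrum, hence positivity.
rho = np.array([[0.6, 0.2j], [-0.2j, 0.4]])  # assumed example state
eig_before = np.linalg.eigvalsh(rho)
eig_after = np.linalg.eigvalsh(rho.T)

# Maximally entangled state (|00> + |11>)/sqrt(2) on A (x) B.
phi = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)
rho_AB = np.outer(phi, phi.conj())

# Apply T on A and the identity on B: with indices (a, b, a', b'),
# swap a and a' while leaving the B indices alone.
rho_PT = rho_AB.reshape(2, 2, 2, 2).transpose(2, 1, 0, 3).reshape(4, 4)
eig_PT = np.linalg.eigvalsh(rho_PT)  # sorted in ascending order
```

The partially transposed matrix has an eigenvalue of {-1/2}, so it is not a valid density matrix even though {T} alone maps every density matrix to a density matrix.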

In addition to these three necessary properties, it is also customary to assume that {\$} is linear. As alluded to in the previous post on the subject, non-linear evolution is difficult to reconcile with the ensemble interpretation, due to the inherently linear nature of probability. In some sense, linearity is demanded by the probabilistic interpretation (and indeed, as explained in Preskill, non-linear evolution can lead to rather strange consequences), but I’m not aware of any rigorous proof. Nonetheless, for the time being we shall demand this property of superoperators as well.

Unitary evolution, for an isolated system, is described by the Schrödinger equation. The analogous equation for general evolution by superoperators is called the Master equation. Preskill elaborates on this in some detail in section 3.5, but we will restrain ourselves from getting involved in such details here. Instead, we merely observe that unitary evolution can be thought of as the special case in which the operator sum contains only a single term. Under unitary evolution, pure states can only evolve to pure states:

\displaystyle \left|\psi\right>\left<\psi\right| \rightarrow U\left(\left|\psi\right>\left<\psi\right|\right) U^\dagger =\left|\psi'\right>\left<\psi'\right|~, \ \ \ \ \ (13)

and similarly mixed states remain mixed. But superoperators allow the evolution of pure states to mixed states. This is called decoherence. It is the process by which initially pure states become entangled, and consequently, it plays a fundamental role in both the mathematics of quantum mechanics and the (philosophical) interpretation thereof.
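A toy illustration of a pure state evolving to a mixed one: take {U_{AB}} to be a CNOT gate with {A} as control and {B} as target (my choice of example, not one from the text above), and start {A} in the pure state {\left(\left|0\right>+\left|1\right>\right)/\sqrt{2}}. The purity {\mathrm{tr}\rho^2} serves as a simple diagnostic, equal to 1 for pure states and strictly smaller for mixed ones.

```python
import numpy as np

# CNOT with A as control and B as target, in the basis |a b>.
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=complex)

plus = np.array([1.0, 1.0]) / np.sqrt(2)
rho_A = np.outer(plus, plus.conj())  # pure state |+><+| on A
zero_B = np.diag([1.0, 0.0])         # |0><0| on B

# Unitary evolution of the full system, followed by the trace over B.
full = CNOT @ np.kron(rho_A, zero_B) @ CNOT.conj().T
rho_A_out = np.einsum("imjm->ij", full.reshape(2, 2, 2, 2))


def purity(rho):
    """tr(rho^2): equal to 1 for pure states, less than 1 for mixed states."""
    return np.real(np.trace(rho @ rho))


purity_in = purity(rho_A)
purity_out = purity(rho_A_out)
```

The full state remains pure throughout, but the reduced state of {A} ends up maximally mixed, {\rho'_A=I/2}, with purity {1/2}.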

To connect back to our earlier example, suppose we perform a POVM on {\mathcal{H}_A}. By (11) and (12), this is tantamount to evolving the system with a superoperator that takes

\displaystyle \rho\rightarrow\sum_a\sqrt{F_a}\rho\sqrt{F_a}~. \ \ \ \ \ (14)

By Neimark’s theorem, the POVM {\{F_a\}} has a unitary representation on the bipartite space {\mathcal{H}}, meaning that there exists a unitary {U_{AB}} such that

\displaystyle U_{AB}:\left|\phi\right>_A\otimes\left|0\right>_B\rightarrow\sum_a\sqrt{F_a}\left|\phi\right>_A\otimes\left|a\right>_B~. \ \ \ \ \ (15)

In other words, the bipartite system undergoes a unitary transformation that entangles {A} with {B},

\displaystyle \left|\phi\right>_A\left|0\right>_B\rightarrow\sum_aM_a\left|\phi\right>_A\left|a\right>_B~. \ \ \ \ \ (16)

We could thus describe the measurement by a PVM on {\mathcal{H}_B} that projects onto {\{\left|a\right>\}} with probability

\displaystyle \mathrm{Prob}(a)={}_A\left<\phi\right|M_a^\dagger M_a\left|\phi\right>_A=\mathrm{tr}\left( F_a\rho_A\right)~, \ \ \ \ \ (17)

where the second equality follows from comparison with (6). Normalizing the final state accordingly, we may write (14) as

\displaystyle \rho\rightarrow\$\rho=\frac{\sqrt{F_a}\rho_A\sqrt{F_a}}{\mathrm{tr}\left( F_a\rho_A\right)}~. \ \ \ \ \ (18)

We mentioned previously that for POVMs, repeated measurements will not necessarily yield the same result. Now we see why: the result of such a general measurement (that is, on a subsystem) is given by an ensemble of pure states, and thus we require a description in terms of a density matrix rather than as a single (orthogonal) eigenstate.
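To see the non-repeatability explicitly, consider a hypothetical two-outcome "unsharp" POVM on a qubit, {F_0=\mathrm{diag}(0.8,0.2)} and {F_1=I-F_0}; these numbers are my own illustrative assumption. The sketch below applies (17) and (18) twice in succession, and also evaluates the unnormalized sum (14) that describes a measurement whose outcome is not recorded.

```python
import numpy as np


def psd_sqrt(F):
    """Square root of a positive semidefinite Hermitian matrix via eigh."""
    w, v = np.linalg.eigh(F)
    return (v * np.sqrt(np.clip(w, 0.0, None))) @ v.conj().T


# Hypothetical unsharp POVM: F_0 + F_1 = I, but F_0 F_1 != 0.
F = [np.diag([0.8, 0.2]), np.diag([0.2, 0.8])]

plus = np.array([1.0, 1.0]) / np.sqrt(2)
rho = np.outer(plus, plus)  # initial pure state |+><+|


def measure(rho, F_a):
    """Probability of outcome a, eq. (17), and post-measurement state, eq. (18)."""
    p = np.real(np.trace(F_a @ rho))
    s = psd_sqrt(F_a)
    return p, s @ rho @ s / p


p_first, rho_post = measure(rho, F[0])   # first measurement, outcome 0
p_repeat, _ = measure(rho_post, F[0])    # chance of outcome 0 again

# Eq. (14): outcome not recorded, so we sum over all outcomes.
rho_unread = sum(psd_sqrt(F_a) @ rho @ psd_sqrt(F_a) for F_a in F)
purity_unread = np.real(np.trace(rho_unread @ rho_unread))
```

Having obtained outcome 0 once, the probability of obtaining it again is 0.68 rather than 1, in contrast to a projective measurement. The unread state has purity below 1, i.e., it is mixed.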

This is also the description we would use if we knew only that a measurement had been performed, but were ignorant of the results. For example, suppose we perform a measurement by probing the system with a single particle (say, a photon from a laser). Immediately after the interaction with the probe, but before the interaction with the classical detector that records it, the system is in an entangled state. We would thus describe the process as evolution by a superoperator that produces a density matrix/ensemble as above. In other words, the system has slightly decohered: if the initial state were pure, some of the coherence has been lost upon evolution to a mixed state. The subsequent interaction with the (classical) detector that we colloquially think of as “measurement” is simply the same process of decoherence on a hugely expanded scale: the (now mixed) state becomes entangled with the trillions of particles that comprise the detector, decohering essentially instantaneously to a classical state. All the uniquely quantum information of the system has now been lost.

This is what is referred to as “collapse of the wavefunction” in the Copenhagen interpretation. The reason for the invalidity of this interpretation is that it posits a projection onto a single eigenstate as a result of observation (by which we simply mean, interaction with the measurement apparatus; anthropocentric language aside, consciousness is emphatically not involved in any fundamental way). But as we’ve seen above, a proper description of measurement is that of entanglement with the environment under evolution via superoperators. The measurement process proceeds by POVMs, not PVMs, on the (sub)system under study. And while at the end of the day one does arrive at an eigenstate in the expanded Hilbert space (that includes the measurement apparatus/detector/observer/etc), this is a consequence of decohering to a classical state, rather than directly projecting to it. Decoherence can thus be thought of as giving the appearance of wavefunction collapse; but as evidenced by the countless reams of confused literature on quantum foundations and related areas, it is most dangerous to indulge in such simplifications so blithely. (We note in passing that the “wavefunction of the universe” never decoheres, since evolution in an isolated system is unitary).

Another important fact that no doubt contributes to the collapse confusion is that decoherence is irreversible. Consider composing two superoperators to form a third: if {\$_1} describes the evolution from {t_0} to {t_1>t_0}, and {\$_2} describes the evolution from {t_1} to {t_2>t_1}, then {\$_2\circ\$_1} is a superoperator describing the evolution from {t_0} to {t_2}. But the inverse of a superoperator is only a superoperator if it is unitary. This is in stark contrast to unitary evolution, which is perfectly invertible: we can run the equations backwards as well as forwards. Not so for superoperators: inverting {\$_2\circ\$_1} will not result in a superoperator that evolves backwards from {t_2} to {t_0}. In other words, decoherence implies an arrow of time, and an irrevocable loss of quantum information. And while the former has philosophical ramifications which we shall not digress upon here, the latter is not at all surprising: as stated above, decoherence is the process by which quantum states become classical.
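One simple way to see that the inverse of a non-unitary superoperator fails to be a superoperator is the single-qubit dephasing map {\rho\rightarrow(1-p)\rho+pZ\rho Z}, which I use here purely as an illustrative assumption. It shrinks the off-diagonal elements by {1-2p}, so its linear inverse must blow them up by {1/(1-2p)}, and that inverse sends some perfectly good density matrices to matrices with negative eigenvalues.

```python
import numpy as np

p = 0.25                    # illustrative dephasing strength (an assumption)
Z = np.diag([1.0, -1.0])    # Pauli Z


def dephase(rho):
    """Dephasing superoperator with Kraus operators sqrt(1-p) I and sqrt(p) Z."""
    return (1 - p) * rho + p * Z @ rho @ Z


def dephase_inverse(rho):
    """Linear inverse of dephase: rescale the off-diagonals by 1/(1-2p)."""
    out = rho.astype(complex)  # astype returns a copy
    out[0, 1] /= (1 - 2 * p)
    out[1, 0] /= (1 - 2 * p)
    return out


plus = np.array([1.0, 1.0]) / np.sqrt(2)
rho = np.outer(plus, plus)  # the valid pure state |+><+|

# The inverse does undo the map on states that came out of it...
roundtrip = dephase_inverse(dephase(rho))

# ...but applied to a generic valid state it yields a non-positive matrix.
unphysical = dephase_inverse(rho)
min_eig = np.linalg.eigvalsh(unphysical).min()
```

For {p=1/4} the would-be inverse maps {\left|+\right>\left<+\right|} to a matrix with eigenvalues {3/2} and {-1/2}, so it is not positive and hence not a legitimate evolution.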

Several open questions remain. Perhaps chief among them is our failure to fully resolve the “disconcerting dualism” between deterministic evolution and probabilistic measurement. Insofar as probability is a statement of our ignorance and thus fundamentally epistemic, any formulation of quantum mechanics that relies thereupon is doomed to suffer the same characterization, for what does it mean to say that nature is fundamentally probabilistic? We may ask whether the associated lack of predictivity in quantum mechanics stems from the fact that there does not exist a state which is an eigenstate of all observables. One also wonders whether it is possible to formulate a consistent theory with non-linearly evolving superoperators, and what the interpretation thereof would be vis-à-vis probabilistic ensembles (that is, to what extent we can free ourselves from probability if we distance ourselves from the linearity it imposes). Zurek’s work on decoherence contains some clarifying insight into this issue, but that’s a subject for another post.

It is tempting to speculate that the issue of how to properly describe measurement and evolution lies at the heart of the black hole information paradox, wherein a black hole formed from the collapse of an initially pure state appears to evolve to a mixed state, in violation of the supposedly unitary S-matrix. Indeed, for various reasons, this picture is almost certainly too naïve. In particular, evolution is not unitary, but it remains to be shown precisely how a more ontologically accurate rendition of the problem would solve it.
