## Hilbert space factorization and quantum gravity

There’s a fundamental problem in gauge theory known as Hilbert space factorization. This has its roots in the issue of how local quantities (e.g., operators) are defined in quantum field theory, and has consequences for everything from entanglement and holographic reconstruction to quantum gravity at large.

In quantum mechanics, one is free to split the Hilbert space ${\mathcal{H}}$ into a tensor product ${\mathcal{H}_A\otimes\mathcal{H}_B}$. One can then define states in either subspace via reduced density matrices, and observables as self-adjoint operators acting thereupon. Things are more complicated in QFT, largely as a consequence of the fact that locality itself is far from fully tamed in this framework. Fields are, at the most basic level, simply spacetime-dependent objects that transform in a particular way under the Poincaré group (e.g., as scalars, vectors, spinors). In canonical quantization, the fields are promoted to operators, and the familiar commutation relations from quantum mechanics are applied. But a significant drawback of this approach is that it naturally relies on the Hamiltonian formalism carried over from quantum mechanics, and therefore requires a preferred choice of time. Insofar as the universe has no such bias, this is clearly less than ideal; hence the development of the path integral formulation, which takes the Lagrangian as fundamental instead. This framework has the advantage of being manifestly covariant, though certain other features (such as unitarity of the S-matrix) are more obscure. Nonetheless, to make clear the role of Hilbert space factorization in field theory, we shall assume the canonical approach.

Upon quantizing, the fields are elevated to operators on Fock space, which is the Hilbert space completion of the direct sum of the (anti)symmetric tensors in the tensor product of single-particle Hilbert spaces:

$\displaystyle \mathcal{F}_{\pm}(\mathcal{H})=\bigoplus_{n=0}^\infty \sigma_{\pm}\mathcal{H}^{\otimes n} =\mathbb{C}\oplus\mathcal{H}\oplus\left(\sigma_\pm\left(\mathcal{H}\otimes\mathcal{H}\right)\right)\oplus\left(\sigma_\pm\left(\mathcal{H}\otimes\mathcal{H}\otimes\mathcal{H}\right)\right)\oplus\ldots~,$

where ${\sigma_{\pm}}$ is the (anti)symmetrization operator, depending on whether the individual Hilbert spaces describe bosons (symmetrized, ${+}$) or fermions (antisymmetrized, ${-}$), and in the second equality we’ve implicitly assumed that the Hilbert space is complex. Thus Fock space is the (direct) sum of the zero-, one-, two-, and higher-particle Hilbert spaces. The completion requirement (which is usually denoted in the above formula by adding an overbar) is necessary to ensure that this infinite sum converges. The mathematical details will not concern us here, but one important observation is that this encodes the factorization assumption from quantum mechanics at the ground level. For interacting theories, Fock space is at best only an approximation, but it suffices for most perturbative calculations. In fact, strictly speaking this fails even for free theories; but this is a somewhat more abstract issue that we postpone to the end of the post. In the standard approach to QFT, Hilbert space factorization for free fields is assumed to go through relatively intact.
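To make the (anti)symmetrization concrete, here is a minimal numpy sketch of ${\sigma_\pm}$ acting on the two-particle sector ${\sigma_\pm\left(\mathcal{H}\otimes\mathcal{H}\right)}$, for a finite-dimensional toy single-particle space (the dimension ${d=3}$ is an arbitrary illustrative choice):

```python
import numpy as np

d = 3  # single-particle Hilbert space dimension (arbitrary illustrative choice)

# Swap operator on H ⊗ H: S(|i> ⊗ |j>) = |j> ⊗ |i>
S = np.zeros((d * d, d * d))
for i in range(d):
    for j in range(d):
        S[j * d + i, i * d + j] = 1.0

I = np.eye(d * d)
sigma_plus = 0.5 * (I + S)   # symmetrizer (bosons)
sigma_minus = 0.5 * (I - S)  # antisymmetrizer (fermions)

# Both are projectors: sigma^2 = sigma
assert np.allclose(sigma_plus @ sigma_plus, sigma_plus)
assert np.allclose(sigma_minus @ sigma_minus, sigma_minus)

# Their traces count the two-particle states: d(d+1)/2 and d(d-1)/2
dim_sym = int(round(np.trace(sigma_plus)))
dim_antisym = int(round(np.trace(sigma_minus)))
```

For ${d=3}$ this yields 6 symmetric and 3 antisymmetric two-particle states, i.e., the usual bosonic and fermionic counting.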

That is, at least, before one introduces gauge fields. The associated gauge constraints (e.g., Gauss’s law) obstruct the factorization of the global Hilbert space. The basic problem is that the elementary excitations for scalar fields are associated with points in space, and in this sense the field operators can be localized to either ${A}$ or ${\bar A}$. In contrast, the elementary excitations of gauge fields are associated with closed loops (for ${U(1)}$ fields, lines of electric flux). These cannot be classified as belonging to either ${A}$ or ${\bar A}$, since the set of loops which belong to both has non-zero measure: loops that cross the boundary between complementary regions violate Gauss’s law if one attempts to restrict to either. Thus the Hilbert space of physical states in gauge theory (that is, the space of states which satisfy Gauss’s law) does not admit a decomposition into a tensor product, ${\mathcal{H}\neq\mathcal{H}_A\otimes\mathcal{H}_{\bar A}}$.

To circumvent this, the trick employed in certain lattice models is to enlarge the Hilbert space of physical states to include open strings along the boundary between ${A}$ and ${\bar A}$. One simply cuts all the strings that cross the boundary, creating new states that are not invariant under gauge transformations. One then enlarges the original Hilbert space ${\mathcal{H}}$ to include such states. The resulting enlarged space ${\tilde{\mathcal{H}}}$ does factorize into ${\tilde{\mathcal{H}}_A\otimes\tilde{\mathcal{H}}_{\bar A}}$, but includes states which violate Gauss’s law only at the boundary. This is the so-called minimal extension, since Gauss’s law fails for the minimum number of states in the enlarged Hilbert space.

Note that cutting the boundary results in additional degrees of freedom that emerge as a result of the factorization. One further expects that these d.o.f. should contribute to the entanglement entropy between ${A}$ and ${\bar A}$. This hints at an important implication for emergent spacetime and leading area-dependence of entanglement entropy. See, for example, the closely related “edge modes” of Donnelly and Wall.

Of course, violating Gauss’s law is hardly a price we want to pay for a factorizable Hilbert space. Harlow’s idea that the gauge field is emergent is thus particularly elegant in this regard. In his 2015 paper, he considers the case of a ${U(1)}$ gauge symmetry associated with a vector potential ${A_\mu}$ in the bulk. In the vacuum state, reconstruction of gauge-invariant operators in the CFT can be accomplished by the choice of a suitable dressing, in particular Wilson lines which end on charged operators on the boundary. For sufficiently large deviations from the vacuum, however, the situation becomes more subtle.

Harlow considers in particular the case of AdS-Schwarzschild, specifically the eternal black hole dual to the thermofield double state (TFD). The two boundaries are connected through the bulk by a Wilson line that threads the bifurcation surface. However, it is not clear how to represent this bulk operator in the boundary. The problem is again that cutting the Wilson line (as in any naïve attempt to associate the part in the left- and right-bulk to the corresponding CFT) results in operators which are no longer gauge-invariant, and thus violate Gauss’s law. This is precisely the same factorization issue encountered above, but it becomes even more puzzling in the context of AdS/CFT for the following reason: by construction, the global CFT in the TFD — and hence the microscopic Hilbert space of the theory — does factorize into a tensor product, seemingly in defiance of this wormhole-threading Wilson line.

To solve this problem in the ${U(1)}$ case, Harlow proposes an elegant mechanism by which the gauge field itself is emergent. The basic idea is to cut the Wilson line by placing oppositely charged bulk fields at each end of the resulting segments. This allows one to still satisfy Gauss’s law at distances greater than the separation scale of the charges, but one would begin to see deviations if one probed the system at energies above this scale. Translating this into the language of effective field theory, the resulting operator ${\mathcal{W}'}$ flows to the original gauge-invariant Wilson line ${\mathcal{W}}$ under the renormalization group. (This assertion relies on the fact that different line operators mix under renormalization group flow). The difference is thus undetectable in sufficiently low-energy states.

This construction has several interesting consequences. Perhaps the most general of which is that it reduces the problem of factorization to a problem in the UV. One could argue that this was foreshadowed by results from lattice gauge theory. As explained above, there is no gauge-invariant way to cut the gauge excitations (the loop operators, i.e., ${\mathcal{W}}$), and thus the microscopic Hilbert space does not factorize. It thus appears that AdS/CFT requires a different sort of UV cut-off than that provided by the lattice scale in the former example.

Another implication is that the bulk must contain charged fields which transform in the fundamental representation of ${U(1)}$. These are precisely the oppositely-charged operators formed at the cut. Since Wilson lines end on local operators, these imply the existence of local operators in the CFT which are charged under the symmetry generated by the current dual to the gauge field. And while the bulk effective field theorist can simply use the (original) Wilson line directly, the field theorist in the boundary must resort to using these operators. But the bulk charges can be quite heavy, provided their separation is smaller than their inverse mass. (Harlow demonstrates this in the course of proving that low-energy correlation functions are unaffected, via a “gauge-covariant operator product expansion” (or “string OPE”)). These heavy fields are dual to high-dimension operators in the CFT, and thus the aforementioned boundary theorist must consider high-energy states. This is another manifestation of the UV nature of the factorization problem as argued above. (Harlow resolves the tension between these high-energy states in the CFT and the low-energy effective field theory in the bulk by observing that the “high energy goes into creating a weakly curved background (away from black hole singularities), rather than any localized high-energy scattering process”).

As an aside, Harlow connects this construction to the weak gravity conjecture by observing that the charges cannot be so heavy as to form black holes. Indeed, a weakly-coupled gauge theory in the bulk requires the charges to be parametrically lighter than the Planck scale. This seems to demand the existence of a fundamentally-charged particle of mass ${m\lesssim q/\sqrt{G}}$, where ${q}$ is the gauge coupling. This may have implications for emergence, particularly in the gravitational case, but I shall not digress upon it here.

Harlow’s analysis demonstrates that in the case of ${U(1)}$ gauge theory, the factorization problem — in this case, the fact that there must exist a decomposition of the wormhole-threading Wilson line that respects the factorization into separately gauge-invariant CFTs on the boundary — can be completely resolved in effective field theory. One obtains an emergent gauge field at long distances, but a factorizable Hilbert space at short distances. But, as Harlow himself notes, this construction should not necessarily be taken to have ontic status. Indeed, the analogous problem in gravity does not appear amenable to the same mechanism, since we are restricted to ${m\ge0}$. Unless we wish to resort to tachyons, it is therefore unclear how to split the gravitational dressing in such a way as to preserve diffeomorphism invariance. Indeed, the situation in gravity is even worse: since any localized excitations (including operators with ${m=0}$) carry positive energy, their introduction violates Gauss’s law (more specifically, the Hamiltonian and shift constraints) in the entire spacetime. Thus, while Harlow is probably correct in asserting that “any solution to the factorization problem will teach us something about high-energy physics in the bulk” (due to the UV connection mentioned above; in the gravitational case, it presumably enters as some sort of short-distance modification of the constraints), I am less optimistic that the resolution in quantum gravity will be a straightforward generalization of the same mechanism.

That said, the underlying lesson above still holds: the Hilbert space of the CFT dual to the wormhole does have a tensor product structure, which must be respected by the physics in the bulk. Extrapolating this same reasoning to the gravitational case (ignoring certain technical differences), this ultimately implies that the bulk spacetime must emerge in a manner consistent with this factorization. This is perhaps the most concrete sense in which the problem of Hilbert space factorization is intimately connected with emergent spacetime.

This echoes my own conclusions based on holographic shadows, insofar as the wormhole’s deep interior (which I shall define momentarily) is precisely analogous to the one-sided shadow regions described in the above-linked paper, in that it lies beyond the reach of any known bulk probe. The connection between the two can be seen by starting from the TFD, and sending in shocks to create a wormhole (note that in the discussion of Harlow’s work above, the “wormhole” consisted entirely of the bifurcation surface, and had no interior in the sense that we describe here). Upon sending in the second (suitably arranged) shockwave, we obtain a region which is causally disconnected from both boundaries. I refer to this as the wormhole’s deep interior. It shares the basic feature of both holographic shadows and precursors, in that the information about the spacetime therein must be encoded in a highly non-local (indeed, apparently non-causal) manner in the boundary. But relative to both shadows and precursors, addressing the question of reconstruction in the context of multi-shock wormholes has two advantages:

• Conceptually, it makes the problem as sharp as possible. In the case of both shadows and precursors, one might suppose that the information in the CFT is encoded non-locally, such that if one had perfect knowledge of the entire boundary — and perhaps some sophisticated quantum secret sharing scheme — one could in principle reconstruct regions arbitrarily deep in the bulk, since these are still causally connected to the boundary. In contrast, the wormhole’s deep interior is disconnected from the boundary for all time, which forces us to consider more subtle or esoteric alternatives.
• By starting with a completely well-behaved geometry and perturbing it with thermal-scale operators, one might hope to track the breakdown of locality to some deeper feature, such as the connection to complexity proposed by Susskind and collaborators, rather than simply attributing it to the initial state of the system (e.g., the particular matter distribution that forms the shadow, the choice of precursor encoding).

However, the causal disconnection of the deep interior is a double-edged sword: despite these conceptual advantages, it is even less clear how one might relate the interior to elements in the CFT. One might imagine starting in the TFD with a Wilson loop that threads the bifurcation surface, but it’s not obvious that it would survive the highly boosted shockwaves that create the wormhole, and even less clear how to use it to construct the gravitational degrees of freedom of interest anyway. Nonetheless, insofar as the Wilson line serves as a diagnostic of connectivity in the bulk, some recent work has focused on attempting to isolate the associated gravitational degrees of freedom as those that “sew the wormhole together”. (On this note, the statement above that the deep interior is disconnected from the boundary for all time comes with the caveat that information about the connection (whatever this means) wasn’t destroyed or otherwise hidden by the shockwave). Another approach would be to understand the boundary dual of entwinement surfaces, and attempt to use them to probe within the wormhole—but while these penetrate holographic shadows, it’s not clear that the wormhole will swallow them. Yet a third method would be to connect with the holographic complexity proposals of Susskind and collaborators, but despite initial progress in defining complexity in field theories, we’re far from a useful CFT prescription.

Understanding the wormhole’s deep interior in the CFT is tantamount to isolating the degrees of freedom that sew the spacetime together. Thus one expects that their description in the CFT will tell us how this spacetime — i.e., gravity — emerges from elements in the boundary. This is the question to which we alluded above, when we claimed that Hilbert space factorization is intimately connected to emergent spacetime. But, while the ${U(1)}$ case merely illustrates that the bulk must somehow emerge in a manner consistent with the tensor product structure of the TFD, understanding precisely how this comes about — that is, how spacetime itself emerges, and not just the gauge fields — will require a solution of the factorization problem in the gravitational case (as well as, very likely, of the deeper problem we’ve been postponing; see below). Whether such a resolution amounts to solving quantum gravity, or only providing a crucial step along the way, remains to be seen.

Now, as alluded near the beginning of this post, one runs into trouble even before introducing gauge fields: Hilbert space factorization for free field theories is a lie. Indeed, it is generally believed (by the measure-zero subset of philosophically-inclined physicists who study the issue) that the description of observables as self-adjoint operators on Hilbert space is rigorously untenable. Rather, observables are identified as elements of an abstract ${C^*}$-algebra ${\mathcal{A}}$, on which states act as positive linear functionals. The pertinent issue is one of identifying — or, more to the point, localizing — degrees of freedom. Rather than associating a Hilbert space to spatial regions, as in the textbook approach above, one assigns algebras to regions, and recovers Hilbert space via something called the GNS construction. The problem is that the types of algebras that correspond to quantum field theories are Type III von Neumann algebras, which are characterized by an infinite degree of entanglement that prevents the Hilbert space from factorizing even before introducing gauge fields. (Incidentally, there has been some work along these lines in the present context, for example in lattice gauge theories by Casini and collaborators, who argued that a tensor factorization only exists in the case of local algebras with trivial center). Thus our reference to “the” factorization problem above was somewhat superficial: while adding gauge fields certainly compounds the problem, the latter remains in their absence. Such considerations suggest that in fact localization is more fundamental than factorization; the former implies the latter. We shall return to this algebraic quantum field theory (AQFT) approach in other posts, but suffice to say one should be cautious of the foundation on which one builds. It may not be bedrock.

The importance of the factorization problem in this (greater) context can be seen, for example, in the Firewall paradox. In particular, most papers on the subject assume (implicitly, dare I say blithely), that the Hilbert space factorizes into an interior and exterior. I’ve expounded upon this issue elsewhere, so I won’t dwell upon it here. Suffice to say it touches on the question of ontology vs. epistemology which I’ve mentioned before, but the fact that Hilbert space factorization is an independently subtle issue makes it even more difficult to determine to which category a particular model of evaporation belongs.

Having said all that, my basic concern regarding Hilbert space factorization in quantum gravity can actually be summarized quite simply. In the emergent spacetime paradigm — as embodied by the It from Qubit collaboration — entanglement is thought to play a fundamental role in the emergence of gravity. And yet step zero of defining entanglement entropy, ${S=-\mathrm{tr}\left(\rho\ln\rho\right)}$, is the assumption of a factorized Hilbert space! But we know this doesn’t even hold for free fields, let alone in gauge or gravitational theories. Therefore, in the absence of a serious reconceptualization of one or both of the constituent theories (or a very clever generalization of Harlow’s emergent gauge field construction), the idea that “gravity emerges from quantum entanglement” rests on a contradiction.
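To see how strongly the definition leans on factorization, here is a minimal numpy sketch of “step zero” for a two-qubit toy model: both the partial trace and the entropy ${S=-\mathrm{tr}\left(\rho\ln\rho\right)}$ below are only defined because the tensor product structure ${\mathcal{H}=\mathcal{H}_A\otimes\mathcal{H}_B}$ was put in by hand.

```python
import numpy as np

# Maximally entangled two-qubit state (|00> + |11>)/sqrt(2).
# Writing psi as a (dim_A, dim_B) array *is* the factorization assumption.
psi = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)
a = psi.reshape(2, 2)

# Reduced density matrix of A: (rho_A)_{ik} = sum_j a_ij a*_kj (trace out B)
rho_A = a @ a.conj().T

# Entanglement entropy S = -tr(rho ln rho), computed via the eigenvalues
evals = np.linalg.eigvalsh(rho_A)
S = -sum(p * np.log(p) for p in evals if p > 1e-12)
assert np.isclose(S, np.log(2))  # maximal entanglement for a qubit pair
```

Without the reshape into a ${2\times2}$ array of coefficients — i.e., without the tensor product — neither ${\rho_A}$ nor ${S}$ exists.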

It is worth pointing out that the factorization problem may not have a resolution within quantum field theory, perhaps due to these or other foundational issues. Indeed, Harlow states that it can’t be resolved in perturbative string theory, where “the gauge fields and gravity emerge together from a non-local theory.” Such a solution must exist if the arguments from AdS/CFT above hold true, but it is not clear how to describe the emergence of gauge fields in string theory in such a way as to allow it.

## Quantum 101

Reference: John Preskill’s notes on Quantum Information and Computation.

In quantum mechanics, a state is a complete description of a physical system. Mathematically, it is given by a ray (an equivalence class of vectors) in Hilbert space, ${\mathcal{H}}$, which is a vector space endowed with an inner product, and which is complete with respect to the norm induced by the latter. Simply put, Hilbert space is the abstract vector space in which quantum states “live”.

Hilbert spaces can be real or complex, finite- or infinite-dimensional; for definiteness we’ll assume complex, finite-dimensional Hilbert spaces here. The inner product is simply a map that associates an element of the field to pairs of elements in the vector space; in this case, ${\left<\cdot,\cdot\right>:\mathcal{H}\times\mathcal{H}\rightarrow\mathbb{C}}$. In Dirac’s bra-ket notation, vectors (states) in ${\mathcal{H}}$ are denoted ${\left|\psi\right>}$, and dual vectors (which act as linear functionals on the states) are denoted ${\left<\psi\right|}$. The properties of the inner product ${\left<\phi|\psi\right>\equiv\left<\phi,\psi\right>}$ may then be written as follows:

1. Positivity: ${\left<\psi|\psi\right>\geq0}$, with equality iff ${\left|\psi\right>=0}$.
2. Linearity: ${\left<\phi\right|\left( a\left|\psi_1\right>+b\left|\psi_2\right>\right)=a\left<\phi|\psi_1\right>+b\left<\phi|\psi_2\right>}$.
3. Skew symmetry: ${\left<\phi|\psi\right>=\left<\psi|\phi\right>^*}$.

The inner product induces a norm, ${||\psi||=\left<\psi|\psi\right>^{1/2}}$, which defines the distance between states in ${\mathcal{H}}$. Any inner product space is thus a metric space; an inner product space that is not necessarily complete is also known as a pre-Hilbert space. The aforementioned completeness criterion is what elevates a pre-Hilbert space to a Hilbert space: a pre-Hilbert space is complete if every Cauchy sequence converges with respect to the norm to an element in the space (intuitively, there are no “missing points”). The completeness criterion is important for infinite-dimensional Hilbert spaces, where it ensures the convergence of the eigenfunction expansions that one encounters in, e.g., Fourier analysis.

Note that we are free to choose the normalization ${\left<\psi|\psi\right>=1}$, since this merely amounts to choosing a representative of the equivalence class of vectors that differ by a nonzero complex scalar. In this sense, both ${\left|\psi\right>}$ and ${e^{i\alpha}\left|\psi\right>}$ represent the same state; only relative phase changes between states in a superposition are physically meaningful.

Although states are the basic mathematical objects in this formalism, we never measure them. Rather, we measure observables, which are self-adjoint (a.k.a. Hermitian) operators that act as linear maps on states, ${A:\mathcal{H}\rightarrow\mathcal{H}}$. Such an operator has a spectral representation, meaning that its eigenstates form a complete orthonormal basis in ${\mathcal{H}}$. This allows us to write an observable ${A}$ as

$\displaystyle A=\sum_n a_n P_n~, \ \ \ \ \ (1)$

where ${P_n}$ is the orthogonal projection onto the space of eigenvectors with eigenvalue ${a_n}$ (such orthogonal projections can be proven to exist for complete inner product spaces, e.g., ${\mathcal{H}}$). In the simple case where ${a_n}$ is non-degenerate, ${P_n}$ is the projection onto the corresponding eigenvector, ${P_n=\left|n\right>\left<n\right|}$. Of course, given the unit normalization above, the projection operators satisfy ${P_nP_m=\delta_{mn}P_n}$ and ${P_n^\dagger=P_n}$. (Note that the spectral theorem is more subtle for unbounded operators in infinite-dimensional spaces, but that will not concern us here).
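The spectral representation (1) is easy to exhibit numerically; a small sketch, with an arbitrarily chosen Hermitian matrix standing in for the observable:

```python
import numpy as np

# A Hermitian "observable" on a 3-dimensional Hilbert space (arbitrary numbers)
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 2.0, 0.0],
              [0.0, 0.0, 5.0]])

evals, evecs = np.linalg.eigh(A)

# Orthogonal projectors P_n onto the (here non-degenerate) eigenspaces
projectors = [np.outer(evecs[:, n], evecs[:, n].conj()) for n in range(3)]

for P in projectors:
    assert np.allclose(P @ P, P)       # idempotent
    assert np.allclose(P.conj().T, P)  # self-adjoint

# Spectral representation: A = sum_n a_n P_n
assert np.allclose(sum(a * P for a, P in zip(evals, projectors)), A)
```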

The numerical result of a measurement in quantum mechanics is given by the eigenvalue of the observable in question, ${A}$. This implies that the system must, at the instant of measurement, be in an eigenstate of ${A}$ with the measured eigenvalue. If the quantum state immediately prior to a measurement is ${\left|\psi\right>}$, then the outcome ${a_n}$ is obtained with probability

$\displaystyle \mathrm{Prob}\left( a_n\right)=||P_n\left|\psi\right>||^2=\left<\psi\right|P_n\left|\psi\right>~, \ \ \ \ \ (2)$

and the normalized post-measurement state (which lies in the ${a_n}$-eigenspace of ${A}$) is therefore

$\displaystyle \frac{P_n\left|\psi\right>}{\left<\psi\right|P_n\left|\psi\right>^{1/2}}~. \ \ \ \ \ (3)$

This is the point at which probability notoriously enters the picture; we’ll have more to say about this later. Note that, since the system is now in an eigenstate, immediately repeating the measurement will yield the same eigenvalue with probability 1.
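The Born rule (2), the state update (3), and the repeatability of the measurement can all be checked in a few lines; a sketch with an arbitrarily chosen observable and state (the eigenvalue ${a_2}$ is made degenerate on purpose, so that ${P_2}$ projects onto a two-dimensional eigenspace):

```python
import numpy as np

# Observable A = diag(1, 2, 2) and an arbitrary normalized state
psi = np.array([1.0, 1.0, 1.0], dtype=complex) / np.sqrt(3)
P1 = np.diag([1.0, 0.0, 0.0])  # projector onto the a_1 = 1 eigenspace
P2 = np.diag([0.0, 1.0, 1.0])  # projector onto the a_2 = 2 eigenspace

# Born rule: Prob(a_n) = ||P_n |psi>||^2 = <psi|P_n|psi>
p1 = np.vdot(psi, P1 @ psi).real
p2 = np.vdot(psi, P2 @ psi).real
assert np.isclose(p1 + p2, 1.0)  # probabilities sum to 1

# Post-measurement state for outcome a_2: P_2|psi> / <psi|P_2|psi>^(1/2)
psi_post = P2 @ psi / np.sqrt(p2)

# Immediately repeating the measurement yields a_2 with probability 1
assert np.isclose(np.vdot(psi_post, P2 @ psi_post).real, 1.0)
```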

The fact that the measurement process appears to induce such decisiveness on the part of the state leads to the notion of “wave function collapse”, which is a horribly misleading oversimplification to which we will return. Suffice to say that wave functions don’t collapse, but we’ve a bit more math to cover before the reality can be made precise.

So much for states and observables. What about dynamics? As in classical mechanics, the Hamiltonian ${H}$ is the generator of time translations, and its expectation value gives the energy of the state. The latter is a measurable quantity, which implies that in order to be a well-defined physical observable, the Hamiltonian operator must be self-adjoint, ${H^\dagger=H}$. By Stone’s theorem, the exponential of a self-adjoint operator is unitary; thus if ${U=e^{-iHt}}$, then ${U}$ is a bounded linear operator on ${\mathcal{H}}$ that satisfies ${U^\dagger U=1}$. Mathematically, this is why time evolution in quantum mechanics is unitary. Physically, this is simply the statement that time evolution preserves the inner product; i.e., that probabilities continue to sum to 1 (since, under time evolution by a unitary operator ${U}$, ${\left<\phi|\psi\right>\rightarrow\left<\phi\right|U^\dagger U\left|\psi\right>=\left<\phi|\psi\right>}$). Note that here we’re implicitly assuming that ${H}$ is time-independent, in order to write the time translation operator as the exponential thereof. For time-dependent Hamiltonians, the evolution operator is instead a time-ordered exponential, ${U(t)=\mathcal{T}\exp\left(-i\int_0^t\mathrm{d} t'\,H(t')\right)}$, which is still unitary; it’s just less elegant.

Given such an operator ${U}$, the evolution of a state over some finite interval ${t}$ is unitary, and may be written

$\displaystyle \left|\psi(t)\right>=U(t)\left|\psi(0)\right>=e^{-iHt}\left|\psi(0)\right>~, \ \ \ \ \ (4)$

where in the second equality we’ve assumed ${H}$ to be time-independent. If we then consider an infinitesimal transformation ${\delta t}$ and expand both the left- and right-hand sides to first order, we have

$\displaystyle \left|\psi(\delta t)\right>=\left|\psi(0)\right>+\delta t\frac{\mathrm{d}}{\mathrm{d} t}\left|\psi(0)\right>=\left(1-iH\delta t\right)\left|\psi(0)\right>~. \ \ \ \ \ (5)$

Comparing terms at linear order, we recognize the Schrödinger equation,

$\displaystyle \frac{\mathrm{d}}{\mathrm{d} t}\left|\psi(t)\right>=-iH\left|\psi(t)\right>~, \ \ \ \ \ (6)$

which describes the evolution of states in the Schrödinger picture, wherein states are time-dependent while operators (including observables) are constant. This is precisely the opposite of the Heisenberg picture, wherein operators carry the time-dependence while states are constant. The two pictures are related by a change of basis, analogous to the relation between active and passive transformations. A third picture, the interaction picture, is often introduced later as a rather ham-fisted compromise between these two; it forms a fantastically successful premise for perturbation theory, but it doesn’t actually exist (see Haag’s theorem).
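The chain from self-adjoint ${H}$ to unitary ${U}$ to the Schrödinger equation can be verified numerically; a minimal sketch using scipy’s matrix exponential, with an arbitrarily chosen two-level Hamiltonian:

```python
import numpy as np
from scipy.linalg import expm

# Self-adjoint Hamiltonian for a two-level system (arbitrary numbers)
H = np.array([[1.0, 0.5],
              [0.5, -1.0]])
t = 0.7

# Stone's theorem in action: U = exp(-iHt) is unitary
U = expm(-1j * H * t)
assert np.allclose(U.conj().T @ U, np.eye(2))

# Unitarity preserves the norm, i.e., probabilities still sum to 1
psi0 = np.array([1.0, 0.0], dtype=complex)
psi_t = U @ psi0
assert np.isclose(np.vdot(psi_t, psi_t).real, 1.0)

# Finite-difference check of the Schrödinger equation: d|psi>/dt = -iH|psi>
dt = 1e-6
lhs = (expm(-1j * H * (t + dt)) @ psi0 - psi_t) / dt
rhs = -1j * H @ psi_t
assert np.allclose(lhs, rhs, atol=1e-4)
```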

Note that unitary evolution, as encapsulated in the Schrödinger equation, is entirely deterministic: specification of an initial state ${\left|\psi(0)\right>}$ allows us to predict the state at any future time. But as described above, measurement is probabilistic: despite our ability to predict future states exactly, we cannot make definite predictions about measurement outcomes. One of the deepest (and most controversial) aspects of quantum mechanics is how deterministic evolution can nonetheless lead to probabilistic outcomes. Preskill quite aptly refers to this juxtaposition as a “disconcerting dualism”, and we shall return to it below.

Another interesting observation is that according to the Schrödinger equation, quantum mechanical evolution is linear, in contrast to the non-linear evolution often encountered in classical theories. This is tied up with the issue of probability above: probability theory is fundamentally linear. But the connection isn’t quite so straightforward.

As an aside, the linearity of quantum mechanics is why quantum chaos is so subtle. Naïvely, quantum systems should be incapable of supporting chaos, since small perturbations to the initial state don’t wildly change the evolution in the case of linear dynamics. However, two states which are close in Hilbert space can nonetheless yield wildly different measurements. Quantum chaos has important implications in a number of areas, particularly holography and black holes; but that’s a subject for another post.

So, to summarize, we’ve seen that in quantum mechanics, states are vectors in Hilbert space, observables are Hermitian operators, symmetries are unitary operators, and measurements are orthogonal projections.

Now here’s the kicker: everything we’ve said so far applies only to a single, isolated system. This is an idealization that simply does not exist. Another way to phrase this is that the formulation above only holds if applied to the entire universe. Even ignoring the issue of how one would make a measurement in such a scenario, this is clearly not a realistic description. In fact, it’s frankly wrong: in general (that is, when considering subsystems) states are not rays, measurements are not orthogonal projections, and evolution is not unitary!

The simplest extension of the above is to consider a bipartite system, the Hilbert space for which is a tensor product of the Hilbert spaces of the constituents, ${\mathcal{H}=\mathcal{H}_A\otimes\mathcal{H}_B}$. Given an orthonormal basis ${\{\left|i\right>_A\}}$ for ${\mathcal{H}_A}$ and ${\{\left|j\right>_B\}}$ for ${\mathcal{H}_B}$, an arbitrary pure state of ${\mathcal{H}_A\otimes\mathcal{H}_B}$ can be expanded as

$\displaystyle \left|\psi\right>_{AB}=\sum_{i,j}a_{ij}\left|i\right>_A\otimes\left|j\right>_B~, \ \ \ \ \ (7)$

where, by normalization, the coefficients satisfy ${\sum_{i,j}|a_{ij}|^2=1}$. We’ve referred to this as a pure state in contrast to a mixed state; the former correspond to rays in the total Hilbert space, while the latter do not. This is the first crucial correction alluded to above.

Let us now consider an observable that acts only on subsystem ${A}$, ${M_A\otimes I_B}$. Its expectation value is

\displaystyle \begin{aligned} \left<M_A\right>&={}_{AB}\left<\psi\right|M_A\otimes I_B\left|\psi\right>_{AB}\\ &=\sum_{mn}\sum_{ij}a_{mn}^*a_{ij}\left({}_A\left<m\right|\otimes{}_B\left<n\right|\right)M_A\otimes I_B\left(\left|i\right>_A\otimes\left|j\right>_B\right)\\ &=\sum_{ijm}a_{mj}^*a_{ij}\,{}_A\left<m\right|M_A\left|i\right>_A =\mathrm{tr}\left(M_A\rho_A\right)~, \end{aligned} \ \ \ \ \ (8)

where we’ve introduced the reduced density matrix

$\displaystyle \rho_A=\mathrm{tr}_B\left(\left|\psi\right>_{AB}\,{}_{AB}\left<\psi\right|\right)~. \ \ \ \ \ (9)$

In contrast to the trace, which is a scalar-valued function given by the sum of eigenvalues, the partial trace w.r.t. ${B}$ is an operator-valued function given by summing over the basis elements of ${B}$:

$\displaystyle \mathrm{tr}_B\left(\left|\psi\right>_{AB~AB}\left<\psi\right|\right) =\sum_j\big._B\left<j|\psi\right>_{AB~AB}\left<\psi|j\right>_B =\sum_{ijm}a_{mj}^*a_{ij}\left|i\right>_{A~A}\left<m\right|~. \ \ \ \ \ (10)$

The elegant expression for the expectation value ${\left<M_A\right>}$ above then follows by the cyclic property of the trace.
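This bookkeeping is easy to check numerically. The following is a small illustrative sketch (Python/numpy; the state and observable are randomly generated purely for demonstration) verifying that the reduced density matrix obtained by the partial trace reproduces the full expectation value:

```python
import numpy as np

rng = np.random.default_rng(0)
dA, dB = 3, 4

# Random normalized pure state |psi>_AB, stored as its coefficient matrix a_ij
a = rng.normal(size=(dA, dB)) + 1j * rng.normal(size=(dA, dB))
a /= np.linalg.norm(a)

# Reduced density matrix rho_A = sum_{ijm} a_{ij} a*_{mj} |i><m|  (trace over B)
rho_A = a @ a.conj().T

# Random Hermitian observable acting on subsystem A only
X = rng.normal(size=(dA, dA)) + 1j * rng.normal(size=(dA, dA))
M_A = (X + X.conj().T) / 2

# Full expectation value <psi| M_A (x) I_B |psi> on H_A (x) H_B
psi = a.reshape(dA * dB)
full = psi.conj() @ np.kron(M_A, np.eye(dB)) @ psi

# The two computations agree, as in the derivation above
assert np.isclose(np.trace(M_A @ rho_A), full)
assert np.isclose(np.trace(rho_A), 1)
print("tr(M_A rho_A) reproduces the full expectation value")
```

The reshaping convention here (index ${i}$ of ${A}$ as the "outer" factor) matches the ordering of `np.kron(M_A, I_B)`.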

The reduced density matrix will play a central role in what follows, so it’s worth elaborating the properties that follow from the definition above (in particular the explicit form (10)):

1. Hermiticity: ${\rho_A=\rho_A^\dagger}$.
2. Non-negativity (of its eigenvalues): ${\forall~\left|\psi\right>_A}$, ${\big._A\left<\psi\right|\rho_A\left|\psi\right>_A=\sum_j\big|\sum_ia_{ij}\big._A\left<\psi|i\right>_A\big|^2\ge0}$.
3. Unit norm: ${\mathrm{tr}{\rho_A}=\sum_{ij}|a_{ij}|^2=1}$ (since ${\left|\psi\right>_{AB}}$ is normalized).

As mentioned above, pure states are rays in Hilbert space, but mixed states are not. However, both are described by a reduced density matrix, which therefore provides a suitably general definition of quantum states. In the case of a pure state, ${\rho_A=\left|\psi\right>_{A~A}\left<\psi\right|}$, which is the projection operator onto the state (that is, onto the one-dimensional space spanned by ${\left|\psi\right>_A}$). The density matrix for a pure state is therefore idempotent, ${\rho^2=\rho}$. In contrast, for a general (mixed) state in the diagonal basis ${\{\left|\psi_a\right>\}}$,

$\displaystyle \rho_A=\sum_ap_a\left|\psi_a\right>\left<\psi_a\right|~, \ \ \ \ \ (11)$

where the eigenvalues satisfy ${0<p_a\le1}$ and ${\sum_ap_a=1}$. It follows that a pure state has only a single non-zero eigenvalue, which must be 1, while a mixed state contains two or more terms in the sum (and ${\rho^2\neq\rho}$).
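A quick numerical check (illustrative only; the mixture weights and kets are random) of the three properties above, plus the purity criterion ${\rho^2=\rho}$:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4

# Mixed state: probabilistic mixture of three random pure states
p = rng.dirichlet(np.ones(3))          # weights: 0 < p_a <= 1, sum = 1
kets = [rng.normal(size=d) + 1j * rng.normal(size=d) for _ in range(3)]
kets = [v / np.linalg.norm(v) for v in kets]
rho = sum(pa * np.outer(v, v.conj()) for pa, v in zip(p, kets))

assert np.allclose(rho, rho.conj().T)               # 1. Hermiticity
assert np.all(np.linalg.eigvalsh(rho) > -1e-12)     # 2. non-negative eigenvalues
assert np.isclose(np.trace(rho), 1)                 # 3. unit trace

# Purity: rho^2 = rho holds for a pure state, fails for the mixture
pure = np.outer(kets[0], kets[0].conj())
assert np.allclose(pure @ pure, pure)
assert not np.allclose(rho @ rho, rho)
print("Hermitian, positive, unit trace; rho^2 = rho only when pure")
```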

As alluded above, in a coherent superposition of states, the relative phase is physically meaningful (i.e., observable). This is merely a consequence of the linearity of the Schrödinger equation: any linear combination of solutions is also a solution. In contrast, the mixed state ${\rho_A}$ is an incoherent superposition of eigenstates ${\{\left|\psi_a\right>\}}$, meaning that the relative phases are experimentally unobservable. This gives rise to the concept of entanglement: when two systems ${A}$ and ${B}$ interact, they become entangled (i.e., correlated). This destroys the coherence of the original states such that some of the phases in the superposition become inaccessible if we measure ${A}$ alone. Henceforth we will reserve the unqualified “superposition” to refer to the former case.

We should note that probability again enters this updated picture when we consider that the expectation value of any observable ${M}$ acting on the subsystem described by ${\rho}$ is

$\displaystyle \left<M\right>=\mathrm{tr}{M\rho}=\sum_ap_a\left<\psi_a\right|M\left|\psi_a\right>~, \ \ \ \ \ (12)$

which leads to the interpretation of ${\rho}$ as describing a statistical ensemble of pure states ${\left|\psi_a\right>}$, each of which occurs with probability ${p_a}$. But we’re not quite ready to address the associated interpretive questions just yet.

As a concrete example, consider the spin state

$\displaystyle \left|\uparrow_x\right>=\frac{1}{\sqrt{2}}\left(\left|\uparrow_z\right>+\left|\downarrow_z\right>\right)~, \ \ \ \ \ (13)$

which is a (coherent) superposition of spins along the ${z}$-axis. Measuring the spin along the ${z}$-axis will result in ${\left|\uparrow_z\right>}$ or ${\left|\downarrow_z\right>}$ with probability ${\frac{1}{2}}$ each; e.g., from (2):

$\displaystyle \mathrm{Prob}\left(\uparrow_z\right)=||P_{\uparrow_z}\left|\uparrow_x\right>||^2 =\frac{1}{2}\left(\left<\uparrow_z|\uparrow_z\right>\right)^2=\frac{1}{2}~. \ \ \ \ \ (14)$

In contrast, the ensemble in which each of these states occurs with this probability is

$\displaystyle \rho=\frac{1}{2}\left(\left|\uparrow_z\right>\left<\uparrow_z\right|+\left|\downarrow_z\right>\left<\downarrow_z\right|\right)=\frac{1}{2}I~. \ \ \ \ \ (15)$

But since the identity is invariant under a unitary change of basis (${U^\dagger IU=I}$), we can obtain the state along an arbitrary axis ${\left|\psi(\theta,\phi)\right>}$ by applying a suitable unitary transformation to ${\left|\uparrow_z\right>}$ without changing the right-hand side. As a consequence, measuring the spin along any axis yields a completely random result:

$\displaystyle \mathrm{tr}{\left|\psi(\theta,\phi)\right>\left<\psi(\theta,\phi)\right|\rho}=\frac{1}{2}~. \ \ \ \ \ (16)$

In other words, we obtain spin up or down with equal probability, regardless of what we do. This is a reflection of the fact that the relative phases in a superposition are observable, but those in an ensemble are not. A mixed state can thus be thought of as an ensemble of pure states in many different ways, all of which are experimentally indistinguishable. (As an aside, further clarity on these relationships can be gained by studying the Bloch sphere, which I shall not digress upon here).
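A short numerical aside (illustrative; the axis choices are arbitrary) confirming this contrast between the ensemble and the coherent superposition:

```python
import numpy as np

# The maximally mixed state rho = I/2 yields probability 1/2 for spin-up
# along *any* axis (theta, phi)
rho = np.eye(2) / 2

def spin_up(theta, phi):
    """Spinor pointing along the (theta, phi) direction on the Bloch sphere."""
    return np.array([np.cos(theta / 2), np.exp(1j * phi) * np.sin(theta / 2)])

for theta, phi in [(0.0, 0.0), (np.pi / 2, 0.0), (np.pi / 3, 1.0), (2.1, -0.7)]:
    psi = spin_up(theta, phi)
    prob = np.real(psi.conj() @ rho @ psi)   # tr(|psi><psi| rho)
    assert np.isclose(prob, 0.5)

# Contrast: the coherent superposition |up_x> gives a certain outcome along x
up_x = np.array([1.0, 1.0]) / np.sqrt(2)
rho_pure = np.outer(up_x, up_x.conj())
assert np.isclose(np.real(up_x.conj() @ rho_pure @ up_x), 1.0)
print("mixed state: 1/2 along every axis; pure |up_x>: certain along x")
```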

A bipartite pure state can be expressed in a standard form, which is often very useful. One begins by observing that an arbitrary state ${\left|\psi\right>_{AB}\in\mathcal{H}_A\otimes\mathcal{H}_B}$ may be expanded as

$\displaystyle \left|\psi\right>_{AB}=\sum_{i,j}a_{ij}\left|i\right>_A\left|j\right>_B=\sum_i\left|i\right>_A\left|\tilde i\right>_B~, \ \ \ \ \ (17)$

where ${\{\left|i\right>_A\}}$ and ${\{\left|j\right>_B\}}$ are the orthonormal bases defined in (7), and in the second equality we’ve defined a new basis ${\left|\tilde i\right>_B=\sum_ja_{ij}\left|j\right>_B}$. A priori, ${\{\left|\tilde i\right>_B\}}$ need not be orthonormal. However, since ${\{\left|i\right>_A\}}$ is, we are free to choose it such that ${\rho_A}$ is diagonal (cf. (11)), in which case we can write the reduced density matrix that describes subsystem ${A}$ alone as

$\displaystyle \rho_A=\sum_ip_i\left|i\right>_{A~A}\left<i\right|~. \ \ \ \ \ (18)$

However, by definition (9), this is also equivalent to tracing out system ${B}$,

\displaystyle \begin{aligned} \rho_A&=\mathrm{tr}_B\left(\left|\psi\right>_{AB~AB}\left<\psi\right|\right) =\mathrm{tr}_B\left(\sum_{ij}\left|i\right>_A\left|\tilde i\right>_{B~A}\left<j\right|\big._B\left<\tilde j\right|\right) =\sum_{ij}\big._B\left<\tilde j|\tilde i\right>_B\left|i\right>_{A~A}\left<j\right|~. \end{aligned} \ \ \ \ \ (19)

And therefore, it must be the case that

$\displaystyle \big._B\left<\tilde j|\tilde i\right>_B=\delta_{ij}p_i \ \ \ \ \ (20)$

i.e., the new basis ${\{\left|\tilde i\right>_B\}}$ is orthogonal after all! Furthermore, by simply rescaling the vectors, ${\left|i'\right>_B\equiv p_i^{-1/2}\left|\tilde i\right>_B}$, we find that we can express the bipartite state (17) as

$\displaystyle \left|\psi\right>_{AB}=\sum_i\sqrt{p_i}\left|i\right>_A\left|i'\right>_B~, \ \ \ \ \ (21)$

which is the Schmidt decomposition of the bipartite pure state ${\left|\psi\right>_{AB}}$ in terms of a particular orthonormal basis of ${\mathcal{H}_A\otimes\mathcal{H}_B}$. Note that our derivation was completely general; any bipartite pure state can be expressed in this form, though of course the particular orthonormal basis employed will depend on the state (that is, we can’t simultaneously expand ${\left|\psi\right>_{AB}}$ and ${\left|\phi\right>_{AB}}$ using the same orthonormal basis for ${\mathcal{H}_A\otimes\mathcal{H}_B}$).

Observe that by tracing over one of the Hilbert spaces in (21), we find that both ${\rho_A}$ and ${\rho_B}$ have the same nonzero eigenvalues, e.g.,

$\displaystyle \rho_B=\mathrm{tr}_A\left(\left|\psi\right>_{AB~AB}\left<\psi\right|\right)=\sum_ip_i\left|i'\right>_{B~B}\left<i'\right|~, \ \ \ \ \ (22)$

though since the dimensions of ${\mathcal{H}_A}$ and ${\mathcal{H}_B}$ need not necessarily be equal, the number of zero eigenvalues can still differ. Provided ${\rho_A}$ and ${\rho_B}$ have no degenerate non-zero eigenvalues, they uniquely determine the Schmidt decomposition: one can diagonalize the reduced density matrices, and then pair up eigenstates with the same eigenvalue to determine (21). (There is still the potential for ambiguity in the basis if either ${\rho_A}$ or ${\rho_B}$ individually has degenerate eigenvalues—to wit, which ${\left|i'\right>_B}$ gets paired with which ${\left|i\right>_A}$).
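Numerically, the Schmidt decomposition is nothing but the singular value decomposition of the coefficient matrix ${a_{ij}}$. The following sketch (random state, purely for illustration) verifies that ${\rho_A}$ and ${\rho_B}$ share their non-zero eigenvalues, which are just the Schmidt coefficients ${p_i}$:

```python
import numpy as np

rng = np.random.default_rng(2)
dA, dB = 2, 5

# Random normalized bipartite pure state, as its coefficient matrix a_ij
a = rng.normal(size=(dA, dB)) + 1j * rng.normal(size=(dA, dB))
a /= np.linalg.norm(a)

# a = U diag(s) V^dagger  =>  |psi> = sum_i s_i |u_i>_A |v_i>_B,  p_i = s_i^2
s = np.linalg.svd(a, compute_uv=False)
p = s**2
assert np.isclose(p.sum(), 1)

rho_A = a @ a.conj().T      # dA x dA, trace over B
rho_B = a.T @ a.conj()      # dB x dB, trace over A

eig_A = np.sort(np.linalg.eigvalsh(rho_A))[::-1]
eig_B = np.sort(np.linalg.eigvalsh(rho_B))[::-1]

# Same non-zero spectrum; rho_B just carries dB - dA extra zero eigenvalues
assert np.allclose(eig_A, eig_B[:dA])
assert np.allclose(eig_A, np.sort(p)[::-1])
print("Schmidt coefficients:", np.round(np.sort(p)[::-1], 4))
```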

The Schmidt decomposition is useful for characterizing whether pure states are separable or entangled (for mixed states, the situation is more subtle). In particular, the bipartite pure state above, ${\left|\psi\right>_{AB}}$, is separable iff there is only one non-zero Schmidt coefficient ${p_i}$. Otherwise, the state is entangled. If all the Schmidt coefficients are equal (and non-zero), then the state is maximally entangled. On account of this classification, it is common to associate a Schmidt number to the state ${\left|\psi\right>_{AB}}$, which is the number of non-zero eigenvalues (equivalently, the number of terms) in the decomposition (note that this implies the Schmidt number is a positive integer). Thus a pure state is separable iff its Schmidt number is 1. In this case we can write it as a direct product of states in ${\mathcal{H}_A}$ and ${\mathcal{H}_B}$: ${\left|\psi\right>_{AB}=\left|\phi\right>_A\otimes\left|\chi\right>_B}$, which further implies that ${\rho_A=\left|\phi\right>_A\big._A\left<\phi\right|}$ and ${\rho_B=\left|\chi\right>_B\big._B\left<\chi\right|}$ are each pure. In contrast, an entangled state, with Schmidt number greater than 1, has no such direct product expression, in which case ${\rho_A}$ and ${\rho_B}$ are mixed.

Entanglement is quantified by the von Neumann entropy. Entanglement entropy is a tremendously rich topic in itself, to say nothing of its connections to other areas of physics, and thus we defer further discussion elsewhere.

To summarize, it is only in the case of idealized, isolated systems (i.e., the entire universe) that quantum states may be described by rays in Hilbert space. In reality, since we always deal with subsystems, states are given by (reduced) density matrices defined by tracing out the complement of the Hilbert space under consideration. (The Hilbert spaces themselves are associated with spatial regions, and thus what we mean is that we trace over all degrees of freedom localized in the complement of our subregion. As mentioned elsewhere however, this is generally still too naïve).

It remains to justify our earlier claim that generic measurements are not orthogonal projections, and evolution non-unitary. In the course of doing so, we shall resolve Preskill’s “disconcerting dualism” between determinism and probability, and explain why the notion of wave-function collapse is an illusion. But this will require us to develop slightly beyond the basic mathematical machinery above, and as such we partition the discussion into Part 2.

## Generalized gravitational entropy

The derivation of Ryu-Takayanagi (RT) put forward by Lewkowycz and Maldacena (LM) is essentially an extension of the (boundary) replica trick into the bulk. The basic idea of the replica trick is that entanglement entropy is generally hard to calculate, but Rényi entropies are comparatively easy. The latter are defined as

$\displaystyle S_n=\frac{1}{1-n}\ln\mathrm{tr}{\rho^n}~, \ \ \ \ \ (1)$

where ${\rho}$ is the reduced density matrix associated with the boundary subregion ${A}$ under consideration, whose bulk minimal surface we wish to use in RT. The procedure for obtaining the entanglement entropy is to make ${n}$ copies of the original manifold ${\mathcal{M}}$, and glue them together cyclically along the cuts formed by the region ${A}$; the resulting manifold is called the ${n}$-fold cover, ${\mathcal{M}_n}$, and the ${n}$ copies of region ${A}$ carry entropy ${S_n}$. The entanglement entropy associated to the region ${A\subset\mathcal{M}}$ is then recovered in the limit as ${n\rightarrow1}$:

$\displaystyle \lim_{n\rightarrow1}S_n=-\frac{\mathrm{tr}{\rho^n\ln\rho}}{\mathrm{tr}{\rho^n}}\bigg|_{n=1}=-\mathrm{tr}{\rho\ln\rho}=S~. \ \ \ \ \ (2)$

Note that in the first equality, the use of l’Hospital’s rule is justified since ${\mathrm{tr}{\rho}=1}$.
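To see the replica limit in action, here is a toy numerical check (illustrative only, for an assumed three-eigenvalue spectrum) that the Rényi entropies continue smoothly to the von Neumann entropy as ${n\rightarrow1}$:

```python
import numpy as np

# Toy spectrum for some reduced density matrix rho (assumed, for illustration)
p = np.array([0.5, 0.3, 0.2])
S_vN = -np.sum(p * np.log(p))   # von Neumann entropy, -tr(rho ln rho)

def renyi(n):
    """Renyi entropy S_n = ln(tr rho^n) / (1 - n), for real n != 1."""
    return np.log(np.sum(p**n)) / (1 - n)

# The analytic continuation is trivial here, since p**n makes sense for real n
for n in [2.0, 1.1, 1.01, 1.000001]:
    print(f"S_{n} = {renyi(n):.6f}")

assert abs(renyi(1.000001) - S_vN) < 1e-4
print(f"S   = {S_vN:.6f}")
```

In field theory the analytic continuation is of course far subtler (as discussed below); for a finite spectrum it is immediate.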

We’ve illustrated the ${n}$-fold cover below for ${n=4}$. Each blue sheet represents a copy of the original boundary manifold ${\mathcal{M}}$, which we’ve glued together along the cuts ${A}$ as follows: in the original Euclidean path integral, we had ${\tau\sim\tau+2\pi}$ at each of the two boundary points of ${A}$. Now, going to ${2\pi}$ takes us to the second copy, ${4\pi}$ to the third, ${6\pi}$ to the fourth, and finally ${8\pi}$ back to the first; that is, ${\mathcal{M}_n}$ has period ${\tau\sim\tau+2\pi n}$, with a ${Z_n}$ symmetry that acts as ${\tau\rightarrow\tau+2\pi}$.

The density matrix is defined via the usual Euclidean path integral:

$\displaystyle \rho=\frac{1}{Z}\int\mathcal{D}\phi\, e^{-S_E}~,\;\;\;Z=\mathrm{tr}\, e^{-\beta \hat H}~, \ \ \ \ \ (3)$

where ${S_E}$ is the Euclidean action on ${\mathcal{M}}$ and ${Z}$ is the thermal partition function at inverse temperature ${\beta}$, with Hamiltonian ${\hat H}$. Taking ${n}$ copies and computing the trace (i.e., integrating over the fields, with the aforementioned boundary conditions) then yields

$\displaystyle \mathrm{tr}{\rho^n}=\frac{Z_n}{Z^n}~, \ \ \ \ \ (4)$

where we’ve denoted the partition function on the ${n}$-fold cover by ${Z_n}$, and the denominator ensures that the normalization is preserved. Substituting this into the above formula for the Rényi entropy, we have

$\displaystyle S_n=\frac{1}{1-n}\left(\ln Z_n-n\ln Z\right), \ \ \ \ \ (5)$

which will prove more tractable in the manipulations to come.

So far we’ve dealt only with the boundary. Now we wish to move into the bulk. To do so requires finding the bulk solution whose boundary is ${\mathcal{M}_n}$. In general, there may be multiple such solutions; we’ll proceed with the dominant saddle point ${B_n}$. We then move into the bulk via the standard holographic dictionary, which equates the bulk (that is, the on-shell bulk action at large ${N}$) and boundary partition functions. Expanding the partition function on ${\mathcal{M}_n}$ in the saddle-point approximation, we therefore have

$\displaystyle Z_n\equiv Z[\mathcal{M}_n]=e^{-S[B_n]+\ldots} \ \ \ \ \ (6)$

where the ellipsis denotes both subleading saddles and ${1/N}$ corrections; we’ll drop these henceforth.

At this point we must address a subtlety lurking in the above, namely: all this ${n}$-fold cover business assumes ${n\in\mathbb{Z}_+}$, and hence the ${n\rightarrow1}$ limit obviously requires that we analytically continue to ${n\in\mathbb{R}}$. But for non-integer ${n}$, ${Z_n}$ generally cannot be written as a partition function with a local action. The reason is that for integer ${n}$, the fixed points on the boundary of the orbifold ${\mathcal{M}_n/Z_n}$ are simply ${\partial A}$, from which we can extend ${Z_n}$ locally into the bulk as usual; but for non-integer ${n}$, we have no regular orbifold structure.

There are two options for proceeding. One is to first calculate ${Z_n}$ for integer values and attempt to analytically continue the result, but this is generally hard. An easier alternative — and the key to the LM derivation — is to do the analytic continuation in the bulk instead. This relies on the observation that ${\mathcal{M}_n}$ has a ${Z_n}$ symmetry that cyclically permutes the ${n}$ replicas, and the assumption that this symmetry extends into the bulk to the dominant saddle ${B_n}$. In other words, rather than considering the boundary orbifold ${\mathcal{M}_n/Z_n}$, we instead consider the bulk orbifold ${B_n/Z_n\equiv\hat{B}_n}$, which is regular everywhere except at the fixed points of the ${Z_n}$ symmetry.

It is important to note that ${\hat{B}_n\neq B_1\equiv B}$. This can be seen in the illustration below. The left-most image is the original geometry, with a “bump” for illustration. In the middle image, we’ve cut the boundary ${\mathcal{M}}$, made ${n=3}$ copies, and glued them together to form the ${n}$-fold cover ${\mathcal{M}_n}$, with bulk dual ${B_n}$. The dotted lines illustrate the ${Z_n}$ symmetry, by which we orbifold to create the right-most image. Now one can see that although the boundary orbifold is simply the original manifold (the endpoints are identified), the same is not true in the bulk due to the conical defect. Imagine folding a piece of paper into a cone; away from the tip, it’s still locally flat, but one can detect the presence of the conical singularity by performing a parallel transport around the axis of symmetry. In this sense, the conical defect has a global effect on the geometry, and this is what prevents us from identifying ${B_n/Z_n\equiv\hat B_n}$ with ${B}$.

These fixed points form a codimension 2 surface with a conical deficit of ${2\pi-2\pi/n}$; we shall denote this surface ${C_n}$. As mentioned above, the fixed points on the boundary orbifold are simply ${\partial A}$, so this is where ${C_n}$ is anchored on the boundary. (Note that the surface ${C_n}$ is the analogue of the ${U(1)}$ fixed point in the original Gibbons-Hawking analysis).

Before continuing, let’s pause to observe why this procedure allows us to circumvent the restriction on naïve analytic continuation mentioned above, namely that we could not express ${Z_n}$ in terms of a local action. The boundary of ${\hat{B}_n}$ is the boundary orbifold ${\mathcal{M}_n/Z_n}$, which is simply the original manifold ${\mathcal{M}}$, with fixed points given by the boundary of the region ${A}$. The ${Z_n}$ symmetry therefore acts on the boundary of ${\hat{B}_n}$ as ${\tau\rightarrow\tau+2\pi}$, where ${\tau}$ is the angular coordinate around ${\partial A}$ (that is, ${\tau}$ is simply the thermal time in the Euclidean path integral, which is of course periodic), and there is no obstruction to locally extending the ${\tau}$ coordinate into the bulk such that the ${Z_n}$ symmetry acts on ${\hat{B}_n}$ in the same way. The fixed points of the action of ${Z_n}$ on ${\hat{B}_n}$ give us ${C_n}$ as described above. The key is that, since we can locally extend the symmetry from the boundary of the original manifold, it’s no longer necessary to think of ${C_n}$ as the ${Z_n}$ orbifold of some regular geometry, and thus we’re free to analytically continue away from integer ${n}$.

The upshot of all this is that the ${Z_n}$ symmetry allows us to write

$\displaystyle S\left[B_n\right]=nS[\hat{B}_n]~, \ \ \ \ \ (7)$

simply because, by extending the ${Z_n}$ symmetry in the above manner, we’re guaranteed that the contribution from the dominant saddle point of the ${n}$-fold cover, ${S\left[B_n\right]}$, is simply ${n}$ times that from the orbifold, ${S[\hat{B}_n]}$. This expression is useful because, in conjunction with the above expression for the partition function (dropping the higher order terms), we may write the Rényi entropy as

$\displaystyle S_n=\frac{n}{n-1}\left( S[\hat{B}_n]-S[B]\right) \ \ \ \ \ (8)$

where ${B\equiv \hat{B}_1}$ is simply the original bulk dual of ${A}$.

It now remains to analytically continue ${\hat{B}_n}$ to non-integer ${n}$, so that we can take the ${n\rightarrow1}$ limit of this expression. In the process, we shall see that ${C_n}$ is precisely the minimal bulk surface associated with the original region ${A\subset\mathcal{M}}$, which therefore enables us to prove RT. There are a couple ways to find the analytic continuation of ${\hat{B}_n}$, but here we will follow the so-called squashed cone method.

To begin, we choose a set of local coordinates such that ${\rho}$ parametrizes the radial (minimal) distance to ${C_n}$ from the boundary, with angular coordinate ${\tau}$:

$\displaystyle \mathrm{d} s^2=\rho^{-2\epsilon}\left(\mathrm{d}\rho^2+\rho^2\mathrm{d} \tau^2\right)+\left( g_{ij}+2K_{aij}x^a\right)\mathrm{d} y^i\mathrm{d} y^j+\ldots \ \ \ \ \ (9)$

where ${a,b,\ldots}$ are indices in the ${\left(\rho,\tau\right)}$ plane orthogonal to ${C_n}$, while ${i,j,\ldots}$ are indices along ${C_n}$. ${K_{aij}}$ is the extrinsic curvature tensor of ${C_n}$. The ellipsis denotes terms higher order in ${\rho}$, which are subleading near ${C_n}$ (since ${\rho\rightarrow0}$ there). This is called the squashed cone because the first term resembles the line element in polar coordinates (the “cone”; recall the familiar Euclidean black hole geometry), but the second term, with the index ${a}$ running over ${\tau}$, breaks the U(1) symmetry (the “squashed”).

In these coordinates, one sees that the conical deficit at ${\rho=0}$ is ${2\pi\epsilon}$ as follows: rewriting the metric in terms of the proper distance ${r}$, we have

$\displaystyle \rho^{-\epsilon}\mathrm{d}\rho=\mathrm{d} r\implies \frac{\rho^{1-\epsilon}}{1-\epsilon}=r\implies \mathrm{d} s^2=\mathrm{d} r^2+r^2(1-\epsilon)^2\mathrm{d}\tau^2+\ldots \ \ \ \ \ (10)$

Since ${\tau}$ retains its period of ${2\pi}$, the total angle around ${r=0}$ is

$\displaystyle 2\pi\left(1-\epsilon\right)~, \ \ \ \ \ (11)$

and hence the deficit angle is

$\displaystyle 2\pi-2\pi\left(1-\epsilon\right)=2\pi\epsilon~. \ \ \ \ \ (12)$

However, we know from the general considerations above that the deficit angle must be ${2\pi-2\pi/n}$ for ${n\in\mathbb{Z}}$, and therefore we must have ${\epsilon=1-1/n}$.

To find ${\hat{B}_n}$, one solves the bulk equations of motion with the unconventional IR boundary condition that the metric should resemble the form above near ${C_n}$. (The boundary condition is “unconventional” because we normally impose a UV boundary condition as per AdS/CFT; in this case however, the boundary of ${\hat{B}_n}$ is of course ${\mathcal{M}}$). In the course of doing so, one finds that, in complex coordinates ${z\equiv\rho e^{i\tau}}$, the ${zz}$-component of the Einstein equation is

$\displaystyle R_{zz}=2K_z\frac{\epsilon}{z}+\ldots~, \ \ \ \ \ (13)$

where ${K_z=K_{zij}g^{ij}}$ is the trace of the extrinsic curvature. The first term is clearly divergent as ${\rho\rightarrow0}$, while the remaining terms are higher-order in ${\epsilon}$ (and hence less divergent in this limit).

Now, the stress tensor of the matter (i.e., bulk) sector should be finite—it’s regular at integer ${n}$ because ${B_n}$ is regular, and as per our discussion above, this well-behavedness should be preserved under the ${Z_n}$ quotient. Since Einstein’s equations equate ${R_{zz}}$ with this finite stress tensor, the ${1/z}$ divergence on the r.h.s. must vanish. This implies that we must have

$\displaystyle K_z=0 \ \ \ \ \ (14)$

in the ${n\rightarrow1}$ limit. But this is precisely the condition for a minimal surface! And since this condition is satisfied when ${\rho=0}$, we conclude that ${C_n}$ is indeed the minimal surface associated to the boundary region ${A}$.

We now have all the ingredients in place to prove RT, but we must perform one final computation. Upon taking the ${n\rightarrow1}$ limit of ${S_n}$, we have

\displaystyle \begin{aligned} \lim_{n\rightarrow1}S_n&=\lim_{n\rightarrow1}\frac{n}{n-1}\left( S[\hat{B}_n]-S[B]\right)\\ &=\left[S[\hat{B}_n]-S[B]+n\left(\partial_nS[\hat{B}_n]-\partial_nS[B]\right)\right]\bigg|_{n=1}\\ &=\partial_nS[\hat{B}_n]\bigg|_{n=1}=S~, \end{aligned} \ \ \ \ \ (15)

and thus we must calculate the variation of the action to leading order in ${(n-1)}$. (In going to the second line, l’Hospital’s rule is justified since ${\hat B_1=B_1\equiv B}$). This will introduce boundary terms, which — as in the aforementioned Gibbons-Hawking result — turn out to be all-important.

We will not compute these boundary terms explicitly here; one can find the analysis in the LM paper. Rather, we will present the following simple heuristic argument offered by Dong. At ${n=1}$ there is no conical defect, and therefore the only contribution must be from boundary terms in the action. Since ${C_1}$ is non-singular, it gives no contribution, and hence we should excise a small region around ${C_1}$, thereby introducing a boundary whose contribution is precisely proportional to the area of ${C_1}$. Thus,

$\displaystyle S\sim\mathrm{Area}\left( C_1\right)~, \ \ \ \ \ (16)$

where, of course, fixing the constant of proportionality requires performing the explicit computation. Upon fixing this, and using the above fact that ${C_1}$ is precisely the minimal surface associated to the region ${A}$, one indeed obtains RT.

If the above argument about excising ${C_1}$ seems sketchy, consider again the expression ${S[B_n]=nS[\hat B_n]}$. The l.h.s. is the action of the bulk dual of the entire ${n}$-fold cover ${\mathcal{M}_n}$, which is regular everywhere; in particular, it has no conical deficit. Thus, if we want this expression to hold, the r.h.s. cannot include any contribution from the defect at ${C_n}$ either.


## Hawking pairs and ontic toys

The most widely known picture of Hawking radiation involves the creation of a particle-antiparticle pair at the horizon. At face value, this seems natural enough: we know from QFT that the vacuum is hardly vacuum at all, but instead a writhing sea of virtual particle excitations. The key word is “virtual”: in technical terms, these are off-shell contributions to Feynman diagrams that one requires to get the right answer, but whose ontology is altogether less clear. The reason the vacuum still looks “empty” despite all this seething (indeed, infinite!) activity is that these particles never go on-shell—that is, they never contribute to anything we can actually measure. The presence of a horizon changes this.

One technical note before we proceed: pair creation can occur whenever the incoming energy is at least equal to the total rest mass of the two particles being created. For example, an electron has a mass of about 511 keV, so any interaction that pumps in at least 1022 keV could create an electron-positron pair—both with equivalent, positive rest-mass energy. Note that this is an energy-conserving process. In virtual pair production, wherein we start with zero energy and fluctuate off-shell, one of the partners must therefore carry negative energy. This will play only a minor clarifying role in the following cartoon, but it’s worth bearing in mind if you’re concerned about the details.

Now, suppose a virtual pair fluctuates into existence at the horizon of a black hole, such that one partner is trapped inside the horizon while the other is formed outside. The exterior particle could still fall in and annihilate with its partner, but it’s also possible for tidal forces to increase their separation so much that both particles go on-shell. That is, the black hole prevents them from recombining and annihilating, with the result that two real particles are created. Since this is an energy conserving process, and we started with vacuum, one particle has positive energy — which escapes to infinity and contributes to the Hawking radiation — while the other has a negative energy of equal magnitude—which falls into the black hole, and thereby decreases its mass.

Why did we posit that the positive energy particle escaped, rather than that with negative energy? The answer has to do with the thermodynamic instability of black holes. Their specific heat is negative: absorbing energy causes a black hole to increase in size and become colder, which makes it thermodynamically favourable to absorb even more energy from the ambient bath, and so on; a similar runaway spiral occurs in the opposite direction. Therefore, if the black hole were to spontaneously emit negative energy (equivalent to absorbing some positive mass), its temperature would decrease, causing the emission of ever-more negative energy. With nothing to stop it, it would eventually radiate a magnitude of energy far greater than that which it originally possessed—a violation of energy conservation on the grandest scales. The situation is neatly resolved by allowing the outgoing partner mode to carry only positive energy when it goes on-shell; the energy it carries away from the black hole is compensated for by the negative-energy mode that fell in. The process has a natural limit when all the mass of the black hole has been carried away, whereupon the hole has completely evaporated (modulo certain quantum-gravitational concerns at the end-point, which are beyond our current scope). Thus the book-keeping works out naturally, and we have a convenient mental image to help our primitive minds grasp such an esoteric concept.

Hawking radiation à la pair production is what’s known in physics as a “toy model”, where the adjective emphasizes the fact that it isn’t intended to have ontic status, but merely to elucidate certain aspects of a problem in a more controlled manner. This is immensely useful in a large number of fields, where the full theory may be unsolvable with current techniques, but where certain simplified — and explicitly solvable — models that capture certain core features can still be used to gain a great deal of insight. The catch lies in bearing in mind which features of the model reflect those in the underlying (real, physical) theory, and which are purely epistemic.

The pre-Copernican model of the solar system is a good example. The geocentric model of nested crystalline spheres was purely epistemic: it sufficed to make predictions of eclipses and the like, and indeed its utility in this regard was one of the reasons it was so hard to overthrow (sure, Mars was a bit tricky to get right, but what’s a few more epicycles?). But it had no ontic value: for all its predictive power, it bore no resemblance to reality. Its fundamental explanations were ontologically wrong.

A toy model denotes something which, from the start, we intend to be purely epistemic, however accurate its reflection of certain physical features. The purpose of science — the growth of knowledge — is in ever-amending our imperfect models to agree more closely with reality—in maximizing the ratio of ontic to epistemic, if you will. There are no ontic toys.

Pair production, as a model of Hawking radiation, is precisely such a purely epistemic model. In Susskind’s words, it’s merely “a cartoon Hawking invented to explain his calculation to children.” The actual calculation is performed in momentum space, and in Hawking’s original work relies on a Bogoliubov transformation between ingoing and outgoing modes in the presence of collapsing matter. Although one begins in vacuum (plus the mass that creates the black hole), the final state does not correspond to the initial state due to the large blueshift caused by the collapsing body. Thus in computing the expectation value of the number operator of outgoing modes, one finds (given certain assumptions, such as adiabaticity) it to have a thermal spectrum with temperature ${T=\kappa(2\pi)^{-1}}$.
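For orientation, restoring units in ${T=\kappa(2\pi)^{-1}}$ (with surface gravity ${\kappa=c^4/4GM}$ for Schwarzschild) gives the textbook formula ${T_H=\hbar c^3/8\pi G M k_B}$. A quick numerical aside, not part of the original calculation:

```python
import math

# Standard constants (SI)
hbar = 1.054571817e-34   # J s
c = 2.99792458e8         # m / s
G = 6.67430e-11          # m^3 kg^-1 s^-2
k_B = 1.380649e-23       # J / K
M_sun = 1.989e30         # kg

def hawking_T(M):
    """Hawking temperature T_H = hbar c^3 / (8 pi G M k_B) of a
    Schwarzschild black hole of mass M (kg), in kelvin."""
    return hbar * c**3 / (8 * math.pi * G * M * k_B)

# A solar-mass black hole is around 6e-8 K, far colder than the CMB (~2.7 K),
# so astrophysical black holes currently absorb more than they radiate
print(f"T_H(M_sun) ~ {hawking_T(M_sun):.2e} K")
```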

This thermal radiation must correspond to the emission of physical particles, hence the interpretation in terms of Hawking “pairs” above. In particular, an observer at infinity decomposes the scalar field into ingoing and outgoing modes. Information about the former is lost behind the horizon, which leads to a thermal spectrum precisely as in the case of a Rindler observer. An infalling observer, in contrast, would make no such discontinuous decomposition—her modes are continuous when propagated back across the horizon. They suffer a blueshift relative to her position, but are still purely positive frequency; they do not lead to a thermal spectrum, and she observes no particle creation.

A great deal of literature/debate on black holes, particularly in the context of firewalls, relies on the notion — closely associated to the analysis above — of pairwise entangled modes. Note the importance of distinguishing “mode” and “particles” here. The former is perfectly fine, and indeed other arguments — such as the requirement that the vacuum remain smooth across Rindler horizons — demand that the modes be pairwise entangled. But particles aren’t, hence the cartoonish nature of interpreting Hawking pairs in this manner. The reason for this is that the analysis above is performed in free field theory, in which an ${n}$-particle state

$\displaystyle |p_1,\ldots p_n\rangle=a^\dagger(p_1)\ldots a^\dagger (p_n)|0\rangle \ \ \ \ \ (1)$

is sharply localized in momentum space, but completely delocalized in position space. One builds localized particle states by constructing wavepackets, which are usually Gaussian integrals over momentum space. The idea of particles as local excitations of the field is therefore technically rather inaccurate.
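This point is easy to see numerically. The following sketch (numpy; all parameters are illustrative, in arbitrary units) builds a wavepacket as a Gaussian superposition of momenta and confirms that, unlike a single plane wave whose probability density is uniform, the packet is sharply localized in position:

```python
import numpy as np

# A single momentum eigenstate e^{ipx} is completely delocalized in x.
# A Gaussian superposition of momenta about p0 yields a localized packet.
x = np.linspace(-50, 50, 801)
p = np.linspace(-10, 10, 401)
p0, sigma = 2.0, 0.5                  # illustrative central momentum and spread

weights = np.exp(-(p - p0) ** 2 / (2 * sigma**2))           # Gaussian in momentum
psi = (weights * np.exp(1j * np.outer(x, p))).sum(axis=1)   # psi(x) ~ ∫dp w(p) e^{ipx}
prob = np.abs(psi) ** 2
prob /= prob.sum()

# Essentially all of the probability sits within a few 1/sigma of the origin:
frac_core = prob[np.abs(x) < 10].sum()
print(frac_core)
```

The position-space width is set by ${1/\sigma}$, the usual Fourier trade-off between momentum and position localization.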

However, while from a field-theoretic perspective, treating the Hawking radiation in terms of pairwise entangled modes appears entirely kosher, there are other reasons to believe that the notion of eternally persisting (at least up to the singularity), pairwise entangled modes should be modified by interactions, or else break down in some other (perhaps highly non-local) way. Perhaps the simplest is to realize that the Hawking modes have wavelength ${\sim M^{-1}}$. Individual modes simply can’t be localized within a Schwarzschild radius from the horizon (and it’s rather difficult to imagine how the interior mode can fit at all). Of course, as Freivogel has pointed out, it is possible to localize wave packets within the zone, and even to entangle them across Rindler horizons, but the implications for Hawking radiation in this case are less immediately clear (the radiation does not, as far as we know, come out in conveniently localized Gaussian packets). However, there is a deeper, non-model-specific reason to suspect that the pair picture breaks down, which goes by the name of “scrambling.”

In basic terms, scrambling refers to the chaotic loss of information by a complex system. The crucial modifier is “chaotic”. There is no information loss in any fundamental sense. There’s no loss of unitarity when a butterfly flaps its wings to create a hurricane, but tracking backwards to find the initial perturbation is practically impossible, simply because the system is chaotic. More generally, consider a complex chaotic system with ${N}$ degrees of freedom. One can compute the reduced density matrix for any subsystem with ${n<N/2}$ degrees of freedom, which will approach thermal equilibrium as the system thermalizes. This is just the statement that entropy approaches its maximum value. We say that the total system has “scrambled” when any subsystem with ${n<N/2}$ has maximum entanglement entropy. The reason for the terminology is that at this point, no information is recoverable from less than half the total degrees of freedom.

Note that this is subtly but crucially different from saying that the total system has thermalized. The latter implies a complete loss of correlations, i.e. an exactly Planckian spectrum. Scrambling merely implies that any information in the initial pure state is delocalized over at least half the system, but is in principle still recoverable. Thermalization implies a loss of unitarity; scrambling does not.

Why half? The reason goes back to the work of Page, who showed that any subsystem with ${n<N/2}$ will look approximately thermal. In other words, for generic systems (insert details about canonical ensembles and whatnot here), any less-than-half portion of the system contains no information. Scrambling is merely the state at which this becomes exactly (rather than approximately) true.
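Page’s statement is easy to check numerically. The sketch below (numpy; the qubit count is a toy size chosen purely for speed) draws a random pure state and computes the entanglement entropy of subsystems of increasing size — each less-than-half subsystem comes out near-maximally entangled, i.e. approximately thermal:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 12                                    # total number of qubits (toy size)

def subsystem_entropy(m):
    """Entanglement entropy of an m-qubit subsystem of a random pure state."""
    dA, dB = 2**m, 2**(N - m)
    psi = rng.normal(size=(dA, dB)) + 1j * rng.normal(size=(dA, dB))
    psi /= np.linalg.norm(psi)
    p = np.linalg.svd(psi, compute_uv=False) ** 2   # Schmidt spectrum
    p = p[p > 1e-15]
    return -(p * np.log(p)).sum()

# Any less-than-half subsystem is close to its maximum entropy m ln 2:
for m in range(1, N // 2 + 1):
    print(m, subsystem_entropy(m), m * np.log(2))
```

The small deficit from the maximum is Page’s correction, ${\sim d_A/(2d_B)}$, which is only appreciable as the subsystem approaches half the total.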

Suppose you make a small perturbation to a scrambled system by adding a single degree of freedom. If you try to measure it — to recover the information — immediately afterwards, you only need to measure a single degree of freedom. But if you wait a short time, the information begins to diffuse, until soon the system has returned to a scrambled state, and recovering the information about that initial one-bit perturbation will require a very non-local measurement. The time you have to wait before information becomes scrambled is called the scrambling time, denoted ${t_*}$, and depends on the system under study. But the key point for our discussion is that black holes are the fastest scramblers in the universe, with ${t_*\sim\beta\ln S}$ (where ${\beta}$ is the inverse Hawking temperature, and the entropy ${S\sim N}$). In such a system, every degree of freedom is directly coupled to every other, so that information diffuses maximally rapidly. (Several papers by Susskind and collaborators discuss this in more detail).

For a black hole, ${S\sim M^2\implies t_*\sim M\log M}$. This is the amount of time it takes for the infalling partner mode to become entangled with the entire black hole. This is fast, about ${0.0004}$ seconds for a solar mass black hole. Clearly, it makes no sense to speak of pairwise entangled particles for more than an instant, let alone over the lifetime of the black hole, against which the current age of the universe is nothing. Therefore the outgoing Hawking mode must be entangled, not with its partner, but with the entire black hole.
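For concreteness, here is the back-of-the-envelope arithmetic (a sketch in SI units; the ${O(1)}$ prefactor in ${t_*\sim\beta\ln S}$ is convention-dependent, so the result should only be trusted to within an order of magnitude or two of the figure quoted above):

```python
import math

# Physical constants (SI)
G, c, hbar, kB = 6.674e-11, 2.998e8, 1.055e-34, 1.381e-23
M = 1.989e30                                   # one solar mass, kg

T = hbar * c**3 / (8 * math.pi * G * M * kB)   # Hawking temperature, ~6e-8 K
beta = hbar / (kB * T)                         # inverse temperature as a time, ~1e-4 s
S = 4 * math.pi * G * M**2 / (hbar * c)        # Bekenstein-Hawking entropy, ~1e77 nats
t_star = beta * math.log(S)                    # scrambling time, up to O(1) factors

print(beta, S, t_star)                         # milliseconds-scale, not years
```

Whatever the precise prefactor, ${t_*}$ is a tiny fraction of a second, utterly negligible against the hole’s ${\sim M^3}$ evaporation time.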

Except that since we’re working in free field theory, this can’t happen. The modes are pairwise entangled across the horizon, and remain so as they propagate blithely out to infinity (or into the singularity). No one knows exactly how black holes scramble information, but it doesn’t appear consistent with this picture.

A key point in this issue is the question of Hilbert space factorization (which suffers its own deep troubles). A 2011 paper by Mathur and Plumberg provides a prime example. For the black hole, the authors explicitly assume a Hilbert space factorization of the form

$\displaystyle \mathcal{H}=\mathcal{H}_M\otimes\mathcal{H}_P~, \ \ \ \ \ (2)$

where ${\mathcal{H}_M}$ is the Hilbert space of the initial (pure state) matter that formed the hole (equivalently, the hole before any Hawking pairs are emitted), and ${\mathcal{H}_P}$ is the Hilbert space of created pairs. The states in this latter space are of the form

$\displaystyle |\Psi\rangle=\frac{1}{2^{n/2}}\prod_{i=1}^n\left(|0\rangle_{c_i}|0\rangle_{b_i}+|1\rangle_{c_i}|1\rangle_{b_i}\right)~, \ \ \ \ \ (3)$

where the product is a tensor product over created Bell pairs of entangled interior (${c}$) and exterior (${b}$) modes. After ${n}$ pairs are created, the entanglement between the exterior modes ${b_i}$ and the interior, which consists of ${M}$ and ${c}$, is

$\displaystyle S=n\ln 2~, \ \ \ \ \ (4)$

which grows linearly with the number of emitted modes, in accordance with the (wrong) curve on the Page diagram, and the concomitant information paradox.
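Equation (4) can be verified directly (a small numpy sketch; the mode labels ${c_i,b_i}$ follow the notation above):

```python
import numpy as np

def bell_pairs_state(n):
    """|Psi> = 2^{-n/2} prod_i (|0>_c|0>_b + |1>_c|1>_b), ordered c1 b1 c2 b2 ..."""
    bell = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)
    psi = np.array([1.0])
    for _ in range(n):
        psi = np.kron(psi, bell)
    return psi

def exterior_entropy(n):
    """Entanglement entropy between the interior (c) and exterior (b) modes."""
    psi = bell_pairs_state(n).reshape([2, 2] * n)
    perm = list(range(0, 2 * n, 2)) + list(range(1, 2 * n, 2))  # all c's, then all b's
    mat = psi.transpose(perm).reshape(2**n, 2**n)
    p = np.linalg.svd(mat, compute_uv=False) ** 2               # Schmidt spectrum
    p = p[p > 1e-15]
    return -(p * np.log(p)).sum()

print(exterior_entropy(3))   # 3 ln 2 ≈ 2.0794: S = n ln 2 after n pairs
```

Each Bell pair contributes exactly ${\ln 2}$, so the entropy of the exterior modes grows linearly with ${n}$, with no turnover.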

The problem with this model is that, even if one assumes such a fictitious factorization of the Hilbert space, due to scrambling one expects the total Hilbert space to be something like

$\displaystyle \mathcal{H}=\mathcal{H}_{\tilde M}\otimes\mathcal{H}_E~, \ \ \ \ \ (5)$

with ${\mathcal{H}_{\tilde M}}$ subsuming the interior partner, while ${\mathcal{H}_E}$ contains the exterior mode. States in this total Hilbert space have the form

$\displaystyle |\Psi\rangle=\frac{1}{\sqrt{2}}\left(|M0\rangle|0\rangle_{b_i}+|M1\rangle|1\rangle_{b_i}\right) \ \ \ \ \ (6)$

where ${|M0\rangle,|M1\rangle\in\mathcal{H}_{\tilde M}}$ represent the state of the black hole where the partner mode is either 0 or 1, respectively. Crucially, black hole states in this factorization contain one fewer bit than before the photon was emitted (since one bit moved from ${\mathcal{H}_{\tilde M}}$ to ${\mathcal{H}_E}$). Hence at both the start and end of evaporation, the entropy will be zero, since all the bits are in the same place (in or out of the hole, respectively). Thus one obtains the correct Page curve, which maxes out at half the lifetime of the black hole and then decreases again to zero. Mathur and Plumberg actually obtain this behavior with a model of a burning piece of paper, where one has (e.g., kinetic) interactions between molecules to modify the entanglement structure appropriately. Intuitively, scrambling must do something similar, but free field theory seems to prohibit it.
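The difference between the two factorizations is captured by a toy Page curve (a sketch, not the Mathur-Plumberg calculation itself: each emitted quantum is one bit, and in the scrambled factorization the radiation can only be entangled with whatever remains inside the hole):

```python
import math

n_total = 10   # total number of bits emitted over the hole's lifetime (toy number)

# Factorization (2): pairwise-entangled pairs, entropy grows forever
naive = [k * math.log(2) for k in range(n_total + 1)]

# Factorization (5): entanglement bounded by the bits still inside the hole
page = [min(k, n_total - k) * math.log(2) for k in range(n_total + 1)]

print(naive[-1])   # n_total ln 2: the information-loss curve
print(page)        # rises to (n_total/2) ln 2 at the Page time, then back to zero
```

The first list is the linearly growing curve of eq. (4); the second rises, turns over at the Page time, and returns to zero at the end of evaporation.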

Of course, there are many more subtle and sophisticated arguments for the existence of firewalls, particularly in AdS/CFT, and it’s not obvious that a modification of the entanglement structure via scrambling or the like will be sufficient to resolve the paradox. However, the pair-production picture of Hawking radiation is notorious for causing more problems than it solves. One must clearly bear in mind the implicit assumptions beneath any physical model, and beware any theory that takes its ontic toys too seriously.

Posted in Philosophy, Physics | 3 Comments

## Action integrals and partition functions

There’s a marvelous — and by now quite well-known — paper (paywalled) by Gibbons and Hawking, in which they compute the entropy of black holes from what is essentially a purely geometrical argument. This relies on the fact that the partition function has an expression in terms of both a path integral and a statistical ensemble. The former allows one to solve for the gravitational action, and the latter endows this with a standard thermodynamic interpretation.

In the path integral formulation, one expresses the generating functional of correlation functions as

$\displaystyle Z=\int\mathcal{D}g\,\mathcal{D}\phi\, e^{iI[g,\phi]} \ \ \ \ \ (1)$

where ${I}$ is the action functional of the metric ${g}$ and matter fields ${\phi}$ (to avoid confusion, we’ll use ${I}$ for action in order to reserve ${S}$ for entropy). Gibbons and Hawking begin by pointing out that, in the case of black holes, the presence of spacetime singularities prevents one from evaluating the action. However, one can side-step this difficulty by Wick rotating to Euclidean signature, whereupon the geometry pinches off smoothly at the event horizon, thus providing one with a non-singular, compact manifold on which to evaluate the action. Let’s see how this works.

In ${3+1}$ dimensions, the gravitational action is

$\displaystyle I=\frac{1}{16\pi}\int\mathrm{d}^4x\sqrt{-g}R~. \ \ \ \ \ (2)$

However, the Ricci scalar ${R}$ contains second-order derivatives with respect to the metric:

$\displaystyle R=2g^{\mu\nu}\left(\Gamma^\rho_{\mu\left[\nu,\rho\right]}+\Gamma^\sigma_{\mu\left[\nu\right.}\Gamma^\rho_{\left.\rho\right]\sigma}\right)~,\;\;\; \Gamma_{\rho\mu\nu}=\frac{1}{2}\left( g_{\rho\mu,\nu}+g_{\rho\nu,\mu}-g_{\mu\nu,\rho}\right)~, \ \ \ \ \ (3)$

which means that varying the action produces boundary terms involving normal derivatives of the metric variation; with only the induced metric held fixed on the boundary, the variational principle is ill-defined. One can remedy this via partial integration, but that requires that we properly account for boundary terms. A more complete expression for the above action is therefore

$\displaystyle I=\frac{1}{16\pi}\int_M\mathrm{d}^4x\sqrt{-g}R+\frac{1}{8\pi}\int_{\partial M}\mathrm{d}^3x\sqrt{-h}K~. \ \ \ \ \ (4)$

where ${M}$ is the spacetime manifold with boundary ${\partial M}$. The second term is known as the Gibbons-Hawking-York boundary term, where ${h_{\mu\nu}}$ is the induced metric on ${\partial M}$ and ${K}$ is the trace of the second fundamental form. (Recall that the first fundamental form is the inner product induced on the tangent space of a surface ${\mathcal{S}}$ in ${\mathbb{R}^3}$ by the dot product on the latter. The second fundamental form is a quadratic form on the tangent plane of ${\mathcal{S}}$. Collectively these allow the definition of extrinsic curvature invariants.)

Two quick technical notes are in order. First, in the above expression, we have the freedom to add a constant ${C}$ to the boundary term that depends only on the induced metric ${h_{\mu\nu}}$. However, since this is independent of ${g_{\mu\nu}}$, it can be absorbed in the normalization of the measure on the space of all metrics. For convenience, we choose this constant so that in the asymptotically flat spacetimes with which we’re concerned, ${I=0}$ for the flat-space metric ${\eta_{\mu\nu}}$. Therefore ${K}$ should be understood to mean the difference in the trace of the second fundamental form of ${\partial M}$ in the metrics ${g_{\mu\nu}}$ and ${\eta_{\mu\nu}}$.

Secondly, the second fundamental form is a ${(0,2)}$-tensor, and hence does not have a trace in the strictest sense (since the trace should not be coordinate dependent, while ${\mathrm{tr}K=\sum_i K_{ii}}$ clearly is). Thus when one speaks of the trace of ${K}$ in the metric ${g_{\mu\nu}}$, one means the trace over or with respect to the metric, i.e. ${\mathrm{tr}_gK\equiv\mathrm{tr}\left( g^{-1}K\right)=\sum_{ij}g^{ij}K_{ij}=K^i_{~i}}$ (this last is a ${(1,1)}$-tensor, and is therefore coordinate independent, as desired).

We will now proceed to evaluate the above action for the Schwarzschild black hole, given by the intimately familiar metric

$\displaystyle \mathrm{d} s^2=-f(r)\mathrm{d} t^2+\frac{1}{f(r)}\mathrm{d} r^2+r^2\mathrm{d}\Omega^2~,\;\;\;f(r)=1-\frac{r_s}{r}~, \ \ \ \ \ (5)$

where the event horizon ${r_s=2M}$. We shall henceforth suppress the 2-sphere ${\mathrm{d}\Omega^2}$, as it plays no role in the following. As is well-known, the Schwarzschild metric has singularities at ${r=0}$ and ${r=r_s}$, but the latter is merely a coordinate singularity and can be removed by transforming to (among other things) Kruskal coordinates:

$\displaystyle \mathrm{d} s^2=\frac{4r_s^3}{r}e^{-r/r_s}\left(-\mathrm{d} T^2+\mathrm{d} X^2\right) \ \ \ \ \ (6)$

where

$\displaystyle X^2-T^2=\left(\frac{r}{r_s}-1\right)e^{r/r_s}~,\;\;\;\frac{X+T}{X-T}=e^{t/r_s}~. \ \ \ \ \ (7)$

These coordinates are regular throughout the whole spacetime; in particular, although the sign of ${X^2-T^2}$ changes as one crosses to the interior, there’s nothing pathological at the horizon itself. Additionally, note that the curvature singularity ${r=0}$ has been mapped to the hyperbola ${T^2-X^2=1}$.

Now comes the magic. We Wick rotate to Euclidean signature by defining a new coordinate ${\xi=iT}$. Aside from the obvious sign change in the metric, this changes the definition of ${r}$ to

$\displaystyle \xi^2+X^2=\left(\frac{r}{r_s}-1\right) e^{r/r_s}, \ \ \ \ \ (8)$

and thus if we restrict to ${\xi,X\in\mathbb{R}}$, we must have ${r\ge r_s}$. In other words, the Euclidean black hole has no interior; the geometry stops at the horizon!

There’s a cute way to visualize this geometry rather easily. Go back to the Schwarzschild metric above and zoom in near the horizon:

$\displaystyle f(r)=f(r_s)+(r-r_s)\,\partial_rf\big|_{r_s}+O\!\left((r-r_s)^2\right)=\frac{r-r_s}{r_s} \ \ \ \ \ (9)$

where in the last step we’ve dropped higher-order terms. It will then be convenient to rewrite the metric in terms of the proper distance ${\rho}$ near the horizon, which is found as follows:

$\displaystyle \frac{\mathrm{d} r^2}{f(r)}=\mathrm{d}\rho^2\implies\frac{\mathrm{d} r}{\sqrt{\frac{r}{r_s}-1}}=\mathrm{d}\rho\implies \frac{r}{r_s}-1=\frac{\rho^2}{4r_s^2}~. \ \ \ \ \ (10)$

Thus, after Wick rotating to ${\tau=it}$, we have

$\displaystyle \mathrm{d} s^2=\mathrm{d}\rho^2+\frac{\rho^2}{4r_s^2}\mathrm{d}\tau^2~. \ \ \ \ \ (11)$

But this is merely the line element in polar coordinates! And in polar coordinates, we must identify ${\frac{\tau}{2r_s}\sim\frac{\tau}{2r_s}+2\pi}$ to avoid a conical deficit. This is one of many ways to see that Wick rotation leads to periodicity in imaginary time.
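This near-horizon computation is simple enough to check symbolically (a sympy sketch; the symbol names are ours):

```python
import sympy as sp

r, rs = sp.symbols('r r_s', positive=True)
f = (r - rs) / rs                     # near-horizon expansion of f(r)
rho = 2 * sp.sqrt(rs * (r - rs))      # proper distance, from integrating dr/sqrt(f)

# d(rho)/dr = 1/sqrt(f), and f = rho^2/(4 r_s^2), as in the text:
assert sp.simplify(sp.diff(rho, r) ** 2 - 1 / f) == 0
assert sp.simplify(f - rho**2 / (4 * rs**2)) == 0

# Smoothness at rho = 0 forces tau/(2 r_s) to have period 2 pi, so
# beta = 4 pi r_s = 8 pi M, i.e. T = 1/(8 pi M) = kappa/(2 pi), kappa = 1/(2 r_s):
beta = 4 * sp.pi * rs
print(sp.simplify(1 / beta - (1 / (2 * rs)) / (2 * sp.pi)))   # 0
```

The period ${\beta=4\pi r_s=8\pi M}$ read off here is precisely the inverse Hawking temperature used below.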

The following is a sketch of the resulting geometry. The radial coordinate increases towards the right, while the periodic ${\tau}$ coordinate forms the circumference of the “cigar”. Each point on the surface is an ${S^2}$, from the suppressed 2-sphere ${\mathrm{d}\Omega^2}$. Note that, as explained above, the geometry pinches off smoothly at the horizon: the end-point at the left is at ${r_s}$.

As an aside, the Euclidean vacuum is known as the Hartle-Hawking vacuum state, which is subtly yet crucially different from the (more physically relevant) Unruh vacuum. This is an important point when one wishes to discuss thermodynamic effects like Hawking radiation, but the distinction is a subject for another post.

This is the crucial enabling factor that allows one to compute the action: the Euclidean section is non-singular, and hence ${I}$ can be evaluated on a region ${M}$ bounded by some surface ${r=r_0>2M}$, whose boundary ${\partial M}$ has compact topology ${S^1\times S^2}$ (periodic time cross the suppressed ${\mathrm{d}\Omega^2}$).

Since the Ricci scalar vanishes in the Schwarzschild metric, the action is entirely determined by the Gibbons-Hawking-York boundary term. Rewriting the integration measure as an area element ${\mathrm{d}\Sigma}$, this becomes

$\displaystyle I=\frac{1}{8\pi}\int\!K\mathrm{d}\Sigma =\frac{1}{8\pi}\int \nabla_\mu n^\mu\mathrm{d}\Sigma~, \ \ \ \ \ (12)$

where the appropriate normal vector ${n^\mu}$ is found from the bulk metric ${g_{\mu\nu}}$ by simply normalizing with respect to the ${\mathrm{d} r^2}$ component ${f(r)^{-1}}$:

$\displaystyle n^\mu=-\frac{\delta^{\mu r}}{\sqrt{f(r)^{-1}}}\implies n^r=-\sqrt{1-\frac{r_s}{r}}~. \ \ \ \ \ (13)$

This is an integral over the boundary ${S^1\!\times\!S^2}$ at ${r=r_0}$, with induced metric

$\displaystyle \mathrm{d} s^2=\left(1-\frac{r_s}{r}\right)\mathrm{d}\tau^2+r^2\mathrm{d}\Omega^2\;\;\; \implies\;\;\; \sqrt{-h}=ir^2\sin\theta\sqrt{1-\frac{r_s}{r}}~. \ \ \ \ \ (14)$

Now, there are two ways to evaluate this integral. The most straightforward option is to directly compute ${K=\nabla_\mu n^\mu}$. Since the covariant derivative generally involves non-trivial Christoffel symbols, I’m going to call this the brute-force method. Of course, these are well-known for the Schwarzschild metric, so in this case we may simply write down the relevant components:

$\displaystyle \Gamma_{tr}^t=-\Gamma_{rr}^r=\frac{r_s}{2r^2}\left(1-\frac{r_s}{r}\right)^{-1}~,\qquad \Gamma_{\theta r}^\theta=\Gamma_{\phi r}^\phi=\frac{1}{r}~. \ \ \ \ \ (15)$

We therefore have

\displaystyle \begin{aligned} K&=\nabla_\mu n^\mu=\partial_r n^r+\Gamma_{\mu r}^\mu n^r =-\left(\partial_r+\frac{2}{r}\right)\sqrt{1-\frac{r_s}{r}}\\ &=-\frac{4r-3r_s}{2r^2}\left(1-\frac{r_s}{r}\right)^{-1/2}~. \end{aligned} \ \ \ \ \ (16)
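This expression for ${K}$ can be verified without quoting the Christoffel symbols at all, using ${\nabla_\mu n^\mu=\frac{1}{\sqrt{g}}\partial_\mu\left(\sqrt{g}\,n^\mu\right)}$ with ${\sqrt{g}=r^2\sin\theta}$ for Schwarzschild (a sympy sketch):

```python
import sympy as sp

r, rs = sp.symbols('r r_s', positive=True)
f = 1 - rs / r
n_r = -sp.sqrt(f)                                      # the normal vector above

# K = (1/sqrt(g)) d_r (sqrt(g) n^r); the sin(theta) factor is r-independent
K = sp.diff(r**2 * n_r, r) / r**2
target = -(4 * r - 3 * rs) / (2 * r**2) / sp.sqrt(f)   # the quoted result

print(sp.simplify(K - target))   # 0
```

The cancellation of ${\Gamma^t_{tr}}$ against ${\Gamma^r_{rr}}$ in the brute-force computation is automatic in this divergence form.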

Substituting (14) and (16) into the integral expression (12), we have

\displaystyle \begin{aligned} 8\pi I&=\int\!K\mathrm{d}\Sigma =\int\!\mathrm{d}^3x\sqrt{-h}\,K\\ &=-i\frac{4r-3r_s}{2}\int_0^\beta\!\mathrm{d}\tau\int_0^{2\pi}\!\mathrm{d}\phi\int_0^\pi\!\mathrm{d}\theta\sin\theta\\ &=-2i\pi\beta\left(4r-3r_s\right) \end{aligned} \ \ \ \ \ (17)

Alternatively, a more elegant method that avoids the need to compute the curvature (read: Christoffel symbols) is to integrate by parts. After recognizing the directional derivative ${n^\mu\nabla_\mu=\partial_n}$, and using the fact that ${n^\mu\cdot\mathrm{d}\Sigma=0}$ by definition, one obtains

$\displaystyle 8\pi I=\partial_n\int\mathrm{d}\Sigma =-\sqrt{1-\frac{r_s}{r}}\,\partial_r\left(4\pi\beta ir^2\sqrt{1-\frac{r_s}{r}}\right) =-2i\pi\beta\left(4r-3r_s\right)~. \ \ \ \ \ (18)$

We’re not quite done though: we want to ensure that our geometrical result includes only the contribution of the black hole geometry, not any flat space contribution, so we need to renormalize by subtracting the latter. To do so, we’ll push the surface ${r_0}$ to infinity, where any deviation from Minkowski space is due to the ${U(1)}$ symmetry (analogous to the global effect of a conical deficit). We alluded to this above, when we used ${K}$ as a shorthand notation for ${\mathrm{tr}_g(K)-\mathrm{tr}_\eta(K)}$. We just calculated the first term; now let’s do the second.

We have the same induced metric, but embedding this boundary geometry in flat space means that the covariant derivative reduces to the divergence of the normal vector, and we can use the original form of the integral expression directly. The unit normal is ${n^r=-1}$, so ${K=r^{-2}\partial_r\left( r^2(-1)\right)=-2/r~}$. Plugging these components into the action, and evaluating with the same limits as above, we have the flat-space contribution ${I_0}$:

$\displaystyle (8\pi)I_0=-8i\pi\beta r\sqrt{1-\frac{r_s}{r}}~. \ \ \ \ \ (19)$

Finally, taking the difference of (18) and (19), and Taylor expanding around ${r_0\rightarrow\infty}$ (keep in mind that ${r}$ in the above expressions is really ${r_0}$, the boundary of the integration region on the cigar), one finds

$\displaystyle I=i\pi r_s^2+O\!\left(\frac{1}{r_0}\right)~. \ \ \ \ \ (20)$

Using the fact that ${\beta=8\pi M}$ and ${r_s=2M}$, we thus have the leading-order contribution

$\displaystyle I\approx i\pi r_s^2= 4\pi i M^2=\frac{i\beta^2}{16\pi}~. \ \ \ \ \ (21)$
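The subtraction and large-${r_0}$ limit can likewise be checked symbolically (a sympy sketch; the overall factor of ${i}$ is stripped out for convenience):

```python
import sympy as sp

r0, rs = sp.symbols('r_0 r_s', positive=True)
beta = 4 * sp.pi * rs                          # = 8 pi M

bh = -2 * sp.pi * beta * (4 * r0 - 3 * rs)            # 8 pi I / i, from above
flat = -8 * sp.pi * beta * r0 * sp.sqrt(1 - rs / r0)  # 8 pi I_0 / i

I_over_i = sp.limit((bh - flat) / (8 * sp.pi), r0, sp.oo)
print(I_over_i)                                # pi*r_s**2
```

The divergent ${O(r_0)}$ pieces cancel between the two terms, leaving exactly ${\pi r_s^2}$.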

It’s essential to note that we’ve computed the dominant saddle point here. The path integral is dominated by the metric ${g}$ and fields ${\phi}$ that satisfy the classical field equations, since these extremize the action by definition. We then expand around these values such that ${g=g_0+\tilde g}$, ${\phi=\phi_0+\tilde\phi}$, and

$\displaystyle I[g,\phi]=I[g_0,\phi_0]+\ldots \ \ \ \ \ (22)$

where the ellipsis denotes terms that are quadratic and higher in fluctuations about the background values. The leading-order contribution to the partition function is therefore

$\displaystyle Z=e^{iI[g_0,\phi_0]}\implies \ln Z=iI[g_0,\phi_0]=-\frac{\beta^2}{16\pi} \ \ \ \ \ (23)$

where we’ve expressed the result in terms of ${\beta}$ to facilitate subsequent manipulations.

Gibbons and Hawking go on to compute the action for more general black holes as well, including the Reissner-Nordström solution, but we’re more concerned with the underlying physics than the mathematical details here, so let’s jump ahead to the second part of the paper, where we’ll see just what wonders these seemingly innocuous manipulations have wrought.

In fact, the answer is foreshadowed already in the expression above: ${\pi r_s^2}$ is precisely one-fourth the area of the horizon. But so far this is an expression for an action, not an entropy. To make the connection betwixt them, we’ll need some thermodynamics.

First, recall that the total energy of a thermodynamic system in the canonical ensemble is found by summing over the microstates (that is, energy eigenstates ${E_i}$) weighted by their probabilities ${P_i}$:

$\displaystyle \langle E\rangle=\sum_iE_iP_i=\frac{1}{Z}\sum_iE_ie^{-\beta E_i}=-\frac{1}{Z}\partial_\beta Z=-\partial_\beta\ln Z~. \ \ \ \ \ (24)$

We shall use this to obtain a particular expression for the entropy,

$\displaystyle S=-\partial_TA=\beta^2\partial_\beta A \ \ \ \ \ (25)$

where in the second equality we’ve simply rewritten the thermal derivative via ${T=\beta^{-1}}$. ${A}$ is the Helmholtz free energy,

$\displaystyle A=\langle E\rangle-TS=-\beta^{-1}\ln Z~. \ \ \ \ \ (26)$

The derivation is quite simple:

\displaystyle \begin{aligned} S&=-\sum_i P_i\ln P_i=-\sum_i\frac{1}{Z}e^{-\beta E_i}\ln\left(\frac{1}{Z}e^{-\beta E_i}\right)\\ &=-\sum_i\frac{1}{Z}e^{-\beta E_i}\left(-\beta E_i-\ln Z\right) =\frac{\beta}{Z}\sum_iE_ie^{-\beta E_i}+\ln Z\sum_i\frac{1}{Z}e^{-\beta E_i}\\ &=-\frac{\beta}{Z}\partial_\beta Z+\ln Z =-\beta\partial_\beta \ln Z+\ln Z\\ &=\beta\langle E\rangle+\ln Z \end{aligned} \ \ \ \ \ (27)

where in going to the third line we’ve used the fact that ${\sum_i P_i=1}$. Now, from the definition of the Helmholtz free energy,

\displaystyle \begin{aligned} A&=-T\ln Z\implies\\ \partial_TA&=-\ln Z-T\partial_T\ln Z\implies\\ -\beta^2\partial_\beta A&=-\ln Z+\beta\partial_\beta\ln Z\implies\\ \beta^2\partial_\beta A&=\ln Z+\beta\langle E\rangle \end{aligned} \ \ \ \ \ (28)

Thus

$\displaystyle S=\beta\langle E\rangle+\ln Z=\beta^2\partial_\beta A \ \ \ \ \ (29)$

as desired.

Now our famous result is more-or-less immediate. From our above expression for the Lorentzian action, we have the dominant contribution to the path integral,

$\displaystyle \ln Z=-\frac{\beta^2}{16\pi}~, \ \ \ \ \ (30)$

and therefore the free energy is

$\displaystyle A=-\beta^{-1}\ln Z=\frac{\beta}{16\pi}~. \ \ \ \ \ (31)$

Substituting this into above expression for entropy, one finds

$\displaystyle S=\frac{\beta^2}{16\pi}=\pi r_s^2=\frac{\mathrm{Area}}{4} \ \ \ \ \ (32)$

voilà!
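The whole chain from the saddle-point ${\ln Z}$ to the area law can be verified in a few lines (a sympy sketch):

```python
import sympy as sp

beta, M = sp.symbols('beta M', positive=True)

lnZ = -beta**2 / (16 * sp.pi)          # saddle-point result above
E = -sp.diff(lnZ, beta)                # <E> = -d(ln Z)/d(beta) = beta/(8 pi)
A = -lnZ / beta                        # Helmholtz free energy
S = sp.simplify(beta * E + lnZ)        # S = beta <E> + ln Z = beta^2/(16 pi)

print(E.subs(beta, 8 * sp.pi * M))     # M: the mean energy is the ADM mass
print(S.subs(beta, 8 * sp.pi * M))     # 4*pi*M**2 = pi r_s^2 = Area/4
```

As a pleasant consistency check, ${\langle E\rangle=\beta/8\pi=M}$: the canonical-ensemble energy of the black hole is exactly its ADM mass.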

Several comments are in order. First, one might be concerned that in the course of the saddle point approximation, we missed out on important corrections. This is not the case: as explained in the paper, the higher order terms merely correspond to contributions from thermal gravitons and matter quanta (which technically requires the Gibbs, rather than Helmholtz, free energy to properly take into account the non-zero chemical potential). But we’re only interested in the “background” contribution from the horizon itself; as alluded to in the introductory paragraph, this is a purely geometrical effect, which is entirely represented in the leading order term.

Though the path integral is in some sense an inherently quantum mechanical object, it is remarkable that we were able to obtain this result by otherwise appealing solely to classical geometry and thermodynamics. In particular, the fixed point of the ${U(1)}$ symmetry (${\tau\sim\tau+\beta}$) featured crucially in the analysis. An extension of this method, where the fixed points form a conical deficit, can be shown to yield the same result. Indeed, the same feature lies at the heart of the recent proof of the Ryu-Takayanagi proposal by Lewkowycz and Maldacena, where the above geometrical computation of horizon entropy is extended to holography. Given the inextensibility (so far) of most of the other myriad ways of computing black hole entropy (e.g. Hawking pairs, string microstates, Noether charge) to arbitrary spacetime horizons, this further suggests a deep connection between entropy and geometry; one which we are only beginning to unravel.

Posted in Physics | 6 Comments