Insofar as quantum mechanics can be regarded as an extension of (classical) probability theory, most of the concepts developed in the previous two parts of this sequence can be extended to quantum information theory as well, thus giving rise to quantum information geometry.
Let’s first review a few basic quantum mechanical notions; we’ve covered this in more detail before, but it will be helpful to establish present notation/language.
Quantum mechanics is the study of linear operators acting on some complex Hilbert space $\mathcal{H}$. We shall take this to be finite-dimensional, and hence identify the operators on $\mathcal{H}$ with $\mathbb{C}^{k\times k}$, the space of complex $k\times k$ matrices, where $k=\dim\mathcal{H}$. In the usual bra-ket notation, the adjoint $A^*$ of an operator $A$ is defined by $\langle x|Ay\rangle=\langle A^*x|y\rangle$ for all vectors $x,y\in\mathcal{H}$. The density operator $\rho$, which describes a quantum state, is a positive semidefinite hermitian operator of unit trace. $\rho$ is a pure state if it is of rank 1 (meaning it admits a representation of the form $\rho=|x\rangle\langle x|$), and mixed otherwise.
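A measurement $\Pi=\{\pi_i\}$ is a set of operators on $\mathcal{H}$ which satisfy
$$\pi_i=\pi_i^*\geq0,\qquad \sum_i\pi_i=1, \tag{1}$$
whose action on the state $\rho$ yields the outcome $e_i$ with probability
$$P(e_i)=\mathrm{tr}(\rho\,\pi_i).$$
Note that by virtue of (1), $P$ is a probability distribution:
$$P(e_i)\geq0,\qquad \sum_iP(e_i)=1.$$
If we further impose that $\Pi$ consist of orthogonal projections, i.e.,
$$\pi_i\pi_j=\pi_i\delta_{ij}, \tag{4}$$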
then the above describes a “simple measurement”, or PVM (projection-valued measure). More generally of course, a measurement is described by a POVM (positive operator-valued measure). We covered this concept in an earlier post, but the distinction will not be relevant here.
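An observable is formally defined as the pair $(\{\pi_i\},\{a_i\})$, where $a_i\in\mathbb{R}$ is a real number associated with the measurement outcome $e_i$. Since $\pi_i$ are orthogonal projectors, we may represent the observable as a (hermitian) operator $A$ acting on $\mathcal{H}$ whose associated eigenvalues are $a_i$, i.e.,
$$A=\sum_ia_i\pi_i. \tag{5}$$
The expectation and variance of $A$ in the state $\rho$ are then
$$E_\rho[A]=\sum_ia_i\,\mathrm{tr}(\rho\,\pi_i)=\mathrm{tr}(\rho A),\qquad V_\rho[A]=\mathrm{tr}\!\left[\rho\,(A-E_\rho[A])^2\right].$$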
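As a concrete check of the probability, expectation, and variance formulas, here is a minimal sketch in plain Python. The qubit state `rho`, the choice of the $\sigma_x$ eigenbasis as the PVM, and the helper names `matmul` and `trace` are illustrative assumptions rather than anything taken from Amari & Nagaoka.

```python
def matmul(A, B):
    """Product of two square matrices stored as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

# Hypothetical qubit state: hermitian, positive definite, unit trace.
rho = [[0.75, 0.25], [0.25, 0.25]]

# PVM onto the two eigenvectors of sigma_x, with outcomes a = (+1, -1).
pvm = [[[0.5, 0.5], [0.5, 0.5]], [[0.5, -0.5], [-0.5, 0.5]]]
a = [1.0, -1.0]

# Outcome probabilities P(e_i) = tr(rho pi_i).
probs = [trace(matmul(rho, pi)) for pi in pvm]

# Observable A = sum_i a_i pi_i (here this is sigma_x).
A = [[sum(a[i] * pvm[i][r][c] for i in range(2)) for c in range(2)]
     for r in range(2)]

# Expectation tr(rho A) and variance tr(rho (A - E)^2).
expectation = trace(matmul(rho, A))
shifted = [[A[r][c] - (expectation if r == c else 0.0) for c in range(2)]
           for r in range(2)]
variance = trace(matmul(rho, matmul(shifted, shifted)))

print(probs, expectation, variance)
```

The expectation computed from the operator agrees with the weighted sum of outcomes $\sum_i a_i P(e_i)$, which is the content of the spectral representation of the observable.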
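We shall denote the set of all hermitian operators on $\mathcal{H}$ by
$$\mathcal{A}=\{A\,|\,A=A^*\}.$$
Then the set of all density operators $\bar{\mathcal{S}}$ is a convex subset of $\mathcal{A}$:
$$\bar{\mathcal{S}}=\{\rho\,|\,\rho=\rho^*\geq0,\ \mathrm{tr}\,\rho=1\}.$$
Note the similarities with the embedding formalism introduced in the previous post. The set of density operators admits a partition based on the rank of the matrix,
$$\bar{\mathcal{S}}=\bigcup_{r=1}^k\mathcal{S}_r,\qquad \mathcal{S}_r=\{\rho\in\bar{\mathcal{S}}\,|\,\mathrm{rank}\,\rho=r\}.$$
Pure states are then elements of $\mathcal{S}_1$, which form the extreme points of the convex set $\bar{\mathcal{S}}$ (i.e., they cannot be represented as convex combinations of other points, which is the statement that $\rho=|x\rangle\langle x|$). Geometrically, the pure states are the vertices of the convex hull, while the (strictly positive) mixed states $\mathcal{S}_k$ lie in the interior.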
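We need one more preliminary notion, namely the set of unitary operators on $\mathcal{H}$:
$$\mathcal{U}=\{U\,|\,U^{-1}=U^*\}.$$
This forms a Lie group, whose action on $\bar{\mathcal{S}}$ is given by
$$\mathcal{U}\times\bar{\mathcal{S}}\to\bar{\mathcal{S}},\qquad (U,\rho)\mapsto U\rho U^{-1}=:\tilde\rho. \tag{13}$$
(The first expression says that the group multiplication by $\mathcal{U}$ sends elements of $\bar{\mathcal{S}}$ to $\bar{\mathcal{S}}$, while the second specifies how this mapping acts on individual elements). Since the matrix rank is preserved, each $\mathcal{S}_r$ is closed under the action of $\mathcal{U}$. It follows from (13) that $U$ maps measurements and observables to
$$\tilde\pi_i=U\pi_iU^{-1},\qquad \tilde A=UAU^{-1},$$
and that the probability distributions and expectation values are therefore invariant, i.e.,
$$\mathrm{tr}(\tilde\rho\,\tilde\pi_i)=\mathrm{tr}(\rho\,\pi_i),\qquad E_{\tilde\rho}[\tilde A]=E_\rho[A].$$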
Of course, this last is the familiar statement that the unitarity of quantum mechanics ensures that probabilities are preserved.
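The invariance of outcome probabilities under a simultaneous unitary rotation of the state and the measurement can be checked numerically. The following is a sketch only: the state, the computational-basis PVM, the angle `theta`, and the helper `conjugate_by` are hypothetical choices made for illustration.

```python
import math

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def dagger(A):
    """Conjugate transpose."""
    n = len(A)
    return [[A[j][i].conjugate() for j in range(n)] for i in range(n)]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

def conjugate_by(U, X):
    """Return U X U^{-1}, using U^{-1} = U^* for a unitary U."""
    return matmul(matmul(U, X), dagger(U))

rho = [[0.75, 0.25], [0.25, 0.25]]
pvm = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 1.0]]]

# U = exp(-i theta sigma_x), an arbitrary single-qubit unitary.
theta = 0.3
c, s = math.cos(theta), math.sin(theta)
U = [[c, -1j * s], [-1j * s, c]]

rho_t = conjugate_by(U, rho)
pvm_t = [conjugate_by(U, pi) for pi in pvm]

probs = [trace(matmul(rho, pi)).real for pi in pvm]
probs_t = [trace(matmul(rho_t, pi)).real for pi in pvm_t]
print(probs, probs_t)
```

Rotating only the state (or only the measurement) would in general change the probabilities; it is the joint transformation that leaves them fixed.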
Now, on to (quantum information) geometry! Let us restrict our attention to $\mathcal{S}_1$ for now; we shall consider mixed states below. Recall that pure states are actually rays, not vectors, in Hilbert space. Since $\mathcal{H}\cong\mathbb{C}^k$, we therefore identify $\mathcal{S}_1\cong\mathbb{CP}^{k-1}$ (i.e., the pure states are in one-to-one correspondence with the rays in $\mathcal{H}$, where we've projected out the physically irrelevant phase). We now wish to associate a metric to this space of states; in particular, we seek a Riemannian metric which is invariant under unitary transformations $\mathcal{U}$. It turns out that, up to an overall constant, this is uniquely provided by the Fubini-Study metric. Recall that when considering classical distributions, we found that the Fisher metric was the unique Riemannian metric that preserved the inner product. The Fubini-Study metric is thus the natural extension of the Fisher metric to the quantum mechanical case (for pure states only!). Unfortunately, Amari & Nagaoka neither define nor discuss this entity, but it's reasonably well-known in the quantum information community, and has recently played a role in efforts to define holographic complexity in field theories. A particularly useful expression in this context is
$$\mathrm{d}s_{\mathrm{FS}}=\mathrm{d}\sigma\,\sqrt{\langle\partial_\sigma\psi|\partial_\sigma\psi\rangle-\left|\langle\psi|\partial_\sigma\psi\rangle\right|^2},\qquad |\psi(\sigma)\rangle=U(\sigma)|\psi_0\rangle, \tag{16}$$
where $\sigma$ parametrizes the unitary $U(\sigma)$ that performs the transformation between states in $\mathcal{S}_1$.
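To make the phase-independence of the Fubini-Study line element tangible, here is a small numerical sketch. It uses one standard form of the line element, $\mathrm{d}s/\mathrm{d}\sigma=\sqrt{\langle\partial_\sigma\psi|\partial_\sigma\psi\rangle-|\langle\psi|\partial_\sigma\psi\rangle|^2}$, on a hypothetical one-parameter family of qubit states: a rotation of $|0\rangle$ dressed with a $\sigma$-dependent global phase. The family, the step size `h`, and the function names are assumptions made for illustration.

```python
import cmath
import math

def state(sigma):
    """|psi(sigma)> = e^{2 i sigma} (cos sigma, sin sigma).

    The rotation moves the ray; the prefactor is a pure phase that
    should not contribute to the Fubini-Study length.
    """
    phase = cmath.exp(2j * sigma)
    return [phase * math.cos(sigma), phase * math.sin(sigma)]

def inner(x, y):
    """<x|y>, anti-linear in the first argument."""
    return sum(xi.conjugate() * yi for xi, yi in zip(x, y))

def fubini_study_speed(sigma, h=1e-5):
    """ds/dsigma, with the derivative taken by central differences."""
    psi = state(sigma)
    plus, minus = state(sigma + h), state(sigma - h)
    dpsi = [(p - m) / (2 * h) for p, m in zip(plus, minus)]
    return math.sqrt(inner(dpsi, dpsi).real - abs(inner(psi, dpsi)) ** 2)

speed = fubini_study_speed(0.7)
print(speed)
```

Analytically, $\langle\partial_\sigma\psi|\partial_\sigma\psi\rangle=5$ and $|\langle\psi|\partial_\sigma\psi\rangle|^2=4$ for this family, so the phase contribution cancels and only the unit-speed rotation survives.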
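In the remainder of this post, we shall follow Amari & Nagaoka (chapter 7) in considering only (strictly positive) mixed states; accordingly, for convenience, we henceforth denote $\mathcal{S}_k$ by simply $\mathcal{S}$, i.e.,
$$\mathcal{S}=\{\rho\,|\,\rho=\rho^*>0,\ \mathrm{tr}\,\rho=1\}.$$
This is an open subset of $\mathcal{A}_1=\{A\in\mathcal{A}\,|\,\mathrm{tr}\,A=1\}$, and thus we may regard it as a real manifold of dimension $n=\dim\mathcal{A}_1$. Since the dimension of $\mathcal{A}_1$ is given by the number of linearly independent basis vectors of $\mathcal{A}$ (the $k^2$ real parameters of a hermitian $k\times k$ matrix), minus 1 for the trace constraint, $n=k^2-1$. The tangent space at a point $\rho$ is then identified with
$$T_\rho(\mathcal{S})\cong\mathcal{A}_0=\{A\in\mathcal{A}\,|\,\mathrm{tr}\,A=0\}.$$
Note that this precisely parallels the embedding formalism used in defining the m-representation. Hence we denote a tangent vector $X\in T_\rho(\mathcal{S})$ by $X^{(m)}\in\mathcal{A}_0$, and call it the m-representation of $X$ in analogy with the classical case. (We'll come to the quantum analogue of the e-representation later).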
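The PVM $\Pi=\{\pi_i\}_{i=1}^l$ introduced above can also be represented in geometrical terms, via the submanifold
$$\mathcal{S}_\Pi=\mathcal{S}\cap\Big\{\sum_{i=1}^la_i\pi_i\,\Big|\,(a_1,\ldots,a_l)\in\mathbb{R}^l\Big\}=\Big\{\sum_{i=1}^l\frac{p_i}{\mathrm{tr}\,\pi_i}\,\pi_i\,\Big|\,p=(p_1,\ldots,p_l)\in\mathcal{P}_l\Big\}.$$
That is, $\mathcal{S}_\Pi$ is the intersection of two sets. The first, $\mathcal{S}$, is simply the set of all (strictly positive) operators with unit trace. From (5), we recognize the second as the spectral representation of operators in the eigenbasis of $\Pi$. Imposing the trace constraint then implies that the eigenvalues of $\rho\in\mathcal{S}_\Pi$ are weighted probabilities, $a_i=p_i/\mathrm{tr}\,\pi_i$, where $p_i=\mathrm{tr}(\rho\,\pi_i)$ and $\mathcal{P}_l$ is the totality of positive probability distributions with $\sum_ip_i=1$. To see this, observe that
$$p_j=\mathrm{tr}(\rho\,\pi_j)=\sum_ia_i\,\mathrm{tr}(\pi_i\pi_j)=a_j\,\mathrm{tr}\,\pi_j,$$
since $\pi_i\pi_j=\pi_i\delta_{ij}$; this underlies the second equality in the expression for $\mathcal{S}_\Pi$ above.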
Lastly, the action of a unitary $U$ at a point $\rho$ results in a new vector, given by the mapping (13), which lives in (a subspace of) the tangent space $T_\rho(\mathcal{S})$. Such elements may be written
$$\frac{\mathrm{d}}{\mathrm{d}t}\,U_t\rho\,U_t^{-1}\Big|_{t=0}=[\Omega,\rho],$$
where $U_t$ is a curve in the space of unitaries with $U_0=1$, and $\Omega$ is its derivative at $t=0$ (note that $t$ corresponds to $\sigma$ in the expression for the Fubini-Study metric (16) above).
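The claim that unitary motion generates tangent vectors of commutator form can be checked by finite differences. In this sketch the curve is $U_t=\exp(t\Omega)$ with a hypothetical real antisymmetric (hence anti-hermitian) generator `Omega`, for which the exponential is an ordinary rotation matrix; the state `rho` and step size `h` are likewise illustrative assumptions.

```python
import math

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

rho = [[0.75, 0.25], [0.25, 0.25]]

# Anti-hermitian generator; Omega^2 = -1, so exp(t Omega) is a rotation.
Omega = [[0.0, -1.0], [1.0, 0.0]]

def U(t):
    """U_t = exp(t Omega) = cos(t) 1 + sin(t) Omega."""
    return [[math.cos(t), -math.sin(t)], [math.sin(t), math.cos(t)]]

def evolve(t):
    """U_t rho U_t^{-1}, using U_t^{-1} = U_{-t}."""
    return matmul(matmul(U(t), rho), U(-t))

# Central-difference derivative of the curve at t = 0.
h = 1e-5
plus, minus = evolve(h), evolve(-h)
numeric = [[(plus[r][c] - minus[r][c]) / (2 * h) for c in range(2)]
           for r in range(2)]

# Commutator [Omega, rho] = Omega rho - rho Omega.
left, right = matmul(Omega, rho), matmul(rho, Omega)
commutator = [[left[r][c] - right[r][c] for c in range(2)] for r in range(2)]

print(numeric, commutator)
```

The commutator is traceless and hermitian, so it is indeed an element of the space of traceless hermitian operators with which the tangent space is identified.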
With the above notions in hand, we may now proceed to introduce a dual structure on $\mathcal{S}$, and thereby the quantum analogues of the $\alpha$-connections, divergence, etc.
In part 2 of this sequence, we introduced the f-divergence, which satisfies a number of desirable properties such as monotonicity. It also serves as the super-class to which the $\alpha$-divergence belongs; the latter is particularly important, since it induces the dual structure consisting of the Fisher metric and $\pm\alpha$-connections, and reduces to the familiar Kullback-Leibler divergence (a.k.a. relative entropy) for $\alpha=\pm1$. We shall now introduce the quantum analogue of the f-divergence.
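Denote by $\mathcal{B}$ the set of all (not necessarily hermitian) operators on $\mathcal{H}$. Then given two strictly positive density operators $\rho,\sigma\in\mathcal{S}$, the relative modular operator $\Delta_{\sigma,\rho}:\mathcal{B}\to\mathcal{B}$ is defined by
$$\Delta_{\sigma,\rho}A=\sigma A\rho^{-1},\qquad\forall A\in\mathcal{B}.$$
Since $\Delta_{\sigma,\rho}$ may be viewed as a hermitian and positive-definite operator on the Hilbert space $\mathcal{B}$ (equipped with the Hilbert-Schmidt inner product $\langle A,B\rangle=\mathrm{tr}(A^*B)$), we may promote an arbitrary function $f:\mathbb{R}^+\to\mathbb{R}$ to a hermitian operator $f(\Delta_{\sigma,\rho})$ on said space such that
$$\Delta_{\sigma,\rho}A=\lambda A\;\implies\;f(\Delta_{\sigma,\rho})A=f(\lambda)A,\qquad\forall\lambda\in\mathbb{R}^+,\ A\in\mathcal{B}.$$
We then define the quantum f-divergence as
$$D_f(\rho\|\sigma)=\mathrm{tr}\!\left[\rho\,f(\Delta_{\sigma,\rho})\,1\right].$$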
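To see how this is consistent with (that is, reduces to) the classical f-divergence defined previously, consider the spectral representations
$$\rho=\sum_ia_i\mu_i,\qquad\sigma=\sum_jb_j\nu_j,$$
where $\{a_i\}$, $\{b_j\}$ are the eigenvalues for the PVMs $\{\mu_i\}$, $\{\nu_j\}$. Then from the definition of the relative modular operator above,
$$\Delta_{\sigma,\rho}(\nu_jA\mu_i)=\frac{b_j}{a_i}\,\nu_jA\mu_i\;\implies\;f(\Delta_{\sigma,\rho})(\nu_jA\mu_i)=f\!\left(\frac{b_j}{a_i}\right)\nu_jA\mu_i,$$
and hence for simple measurements,
$$D_f(\rho\|\sigma)=\sum_{i,j}a_i\,f\!\left(\frac{b_j}{a_i}\right)\mathrm{tr}(\mu_i\nu_j). \tag{27}$$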
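To compare with the classical expression given in part 2 (cf. eq. (18)), we must take into account the fact that $\{\mu_i\}$ and $\{\nu_j\}$ are only independently orthonormal (that is, it does not follow that $\mu_i\nu_j=\delta_{ij}\mu_i$). Accordingly, consider performing the sequential measurement $\{\mu_i\}$ followed by $\{\nu_j\}$, i.e., applying $\nu_j\mu_i$ with outcomes labelled by the pair $(i,j)$. This can be used to construct the POVM $G=\{G_{ij}\}$, where
$$G_{ij}=(\nu_j\mu_i)^*(\nu_j\mu_i)=\mu_i\nu_j\mu_i,$$
where we have used the orthonormality of $\nu_j$. Applying von Neumann's projection hypothesis, this takes the initial state $\rho$ to the final state $\nu_j\mu_i\rho\,\mu_i\nu_j/p_{ij}$ with probability
$$p_{ij}=\mathrm{tr}(\rho\,G_{ij})=\mathrm{tr}(\rho\,\mu_i\nu_j\mu_i)=a_i\,\mathrm{tr}(\mu_i\nu_j),$$
where in the last step we have again used the orthonormality condition (4), and the cyclic property of the trace. Similarly, by inverting this sequence of measurements, we may construct the POVM $F$ with elements $F_{ij}=\nu_j\mu_i\nu_j$. Let us denote the probabilities associated with the outcomes of $F$ acting on $\sigma$ as $q_{ij}=\mathrm{tr}(\sigma F_{ij})=b_j\,\mathrm{tr}(\mu_i\nu_j)$. Solving for the eigenvalues $a_i$ and $b_j$ in these expressions then enables one to rewrite (27) in terms of the probability distributions $p=\{p_{ij}\}$, $q=\{q_{ij}\}$, whereupon we find
$$D_f(\rho\|\sigma)=\sum_{i,j}p_{ij}\,f\!\left(\frac{q_{ij}}{p_{ij}}\right),$$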
which is the discrete analogue of the classical divergence referenced above.
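The reduction of the quantum f-divergence to a classical one can be verified on a pair of non-commuting qubit states. The sketch below takes $f(x)=-\ln x$ and compares three quantities: the spectral double sum $\sum_{ij}a_if(b_j/a_i)\,\mathrm{tr}(\mu_i\nu_j)$, the classical sum $\sum_{ij}p_{ij}f(q_{ij}/p_{ij})$ with $p_{ij}=a_i\mathrm{tr}(\mu_i\nu_j)$ and $q_{ij}=b_j\mathrm{tr}(\mu_i\nu_j)$, and the direct matrix expression $\mathrm{tr}[\rho(\ln\rho-\ln\sigma)]$. The eigenvalues, the choice of $z$- and $x$-basis projectors, and all helper names are hypothetical.

```python
import math

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

def spectral(values, projectors):
    """Return sum_i values[i] * projectors[i] as a 2x2 matrix."""
    return [[sum(v * P[r][c] for v, P in zip(values, projectors))
             for c in range(2)] for r in range(2)]

# rho is diagonal in the z basis, sigma in the x basis (they do not commute).
mu = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 1.0]]]
nu = [[[0.5, 0.5], [0.5, 0.5]], [[0.5, -0.5], [-0.5, 0.5]]]
a = [0.8, 0.2]   # eigenvalues of rho
b = [0.6, 0.4]   # eigenvalues of sigma

def f(x):
    return -math.log(x)

pairs = [(i, j) for i in range(2) for j in range(2)]
overlap = {(i, j): trace(matmul(mu[i], nu[j])) for i, j in pairs}

# Spectral double sum for the quantum f-divergence.
d_quantum = sum(a[i] * f(b[j] / a[i]) * overlap[i, j] for i, j in pairs)

# Classical f-divergence of the induced joint distributions.
p = {(i, j): a[i] * overlap[i, j] for i, j in pairs}
q = {(i, j): b[j] * overlap[i, j] for i, j in pairs}
d_classical = sum(p[k] * f(q[k] / p[k]) for k in pairs)

# Direct evaluation of tr[rho (ln rho - ln sigma)] with matrix products.
rho = spectral(a, mu)
log_rho = spectral([math.log(x) for x in a], mu)
log_sigma = spectral([math.log(x) for x in b], nu)
diff = [[log_rho[r][c] - log_sigma[r][c] for c in range(2)] for r in range(2)]
d_direct = trace(matmul(rho, diff))

print(d_quantum, d_classical, d_direct)
```

Both `p` and `q` are normalized because $\sum_j\nu_j=\sum_i\mu_i=1$, which is what allows them to be treated as classical distributions over the pairs $(i,j)$.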
Now, recall that every dual structure is naturally induced by a divergence. In particular, it follows from Chentsov's theorem that the Fisher metric and $\alpha$-connections are induced from the f-divergence $D_f$ for any smooth convex function $f:\mathbb{R}^+\to\mathbb{R}$ which satisfies
$$f(1)=0,\qquad f''(1)=1,\qquad\alpha=3+2f'''(1).$$
Thus, to find the formal quantum analogues of these objects, we restrict to the class of functions $f$ which satisfy the above conditions, whereupon the quantum f-divergence $D_f$ induces the dual structure $(g^{(f)},\nabla^{(f)},\nabla^{(f^*)})$, where $f^*(x)=xf(1/x)$. In fact, for a simple measurement $\Pi$ which diagonalizes $\rho$, the restriction of this triple to $\mathcal{S}_\Pi$ coincides precisely with the Fisher metric and $\pm\alpha$-connections. Additionally, note that $D_f$ is invariant under unitary transformations, in the sense that $D_f(U\rho U^{-1}\|U\sigma U^{-1})=D_f(\rho\|\sigma)$; in other words, for arbitrary vector fields $X$, $Y$, we have
$$\langle UX,UY\rangle=\langle X,Y\rangle,\qquad U(\nabla_XY)=\nabla_{UX}\,UY,$$
where $UX$ is formally understood as the vector field such that $(UX)^{(m)}_{U\rho U^{-1}}=UX^{(m)}_\rho U^{-1}$. Thus $g^{(f)}$, $\nabla^{(f)}$, and $\nabla^{(f^*)}$ are indeed invariant under the action of $\mathcal{U}$, which is the characteristic property of the classical analogues we wanted to preserve.
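As in the classical case, we can further restrict the class of functions $f$ to satisfy eq. (20) in part 1, which defines the quantum $\alpha$-divergence $D^{(\alpha)}$. Then for $\alpha\neq\pm1$ and $\alpha=\pm1$, respectively, we obtain
$$D^{(\alpha)}(\rho\|\sigma)=\frac{4}{1-\alpha^2}\left[1-\mathrm{tr}\!\left(\rho^{\frac{1-\alpha}{2}}\sigma^{\frac{1+\alpha}{2}}\right)\right],$$
$$D^{(-1)}(\rho\|\sigma)=D^{(1)}(\sigma\|\rho)=\mathrm{tr}\!\left[\rho\,(\ln\rho-\ln\sigma)\right], \tag{34}$$
cf. eq. (21) and (22) ibid. In obtaining these expressions, we used the fact that the relative modular operator satisfies
$$(\Delta_{\sigma,\rho})^rA=\sigma^rA\rho^{-r},\qquad(\ln\Delta_{\sigma,\rho})A=(\ln\sigma)A-A\ln\rho,$$
where the $r^\mathrm{th}$-power and logarithm of an operator with eigenvalues $\{p_i\}$ are defined such that these become $\{p_i^r\}$ and $\{\ln p_i\}$, respectively, with the same (original) orthonormal eigenvectors. Of course, the quantum analogue of the Kullback-Leibler divergence (34) is none other than the quantum relative entropy!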
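As a numerical sanity check on the $\alpha$-divergence, the sketch below evaluates $\frac{4}{1-\alpha^2}\big[1-\mathrm{tr}(\rho^{(1-\alpha)/2}\sigma^{(1+\alpha)/2})\big]$ for two non-commuting qubit states, with operator powers defined through the eigenvalues as described above. It then confirms that the value near $\alpha=-1$ approaches the relative entropy $\mathrm{tr}[\rho(\ln\rho-\ln\sigma)]$, and that swapping the arguments is equivalent to flipping the sign of $\alpha$. The states and function names are illustrative assumptions.

```python
import math

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

def spectral(values, projectors):
    """Return sum_i values[i] * projectors[i] as a 2x2 matrix."""
    return [[sum(v * P[r][c] for v, P in zip(values, projectors))
             for c in range(2)] for r in range(2)]

# Hypothetical states: rho diagonal in the z basis, sigma in the x basis.
mu = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 1.0]]]
nu = [[[0.5, 0.5], [0.5, 0.5]], [[0.5, -0.5], [-0.5, 0.5]]]
a, b = [0.8, 0.2], [0.6, 0.4]

def alpha_divergence(alpha, x_vals, x_proj, y_vals, y_proj):
    """D^(alpha)(x || y) for alpha != +-1, via eigenvalue powers."""
    x_pow = spectral([v ** ((1 - alpha) / 2) for v in x_vals], x_proj)
    y_pow = spectral([v ** ((1 + alpha) / 2) for v in y_vals], y_proj)
    return 4 / (1 - alpha ** 2) * (1 - trace(matmul(x_pow, y_pow)))

def relative_entropy(x_vals, x_proj, y_vals, y_proj):
    """tr[x (ln x - ln y)], with logarithms taken on the eigenvalues."""
    x = spectral(x_vals, x_proj)
    log_x = spectral([math.log(v) for v in x_vals], x_proj)
    log_y = spectral([math.log(v) for v in y_vals], y_proj)
    diff = [[log_x[r][c] - log_y[r][c] for c in range(2)] for r in range(2)]
    return trace(matmul(x, diff))

near_limit = alpha_divergence(-1 + 1e-6, a, mu, b, nu)
entropy = relative_entropy(a, mu, b, nu)
forward = alpha_divergence(0.5, a, mu, b, nu)
backward = alpha_divergence(-0.5, b, nu, a, mu)

print(near_limit, entropy, forward, backward)
```

The limit is approached linearly in the offset from $\alpha=-1$, so the agreement here is only as good as the chosen offset of $10^{-6}$.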
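Many of the classical geometric relations can also be extended to the quantum case. In particular, $\mathcal{S}$ is dually flat with respect to $(g^{(\pm1)},\nabla^{(1)},\nabla^{(-1)})$, and the canonical divergence is the (quantum) relative entropy $D^{(\pm1)}$. And as in the discussion of (classical) exponential families, we can parameterize the elements of an arbitrary $\nabla^{(1)}$-autoparallel submanifold $M\subseteq\mathcal{S}$ as
$$\rho_\theta=\exp\Big[C+\theta^iF_i-\psi(\theta)\Big],$$
where now $C$, $F_i$ are hermitian operators, $\psi$ is an $\mathbb{R}$-valued function on the canonical parameters $\theta=[\theta^i]$, and the sum over $i$ is implied. As in the classical case, these form a 1-affine coordinate system, and hence the dual $(-1)$-affine coordinates are given by $\eta_i(\theta)=\mathrm{tr}(\rho_\theta F_i)$, which naturally extends the classical definition in terms of the expectation value of $F_i$, cf. eq. (36) of part 2. Recall that $\theta$ and $\eta$ are related through the dual potential $\varphi$, defined via the Legendre transform
$$\varphi(\eta)=\max_\theta\left\{\theta^i\eta_i-\psi(\theta)\right\}.$$
Taking a derivative of the bracketed quantity, the maximum occurs when $\eta_i=\partial_i\psi(\theta)$, where the derivative is with respect to $\theta^i$. But this condition precisely corresponds to the definition of $\eta_i$ above, cf. eq. (37) in part 2. Hence
$$\varphi(\theta)=\theta^i\eta_i(\theta)-\psi(\theta)=-H(\rho_\theta)-\mathrm{tr}(\rho_\theta C),$$
where we have identified the von Neumann entropy $H(\rho)=-\mathrm{tr}(\rho\ln\rho)$, and used the fact that $\mathrm{tr}\,\rho_\theta=1$ (that is, $\mathrm{tr}(\rho_\theta\,\psi(\theta))=\psi(\theta)$).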
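The simplest instance of this Legendre structure is a one-parameter qubit family. As an assumption made purely for illustration, take $C=0$ and a single generator $F=\sigma_z$, so that $\rho_\theta=\exp[\theta\sigma_z-\psi(\theta)]$ with $\psi(\theta)=\ln(2\cosh\theta)$ and $\eta(\theta)=\mathrm{tr}(\rho_\theta\sigma_z)=\tanh\theta$. The sketch below checks numerically that $\eta=\partial_\theta\psi$ and that the dual potential $\theta\eta-\psi$ equals minus the von Neumann entropy.

```python
import math

def psi(theta):
    """Normalization potential: ln tr exp(theta sigma_z)."""
    return math.log(2 * math.cosh(theta))

def eta(theta):
    """Dual coordinate tr(rho_theta sigma_z)."""
    return math.tanh(theta)

def eigenvalues(theta):
    """Eigenvalues of rho_theta = exp(theta sigma_z - psi(theta))."""
    z = 2 * math.cosh(theta)
    return [math.exp(theta) / z, math.exp(-theta) / z]

def von_neumann_entropy(theta):
    return -sum(lam * math.log(lam) for lam in eigenvalues(theta))

def dual_potential(theta):
    """Legendre transform evaluated at the maximizing theta."""
    return theta * eta(theta) - psi(theta)

theta = 0.4

# Central-difference check of eta = d psi / d theta.
h = 1e-6
dpsi = (psi(theta + h) - psi(theta - h)) / (2 * h)

print(dual_potential(theta), -von_neumann_entropy(theta), dpsi, eta(theta))
```

With a nonzero $C$ the dual potential would pick up the additional term $-\mathrm{tr}(\rho_\theta C)$, which this example sets to zero by construction.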
As one might expect from the appearance of the von Neumann entropy, several other important concepts in quantum information theory emerge naturally from this framework. For example, the monotonicity of the classical f-divergence can be extended to the quantum f-divergence as well, whence $D_f$ satisfies the monotonicity relation
$$D_f(\Gamma\rho\,\|\,\Gamma\sigma)\leq D_f(\rho\,\|\,\sigma) \tag{40}$$
for any completely positive trace-preserving map $\Gamma$ (i.e., a quantum channel), and operator convex function $f$; see section 7.2 of Amari & Nagaoka for more details. Since the $\alpha$-divergence is a special case of the f-divergence, this result implies the monotonicity of relative entropy (a detailed analysis can be found in Mueller-Hermes and Reeb [1]).
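Monotonicity under a channel is easy to watch in a toy example. The sketch below uses the qubit depolarizing channel $\Gamma_p(\rho)=(1-p)\rho+p\,\tfrac{1}{2}$, which is completely positive and trace-preserving for $0\leq p\leq1$ and leaves eigenvectors untouched. As a simplifying assumption, $\rho$ is taken diagonal in the $z$ basis and $\sigma$ in the $x$ basis, so every overlap $\mathrm{tr}(\mu_i\nu_j)$ equals $1/2$ and the relative entropy follows from the spectral double sum with $f(x)=-\ln x$. The eigenvalues and function names are hypothetical.

```python
import math

def relative_entropy(a, b):
    """tr[rho (ln rho - ln sigma)] for rho with eigenvalues `a` in the
    z basis and sigma with eigenvalues `b` in the x basis, where every
    overlap tr(mu_i nu_j) equals 1/2."""
    return sum(0.5 * a[i] * (math.log(a[i]) - math.log(b[j]))
               for i in range(2) for j in range(2))

def depolarize(values, p):
    """Eigenvalues after Gamma_p(rho) = (1 - p) rho + p * identity / 2."""
    return [(1 - p) * v + p / 2 for v in values]

a = [0.8, 0.2]   # eigenvalues of rho
b = [0.6, 0.4]   # eigenvalues of sigma

strengths = [0.0, 0.25, 0.5, 0.75, 1.0]
values = [relative_entropy(depolarize(a, p), depolarize(b, p))
          for p in strengths]

for p, d in zip(strengths, values):
    print(p, d)
```

Because composing two depolarizing channels gives another depolarizing channel of greater strength, the relative entropy decreases monotonically along the whole list, reaching zero at $p=1$ where both states are mapped to the maximally mixed state.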
To close, let us comment briefly on some relations to Tomita-Takesaki modular theory. In this context, the relative modular operator $\Delta_{\Psi|\Phi}$ is defined through the relative Tomita-Takesaki anti-linear operator $S_{\Psi|\Phi}$ as $\Delta_{\Psi|\Phi}=S_{\Psi|\Phi}^\dagger S_{\Psi|\Phi}$, and provides a definition of relative entropy even for type-III factors. For finite-dimensional systems, one can show that this reduces precisely to the familiar definition of quantum relative entropy given above. For more details, I warmly recommend the recent review by Witten [2]. Additionally, the fact that $D_f$ satisfies monotonicity (40) corresponds to the statement that the relative entropy is monotonic under inclusions, which can in turn be used to demonstrate positivity of the generator of (half-sided) modular inclusions; see Jefferson [3] for an application of these ideas in the context of the eternal black hole.
- [1] A. Mueller-Hermes and D. Reeb, “Monotonicity of the quantum relative entropy under positive maps,” arXiv:1512.06117 [quant-ph]
- [2] E. Witten, “Notes on some entanglement properties of quantum field theory,” arXiv:1803.04993 [hep-th]
- [3] R. Jefferson, “Comments on black hole interiors and modular inclusions,” arXiv:1811.08900 [hep-th]
Hi, thank you very much for your beautiful and useful posts. I have a question: do methods of information geometry have applications in AdS/CFT, holographic quantum error correction, or computational complexity, which is the focus of your work?
Thank you, I’m very glad you enjoyed them!
In fact, one of the reasons I became interested in information geometry was because of its potential applications to holographic complexity. In the original paper with Rob Myers — in which we put forth a first definition of complexity for quantum field theories — “complexity” is basically identified with the minimum distance between quantum states. In order for “distance” to have a well-defined meaning, we first had to figure out how to define a geometry on the space of states, after which we could compute geodesics. But for technical reasons, our analysis was limited to free theories, and extending this concept to interacting theories has proven challenging.
Information geometry does something very similar: one gets a geometry on the space of probability distributions. Hence if one works at the level of wavefunctions, it’s natural to ask whether this provides a more general way of inducing a geometry on the space of states than the approach from circuit complexity mentioned above. In particular, I’d like to know whether this technology can be used to move beyond the class of Gaussian states. To really make contact with holography, we need a definition of complexity for strongly interacting CFTs, since this is the class of theories known to have good bulk duals.
More generally, as you seem well aware, the past few years have seen the rise of a fascinating interplay between quantum information theory and high energy physics, particularly in the context of emergent spacetime or “it from qubit”. In AdS/CFT, entanglement entropy has taken center stage in efforts to reconstruct the bulk spacetime. Relative entropy, for example, has proven particularly important—and it emerges quite naturally in information geometry as the Kullback-Leibler divergence. Given these deepening connections, my personal sense is that information geometry may provide a powerful framework for furthering our understanding of this interplay, but I have no idea whether it will ultimately be of any practical use. That’s why we call it “research”. 😉