# A cute trace inequality

Theorem. If $A$ and $B$ are positive semi-definite matrices then

$\mathrm{Tr} (AB)^2 \leq \mathrm{Tr} (A^2 B^2)$

Note that both sides contain the same matrices the same number of times, just in a different order ($ABAB$ versus $AABB$). Here is a cute proof due to Bellman.

Proof. First, note that

$(AB-BA)^2 = (AB)^2 + (BA)^2 - A B^2 A - B A^2 B.$

Next, note that the commutator $AB-BA$ is skew-Hermitian, that is, $(AB-BA)^\dagger = -(AB-BA)$. In other words, $AB-BA = iH$ for some Hermitian matrix $H$. This means that all eigenvalues of $AB-BA$ are purely imaginary. When we square the matrix, they become negative, i.e., $(AB-BA)^2$ is negative semi-definite. From this we get

$0 \geq \mathrm{Tr} (AB-BA)^2 = \mathrm{Tr} \Bigl[ (AB)^2 + (BA)^2 - A B^2 A - B A^2 B \Bigr].$

Using the cyclic property of the trace, this simplifies to

$0 \geq 2 \mathrm{Tr}(AB)^2 - 2 \mathrm{Tr} (A^2 B^2)$

and we are done.

Here is an interesting open question.

Christmas Conjecture. Let $A$ and $B$ be $n \times n$ density matrices with fixed spectrum: the eigenvalues of $A$ are $\alpha_1 \geq \alpha_2 \geq \dots \geq \alpha_n$ while the eigenvalues of $B$ are $\beta_1 \geq \beta_2 \geq \dots \geq \beta_n$. Then

$\max_{A,B} \mathrm{Tr} \Bigl[ AB + A^2 B^2 - (AB)^2 \Bigr] = \sum_{i=1}^n \alpha_i \beta_i.$

where the maximum is over all $A,B$ with the given spectrum.

If the last two terms $A^2 B^2 - (AB)^2$ would not be there, then showing this would be quite straightforward. Indeed, the left-hand side becomes simply

$\max_P \sum_{i,j=1}^n \alpha_i P_{ij} \beta_j$

where the maximum is over doubly-stochastic matrices $P$. By the Birkhoff–von Neumann theorem (see Theorem 13.1 in John Watrous’ lecture notes), the maximal value is achieved when $P$ is a permutation matrix. In fact, by the rearrangement theorem, $P$ must be the identity matrix, in which case the two lists of eigenvalues are sorted in the same order.

However, we are dealing with a slightly more complicated expression in the conjecture. As we just saw above, $\mathrm{Tr} \bigl[ A^2 B^2 - (AB)^2 \bigr] \geq 0$, so the last two terms can only increase the left-hand side and thus make the inequality harder to prove. Nevertheless, I conjecture that this stronger inequality still holds, and I can prove it for $n = 2$.

Update: I recently received an e-mail from Jianxin Chen containing a proof of the Christmas Conjecture!

# Quantum walks can find a marked element on any graph

This post is about my paper Quantum walks can find a marked element on any graph with Hari Krovi, Frédéric Magniez, and Jérémie Roland. We wrote it in 2010, but after spotting a subtle mistake in the original version, we have recently substantially revised it. It went from 15 to 50 pages after we added much more details and background material, as well as corrected some small bugs and addressed one major bug.

Problem statement

Imagine a huge graph with many vertices, some of which are marked. You are able to move around this graph and query one vertex at a time to figure out if it is marked or not. Your goal is to find any of the marked vertices.

Given an instance of such problem, a typical way to solve it is by setting up a random walk on the graph. You can imagine some probabilistic procedure that systematically moves from one vertex to another and looks for a marked one.

More formally, you would define a stochastic matrix $P$ whose entry $P_{xy}$ describes the probability to move from vertex $x$ to $y$. Starting from some randomly chosen initial vertex, your goal is to end up in one of the marked vertices in the set $M$.

Our result

We show that any classical algorithm that is based on such random walk can be turned into a quantum walk algorithm that finds a marked vertex quadratically faster. To state this more formally, let us define the hitting time of a random walk:

Definition. The hitting time of $P$ with respect to a set of marked vertices $M$ is the expected number of steps the walk $P$ takes to find a marked vertex, starting from a random unmarked vertex chosen from a distribution which is proportional to the stationary distribution of $P$. Let us denote this quantity by $\mathrm{HT}(P,M)$.

Our main result is as follows:

Theorem. Quantum walk can find a marked vertex in $\sqrt{\mathrm{HT}^+(P,M)}$ steps.

As you can see, this is a very general result—no matter how cleverly the transition probabilities for the random walk $P$ are chosen, we can always cook up a related quantum walk that beats the classical walk quadraticaly!

Note that this is much more general than what is achieved by Grover’s algorithm. Grover’s algorithm corresponds to the special case, where the underlying graph is complete and the random walk moves from any vertex to any other with equal probability.

It’s weaker than before…

You may have noticed a little “+” that appeared next to $\mathrm{HT}$ in the statement of the theorem. Indeed, this has to do with the subtle mistake we spotted in our earlier proof. It turns out that if $|M| = 1$ (i.e., there is a unique marked vertex) then

$\mathrm{HT}^+(P,M) = \mathrm{HT}(P,M)$

and we indeed achieve a quadratic quantum speedup. Unfortunately, when $|M| > 1$ it can happen that

$\mathrm{HT}^+(P,M) > \mathrm{HT}(P,M)$

and thus we don’t get a fully quadratic speedup. Hence our result is weaker than what we claimed previously. On the other hand, the new version is more correct!

Proof idea

Our algorithm is based on Szegedy’s method for turning random walks into quantum walks. It constructs two unitary matrices, each corresponding to a reflection with respect to a certain subspace. These subspaces are defined by

1. the standard basis vectors corresponding to marked vertices,
2. the unit vectors obtained by taking entry-wise square roots of the rows of $P$.

Together these two reflections define one step of the quantum walk.

Our contribution consists in modifying the original walk $P$ before we quantize it. We define a semi-absorbing walk $P(s)$ that leaves a marked vertex with probability $1-s$ even when one is found. This might seem like a bad idea, but one can check that at least classically it does not make things worse by too much. In fact, the $s \to 1$ limit of $P(s)$ corresponds to the classical algorithm that never leaves a marked vertex once it is found.

Glitch in the previous proof

Our proof makes extensive use of a certain quantity $\mathrm{HT}(s)$ which we associate to $P(s)$ and call interpolated hitting time. Then $\mathrm{HT}^+(P,M)$, the extended hitting time that appears in the above theorem, is defined as the limit

$\mathrm{HT}^+(P,M) := \lim_{s \to 1} \mathrm{HT}(s)$

When $|M| = 1$, taking this limit is straightforward and one can easily see that it gives $\mathrm{HT}(P,M)$, the regular hitting time.

It is tempting to guess that the same happens also when $|M| > 1$. Indeed, it is far from obvious why in this case the answer does not come out the same way as in the $|M| = 1$ case. This is exactly what was overlooked in the earlier version of our paper. Computing the limit properly when $|M| > 1$ is much harder (the expression contains inverse of some matrix that is singular at $s = 1$). This is done in detail in the final appendix of our paper.

Open problems

Here are some open questions:

• Why is it that our algorithm has a harder time to find a needle in a haystack when there are several needles rather than just one?
• What is the operational interpretation of the interpolated hitting time $\mathrm{HT}(s)$?
• Can quadratic speedup for finding be achieved also when there are multiple marked vertices?
• How can we efficiently prepare the initial state on which the walk is applied?

One might get some insight in the first two questions by observing that our algorithm actually solves a slightly harder problem than just finding a marked vertex—it samples the marked vertices according to a specific distribution (proportional to $P(s^*)$ for some $s^*$). When there is only one marked vertex, finding it is the same as sampling it. However, for multiple marked vertices this is equivalence does not hold and in general it should be harder to sample.

# Exact quantum query algorithms

Andris Ambainis, Jānis Iraids, and Juris Smotrovs recently have obtained some interesting quantum query algorithms [AIS13]. In this blog post I will explain my understanding of their result.

Throughout the post I will consider a specific type of quantum query algorithms which I will refer to as MCQ algorithms (the origin of this name will become clear shortly). They have the following two defining features:

• they are exact (i.e., find answer with certainty)
• they measure after each query

Quantum effects in an MCQ algorithm can take place only for a very short time — during the query. After the query the state is measured and becomes classical. Thus, answers obtained from two different queries do not interfere quantumly. This is very similar to deterministic classical algorithms that also find answer with certainty and whose state is deterministic after each query.

Basics of quantum query complexity

Our goal is to evaluate some (total) Boolean function $f(x)$ on an unknown input string $x \in \{1,-1\}^n$ (we assume for convenience that binary variables take values +1 and -1). We can access $x$ only by applying oracle matrix

$Q_x = \begin{pmatrix} x_1 & & & \\ & x_2 & & \\ & & \ddots & \\ & & & x_n \end{pmatrix}$

to some quantum state. The minimum number of queries needed to determine the value of $f(x)$ with certainty is called the exact quantum query complexity of $f$.

Each interaction with oracle in an MCQ algorithm can be described as follows:

1. prepare some state $|\psi\rangle$
2. apply query matrix $Q_x$
3. apply some unitary $U$
4. measure in the standard basis

Intuitively, this interaction is a quantum question (specified by $|\psi\rangle$ and $U$) which produces a classical answer (measurement outcome $i$ that appears with probability $|\langle i|UQ_x|\psi\rangle|^2$).

Since each of the answers reveal some property of $x$, it is convenient to identify the collection of these properties with the question itself (I think of it as a “quantum Multiple Choice Question”, hence MCQ). Of course, not every collection of properties constitutes a valid quantum question — only those for which there exists a corresponding $|\psi\rangle$ and $U$. (We will see some examples soon.)

MCQ algorithms

Simply put, an MCQ algorithm is a decision tree: each its leaf contains either 0 or 1 (the value of $f(x)$ for corresponding $x$), and each of the remaining nodes contains a quantum question and children correspond to answers.

Classical deterministic decision trees are very similar to MCQ algorithms, except that their nodes contain classical questions — at each node we can only ask one of the $n$ variables $x_i$. Quantumly, we have a larger variety of questions — for example, we can ask XOR of two variables (as in Deutsch’s algorithm). Another difference is that a quantum question does not have a unique answer: if several answers are consistent with the input string $x$, we will get one of them at random.

An obvious question regarding MCQ algorithms is this:

How can we exploit the quantum oracle to find $f(x)$ with less queries than classically?

Surprisingly, until recently essentially no other way of exploiting the quantum oracle was known, other than Deutsch’s XOR trick (see [MJM11] by Ashley Montanaro, Richard Jozsa, and Graeme Mitchison for more details). What is interesting about the [AIS13] paper is that it provides a new trick!

Query, measure, recurse!

All algorithms discussed in [AIS13] are MCQ and recursive. They proceed as follows:

1. query
2. measure
3. recurse

In the last step, depending on the measurement outcome, either $f(x)$ is found or the problem is reduced to a smaller instance and we proceed recursively. Let me explain how this works for two functions which I will call BALANCED and MAJORITY (they are special cases of EXACT and THRESHOLD discussed in [AIS13]).

BALANCED

$\mathrm{BALANCED}_{2k} (x_1, x_2, \dotsc, x_{2k}) = 1$ iff exactly $k$ of the variables $x_i$ are equal to 1. An MCQ algorithm asks a quantum question that can reveal the following properties of the input string $x$:

1. $x$ is not balanced (the number of +1s and -1s is not equal)
2. $x_i$ is not equal to $x_j$ (for some $i \neq j$)

If we get the first answer then $\mathrm{BALANCED}_{2k}(x) = 0$ and we are done. If we get the second answer for some $i \neq j$, we can ignore the variables $x_i$ and $x_j$ and recursively evaluate $\mathrm{BALANCED}_{2(k-1)}$ on the remaining variables. In total, we need at most $k$ queries (which can be shown to be optimal).

It remains to argue that the above is a valid quantum question. Alternatively, we can show how to prepare the following (unnormalized) quantum state:

$\sum_{i=1}^{2k} x_i |0\rangle + \sum_{i

(It lives in the space spanned by $|0\rangle$ and $|ij\rangle$ for all $i.) This can be easily done by taking

$|\psi\rangle = \sum_{i=0}^{2k} |i\rangle$

and $U$ that acts as

$U |i\rangle = \frac{1}{\sqrt{2k}} \Bigl( |0\rangle + \sum_{j>i} |ij\rangle - \sum_{j

To check that $U$ is a valid isometry, notice that it maps $|i_1\rangle$ and $|i_2\rangle$ to paths that overlap in exactly one cell (rows are labeled by $i$ and columns by $j$):

Since both states also have $|0\rangle$ in common, they are orthogonal.

MAJORITY

$\mathrm{MAJORITY}_{2k+1} (x_1, x_2, \dotsc, x_{2k+1}) = 1$ iff at least $k+1$ of the variables $x_i$ are equal to 1. This time the quantum question has answers

1. $x$ is not balanced when $x_i$ is omitted (for some $i$)
2. $x_i$ is not equal to $x_j$ (for some $i \neq j$)

If we get the first answer for some $i$, we omit $x_i$ and any other variable. If we get the second answer for some $i \neq j$, we ignore the variables $x_i$ and $x_j$. In both cases we proceed by recursively evaluating $\mathrm{MAJORITY}_{2k-1}$ on the remaining variables. When only one variable is left, we query it to determine the answer. This requires at most $k+1$ queries in total (which again can be shown to be optimal).

The corresponding (unnormalized) state in this case is

$\sum_{i=1}^{2k+1} \sum_{j \neq i} x_i |j\rangle + \sqrt{2k-1} \sum_{i

(It lives in the space spanned by $|j\rangle$ for all $j$ and $|ij\rangle$ for all $i.) It can be obtained by choosing

$|\psi\rangle = \sum_{i=1}^{2k+1} |i\rangle$

and $U$ acting as

$U |i\rangle = \frac{1}{2k} \Bigl( \sum_{j \neq i} |j\rangle + \sqrt{2k-1} \sum_{j>i} |ij\rangle - \sqrt{2k-1} \sum_{j

One can check that $U$ is an isometry using a similar picture as above.

Open questions

The problem of finding a quantum query algorithm with a given number of queries and a given success probability can be formulated as a semi-definite program. This was shown by Howard Barnum, Michael Saks, and Mario Szegedy in [BSS03] and can be used to obtain exact quantum query algorithms numerically. Unfortunately, this approach does not necessarily give any insight of why and how the obtained algorithm works. Nevertheless, it would be interesting to know if there is a similar simple characterization of MCQ algorithms.

The algorithms from [AIS13] described above are relatively simple. However, that does not mean that they were simple to find. In fact, the SDP corresponding to $\mathrm{BALANCED}_4$ had already been solved numerically in [MJM11]. Unfortunately, it did not provide enough insight to obtain an algorithm for $\mathrm{BALANCED}_{2k}$ for any $k$. A similar situation is now with $\mathrm{EXACT}^6_{2,4}$ (which it is true if exactly two or four out of the six variables are true). From [MJM11] we know that it has an exact 3-query algorithm. Unfortunately, we do not have enough understanding to describe it in a simple way or generalize it. Besides, I wonder if $\mathrm{EXACT}^6_{2,4}$ has a 3-query MCQ algorithm, or do we actually need interference between the queries to find the answer so fast?

Finally, it would be interesting to know if there is any connection between exact quantum query algorithms and non-local games or Kochen–Specker type theorems.

# SIC-POVM sickness

I contracted the SIC-POVM sickness a month ago after reading a paper by Gelo Tabia and Marcus Appleby who explore the qutrit case in great detail. I got over it in a week or so just to contract it again today when a paper by Gilad Gour came out. It has a very promising title that ends with “…symmetric informationally complete measurements exist in all dimensions”. Unfortunately, it begins with “General…”. Let’s see what the paper is about. Update: Chris Ferrie has pointed out to me that a similar result has already been obtained by Marcus Appleby (see Sect. 4 of this paper).

Intuition

The geometric intuition of this result can be summarized as follows:

1. Choose an arbitrary orthonormal basis for the linear space of all $d \times d$ Hermitian matrices.
2. Construct a simplex out of these basis vectors in some specific way (see below).
3. Shrink it sufficiently small so that it fits inside the positive-semidefinite cone.

The last step always works, because the convex body formed by all density matrices contains a ball around the maximally mixed state. The required amount of shrinking is determined by parameter $t$ in Theorem 1 (it is also related via Eq. (7) to parameter $a$ that measures the “purity” of the resulting SIC-POVM).

Details and example

The second step can be described more precisely as follows. Let $\{f_1, \dotsc, f_{d^2-1}\}$ be an arbitrary orthonormal basis of $\mathbb{R}^{d^2-1}$ (think of it as the space of all $d \times d$ traceless Hermitian matrices). Then a regular simplex with vertices $p_1, \dotsc, p_{d^2}$ can be obtained as follows:

$p_i := f - d(d+1) f_i$

$p_{d^2} := (d+1) f$

where $i = 1, \dotsc, d^2-1$ and $f$ is the sum of all $f_i$ (these expressions correspond to Eq. (5) in the paper). For example, if $d=2$ and $f_i := e_i$ is the i-th standard basis vector, then vectors $p_i$ are the rows of the following matrix:

$P = \begin{pmatrix} -5 & 1 & 1 \\ 1 & -5 & 1 \\ 1 & 1 & -5 \\ 3 & 3 & 3 \end{pmatrix}$

They look like this in 3D (the orange arrows):

Note that

$P \cdot P^{\mathsf{T}}= \begin{pmatrix} 27 & -9 & -9 & -9 \\ -9 & 27 & -9 & -9 \\ -9 & -9 & 27 & -9 \\ -9 & -9 & -9 & 27 \end{pmatrix}$

hence vectors $p_1, p_2, p_3, p_4$ indeed form a simplex in $\mathbb{R}^3$. In fact, this construction works in any dimension — there is nothing special about it being of the form $d^2-1$.

Open questions

The hard question (which is still open) is how to choose the basis $f_i$ so that you need to shrink the simplex as little as possible. In other words, you want the matrices associated to $p_i$ to have rank one. All we know about this case is that the matrices $F_i$ associated to the basis vectors $f_i$ must have certain eigenvalues given by Eq. (11) in the paper. How close do we get to this if $F_i$ are generalized Pauli matrices or Haar random?

It would be interesting to know if anything extra can be said about the matrices $F_i$ if the SIC-POVM obeys Weyl-Heisenberg symmetry (in prime dimensions this is without loss of generality due to Huangjun Zhu).

# Nonlocality without entanglement

After much work we finally uploaded our paper titled “A framework for bounding nonlocality of state discrimination” on the arXiv (this is joint work with Andrew Childs, Debbie Leung, and Laura Mančinska).

Let me informally explain what quantum nonlocality without entanglement is about and briefly summarize the main results of our paper.

Can you tell an apple from an orange?

Let me describe a slightly strange game. Imagine that some crazy biotechnology corporation like Monsanto comes up with a way to grow a new genetically modified fruit called abo. An abo fruit consists of two halves: each of the halves is either an apple (A), banana (B), or orange (O).

In total there are 9 different possible abo fruits: apple-apple, apple-banana, apple-orange, banana-apple, banana-banana, etc. Moreover, let’s assume that each combination is equally likely.

Now, two friends Amandine and Bobby that live in two different places decide to play the following game. They ask somebody to buy an abo fruit, cut it in two halves and mail one half to each of them. Their task is to determine which of the 9 possible combinations it is. (We assume that abo fruits don’t go bad, since they have been properly sprayed with Monsanto’s chemicals.)

Since Amandine and Bobby live in different places, we allow them to talk over Skype to make their decision. In fact, they can even use the video mode to show each other the half of the fruit they have. Technically, this is called LOCC (local operations and classical communication).

If you think about this for a bit, you should realize that this is clearly a very silly game! Each of the parties can easily identify their half of the fruit as either an apple, orange, or banana, and inform the other party. Thus, no fun in the classical case.

Quantum fruits?

Imagine that Monsanto’s technology advances to the point when they can create superpositions of different plant DNA and produce quantum abo fruits, where each half of the fruit is an arbitrary combination of an apple, banana, and orange. For example,

$\displaystyle \frac{|\text{apple}\rangle + |\text{banana}\rangle}{\sqrt{2}} \otimes |\text{orange}\rangle$

is a valid quantum abo fruit (Amandine has something between an apple and banana, but Bobby has an orange). If you buy it at supermarket, it might look something like this:

To be fair, we should allow only 9 different kinds of quantum abo fruits to be produced. Moreover, we require that they can be discriminated globally, i.e., when Amandine and Bobby are together and perform a joint observation of the whole fruit (in other words, the corresponding quantum states form an orthonormal product basis).

Now the question is: can Amandine and Bobby always tell the 9 quantum abo fruits apart over Skype? Surprisingly, the answer to this question is “No”.

Sausage states…

Let us consider a specific construction (also known as domino states). If we replace fruits by numbers, we can write these 9 states as

$|\psi_1\rangle = |1\rangle \otimes |1\rangle$
$|\psi_{2,3}\rangle = |0\rangle \otimes |0 \pm 1\rangle$
$|\psi_{4,5}\rangle = |2\rangle \otimes |1 \pm 2\rangle$
$|\psi_{6,7}\rangle = |1 \pm 2\rangle \otimes |0\rangle$
$|\psi_{8,9}\rangle = |0 \pm 1\rangle \otimes |2\rangle$

where $|i \pm j\rangle := (|i\rangle \pm |j\rangle) / \sqrt{2}$. Clearly, these all are product states, and one can easily check that they are mutually orthogonal.

Alternatively, we can depict them as “sausages” on a grill, where Amandine has the first half (rows), but Bobby has the second half of the state (columns):

These states were introduced by Charles Bennett, David DiVincenzo, Christopher Fuchs, Tal Mor, Eric Rains, Peter Shor, John Smolin, and William Wootters in 1998 in their paper “Quantum nonlocality without entanglement”.

They show that these states cannot be locally discriminated even if Amandine and Bobby are allowed to talk on Skype as long as they want. No matter what they do, they will always have at least some small amount (at least 0.00000531 bits) of uncertainty left about what their joint state is. (Technically, we say that these states cannot be discriminated by LOCC asymptotically.) Thus, despite there being no entanglement, these states exhibit some nonlocal properties. That’s why this phenomenon is called quantum nonlocality without entanglement.

Intuitively, the problem is that Amandine cannot just identify her half of the state as being either 0, 1, or 2, since this would destroy the superposition in case the state was, say $|\psi_8\rangle$ or $|\psi_9\rangle$, and a similar argument holds for Bobby. In other words, the reduced states on Amandine’s side are not orthonormal, so there is no preferred basis in which to perform the first measurement. However, it is very hard to make this argument rigorous and obtain a quantitative estimate of the degree of failure.

Our contributions and the proof idea

We provide a framework for bounding the amount of nonlocality in a given set of bipartite quantum states in terms of a lower bound on the probability of error in any LOCC discrimination protocol. The main idea is to establish a trade-off between disturbance and information gain, i.e., to show that the more information we learn about a quantum system, the more we disturb it.

To explain this more formally, let us consider the problem of discriminating arbitray n bipartite states $|\psi_1\rangle, \dotsc, |\psi_n\rangle$. Assume that Amandine and Bobby execute some protocol for discriminating these states, and we stop them at some point during the protocol. Let $p_1, \dotsc, p_n$ be the posterior probability distribution and $|\phi_1\rangle, \dotsc, |\phi_n\rangle$ be the corresponding post-measurement states.

Definition. We say that states $|\psi_i\rangle$ satisfy a trade-off between disturbance and information gain with constant $\eta$ if

$\eta \, \varepsilon \leq \delta$

holds in every branch when the protocol has been stopped, where

$\varepsilon = \max_k p_k - \frac{1}{n}$

is the information gain and

$\delta = \max_{i \neq j} |\langle\phi_i|\phi_j\rangle|$

is the disturbance. The largest constant $\eta$ that satisfies this inequality we call the nonlocality constant of the given states.

Intuitively, $\varepsilon$ measures how far the posterior probability distribution is from uniform, but $\delta$ measures how nonorthogonal the post-measurement states have become.

Now our main result can be stated as follows:

Theorem. Any LOCC protocol for discriminating states drawn uniformly from $|\psi_1\rangle, \dotsc, |\psi_n\rangle$ errs with probability

$p_{\text{error}} \geq \displaystyle \frac{2}{27} \, \frac{\eta^2}{n^5}$

where $\eta$ is the nonlocality constant of these states.

Our proof of this theorem is based on Helstrom bound. Our second main contribution is a systematic method for bounding the nonlocality constant $\eta$ for a large class of product bases. In particular, we obtain specific bounds for domino states, rotated domino states, and a more general family of domino-type states.

For more details see our paper.

# Three myths about quantum computing — Part 3: Teleportation and superdense coding

This is the last post in the series on myths about quantum computing.

One of the most exciting things about quantum information is quantum teleportation—the ability to transmit quantum data by sending only classical bits. Superdense coding is another surprising protocol which lets you transmit two classical bits by sending only one qubit.

It is often mistakenly believed that these two features of quantum information do not have a classical equivalent. The goal of this post is to explain why this is not the case, and to clarify other related misconceptions.

Bell states

Let us first briefly discuss some simple facts that are useful for explaining how quantum teleportation works. Let $z, x \in \{0,1\}$. Then the following four two-qubit states are orthonormal and form a basis:

$|\beta_{zx}\rangle = \dfrac{|0,x\rangle + (-1)^z |1,\bar{x}\rangle}{\sqrt{2}}$

This is known as Bell basis. One can prepare $|\beta_{zx}\rangle$ from the standard basis state $|z,x\rangle$ as follows:

$|\beta_{zx}\rangle = \mathrm{CNOT} \cdot (H \otimes I) \cdot |z,x\rangle$

One of the most important properties of Bell states is that any Bell state can be mapped to any other by applying local Pauli matrices on only one of the systems:

$|\beta_{zx}\rangle = (Z^z X^x \otimes I) \cdot |\beta_{00}\rangle = (I \otimes X^x Z^z) \cdot |\beta_{00}\rangle$

Quantum teleportation

The setting for quantum teleportation is as follows. Assume that Alice has a single-qubit quantum state $|\psi\rangle$ that she wants to send to Bob. Moreover, they have met beforehand and established a joint two-qubit quantum state

$|\beta_{00}\rangle = \dfrac{|0,0\rangle + |1,1\rangle}{\sqrt{2}}$

known as EPR pair, shared between them. Here is a schematic diagram of how quantum teleportation works:

Here “Bell” denotes the measurement in the Bell basis whose outcomes are classical bits z and x, and $Z^z X^x$ is the correction operator that Bob has to apply to recover the original state.

Quantum teleportation works because of the following identity:

$|\psi\rangle \otimes |\beta_{00}\rangle = \displaystyle \frac{1}{2} \sum_{z,x \in \{0,1\}} |\beta_{zx}\rangle \otimes X^x Z^z |\psi\rangle$

which holds for any single qubit state $|\psi\rangle$. From this identity we see that when Alice performs the Bell measurement, she gets two uniformly random bits z and x, and Bob’s state collapses to $X^x Z^z |\psi\rangle$ which is a distorted version of $|\psi\rangle$. Once Bob receives z and x, he can apply $Z^z X^x$ to recover $|\psi\rangle$. Note that an adversary who intercepts z and x cannot learn anything about $|\psi\rangle$, since both bits are uniformly random.

The usual argument why quantum teleportation is surprising, is that it allows to transmit a quantum state $|\psi\rangle$ with two continuous degrees of freedom (say, the angles in the Bloch sphere) by sending only two classical bits. This may seem quite paradoxical, since it appears as if two real numbers have been transmitted by sending just two bits. However, this is not the case at all, since the two parameters describing $|\psi\rangle$ cannot be recovered (to any reasonable degree of precision) from a single copy of $|\psi\rangle$. For example, we know by Holevo’s theorem that one cannot learn more than one bit of information by measuring $|\psi\rangle$.

Classical teleportation

What is the classical equivalent of the above procedure? Let us first set up some terminology and notation. Since we will be dealing with probability distributions, let

$\mu = \dfrac{1}{2} [0] + \dfrac{1}{2} [1]$

denote the uniform distribution over {0,1}. Similarly, let

$\beta = \dfrac{1}{2} [0,0] + \dfrac{1}{2} [1,1]$

be the classical version of the EPR state $|\beta_{00}\rangle$.

Intuitively, one should think of probability distributions as a way of describing a coin that has been flipped and (without looking at it) put inside a sealed envelope. Note that one can perform operations on such coin, even though its exact state is not known. For example, by flipping the envelope around one can perform the logical NOT. One could also imagine some more complicated procedures for performing joint operations on two coins in a joint unknown state.

We will use the term pbit to refer to a probabilistic bit that describes a coin inside the envelope. Note that a pbit has one degree of freedom. However, once the envelope has been opened, the state of the coin becomes certain, i.e., either [0] or [1], so we will describe it by a deterministic bit or dbit. Note that this is analogous to how measurements work in the quantum case, except that in the classical case there is only one measurement basis—the standard basis.

Now we are ready to describe the classical teleportation. Our task is the following: we would like to transmit one pbit by sending one dbit. In other words, we want to transmit one degree of freedom per one classical bit being sent, just as in the quantum case.

At first, this might seem trivial—can’t we just send the bit over and be done? Unfortunately, not. Recall that we want to transmit a pbit or an “unobserved coin”, but we are allowed to send only a dbit. In other words, your envelope will always be opened and its content revealed, just as if you were a journalist sending an e-mail from China. Let us depict this situation with the following diagram:

Here the dark pipe represents a pbit in state $\pi$, but the white pipe a dbit obtained by observing $\pi$. The dashed line between Alice and Bob represents Chinese government.

To make this scheme work, we will use a shared resource between Alice and Bob as in the quantum case. A natural classical equivalent of the EPR state $|\beta_{00}\rangle$ is the probability distribution $\beta$ defined above. To preserve our pbit $\pi$, we will XOR it with Alice’s half of $\beta$, and send the result over to Bob. Even though the pbit $\pi$ is turned into a dbit due to Chinese government spying on Alice, Bob can still XOR it with his half of $\beta$ and recover the original pbit $\pi$:

This indeed gives the correct result, since the original pbit essentially gets XORed with the same value twice. Intuitively, one can think of it being transmitted “back in time” through the black pipe that represents $\beta$. Just as in the quantum case, the party who intercepts the transmitted bit learns nothing about $\pi$, since the transmitted bit is uniformly random. In fact, this scheme is equivalent to one-time pad.

Quantum superdense coding

Quantum superdense coding is the dual protocol of quantum teleportation (this can be made more precise by considering coherent communication). It allows to send two classical bits by transmitting a single qubit and consuming one shared EPR pair.

Initially Alice and Bob share an EPR state $|\beta_{00}\rangle$. Alice encodes her two classical bits z and x in her half of $|\beta_{00}\rangle$ by performing $Z^z X^x$. This maps the joint state of Alice and Bob to Bell state $|\beta_{zx}\rangle$ as discussed above. Then Alice sends her qubit over to Bob, who can recover both bits by performing a measurement in the Bell basis:

Quantum superdense coding works because of the following identity:

$(H \otimes I) \cdot \mathrm{CNOT} \cdot (Z^z X^x \otimes I) \cdot |\beta_{00}\rangle = |z,x\rangle$

This immediately follows from the properties of Bell states discussed above. Since all Bell states are maximally entangled, their reduced states are completely mixed, so the transmitted qubit contains no information about the two encoded classical bits z and x.

Classical “supersparse” coding

Classical superdense coding is very similar to classical teleportation, except the roles of dbit and pbit are reversed, i.e., Alice wants to transmit a dbit by sending a pbit. This seems to be even simpler than teleportation, since we are given more resources and asked to perform a simpler task! In fact, there is nothing “superdense” about this task, as it only wastes resources. In this sense it would be more appropriate to call it “supersparse” coding!

The only catch is that for complete analogy with the quantum case, the transmitted pbit should be uniformly random, so that a potential eavesdropper could learn nothing about the original message. Here is a protocol that achieves the task of transmitting a dbit b in the desired way:

Note that the only difference between this picture and the one for classical teleportation is the color of pipes.

Conclusion

Quantum teleportation should not seem more surprising than the classical one, since in both cases one degree of freedom is transmitted per one classical bit being sent. The only quantitative difference is a factor of two in the amount of resources consumed: one ebit is consumed for sending two degrees of freedom in the quantum case versus one shared random bit per single degree of freedom in the classical case. Recall that we observed the same factor-of-two difference in the case of the amount of information needed to specify a quantum versus a classical probabilistic state within the exponential state space.

Thus, given the existence of a classical equivalent, quantum teleportation should not seem too surprising. At least, no more than by a factor of two!

p.s. As Matthew Leifer has pointed out to me, these and many other analogies between quantum entanglement and secret classical correlations have been described in the paper “A classical analogue of entanglement” by Daniel Collins and Sandu Popescu.

# Three myths about quantum computing — Part 2: No-cloning theorem

This is the second post in the series on myths about quantum computing.

One of the first things we learn about quantum information is that it cannot be copied. This may seem rather surprising and counter-intuitive, since classical information stored on your computer can be easily copied (in fact, it is extremely challenging to come up with methods that would prevent this). One might even conclude that there must be something special about quantum information that does not allow us to copy it.

My goal in this post is to argue that this is not the case. The main point is that classical probability distributions cannot be copied either. This observation should make the quantum no-cloning theorem seem less surprising.

Quantum no-cloning theorem

Let us consider the standard argument why quantum states cannot be copied—the main idea is that unitary transformations have to preserve inner products.

Theorem 1 (Quantum no-cloning theorem). Transformation that acts as

$|0\rangle \otimes |\psi\rangle \mapsto |\psi\rangle \otimes |\psi\rangle$

for any quantum state $|\psi\rangle$, is not unitary.

Proof. Let $|\psi_1\rangle$ and $|\psi_2\rangle$ be two quantum states, and assume that U is a unitary operation that implements the desired transformation. Then

$U \bigl( |0\rangle \otimes |\psi_1\rangle \bigr) = |\psi_1\rangle \otimes |\psi_1\rangle$

$U \bigl( |0\rangle \otimes |\psi_2\rangle \bigr) = |\psi_2\rangle \otimes |\psi_2\rangle$

If we take the inner product of the corresponding sides of these two equations, we get

$\langle\psi_1|\psi_2\rangle = \langle\psi_1|\psi_2\rangle^2$

where we used the assumption that U is unitary. When we solve this, we get that either $\langle\psi_1|\psi_2\rangle = 0$ or $\langle\psi_1|\psi_2\rangle = 1$. This means that using a unitary transformation we can only copy states from an orthonormal set. However, the set of all quantum states is not orthonormal, so there is no unitary transformation that would copy an arbitrary unknown quantum state. $\blacksquare$

Classical no-cloning theorem

Recall from the previous post that quantum states and classical probability distributions are not that much different. Let us see if we can come up with a no-cloning argument that would also work in the classical case.

First, observe that independent probability distributions are combined in exact same way as quantum states—using the Kronecker product. (There is nothing “quantum” about the $\otimes$ operation!) In probabilistic classical computing the set of allowed transformations are those that map probability distributions to probability distributions. Such transformations are called stochastic. In general, they do not preserve inner products, so our previous argument will not go through.

Let us consider an alternative argument, based on linearity. For simplicity, let us consider only probability distributions over {0,1}.

Theorem 2 (Classical no-cloning theorem). Transformation that acts as

$\begin{pmatrix} 1 \\ 0 \end{pmatrix} \otimes \pi \mapsto \pi \otimes \pi$

for any probability distribution $\pi$, is not stochastic.

Proof. Let

$\pi = \begin{pmatrix} p \\ q \end{pmatrix}$

be an arbitrary probability distribution over {0,1}. We would like to have a stochastic operation that implements the transformation

$\begin{pmatrix} p \\ q \\ 0 \\ 0 \end{pmatrix} \mapsto \begin{pmatrix} p^2 \\ pq \\ qp \\ q^2 \end{pmatrix}$

for any p and q. However, this is not possible, since the above transformation is clearly not linear. $\blacksquare$

Note that the only property we needed in this proof was linearity, so one can use exactly the same argument also for the quantum case.

# Three myths about quantum computing — Part 1: Exponential size state space

This is the first post in the series on myths about quantum computing. In this post I will discuss the first misconception about quantum computers—that the exponential size of their state space is the main feature that distinguishes them from their classical cousins—and explain why this should not be used as an argument to suggest that quantum computers are likely to be more powerful than the classical ones.

Consider a system S with states labelled by n-bit strings. Assume that S is initialized in all-zeroes state “00…0” and undergoes some kind of physical evolution. Let us consider three scenarios.

Deterministic classical world

If the evolution is deterministic, then one can specify the current state of the system by giving the corresponding label. Thus, one needs n bits of information to describe the state in this case.

Quantum world

If S is a quantum system, it can be in a superposition of all possible states. To describe this superposition, one has to assign a complex number to each state. Thus, in total one has to specify 2n complex numbers (say, to some finite precision), which is exponentially more information than in the deterministic case!

If you have never heard of this before (or remember your excitement when you heard it the first time), this definitely is mind-boggling. In fact, this is often mentioned as the main obstacle for efficiently simulating quantum computers, and also as a hand-waving argument to convince wider public that quantum computers are likely to be exponentially more powerful than classical ones.

Probabilistic classical world

One should not get too excited about the above observation, but instead think more carefully whether this comparison of deterministic and quantum computers is fair. Since the output of a quantum computer is intrinsically probabilistic, one should be looking at probabilistic evolution in the classical case, not deterministic.

If the transitions between states of the system S are described by probabilities, then the resulting state is a probability distribution over {0,1}n. To describe such distribution, one needs to specify 2n real numbers—one for each n-bit string. This is also exponentially more information than in the deterministic case!

Thus, with this comparison the only difference in the amount of information needed to describe classical and quantum states is a factor of two (since probability can be specified by a single real number whereas two real numbers are needed to specify one complex quantum amplitude).

Conclusion

The fact that an exponential amount of information is needed to describe the state space of a particular computational model does not immediately imply that it is exponentially more powerful. To be more specific, let us consider the classes of decision problems that can be solved in polynomial time by the three computational models discussed above:

The following inclusions are trivial:

$\mathrm{P} \subseteq \mathrm{BPP} \subseteq \mathrm{BQP}$

However, as of now, none of these inclusions is known to be strict.

Thus, one should not use the exponential size of the state space as an argument to claim that quantum computers are likely to be hard to simulate by deterministic classical computers, since the same argument could also be used in favor of probabilistic classical computers.

To me it seems that the main difference between probabilistic and quantum computers is the underlying field of numbers (non-negative real numbers versus complex numbers). It is a major open problem to show if this subtle difference can be used to separate the corresponding two complexity classes.

# Three myths about quantum computing

In this series of posts I will discuss some misconceptions about quantum computing.

One often hears people saying that quantum physics is “strange” and that our classical intuition “breaks down” in the quantum world. I will try to argue that one should not brag about the weirdness of quantum mechanics without carefully investigating the corresponding phenomena in the classical world. In fact, many quantum phenomena no longer seem surprising once their lesser cousins in the classical probabilistic world are understood.

To illustrate my point, I will consider three basic properties of quantum information

and discuss how they translate to the classical case.

These topics are usually covered in the first class of an introductory quantum computing course, so none of what I discuss should come as a surprise to somebody who has taken such a course or read an introductory textbook on the topic. However, I have to admit that I was equally shocked both, when I learned these basic facts about quantum information in class, as when I later realized that they have a corresponding classical counterpart.

I would like to acknowledge all my “non-quantum” friends with whom I have had conversations about quantum physics. I would probably still believe in these misconceptions if I had not tried to explain you in simple terms why quantum physics is supposed to be so strange. Maybe it is not so strange after all…

p.s. For those of you who have never attempted to explain what you are doing to somebody who does not have a PhD degree in your area of expertise, consider the following three quotes by the founders of quantum mechanics:

Most of the fundamental ideas of science are essentially simple, and may, as a rule, be expressed in a language comprehensible to everyone.
—Albert Einstein

The physicist may be satisfied when he has the mathematical scheme and knows how to use it for the interpretation of the experiments. But he has to speak about his results also to non-physicists who will not be satisfied unless some explanation is given in plain language. Even for the physicist, the description in plain language will be the criterion of the degree of understanding that has been reached.
—Werner Heisenberg

If you cannot—in the long run—tell everyone what you have been doing, your doing has been worthless.
—Erwin Schrödinger

# Unitary equivalence of purifications

Purifications

Let $\rho$ be a mixed quantum state on system $\mathcal{A}$. We say that a pure quantum state $|\psi\rangle$ on a composite system $\mathcal{A} \otimes \mathcal{B}$ is a purification of $\rho$ if

$\rho = \mathrm{Tr}_B (|\psi\rangle \langle\psi|)$

The following theorem is a basic result in quantum information theory.

Theorem 1 (unitary equivalence of purifications). If $|\psi\rangle$ and $|\phi\rangle$ are two purifications of $\rho$ then

$|\phi\rangle = (I_A \otimes U) |\psi\rangle$

for some unitary $U$ that acts only on system $\mathcal{B}$.

Schmidt decomposition

The above result is usually proved using the following theorem.

Theorem 2 (Schmidt decomposition). Any bipartite quantum state $|\psi\rangle$ on $\mathcal{A} \otimes \mathcal{B}$ can be written as

$|\psi\rangle = \sum_i \sqrt{p_i} |\alpha_i\rangle |\beta_i\rangle$

for some orthonormal bases $\{|\alpha_i\rangle\}$ and $\{|\beta_i\rangle\}$ on systems $A$ and $B$, respectively, and a probability distribution $\{p_i\}$.

However, standard proofs of Theorem 1 are often based on the following (wrong) argument: given two spectral decompositions of the same matrix, the corresponding eigenvalues and eigenvectors in both decompositions must be equal. This is true only if the given matrix has a non-degenerate spectrum. Otherwise the eigenvectors corresponding to a repeated eigenvalue are not uniquely determined (any orthonormal basis of the corresponding subspace is a valid set of eigenvectors).

A simple proof of Theorem 1

Here is an elementary proof of Theorem 1 which does not run into problems in case of a degenerate spectrum. By Theorem 2 we can write the first state as

$|\psi\rangle = \sum_i \sqrt{p_i} |\alpha_i\rangle |\beta_i\rangle$

Let us expand the first register of the second state $|\phi\rangle$ in the basis $\{|\alpha_i\rangle\}$:

$|\phi\rangle = \sum_i |\alpha_i\rangle |\tilde{\gamma}_i\rangle$

where $\{|\tilde{\gamma}_i\rangle\}$ are some arbitrary (non-normalized) vectors. Then we have

$\mathrm{Tr}_B (|\psi\rangle \langle\psi|) = \sum_{i,j} \sqrt{p_i p_j} |\alpha_i\rangle \langle\alpha_j| \mathrm{Tr} (|\beta_i\rangle \langle\beta_j|)$

$= \sum_{i,j} \sqrt{p_i p_j} |\alpha_i\rangle \langle\alpha_j| \delta_{ij}$

and

$\mathrm{Tr}_B (|\phi\rangle \langle\phi|) = \sum_{i,j} |\alpha_i\rangle \langle\alpha_j| \mathrm{Tr} (|\tilde{\gamma}_i\rangle \langle\tilde{\gamma}_j|)$

$= \sum_{i,j} |\alpha_i\rangle \langle\alpha_j| \langle\tilde{\gamma}_j|\tilde{\gamma}_i\rangle$

By assumption, both partial traces are equal. Since $\{|\alpha_i\rangle\}$ is an orthonormal basis, the coefficients in both expressions must be the same:

$\sqrt{p_i p_j} \delta_{ij} = \langle\tilde{\gamma}_j|\tilde{\gamma}_i\rangle$

This means that $|\tilde{\gamma}_i\rangle$ are pairwise orthogonal and $\||\tilde{\gamma}_i\rangle\| = \sqrt{p_i}$ (it could happen that $p_i = 0$ for some $i$, but this is not a problem). Equivalently, this equation says that we can find an orthonormal basis $\{|\gamma_i\rangle\}$ such that $|\tilde{\gamma}_i\rangle = \sqrt{p_i} |\gamma_i\rangle$. Then we see that $|\phi\rangle = (I \otimes U) |\psi\rangle$, where $U = \sum_i |\gamma_i\rangle \langle\beta_i|$ is the unitary change of basis from $\{|\beta_i\rangle\}$ to $\{|\gamma_i\rangle\}$.