A cute trace inequality

Theorem. If A and B are positive semi-definite matrices then

\mathrm{Tr} (AB)^2 \leq \mathrm{Tr} (A^2 B^2)

Note that both sides contain the same matrices the same number of times, just in a different order (ABAB versus AABB). Here is a cute proof due to Bellman.

Proof. First, note that

(AB-BA)^2 = (AB)^2 + (BA)^2 - A B^2 A - B A^2 B.

Next, note that the commutator AB-BA is skew-Hermitian, that is, (AB-BA)^\dagger = -(AB-BA). In other words, AB-BA = iH for some Hermitian matrix H. This means that all eigenvalues of AB-BA are purely imaginary. When we square the matrix, they become negative, i.e., (AB-BA)^2 is negative semi-definite. From this we get

0 \geq \mathrm{Tr} (AB-BA)^2    = \mathrm{Tr} \Bigl[ (AB)^2 + (BA)^2 - A B^2 A - B A^2 B \Bigr].

Using the cyclic property of the trace, this simplifies to

0 \geq 2 \mathrm{Tr}(AB)^2 - 2 \mathrm{Tr} (A^2 B^2)

and we are done.

Here is an interesting open question.

Christmas Conjecture. Let A and B be n \times n density matrices with fixed spectrum: the eigenvalues of A are \alpha_1 \geq \alpha_2 \geq \dots \geq \alpha_n while the eigenvalues of B are \beta_1 \geq \beta_2 \geq \dots \geq \beta_n. Then

\max_{A,B} \mathrm{Tr} \Bigl[ AB + A^2 B^2 - (AB)^2 \Bigr]  = \sum_{i=1}^n \alpha_i \beta_i.

where the maximum is over all A,B with the given spectrum.

If the last two terms A^2 B^2 - (AB)^2 would not be there, then showing this would be quite straightforward. Indeed, the left-hand side becomes simply

\max_P \sum_{i,j=1}^n \alpha_i P_{ij} \beta_j

where the maximum is over doubly-stochastic matrices P. By the Birkhoff–von Neumann theorem (see Theorem 13.1 in John Watrous’ lecture notes), the maximal value is achieved when P is a permutation matrix. In fact, by the rearrangement theorem, P must be the identity matrix, in which case the two lists of eigenvalues are sorted in the same order.

However, we are dealing with a slightly more complicated expression in the conjecture. As we just saw above, \mathrm{Tr} \bigl[ A^2 B^2 - (AB)^2 \bigr] \geq 0, so the last two terms can only increase the left-hand side and thus make the inequality harder to prove. Nevertheless, I conjecture that this stronger inequality still holds, and I can prove it for n = 2.

Update: I recently received an e-mail from Jianxin Chen containing a proof of the Christmas Conjecture!


Quantum walks can find a marked element on any graph

This post is about my paper Quantum walks can find a marked element on any graph with Hari Krovi, Frédéric Magniez, and Jérémie Roland. We wrote it in 2010, but after spotting a subtle mistake in the original version, we have recently substantially revised it. It went from 15 to 50 pages after we added much more details and background material, as well as corrected some small bugs and addressed one major bug.

Problem statement

Imagine a huge graph with many vertices, some of which are marked. You are able to move around this graph and query one vertex at a time to figure out if it is marked or not. Your goal is to find any of the marked vertices.

Given an instance of such problem, a typical way to solve it is by setting up a random walk on the graph. You can imagine some probabilistic procedure that systematically moves from one vertex to another and looks for a marked one.

More formally, you would define a stochastic matrix P whose entry P_{xy} describes the probability to move from vertex x to y. Starting from some randomly chosen initial vertex, your goal is to end up in one of the marked vertices in the set M.

Random walk on a graph

Our result

We show that any classical algorithm that is based on such random walk can be turned into a quantum walk algorithm that finds a marked vertex quadratically faster. To state this more formally, let us define the hitting time of a random walk:

Definition. The hitting time of P with respect to a set of marked vertices M is the expected number of steps the walk P takes to find a marked vertex, starting from a random unmarked vertex chosen from a distribution which is proportional to the stationary distribution of P. Let us denote this quantity by \mathrm{HT}(P,M).

Our main result is as follows:

Theorem. Quantum walk can find a marked vertex in \sqrt{\mathrm{HT}^+(P,M)} steps.

As you can see, this is a very general result—no matter how cleverly the transition probabilities for the random walk P are chosen, we can always cook up a related quantum walk that beats the classical walk quadraticaly!

Note that this is much more general than what is achieved by Grover’s algorithm. Grover’s algorithm corresponds to the special case, where the underlying graph is complete and the random walk moves from any vertex to any other with equal probability.

It’s weaker than before…

You may have noticed a little “+” that appeared next to \mathrm{HT} in the statement of the theorem. Indeed, this has to do with the subtle mistake we spotted in our earlier proof. It turns out that if |M| = 1 (i.e., there is a unique marked vertex) then

\mathrm{HT}^+(P,M) = \mathrm{HT}(P,M)

and we indeed achieve a quadratic quantum speedup. Unfortunately, when |M| > 1 it can happen that

\mathrm{HT}^+(P,M) > \mathrm{HT}(P,M)

and thus we don’t get a fully quadratic speedup. Hence our result is weaker than what we claimed previously. On the other hand, the new version is more correct!

Proof idea

Our algorithm is based on Szegedy’s method for turning random walks into quantum walks. It constructs two unitary matrices, each corresponding to a reflection with respect to a certain subspace. These subspaces are defined by

  1. the standard basis vectors corresponding to marked vertices,
  2. the unit vectors obtained by taking entry-wise square roots of the rows of P.

Together these two reflections define one step of the quantum walk.

Our contribution consists in modifying the original walk P before we quantize it. We define a semi-absorbing walk P(s) that leaves a marked vertex with probability 1-s even when one is found. This might seem like a bad idea, but one can check that at least classically it does not make things worse by too much. In fact, the s \to 1 limit of P(s) corresponds to the classical algorithm that never leaves a marked vertex once it is found.

Glitch in the previous proof

Our proof makes extensive use of a certain quantity \mathrm{HT}(s) which we associate to P(s) and call interpolated hitting time. Then \mathrm{HT}^+(P,M), the extended hitting time that appears in the above theorem, is defined as the limit

\mathrm{HT}^+(P,M) := \lim_{s \to 1} \mathrm{HT}(s)

When |M| = 1, taking this limit is straightforward and one can easily see that it gives \mathrm{HT}(P,M), the regular hitting time.

It is tempting to guess that the same happens also when |M| > 1. Indeed, it is far from obvious why in this case the answer does not come out the same way as in the |M| = 1 case. This is exactly what was overlooked in the earlier version of our paper. Computing the limit properly when |M| > 1 is much harder (the expression contains inverse of some matrix that is singular at s = 1). This is done in detail in the final appendix of our paper.

Open problems

Here are some open questions:

  • Why is it that our algorithm has a harder time to find a needle in a haystack when there are several needles rather than just one?
  • What is the operational interpretation of the interpolated hitting time \mathrm{HT}(s)?
  • Can quadratic speedup for finding be achieved also when there are multiple marked vertices?
  • How can we efficiently prepare the initial state on which the walk is applied?

One might get some insight in the first two questions by observing that our algorithm actually solves a slightly harder problem than just finding a marked vertex—it samples the marked vertices according to a specific distribution (proportional to P(s^*) for some s^*). When there is only one marked vertex, finding it is the same as sampling it. However, for multiple marked vertices this is equivalence does not hold and in general it should be harder to sample.

Philip Glass: Opening and Mad Rush

As the subtitle of this blog suggests, it is supposed to be about math, music, and me. So far, I have been writing mostly about math (which is indeed my main passion). However, I also have a great appreciation for music—especially for contemporary classical music. So far, music has been the main topic only of one blog post. To revive this theme, I am dedicating this blog post to my favorite composer—Philip Glass—and two of his piano pieces: Opening and Mad Rush. I also included recordings of me performing them (scroll down to listen).

Philip Glass

Philip Glass is an American composer who has spent most of his life in New York City and is still living there. He describes himself as a composer of “music with repetitive structures” (which, I think, is the most accurate description one could possibly come up with) and is credited as one of the founders of the minimalism movement in music.

Philip Glass performing in Milan

Philip Glass performing in Milan

I came to know his music during a summer I spent in Princeton, where I shared house with several young composers studying at Princeton University (Troy Herion and Mark Dancingers were among them). I remember one evening they had a very heated debate about the music of Philip Glass and whether it should be considered music at all. After all, his music is so repetitive, he is not ashamed of copying himself, and some of his pieces consist of simply counting 1,2,3,4,… (if you start listening to this particular piece, I recommend you listen to it till the end).

At the time I did not know who Philip Glass is, so I decided to learn more about his music. I came to really love it and now consider him my favorite composer. I appreciate his music both emotionally and intellectually. In fact, I plan to have the 1,2,3,4,… piece (called Knee 5) performed at my wedding.

Māris playing piano

Māris performing Knee 5 at his engagement
blessing in Elevation Church, Waterloo

As a mathematician, I also really enjoy his unusual rhythmic structures, which I will briefly discuss in this post.


Opening is a piece for piano and is named so because it is the opening piece of Glassworks—an album that introduces his music to a general audience (it serves this purpose very well and I highly recommend it to those who are not familiar with his music). This piece does not have a clearly distinguishable melody. Instead, the melody arises from simple repetitive patterns that overlap and intertwine with each other. You can listen my rendition of this piece here (including a wrong key I hit in the beginning):

The piece is based on a simple periodic pattern: the right hand does three equally spaced beats while the left hand does only two within the same period of time. To understand how this works, you can try to tap the following rhythm with your hands—repeat it several times and count 1-2-3-4-5-6 while you are tapping (you can also say the phrase “not dif-fi-cult” or “cold cup of tea” instead):

1 2 3 4 5 6
Right hand: X · X · X ·
Left hand: X · · X · ·

Once you have mastered this, here is the next exercise! Use your thumb and middle finger of both hands to tap this rhythm (again, count 1-2-3-4-5-6 out loud while you do it):

1 2 3 4 5 6
Right middle finger: · · X · · · X · · · X ·
Right thumb: X · · · X · · · X · · ·
Left thumb: · · · X · · · · · X · ·
Left middle finger: X · · · · · X · · · · ·

What you end up doing is simply oscillate the two fingers of both hands at different speeds! It might help if you note that some taps are in-between the counts, and that on 1 and 4 both hands synchronize (on 1 you tap the right thumb and the left middle finger at the same time, but on 4 you tap both middle fingers simultaneously).

It is quite tricky to master this, but once you get it, you are ready to play the Opening! Indeed, you just have to put your hands on piano and repeat the same pattern on different keys and with different fingers. This is how the first line of the score looks like:

Opening by Philip Glass

This is the simplest example of what is known as polyrhythm—when melodies with different meter are played at the same time. This particular polyrhythm (two beats with one hand and three with the other) is a trademark of Philip Glass—it appears in many of his piano pieces (including Mad Rush discussed below).

Glass’s understanding or rhythm and his distinctive musical style was strongly influenced by Indian classical music during his collaborations with the famous sitar player Ravi Shankar and tabla player Alla Rakha. Glass was also deeply impressed by Steve Reich’s Piano Phase when he first heard its performance. Given all this, it is no surprise that polyrhythmic structures are so prevalent in Glass’s music.

Mad Rush

Philip Glass wrote this piece on the occasion of the first public appearance of the 14th Dalai Lama in New York City in 1979. At the time it was not clear exactly how long the wait before his arrival would be, so Glass was asked to compose a piece of somewhat indefinite length (which is not a problem for him…). Here is my rendition of this piece:

As you are listening to it, you can clearly notice that it alternates between two themes (you can even see this in the wave diagram above!). One theme is peaceful and meditative, but the other is fast and a bit frantic. Glass himself explained that the two themes represent the play of the wrathful and peaceful deities in Tibetan Buddhism.

Glass went to India in the 60s and came in contact with Tibetan refugees. During this time he started to gravitate towards Buddhism. Practicing Buddhism and meditation has become an important part of his life. He is also known to be a strong supporter of Tibetan independence.

Let’s take a look at the sheet music again. You can immediately recognize Glass’s signature in the peaceful theme—it is also based on the 2:3 polyrhythm that we are already familiar with (in fact, if you have learned how to play Opening, then playing this part is relatively straightforward):

Peaceful theme of Mad Rush

The wrathful theme, however, consists of very fast arpeggios that are played symmetrically with both hands:

Wrathful theme of Mad Rush

This differs from the peaceful theme not only in tempo, but also in both hands being completely synchronized (which is not the case at all when you are playing a polyrhythm). In fact, they are so synchronized that you even use the same finger with both hands on all notes!

To understand what difference this makes, you can do a simple experiment: try to play the above arpeggios (or just tap your fingers) so that the melody goes up/down for both hands at the same time. Alternatively, you can just play the top line with both hands simultaneously. While both hands are still synchronized, in this case you will always be using opposite fingers (e.g., when the right hand uses the thumb, the left hand uses the little finger). You may notice that this is slightly harder than the original mirror symmetric arpeggios! Indeed, it turns out that human motor system has a preference for mirror symmetric movements (especially at high speeds).

In any case, this part of the piece reminds me of New York City and it represents the title Mad Rush very well. If you listen carefully, you might even notice that the last two arpeggios break the rhythm (they include two extra notes). This is how the score looks like (notice also the unusual 14/16 time signature):

Wrathful theme of Mad Rush (last part)

When I listen to or play these two bars, it almost feels as if my heart skips (or does an extra!) beat, because things get totally out of sync. It’s like running while carrying a big box and almost tripping over. When this tipping point is reached, the piece stops and goes back to the peaceful part again…

More Glass?

I hope you enjoyed reading this post and listening to the music. If you are interested to find out more about the music of Philip Glass, you can start with Glassworks. In fact, you might already have heard some of his music, since he has written lots of music for films (e.g., The Hours). If you like piano music, you can also check out Metamorphosis. I highly recommend his Qatsi trilogy: Koyaanisqatsi, Powaqqatsi, and Naqoyqatsi (all of which should be watched as movies if possible). If you are more adventurous, you can even try to listen to all of Einstein on the Beach in a single sitting (which I did a few years ago when I saw this opera in Toronto).


I am grateful to Steve Tulloch for giving me the opportunity to perform these two pieces at The View From Here and to Steve Errey for recording my performance. You can check out Steve Errey’s Studio S page on SoundCloud for more recordings of other talented young artists. Finally, I also want to thank Mohammad and Jesslyn for the photos, and Amandine for her feedback and the Scholarpedia reference on motor coordination.

Māris performing at Elevation Church in Waterloo

Māris performing at Elevation Church in Waterloo

Exact quantum query algorithms

Andris Ambainis, Jānis Iraids, and Juris Smotrovs recently have obtained some interesting quantum query algorithms [AIS13]. In this blog post I will explain my understanding of their result.

Throughout the post I will consider a specific type of quantum query algorithms which I will refer to as MCQ algorithms (the origin of this name will become clear shortly). They have the following two defining features:

  • they are exact (i.e., find answer with certainty)
  • they measure after each query

Quantum effects in an MCQ algorithm can take place only for a very short time — during the query. After the query the state is measured and becomes classical. Thus, answers obtained from two different queries do not interfere quantumly. This is very similar to deterministic classical algorithms that also find answer with certainty and whose state is deterministic after each query.

Basics of quantum query complexity

Our goal is to evaluate some (total) Boolean function f(x) on an unknown input string x \in \{1,-1\}^n (we assume for convenience that binary variables take values +1 and -1). We can access x only by applying oracle matrix

Q_x =    \begin{pmatrix}      x_1 & & & \\      & x_2 & & \\      & & \ddots & \\      & & & x_n    \end{pmatrix}

to some quantum state. The minimum number of queries needed to determine the value of f(x) with certainty is called the exact quantum query complexity of f.

Quantum questions with classical answers

Each interaction with oracle in an MCQ algorithm can be described as follows:

  1. prepare some state |\psi\rangle
  2. apply query matrix Q_x
  3. apply some unitary U
  4. measure in the standard basis

Intuitively, this interaction is a quantum question (specified by |\psi\rangle and U) which produces a classical answer (measurement outcome i that appears with probability |\langle i|UQ_x|\psi\rangle|^2).

Since each of the answers reveal some property of x, it is convenient to identify the collection of these properties with the question itself (I think of it as a “quantum Multiple Choice Question”, hence MCQ). Of course, not every collection of properties constitutes a valid quantum question — only those for which there exists a corresponding |\psi\rangle and U. (We will see some examples soon.)

MCQ algorithms

Simply put, an MCQ algorithm is a decision tree: each its leaf contains either 0 or 1 (the value of f(x) for corresponding x), and each of the remaining nodes contains a quantum question and children correspond to answers.

Classical deterministic decision trees are very similar to MCQ algorithms, except that their nodes contain classical questions — at each node we can only ask one of the n variables x_i. Quantumly, we have a larger variety of questions — for example, we can ask XOR of two variables (as in Deutsch’s algorithm). Another difference is that a quantum question does not have a unique answer: if several answers are consistent with the input string x, we will get one of them at random.

An obvious question regarding MCQ algorithms is this:

How can we exploit the quantum oracle to find f(x) with less queries than classically?

Surprisingly, until recently essentially no other way of exploiting the quantum oracle was known, other than Deutsch’s XOR trick (see [MJM11] by Ashley Montanaro, Richard Jozsa, and Graeme Mitchison for more details). What is interesting about the [AIS13] paper is that it provides a new trick!

Query, measure, recurse!

All algorithms discussed in [AIS13] are MCQ and recursive. They proceed as follows:

  1. query
  2. measure
  3. recurse

In the last step, depending on the measurement outcome, either f(x) is found or the problem is reduced to a smaller instance and we proceed recursively. Let me explain how this works for two functions which I will call BALANCED and MAJORITY (they are special cases of EXACT and THRESHOLD discussed in [AIS13]).


\mathrm{BALANCED}_{2k} (x_1, x_2, \dotsc, x_{2k}) = 1 iff exactly k of the variables x_i are equal to 1. An MCQ algorithm asks a quantum question that can reveal the following properties of the input string x:

  1. x is not balanced (the number of +1s and -1s is not equal)
  2. x_i is not equal to x_j (for some i \neq j)

If we get the first answer then \mathrm{BALANCED}_{2k}(x) = 0 and we are done. If we get the second answer for some i \neq j, we can ignore the variables x_i and x_j and recursively evaluate \mathrm{BALANCED}_{2(k-1)} on the remaining variables. In total, we need at most k queries (which can be shown to be optimal).

It remains to argue that the above is a valid quantum question. Alternatively, we can show how to prepare the following (unnormalized) quantum state:

\sum_{i=1}^{2k} x_i |0\rangle + \sum_{i<j} (x_i - x_j) |ij\rangle

(It lives in the space spanned by |0\rangle and |ij\rangle for all i<j.) This can be easily done by taking

|\psi\rangle = \sum_{i=0}^{2k} |i\rangle

and U that acts as

U |i\rangle = \frac{1}{\sqrt{2k}}    \Bigl( |0\rangle + \sum_{j>i} |ij\rangle - \sum_{j<i} |ji\rangle \Bigr)

To check that U is a valid isometry, notice that it maps |i_1\rangle and |i_2\rangle to paths that overlap in exactly one cell (rows are labeled by i and columns by j):
Since both states also have |0\rangle in common, they are orthogonal.


\mathrm{MAJORITY}_{2k+1} (x_1, x_2, \dotsc, x_{2k+1}) = 1 iff at least k+1 of the variables x_i are equal to 1. This time the quantum question has answers

  1. x is not balanced when x_i is omitted (for some i)
  2. x_i is not equal to x_j (for some i \neq j)

If we get the first answer for some i, we omit x_i and any other variable. If we get the second answer for some i \neq j, we ignore the variables x_i and x_j. In both cases we proceed by recursively evaluating \mathrm{MAJORITY}_{2k-1} on the remaining variables. When only one variable is left, we query it to determine the answer. This requires at most k+1 queries in total (which again can be shown to be optimal).

The corresponding (unnormalized) state in this case is

\sum_{i=1}^{2k+1} \sum_{j \neq i} x_i |j\rangle    + \sqrt{2k-1} \sum_{i<j} (x_i - x_j) |ij\rangle

(It lives in the space spanned by |j\rangle for all j and |ij\rangle for all i<j.) It can be obtained by choosing

|\psi\rangle = \sum_{i=1}^{2k+1} |i\rangle

and U acting as

U |i\rangle = \frac{1}{2k}    \Bigl(      \sum_{j \neq i} |j\rangle      + \sqrt{2k-1} \sum_{j>i} |ij\rangle      - \sqrt{2k-1} \sum_{j<i} |ji\rangle    \Bigr)

One can check that U is an isometry using a similar picture as above.

Open questions

The problem of finding a quantum query algorithm with a given number of queries and a given success probability can be formulated as a semi-definite program. This was shown by Howard Barnum, Michael Saks, and Mario Szegedy in [BSS03] and can be used to obtain exact quantum query algorithms numerically. Unfortunately, this approach does not necessarily give any insight of why and how the obtained algorithm works. Nevertheless, it would be interesting to know if there is a similar simple characterization of MCQ algorithms.

The algorithms from [AIS13] described above are relatively simple. However, that does not mean that they were simple to find. In fact, the SDP corresponding to \mathrm{BALANCED}_4 had already been solved numerically in [MJM11]. Unfortunately, it did not provide enough insight to obtain an algorithm for \mathrm{BALANCED}_{2k} for any k. A similar situation is now with \mathrm{EXACT}^6_{2,4} (which it is true if exactly two or four out of the six variables are true). From [MJM11] we know that it has an exact 3-query algorithm. Unfortunately, we do not have enough understanding to describe it in a simple way or generalize it. Besides, I wonder if \mathrm{EXACT}^6_{2,4} has a 3-query MCQ algorithm, or do we actually need interference between the queries to find the answer so fast?

Finally, it would be interesting to know if there is any connection between exact quantum query algorithms and non-local games or Kochen–Specker type theorems.

SIC-POVM sickness

I contracted the SIC-POVM sickness a month ago after reading a paper by Gelo Tabia and Marcus Appleby who explore the qutrit case in great detail. I got over it in a week or so just to contract it again today when a paper by Gilad Gour came out. It has a very promising title that ends with “…symmetric informationally complete measurements exist in all dimensions”. Unfortunately, it begins with “General…”. Let’s see what the paper is about. Update: Chris Ferrie has pointed out to me that a similar result has already been obtained by Marcus Appleby (see Sect. 4 of this paper).


The geometric intuition of this result can be summarized as follows:

  1. Choose an arbitrary orthonormal basis for the linear space of all d \times d Hermitian matrices.
  2. Construct a simplex out of these basis vectors in some specific way (see below).
  3. Shrink it sufficiently small so that it fits inside the positive-semidefinite cone.

The last step always works, because the convex body formed by all density matrices contains a ball around the maximally mixed state. The required amount of shrinking is determined by parameter t in Theorem 1 (it is also related via Eq. (7) to parameter a that measures the “purity” of the resulting SIC-POVM).

Details and example

The second step can be described more precisely as follows. Let \{f_1, \dotsc, f_{d^2-1}\} be an arbitrary orthonormal basis of \mathbb{R}^{d^2-1} (think of it as the space of all d \times d traceless Hermitian matrices). Then a regular simplex with vertices p_1, \dotsc, p_{d^2} can be obtained as follows:

p_i := f - d(d+1) f_i

p_{d^2} := (d+1) f

where i = 1, \dotsc, d^2-1 and f is the sum of all f_i (these expressions correspond to Eq. (5) in the paper). For example, if d=2 and f_i := e_i is the i-th standard basis vector, then vectors p_i are the rows of the following matrix:

P =    \begin{pmatrix}      -5 &  1 &  1 \\       1 & -5 &  1 \\       1 &  1 & -5 \\       3 &  3 &  3    \end{pmatrix}

They look like this in 3D (the orange arrows):
Note that

P \cdot P^{\mathsf{T}}=    \begin{pmatrix}      27 & -9 & -9 & -9 \\      -9 & 27 & -9 & -9 \\      -9 & -9 & 27 & -9 \\      -9 & -9 & -9 & 27    \end{pmatrix}

hence vectors p_1, p_2, p_3, p_4 indeed form a simplex in \mathbb{R}^3. In fact, this construction works in any dimension — there is nothing special about it being of the form d^2-1.

Open questions

The hard question (which is still open) is how to choose the basis f_i so that you need to shrink the simplex as little as possible. In other words, you want the matrices associated to p_i to have rank one. All we know about this case is that the matrices F_i associated to the basis vectors f_i must have certain eigenvalues given by Eq. (11) in the paper. How close do we get to this if F_i are generalized Pauli matrices or Haar random?

It would be interesting to know if anything extra can be said about the matrices F_i if the SIC-POVM obeys Weyl-Heisenberg symmetry (in prime dimensions this is without loss of generality due to Huangjun Zhu).

Ctrl+Z for Mathematica!

As embarrassing as it may be, the world’s most sophisticated computer algebra system Mathematica lacks a proper undo feature!


It took over 20 years and millions of users until eventually Reema Al-Aifari, a mathematician from Austria, stood up and demanded that Ctrl+Z is finally properly implemented! If you are among these millions, you probably know exactly what I’m talking about, so please consider signing her petition!

My experience

The rest of this post is a rant about my own experience with Mathematica. I have been using it for 10 years almost daily starting from version 5.0 onward. I think it is a great tool and I have to admit that nowadays I program almost exclusively in Mathematica. The reason is that typically, for the kind of problems I need to solve,

writing the program takes more time than running it.

And even if I have to wait a bit for the answer, it is still OK, since often I need to run the program only once. In other words, the strength of Mathematica is the ability to develop code fast. However, it’s not all just wonderful and pink… (or orange?)


Throughout these years I have acquired a habit of working with Mathematica as if there would not be any undo feature at all. In fact, even if you try to use undo, there is no way to know exactly how much of your work would be undone (by the only precious undo you have!), so why bother — just make a copy of the cell you’re about to edit before you do anything you might regret. Another option, of course, is to follow the official documentation of the undo feature and just close the file (without saving the changes!!!) and then reopen whatever you had when you last saved! Now, how ridiculous is that?


Of course, we’re not in 1988 anymore, so some progress has been made. For example, you might have noticed that now you can not only use arrow keys to select more text while holding shift, but also less (in case you selected too much). Why did it take 20 years to implement that?

Bugs (What’s that?)

The thing that annoys me the most is that, by definition,

Mathematica has no bugs.

It only has features! When a new version is released, all the great new features are listed, accompanied by beautiful colorful pictures. But where is the list of bugs that were (or weren’t) fixed? How do I know if a certain function has a known bug and whether it has been fixed in the new release or not? All that is mentioned are some “improvements” and “stability enhancements”, but no details are given.

I understand that Mathematica is not an open source project, but I don’t see what is the advantage of not making the list of known bugs public. I don’t think that it would harm Mathematica‘s reputation. On the contrary, it would be easier to avoid pitfalls that others have fallen into and thus get more reliable answers to your computation. After all, this program is for doing mathematics and therefore it is important to know when it gives correct answers and when it doesn’t.

I believe that most Mathematica users genuinely want the product to be better and some even report the bugs they have found. At least I did. But I kind of gave up on it, since it did not really seem like Wolfram appreciates any feedback from users (maybe things have improved more recently). This is in sharp contrast with the situation in the web browsers market, where a special hacking contest called Pwn2Own is organized, and researchers can actually get significant monetary prizes for finding security flaws in web browsers such as IE, Chrome or Firefox. Finding bugs in complicated software such as Mathematica is also highly nontrivial, but there is zero incentive to do so.


Mathematica is an extremely powerful tool with a significant and wide range of applications. However, the consequences of it not working correctly can also be significant. Hence, it would be good to know when that can happen. Also, if users desperately want some feature, maybe they have a point and it would be worthwhile to implement it.

I don’t know if the petition will make any difference. But I also don’t know any Mathematica user who would not be complaining to his or her colleagues about how frustrating Mathematica can sometimes be.

So, Stephen, please don’t be a stubborn child. We don’t need no new kind of science, just admit what’s wrong with your program!

Thank you!

Nonlocality without entanglement

After much work we finally uploaded our paper titled “A framework for bounding nonlocality of state discrimination” on the arXiv (this is joint work with Andrew Childs, Debbie Leung, and Laura Mančinska).

Let me informally explain what quantum nonlocality without entanglement is about and briefly summarize the main results of our paper.

Can you tell an apple from an orange?

Let me describe a slightly strange game. Imagine that some crazy biotechnology corporation like Monsanto comes up with a way to grow a new genetically modified fruit called abo. An abo fruit consists of two halves: each of the halves is either an apple (A), banana (B), or orange (O).

In total there are 9 different possible abo fruits: apple-apple, apple-banana, apple-orange, banana-apple, banana-banana, etc. Moreover, let’s assume that each combination is equally likely.

Now, two friends Amandine and Bobby that live in two different places decide to play the following game. They ask somebody to buy an abo fruit, cut it in two halves and mail one half to each of them. Their task is to determine which of the 9 possible combinations it is. (We assume that abo fruits don’t go bad, since they have been properly sprayed with Monsanto’s chemicals.)

Since Amandine and Bobby live in different places, we allow them to talk over Skype to make their decision. In fact, they can even use the video mode to show each other the half of the fruit they have. Technically, this is called LOCC (local operations and classical communication).

If you think about this for a bit, you should realize that this is clearly a very silly game! Each of the parties can easily identify their half of the fruit as either an apple, orange, or banana, and inform the other party. Thus, no fun in the classical case.

Quantum fruits?

Imagine that Monsanto’s technology advances to the point when they can create superpositions of different plant DNA and produce quantum abo fruits, where each half of the fruit is an arbitrary combination of an apple, banana, and orange. For example,

\displaystyle    \frac{|\text{apple}\rangle + |\text{banana}\rangle}{\sqrt{2}}    \otimes    |\text{orange}\rangle

is a valid quantum abo fruit (Amandine has something between an apple and banana, but Bobby has an orange). If you buy it at supermarket, it might look something like this:

To be fair, we should allow only 9 different kinds of quantum abo fruits to be produced. Moreover, we require that they can be discriminated globally, i.e., when Amandine and Bobby are together and perform a joint observation of the whole fruit (in other words, the corresponding quantum states form an orthonormal product basis).

Now the question is: can Amandine and Bobby always tell the 9 quantum abo fruits apart over Skype? Surprisingly, the answer to this question is “No”.

Sausage states…

Let us consider a specific construction (also known as domino states). If we replace fruits by numbers, we can write these 9 states as

|\psi_1\rangle = |1\rangle \otimes |1\rangle
|\psi_{2,3}\rangle = |0\rangle \otimes |0 \pm 1\rangle
|\psi_{4,5}\rangle = |2\rangle \otimes |1 \pm 2\rangle
|\psi_{6,7}\rangle = |1 \pm 2\rangle \otimes |0\rangle
|\psi_{8,9}\rangle = |0 \pm 1\rangle \otimes |2\rangle

where |i \pm j\rangle := (|i\rangle \pm |j\rangle) / \sqrt{2}. Clearly, these all are product states, and one can easily check that they are mutually orthogonal.

Alternatively, we can depict them as “sausages” on a grill, where Amandine has the first half (rows), but Bobby has the second half of the state (columns):

These states were introduced by Charles Bennett, David DiVincenzo, Christopher Fuchs, Tal Mor, Eric Rains, Peter Shor, John Smolin, and William Wootters in 1998 in their paper “Quantum nonlocality without entanglement”.

They show that these states cannot be locally discriminated even if Amandine and Bobby are allowed to talk on Skype as long as they want. No matter what they do, they will always have at least some small amount (at least 0.00000531 bits) of uncertainty left about what their joint state is. (Technically, we say that these states cannot be discriminated by LOCC asymptotically.) Thus, despite there being no entanglement, these states exhibit some nonlocal properties. That’s why this phenomenon is called quantum nonlocality without entanglement.

Intuitively, the problem is that Amandine cannot just identify her half of the state as being either 0, 1, or 2, since this would destroy the superposition in case the state was, say |\psi_8\rangle or |\psi_9\rangle, and a similar argument holds for Bobby. In other words, the reduced states on Amandine’s side are not orthonormal, so there is no preferred basis in which to perform the first measurement. However, it is very hard to make this argument rigorous and obtain a quantitative estimate of the degree of failure.

Our contributions and the proof idea

We provide a framework for bounding the amount of nonlocality in a given set of bipartite quantum states in terms of a lower bound on the probability of error in any LOCC discrimination protocol. The main idea is to establish a trade-off between disturbance and information gain, i.e., to show that the more information we learn about a quantum system, the more we disturb it.

To explain this more formally, let us consider the problem of discriminating arbitray n bipartite states |\psi_1\rangle, \dotsc, |\psi_n\rangle. Assume that Amandine and Bobby execute some protocol for discriminating these states, and we stop them at some point during the protocol. Let p_1, \dotsc, p_n be the posterior probability distribution and |\phi_1\rangle, \dotsc, |\phi_n\rangle be the corresponding post-measurement states.

Definition. We say that states |\psi_i\rangle satisfy a trade-off between disturbance and information gain with constant \eta if

\eta \, \varepsilon \leq \delta

holds in every branch when the protocol has been stopped, where

\varepsilon = \max_k p_k - \frac{1}{n}

is the information gain and

\delta = \max_{i \neq j} |\langle\phi_i|\phi_j\rangle|

is the disturbance. The largest constant \eta that satisfies this inequality we call the nonlocality constant of the given states.

Intuitively, \varepsilon measures how far the posterior probability distribution is from uniform, but \delta measures how nonorthogonal the post-measurement states have become.

Now our main result can be stated as follows:

Theorem. Any LOCC protocol for discriminating states drawn uniformly from |\psi_1\rangle, \dotsc, |\psi_n\rangle errs with probability

p_{\text{error}} \geq \displaystyle    \frac{2}{27} \, \frac{\eta^2}{n^5}

where \eta is the nonlocality constant of these states.

Our proof of this theorem is based on Helstrom bound. Our second main contribution is a systematic method for bounding the nonlocality constant \eta for a large class of product bases. In particular, we obtain specific bounds for domino states, rotated domino states, and a more general family of domino-type states.

For more details see our paper.