Matrix times vector: dense vs. sparse vs. symbolic

Introduction

In this tutorial we are going to benchmark and test the performance of matrix times vector operations and compare different representations of the matrix. So in general we care about,

\ket{\psi} = \hat{M} \ket{\psi},

where the equality here is being used in the assignment sense. I chose to write $\hat{M}$ to indicate that this can really be any matrix constructed from a quantum operator expression. Updating $\ket{\psi}$ directly is a common operation when we don't need to know the input and output vector in our algorithm. Sometimes we do care about this however and we would like to track both the input and output,

\ket{\phi} = \hat{M} \ket{\psi},

In this tutorial we will focus on operations where we do not need to store the input vector. The conclusions of this tutorial do not change if you explore the first or second case, but I encourage you to attempt the benchmarking if you care about tracking both vectors. Before we get started we should discuss the limitations of dense, sparse and symbolic representations of matrices.

Storage limitations of matrix representations

Let's assume we have a Hamiltonian represented in terms of a sum of local operators, on a chain of $L$ sites,

\hat{H} = \sum_{i=0}^{L-1} \hat{h}_i,

where $\hat{h}_i$ are local operators representing nearest neighbour interactions. Our dense vectors for this system (without resolving any symmetries) will have $2^L$ entries. Depending on our needs these might be real or complex vectors. Real numbers require 8 bytes of memory while complex numbers require 16 bytes. For the purpose of this tutorial we will assume the worst case and do our memory estimations with complex numbers. Meaning a single vector in our program will require $16\times 2^L$ bytes.

Dense Representation

First let's discuss dense matrices. These matrices are typical in a number of articles, and represent the easiest to construct matrices if you are starting from scratch in a language like python. Commonly they are constructed with Kronecker (or tensor) products. Dense matrices store every entry, even if the entry is zero, meaing our memory overhead is $16 \times 2^{2L}$ bytes. A quick calculation reveals that just $L=16$ gives us a burden of $\sim 68.7$ GB, meaning the average laptop cannot handle storing this matrix in memory. It is possible to use larger compute nodes to handle this size, but for laptop computation about $L=14,15$ are safely the limit.

Sparse Representation

Second let's discuss the scaling of sparse matrices. We partially introduced the structure of $\hat{H}$ to make this reasonable to discuss because locality inevitably reveals itself in the form of zeros or sparsity when you store your operator expression in the form of a matrix. The key to understanding your storage requirements for sparse matrices is to understand the behavior of $\hat{h}_i$ when it acts on a computational basis vector. In the case of a spin- $\frac{1}{2}$ XXZ model we would have,

\hat{h}_i = \hat{X}_i \hat{X}_{i+1} + \hat{Y}_i \hat{Y}_{i+1} + \Delta \hat{Z}_i \hat{Z}_{i+1},

the $\hat{X}_i \hat{X}_{i+1} + \hat{Y}_i \hat{Y}_{i+1}$ will take in a computational basis state, and output a single state (it is non-branching), and the $\hat{Z}_i \hat{Z}_{i+1}$ will take in a vector, and scale it. The $\hat{Z}_i \hat{Z}_{i+1}$ terms contribute to the diagonal of a sparse matrix which is $2^L$ entries. The off-diagonal components will in general spawn $\sim L$ new vectors (one for each $\hat{h}_i$ ), with the precise amount depending on the boundary conditions. Our assumption of just $L$ is importantly informed by the simplicity of the off-diagonal contribution of $\hat{h}_i$ . In general there could be a constant factor in front of $L$ which greatly impacts our storage requirements. So our total storage will be roughly $\sim 16 \times L \times 2^L$ bytes. This is really the best case scenario for our scaling in the Hamiltonian context. Sparse matrices give us a major advantage in memory overhead. For $L=16$ we can estimate that we only need about $0.02$ GB to store our matrix, well within the limitations of any laptop on the market. So when do sparse matrices with structure like ours begin to become too heavy? This kicks in around $L=26$ with our current structure resulting in about $28$ GB of storage. In general the off-diagonal contribution could be larger for a number of reasons. More complicated nearest neighbor interactions, longer range interactions and so on will increase the memory overhead to store the sparse matrix representation.

Symbolic Representation

Lastly we can talk about symbolic expressions. Symbolic expressions have close to zero memory overhead. Operator expressions are typically a "name" like $X$ stored internally as an enum (like an integer) and along with integers defining its support in terms of linear indices. Integers in Aleph are $8$ bytes, so even for very complex operator expressions you will pay very little memory overhead cost. The dominant storage is then in the dense vector storage for our benchmark, which is $16\times 2^{L}$ . This means that if you can store a dense matrix for a system, a symbolic expression, even without symmetries, can double your maximum system size without much effort.

note

There are edge cases like symbolic Heisenberg time evolution of operators where symbolic expressions can grow exponentially large, and also contribute significantly to storage requirements.

So we have a clear hierarchy in terms of storage cost: dense > sparse > symbolic, where dense is by far the heaviest in terms of memory. The advantage you get from the sparse representation will always be problem dependent, and symbolic expressions have little memory cost. In fact without the dense vector memory, symbolic representations of operators can handle extremely large system sizes.

Benchmarking infrastructure

Now that we understand the storage requirements for our problems we want to write some quick infrastructure to benchmark the performance of $\ket{\psi} = \hat{M} \ket{\psi}$ . First let's define two functions,

// symbolic operators have specialized in-place kernels in a lot of cases
def bench_op(integer L, psi, op)
{
    op >> psi
}
// matrices do not have special in-place kernels
def bench_mat(integer L, psi, mat)
{
    psi = mat*psi
}
// if you have access to a larger machine you can increase this to 16.
var Lmax = 14

for(L:[8..Lmax+1][::2])
{
    // example contexts to benchmark.
    //var myop = X(0) 
     var myop = ZZNN([0..L])
   // var myop = haar_circuit(L)
    //var myop = J1J2Model(L)

    // spawning a haar vector to be out |psi>
    var psi =  haar_vector(2**L)
    // spawning  dense and sparse matrices. 
    var mat = matrix(myop,L)
    var smat = matrix(mat, as_sparsematrix)
    smat.prune()

    print("System size: " + string(L))
    print(benchmark(fun[L,psi,myop]() { bench_op(L,psi,myop) },20).mean().seconds(as_real))
    print(benchmark(fun[L,psi,smat]() { bench_mat(L,psi,smat) },20).mean().seconds(as_real))
    print(benchmark(fun[L,psi,mat]() { bench_mat(L,psi,mat) },20).mean().seconds(as_real))
}

Symbolic operators get a speedup in most, but not all, cases if you use the in-place operator >>. So we will use bench_op for our symbolic operators and bench_mat for sparse and dense matrices. We have two examples about that are not built in functions, namely building a brickwork Haar random circuit, and a J1J2 model. I will leave the code to construct these examples at the bottom.

Results for ZZNN

We will report the results for ZZNN here. To see the definition of this operator see here. We ran this test on a machine with $512$ GB of ram and included $L=16$ . All times are reported in seconds.

L	Symbolic	Sparse	Dense
$10$	$0.00026$	$0.0223$	$0.0043$
$12$	$8.3975e-05$	$0.17267$	$0.0077$
$14$	$7.4257e-05$	$1.5381$	$0.0228$
$16$	$7.9425e-05$	$21.1002$	$0.26189$

The conclusion of this benchmark is pretty clear, the symbolic operations are much faster at updating vectors than dense or sparse matrices. Speeds will depend on the exact machine you run this on, but in general the conclusions will be the same. This is compounded by the fact that symbolic operators also carry a memory overhead advantage. This implies that you should always use symbolic operators, unless you absolutely need the dense matrix representation of an operator for some linear algebra routine like solving for every eigenvalue.

note

The symbolic operator benchmarking exhibits strange scaling behavior with system size here. This is due to thread spawning overhead. We are experimenting internally to fix this behavior. The scaling becomes predictable and stable for larger system sizes.

What's next

Try the other benchmarks on your machine
Modify this script to remove dense matrices and probe larger system sizes for sparse and symbolic matrix representations
Modify this script to remove matrices all together and observe the upper limits of symbolic operators multiplied into dense vectors
How do our conclusions change if we instead care about the benchmarking case $|\phi \rangle = \hat{M} |\psi\rangle$ .

Helper functions

def haar_circuit(integer L)
{
    var circ_haar_odd = operator_prod()
    var circ_haar_even = operator_prod()
    // only select even m
    for(m: [0..L][::2])
    {
        // we generate a new Haar random unitary for each coupling
        var mat = haar_matrix(4)
        circ_haar_even *= operator_dense(mat,[m,(m+1)%L])
    }
    for(m: [1..L][::2])
    {
        // we generate a new Haar random unitary for each coupling
        var mat = haar_matrix(4)
        circ_haar_odd *= operator_dense(mat,[m,(m+1)%L])
    }
    return circ_haar_odd*circ_haar_even;

}

def J1J2Model(integer L)
{
   var ham = operator_sum()

    var s1; var s2;
    var s3;

    var J1 = 1.0;
    var J2 = -0.5;
    var gamma1 = 0.5;
    var gamma2 = 0.5;

// we are doing periodic boundaries
    for(m : [0..L])
    {
        s1 = m;
        s2 = (m+1)%L
        s3 = (m+2)%L
        // XXPYY is a faster subroutine to do XX+YY calculations
        ham += J1*XXPYY(s1,s2) + (J1*gamma1)*ZZ(s1,s2) + J2*XXPYY(s1,s3) + (J2*gamma2)*ZZ(s1,s3);
    } 
    return ham
}

Matrix times vector: dense vs. sparse vs. symbolic

Introduction

Storage limitations of matrix representations

Dense Representation

Sparse Representation

Symbolic Representation

Benchmarking infrastructure

Results for ZZNN

What's next

Helper functions

Documentation Contributors

Jonathon Riddell

Introduction​

Storage limitations of matrix representations​

Dense Representation​

Sparse Representation​

Symbolic Representation​

Benchmarking infrastructure​

Results for ZZNN​

What's next​

Helper functions​

Documentation Contributors

Jonathon Riddell

Introduction

Storage limitations of matrix representations

Dense Representation

Sparse Representation

Symbolic Representation

Benchmarking infrastructure

Results for ZZNN

What's next

Helper functions