11/22/2013

Constrained linear least squares in Python using scipy and cvxopt. Matlabs lsqlin and lsqnonneg in Python with sparse matrices.

So Matlab has handy functions to solve non-negative constrained linear least squares(lsqnonneg), and optimization toolbox has even more general linear constrained least squares(lsqlin). The great thing about these functions, is that they can efficiently solve problems with large sparse matrices. Well, Python has scipy.optimize.nnls that can handle non-negative least squares as well, but there is no built-in lsqlin alternative, and nnls can't handle sparse matrices. However, you can formulate it as quadratic programming problem, and use scipy.optimize.fmin_slsqp to solve it, but scipy SLSQP implementation can't solve the problem for sparse matrices as well. Luckily there is really great optimization package for Python, called CVXOPT, that can solve quadratic programming problems with sparse matrices. Let's wrap it all up, and even add $\ell_2$ regularization.
The problem of constrained linear least squares is usually stated in following way:

$$\min_x \; \frac12\|Cx-d\|_2^2 + \lambda \|x\|_2^2, $$ $$s.t. \;\; Ax \leq a, \;\;\; Bx = b. $$

using the fact that $\|x\|_2^2 = x^\top x$: $$\frac12\|Cx-d\|_2^2 + \lambda \|x\|_2^2 = \frac12 (Cx - d)^\top(Cx-d) + \lambda x^\top x =$$ $$=\frac12(x^\top C^\top - d^\top)(Cx-d) + \lambda x^\top I x = \frac12 x^\top (C^\top Cx + \lambda I)x - d^\top Cx + \frac12 d^\top d.$$ While minimizing over $x$ we don't care about $d^\top d$, taking $Q = C^\top C + \lambda I$, and $r=d^\top C$ we can rewrite our initial optimization problem as follows:

$$\min_x \; \frac12 x^\top Qx + rx, $$ $$s.t. \;\; Ax \leq a, \;\;\; Bx = b. $$

Now it is obvious, that we have formulated a quadratic program, that can be solved by cvxopt.solvers.qp. Stacking box inequality constraints(Matlab notation lb and ub such that lb<=x<=ub) to matrix $A$ and vector $a$, we can fully emulate lsqlin and lsqnonneg with sparse and dense matrices.

Testing lsqnonneg:
Here is Matlab reference code:

C = [0.0372, 0.2869; 0.6861, 0.7071; ...
     0.6233, 0.6245; 0.6344, 0.6170];
d = [0.8587, 0.1781, 0.0747, 0.8405]';
x = lsqnonneg(C, d)
% output:
% x = [0, 0.6929]
And here is Python code:
import lsqlin
import numpy as np
C = np.array([[0.0372, 0.2869], [0.6861, 0.7071], \
              [0.6233, 0.6245], [0.6344, 0.6170]]);
d = np.array([0.8587, 0.1781, 0.0747, 0.8405]);
ret = lsqlin.lsqnonneg(C, d, {'show_progress': False})
print ret['x'].T
#output:
#[ 2.50e-07  6.93e-01]
So it works perfectly! Disregard tiny value of x[0]=2.5e-07 this can be fixed, by imposing small regularization.

Testing lsqlin:
Matlab reference code:

C = [0.9501    0.7620    0.6153    0.4057
     0.2311    0.4564    0.7919    0.9354
     0.6068    0.0185    0.9218    0.9169
     0.4859    0.8214    0.7382    0.4102
     0.8912    0.4447    0.1762    0.8936];
d = [0.0578, 0.3528, 0.8131, 0.0098, 0.1388]';
A =[0.2027    0.2721    0.7467    0.4659
    0.1987    0.1988    0.4450    0.4186
    0.6037    0.0152    0.9318    0.8462];
b =[0.5251, 0.2026, 0.6721]';
lb = -0.1*ones(4,1);
ub = 2*ones(4,1);
x = lsqlin(C, d, A, b, [], [], lb, ub)
% output:
% x = [-0.1000, -0.1000, 0.2152, 0.3502]
Python code:
import lsqlin
import numpy as np

C = np.array(np.mat('''0.9501,0.7620,0.6153,0.4057;
0.2311,0.4564,0.7919,0.9354;
0.6068,0.0185,0.9218,0.9169;
0.4859,0.8214,0.7382,0.4102;
0.8912,0.4447,0.1762,0.8936'''))

A = np.array(np.mat('''0.2027,0.2721,0.7467,0.4659;
0.1987,0.1988,0.4450,0.4186;
0.6037,0.0152,0.9318,0.8462'''))
d = np.array([0.0578, 0.3528, 0.8131, 0.0098, 0.1388])
b =  np.array([0.5251, 0.2026, 0.6721])
lb = np.array([-0.1] * 4)
ub = np.array([2] * 4)

ret = lsqlin.lsqlin(C, d, 0, A, b, None, None, \
         lb, ub, None, {'show_progress': False})
print ret['x'].T
# output:
# [-1.00e-01 -1.00e-01  2.15e-01  3.50e-01]
Seems to work fine as well.

For sparse matrices in my case(about hundreds columns, tens thousands rows, 3 non-zero elements in each row) both Matlab and Python provided same solutions, and Python implementation seemed to work a bit faster. However I can not provide any timings or memory consumption test. Just hope it can help somebody someday.

lsqlin.py can be downloaded from here, or just copypasted from below. You can use numpy dense matrices, scipy sparse matrices or cvxopt matrices as inputs. Module is documented:

#!/usr/bin/python

# See http://maggotroot.blogspot.ch/2013/11/constrained-linear-least-squares-in.html for more info
'''
    A simple library to solve constrained linear least squares problems
    with sparse and dense matrices. Uses cvxopt library for 
    optimization
'''

__author__ = 'Valeriy Vishnevskiy'
__email__ = 'valera.vishnevskiy@yandex.ru'
__version__ = '1.0'
__date__ = '22.11.2013'
__license__ = 'WTFPL'

import numpy as np
from cvxopt import solvers, matrix, spmatrix, mul
import itertools
from scipy import sparse


def scipy_sparse_to_spmatrix(A):
    coo = A.tocoo()
    SP = spmatrix(coo.data, coo.row.tolist(), coo.col.tolist())
    return SP

def spmatrix_sparse_to_scipy(A):
    data = np.array(A.V).squeeze()
    rows = np.array(A.I).squeeze()
    cols = np.array(A.J).squeeze()
    return sparse.coo_matrix( (data, (rows, cols)) )

def sparse_None_vstack(A1, A2):
    if A1 is None:
        return A2
    else:
        return sparse.vstack([A1, A2])

def numpy_None_vstack(A1, A2):
    if A1 is None:
        return A2
    else:
        return np.vstack([A1, A2])
        
def numpy_None_concatenate(A1, A2):
    if A1 is None:
        return A2
    else:
        return np.concatenate([A1, A2])

def get_shape(A):
    if isinstance(C, spmatrix):
        return C.size
    else:
        return C.shape

def numpy_to_cvxopt_matrix(A):
    if A is None:
        return A
    if sparse.issparse(A):
        if isinstance(A, sparse.spmatrix):
            return scipy_sparse_to_spmatrix(A)
        else:
            return A
    else:
        if isinstance(A, np.ndarray):
            if A.ndim == 1:
                return matrix(A, (A.shape[0], 1), 'd')
            else:
                return matrix(A, A.shape, 'd')
        else:
            return A

def cvxopt_to_numpy_matrix(A):
    if A is None:
        return A
    if isinstance(A, spmatrix):
        return spmatrix_sparse_to_scipy(A)
    elif isinstance(A, matrix):
        return np.array(A).squeeze()
    else:
        return np.array(A).squeeze()
        

def lsqlin(C, d, reg=0, A=None, b=None, Aeq=None, beq=None, \
        lb=None, ub=None, x0=None, opts=None):
    '''
        Solve linear constrained l2-regularized least squares. Can 
        handle both dense and sparse matrices. Matlab's lsqlin 
        equivalent. It is actually wrapper around CVXOPT QP solver.

            min_x ||C*x  - d||^2_2 + reg * ||x||^2_2
            s.t.  A * x <= b
                  Aeq * x = beq
                  lb <= x <= ub

        Input arguments:
            C   is m x n dense or sparse matrix
            d   is n x 1 dense matrix
            reg is regularization parameter
            A   is p x n dense or sparse matrix
            b   is p x 1 dense matrix
            Aeq is q x n dense or sparse matrix
            beq is q x 1 dense matrix
            lb  is n x 1 matrix or scalar
            ub  is n x 1 matrix or scalar

        Output arguments:
            Return dictionary, the output of CVXOPT QP.

        Dont pass matlab-like empty lists to avoid setting parameters,
        just use None:
            lsqlin(C, d, 0.05, None, None, Aeq, beq) #Correct
            lsqlin(C, d, 0.05, [], [], Aeq, beq) #Wrong!
    '''
    sparse_case = False
    if sparse.issparse(A): #detects both np and cxopt sparse
        sparse_case = True
        #We need A to be scipy sparse, as I couldn't find how 
        #CVXOPT spmatrix can be vstacked
        if isinstance(A, spmatrix):
            A = spmatrix_sparse_to_scipy(A)
            
    C =   numpy_to_cvxopt_matrix(C)
    d =   numpy_to_cvxopt_matrix(d)
    Q = C.T * C
    q = - d.T * C
    nvars = C.size[1]

    if reg > 0:
        if sparse_case:
            I = scipy_sparse_to_spmatrix(sparse.eye(nvars, nvars,\
                                          format='coo'))
        else:
            I = matrix(np.eye(nvars), (nvars, nvars), 'd')
        Q = Q + reg * I

    lb = cvxopt_to_numpy_matrix(lb)
    ub = cvxopt_to_numpy_matrix(ub)
    b  = cvxopt_to_numpy_matrix(b)
    
    if lb is not None:  #Modify 'A' and 'b' to add lb inequalities 
        if lb.size == 1:
            lb = np.repeat(lb, nvars)
    
        if sparse_case:
            lb_A = -sparse.eye(nvars, nvars, format='coo')
            A = sparse_None_vstack(A, lb_A)
        else:
            lb_A = -np.eye(nvars)
            A = numpy_None_vstack(A, lb_A)
        b = numpy_None_concatenate(b, -lb)
    if ub is not None:  #Modify 'A' and 'b' to add ub inequalities
        if ub.size == 1:
            ub = np.repeat(ub, nvars)
        if sparse_case:
            ub_A = sparse.eye(nvars, nvars, format='coo')
            A = sparse_None_vstack(A, ub_A)
        else:
            ub_A = np.eye(nvars)
            A = numpy_None_vstack(A, ub_A)
        b = numpy_None_concatenate(b, ub)

    #Convert data to CVXOPT format
    A =   numpy_to_cvxopt_matrix(A)
    Aeq = numpy_to_cvxopt_matrix(Aeq)
    b =   numpy_to_cvxopt_matrix(b)
    beq = numpy_to_cvxopt_matrix(beq)

    #Set up options
    if opts is not None:
        for k, v in opts.items():
            solvers.options[k] = v
    
    #Run CVXOPT.SQP solver
    sol = solvers.qp(Q, q.T, A, b, Aeq, beq, None, x0)
    return sol

def lsqnonneg(C, d, opts):
    '''
    Solves nonnegative linear least-squares problem:
    
    min_x ||C*x - d||_2^2,  where x >= 0
    '''
    return lsqlin(C, d, reg = 0, A = None, b = None, Aeq = None, \
                 beq = None, lb = 0, ub = None, x0 = None, opts = opts)


if __name__ == '__main__':
    # simple Testing routines
    C = np.array(np.mat('''0.9501,0.7620,0.6153,0.4057;
    0.2311,0.4564,0.7919,0.9354;
    0.6068,0.0185,0.9218,0.9169;
    0.4859,0.8214,0.7382,0.4102;
    0.8912,0.4447,0.1762,0.8936'''))
    sC = sparse.coo_matrix(C)
    csC = scipy_sparse_to_spmatrix(sC)

    A = np.array(np.mat('''0.2027,0.2721,0.7467,0.4659;
    0.1987,0.1988,0.4450,0.4186;
    0.6037,0.0152,0.9318,0.8462'''))
    sA = sparse.coo_matrix(A)
    csA = scipy_sparse_to_spmatrix(sA)

    d = np.array([0.0578, 0.3528, 0.8131, 0.0098, 0.1388])
    md = matrix(d)

    b =  np.array([0.5251, 0.2026, 0.6721])
    mb = matrix(b)

    lb = np.array([-0.1] * 4)
    mlb = matrix(lb)
    mmlb = -0.1

    ub = np.array([2] * 4)
    mub = matrix(ub)
    mmub = 2

    #solvers.options[show_progress'] = False
    opts = {'show_progress': False}

    for iC in [C, sC, csC]:
        for iA in [A, sA, csA]:
            for iD in [d, md]:
                for ilb in [lb, mlb, mmlb]:
                    for iub in [ub, mub, mmub]:
                        for ib in [b, mb]:
                            ret = lsqlin(iC, iD, 0, iA, ib, None, None, ilb, iub, None, opts)
                            print ret['x'].T
    print 'Should be [-1.00e-01 -1.00e-01  2.15e-01  3.50e-01]'
    
    #test lsqnonneg
    C = np.array([[0.0372, 0.2869], [0.6861, 0.7071], [0.6233, 0.6245], [0.6344, 0.6170]]);
    d = np.array([0.8587, 0.1781, 0.0747, 0.8405]);
    ret = lsqnonneg(C, d, {'show_progress': False})
    print ret['x'].T
    print 'Should be [2.5e-07; 6.93e-01]'


10/11/2013

Pairwise distances in Matlab and Octave

It is always rewarding, to spend some time vectorizing computations in matlab: code runs faster, you are happy to see how all these ugly for-cycles roll up and turn into elegant linear algebra formulas, sometimes you even can exploit hidden logic in your computations(see covariance matrix estimation).

Of course you shouldn't be too fanatic about it: code can come less readable, memory consumption can grow unproportionally: see ndgrid example below, or recall some repmat constructions. I would like to have octave and python -style broadcasting implemented in matlab, but they claim, that bsxfun, arrayfun, and JITA loops implementation are fast enough. I strongly disagree.

Anyway. Suppose you want to compute table of pairwise distances for two set of vectors in $d$-dimensional space: $$(P)_{ij} = \|x_i - y_j\|^2_2, \;\; i=1,\dots,n_1,\; j = 1,\dots,n_2, \text{ and } P\in\mathbb{R}^{n_1\times n_2}.$$ We store $x_i$ and $y_i$ as columns in matrices $X$ and $Y$ respectively: $$X = \left[ x_1, \dots, x_{n_1}\right] \in \mathbb{R}^{d\times n_1},$$ $$Y = \left[ y_1, \dots, y_{n_2}\right] \in \mathbb{R}^{d\times n_2}.$$ We can expand $\ell_2$-norm: $$(P)_{ij} = \|x_i - y_j\|^2_2 = x_i^\top x_i + y_j^\top y_j - 2 x_i^\top y_j,$$ now expression for matrix $P$ can be written using matrix multiplications: $$P = a\mathbb{1}_{(n_2\times 1)}^\top + \mathbb{1}_{(n_1\times 1)} b - 2 X^\top Y,$$ where $a$ and $b$ are vectors containing $\ell_2$ norms for every vector $x_i$ and $y_j$: $a_i = \|x_i\|_2^2, \;\; a\in\mathbb{R}^{n_1\times 1}$, and $b_j = \|y_j\|_2^2, \;\; b\in\mathbb{R}^{n_2\times 1}$.
$\mathbb{1}_{(m\times 1)} = \left[1, \dots, 1\right]^\top \, \in \mathbb{R}^{m\times 1}.$
Finally, this formula can be written in vectorized matlab expression:

a = sum(X.^2, 1);
b = sum(Y.^2, 1);
P = a' * ones(1, N2)  + ones(N1, 1) * b - 2 * (X' * Y); 

This is quite fast and elegant implementation, that has quite the same(can be twice faster, can be twice slower) running time, as matlabs built-in pdist2. And it is much faster and more memory efficient, than ndgrid-based(see below) pairwise distance computation. Take a look at code, comparing 3 different implementations for this problem. Notation is consistent with formulas above.

%generate data
d = 100; % dimensionality
N1 = 500; % number of vectors
N2 = 400; % number of vectors
X = rand(d, N1);
Y = rand(d, N2);

%built-in
tic();
pd = pdist2(X', Y');
fprintf('"built-in":\t%.5f seconds\n', toc());

%ndgrid
tic();
[gx, gy] = ndgrid(1 : N1, 1 : N2);
P_grid = reshape(sqrt(sum( (X(:, gx) - Y(:, gy)).^2, 1)), [N1, N2]);
t = toc();
fprintf('"ndgrid":\t%.5f seconds. Max err: %e\n', t, max(max(abs(P_grid - pd))));

%matrix
tic();
a = sum(X.^2, 1);
b = sum(Y.^2, 1);
%I use abs to get rid of small(1e-15) negative diagoanl values, 
%produced by computational errors
P =  abs(a' * ones(1, N2)  + ones(N1, 1) * b - 2 * (X' * Y)); 
P = sqrt(P); %to be consistent with pdist2
t = toc();
fprintf('"matrix":\t%.5f seconds. Max err: %e\n', t, max(max(abs(P - pd))));
Output on my machine:
"built-in": 0.02721 seconds
"ndgrid": 0.67575 seconds. Max err: 0.000000e+00
"matrix": 0.01662 seconds. Max err: 1.243450e-14