Table of Contents

introduction
installation
Methods and Main Functions
- FunFEM
- FunLBM
- FunCC
- FunPF
- FunSparse
- FunSAS
- FunLocal
- Bimax
- SSVD
- CVX
References
Contribution

This Page

SSVD¶

Method Description¶

Sparse Data Representation

SSVD operates on high-dimensional data matrices where rows typically represent samples (e.g., subjects) and columns represent variables (e.g., genes). The data matrix is often centered to ensure that each column has zero mean. This preprocessing step helps in identifying interpretable patterns by removing the overall mean effect.

Model Definition

SSVD seeks a low-rank, checkerboard-structured matrix approximation to the data matrix \(X\). The checkerboard structure is achieved by imposing sparsity on both the left and right singular vectors, resulting in many zero entries. Specifically, the model aims to decompose \(X\) into a sum of rank-one matrices \(s_k u_k v_k^T\), where \(u_k\) and \(v_k\) are sparse singular vectors. This structure allows for simultaneous clustering of rows and columns, revealing interpretable row-column associations.

Iterative Algorithm for Computation

The SSVD algorithm alternates between optimizing the left and right singular vectors while imposing sparsity-inducing penalties. The steps are as follows:

Initialization: Apply standard SVD to the data matrix \(X\) to obtain the initial singular vectors.

Update :math:`v`: For fixed \(u\), update \(v\) using a soft-thresholding rule to impose sparsity:

\[\tilde{v}_j = \text{sign}((X^T u)_j) \left( |(X^T u)_j| - \frac{\lambda_v w_{2,j}}{2} \right)_+\]

Update :math:`u`: For fixed \(v\), update \(u\) similarly:

\[\tilde{u}_i = \text{sign}((X v)_i) \left( |(X v)_i| - \frac{\lambda_u w_{1,i}}{2} \right)_+\]

Normalization and Singular Value Update: Normalize \(u\) and \(v\) and update the singular value \(s\).

Convergence Check: Repeat the updates until convergence, typically within 5 to 10 iterations. The penalty parameters \(\lambda_u\) and \(\lambda_v\) are selected using the Bayesian Information Criterion (BIC) to balance sparsity and goodness-of-fit.

Penalty Parameter Selection

The degrees of sparsity of the singular vectors \(u\) and \(v\) are controlled by the penalty parameters \(\lambda_u\) and \(\lambda_v\). These parameters are selected using the BIC, which balances the model complexity (number of non-zero entries) with the fit to the data. The BIC is defined as:

\[\text{BIC}(\lambda) = \frac{\|Y - \hat{Y}\|^2}{nd \cdot \hat{\sigma}^2} + \frac{\log(nd)}{nd} \hat{df}(\lambda),\]

where \(\hat{df}(\lambda)\) is the degree of sparsity (number of non-zero entries) and \(\hat{\sigma}^2\) is the estimated error variance.

Post-processing for Interpretation

After obtaining the SSVD layers, the resulting sparse singular vectors can be used to identify biclusters. For example, in gene expression data, the non-zero entries in \(u\) and \(v\) indicate which samples and genes are associated within each bicluster. The biclusters can be visualized using image plots or scatter plots of the singular vectors, revealing distinct groupings and contrasts between different conditions or classes.

Function¶

This method provides three four functions: ssvd_sim_data for simulation, s4vd_biclus and ssvd_biclus for modeling, and jaccardmat for evaluation. For visualization, we use bcheatmap function which is the same as Bimax to show the clustering result. In this section, we detail their respective usage, as well as parameters, output values and usage examples for each function.

ssvd_sim_data¶

ssvd_sim_data generates a dict contains data matrix and true clustering results.

ssvd_sim_data()

Parameter¶

The simulated data generates data in a fixed way and have no adjustable parameters.

Value¶

data: array, data matrix generated by SSVD model.
res: BiclustResult object, true clustering results.

Example¶

from BiFuncLib.simulation_data import ssvd_sim_data
ssvd_simdata = ssvd_sim_data()

s4vd_biclus & ssvd_biclus¶

The ssvd_biclus function performs biclustering of the data matrix using sparse singular value decomposition, while s4vd_biclus builds upon this by incorporating nested stability selection for enhanced biclustering results.

ssvd_biclus(data, K=10, threu=1, threv=1, gamu=0, gamv=0, merr=1e-4, niter=100)

and

s4vd_biclus(data, steps=100, pcerv=0.1, pceru=0.1, ss_thr=(0.6, 0.65), size=0.5, gamm=0, iters=100, nbiclust=10, merr=1e-3, cols_nc=True,
            rows_nc=True, row_overlap=True, col_overlap=True, row_min=1, col_min=1, pointwise=True, start_iter=3, savepath=False)

Parameter¶

For ssvd_biclus function, the parameters are listed below:

Parameter	Description
data	array, the matrix to be clustered.
K	integer, number of SSVD-layers.
threu	integer 1 or 2, type of penalty (thresholding rule) for the left singular vector, 1 = (Adaptive) LASSO, 2 = hard thresholding. Default is 1.
threv	integer 1 or 2, type of penalty (thresholding rule) for the right singular vector, 1 = (Adaptive) LASSO, 2 = hard thresholding. Default is 1.
gamu	numeric, weight parameter in Adaptive LASSO for the left singular vector, nonnegative constant. Default is 0.
gamv	numeric, weight parameter in Adaptive LASSO for the right singular vector, nonnegative constant. Default is 0.
merr	numeric, threshold to decide convergence. Default is 1e-4.
niter	integer, maximum number of iterations. Default is 100.

For s4vd_biclus function, the parameters are listed below:

Parameter	Description
data	array, the matrix to be clustered.
steps	integer, number of subsamples used to perform the stability selection. Default is 100.
pcerv	numeric, per comparsion wise error rate to control the number of falsely selected right singular vector coefficients (columns/samples). Default is 0.1.
pceru	numeric, per comparsion wise error rate to control the number of falsely selected left singular vector coefficients (rows/genes). Default is 0.1.
ss_thr	tuple, range of the cutoff threshold (relative selection frequency) for the stability selection. Default is (0.6, 0.65).
size	numeric, size of the subsamples used to perform the stability selection. Default is 0.5.
gamm	numeric, weight parameter for the adaptive LASSO, nonnegative constant. Default is 0.
iters	integer, maximal number of iterations to fit a single bicluster. Default is 100.
nbiclust	integer, maximal number of biclusters. Default is 10.
merr	numeric, threshold to decide convergence. Default is 1e-3.
cols_nc	bool, allow for negative correlation of columns (samples) over rows (genes). Default is True.
rows_nc	bool, allow for negative correlation of rows (genes) over columns (samples). Default is True.
row_overlap	bool, allow rows to overlap between biclusters. Default is True.
col_overlap	bool, allow columns to overlap between biclusters. Default is True.
row_min	integer, minimal number of rows. Default is 1.
col_min	integer, minimal number of columns. Default is 1.
pointwise	bool, performs a fast pointwise stability selection instead of calculating the complete stability path. Default is False.
start_iter	integer, number of starting iterations in which the algorithm is not allowed to converge. Default is 3.
savepath	bool, saves the stability path in order plot the path with the stabpathplot function. Default is False.

Value¶

Both functions return a BiclustResult object, which is the same as Bimax.

Example¶

from BiFuncLib.simulation_data import ssvd_sim_data
from BiFuncLib.ssvd_biclus import s4vd_biclus, ssvd_biclus
ssvd_simdata = ssvd_sim_data()
data = ssvd_simdata['data']
res_sim = ssvd_simdata['res']
s4vd_res = s4vd_biclus(data, pcerv=0.5, pceru=0.5, pointwise=False, nbiclust=1)
s4vd_res_pw = s4vd_biclus(data, pcerv=0.5, pceru=0.5, pointwise=True, nbiclust=1)
res2_ssvd = ssvd_biclus(data,K=1)

Other functions¶

For visualization, we use bcheatmap function which can be seen at Bimax to show the clustering results. A heatmap will be displayed:

For evaluation, we use jaccardmat function. Its usage is shown below:

jaccardmat(res1, res2, mode=None)

res1 and res2 parameters are two BiclustResult objects, mode is used to control whether the calculation is performed row-wise (mode=’row’), column-wise (mode=’column’), or over the entire matrix (mode=’None’). This ffunction outputs a matrix contains jaccard indexes.

Example¶

from BiFuncLib.simulation_data import ssvd_sim_data
from BiFuncLib.ssvd_main_func import jaccardmat
from BiFuncLib.ssvd_biclus import s4vd_biclus, ssvd_biclus
from BiFuncLib.bcheatmap import bcheatmap
ssvd_simdata = ssvd_sim_data()
data = ssvd_simdata['data']
res_sim = ssvd_simdata['res']
s4vd_res = s4vd_biclus(data, pcerv=0.5, pceru=0.5, pointwise=False, nbiclust=1)
print(jaccardmat(res_sim, s4vd_res, 'row'))
print(jaccardmat(res_sim, s4vd_res, 'column'))
bcheatmap(data, s4vd_res)
s4vd_res_pw = s4vd_biclus(data, pcerv=0.5, pceru=0.5, pointwise=True, nbiclust=1)
print(jaccardmat(res_sim, s4vd_res_pw))
bcheatmap(data, s4vd_res_pw)
res2_ssvd = ssvd_biclus(data,K=1)
print(jaccardmat(res_sim, res2_ssvd))
bcheatmap(data, res2_ssvd)

Previous: Bimax | Next: CVX

On this page

SSVD
- Method Description
- Function
  - ssvd_sim_data
    - Parameter
    - Value
    - Example
  - s4vd_biclus & ssvd_biclus
    - Parameter
    - Value
    - Example
  - Other functions
    - Example

BiFuncLib's documentation

This Page

Quick search

SSVD¶

Method Description¶

Function¶

ssvd_sim_data¶

Parameter¶

Value¶

Example¶

s4vd_biclus & ssvd_biclus¶

Parameter¶

Value¶

Example¶

Other functions¶

Example¶