BiFuncLib's documentation

BiFuncLib is a Python package that aggregates multiple biclustering methods mainly for functional data.

This Page

FunCC

This page references the official documentation of FunCC.

Method Description

FunCC is a non-parametric, greedy bi-clustering algorithm for matrices whose entries are continuous curves. It extends the Cheng & Church strategy to the functional domain and optionally co-estimates curve alignment (phase registration). The key steps are summarized below.

  1. Functional Data Representation Discrete observations are smoothed into continuous curves \(f_{ij}(t)\), forming an \(n \times m\) functional matrix.

  2. Ideal Bi-Cluster Model & H-score

    Within a bi-cluster \(B(I,J)\) each curve is modeled as

    \[f_{ij}(t) = \mu(t) + \alpha_i(t) + \beta_j(t)\]

    where \(\mu\) is the cluster mean curve and \(\alpha_i,\beta_j\) capture row- and column-specific deviations. The H-score quantifies within-cluster dispersion via the average squared \(L^2\) distance to the fitted template.

  3. Greedy Iterative Search (Functional Cheng & Church)

    • Multiple-node deletion: remove rows/columns whose contribution exceeds \(\theta \cdot H\).

    • Single-node deletion: iteratively drop the worst row/column until \(H < \delta\).

    • Node addition: re-introduce previously removed rows/columns that do not increase \(H\).

    After a cluster is finalized, assigned elements are masked; Bimax locates the largest remaining sub-matrix and the process repeats.

  4. Optional Curve Alignment

    A shift warping \(w_{ij}(t)=t+q_{ij}\) is estimated per curve to minimize the squared \(L^2\) distance to the current template. Alignment and template updates iterate until convergence.

  5. Parameter Tuning

    • \(\delta\) balances cluster quality vs. quantity.

    • \(\theta\) controls the aggressiveness of multiple-node deletion.

  6. Visualization & Interpretation

    FunCC outputs non-exhaustive, non-overlapping bi-clusters described by \(\mu(t)\) and row/column effects, revealing interpretable spatio-temporal patterns.

Function

This method provides four core functions: cc_sim_data, cc_bifunc, cc_bifunc_cv and FDPlot.cc_fdplot. In this section, we detail their respective usage, aswell as parameters, output values and usage examples for each function. Because the parameters of functions cc_bifunc and cc_bifunc_cv are similar while their outputs differ, we will explain the two functions together.

cc_sim_data

cc_sim_data loads simulated data according to the FunCC model.

cc_sim_data()

Parameter

The simulated data are loaded internally and have no adjustable parameters.

Value

An object of class array of dimension 30 x 7 x 240.

Example

from BiFuncLib.simulation_data import cc_sim_data
fun_mat = cc_sim_data()

cc_bifunc & cc_bifunc_cv

cc_bifunc performs model fitting, while cc_bifunc_cv selects the best delta for the algorithm.

cc_bifunc(data, delta, theta = 1, template_type = 'mean', number = 100,
          alpha = 0, beta = 0, const_alpha = False, const_beta = False,
          shift_alignment = False, shift_max = 0.1, max_iter_align = 100)

and

cc_bifunc_cv(data, delta_list, theta = 1.5, template_type = 'mean', number = 100,
             alpha = 0, beta = 0, const_alpha = False, const_beta = False,
             shift_alignment = False, shift_max = 0.1, max_iter_align = 100, plot = True)

Parameter

Parameter

Description

data

array, the data array (n x m x T) where each entry corresponds to the measure of one observation i, i=1,…,n, for a functional variable m, m=1,…,p, at point t, t=1,…,T.

delta (no cross validation)

numeric, maximum of accepted score, should be a real value > 0.

delta_list (with cross validation)

list, a list of delta to be selected.

theta

numeric, scaling factor should be a real value > 1.

template_type

str, type of template required. If template_type=’mean’ the template is evaluated as the average function, if template_type=’medoid’ the template is evaluated as the medoid function. Default is ‘mean’.

number

integer, maximum number of iteration. Default is 100.

alpha

integer 1 or 0, if alpha=1 row shift is allowed, if alpha=0 row shift is avoided. Default is 0.

beta

integer 1 or 0, if beta=1 column shift is allowed, if beta=0 column shift is avoided. Default is 0.

const_alpha

bool, if True, row shift is contrained as constant. Default is False.

const_beta

bool, if True, column shift is contrained as constant. Default is False.

shift_alignment

bool, if True, the shift aligment is performed, if False no alignment is performed. Default is False.

shift_max

numeric between 0 and 1, controls the maximal allowed shift at each iteration, in the alignment procedure with respect to the range of curve domains.

max_iter_align

integer, maximum number of iteration in the alignment procedure.

plot

bool, whether to output graphs showing how each model metric changes with iterations. Default is True.

Value

The function cc_bifunc outputs a dict including clustering results and information of the model.

  • Number: integer, the number of clustering groups.

  • RowxNumber: array of bool, a matrix contains row clustering results.

  • Numberxcol: array of bool, a matrix contains column clustering results.

  • Parameter: dict, a dict containing the parameters setting of the algorithm.

The function cc_bifunc_cv outputs a dataframe including each model metric changes with different delta. Users can select best parameter through the function. If plot=True, then the following graphs will be displayed:

  • Delta v.s. H score

../_images/cc_htot.png
  • Delta v.s. number of not assigned

../_images/cc_notassigned.png
  • Delta v.s. number of cluster

../_images/cc_numclus.png

Example

import numpy as np
from BiFuncLib.simulation_data import cc_sim_data
from BiFuncLib.cc_bifunc import cc_bifunc, cc_bifunc_cv
delta_list = np.linspace(0.1, 20, num = 21)
fun_mat = cc_sim_data()
# Find best delta
cc_result_cv = cc_bifunc_cv(fun_mat, delta_list = delta_list, alpha = 1, beta = 0, const_alpha = True, plot = True)
# Without shift_alignment
cc_result_1 = cc_bifunc(fun_mat, delta = 10, alpha = 1, beta = 0, const_alpha = True, shift_alignment = False)
# With shift_alignment
cc_result_2 = cc_bifunc(fun_mat, delta = 10, alpha = 1, beta = 0, const_alpha = True, shift_alignment = True)

FDPlot.cc_fdplot

FDPlot.cc_fdplot displays the clustered function curves and provides options for mean subtraction, alignment, and warping.

FDPlot(result).cc_fdplot(data, only_mean = False, aligned = False, warping = False)

Parameter

Parameter

Description

result

dict, a clustering result generated by cc_bifunc function.

data

array, same as in cc_bifunc.

only_mean

bool, if True, only the template functions for each bi-cluster is displayed. Default is False.

aligned

bool, if True, the aligned functions are displayed. Default is False.

warping

bool, if True, a figure representing the warping functions are displayed. Default is False.

Value

Here we illustrate the outputs of the plot function in different settings.

  • cluster results

figa

figb

figc

  • aligned function

../_images/cc_aligned.png
  • warping function

../_images/cc_warping.png

Example

import numpy as np
from BiFuncLib.FDPlot import FDPlot
from BiFuncLib.simulation_data import cc_sim_data
from BiFuncLib.cc_bifunc import cc_bifunc, cc_bifunc_cv
delta_list = np.linspace(0.1, 20, num = 21)
fun_mat = cc_sim_data()
# Find best delta
cc_result_cv = cc_bifunc_cv(fun_mat, delta_list = delta_list, alpha = 1, beta = 0, const_alpha = True, plot = True)
# Without shift_alignment
cc_result_1 = cc_bifunc(fun_mat, delta = 10, alpha = 1, beta = 0, const_alpha = True, shift_alignment = False)
FDPlot(cc_result_1).cc_fdplot(fun_mat, only_mean = True, aligned = False, warping = False)
# With shift_alignment
cc_result_2 = cc_bifunc(fun_mat, delta = 10, alpha = 1, beta = 0, const_alpha = True, shift_alignment = True)
FDPlot(cc_result_2).cc_fdplot(fun_mat, only_mean = False, aligned = True, warping = True)

Previous: FunLBM | Next: FunPF