BiFuncLib's documentation

BiFuncLib is a Python package that aggregates multiple biclustering methods mainly for functional data.

This Page

FunSparse

Method Description

Sparse Functional K-means Clustering is an advanced clustering technique designed for functional data, such as time series or curves. It extends traditional K-means clustering by incorporating sparsity to select the most relevant features of the data for clustering, enhancing interpretability and accuracy.

  • Functional Data Representation

Each observed curve is represented in a continuous form, ensuring the data are suitable for clustering.

  • Sparsity Constraint

A sparsity constraint is introduced to select the most relevant parts of the domain, controlled by a parameter m that specifies the measure of the domain where the weighting function is zero.

  • Optimization Problem

The clustering problem is formulated as a variational problem, maximizing the weighted between-cluster sum of squares (BCSS) subject to the sparsity constraint.

  • Iterative Algorithm

An iterative algorithm is used to solve the optimization problem. The algorithm alternates between:

  1. Weighting Function Update: Given the current clustering, the optimal weighting function is computed using the solution to the variational problem. This step identifies the most relevant parts of the domain for clustering.

  2. Clustering Update: Given the weighting function, the optimal clustering is found by applying a functional K-means clustering algorithm, where the distance between functions is weighted according to the weighting function.

  • Parameter Tuning

The sparsity parameter m is tuned using a permutation-based GAP statistics approach to determine the optimal level of sparsity.

  • Visualization and Interpretation

The results are visualized through estimated cluster mean functions and the weighting function, highlighting the most discriminative parts of the domain.

Function

This method provides three core functions: sparse_sim_data, sparse_bifunc and FDPlot.sparse_fdplot. In this section, we detail their respective usage, as well as parameters, output values and usage examples for each function.

sparse_sim_data

sparse_sim_data generates simulated data according to the FunSparse model, and a true clustering result.

sparse_sim_data(n, x, paramC, plot = False)

Parameter

Parameter

Description

n

integer, the number of observations or curves to generate for each cluster.

x

array, the domain over which the functional data is defined. This is typically a set of points (e.g., time points or spatial coordinates) where the functional data is observed.

paramC

numeric, a parameter that controls the overlap between the two clusters. It specifies the proportion of the domain where the mean functions of the two clusters are identical.

plot

bool, whether to plot the generated data. Default is False.

Value

The function sparse_sim_data outputs a matrix contains simulated data, and a sequence contains true clustering result.

  • data: array, a simulated data generated by FunSparse model.

  • cluster: array, true clustering results.

If plot=True, a visualization of simulated data will be displayed.

../_images/sparse_data.png

Example

import numpy as np
from BiFuncLib.simulation_data import sparse_sim_data
paramC = 0.7
n = 100
x = np.linspace(0, 1, 1000)
sparse_simdata = sparse_sim_data(n, x, paramC)['data']

sparse_bifunc

sparse_bifunc performs model fitting.

sparse_bifunc(data, x, K, method = 'kmea', true_clus = None)

Parameter

Parameter

Description

data

array, the functional data to be clustered. This is typically a 2D array where each column represents a functional observation.

x

array, the domain over which the functional data is defined. This is typically a set of points (e.g., time points or spatial coordinates) where the functional data is observed.

k

integer, the number of clusters to form.

method

str, the clustering method to use, ‘hier’ or ‘kmea’. Default is ‘kmea’.

true_clus

numeric or None, The true cluster labels. If known, the Classification Error Rate (CER) will be calculated to evaluate the clustering performance.

Value

The function sparse_bifunc outputs a dict including clustering results and the value of CER (if true_clus=True).

  • result: dict, clustering results for FunSparse model, including:
    1. cluster: array, indicates the cluster assignment for each data point.

    2. iteration: integer, the number of iterations the algorithm has run.

    3. obj: numeric, the value of the objective function.

    4. w: array, weighting function used in sparse clustering.

  • CER: numeric, the Classification Error Rate (CER).

Example

import numpy as np
from BiFuncLib.simulation_data import sparse_sim_data
from BiFuncLib.sparse_bifunc import sparse_bifunc
K = 2
paramC = 0.7
n = 100
x = np.linspace(0, 1, 1000)
sparse_simdata = sparse_sim_data(n, x, paramC)['data']
part_vera = sparse_sim_data(n, x, paramC)['cluster']
sparse_res = sparse_bifunc(sparse_simdata, x, K, true_clus = part_vera)

FDPlot.sparse_fdplot

FDPlot.sparse_fdplot visualizes the result generated by pf_bifunc function.

FDPlot(result).sparse_fdplot(x, data)

Parameter

Parameter

Description

result

dict, a clustering result generated by sparse_bifunc function.

x

array, the domain over which the functional data is defined. This is typically a set of points (e.g., time points or spatial coordinates) where the functional data is observed.

data

array, the functional data to be clustered. This is typically a 2D array where each column represents a functional observation.

Value

The function outputs two graphs.

The first image represents the outcomes of applying sparse functional K-means clustering to functional datasets, with each curve corresponding to a data point and different colors indicating distinct clusters.

../_images/sparse_cluster.png

The second image illustrates the weighting function resulting from the sparse clustering algorithm, highlighting the critical portions of the data domain that are essential for cluster differentiation.

../_images/sparse_weighting.png

Example

import numpy as np
from BiFuncLib.simulation_data import sparse_sim_data
from BiFuncLib.sparse_bifunc import sparse_bifunc
from BiFuncLib.FDPlot import FDPlot
K = 2
paramC = 0.7
n = 100
x = np.linspace(0, 1, 1000)
sparse_simdata = sparse_sim_data(n, x, paramC)['data']
part_vera = sparse_sim_data(n, x, paramC)['cluster']
sparse_res = sparse_bifunc(sparse_simdata, x, K, true_clus = part_vera)
FDPlot(sparse_res).sparse_fdplot(x, sparse_simdata)

Previous: FunPF | Next: FunSAS