cluster

Canopy clustering

class pynetcor.cluster.Canopy(t1: float = 0.2, t2: float = 0.6, max_merge_distance: float = 0.2, stop_criteria: int = 50000, distance_measure: str = 'pearson', random_seed: int = None, threads: int = 8)

Canopy clustering.

Parameters:
  • t2 (float, optional, default 0.2) – Max tight distance (correlation difference). If the distance from a point to the canopy center is less than t2, the point is close enough to the center and is removed from the dataset.

  • t1 (float, optional, default 0.6) – Max loose distance (correlation difference). If the distance from a point to the canopy is less than t1, the point is considered to be in the canopy.

  • max_merge_distance (float, optional, default 0.2) – Maximum distance (correlation difference) between two canopy centers for the canopies to be merged. The final canopy profiles are calculated after the merge step, so some final canopies may have profiles that are closer than max_merge_distance.

  • stop_criteria (int, optional, default 50000) – Clustering terminates after this many seeds have been processed. Set to 0 to disable this stopping criterion.

  • distance_measure ({'pearson', 'spearman'}, default 'pearson') – Distance measure used for clustering.

  • random_seed (int, optional, default None) – Random seed used to shuffle the data before clustering.

  • threads (int, optional, default 8) – The number of threads to use.
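To make the roles of the two thresholds concrete, here is a minimal pure-Python sketch of a single canopy pass. It is not pynetcor's implementation: it assumes "correlation difference" means 1 minus the Pearson correlation, it follows the parameter descriptions above (t1 as the loose threshold, t2 as the tight one), it uses the seed point itself as the canopy center, and it leaves out the merge step, stop_criteria, and threading. The names pearson_distance and canopy_pass are hypothetical.

```python
import math
import random


def pearson_distance(a, b):
    """Return 1 - Pearson r (an assumed reading of "correlation difference")."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 1.0  # assumption: a constant profile is treated as uncorrelated
    return 1.0 - cov / math.sqrt(var_a * var_b)


def canopy_pass(points, t1, t2, random_seed=None):
    """Simplified canopy pass; returns a list of (center_index, member_indices)."""
    remaining = list(range(len(points)))
    random.Random(random_seed).shuffle(remaining)  # shuffle before clustering
    canopies = []
    while remaining:
        center = remaining[0]  # next seed point
        members, kept = [], []
        for i in remaining:
            d = pearson_distance(points[center], points[i])
            if i == center or d < t1:
                members.append(i)  # within the loose distance: joins the canopy
            if i != center and d >= t2:
                kept.append(i)  # outside the tight distance: stays in the dataset
        canopies.append((center, members))
        remaining = kept
    return canopies
```

A point whose distance to the center lies between t2 and t1 is added to the canopy but stays in the dataset, so it can also join later canopies. This is why a point may end up with more than one label.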

property best_labels_

Best label of each point. Each point may belong to several canopies; the best label is the canopy whose center is closest to the point. ndarray.
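The selection rule can be sketched as follows. This is only an illustration under stated assumptions: best_labels is a hypothetical helper, its inputs (per-point candidate canopies and a point-to-canopy distance table) are made up for the example, and the -1 marker for a point in no canopy is an assumption rather than documented behavior.

```python
def best_labels(soft_labels, distances):
    """Pick, for each point, the candidate canopy with the smallest distance.

    soft_labels[i] lists the canopies point i belongs to; distances[i][k] is
    the distance from point i to the center of canopy k (hypothetical inputs).
    """
    best = []
    for candidates, row in zip(soft_labels, distances):
        if not candidates:
            best.append(-1)  # assumption: -1 marks a point in no canopy
        else:
            best.append(min(candidates, key=lambda k: row[k]))
    return best
```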

property cluster_centers_

The cluster centers. ndarray.

fit(x)

Compute canopy clustering.

Parameters:

x (array_like) – A 2-D array. Training data to cluster.

fit_predict(x)

Compute the cluster centers and predict the cluster assignment for each sample.

Parameters:

x (array_like) – A 2-D array. Training data to cluster.

Returns:

Cluster assignment for each sample.

Return type:

ndarray

property labels_

Labels of each point. Each point may belong to several canopies (soft clustering). List[List[int]]
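Because the labels are stored per point, it can be handy to invert them into a per-canopy membership table. The sketch below assumes only the List[List[int]] shape described above; members_by_canopy is a hypothetical helper and the sample labels are invented for illustration.

```python
from collections import defaultdict


def members_by_canopy(labels):
    """Invert per-point canopy lists into {canopy_id: [point indices]}."""
    groups = defaultdict(list)
    for point_index, canopy_ids in enumerate(labels):
        for canopy_id in canopy_ids:
            groups[canopy_id].append(point_index)
    return dict(groups)


# Invented example: point 1 belongs to two canopies, point 3 to none.
example_labels = [[0], [0, 1], [1], []]
example_groups = members_by_canopy(example_labels)
```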

predict(x)

Predict the cluster that each sample in x is most likely to belong to.

Parameters:

x (array_like) – A 2-D array. New data to predict.

Returns:

Cluster assignment for each sample.

Return type:

ndarray