cor
Full correlation matrix functions
- pynetcor.cor.corrcoef(x, y=None, method: str = 'pearson', nan_action: str = 'auto', threads: int = 1) → ndarray
Calculate the correlation coefficient between each row of two arrays.
- Parameters:
x (array_like) – A 1-D or 2-D array.
y (array_like, optional) – A 1-D or 2-D array with the same number of columns as x. If not provided, correlations are calculated between the rows of x and themselves.
method ({'pearson', 'spearman', 'kendall'}, default 'pearson') – The method used to calculate the correlation coefficient.
nan_action ({'auto', 'ignore', 'fillMean', 'fillMedian'}, default 'auto') –
The action to take when encountering NaN values. 'ignore' is recommended for Pearson and Spearman, and 'fillMedian' is recommended for Kendall; 'ignore' cannot be used with Kendall.
'ignore': the calculation skips pairs of elements in which either value is NaN.
'fillMean': fills the NaN values in each row with the mean of that row's non-NaN values.
'fillMedian': fills the NaN values in each row with the median of that row's non-NaN values.
threads (int, default 1) – The number of threads to use.
- Returns:
A 2D array representing the matrix of correlation coefficients.
- Return type:
ndarray
Examples
>>> x = [1, 2, 3]
>>> y = [4, 5, 6]
>>> corrcoef(x, y)
array([[1. , 1. ],
       [1. , 1. ]])
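To make the 'ignore' behaviour concrete, here is a minimal standard-library sketch of a row-by-row Pearson correlation that skips pairs containing NaN. It is an illustration under stated assumptions, not pynetcor's implementation: the helper names pearson_ignore_nan and rowwise_corr are hypothetical, and returning NaN for rows with fewer than two usable pairs or zero variance is a choice made here.

```python
import math

def pearson_ignore_nan(a, b):
    """Pearson r between two equal-length sequences, skipping pairs with NaN."""
    pairs = [(u, v) for u, v in zip(a, b)
             if not (math.isnan(u) or math.isnan(v))]
    n = len(pairs)
    if n < 2:
        return math.nan  # assumption: too few pairs yields NaN
    mean_u = sum(u for u, _ in pairs) / n
    mean_v = sum(v for _, v in pairs) / n
    cov = sum((u - mean_u) * (v - mean_v) for u, v in pairs)
    var_u = sum((u - mean_u) ** 2 for u, _ in pairs)
    var_v = sum((v - mean_v) ** 2 for _, v in pairs)
    if var_u == 0 or var_v == 0:
        return math.nan  # assumption: constant rows yield NaN
    return cov / math.sqrt(var_u * var_v)

def rowwise_corr(x, y):
    """Matrix whose entry [i][j] is the correlation of row x[i] with row y[j]."""
    return [[pearson_ignore_nan(rx, ry) for ry in y] for rx in x]

x = [[1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0]]
y = [[2.0, 4.0, math.nan, 8.0]]  # the third pair is dropped for every row of x
result = rowwise_corr(x, y)      # 2 rows (from x) by 1 column (from y)
```

The result has one row per row of x and one column per row of y, which is the layout the description above implies for a full correlation matrix.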
- pynetcor.cor.cortest(x, y=None, method: str = 'pearson', na_action='auto', approx_pvalue: bool = True, adjust_pvalue: bool = False, adjust_method: str = 'BH', approx_adjust_pvalue: bool = False, threads: int = 1) → ndarray
Test for correlation between each row of two arrays, using the Pearson, Spearman, or Kendall method.
- Parameters:
x (array_like) – A 1-D or 2-D array.
y (array_like, optional) – A 1-D or 2-D array with the same number of columns as x. If not provided, correlations are calculated between the rows of x and themselves.
method ({'pearson', 'spearman', 'kendall'}, default 'pearson') – The method used to calculate the correlation coefficient.
na_action ({'auto', 'ignore', 'fillMean', 'fillMedian'}, default 'auto') –
The action to take when encountering NaN values. 'ignore' is recommended for Pearson and Spearman, and 'fillMedian' is recommended for Kendall; 'ignore' cannot be used with Kendall.
'ignore': the calculation skips pairs of elements in which either value is NaN.
'fillMean': fills the NaN values in each row with the mean of that row's non-NaN values.
'fillMedian': fills the NaN values in each row with the median of that row's non-NaN values.
approx_pvalue (bool, default True) – Whether to use the approximation method of p-value calculation.
adjust_pvalue (bool, default False) – Whether to adjust p-values for multiple hypothesis testing.
adjust_method ({'holm', 'hochberg', 'bonferroni', 'BH', 'BY'}, default 'BH') – The method used to adjust p-value for multiple hypothesis testing.
approx_adjust_pvalue (bool, default False) – Whether to use the approximation method of p-value adjustment.
threads (int, default 1) – The number of threads to use.
- Returns:
A 2D array with 4 columns: [index1, index2, r, p], or 5 columns if adjust_pvalue is True: [index1, index2, r, p, p_adjusted]
- Return type:
ndarray
Examples
>>> x = [1, 2, 3]
>>> y = [4, 5, 6]
>>> cortest(x, y)
array([[0. , 0. , 1. , 2.2e-16. ],
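The 'BH' option of adjust_method refers to the Benjamini-Hochberg procedure. As a rough sketch of what that adjustment computes on a list of p-values (the function name adjust_bh is hypothetical, and this is the textbook exact procedure, not necessarily what pynetcor does internally or when approximation is enabled):

```python
def adjust_bh(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted

p = [0.01, 0.04, 0.03, 0.005]
p_adjusted = adjust_bh(p)  # approximately [0.02, 0.04, 0.04, 0.02]
```

Each adjusted value plays the role of the p_adjusted column described in the return value above.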
Chunked correlation matrix functions
- pynetcor.cor.chunked_corrcoef(x, y=None, method: str = 'pearson', nan_action: str = 'auto', chunk_size: int = 1024, threads: int = 1) → CorrcoefIterator
Iterate over the correlations between each row of two arrays, computed in chunks.
- Parameters:
x (array_like) – A 1-D or 2-D array.
y (array_like, optional) – A 1-D or 2-D array with the same number of columns as x. If not provided, correlations are calculated between the rows of x and themselves.
method ({'pearson', 'spearman', 'kendall'}, default 'pearson') – The method used to calculate the correlation coefficient.
nan_action ({'auto', 'ignore', 'fillMean', 'fillMedian'}, default 'auto') –
The action to take when encountering NaN values. 'ignore' is recommended for Pearson and Spearman, and 'fillMedian' is recommended for Kendall; 'ignore' cannot be used with Kendall.
'ignore': the calculation skips pairs of elements in which either value is NaN.
'fillMean': fills the NaN values in each row with the mean of that row's non-NaN values.
'fillMedian': fills the NaN values in each row with the median of that row's non-NaN values.
chunk_size (int, default 1024) – The number of rows of the correlation matrix to calculate per chunk.
threads (int, default 1) – The number of threads to use.
- Returns:
An iterator over the correlation matrix. It is evaluated lazily, computing the matrix chunk by chunk.
- Return type:
CorrcoefIterator
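The lazy, chunk-by-chunk evaluation described above can be pictured with a plain Python generator. This is only a sketch of the idea, assuming each chunk is a block of consecutive rows of the correlation matrix; the names pearson and iter_corr_chunks are hypothetical, and the actual CorrcoefIterator may yield data in a different form.

```python
import math

def pearson(a, b):
    """Pearson r between two equal-length, non-constant sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)

def iter_corr_chunks(x, y, chunk_size):
    """Lazily yield (start_row, block), where block holds up to chunk_size rows."""
    for start in range(0, len(x), chunk_size):
        rows = x[start:start + chunk_size]
        # Nothing is computed for a chunk until the consumer asks for it.
        yield start, [[pearson(rx, ry) for ry in y] for rx in rows]

x = [[1, 2, 3, 5], [2, 1, 4, 3], [5, 3, 2, 1]]
# Materialised here only to inspect the chunks; normally you would loop over them.
chunks = list(iter_corr_chunks(x, x, chunk_size=2))
```

With three rows and chunk_size=2, the generator yields one block of two rows followed by one block of a single row, so only one block needs to be held in memory at a time.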
- pynetcor.cor.chunked_cortest(x, y=None, correlation_method: str = 'pearson', na_action: str = 'auto', approx_pvalue: bool = True, adjust_pvalue: bool = False, adjust_method: str = 'BH', chunk_size: int = 1024, threads: int = 1) → CortestIterator
Iterate over correlation tests between each row of two arrays, computed in chunks, using the Pearson, Spearman, or Kendall method.
- Parameters:
x (array_like) – A 1-D or 2-D array.
y (array_like, optional) – A 1-D or 2-D array with the same number of columns as x. If not provided, correlations are calculated between the rows of x and themselves.
correlation_method ({'pearson', 'spearman', 'kendall'}, default 'pearson') – The method used to calculate the correlation coefficient.
na_action ({'auto', 'ignore', 'fillMean', 'fillMedian'}, default 'auto') –
The action to take when encountering NaN values. 'ignore' is recommended for Pearson and Spearman, and 'fillMedian' is recommended for Kendall; 'ignore' cannot be used with Kendall.
'ignore': the calculation skips pairs of elements in which either value is NaN.
'fillMean': fills the NaN values in each row with the mean of that row's non-NaN values.
'fillMedian': fills the NaN values in each row with the median of that row's non-NaN values.
approx_pvalue (bool, default True) – Whether to use the approximation method of p-value calculation.
adjust_pvalue (bool, default False) – Whether to compute approximate adjusted p-values for multiple hypothesis testing. NOTE: the chunked function supports only approximate adjusted p-values.
adjust_method ({'holm', 'hochberg', 'bonferroni', 'BH', 'BY'}, default 'BH') – The method used to adjust p-value for multiple hypothesis testing.
chunk_size (int, default 1024) – The chunk size. Each chunk calculates chunk_size * (number of columns of x) values.
threads (int, default 1) – The number of threads to use.
- Returns:
An iterator over the correlation test results. It is evaluated lazily, processing the data chunk by chunk.
- Return type:
CortestIterator
- class pynetcor.cor.CorrcoefIterator(iter)
- class pynetcor.cor.CortestIterator(iter)
Topk correlation search
- pynetcor.cor.cor_topk(x, y=None, method: str = 'pearson', k: float = 0.01, na_action: str = 'auto', correlation_mode: str = 'both', compute_pvalue: bool = True, approx_pvalue: bool = True, chunk_size: int = 1024, threads: int = 1) → ndarray
Search for the global top k correlations between each row of two arrays, using the Pearson, Spearman, or Kendall method.
- Parameters:
x (array_like) – A 2-D array.
y (array_like, optional) – A 2-D array with the same number of columns as x. If not provided, correlations are calculated between the rows of x and themselves.
k (float, default 0.01) – The fraction of top correlations to return when k is between 0 and 1, or the number of top correlations to return when k exceeds 1.
method ({'pearson', 'spearman', 'kendall'}, default 'pearson') – The method used to calculate the correlation coefficient.
na_action ({'auto', 'ignore', 'fillMean', 'fillMedian'}, default 'auto') –
The action to take when encountering NaN values. 'ignore' is recommended for Pearson and Spearman, and 'fillMedian' is recommended for Kendall; 'ignore' cannot be used with Kendall.
'ignore': the calculation skips pairs of elements in which either value is NaN.
'fillMean': fills the NaN values in each row with the mean of that row's non-NaN values.
'fillMedian': fills the NaN values in each row with the median of that row's non-NaN values.
correlation_mode ({'positive', 'negative', 'both'}, default 'both') – The mode used to compare correlations when selecting the top k.
compute_pvalue (bool, default True) – Whether to calculate the p-value for each correlation.
approx_pvalue (bool, default True) – Whether to use the approximation method of p-value calculation.
chunk_size (int, default 1024) – The chunk size. Each chunk calculates chunk_size * (number of columns of x) values.
threads (int, default 1) – The number of threads to use.
- Returns:
A 2D array with 4 columns: [index1, index2, r, p] or 3 columns: [index1, index2, r] if compute_pvalue is False.
- Return type:
ndarray
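The two readings of k (a fraction up to 1, a count above 1) and the effect of correlation_mode can be sketched on a small, already computed correlation matrix. Everything here is illustrative: resolve_k and top_k_pairs are hypothetical names, rounding the fraction up is an assumption, and so is the interpretation that 'both' ranks by absolute value while 'positive' and 'negative' rank by signed value.

```python
import heapq
import math

def resolve_k(k, total):
    """Treat k <= 1 as a fraction of all pairs, otherwise as a count."""
    if k <= 1:
        return max(1, math.ceil(k * total))  # assumption: round up, keep at least one
    return min(int(k), total)

def top_k_pairs(corr, k, mode="both"):
    """Return the top (row_index, col_index, r) entries of a correlation matrix."""
    # Assumed ranking rule for each mode; the library may define these differently.
    score = {"both": abs, "positive": lambda r: r, "negative": lambda r: -r}[mode]
    entries = [(i, j, r) for i, row in enumerate(corr) for j, r in enumerate(row)]
    count = resolve_k(k, len(entries))
    return heapq.nlargest(count, entries, key=lambda entry: score(entry[2]))

corr = [[0.9, -0.95, 0.1],
        [0.2, 0.5, -0.3]]
top_both = top_k_pairs(corr, k=2)                   # two strongest by |r|
top_pos = top_k_pairs(corr, k=2, mode="positive")   # two largest positive r
top_frac = top_k_pairs(corr, k=0.5)                 # top half of the six pairs
```

Using a heap keeps only the current best candidates, which is one reason a global top-k search can avoid storing the full correlation matrix.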
- pynetcor.cor.cor_topkdiff(x1, y1, x2=None, y2=None, method: str = 'pearson', k: float = 0.01, na_action: str = 'auto', chunk_size: int = 1024, threads: int = 1) → ndarray
Search for the global top k differences in correlation between pairs of features across two states or time points, using the Pearson, Spearman, or Kendall method.
- Parameters:
x1 (array_like) – A 2-D array.
y1 (array_like) – A 2-D array. y1 has the same number of features as x1.
x2 (array_like, optional) – A 2-D array with the same number of columns as x1. If not provided, the correlation coefficient will be calculated between x1 and itself.
y2 (array_like, optional) – A 2-D array with the same number of columns as y1. If not provided, the correlation coefficient will be calculated between x2 and itself.
k (float, default 0.01) – The fraction of top correlation differences to return when k is between 0 and 1, or the number of top correlation differences to return when k exceeds 1.
method ({'pearson', 'spearman', 'kendall'}, default 'pearson') – The method used to calculate the correlation coefficient.
na_action ({'auto', 'ignore', 'fillMean', 'fillMedian'}, default 'auto') –
The action to take when encountering NaN values. 'ignore' is recommended for Pearson and Spearman, and 'fillMedian' is recommended for Kendall; 'ignore' cannot be used with Kendall.
'ignore': the calculation skips pairs of elements in which either value is NaN.
'fillMean': fills the NaN values in each row with the mean of that row's non-NaN values.
'fillMedian': fills the NaN values in each row with the median of that row's non-NaN values.
chunk_size (int, default 1024) – The chunk size. Each chunk calculates chunk_size * (number of columns of x) values.
threads (int, default 1) – The number of threads to use.
- Returns:
A 2D array with 5 columns: [index1, index2, diffCor, cor1, cor2].
- Return type:
ndarray
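The five-column result can be pictured as follows, starting from two correlation matrices that describe the same feature pairs in two states. This is a sketch under assumptions rather than the library's method: top_k_differences is a hypothetical helper, the difference is taken as cor1 minus cor2, and pairs are ranked by the absolute difference; the source does not state the sign convention or the ranking rule.

```python
import heapq

def top_k_differences(cor1, cor2, count):
    """Rows (i, j, diff, r1, r2) for the `count` largest |r1 - r2|."""
    rows = [
        (i, j, r1 - r2, r1, r2)  # assumed sign convention: state 1 minus state 2
        for i, (row1, row2) in enumerate(zip(cor1, cor2))
        for j, (r1, r2) in enumerate(zip(row1, row2))
    ]
    return heapq.nlargest(count, rows, key=lambda row: abs(row[2]))

# Correlations of the same feature pairs measured in two states.
cor_state1 = [[0.8, 0.1], [-0.2, 0.5]]
cor_state2 = [[0.7, -0.6], [0.4, 0.5]]
top_diff = top_k_differences(cor_state1, cor_state2, count=2)
```

Each returned tuple mirrors the documented column order [index1, index2, diffCor, cor1, cor2].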
P-value
- pynetcor.cor.pvalue_student_t(x, df: int, approx: bool = True, threads: int = 1) → ndarray
Calculate the p-values for correlations (Pearson or Spearman) using the Student’s t-distribution.
- Parameters:
x (array_like) – Array of correlation coefficients.
df (int) – The degrees of freedom.
approx (bool, default True) – Whether to use the approximation method of p-value calculation.
threads (int, default 1) – The number of threads to use.
- Returns:
An array of p-values with the same shape as x.
- Return type:
ndarray
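For reference, the standard way to turn a correlation coefficient r into a two-sided p-value is to form t = r * sqrt(df / (1 - r^2)) and evaluate the tail of the Student's t-distribution with df degrees of freedom (commonly df = n - 2 for n paired observations). The sketch below does this with the standard library only, integrating the density numerically with Simpson's rule. The names t_pdf and pvalue_from_r are hypothetical, and nothing here reflects how pynetcor computes or approximates the value.

```python
import math

def t_pdf(t, df):
    """Density of the Student's t-distribution with df degrees of freedom."""
    log_coef = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_coef - (df + 1) / 2 * math.log1p(t * t / df))

def pvalue_from_r(r, df, steps=2000):
    """Two-sided p-value for a correlation r, via t = r * sqrt(df / (1 - r^2))."""
    if abs(r) >= 1:
        return 0.0  # a perfect correlation gives an infinite t statistic
    t = abs(r) * math.sqrt(df / (1 - r * r))
    # Simpson's rule for the probability mass on [0, t]; steps must be even.
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3
    return max(0.0, 1 - 2 * central)

# Example: r = 0.5 observed on 30 paired samples, so df = 28 under the usual convention.
p_example = pvalue_from_r(0.5, df=28)
```

A useful sanity check is df = 1, where the t-distribution is the Cauchy distribution and r = 1/sqrt(2) corresponds to t = 1 with a two-sided p-value of 0.5.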