cor

Full correlation matrix functions

pynetcor.cor.corrcoef(x, y=None, method: str = 'pearson', nan_action: str = 'auto', threads: int = 1) -> ndarray

Calculate the correlation coefficient between each row of two arrays.

Parameters:
  • x (array_like) – A 1-D or 2-D array.

  • y (array_like, optional) – A 1-D or 2-D array with the same number of columns as x. If not provided, the correlation coefficients are calculated between the rows of x itself.

  • method ({'pearson', 'spearman', 'kendall'}, default 'pearson') – The method used to calculate the correlation coefficient.

  • nan_action ({'auto', 'ignore', 'fillMean', 'fillMedian'}, default 'auto') –

    The action to take when NaN values are encountered. 'ignore' is recommended for Pearson and Spearman, and 'fillMedian' is recommended for Kendall. 'ignore' cannot be used with Kendall.

    • 'ignore': the calculation skips any pair of elements in which either value is NaN.

    • 'fillMean': fills the NaN values in each row with the mean of that row's non-NaN values.

    • 'fillMedian': fills the NaN values in each row with the median of that row's non-NaN values.

  • threads (int, default 1) – The number of threads to use.

Returns:

A 2D array representing the matrix of correlation coefficients.

Return type:

ndarray

Examples

>>> x = [1, 2, 3]
>>> y = [4, 5, 6]
>>> corrcoef(x, y)
array([[1.        , 1.        ],
        [1.        , 1.        ]])
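
The three explicit nan_action choices described above can be illustrated without the library. The following is a minimal pure-Python sketch of the documented semantics, not pynetcor's implementation; the helper names fill_row, drop_nan_pairs and pearson are hypothetical and exist only for this illustration.

```python
import math
import statistics

def fill_row(row, how):
    """'fillMean' / 'fillMedian': replace NaNs with the mean or median of the row's non-NaN values."""
    valid = [v for v in row if not math.isnan(v)]
    fill = statistics.mean(valid) if how == "fillMean" else statistics.median(valid)
    return [fill if math.isnan(v) else v for v in row]

def drop_nan_pairs(a, b):
    """'ignore': keep only the positions where neither row holds NaN."""
    pairs = [(u, v) for u, v in zip(a, b) if not (math.isnan(u) or math.isnan(v))]
    return [u for u, _ in pairs], [v for _, v in pairs]

def pearson(a, b):
    """Plain Pearson correlation of two equal-length sequences."""
    ma, mb = statistics.mean(a), statistics.mean(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)

nan = float("nan")
x = [1.0, 2.0, nan, 4.0]
y = [2.0, nan, 6.0, 8.0]

kept_x, kept_y = drop_nan_pairs(x, y)    # only positions 0 and 3 survive
r_ignore = pearson(kept_x, kept_y)
x_median_filled = fill_row(x, "fillMedian")
```

Note that 'ignore' works pair by pair, so different row pairs may be compared over different subsets of columns, whereas the fill options repair each row once, independently of the row it is compared with.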

pynetcor.cor.cortest(x, y=None, method: str = 'pearson', na_action='auto', approx_pvalue: bool = True, adjust_pvalue: bool = False, adjust_method: str = 'BH', approx_adjust_pvalue: bool = False, threads: int = 1) -> ndarray

Test for correlation between each row of two arrays, using one of Pearson, Spearman, or Kendall.

Parameters:
  • x (array_like) – A 1-D or 2-D array.

  • y (array_like, optional) – A 1-D or 2-D array with the same number of columns as x. If not provided, the correlation coefficients are calculated between the rows of x itself.

  • method ({'pearson', 'spearman', 'kendall'}, default 'pearson') – The method used to calculate the correlation coefficient.

  • na_action ({'auto', 'ignore', 'fillMean', 'fillMedian'}, default 'auto') –

    The action to take when NaN values are encountered. 'ignore' is recommended for Pearson and Spearman, and 'fillMedian' is recommended for Kendall. 'ignore' cannot be used with Kendall.

    • 'ignore': the calculation skips any pair of elements in which either value is NaN.

    • 'fillMean': fills the NaN values in each row with the mean of that row's non-NaN values.

    • 'fillMedian': fills the NaN values in each row with the median of that row's non-NaN values.

  • approx_pvalue (bool, default True) – Whether to use the approximation method of p-value calculation.

  • adjust_pvalue (bool, default False) – Whether to adjust the p-values for multiple hypothesis testing.

  • adjust_method ({'holm', 'hochberg', 'bonferroni', 'BH', 'BY'}, default 'BH') – The method used to adjust p-value for multiple hypothesis testing.

  • approx_adjust_pvalue (bool, default False) – Whether to use the approximation method of p-value adjustment.

  • threads (int, default 1) – The number of threads to use.

Returns:

A 2D array with 4 columns: [index1, index2, r, p], or 5 columns if adjust_pvalue is True: [index1, index2, r, p, p_adjusted]

Return type:

ndarray

Examples

>>> x = [1, 2, 3]
>>> y = [4, 5, 6]
>>> cortest(x, y)
array([[0.        , 0.        , 1.        , 2.2e-16   ],
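
To make the adjust_method option more concrete, here is a minimal pure-Python sketch of the standard Benjamini-Hochberg ('BH') step-up adjustment. It is an illustration of the textbook procedure, not pynetcor's code; the function name benjamini_hochberg is hypothetical, and the sketch does not model approx_adjust_pvalue.

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(pvalues)
    # Visit p-values from largest to smallest so a running minimum
    # enforces monotonicity of the adjusted values.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, idx in enumerate(order):
        rank = m - position  # 1-based rank in ascending order
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

p = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(p)  # approximately [0.02, 0.04, 0.04, 0.02]
```

Each raw p-value is scaled by m / rank, and the running minimum guarantees that a smaller raw p-value never receives a larger adjusted value than a bigger one.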

Chunked correlation matrix functions

pynetcor.cor.chunked_corrcoef(x, y=None, method: str = 'pearson', nan_action: str = 'auto', chunk_size: int = 1024, threads: int = 1) -> CorrcoefIterator

Iterate over the correlation between each row of two arrays, chunk by chunk.

Parameters:
  • x (array_like) – A 1-D or 2-D array.

  • y (array_like, optional) – A 1-D or 2-D array with the same number of columns as x. If not provided, the correlation coefficients are calculated between the rows of x itself.

  • method ({'pearson', 'spearman', 'kendall'}, default 'pearson') – The method used to calculate the correlation coefficient.

  • nan_action ({'auto', 'ignore', 'fillMean', 'fillMedian'}, default 'auto') –

    The action to take when NaN values are encountered. 'ignore' is recommended for Pearson and Spearman, and 'fillMedian' is recommended for Kendall. 'ignore' cannot be used with Kendall.

    • 'ignore': the calculation skips any pair of elements in which either value is NaN.

    • 'fillMean': fills the NaN values in each row with the mean of that row's non-NaN values.

    • 'fillMedian': fills the NaN values in each row with the median of that row's non-NaN values.

  • chunk_size (int, default 1024) – The number of rows of the correlation matrix to calculate per chunk.

  • threads (int, default 1) – The number of threads to use.

Returns:

An iterator over the correlation matrix. It is evaluated lazily, processing the data chunk by chunk.

Return type:

CorrcoefIterator
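
The lazy, chunk-by-chunk evaluation described above can be pictured with a small generator. This is a minimal pure-Python sketch of the idea under the assumption that each chunk holds up to chunk_size rows of the correlation matrix; it is not pynetcor's implementation, the exact shape of what CorrcoefIterator yields is not specified here, and the names pearson and chunked_rows are hypothetical.

```python
import math

def pearson(a, b):
    """Plain Pearson correlation of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)

def chunked_rows(x, chunk_size):
    """Yield the row-wise correlation matrix of x in blocks of at most chunk_size rows."""
    for start in range(0, len(x), chunk_size):
        block = x[start:start + chunk_size]
        # Nothing is computed for a block until the consumer asks for it.
        yield [[pearson(row, other) for other in x] for row in block]

x = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0], [1.0, 3.0, 2.0]]
chunks = list(chunked_rows(x, chunk_size=2))  # two chunks: 2 rows, then 1 row
```

Because only one block of rows exists in memory at a time, the full matrix never has to be materialized, which is the motivation for a chunked interface on large inputs.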

pynetcor.cor.chunked_cortest(x, y=None, correlation_method: str = 'pearson', na_action: str = 'auto', approx_pvalue: bool = True, adjust_pvalue: bool = False, adjust_method: str = 'BH', chunk_size: int = 1024, threads: int = 1) -> CortestIterator

Iterate over tests for correlation between each row of two arrays, chunk by chunk, using one of Pearson, Spearman, or Kendall.

Parameters:
  • x (array_like) – A 1-D or 2-D array.

  • y (array_like, optional) – A 1-D or 2-D array with the same number of columns as x. If not provided, the correlation coefficients are calculated between the rows of x itself.

  • correlation_method ({'pearson', 'spearman', 'kendall'}, default 'pearson') – The method used to calculate the correlation coefficient.

  • na_action ({'auto', 'ignore', 'fillMean', 'fillMedian'}, default 'auto') –

    The action to take when NaN values are encountered. 'ignore' is recommended for Pearson and Spearman, and 'fillMedian' is recommended for Kendall. 'ignore' cannot be used with Kendall.

    • 'ignore': the calculation skips any pair of elements in which either value is NaN.

    • 'fillMean': fills the NaN values in each row with the mean of that row's non-NaN values.

    • 'fillMedian': fills the NaN values in each row with the median of that row's non-NaN values.

  • approx_pvalue (bool, default True) – Whether to use the approximation method of p-value calculation.

  • adjust_pvalue (bool, default False) – Whether to compute approximate adjusted p-values for multiple hypothesis testing. Note: the chunked function supports only approximate adjusted p-values.

  • adjust_method ({'holm', 'hochberg', 'bonferroni', 'BH', 'BY'}, default 'BH') – The method used to adjust p-value for multiple hypothesis testing.

  • chunk_size (int, default 1024) – The chunk size: chunk_size * (number of columns of x) values are calculated per chunk.

  • threads (int, default 1) – The number of threads to use.

Returns:

An iterator over the correlation test results. It is evaluated lazily, processing the data chunk by chunk.

Return type:

CortestIterator

class pynetcor.cor.CorrcoefIterator(iter)
class pynetcor.cor.CortestIterator(iter)

P-value

pynetcor.cor.pvalue_student_t(x, df: int, approx: bool = True, threads: int = 1) -> ndarray

Calculate the p-values for correlation coefficients (Pearson or Spearman) using the Student’s t-distribution.

Parameters:
  • x (array_like) – Array of correlation coefficients.

  • df (int) – The degrees of freedom.

  • approx (bool, default True) – Whether to use the approximation method of p-value calculation.

  • threads (int, default 1) – The number of threads to use.

Returns:

An array of p-values with the same shape as x.

Return type:

ndarray
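
For reference, the usual way to turn a correlation coefficient r into a Student's t p-value is to form t = r * sqrt(df / (1 - r^2)) and take the two-sided tail probability. The sketch below implements that textbook calculation in pure Python. It assumes a two-sided test, does not model the approx option, and is not pynetcor's implementation; the names t_statistic and two_sided_pvalue are hypothetical.

```python
import math

def t_statistic(r, df):
    """t statistic for a correlation coefficient r with df degrees of freedom."""
    if abs(r) >= 1.0:
        return math.copysign(math.inf, r)
    return r * math.sqrt(df / (1.0 - r * r))

def two_sided_pvalue(r, df, steps=2000):
    """Two-sided Student's t p-value for r, by Simpson integration.

    Substituting t = sqrt(df) * tan(theta) turns the t density into
    k * cos(theta) ** (df - 1) on [0, asin(|r|)], a bounded interval.
    steps must be even; very large df would need more steps.
    """
    upper = math.asin(min(abs(r), 1.0))
    k = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(math.pi)
    h = upper / steps
    total = 1.0 + math.cos(upper) ** (df - 1)  # endpoint terms
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * math.cos(i * h) ** (df - 1)
    central_mass = k * total * h / 3.0  # P(0 < T < |t|)
    return max(0.0, 1.0 - 2.0 * central_mass)

t = t_statistic(0.5, 3)        # 0.5 * sqrt(3 / 0.75)
p = two_sided_pvalue(0.5, 2)   # closed form for df = 2 is 1 - |r|
```

For Pearson correlation over n paired observations the conventional choice is df = n - 2; whether the caller or the library derives df that way is not stated above, so treat it as an assumption.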