Finding DMR

Finds differentially methylated regions (DMR) using the Wilcoxon, Student's t or Kolmogorov-Smirnov test, or using logistic regression, logistic regression with mixed effects, or logistic regression with mixed effects and a correlation matrix. The Ttest, Wilcoxon and KS methods compare the methylation rate between the x and y samples (column prob) at the same position and chromosome; the null hypothesis is that the mean (median, distribution, respectively) of the methylation rate is the same in both samples. For these methods the alternative hypothesis is two-sided, and regions are sorted by p-value.
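As a rough illustration of the three test-based methods, the sketch below runs the corresponding base R tests on a single toy region. The methylation rates are invented, and the exact test settings (for example, pairing observations by position in the Wilcoxon test) are assumptions made for illustration, not a statement of what find_DMR does internally.

```r
# Toy methylation rates for one region: five positions, each measured in
# samples x and y (values invented for illustration only).
region <- data.frame(
  poz       = rep(c(101, 105, 110, 118, 124), times = 2),
  prob      = rep(c("x", "y"), each = 5),
  meth.rate = c(0.82, 0.75, 0.91, 0.68, 0.88,
                0.41, 0.52, 0.47, 0.35, 0.58)
)

x <- region$meth.rate[region$prob == "x"]
y <- region$meth.rate[region$prob == "y"]

# Two-sided tests; paired = TRUE is an assumption (observations matched by position).
p.values <- c(
  Wilcoxon = wilcox.test(x, y, paired = TRUE, alternative = "two.sided")$p.value,
  Ttest    = t.test(x, y, var.equal = FALSE, alternative = "two.sided")$p.value,
  KS       = ks.test(x, y, alternative = "two.sided")$p.value
)
print(p.values)
```

Repeating this for every region and sorting the regions by increasing p-value gives a ranking of the kind shown in the Examples output.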

find_DMR(data, methods, p.value.log.reg = NULL, p.value.reg.mixed = NULL,
  p.value.reg.corr.mixed = NULL, beta.coef.max = 30)

Arguments

data	A data frame in one of two forms: 1. a data frame with columns chr, poz, prob, no, meth, unmeth, meth.rate, as returned by the function preprocessing; 2. a data frame with columns chr, poz, prob, no, meth, unmeth, meth.rate, tiles and, optionally, tiles.common, as returned by the function create.tiles.min.gap or create.tiles.fixed.length.
methods	character vector of methods to run. Possible values are: 'Wilcoxon', 'Ttest', 'KS', 'Reg.Log', 'Reg.Mixed', 'Reg.Corr.Mixed'. 'Wilcoxon' - Wilcoxon signed-rank test; 'Ttest' - Student's t-test with unequal variances; 'KS' - Kolmogorov-Smirnov test; 'Reg.Log' - Wald test of the grouping variable from logistic regression; 'Reg.Mixed' - Wald test of the grouping variable from logistic regression with mixed effects; 'Reg.Corr.Mixed' - Wald test of the grouping variable from logistic regression with mixed effects and a previously estimated correlation matrix
p.value.log.reg	if not NULL, only regions whose p.value for the prob variable is smaller than p.value.log.reg are returned, ordered by decreasing absolute value of the beta coefficient of the prob variable; otherwise all regions are returned, ordered by increasing p.value
p.value.reg.mixed	if not NULL, only regions whose p.value for the prob variable is smaller than p.value.reg.mixed are returned, ordered by decreasing absolute value of the beta coefficient of the prob variable; otherwise all regions are returned, ordered by increasing p.value
p.value.reg.corr.mixed	if not NULL, only regions whose p.value for the prob variable is smaller than p.value.reg.corr.mixed are returned, ordered by decreasing absolute value of the beta coefficient of the prob variable; otherwise all regions are returned, ordered by increasing p.value
beta.coef.max	only results whose absolute value of beta.coef is less than this parameter are returned from Reg.Log, Reg.Mixed and Reg.Corr.Mixed. This filters out cases in which the fitting algorithm did not converge well
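The filtering and ordering rules described for the p.value.* arguments and beta.coef.max can be sketched in base R as follows. This is only an illustration under stated assumptions: order_regions is a hypothetical helper, not a function of this package, and the data frame copies the first seven rows of the $Reg.Log table printed in the Examples section.

```r
# First seven rows of the $Reg.Log table from the Examples section.
reg.log <- data.frame(
  chr       = "chr1",
  start     = c(83588, 82959, 84100, 86359, 84722, 83880, 221169),
  end       = c(83698, 83254, 84338, 86505, 84935, 83980, 221365),
  p.value   = c(1.519918e-05, 1.100881e-04, 2.882855e-04, 3.656493e-03,
                8.574373e-03, 1.784013e-02, 3.023693e-02),
  beta.coef = c(2.19688537, 1.57739904, 1.65822808, 2.31792789,
                2.81693813, 1.65160526, 0.89940934)
)

# Hypothetical helper (not part of the package) mimicking the documented rules.
order_regions <- function(res, p.value = NULL, beta.coef.max = 30) {
  # Drop fits with implausibly large coefficients (likely poor convergence).
  res <- res[abs(res$beta.coef) < beta.coef.max, ]
  if (is.null(p.value)) {
    # No threshold: keep everything, smallest p-value first.
    res[order(res$p.value), ]
  } else {
    # Threshold given: keep significant regions, largest |beta| first.
    res <- res[res$p.value < p.value, ]
    res[order(abs(res$beta.coef), decreasing = TRUE), ]
  }
}

by.p    <- order_regions(reg.log)                  # sorted by p-value
by.beta <- order_regions(reg.log, p.value = 0.01)  # filtered, sorted by |beta|
print(by.beta)
```

With a threshold of 0.01 the five remaining regions come out in the same order as the $Reg.Log result of the second call in the Examples section.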

Value

A list. Each element holds the results of one of the requested methods, with the most interesting regions at the top

Details

In the regression methods, the number of successes is the number of methylated cytosines and the number of failures is the number of unmethylated cytosines. The output of these methods is the beta coefficient of the indicator variable from the regression model and the p-value of the Wald test on that indicator variable. The indicator variable equals 1 if an observation comes from the x sample and 0 otherwise. These methods order regions by the beta coefficient or by the p-value of the grouping variable. In the mixed models the only explanatory variable is the indicator variable, and the positions on the chromosome are random effects. In standard logistic regression the positions on the chromosome are also explanatory variables.
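The standard logistic regression described above can be sketched for a single region with base R's glm. The counts below are invented, and the model formula is an assumption based on the description (indicator variable plus position as a fixed effect); it is not taken from the package source.

```r
# Toy counts for one region: four positions observed in samples x and y
# (values invented for illustration only).
region <- data.frame(
  poz    = rep(c(101, 105, 110, 118), times = 2),
  prob   = rep(c("x", "y"), each = 4),
  meth   = c(18, 15, 20, 12,  9, 11,  8,  7),  # successes: methylated cytosines
  unmeth = c( 4,  6,  3,  8, 12, 10, 14, 13)   # failures: unmethylated cytosines
)

# Indicator variable: 1 for observations from sample x, 0 otherwise.
region$is.x <- as.integer(region$prob == "x")

# Binomial regression with the indicator and the position as explanatory variables.
fit <- glm(cbind(meth, unmeth) ~ is.x + factor(poz),
           family = binomial, data = region)

# Wald test of the indicator variable.
wald      <- summary(fit)$coefficients["is.x", ]
beta.coef <- wald[["Estimate"]]
p.value   <- wald[["Pr(>|z|)"]]
print(c(beta.coef = beta.coef, p.value = p.value))
```

For the mixed-effects variants, the fixed position term would be replaced by a random effect for position, for example with a mixed-model package such as lme4; that choice is likewise an assumption and is not shown here.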

Examples

library(dplyr)  # provides %>% and filter()

data('schizophrenia')
# split the data set into the two samples to compare
control <- schizophrenia %>% filter(category == 'control') %>%
  dplyr::select(-category)

disease <- schizophrenia %>% filter(category == 'disease') %>%
  dplyr::select(-category)

data <- preprocessing(control, disease)
data.tiles <- create_tiles_max_gap(data, gaps.length = 100)
# keep only the first tiles so that the example runs quickly
data.tiles.small <- data.tiles %>% filter(tiles < 30)

# find DMRs with all methods; regions are sorted by p.value
find_DMR(data.tiles.small, c('Wilcoxon', 'Ttest', 'KS', 'Reg.Log', 'Reg.Mixed', 'Reg.Corr.Mixed'))
#> [1] "Started: Finding DMR by Wilcoxon test"
#> [1] "Started: Finding DMR by t-test"
#> [1] "Started: Finding DMR by KS test"
#> [1] "Started: Finding DMR by Logistic Regression"
#> [1] "Started: Finding DMR by Logistic Regression with Mixed Effects"
#> [1] "Started: Finding DMR by Logistic Regression with Mixed Effects with Correlation Matrix"
#> 
#> $Wilcoxon
#> # A tibble: 29 x 4
#>      chr  start    end    p.value
#>    <chr>  <dbl>  <dbl>      <dbl>
#>  1  chr1  84100  84338 0.01073342
#>  2  chr1  82959  83254 0.05333685
#>  3  chr1  86359  86505 0.05333685
#>  4  chr1  83588  83698 0.05447404
#>  5  chr1  81698  81863 0.07186064
#>  6  chr1  84722  84935 0.09751254
#>  7  chr1 220557 220666 0.12500000
#>  8  chr1 221169 221365 0.12500000
#>  9  chr1  85472  85575 0.14891467
#> 10  chr1  81984  82025 0.17356817
#> # ... with 19 more rows
#> 
#> $Ttest
#> # A tibble: 29 x 4
#>      chr  start    end      p.value
#>    <chr>  <dbl>  <dbl>        <dbl>
#>  1  chr1  83588  83698 3.048765e-06
#>  2  chr1  86359  86505 2.378342e-04
#>  3  chr1  82959  83254 3.665470e-04
#>  4  chr1 221169 221365 4.016274e-04
#>  5  chr1  81984  82025 6.787773e-04
#>  6  chr1  83880  83980 7.896496e-04
#>  7  chr1  84722  84935 5.759169e-03
#>  8  chr1  87096  87175 4.736525e-02
#>  9  chr1 220557 220666 5.260321e-02
#> 10  chr1 220872 220952 4.013260e-01
#> # ... with 19 more rows
#> 
#> $KS
#> # A tibble: 29 x 4
#>      chr  start    end     p.value
#>    <chr>  <dbl>  <dbl>       <dbl>
#>  1  chr1  84100  84338 0.001823764
#>  2  chr1  82959  83254 0.013475890
#>  3  chr1  83588  83698 0.013475890
#>  4  chr1  86359  86505 0.013475890
#>  5  chr1  81698  81863 0.036631053
#>  6  chr1  84722  84935 0.036631053
#>  7  chr1 221169 221365 0.036631053
#>  8  chr1  81984  82025 0.099561848
#>  9  chr1  83880  83980 0.099561848
#> 10  chr1  85472  85575 0.099561848
#> # ... with 19 more rows
#> 
#> $Reg.Log
#> # A tibble: 20 x 5
#>      chr  start    end      p.value  beta.coef
#>    <chr>  <dbl>  <dbl>        <dbl>      <dbl>
#>  1  chr1  83588  83698 1.519918e-05 2.19688537
#>  2  chr1  82959  83254 1.100881e-04 1.57739904
#>  3  chr1  84100  84338 2.882855e-04 1.65822808
#>  4  chr1  86359  86505 3.656493e-03 2.31792789
#>  5  chr1  84722  84935 8.574373e-03 2.81693813
#>  6  chr1  83880  83980 1.784013e-02 1.65160526
#>  7  chr1 221169 221365 3.023693e-02 0.89940934
#>  8  chr1 222086 222090 3.699038e-02 1.25276297
#>  9  chr1  82603  82625 5.504190e-02 1.05605267
#> 10  chr1  85472  85575 8.465175e-02 1.28093385
#> 11  chr1 221803 221854 8.517635e-02 0.91629073
#> 12  chr1  87096  87175 9.549951e-02 1.58870244
#> 13  chr1  86661  86668 1.655336e-01 1.26851133
#> 14  chr1  85292  85316 1.913045e-01 1.18562367
#> 15  chr1  81698  81863 2.679230e-01 0.31153329
#> 16  chr1  81984  82025 3.180319e-01 0.32472153
#> 17  chr1 220872 220952 3.946347e-01 0.54692060
#> 18  chr1  81412  81442 4.334074e-01 0.31153329
#> 19  chr1 220557 220666 6.029379e-01 0.21107465
#> 20  chr1 219816 219880 9.372464e-01 0.04864693
#> 
#> $Reg.Mixed
#> # A tibble: 11 x 5
#>      chr  start    end      p.value  beta.coef
#>    <chr>  <dbl>  <dbl>        <dbl>      <dbl>
#>  1  chr1  83588  83698 1.521418e-05 2.19565841
#>  2  chr1  82959  83254 1.181530e-04 1.56687830
#>  3  chr1  86359  86505 3.811671e-03 2.30258509
#>  4  chr1  84722  84935 8.732648e-03 2.80748410
#>  5  chr1  83880  83980 1.788381e-02 1.65068087
#>  6  chr1 221169 221365 2.998610e-02 0.89794159
#>  7  chr1  87096  87175 9.556622e-02 1.58696506
#>  8  chr1  81984  82025 3.181493e-01 0.32463778
#>  9  chr1 220872 220952 3.825471e-01 0.55961579
#> 10  chr1 220557 220666 5.744138e-01 0.22665737
#> 11  chr1 219816 219880 9.370590e-01 0.04879016
#> 
#> $Reg.Corr.Mixed
#> # A tibble: 12 x 5
#>      chr  start    end     p.value  beta.coef
#>    <chr>  <dbl>  <dbl>       <dbl>      <dbl>
#>  1  chr1  82959  83254 0.007918476  2.2600171
#>  2  chr1  83588  83698 0.010380179  3.2544319
#>  3  chr1  86359  86505 0.036781488  5.3136593
#>  4  chr1  87096  87175 0.052391053  2.4186572
#>  5  chr1  81698  81863 0.065166268  0.8531988
#>  6  chr1 220557 220666 0.112228905 -0.4947742
#>  7  chr1  85472  85575 0.154777532  2.6553906
#>  8  chr1  83880  83980 0.188767710  2.6947221
#>  9  chr1  81984  82025 0.260540995  1.0858214
#> 10  chr1 219816 219880 0.343682945 -0.2581625
#> 11  chr1 220872 220952 0.450486788 -0.6860944
#> 12  chr1 221169 221365 0.712882389  0.1152671
#> 

# find DMRs with 'Reg.Log', 'Reg.Mixed' and 'Reg.Corr.Mixed'; regions below each
# p.value threshold are sorted by absolute beta coefficient
find_DMR(data.tiles.small, c('Reg.Log', 'Reg.Mixed', 'Reg.Corr.Mixed'),
  p.value.log.reg = 0.01, p.value.reg.mixed = 0.02, p.value.reg.corr.mixed = 0.03)
#> [1] "Started: Finding DMR by Logistic Regression"
#> [1] "Started: Finding DMR by Logistic Regression with Mixed Effects"
#> [1] "Started: Finding DMR by Logistic Regression with Mixed Effects with Correlation Matrix"
#> 
#> $Reg.Log
#> # A tibble: 5 x 5
#>     chr start   end      p.value beta.coef
#>   <chr> <dbl> <dbl>        <dbl>     <dbl>
#> 1  chr1 84722 84935 8.574373e-03  2.816938
#> 2  chr1 86359 86505 3.656493e-03  2.317928
#> 3  chr1 83588 83698 1.519918e-05  2.196885
#> 4  chr1 84100 84338 2.882855e-04  1.658228
#> 5  chr1 82959 83254 1.100881e-04  1.577399
#> 
#> $Reg.Mixed
#> # A tibble: 5 x 5
#>     chr start   end      p.value beta.coef
#>   <chr> <dbl> <dbl>        <dbl>     <dbl>
#> 1  chr1 84722 84935 8.732648e-03  2.807484
#> 2  chr1 86359 86505 3.811671e-03  2.302585
#> 3  chr1 83588 83698 1.521418e-05  2.195658
#> 4  chr1 83880 83980 1.788381e-02  1.650681
#> 5  chr1 82959 83254 1.181530e-04  1.566878
#> 
#> $Reg.Corr.Mixed
#> # A tibble: 2 x 5
#>     chr start   end     p.value beta.coef
#>   <chr> <dbl> <dbl>       <dbl>     <dbl>
#> 1  chr1 83588 83698 0.010380179  3.254432
#> 2  chr1 82959 83254 0.007918476  2.260017
#> 

# find DMRs with 'Reg.Log' (sorted by absolute beta coefficient) and 'Wilcoxon' (sorted by p.value)
find_DMR(data.tiles.small, c('Wilcoxon', 'Reg.Log'), p.value.log.reg = 0.001)
#> [1] "Started: Finding DMR by Wilcoxon test"
#> [1] "Started: Finding DMR by Logistic Regression"
#> $Wilcoxon
#> # A tibble: 29 x 4
#>      chr  start    end    p.value
#>    <chr>  <dbl>  <dbl>      <dbl>
#>  1  chr1  84100  84338 0.01073342
#>  2  chr1  82959  83254 0.05333685
#>  3  chr1  86359  86505 0.05333685
#>  4  chr1  83588  83698 0.05447404
#>  5  chr1  81698  81863 0.07186064
#>  6  chr1  84722  84935 0.09751254
#>  7  chr1 220557 220666 0.12500000
#>  8  chr1 221169 221365 0.12500000
#>  9  chr1  85472  85575 0.14891467
#> 10  chr1  81984  82025 0.17356817
#> # ... with 19 more rows
#> 
#> $Reg.Log
#> # A tibble: 3 x 5
#>     chr start   end      p.value beta.coef
#>   <chr> <dbl> <dbl>        <dbl>     <dbl>
#> 1  chr1 83588 83698 1.519918e-05  2.196885
#> 2  chr1 84100 84338 2.882855e-04  1.658228
#> 3  chr1 82959 83254 1.100881e-04  1.577399
#>
