Finds differentially methylated regions (DMR) using the Wilcoxon, Student's t or Kolmogorov-Smirnov test, or using logistic regression, logistic regression with mixed effects, or logistic regression with mixed effects and a correlation matrix. The 'Ttest', 'Wilcoxon' and 'KS' methods compare the methylation rate between the x and y groups (column prob) at the same position and chromosome. Their null hypothesis is that the mean (median and distribution, respectively) of the methylation rate is the same in both groups. In these methods the alternative hypothesis is two-sided, and regions are sorted by p-value.
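As a rough illustration of the three two-sample comparisons described above, the following base R sketch runs them on made-up methylation rates for a single region. The vectors x.rate and y.rate are hypothetical, and pairing the observations by position in the Wilcoxon test is an assumption based on the description, not a statement about how find_DMR is implemented.

```r
# Toy methylation rates at six shared positions (illustrative values only).
x.rate <- c(0.82, 0.75, 0.91, 0.68, 0.88, 0.79)
y.rate <- c(0.41, 0.52, 0.37, 0.55, 0.30, 0.47)

# Two-sided tests on the same region; one p-value per method.
p.values <- c(
  # Assumption: observations are paired by position (signed-rank test).
  Wilcoxon = wilcox.test(x.rate, y.rate, paired = TRUE,
                         alternative = "two.sided")$p.value,
  # Welch t-test, i.e. unequal variances.
  Ttest = t.test(x.rate, y.rate, var.equal = FALSE,
                 alternative = "two.sided")$p.value,
  # Two-sample Kolmogorov-Smirnov test on the two distributions.
  KS = ks.test(x.rate, y.rate, alternative = "two.sided")$p.value
)
print(p.values)
```

Repeating this for every region and sorting by the resulting p-value gives a ranking of the kind returned for these three methods.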

find_DMR(data, methods, p.value.log.reg = NULL, p.value.reg.mixed = NULL,
  p.value.reg.corr.mixed = NULL, beta.coef.max = 30)

Arguments

data

There are two options: 1. a data frame with the columns chr, poz, prob, no, meth, unmeth, meth.rate. This data frame is the result of the function preprocessing. 2. a data frame with the columns chr, poz, prob, no, meth, unmeth, meth.rate, tiles and, optionally, tiles.common. This data frame is the result of the function create.tiles.min.gap or create.tiles.fixed.length.

methods

character vector of methods to use. Possible values are: 'Wilcoxon', 'Ttest', 'KS', 'Reg.Log', 'Reg.Mixed', 'Reg.Corr.Mixed'. 'Wilcoxon' - Wilcoxon signed-rank test; 'Ttest' - Student's t-test with unequal variances; 'KS' - Kolmogorov-Smirnov test; 'Reg.Log' - Wald test of the grouping variable from logistic regression; 'Reg.Mixed' - Wald test of the grouping variable from logistic regression with mixed effects; 'Reg.Corr.Mixed' - Wald test of the grouping variable from logistic regression with mixed effects and a previously estimated correlation matrix

p.value.log.reg

if not NULL, only regions whose p.value for the prob variable is smaller than p.value.log.reg are returned, ordered by decreasing absolute value of the beta coefficient of the prob variable; otherwise regions are ordered by increasing p.value

p.value.reg.mixed

if not NULL, only regions whose p.value for the prob variable is smaller than p.value.reg.mixed are returned, ordered by decreasing absolute value of the beta coefficient of the prob variable; otherwise regions are ordered by increasing p.value

p.value.reg.corr.mixed

if not NULL, only regions whose p.value for the prob variable is smaller than p.value.reg.corr.mixed are returned, ordered by decreasing absolute value of the beta coefficient of the prob variable; otherwise regions are ordered by increasing p.value

beta.coef.max

only results whose absolute value of beta.coef is less than this parameter are returned from 'Reg.Log', 'Reg.Mixed' and 'Reg.Corr.Mixed'. This filters out cases where the algorithm did not converge well
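The way the p.value.* thresholds and beta.coef.max shape the returned regions can be sketched in base R as follows. The helper order_regions is hypothetical and only mirrors the behaviour described for these arguments; it is not the package's internal code. The rows are copied from the 'Reg.Log' output shown in the Examples.

```r
# Hypothetical helper mirroring the documented filtering and ordering rules.
order_regions <- function(results, p.value.threshold = NULL,
                          beta.coef.max = 30) {
  # Drop fits whose coefficient is implausibly large (poor convergence).
  results <- results[abs(results$beta.coef) < beta.coef.max, ]
  if (is.null(p.value.threshold)) {
    # No threshold: order by increasing p-value.
    return(results[order(results$p.value), ])
  }
  # Threshold given: keep significant regions, order by decreasing |beta|.
  results <- results[results$p.value < p.value.threshold, ]
  results[order(abs(results$beta.coef), decreasing = TRUE), ]
}

# Five regions taken from the 'Reg.Log' example output.
reg.log <- data.frame(
  chr = "chr1",
  start = c(83588, 82959, 84100, 86359, 84722),
  end = c(83698, 83254, 84338, 86505, 84935),
  p.value = c(1.519918e-05, 1.100881e-04, 2.882855e-04,
              3.656493e-03, 8.574373e-03),
  beta.coef = c(2.19688537, 1.57739904, 1.65822808,
                2.31792789, 2.81693813)
)

by.beta <- order_regions(reg.log, p.value.threshold = 0.001)
by.p <- order_regions(reg.log)
print(by.beta)
```

With a threshold of 0.001 the three remaining regions come out in the same order as in the last example below (start 83588, 84100, 82959).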

Value

A list. Each element holds the results of one of the given methods, with the most interesting regions at the top

Details

In the regression methods, the number of successes is the number of methylated cytosines and the number of failures is the number of unmethylated cytosines. The output of these methods is the beta coefficient of the indicator variable from the regression model and the p-value of the Wald test on that indicator variable. The indicator variable equals 1 if the observation comes from the x group of prob and 0 otherwise. These methods order regions by the beta coefficient or by the p-value of the grouping variable. In the mixed models the only explanatory variable is the indicator variable, and the positions on the chromosome are random effects. In standard logistic regression the positions on the chromosome are additional explanatory variables.
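The standard logistic regression described above can be sketched for a single region with base R's glm. The data frame toy and the column is.x are made up for illustration, and the model formula is an assumption drawn from the description rather than the package's actual code. The mixed-effects variants would need a mixed-model package and are not sketched here.

```r
# Toy counts for one region: three positions observed in groups x and y.
toy <- data.frame(
  poz = rep(c(100, 120, 150), times = 2),
  prob = rep(c("x", "y"), each = 3),
  meth = c(18, 20, 15, 8, 9, 6),    # methylated cytosines (successes)
  unmeth = c(4, 5, 6, 14, 12, 15)   # unmethylated cytosines (failures)
)

# Indicator variable: 1 for observations from x, 0 otherwise.
toy$is.x <- as.integer(toy$prob == "x")

# Binomial model with the indicator and the position as explanatory variables.
fit <- glm(cbind(meth, unmeth) ~ is.x + factor(poz),
           data = toy, family = binomial)

# Wald test on the indicator: estimate and p-value from the coefficient table.
wald <- coef(summary(fit))["is.x", ]
beta.coef <- wald[["Estimate"]]
p.value <- wald[["Pr(>|z|)"]]
print(c(beta.coef = beta.coef, p.value = p.value))
```

A positive beta.coef here means that the x group is more methylated than the y group in this region.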

Examples

library(dplyr)  # provides %>%, filter() and select()

data('schizophrenia')
control <- schizophrenia %>% filter(category == 'control') %>%
  dplyr::select(-category)
disease <- schizophrenia %>% filter(category == 'disease') %>%
  dplyr::select(-category)
data <- preprocessing(control, disease)
data.tiles <- create_tiles_max_gap(data, gaps.length = 100)
data.tiles.small <- data.tiles %>% filter(tiles < 30)

# finding DMR by all methods with sorting on p.values
find_DMR(data.tiles.small, c('Wilcoxon', 'Ttest', 'KS', 'Reg.Log', 'Reg.Mixed',
                             'Reg.Corr.Mixed'))
#> [1] "Started: Finding DMR by Wilcoxon test"
#> [1] "Started: Finding DMR by t-test"
#> [1] "Started: Finding DMR by KS test"
#> [1] "Started: Finding DMR by Logistic Regression"
#> [1] "Started: Finding DMR by Logistic Regression with Mixed Effects"
#> [1] "Started: Finding DMR by Logistic Regression with Mixed Effects with Correlation Matrix"
#> $Wilcoxon
#> # A tibble: 29 x 4
#>      chr  start    end    p.value
#>    <chr>  <dbl>  <dbl>      <dbl>
#>  1  chr1  84100  84338 0.01073342
#>  2  chr1  82959  83254 0.05333685
#>  3  chr1  86359  86505 0.05333685
#>  4  chr1  83588  83698 0.05447404
#>  5  chr1  81698  81863 0.07186064
#>  6  chr1  84722  84935 0.09751254
#>  7  chr1 220557 220666 0.12500000
#>  8  chr1 221169 221365 0.12500000
#>  9  chr1  85472  85575 0.14891467
#> 10  chr1  81984  82025 0.17356817
#> # ... with 19 more rows
#>
#> $Ttest
#> # A tibble: 29 x 4
#>      chr  start    end      p.value
#>    <chr>  <dbl>  <dbl>        <dbl>
#>  1  chr1  83588  83698 3.048765e-06
#>  2  chr1  86359  86505 2.378342e-04
#>  3  chr1  82959  83254 3.665470e-04
#>  4  chr1 221169 221365 4.016274e-04
#>  5  chr1  81984  82025 6.787773e-04
#>  6  chr1  83880  83980 7.896496e-04
#>  7  chr1  84722  84935 5.759169e-03
#>  8  chr1  87096  87175 4.736525e-02
#>  9  chr1 220557 220666 5.260321e-02
#> 10  chr1 220872 220952 4.013260e-01
#> # ... with 19 more rows
#>
#> $KS
#> # A tibble: 29 x 4
#>      chr  start    end     p.value
#>    <chr>  <dbl>  <dbl>       <dbl>
#>  1  chr1  84100  84338 0.001823764
#>  2  chr1  82959  83254 0.013475890
#>  3  chr1  83588  83698 0.013475890
#>  4  chr1  86359  86505 0.013475890
#>  5  chr1  81698  81863 0.036631053
#>  6  chr1  84722  84935 0.036631053
#>  7  chr1 221169 221365 0.036631053
#>  8  chr1  81984  82025 0.099561848
#>  9  chr1  83880  83980 0.099561848
#> 10  chr1  85472  85575 0.099561848
#> # ... with 19 more rows
#>
#> $Reg.Log
#> # A tibble: 20 x 5
#>      chr  start    end      p.value  beta.coef
#>    <chr>  <dbl>  <dbl>        <dbl>      <dbl>
#>  1  chr1  83588  83698 1.519918e-05 2.19688537
#>  2  chr1  82959  83254 1.100881e-04 1.57739904
#>  3  chr1  84100  84338 2.882855e-04 1.65822808
#>  4  chr1  86359  86505 3.656493e-03 2.31792789
#>  5  chr1  84722  84935 8.574373e-03 2.81693813
#>  6  chr1  83880  83980 1.784013e-02 1.65160526
#>  7  chr1 221169 221365 3.023693e-02 0.89940934
#>  8  chr1 222086 222090 3.699038e-02 1.25276297
#>  9  chr1  82603  82625 5.504190e-02 1.05605267
#> 10  chr1  85472  85575 8.465175e-02 1.28093385
#> 11  chr1 221803 221854 8.517635e-02 0.91629073
#> 12  chr1  87096  87175 9.549951e-02 1.58870244
#> 13  chr1  86661  86668 1.655336e-01 1.26851133
#> 14  chr1  85292  85316 1.913045e-01 1.18562367
#> 15  chr1  81698  81863 2.679230e-01 0.31153329
#> 16  chr1  81984  82025 3.180319e-01 0.32472153
#> 17  chr1 220872 220952 3.946347e-01 0.54692060
#> 18  chr1  81412  81442 4.334074e-01 0.31153329
#> 19  chr1 220557 220666 6.029379e-01 0.21107465
#> 20  chr1 219816 219880 9.372464e-01 0.04864693
#>
#> $Reg.Mixed
#> # A tibble: 11 x 5
#>      chr  start    end      p.value  beta.coef
#>    <chr>  <dbl>  <dbl>        <dbl>      <dbl>
#>  1  chr1  83588  83698 1.521418e-05 2.19565841
#>  2  chr1  82959  83254 1.181530e-04 1.56687830
#>  3  chr1  86359  86505 3.811671e-03 2.30258509
#>  4  chr1  84722  84935 8.732648e-03 2.80748410
#>  5  chr1  83880  83980 1.788381e-02 1.65068087
#>  6  chr1 221169 221365 2.998610e-02 0.89794159
#>  7  chr1  87096  87175 9.556622e-02 1.58696506
#>  8  chr1  81984  82025 3.181493e-01 0.32463778
#>  9  chr1 220872 220952 3.825471e-01 0.55961579
#> 10  chr1 220557 220666 5.744138e-01 0.22665737
#> 11  chr1 219816 219880 9.370590e-01 0.04879016
#>
#> $Reg.Corr.Mixed
#> # A tibble: 12 x 5
#>      chr  start    end     p.value  beta.coef
#>    <chr>  <dbl>  <dbl>       <dbl>      <dbl>
#>  1  chr1  82959  83254 0.007918476  2.2600171
#>  2  chr1  83588  83698 0.010380179  3.2544319
#>  3  chr1  86359  86505 0.036781488  5.3136593
#>  4  chr1  87096  87175 0.052391053  2.4186572
#>  5  chr1  81698  81863 0.065166268  0.8531988
#>  6  chr1 220557 220666 0.112228905 -0.4947742
#>  7  chr1  85472  85575 0.154777532  2.6553906
#>  8  chr1  83880  83980 0.188767710  2.6947221
#>  9  chr1  81984  82025 0.260540995  1.0858214
#> 10  chr1 219816 219880 0.343682945 -0.2581625
#> 11  chr1 220872 220952 0.450486788 -0.6860944
#> 12  chr1 221169 221365 0.712882389  0.1152671
#>

# finding DMR by 'Reg.Log', 'Reg.Mixed', 'Reg.Corr.Mixed' with sorting on beta values
find_DMR(data.tiles.small, c('Reg.Log', 'Reg.Mixed', 'Reg.Corr.Mixed'),
         p.value.log.reg = 0.01, p.value.reg.mixed = 0.02,
         p.value.reg.corr.mixed = 0.03)
#> [1] "Started: Finding DMR by Logistic Regression"
#> [1] "Started: Finding DMR by Logistic Regression with Mixed Effects"
#> [1] "Started: Finding DMR by Logistic Regression with Mixed Effects with Correlation Matrix"
#> $Reg.Log
#> # A tibble: 5 x 5
#>     chr start   end      p.value beta.coef
#>   <chr> <dbl> <dbl>        <dbl>     <dbl>
#> 1  chr1 84722 84935 8.574373e-03  2.816938
#> 2  chr1 86359 86505 3.656493e-03  2.317928
#> 3  chr1 83588 83698 1.519918e-05  2.196885
#> 4  chr1 84100 84338 2.882855e-04  1.658228
#> 5  chr1 82959 83254 1.100881e-04  1.577399
#>
#> $Reg.Mixed
#> # A tibble: 5 x 5
#>     chr start   end      p.value beta.coef
#>   <chr> <dbl> <dbl>        <dbl>     <dbl>
#> 1  chr1 84722 84935 8.732648e-03  2.807484
#> 2  chr1 86359 86505 3.811671e-03  2.302585
#> 3  chr1 83588 83698 1.521418e-05  2.195658
#> 4  chr1 83880 83980 1.788381e-02  1.650681
#> 5  chr1 82959 83254 1.181530e-04  1.566878
#>
#> $Reg.Corr.Mixed
#> # A tibble: 2 x 5
#>     chr start   end     p.value beta.coef
#>   <chr> <dbl> <dbl>       <dbl>     <dbl>
#> 1  chr1 83588 83698 0.010380179  3.254432
#> 2  chr1 82959 83254 0.007918476  2.260017
#>

# finding DMR only by 'Reg.Log' with sorting on beta values and 'Wilcoxon' with sorting on p.values
find_DMR(data.tiles.small, c('Wilcoxon', 'Reg.Log'), p.value.log.reg = 0.001)
#> [1] "Started: Finding DMR by Wilcoxon test"
#> [1] "Started: Finding DMR by Logistic Regression"
#> $Wilcoxon
#> # A tibble: 29 x 4
#>      chr  start    end    p.value
#>    <chr>  <dbl>  <dbl>      <dbl>
#>  1  chr1  84100  84338 0.01073342
#>  2  chr1  82959  83254 0.05333685
#>  3  chr1  86359  86505 0.05333685
#>  4  chr1  83588  83698 0.05447404
#>  5  chr1  81698  81863 0.07186064
#>  6  chr1  84722  84935 0.09751254
#>  7  chr1 220557 220666 0.12500000
#>  8  chr1 221169 221365 0.12500000
#>  9  chr1  85472  85575 0.14891467
#> 10  chr1  81984  82025 0.17356817
#> # ... with 19 more rows
#>
#> $Reg.Log
#> # A tibble: 3 x 5
#>     chr start   end      p.value beta.coef
#>   <chr> <dbl> <dbl>        <dbl>     <dbl>
#> 1  chr1 83588 83698 1.519918e-05  2.196885
#> 2  chr1 84100 84338 2.882855e-04  1.658228
#> 3  chr1 82959 83254 1.100881e-04  1.577399
#>