Median normalization
med.norm.Rd
Normalize a training dataset so that every array shares the same median, and store the median from the training dataset as the reference for frozen median normalization of a test dataset. Two other options are also available: normalize only a training dataset without frozen normalizing a test dataset, or vice versa.
Arguments
- train
the training dataset to be median normalized. The dataset must have rows as probes and columns as samples. This can be left unspecified if ref.dis is supplied to frozen normalize a test set.
- test
the test dataset to be frozen median normalized. The dataset must have rows as probes and columns as samples. The number of rows must equal the number of rows in the training set. By default, the test set is not specified (test = NULL) and no frozen normalization is performed.
- ref.dis
the reference distribution used to frozen median normalize a test set against a previously normalized training set. This is required when train is not supplied. By default, ref.dis = NULL.
Value
a list of two datasets and one reference distribution:
- train.mn
the normalized training set, if training set is specified
- test.fmn
the frozen normalized test set, if test set is specified
- ref.dis
the reference distribution
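The three modes described above can be illustrated with a minimal base R sketch. It is not the package's source code: the function name med_norm_sketch and the toy matrices are hypothetical, and it assumes that the reference is the overall median of the training matrix and that each array is normalized by an additive shift of its column median onto that reference.

```r
# Hypothetical sketch of median / frozen median normalization.
# Assumption: the reference is a single number (the overall training median)
# and each column is shifted additively so its median matches the reference.
med_norm_sketch <- function(train = NULL, test = NULL, ref.dis = NULL) {
  if (is.null(train) && is.null(ref.dis)) {
    stop("supply either 'train' or 'ref.dis'")
  }
  if (!is.null(train) && !is.null(test)) {
    stopifnot(nrow(train) == nrow(test))
  }

  # Shift every column (array) so that its median equals 'target'.
  shift_to <- function(x, target) {
    sweep(x, 2, apply(x, 2, median) - target, FUN = "-")
  }

  train.mn <- NULL
  if (!is.null(train)) {
    ref.dis <- median(train)              # reference taken from training data
    train.mn <- shift_to(train, ref.dis)
  }

  # Frozen step: the test set is shifted onto the stored reference only.
  test.fmn <- if (is.null(test)) NULL else shift_to(test, ref.dis)

  list(train.mn = train.mn, test.fmn = test.fmn, ref.dis = ref.dis)
}

# Toy data: rows as probes, columns as samples.
set.seed(1)
toy.train <- matrix(rnorm(40, mean = 8), nrow = 10)
toy.test  <- matrix(rnorm(30, mean = 6), nrow = 10)

res <- med_norm_sketch(train = toy.train, test = toy.test)
apply(res$train.mn, 2, median)  # every column median matches res$ref.dis
apply(res$test.fmn, 2, median)  # test columns match the same reference
```

Because the test set is only ever shifted onto the stored reference, calling the sketch with test and ref.dis alone gives the same test.fmn as the combined call, which is the point of keeping ref.dis in the returned list.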
Examples
set.seed(101)
group.id <- substr(colnames(nuhdata.pl), 7, 7)
train.ind <- colnames(nuhdata.pl)[c(sample(which(group.id == "E"), size = 64),
                                    sample(which(group.id == "V"), size = 64))]
train.dat <- nuhdata.pl[, train.ind]
test.dat <- nuhdata.pl[, !colnames(nuhdata.pl) %in% train.ind]
# normalize only training set
data.mn <- med.norm(train = train.dat)
str(data.mn)
#> List of 3
#> $ train.mn: num [1:1810, 1:128] 8.5 8.39 8.14 8.47 8.45 ...
#> ..- attr(*, "dimnames")=List of 2
#> .. ..$ : chr [1:1810] "A_25_P00011991" "A_25_P00011991" "A_25_P00011991" "A_25_P00011991" ...
#> .. ..$ : chr [1:128] "GL5140E" "JB5556E" "JB4783E" "GL4527E" ...
#> $ test.fmn: NULL
#> $ ref.dis : num 5.43
# normalize training set and frozen normalize test set
data.mn <- med.norm(train = train.dat, test = test.dat)
str(data.mn)
#> List of 3
#> $ train.mn: num [1:1810, 1:128] 8.5 8.39 8.14 8.47 8.45 ...
#> ..- attr(*, "dimnames")=List of 2
#> .. ..$ : chr [1:1810] "A_25_P00011991" "A_25_P00011991" "A_25_P00011991" "A_25_P00011991" ...
#> .. ..$ : chr [1:128] "GL5140E" "JB5556E" "JB4783E" "GL4527E" ...
#> $ test.fmn: num [1:1810, 1:64] 6.48 6.92 7.21 5.92 6.14 ...
#> ..- attr(*, "dimnames")=List of 2
#> .. ..$ : chr [1:1810] "A_25_P00011991" "A_25_P00011991" "A_25_P00011991" "A_25_P00011991" ...
#> .. ..$ : chr [1:64] "JB4166E" "JB5669E" "JB4112E" "JB5847E" ...
#> $ ref.dis : num 5.43
# frozen normalize test set with reference distribution
ref <- med.norm(train = train.dat)$ref.dis
data.mn <- med.norm(test = test.dat, ref.dis = ref)
str(data.mn)
#> List of 3
#> $ train.mn: NULL
#> $ test.fmn: num [1:1810, 1:64] 6.48 6.92 7.21 5.92 6.14 ...
#> ..- attr(*, "dimnames")=List of 2
#> .. ..$ : chr [1:1810] "A_25_P00011991" "A_25_P00011991" "A_25_P00011991" "A_25_P00011991" ...
#> .. ..$ : chr [1:64] "JB4166E" "JB5669E" "JB4112E" "JB5847E" ...
#> $ ref.dis : num 5.43