Two methods of data normalization
Why normalize?
When drawing heat maps or fitting models such as linear regression or neural networks, we often need to normalize the data first. The reason is that different variables can sit at very different magnitudes, for example one variable around 10,000 and another around 100. Because the 10,000-scale variable dominates, changes or sample differences in the 100-scale variable become insignificant: they will not show up on the heat map, and they contribute little to model training. Therefore, we normalize the variables so that they are all on the same scale.
Normalization methods
There are two common methods: z-score normalization and 0-1 (min-max) normalization.
- Z-score normalization: subtract the variable's mean from each value, then divide by the variable's standard deviation. This is the z-score of a normal distribution, so the normalized values are centered at 0, with negative and positive values on either side.
- 0-1 normalization: subtract the variable's minimum from each value, then divide by the variable's range (maximum minus minimum), so the normalized values fall between 0 and 1.
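The two formulas above can be sketched in a few lines of NumPy. This is a minimal illustration, not code from the original post; the toy matrix and its values are made up, and the sample standard deviation (`ddof=1`) is assumed so the result matches R's `sd()`.

```python
import numpy as np

# Toy data: one column around 10,000 and one around 100 (illustrative values)
x = np.array([[9800.0,  90.0],
              [10100.0, 110.0],
              [10100.0, 100.0]])

# z-score normalization: subtract each column's mean, divide by its
# sample standard deviation (ddof=1, matching R's sd())
z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)

# 0-1 normalization: subtract each column's minimum, divide by its range
m = (x - x.min(axis=0)) / (x.max(axis=0) - x.min(axis=0))
```

After z-score normalization each column has mean 0 and standard deviation 1; after 0-1 normalization every value lies in [0, 1], with the minimum mapped to 0 and the maximum to 1.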
Code
R language code implementation
```r
# Note: the input x should be a numeric matrix or a data.frame of numbers

# 0-1 normalization
scale01 <- function(x, low = min(x), high = max(x)) {
  (x - low) / (high - low)
}

# z-score normalization
# The original script defined both variants under the name `scale`, so the
# second definition overwrote the first (and both masked base::scale);
# they are renamed here to keep both usable.

# Normalize each column
scale_cols <- function(x) {
  rm <- colMeans(x, na.rm = TRUE)   # column means
  x <- sweep(x, 2, rm)              # center each column
  sx <- apply(x, 2, sd, na.rm = TRUE)  # column standard deviations
  sweep(x, 2, sx, "/")              # scale each column
}

# Normalize each row
scale_rows <- function(x) {
  rm <- rowMeans(x, na.rm = TRUE)   # row means
  x <- sweep(x, 1, rm)              # center each row
  sx <- apply(x, 1, sd, na.rm = TRUE)  # row standard deviations
  sweep(x, 1, sx, "/")              # scale each row
}
```
Original code
https://github.com/DavidQuigley/WCDT_WGBS/blob/master/scripts/2019_05_15_WGBS_figure_1B.R