Everything about Normalization in Feature Engineering — Part 1

Newt Tan
3 min read · Dec 5, 2021
Image source: https://bigdataanalyticsnews.com/pros-cons-of-feature-engineering/

Have you ever been confused about how and when to use normalization, whether in an industrial product or a research environment?

Have you ever explored how a different processing order can influence the final prediction result?

Have you ever tried to figure out which preprocessing method you should use?

Should you apply different feature engineering to every single feature, or the same method to all features?

Here I will share some empirical evidence to suggest the right approaches to these questions. I will split the series into three parts and explore them separately. The problems I am going to solve include:

  • How different normalization methods influence the final result — part 1
  • The potential effect of different processing orders, especially when to apply the scaler — part 2
  • Whether to apply a different scaler to each feature or the same scaler to all features, and the difference between feature engineering in research and production environments — part 3

In this article, I will tackle the first question with a full engineering process. If you want to know more detail, please check the online course.

Normalization (or standardization) is a concept widely used in feature engineering for machine learning. It controls the range of a feature so that no single feature biases the prediction too heavily.
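To make that concrete, here is a minimal sketch using scikit-learn's MinMaxScaler (the numbers are made up for illustration, not taken from the Kaggle data):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Two features on very different scales: living area (sq. ft.) and room count.
X = np.array([[1710.0, 8.0],
              [1262.0, 6.0],
              [2198.0, 9.0]])

# MinMaxScaler maps each column into [0, 1], so no feature dominates
# a distance- or gradient-based model purely because of its units.
print(MinMaxScaler().fit_transform(X))
```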

I used the Kaggle house-prices data to predict the SalePrice of houses.

I will skip the feature engineering steps such as replacing missing values, encoding categorical data, and feature selection in this article, and focus only on the differences between the scaler methods.

In total, four main scaler methods are tested in this article:

[Figure: the four scaler methods]
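The original figure is not reproduced here. Judging from the discussion below, which compares the Normalizer against the "first three" methods, the four are presumably scikit-learn's MinMaxScaler, StandardScaler, RobustScaler, and Normalizer. A quick sketch of what each one does:

```python
from sklearn.preprocessing import (MinMaxScaler, StandardScaler,
                                   RobustScaler, Normalizer)

# Presumed lineup of the four scalers (an assumption based on the
# discussion below, which singles out the Normalizer):
scalers = {
    "MinMaxScaler": MinMaxScaler(),      # rescales each feature to [0, 1]
    "StandardScaler": StandardScaler(),  # zero mean, unit variance per feature
    "RobustScaler": RobustScaler(),      # median/IQR based, robust to outliers
    "Normalizer": Normalizer(),          # scales each *row* (sample) to unit norm
}

for name, scaler in scalers.items():
    print(name)
    print(scaler.fit_transform([[1.0, 200.0], [3.0, 400.0]]))
```

Note that the first three operate column-wise (per feature), while the Normalizer operates row-wise (per sample), which is a fundamentally different transformation.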

I measured the prediction results as well as the MSE (mean squared error) for each method. The MSE evaluates the error between the predicted values and the real values.

[Figure: prediction error by scaler method]
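As a rough sketch of how such a comparison can be run (the synthetic data and the Ridge model here are my own stand-ins, since the article does not spell out its exact setup):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import (MinMaxScaler, StandardScaler,
                                   RobustScaler, Normalizer)

# Stand-in regression data; in the article this would be the Kaggle
# house-price features after the skipped preprocessing steps.
X, y = make_regression(n_samples=500, n_features=20, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for scaler in [MinMaxScaler(), StandardScaler(), RobustScaler(), Normalizer()]:
    X_tr = scaler.fit_transform(X_train)  # fit scaling statistics on train only
    X_te = scaler.transform(X_test)       # reuse the same statistics on test
    model = Ridge(alpha=1.0).fit(X_tr, y_train)
    mse = mean_squared_error(y_test, model.predict(X_te))
    print(f"{type(scaler).__name__}: MSE = {mse:.2f}")
```

Note that the scaler is fit on the training set only and then reused on the test set; when exactly to fit the scaler is the processing-order question I will return to in part 2.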

We can see from the results that there is not much gap between the final predictions; the main difference lies in the MSE. At the same time, the first three methods are clearly better than the Normalizer. This also confirms that in real production it does not matter much which of the first three you use. I did not cover clustering tasks in this article, but you can try another dataset for classification if you want.

There is one sentence from the book Large Scale Machine Learning with Python: “converting in the range [0,1] works particularly well if you are dealing with a sparse matrix and most of your values are zero.”
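One way to act on that advice while keeping the matrix sparse is scikit-learn's MaxAbsScaler, which for nonnegative data is equivalent to scaling into [0, 1]. This scaler is my own suggestion here, not one of the four tested above:

```python
import scipy.sparse as sp
from sklearn.preprocessing import MaxAbsScaler

# Mostly-zero, nonnegative data, e.g. word counts.
X_sparse = sp.csr_matrix([[0.0, 3.0, 0.0],
                          [0.0, 0.0, 5.0],
                          [2.0, 0.0, 0.0]])

# MaxAbsScaler divides each column by its maximum absolute value, so
# nonnegative data lands in [0, 1] while the zeros, and hence the
# sparsity pattern, stay untouched (MinMaxScaler rejects sparse input).
X_scaled = MaxAbsScaler().fit_transform(X_sparse)
print(X_scaled.toarray())
```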

The code has been published in this repo; you can take a look and play with it yourself.

Thanks for reading. I will publish the article exploring the second question soon.

References

Large Scale Machine Learning with Python, Bastiaan Sjardin (2016)

Deployment of Machine Learning Models (online course)
