In data mining tasks is the attributes normalizing usually a prerequisite in order to get a meaningful result. For example, if you want to calculate the euclidean distance, the attributes with relative large numerical values will have more influence of the result than the attributes with relative smaller values.

Formal of normalization:

normalized_value = (value – min)/(max – min), where

min: the minimum of the attribute

max: the maximum of the attribute

I wrote the 2 Versions of the normalization function, the first one the the map and lambda function, the second one use the numpy.tile() function and calculate the elements with the whole matrix. The first one is more concise and need less memory compare to the second ones.

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.