normalize vectors

In data mining tasks, normalizing the attributes is usually a prerequisite for getting a meaningful result. For example, if you want to calculate the Euclidean distance, the attributes with relatively large numerical values will have more influence on the result than the attributes with relatively small values.
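
To make the effect concrete, here is a small sketch with two made-up records (the attributes "age" and "income" and all of the numbers are assumptions for illustration only). It computes the Euclidean distance on the raw values and shows how much of the squared distance comes from the large-valued attribute.

```python
import numpy as np

# Hypothetical records: [age in years, yearly income]
record_a = np.array([25.0, 50000.0])
record_b = np.array([60.0, 51000.0])

diff = record_a - record_b
squared = diff ** 2

# Euclidean distance on the raw, unnormalized attributes
raw_distance = np.sqrt(squared.sum())

# Fraction of the squared distance contributed by the income attribute
income_share = squared[1] / squared.sum()

print("raw distance:", raw_distance)
print("share of income in squared distance:", income_share)
```

In this made-up case the income difference accounts for almost all of the distance, although the age difference (35 years) is arguably the more important one.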

Formula for normalization:

normalized_value = (value - min) / (max - min), where

min: the minimum of the attribute

max: the maximum of the attribute
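
As a quick check of the formula, the following sketch applies it to a single attribute with three made-up values (the numbers are an assumption, chosen only so that the result is easy to verify by hand).

```python
import numpy as np

# One attribute (column) with three hypothetical values
values = np.array([2.0, 4.0, 10.0])

attr_min = values.min()  # 2.0
attr_max = values.max()  # 10.0

# normalized_value = (value - min) / (max - min)
normalized = (values - attr_min) / (attr_max - attr_min)

print(normalized)  # the three values become 0.0, 0.25 and 1.0
```

The minimum always maps to 0 and the maximum to 1; everything else lands in between.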

I wrote two versions of the normalization function. The first one uses map() and a lambda function; the second one uses numpy.tile() and calculates on the whole matrix at once. The first one is more concise and needs less memory than the second one.

	import numpy as np

	def autoNorm(dataMatrix):
	    '''
	    Normalize the data matrix column by column.
	    (Method: use map() and a lambda)

	    return:
	    normed data matrix: np.ndarray, values are in [0, 1]
	    '''
	    minVals = dataMatrix.min(0)  # Conny: in dataMatrix.min(0), 0 means per column, 1 means per row
	    maxVals = dataMatrix.max(0)
	    ranges = maxVals - minVals   # per-column value range (max - min)

	    # Conny: map() does not return a numpy array (a list in Python 2, an iterator in Python 3)
	    normedRows = list(map(lambda record: (record - minVals) / ranges, dataMatrix))

	    return np.array(normedRows)  # Conny: np.array() turns the list of rows into an ndarray


	import numpy as np

	def autoNorm2(dataMatrix):
	    '''
	    Normalize the data matrix column by column.
	    (Method: use numpy.tile() and calculate on the whole matrix)

	    return:
	    normed data matrix: np.ndarray, values are in [0, 1]
	    '''
	    minVals = dataMatrix.min(0)
	    maxVals = dataMatrix.max(0)
	    ranges = maxVals - minVals

	    numberOfRecords = dataMatrix.shape[0]

	    # Conny: numpy.tile(vector, (numberOfRecords, 1)) stacks the vector numberOfRecords times as rows
	    normDataSet = dataMatrix - np.tile(minVals, (numberOfRecords, 1))

	    normDataSet = normDataSet / np.tile(ranges, (numberOfRecords, 1))

	    return normDataSet
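
As a side note, NumPy broadcasting can do the same calculation without building the tiled matrices. The sketch below is a possible third variant, not one of the two versions described above: the function name auto_norm_broadcast is hypothetical, and the handling of constant columns (range 0 is replaced by 1 to avoid a division by zero) is an assumption that the two functions above do not make.

```python
import numpy as np

def auto_norm_broadcast(data_matrix):
    """Min-max normalize each column to [0, 1] using broadcasting (hypothetical helper)."""
    data = np.asarray(data_matrix, dtype=float)
    min_vals = data.min(axis=0)           # per-column minimum
    ranges = data.max(axis=0) - min_vals  # per-column range (max - min)

    # Assumption: a constant column has range 0; divide by 1 instead so it maps to 0
    safe_ranges = np.where(ranges == 0, 1.0, ranges)

    # The 1-D vectors are broadcast across all rows, so no np.tile() is needed
    return (data - min_vals) / safe_ranges

# Made-up data: three records, three attributes (the last one is constant)
sample = np.array([[1.0, 10.0, 5.0],
                   [3.0, 20.0, 5.0],
                   [5.0, 40.0, 5.0]])
normed = auto_norm_broadcast(sample)
print(normed)
```

For columns that are not constant, this should give the same values as the tile-based version, because subtracting or dividing by a row vector applies it to every record.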


Codehamster
