Machine Learning: Linear Regression

Question:
A retail company “ABC Private Limited” wants to understand customer purchase behaviour (specifically, purchase amount) against various products of different categories. They have shared a purchase summary of various customers for selected high-volume products from last month.
The data set also contains customer demographics (age, gender, marital status, city_type, stay_in_current_city), product details (product_id and product category) and total purchase_amount from last month.
Now, they want to build a model to predict the purchase amount of a customer against various products, which will help them create personalized offers for customers against different products.

Explanation:

Variable                      Definition
User_ID                       User ID
Product_ID                    Product ID
Gender                        Sex of user
Age                           Age in bins
Occupation                    Occupation (masked)
City_Category                 Category of the city (A, B, C)
Stay_In_Current_City_Years    Number of years of stay in current city
Marital_Status                Marital status
Product_Category_1            Product category (masked)
Product_Category_2            Product may also belong to another category (masked)
Product_Category_3            Product may also belong to another category (masked)
Purchase                      Purchase amount (target variable)
Your model performance will be evaluated on the basis of your prediction of the purchase amount for the test data (test.csv), which contains similar data-points as train except for their purchase amount.

Test_file
Train_file

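Approach: read the training data with pandas, encode the categorical columns as numbers and normalize the features, fit sklearn's linear regression on them, then load the test data and apply the same encoding and normalization, and finally predict Purchase and write the results to a file.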
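The encode-then-normalize step can be sketched on a toy example. The two small frames and their values are made up for illustration and only stand in for train.csv and test.csv. The point of the sketch is that the scaler learns its minimum and maximum from the training rows and is then reused, without refitting, on the test rows.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Toy stand-ins for train.csv and test.csv (values are made up).
train = pd.DataFrame({"Age": ["0-17", "26-35", "55+"], "Occupation": [1, 10, 20]})
test = pd.DataFrame({"Age": ["36-45", "55+"], "Occupation": [5, 20]})

# Ordered integer codes for the age bins.
age_codes = {"0-17": 0, "18-25": 1, "26-35": 2, "36-45": 3,
             "46-50": 4, "51-55": 5, "55+": 6}
for frame in (train, test):
    frame["Age"] = frame["Age"].map(age_codes)
    # Series.map yields NaN for labels missing from the dict, so check for that.
    assert frame["Age"].notna().all()

scaler = MinMaxScaler()
train_scaled = scaler.fit_transform(train)  # learns per-column min and max from train
test_scaled = scaler.transform(test)        # reuses the train min and max

print(test_scaled)
```

Because the test rows are scaled with the training minimum and maximum, a given raw value maps to the same scaled value in both sets, which is what the fitted coefficients expect.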

Answer:

import pandas as pd
from sklearn.linear_model import LinearRegression
import sklearn
import sklearn.preprocessing

df = pd.read_csv("train.csv")

# Encode categorical columns as integers (not strings)
gender_number = {'F':0,'M':1}
age_number = {'0-17':0,'18-25':1,'26-35':2,'36-45':3,'46-50':4,'51-55':5,'55+':6}
city_category_number = {'A':0,'B':1,'C':2}
# '4+' is mapped to 4 so that it no longer collides with '1'
stay_in_current_city_years_number = {'0':0,'1':1,'2':2,'3':3,'4+':4}
df['Gender'] = df['Gender'].map(gender_number)
df['Age'] = df['Age'].map(age_number)
df['Stay_In_Current_City_Years'] = df['Stay_In_Current_City_Years'].map(stay_in_current_city_years_number)
df['City_Category'] = df['City_Category'].map(city_category_number)

x = df[['Gender','Age','City_Category','Occupation','Stay_In_Current_City_Years','Marital_Status','Product_Category_1']]
scaler = sklearn.preprocessing.MinMaxScaler() # min-max normalization to [0, 1]
x_scaler = scaler.fit_transform(x)
y = df['Purchase']


model = LinearRegression()
model.fit(x_scaler,y)
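# R^2 on the training data; the bare expression would discard the value in a script
print('R^2 (train): \n',model.score(x_scaler,y))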

print('Coefficient: \n',model.coef_)
print('Intercept: \n',model.intercept_)

df_test = pd.read_csv('test.csv')
df_test['Gender'] = df_test['Gender'].map(gender_number)
df_test['Age'] = df_test['Age'].map(age_number)
df_test['Stay_In_Current_City_Years'] = df_test['Stay_In_Current_City_Years'].map(stay_in_current_city_years_number)
df_test['City_Category'] = df_test['City_Category'].map(city_category_number)
x_test = df_test[['Gender','Age','City_Category','Occupation','Stay_In_Current_City_Years','Marital_Status','Product_Category_1']]
# Reuse the scaler fitted on the training data; do not refit it on the test set
x_test_scaler = scaler.transform(x_test)
y_predicted = model.predict(x_test_scaler)

df_result = pd.DataFrame({'User_ID':df_test['User_ID'],'Product_ID':df_test['Product_ID'],'Purchase':y_predicted})
print(df_result)
# index=False keeps the pandas row index out of the result file
df_result.to_csv('result.csv', index=False)
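Since test.csv has no Purchase column, the script cannot measure how well its predictions generalize. One common way to get an estimate is to hold out part of the training data. The sketch below uses synthetic data (the feature matrix, the coefficients and the noise level are all made up), and RMSE is an assumed choice of metric, not one stated in the problem text.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the encoded features and the Purchase target.
rng = np.random.default_rng(0)
X = rng.integers(0, 7, size=(200, 3)).astype(float)
y = 5000 + 300 * X[:, 0] - 150 * X[:, 1] + rng.normal(0, 100, size=200)

# Keep 20% of the labelled rows aside for validation.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Fit the scaler and the model on the training part only.
scaler = MinMaxScaler().fit(X_train)
model = LinearRegression().fit(scaler.transform(X_train), y_train)

# Score on rows the model has not seen.
pred = model.predict(scaler.transform(X_val))
rmse = np.sqrt(mean_squared_error(y_val, pred))
print("validation RMSE:", rmse)
```

With the real data, X and y would be the encoded feature columns and df['Purchase'] from the script above.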

Link: Black Friday – Like I already said – No amount of theory can beat practice. Here is a regression problem that you can try your hand at for a deeper understanding.

Author: MrHook
Link: https://bigjar.github.io/2018/01/29/%E6%9C%BA%E5%99%A8%E5%AD%A6%E4%B9%A0%E4%B9%8B%E7%BA%BF%E6%80%A7%E5%9B%9E%E5%BD%92/
Copyright Notice: All articles in this blog are licensed under CC BY-NC-SA 4.0 unless stating additionally.