Financial Risk Control in Practice — Feature Engineering (Part 1)


Feature Engineering

Business Modeling Workflow

  • Frame the business problem as a classification or regression problem
  • Define the label to obtain y
  • Select suitable samples and match in all available information as the source of features
  • Feature engineering + model training + model evaluation and tuning (these steps may interact)
  • Produce the model report
  • Deploy and monitor

What is a feature

    In the context of machine learning, a feature is a single property, or a set of properties, used to explain a phenomenon. Once such properties are converted into some measurable form, they are called features.

    For example, suppose you have a list of students containing each student's name, hours studied, IQ, and total score on previous exams. A new student arrives: you know their hours studied and IQ, but their exam score is missing, and you need to estimate the score they are likely to get.

    Here you would build a model that estimates the missing score from IQ and study_hours, so IQ and study_hours become the features of that model.
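    As a minimal sketch of the student example above (the toy data and the choice of LinearRegression are illustrative assumptions, not part of the course material), one way to estimate the missing score from IQ and study_hours:

    # Toy example: predict a missing exam score from IQ and study_hours.
    # Data and column names are made up for illustration only.
    import pandas as pd
    from sklearn.linear_model import LinearRegression

    students = pd.DataFrame({
        'IQ':          [105, 98, 120, 110, 92],
        'study_hours': [10, 6, 14, 9, 5],
        'score':       [78, 60, 95, 80, 55],
    })

    model = LinearRegression()
    model.fit(students[['IQ', 'study_hours']], students['score'])

    # Estimate the score of a new student with IQ 100 who studied 8 hours
    new_student = pd.DataFrame({'IQ': [100], 'study_hours': [8]})
    print(model.predict(new_student))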

    What feature engineering may include

  • Basic feature construction

  • Data preprocessing

  • Feature derivation

  • Feature selection

    This is one complete feature engineering workflow, but not the only one; the steps may swap order, and the right sequence depends on the specific scenario.

    import pandas as pd
    import numpy as np

    df_train = pd.read_csv('/Users/zhucan/Desktop/金融风控实战/第三课资料/train.csv')
    df_train.head()

    Result:

    # Take a quick look at the data
    df_train.shape   # (891, 12)
    df_train.info()

    Result:

    df_train.describe()

    Result:

    # Box plot
    df_train.boxplot(column="Age")

    Result:

     

    import seaborn as sns
    sns.set(color_codes=True)
    np.random.seed(sum(map(ord, "distributions")))  # fix the random seed
    sns.distplot(df_train.Age, kde=True, bins=20, rug=True)  # distplot is deprecated in newer seaborn (histplot/displot)

    Result:

    set(df_train.label) #{0, 1}

    Data Preprocessing

    (1) Missing values

    The two main tools:

  • pandas: fillna
  • sklearn: Imputer (now sklearn.impute.SimpleImputer in recent versions; see the sketch below)

    df_train['Age'].sample(10)
    #299    50.0
    #408    21.0
    #158     NaN
    #672    70.0
    #172     1.0
    #447    34.0
    #86     16.0
    #824     2.0
    #527     NaN
    #327    36.0
    #Name: Age, dtype: float64

    df_train['Age'].fillna(value=df_train['Age'].mean()).sample(10)
    #115    21.000000
    #372    19.000000
    #771    48.000000
    #379    19.000000
    #855    18.000000
    #231    29.000000
    #641    24.000000
    #854    44.000000
    #303    29.699118
    #0      22.000000
    #Name: Age, dtype: float64
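    The same mean imputation can also be done with scikit-learn. A minimal sketch using SimpleImputer, the current replacement for the older Imputer class (the Age_imputed column name is just illustrative):

    # Mean-impute Age with scikit-learn
    # (sklearn.preprocessing.Imputer was removed in newer releases; use sklearn.impute.SimpleImputer)
    from sklearn.impute import SimpleImputer

    imputer = SimpleImputer(strategy='mean')
    df_train['Age_imputed'] = imputer.fit_transform(df_train[['Age']]).ravel()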

    (2) Numerical features

    Value scaling

    """取对数等变换,可以对分布做一定的缓解 可以让数值间的差异变大""" import numpy as np log_age = df_train['Age'].apply(lambda x:np.log(x)) df_train.loc[:,'log_age'] = log_age df_train.head(10)

    结果:

    """ 幅度缩放,最大最小值缩放到[0,1]区间内 """ from sklearn.preprocessing import MinMaxScaler mm_scaler = MinMaxScaler() fare_trans = mm_scaler.fit_transform(df_train[['Fare']])""" 幅度缩放,将每一列的数据标准化为正态分布 """ from sklearn.preprocessing import StandardScaler std_scaler = StandardScaler() fare_std_trans = std_scaler.fit_transform(df_train[['Fare']])""" 中位数或者四分位数去中心化数据,对异常值不敏感 """ from sklearn.preprocessing import robust_scale fare_robust_trans = robust_scale(df_train[['Fare','Age']])""" 将同一行数据规范化,前面的同一变为1以内也可以达到这样的效果 """ from sklearn.preprocessing import Normalizer normalizer = Normalizer() fare_normal_trans = normalizer.fit_transform(df_train[['Age','Fare']])

    (3) Statistics-based features

    """ 最大最小值 """ max_age = df_train['Age'].max() min_age = df_train["Age"].min()""" 分位数,极值处理,我们最粗暴的方法就是将前后1%的值替换成前后两个端点的值 """ age_quarter_01 = df_train['Age'].quantile(0.01) age_quarter_99 = df_train['Age'].quantile(0.99)""" 四则运算 """ df_train.loc[:,'family_size'] = df_train['SibSp']+df_train['Parch']+1 df_train.loc[:,'tmp'] = df_train['Age']*df_train['Pclass'] + 4*df_train['family_size']""" 多项式特征 """ from sklearn.preprocessing import PolynomialFeatures poly = PolynomialFeatures(degree=2) df_train[['SibSp','Parch']].head() poly_fea = poly.fit_transform(df_train[['SibSp','Parch']]) pd.DataFrame(poly_fea,columns = poly.get_feature_names()).head()

    (4) Discretization / binning / bucketing

    """ 等距切分 """ df_train.loc[:, 'fare_cut'] = pd.cut(df_train['Fare'], 20) df_train.head()""" 等频切分做切分,但是每一部分的人数是差不多的""" """ 通常情况都是使用等频分箱,让每个区间人数差不多""" df_train.loc[:,'fare_qcut'] = pd.qcut(df_train['Fare'], 10) df_train.head()

    结果: 

    (5) BiVar plots

    """ BiVar图是指横轴为特征升序,纵轴为badrate的变化趋势 """ """ badrate曲线 """ df_train = df_train.sort_values('Fare') alist = list(set(df_train['fare_qcut'])) badrate = {} for x in alist:a = df_train[df_train.fare_qcut == x]bad = a[a.label == 1]['label'].count()good = a[a.label == 0]['label'].count()badrate[x] = bad/(bad+good) f = zip(badrate.keys(),badrate.values()) f = sorted(f,key = lambda x : x[1],reverse = True ) badrate = pd.DataFrame(f) badrate.columns = pd.Series(['cut','badrate']) badrate = badrate.sort_values('cut') print(badrate) badrate.plot("cut","badrate",figsize=(10,4)) #.plot用于前面是dataframe,series

    结果:

    Equal-frequency binning is the usual choice; equal-width binning is rarely used because it can leave the samples very unevenly distributed across bins.

    Typically 5-6 bins are used, chosen so that the badrate curve goes from merely non-strictly increasing to strictly monotonic; the bins should also hold reasonably balanced shares of the samples.

    When reading a BiVar plot: (1) the trend should be explainable by the business; (2) if the curve is too flat (a "zodiac sign" kind of variable), drop the feature; (3) apply coarse binning so that the curve becomes strictly monotonic. A minimal sketch of such a monotonicity check follows below.
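    A minimal sketch of the coarse-binning monotonicity check, assuming the df_train with Fare and label columns used above (the bin count of 5 and the fare_qcut5/badrate5 names are illustrative only):

    # Re-bin Fare into 5 equal-frequency bins and check whether the resulting
    # bad-rate curve is strictly monotonic across the bins.
    df_train.loc[:, 'fare_qcut5'] = pd.qcut(df_train['Fare'], 5)
    badrate5 = df_train.groupby('fare_qcut5')['label'].mean().sort_index()
    print(badrate5)

    diffs = badrate5.diff().dropna()
    strictly_monotonic = (diffs > 0).all() or (diffs < 0).all()
    print('strictly monotonic:', strictly_monotonic)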

    (6) One-hot encoding

    """ OneHot encoding/独热向量编码 """ """ 一般像男、女这种二分类categories类型的数据采取独热向量编码, 转化为0、1主要用到 pd.get_dummies """ fare_qcut_oht = pd.get_dummies(df_train[['fare_qcut']]) fare_qcut_oht.head() embarked_oht = pd.get_dummies(df_train[['Embarked']]) embarked_oht.head()

    结果: 

    One-hot encoding can blow up the dimensionality; binning first and then one-hot encoding keeps it under control, as sketched below.

    Binning loses some information, but it buys stability and robustness.
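    A minimal sketch of bin-then-encode, assuming the df_train used above (the choice of Age and of 5 bins is just for illustration):

    # Bin Age into 5 equal-frequency buckets first, then one-hot encode the buckets:
    # 5 dummy columns instead of one column per distinct age value.
    age_bins = pd.qcut(df_train['Age'], 5)
    age_oht = pd.get_dummies(age_bins, prefix='age_bin')
    print(age_oht.shape)
    age_oht.head()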

    (7) Date/time features

    # Date / time handling
    car_sales = pd.read_csv('/Users/zhucan/Desktop/金融风控实战/第三课资料/car_data.csv')
    print(car_sales.head())
    car_sales.loc[:, 'date'] = pd.to_datetime(car_sales['date_t'])
    print(car_sales.head())

    Result:

    car_sales.info()  # date_t is originally a string; the new column is datetime

    Result:

    """ 取出关键时间信息 """ """ 月份 """ car_sales.loc[:,'month'] = car_sales['date'].dt.month """ 几号 """ car_sales.loc[:,'dom'] = car_sales['date'].dt.day """ 一年当中第几天 """ car_sales.loc[:,'doy'] = car_sales['date'].dt.dayofyear """ 星期几 """ car_sales.loc[:,'dow'] = car_sales['date'].dt.dayofweek print(car_sales.head())

    结果:

    (8) Text features

    """ 词袋模型 """ from sklearn.feature_extraction.text import CountVectorizer vectorizer = CountVectorizer() corpus = ['This is a very good class','students are very very very good','This is the third sentence','Is this the last doc','PS teacher Mei is very very handsome' ] X = vectorizer.fit_transform(corpus) print(vectorizer.get_feature_names()) X.toarray()

    结果:
    可以得到样本的词向量

    # Unigrams, bigrams and trigrams
    vec = CountVectorizer(ngram_range=(1, 3))
    X_ngram = vec.fit_transform(corpus)
    print(vec.get_feature_names())
    X_ngram.toarray()

    Result:

    """ TF-IDF """ from sklearn.feature_extraction.text import TfidfVectorizer tfidf_vec = TfidfVectorizer() tfidf_X = tfidf_vec.fit_transform(corpus) print(tfidf_vec.get_feature_names()) tfidf_X.toarray()

     结果:

    Visualization

    """ 词云图可以直观的反应哪些词作用权重比较大 """ from sklearn.feature_extraction.text import CountVectorizer vectorizer = CountVectorizer() corpus = ['This is a very good class','students are very very very good','This is the third sentence','Is this the last doc','teacher Mei is very very handsome' ] X = vectorizer.fit_transform(corpus) L = []for item in list(X.toarray()):L.append(list(item))value = [0 for i in range(len(L[0]))] for i in range(len(L[0])):for j in range(len(L)):value[i] += L[j][i]from pyecharts import WordCloud wordcloud = WordCloud(width=800,height=500) #这里是需要做的 wordcloud.add('',vectorizer.get_feature_names(),value,word_size_range=[20,100]) wordcloud

    结果:

    (9) Combined features

    """ 根据条件去判断获取组合特征 """ df_train.loc[:,'alone'] = (df_train['SibSp']==0)&(df_train['Parch']==0)

    Feature derivation from time series

    import pandas as pd
    import numpy as np
    data = pd.read_excel('/Users/zhucan/Desktop/金融风控实战/第三课资料/textdata.xlsx')
    data.head()
    # ft and gt are two variable names; the suffixes 1-12 index the value in each of the last 12 months
    # ft1 is the number of refuelings computed from the data within one month of the application date
    # gt1 is the refueling amount computed from the data within one month of the application date

    Result:

    """ 基于时间序列进行特征衍生 """ """ 最近p个月,inv>0的月份数 inv表示传入的变量名 """ def Num(data,inv,p):df=data.loc[:,inv+'1':inv+str(p)]auto_value=np.where(df>0,1,0).sum(axis=1)return data,inv+'_num'+str(p),auto_valuedata_new = data.copy()for p in range(1,12):for inv in ['ft','gt']:data_new,columns_name,values=Num(data_new,inv,p)data_new[columns_name]=values

    结果:

    # Build the time-series derived features: a class of 37 functions
    import numpy as np
    import pandas as pd

    class time_series_feature(object):
        def __init__(self):
            pass

        def Num(self, data, inv, p):
            # Last p months: number of months with inv > 0
            df = data.loc[:, inv + '1':inv + str(p)]
            auto_value = np.where(df > 0, 1, 0).sum(axis=1)
            return inv + '_num' + str(p), auto_value

        def Nmz(self, data, inv, p):
            # Last p months: number of months with inv == 0
            df = data.loc[:, inv + '1':inv + str(p)]
            auto_value = np.where(df == 0, 1, 0).sum(axis=1)
            return inv + '_nmz' + str(p), auto_value

        def Evr(self, data, inv, p):
            # Last p months: whether there is at least one month with inv > 0
            df = data.loc[:, inv + '1':inv + str(p)]
            arr = np.where(df > 0, 1, 0).sum(axis=1)
            auto_value = np.where(arr, 1, 0)
            return inv + '_evr' + str(p), auto_value

        def Avg(self, data, inv, p):
            # Last p months: mean of inv
            df = data.loc[:, inv + '1':inv + str(p)]
            auto_value = np.nanmean(df, axis=1)
            return inv + '_avg' + str(p), auto_value

        def Tot(self, data, inv, p):
            # Last p months: sum of inv
            df = data.loc[:, inv + '1':inv + str(p)]
            auto_value = np.nansum(df, axis=1)
            return inv + '_tot' + str(p), auto_value

        def Tot2T(self, data, inv, p):
            # Months (2, p+1): sum of inv, which reflects how the variable fluctuates
            df = data.loc[:, inv + '2':inv + str(p + 1)]
            auto_value = df.sum(1)
            return inv + '_tot2t' + str(p), auto_value

        def Max(self, data, inv, p):
            # Last p months: max of inv
            df = data.loc[:, inv + '1':inv + str(p)]
            auto_value = np.nanmax(df, axis=1)
            return inv + '_max' + str(p), auto_value

        def Min(self, data, inv, p):
            # Last p months: min of inv
            df = data.loc[:, inv + '1':inv + str(p)]
            auto_value = np.nanmin(df, axis=1)
            return inv + '_min' + str(p), auto_value

        def Msg(self, data, inv, p):
            # Last p months: months since the most recent month with inv > 0
            df = data.loc[:, inv + '1':inv + str(p)]
            df_value = np.where(df > 0, 1, 0)
            auto_value = []
            for i in range(len(df_value)):
                row_value = df_value[i, :]
                if row_value.max() <= 0:
                    auto_value.append(0)
                else:
                    indexs = 1
                    for j in row_value:
                        if j > 0:
                            break
                        indexs += 1
                    auto_value.append(indexs)
            return inv + '_msg' + str(p), auto_value

        def Msz(self, data, inv, p):
            # Last p months: months since the most recent month with inv == 0
            df = data.loc[:, inv + '1':inv + str(p)]
            df_value = np.where(df == 0, 1, 0)
            auto_value = []
            for i in range(len(df_value)):
                row_value = df_value[i, :]
                if row_value.max() <= 0:
                    auto_value.append(0)
                else:
                    indexs = 1
                    for j in row_value:
                        if j > 0:
                            break
                        indexs += 1
                    auto_value.append(indexs)
            return inv + '_msz' + str(p), auto_value

        def Cav(self, data, inv, p):
            # Current month inv / mean of inv over the last p months
            df = data.loc[:, inv + '1':inv + str(p)]
            auto_value = df[inv + '1'] / np.nanmean(df, axis=1)
            return inv + '_cav' + str(p), auto_value

        def Cmn(self, data, inv, p):
            # Current month inv / min of inv over the last p months
            df = data.loc[:, inv + '1':inv + str(p)]
            auto_value = df[inv + '1'] / np.nanmin(df, axis=1)
            return inv + '_cmn' + str(p), auto_value

        def Mai(self, data, inv, p):
            # Last p months: maximum month-over-month increase of inv
            arr = np.array(data.loc[:, inv + '1':inv + str(p)])
            auto_value = []
            for i in range(len(arr)):
                df_value = arr[i, :]
                value_lst = []
                for k in range(len(df_value) - 1):
                    minus = df_value[k] - df_value[k + 1]
                    value_lst.append(minus)
                auto_value.append(np.nanmax(value_lst))
            return inv + '_mai' + str(p), auto_value

        def Mad(self, data, inv, p):
            # Last p months: maximum month-over-month decrease of inv
            arr = np.array(data.loc[:, inv + '1':inv + str(p)])
            auto_value = []
            for i in range(len(arr)):
                df_value = arr[i, :]
                value_lst = []
                for k in range(len(df_value) - 1):
                    minus = df_value[k + 1] - df_value[k]
                    value_lst.append(minus)
                auto_value.append(np.nanmax(value_lst))
            return inv + '_mad' + str(p), auto_value

        def Std(self, data, inv, p):
            # Last p months: dispersion of inv (np.nanvar, i.e. the variance)
            df = data.loc[:, inv + '1':inv + str(p)]
            auto_value = np.nanvar(df, axis=1)
            return inv + '_std' + str(p), auto_value

        def Cva(self, data, inv, p):
            # Last p months: variation ratio of inv (mean / variance)
            df = data.loc[:, inv + '1':inv + str(p)]
            auto_value = np.nanmean(df, axis=1) / np.nanvar(df, axis=1)
            return inv + '_cva' + str(p), auto_value

        def Cmm(self, data, inv, p):
            # Current month inv - mean of inv over the last p months
            df = data.loc[:, inv + '1':inv + str(p)]
            auto_value = df[inv + '1'] - np.nanmean(df, axis=1)
            return inv + '_cmm' + str(p), auto_value

        def Cnm(self, data, inv, p):
            # Current month inv - min of inv over the last p months
            df = data.loc[:, inv + '1':inv + str(p)]
            auto_value = df[inv + '1'] - np.nanmin(df, axis=1)
            return inv + '_cnm' + str(p), auto_value

        def Cxm(self, data, inv, p):
            # Current month inv - max of inv over the last p months
            df = data.loc[:, inv + '1':inv + str(p)]
            auto_value = df[inv + '1'] - np.nanmax(df, axis=1)
            return inv + '_cxm' + str(p), auto_value

        def Cxp(self, data, inv, p):
            # (Current month inv - min of the last p months) / that min
            df = data.loc[:, inv + '1':inv + str(p)]
            temp = np.nanmin(df, axis=1)
            auto_value = (df[inv + '1'] - temp) / temp
            return inv + '_cxp' + str(p), auto_value

        def Ran(self, data, inv, p):
            # Last p months: range (max - min) of inv
            df = data.loc[:, inv + '1':inv + str(p)]
            auto_value = np.nanmax(df, axis=1) - np.nanmin(df, axis=1)
            return inv + '_ran' + str(p), auto_value

        def Nci(self, data, inv, p):
            # Last min(time on book, p) months: number of months where inv grew versus the previous month
            arr = np.array(data.loc[:, inv + '1':inv + str(p)])
            auto_value = []
            for i in range(len(arr)):
                df_value = arr[i, :]
                value_lst = []
                for k in range(len(df_value) - 1):
                    minus = df_value[k] - df_value[k + 1]
                    value_lst.append(minus)
                value_ng = np.where(np.array(value_lst) > 0, 1, 0).sum()
                auto_value.append(np.nanmax(value_ng))
            return inv + '_nci' + str(p), auto_value

        def Ncd(self, data, inv, p):
            # Last min(time on book, p) months: number of months where inv fell versus the previous month
            arr = np.array(data.loc[:, inv + '1':inv + str(p)])
            auto_value = []
            for i in range(len(arr)):
                df_value = arr[i, :]
                value_lst = []
                for k in range(len(df_value) - 1):
                    minus = df_value[k] - df_value[k + 1]
                    value_lst.append(minus)
                value_ng = np.where(np.array(value_lst) < 0, 1, 0).sum()
                auto_value.append(np.nanmax(value_ng))
            return inv + '_ncd' + str(p), auto_value

        def Ncn(self, data, inv, p):
            # Last min(time on book, p) months: number of adjacent months with equal inv
            arr = np.array(data.loc[:, inv + '1':inv + str(p)])
            auto_value = []
            for i in range(len(arr)):
                df_value = arr[i, :]
                value_lst = []
                for k in range(len(df_value) - 1):
                    minus = df_value[k] - df_value[k + 1]
                    value_lst.append(minus)
                value_ng = np.where(np.array(value_lst) == 0, 1, 0).sum()
                auto_value.append(np.nanmax(value_ng))
            return inv + '_ncn' + str(p), auto_value

        def Bup(self, data, inv, p):
            # Flag = 1 if inv is strictly increasing towards the current month over the last
            # min(time on book, p) months, else 0
            arr = np.array(data.loc[:, inv + '1':inv + str(p)])
            auto_value = []
            for i in range(len(arr)):
                df_value = arr[i, :]
                index = 0
                for k in range(len(df_value) - 1):
                    if df_value[k] > df_value[k + 1]:
                        break
                    index += 1
                if index == p - 1:
                    value = 1
                else:
                    value = 0
                auto_value.append(value)
            return inv + '_bup' + str(p), auto_value

        def Pdn(self, data, inv, p):
            # Flag = 1 if inv is strictly decreasing towards the current month over the last
            # min(time on book, p) months, else 0
            arr = np.array(data.loc[:, inv + '1':inv + str(p)])
            auto_value = []
            for i in range(len(arr)):
                df_value = arr[i, :]
                index = 0
                for k in range(len(df_value) - 1):
                    if df_value[k + 1] > df_value[k]:
                        break
                    index += 1
                if index == p - 1:
                    value = 1
                else:
                    value = 0
                auto_value.append(value)
            return inv + '_pdn' + str(p), auto_value

        def Trm(self, data, inv, p):
            # Last min(time on book, p) months: trimmed mean of inv (drop one max and one min)
            df = data.loc[:, inv + '1':inv + str(p)]
            auto_value = []
            for i in range(len(df)):
                trm_mean = list(df.loc[i, :])
                trm_mean.remove(np.nanmax(trm_mean))
                trm_mean.remove(np.nanmin(trm_mean))
                temp = np.nanmean(trm_mean)
                auto_value.append(temp)
            return inv + '_trm' + str(p), auto_value

        def Cmx(self, data, inv, p):
            # (Current month inv - max of the last p months) / that max
            df = data.loc[:, inv + '1':inv + str(p)]
            auto_value = (df[inv + '1'] - np.nanmax(df, axis=1)) / np.nanmax(df, axis=1)
            return inv + '_cmx' + str(p), auto_value

        def Cmp(self, data, inv, p):
            # (Current month inv - mean of the last p months) / that mean
            df = data.loc[:, inv + '1':inv + str(p)]
            auto_value = (df[inv + '1'] - np.nanmean(df, axis=1)) / np.nanmean(df, axis=1)
            return inv + '_cmp' + str(p), auto_value

        def Cnp(self, data, inv, p):
            # (Current month inv - min of the last p months) / that min
            df = data.loc[:, inv + '1':inv + str(p)]
            auto_value = (df[inv + '1'] - np.nanmin(df, axis=1)) / np.nanmin(df, axis=1)
            return inv + '_cnp' + str(p), auto_value

        def Msx(self, data, inv, p):
            # Last min(time on book, p) months: months since the month in which inv hit its maximum
            df = data.loc[:, inv + '1':inv + str(p)]
            df['_max'] = np.nanmax(df, axis=1)
            for i in range(1, p + 1):
                df[inv + str(i)] = list(df[inv + str(i)] == df['_max'])
            del df['_max']
            df_value = np.where(df == True, 1, 0)
            auto_value = []
            for i in range(len(df_value)):
                row_value = df_value[i, :]
                indexs = 1
                for j in row_value:
                    if j == 1:
                        break
                    indexs += 1
                auto_value.append(indexs)
            return inv + '_msx' + str(p), auto_value

        def Rpp(self, data, inv, p):
            # Mean of the last p months / mean of months (p, 2p)
            df1 = data.loc[:, inv + '1':inv + str(p)]
            value1 = np.nanmean(df1, axis=1)
            df2 = data.loc[:, inv + str(p):inv + str(2 * p)]
            value2 = np.nanmean(df2, axis=1)
            auto_value = value1 / value2
            return inv + '_rpp' + str(p), auto_value

        def Dpp(self, data, inv, p):
            # Mean of the last p months - mean of months (p, 2p)
            df1 = data.loc[:, inv + '1':inv + str(p)]
            value1 = np.nanmean(df1, axis=1)
            df2 = data.loc[:, inv + str(p):inv + str(2 * p)]
            value2 = np.nanmean(df2, axis=1)
            auto_value = value1 - value2
            return inv + '_dpp' + str(p), auto_value

        def Mpp(self, data, inv, p):
            # Max of the last p months / max of months (p, 2p)
            df1 = data.loc[:, inv + '1':inv + str(p)]
            value1 = np.nanmax(df1, axis=1)
            df2 = data.loc[:, inv + str(p):inv + str(2 * p)]
            value2 = np.nanmax(df2, axis=1)
            auto_value = value1 / value2
            return inv + '_mpp' + str(p), auto_value

        def Npp(self, data, inv, p):
            # Min of the last p months / min of months (p, 2p)
            df1 = data.loc[:, inv + '1':inv + str(p)]
            value1 = np.nanmin(df1, axis=1)
            df2 = data.loc[:, inv + str(p):inv + str(2 * p)]
            value2 = np.nanmin(df2, axis=1)
            auto_value = value1 / value2
            return inv + '_npp' + str(p), auto_value

        def auto_var(self, data_new, inv, p):
            # Call every two-parameter feature function in turn and append each result as a new column
            method_names = ['Num', 'Nmz', 'Evr', 'Avg', 'Tot', 'Tot2T', 'Max', 'Min',
                            'Msg', 'Msz', 'Cav', 'Cmn', 'Std', 'Cva', 'Cmm', 'Cnm',
                            'Cxm', 'Cxp', 'Ran', 'Nci', 'Ncd', 'Ncn', 'Pdn', 'Cmx',
                            'Cmp', 'Cnp', 'Msx', 'Trm', 'Bup', 'Mai', 'Mad', 'Rpp',
                            'Dpp', 'Mpp', 'Npp']
            try:
                for name in method_names:
                    columns_name, values = getattr(self, name)(data_new, inv, p)
                    data_new[columns_name] = values
            except:
                pass
            return data_new

    auto_var2 = time_series_feature()
    for p in range(1, 12):
        for inv in ['ft', 'gt']:
            data = auto_var2.auto_var(data, inv, p)
    data

    Result:
