steadilee

FRACTAL

Data Analysis/ML,DL

Implementing Ensembles in Python

2022. 8. 29. 23:53
๐Ÿ’ก
1.์•™์ƒ๋ธ” ์‚ฌ์šฉํ•˜์ง€ ์•Š๊ณ  ์˜์‚ฌ๊ฒฐ์ •ํŠธ๋ฆฌ ๋‹จ๋… ๋ชจ๋ธ ์ƒ์„ฑ
2. ์•™์ƒ๋ธ” ๋ฐฐ๊น…์œผ๋กœ ๋ชจ๋ธ ์ƒ์„ฑ
3. ์•™์ƒ๋ธ” ๋ถ€์ŠคํŒ…์œผ๋กœ ๋ชจ๋ธ ์ƒ์„ฑ

 

1. ์•™์ƒ๋ธ” ์‚ฌ์šฉํ•˜์ง€ ์•Š๋Š” ๊ฒฝ์šฐ

  • ์ •ํ™•๋„ = 100
  • ์ด๋ฏธ ์šฐ์ˆ˜ํ•œ ๋ชจ๋ธ์ด๊ธฐ์— ์ •ํ™•๋„๊ฐ€ 100์ด ๋‚˜์™”๋‹ค.
๐Ÿ’ก ์ˆœ์„œ
1. ๋ฐ์ดํ„ฐ ๋กœ๋“œ
2. ํ›ˆ๋ จ, ํ…Œ์ŠคํŠธ ๋ฐ์ดํ„ฐ ๋ถ„๋ฆฌ
3. ์ •๊ทœํ™”
4. ๋‹จ๋… ์˜์‚ฌ๊ฒฐ์ •ํŠธ๋ฆฌ ๋ชจ๋ธ ์ƒ์„ฑ
5. ๋ชจ๋ธ ํ›ˆ๋ จ
6. ๋ชจ๋ธ ์˜ˆ์ธก
7. ๋ชจ๋ธ ํ‰๊ฐ€
# ์ˆœ์„œ
# 1. ๋ฐ์ดํ„ฐ ๋กœ๋“œ
import pandas as pd

iris = pd.read_csv("d:\\data\\iris2.csv")
iris


# 2. ํ›ˆ๋ จ, ํ…Œ์ŠคํŠธ ๋ฐ์ดํ„ฐ ๋ถ„๋ฆฌ

x = iris.iloc[:,0:4]
y = iris['Species']

from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.1, random_state=1)

x_train.shape, x_test.shape, y_train.shape, y_test.shape
# ((135, 4), (15, 4), (135,), (15,))

# 3. ์ •๊ทœํ™”
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()
scaler.fit(x_train)

x_train2 = scaler.transform(x_train)
x_test2= scaler.transform(x_test)

# 4. ๋‹จ๋… ์˜์‚ฌ๊ฒฐ์ •ํŠธ๋ฆฌ ๋ชจ๋ธ ์ƒ์„ฑ
from sklearn.tree import DecisionTreeClassifier

model = DecisionTreeClassifier(criterion = 'entropy', max_depth = 20, random_state = 2)


# 5. ๋ชจ๋ธ ํ›ˆ๋ จ
model.fit(x_train2, y_train)

# 6. ๋ชจ๋ธ ์˜ˆ์ธก
result = model.predict(x_test2)

# 7. ๋ชจ๋ธ ํ‰๊ฐ€
accuracy = sum(result == y_test)/ len(y_test)
print(accuracy)                               #์ •ํ™•๋„ =100
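The single-tree steps above can be run end to end without the local CSV; this is a sketch that substitutes sklearn's built-in iris dataset (assumed equivalent to the blog's iris2.csv) and uses accuracy_score in place of the manual sum()/len() computation.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# 1-3. Load, split, and normalize (same parameters as above)
x, y = load_iris(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.1, random_state=1)
scaler = MinMaxScaler().fit(x_train)
x_train2 = scaler.transform(x_train)
x_test2 = scaler.transform(x_test)

# 4-7. Build, train, predict with, and evaluate a single decision tree
model = DecisionTreeClassifier(criterion='entropy', max_depth=20, random_state=2)
model.fit(x_train2, y_train)
print(accuracy_score(y_test, model.predict(x_test2)))
```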

 

 

2. ์•™์ƒ๋ธ” - ๋ฐฐ๊น… ⇒ ์ •ํ™•๋„ =100

# ์ˆœ์„œ
# 1. ๋ฐ์ดํ„ฐ ๋กœ๋“œ
import pandas as pd

iris = pd.read_csv("d:\\data\\iris2.csv")
iris


# 2. ํ›ˆ๋ จ, ํ…Œ์ŠคํŠธ ๋ฐ์ดํ„ฐ ๋ถ„๋ฆฌ

x = iris.iloc[:,0:4]
y = iris['Species']

from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.1, random_state=1)

x_train.shape, x_test.shape, y_train.shape, y_test.shape
# ((135, 4), (15, 4), (135,), (15,))

# 3. ์ •๊ทœํ™”
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()
scaler.fit(x_train)

x_train2 = scaler.transform(x_train)
x_test2= scaler.transform(x_test)

#4. โ˜…โ˜… ์•™์ƒ๋ธ” ์‚ฌ์šฉ(๋ฐฐ๊น… ๋ชจ๋ธ ๋งŒ๋“ค๊ธฐ)
from sklearn.tree import DecisionTreeClassifier
model2 = DecisionTreeClassifier(criterion = 'entropy', max_depth = 20, random_state = 2)

from sklearn.ensemble import BaggingClassifier
bagging_model = BaggingClassifier(model2, max_samples=0.9, n_estimators=25, random_state=1)

# โ€ป max_samples = 0.9 ๋Š” bag์— ๋ฐ์ดํ„ฐ๋ฅผ ๋‹ด์„ ๋•Œ, ํ›ˆ๋ จ ๋ฐ์ดํ„ฐ์˜ 90%๋ฅผ ์ƒ˜ํ”Œ๋งํ•˜๊ฒ ๋‹ค.
# n_estimators = 25 : ์˜์‚ฌ๊ฒฐ์ •ํŠธ๋ฆฌ ๋ชจ๋ธ์„ 25๊ฐœ ๋งŒ๋“ค๊ฒ ๋‹ค.


#5.๋ชจ๋ธ ํ›ˆ๋ จ

bagging_model.fit(x_train2,y_train)


#6.๋ชจ๋ธ ์˜ˆ์ธก, ํ‰๊ฐ€
result2 = bagging_model.predict(x_test2)


print(sum(result2 == y_test)/len(y_test))

# ์ •ํ™•๋„ = 100%

 

3. ์•™์ƒ๋ธ” - ๋ถ€์ŠคํŒ… ⇒ ์ •ํ™•๋„ =100

โ˜… kaggle์—์„œ ์šฐ์Šนํ•œ ๋ถ€์ŠคํŒ…(XGboost)

# ์ˆœ์„œ
# 1. ๋ฐ์ดํ„ฐ ๋กœ๋“œ
import pandas as pd

iris = pd.read_csv("d:\\data\\iris2.csv")
iris


# 2. ํ›ˆ๋ จ, ํ…Œ์ŠคํŠธ ๋ฐ์ดํ„ฐ ๋ถ„๋ฆฌ

x = iris.iloc[:,0:4]
y = iris['Species']

from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.1, random_state=1)

x_train.shape, x_test.shape, y_train.shape, y_test.shape
# ((135, 4), (15, 4), (135,), (15,))

# 3. ์ •๊ทœํ™”
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()
scaler.fit(x_train)

x_train2 = scaler.transform(x_train)
x_test2= scaler.transform(x_test)

#4. โ˜…โ˜… ์•™์ƒ๋ธ” ์‚ฌ์šฉ(๋ถ€์ŠคํŒ… ๋ชจ๋ธ ๋งŒ๋“ค๊ธฐ)
from sklearn.ensemble import GradientBoostingClassifier

model3 = GradientBoostingClassifier(n_estimators = 300, random_state = 2)

#5.๋ชจ๋ธ ํ›ˆ๋ จ

model3.fit(x_train2,y_train)


#6.๋ชจ๋ธ ์˜ˆ์ธก, ํ‰๊ฐ€
result3 = model3.predict(x_test2)


print(sum(result3 == y_test)/len(y_test))

# ์ •ํ™•๋„ = 100%

 

 

Other posts in the 'Data Analysis/ML,DL' category

Visualizing Social Networks with R  (0) 2022.08.30
Implementing an Artificial Neural Network in Python  (0) 2022.08.30