
If the data are not linearly separable in the current dimension, a suitable projection has to be found. One option is to map the samples into a higher-dimensional space and separate them there; KLDA (kernelized linear discriminant analysis) is such a method. In other words, a kernel function can be used.

See Section 6.6 on kernel functions, which covers kernelized linear discriminant analysis.
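For illustration, below is a minimal two-class kernel Fisher discriminant sketch in numpy. The RBF kernel, the `gamma` width, and the `reg` ridge term are my own illustrative choices, not values from the book.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

def kfda_direction(X, y, gamma=1.0, reg=1e-3):
    """Two-class kernel Fisher discriminant: returns dual coefficients
    alpha so that a sample x projects to sum_j alpha_j * k(X[j], x)."""
    K = rbf_kernel(X, X, gamma=gamma)            # (n, n) Gram matrix
    idx0, idx1 = np.where(y == 0)[0], np.where(y == 1)[0]

    # class-wise kernel means: M_i[j] = mean over class i of k(x_j, x_k)
    M0 = K[:, idx0].mean(axis=1)
    M1 = K[:, idx1].mean(axis=1)

    # within-class scatter in feature space, in its dual form
    N = np.zeros_like(K)
    for idx in (idx0, idx1):
        Ki = K[:, idx]                           # (n, n_i)
        center = np.eye(len(idx)) - np.full((len(idx), len(idx)), 1 / len(idx))
        N += Ki @ center @ Ki.T

    # regularize N for a stable inverse, then solve as in ordinary LDA
    return np.linalg.solve(N + reg * np.eye(len(y)), M0 - M1)

def kfda_project(X_train, alpha, X_new, gamma=1.0):
    # projection of new samples onto the learned direction
    return rbf_kernel(X_new, X_train, gamma=gamma) @ alpha
```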

 

 

 

 


 

The Data Engineering Cookbook.pdf (3.27 MB)

It runs 91 pages in total, so it is not a heavy read, but it organizes the essential knowledge of data engineering.

 

It also covers many topics a data scientist should know, such as Docker and REST APIs. It is a data engineering book, but I (personally) think data scientists should have at least this level of knowledge as well.

 

You can also find the related code on the author's GitHub (github.com/andkret/Cookbook).

 

The contents are listed below. Rather than covering every topic in depth, the book works as a collection of curated links.

For example, clicking the 'data science @ Uber' link takes you to the relevant tech blog posts and resources.

 

I am sharing it here since it seems like a resource many people could use :)

If I get the chance, I will summarize it bit by bit and share my notes.

 

Full Table of Contents:

Introduction

Basic Engineering Skills

Advanced Engineering Skills

Hands On Course

Case Studies

Best Practices Cloud Platforms

130+ Free Data Sources For Data Science

1001 Interview Questions

Recommended Books and Courses


3.5 Using watermelon dataset 3.0α, write code for linear discriminant analysis and describe the results.

 

Reference solution code (1):


import numpy as np
import pandas as pd
from matplotlib import pyplot as plt

class LDA(object):

    def fit(self, X_, y_, plot_=False):
        pos = y_ == 1
        neg = y_ == 0
        X0 = X_[neg]
        X1 = X_[pos]

        u0 = X0.mean(0, keepdims=True)  # (1, n)
        u1 = X1.mean(0, keepdims=True)

        # within-class scatter matrix S_w and LDA direction w ∝ S_w^{-1}(u0 - u1)
        sw = np.dot((X0 - u0).T, X0 - u0) + np.dot((X1 - u1).T, X1 - u1)
        w = np.dot(np.linalg.inv(sw), (u0 - u1).T).reshape(1, -1)  # (1, n)

        if plot_:
            fig, ax = plt.subplots()
            ax.spines['right'].set_color('none')
            ax.spines['top'].set_color('none')
            ax.spines['left'].set_position(('data', 0))
            ax.spines['bottom'].set_position(('data', 0))

            plt.scatter(X1[:, 0], X1[:, 1], c='k', marker='o', label='good')
            plt.scatter(X0[:, 0], X0[:, 1], c='r', marker='x', label='bad')

            plt.xlabel('density', labelpad=1)
            plt.ylabel('sugar content')
            plt.legend(loc='upper right')

            x_tmp = np.linspace(-0.05, 0.15)
            y_tmp = x_tmp * w[0, 1] / w[0, 0]
            plt.plot(x_tmp, y_tmp, '#808080', linewidth=1)

            wu = w / np.linalg.norm(w)

            
            # project the samples onto the line spanned by w (via wu^T wu)
            X0_project = np.dot(X0, np.dot(wu.T, wu))
            plt.scatter(X0_project[:, 0], X0_project[:, 1], c='r', s=15)
            for i in range(X0.shape[0]):
                plt.plot([X0[i, 0], X0_project[i, 0]], [X0[i, 1], X0_project[i, 1]], '--r', linewidth=1)

            X1_project = np.dot(X1, np.dot(wu.T, wu))
            plt.scatter(X1_project[:, 0], X1_project[:, 1], c='k', s=15)
            for i in range(X1.shape[0]):
                plt.plot([X1[i, 0], X1_project[i, 0]], [X1[i, 1], X1_project[i, 1]], '--k', linewidth=1)

            
            # project the two class means onto the same line
            u0_project = np.dot(u0, np.dot(wu.T, wu))
            plt.scatter(u0_project[:, 0], u0_project[:, 1], c='#FF4500', s=60)
            u1_project = np.dot(u1, np.dot(wu.T, wu))
            plt.scatter(u1_project[:, 0], u1_project[:, 1], c='#696969', s=60)

            ax.annotate(r'u0 projection point',
                        xy=(u0_project[:, 0], u0_project[:, 1]),
                        xytext=(u0_project[:, 0] - 0.2, u0_project[:, 1] - 0.1),
                        size=13,
                        va="center", ha="left",
                        arrowprops=dict(arrowstyle="->",
                                        color="k",
                                        )
                        )

            ax.annotate(r'u1 projection point',
                        xy=(u1_project[:, 0], u1_project[:, 1]),
                        xytext=(u1_project[:, 0] - 0.1, u1_project[:, 1] + 0.1),
                        size=13,
                        va="center", ha="left",
                        arrowprops=dict(arrowstyle="->",
                                        color="k",
                                        )
                        )
            plt.axis("equal")  
            plt.show()

        self.w = w
        self.u0 = u0
        self.u1 = u1
        return self

    def predict(self, X):
        # project samples onto w and assign each to the nearer projected class mean
        project = np.dot(X, self.w.T)

        wu0 = np.dot(self.w, self.u0.T)
        wu1 = np.dot(self.w, self.u1.T)

        return (np.abs(project - wu1) < np.abs(project - wu0)).astype(int)

if __name__ == '__main__':
    # change the data path to match your own environment
    data_path = r'C:\Users\hanmi\Documents\xiguabook\watermelon3_0_Ch.csv'

    data = pd.read_csv(data_path).values

    X = data[:, 7:9].astype(float)
    y = data[:, 9]

    y[y == 'yes'] = 1
    y[y == 'no'] = 0
    y = y.astype(int)

    lda = LDA()
    lda.fit(X, y, plot_=True)
    print(lda.predict(X))  
    print(y)

 

Reference solution code (2):

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

def LDA(X0, X1):
    """
    Get the optimal params of LDA model given training data.
    Input:
        X0: np.array with shape [N1, d]
        X1: np.array with shape [N2, d]
    Return:
        omega: np.array with shape [1, d]. Optimal params of LDA.
    """
    #shape [1, d]
    mean0 = np.mean(X0, axis=0, keepdims=True)
    mean1 = np.mean(X1, axis=0, keepdims=True)
    Sw = (X0-mean0).T.dot(X0-mean0) + (X1-mean1).T.dot(X1-mean1)
    omega = np.linalg.inv(Sw).dot((mean0-mean1).T)
    return omega

if __name__=="__main__":
    # read data from csv
    work_book = pd.read_csv("watermelon_3a.csv", header=None)
    positive_data = work_book.values[work_book.values[:, -1] == 1.0, :]
    negative_data = work_book.values[work_book.values[:, -1] == 0.0, :]
    print(positive_data)

    #LDA
    omega = LDA(negative_data[:, 1:-1], positive_data[:, 1:-1])

    #plot
    plt.plot(positive_data[:, 1], positive_data[:, 2], "bo")
    plt.plot(negative_data[:, 1], negative_data[:, 2], "r+")
    lda_left = 0
    lda_right = -(omega[0]*0.9) / omega[1]
    plt.plot([0, 0.9], [lda_left, lda_right], 'g-')

    plt.xlabel('density')
    plt.ylabel('sugar content')
    plt.title("LDA")
    plt.show()
source: https://blog.csdn.net/weixin_43518584/article/details/105588310
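As a sanity check, the hand-rolled solutions above can be compared against scikit-learn's built-in LDA. A minimal sketch, assuming `X` and `y` are the density/sugar-content features and integer labels loaded as in reference solution code (1):

```python
# a minimal cross-check sketch, assuming X and y from reference code (1)
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

sk_lda = LinearDiscriminantAnalysis()
sk_lda.fit(X, y)

print(sk_lda.coef_)        # projection direction found by sklearn
print(sk_lda.predict(X))   # compare with LDA().fit(X, y).predict(X)
print(sk_lda.score(X, y))  # training accuracy
```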

3.4 Choose two datasets from the UCI repository and compare the error rates of logistic regression as measured by 10-fold cross-validation and leave-one-out.

 

Reference solution code (1):

import numpy as np
from sklearn import linear_model
from sklearn.model_selection import LeaveOneOut
from sklearn.model_selection import cross_val_score

# change the data path to your own location
data_path = r'C:\Users\hanmi\Documents\xiguabook\Transfusion.txt'

data = np.loadtxt(data_path, delimiter=',').astype(int)

X = data[:, :4]
y = data[:, 4]

m, n = X.shape

# normalization
X = (X - X.mean(0)) / X.std(0)

# shuffle
index = np.arange(m)
np.random.shuffle(index)

X = X[index]
y = y[index]

# using the sklearn library
# 10-fold cross validation
lr = linear_model.LogisticRegression(C=2)

score = cross_val_score(lr, X, y, cv=10)

print(score.mean())

# LOO
loo = LeaveOneOut()

accuracy = 0
for train, test in loo.split(X, y):
    lr_ = linear_model.LogisticRegression(C=2)
    X_train = X[train]
    X_test = X[test]
    y_train = y[train]
    y_test = y[test]
    lr_.fit(X_train, y_train)

    accuracy += lr_.score(X_test, y_test)

print(accuracy / m)

# the results are similar

# 10-fold cross validation, implemented manually

num_split = int(m / 10)
score_my = []
for i in range(10):
    lr_ = linear_model.LogisticRegression(C=2)
    test_index = range(i * num_split, (i + 1) * num_split)
    X_test_ = X[test_index]
    y_test_ = y[test_index]

    X_train_ = np.delete(X, test_index, axis=0)
    y_train_ = np.delete(y, test_index, axis=0)

    lr_.fit(X_train_, y_train_)

    score_my.append(lr_.score(X_test_, y_test_))

print(np.mean(score_my))

# LOO, implemented manually
score_my_loo = []
for i in range(m):
    lr_ = linear_model.LogisticRegression(C=2)
    X_test_ = X[i, :]
    y_test_ = y[i]

    X_train_ = np.delete(X, i, axis=0)
    y_train_ = np.delete(y, i, axis=0)

    lr_.fit(X_train_, y_train_)

    score_my_loo.append(int(lr_.predict(X_test_.reshape(1, -1)) == y_test_))

print(np.mean(score_my_loo))

# all four results are similar

 

source: https://blog.csdn.net/weixin_43518584/article/details/105588310
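Incidentally, scikit-learn can also compute the leave-one-out estimate in one call by passing a `LeaveOneOut` splitter as `cv`, which replaces the manual loops above. A minimal sketch, reusing `lr`, `X`, and `y` from the code above:

```python
# LOO in one call: cv accepts any splitter object, so the manual loop
# above can be replaced with cross_val_score + LeaveOneOut
loo_scores = cross_val_score(lr, X, y, cv=LeaveOneOut())
print(loo_scores.mean())  # should be close to the 10-fold result
```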

3.3 Using watermelon dataset 3.0α, write code for logistic regression and describe the results.

 

Reference solution (1):

import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
# load the data
data = np.array([[0.697, 0.460, 1],
        [0.774, 0.376, 1],
        [0.634, 0.264, 1],
        [0.608, 0.318, 1],
        [0.556, 0.215, 1],
        [0.403, 0.237, 1],
        [0.481, 0.149, 1],
        [0.437, 0.211, 1],
        [0.666, 0.091, 0],
        [0.243, 0.267, 0],
        [0.245, 0.057, 0],
        [0.343, 0.099, 0],
        [0.639, 0.161, 0],
        [0.657, 0.198, 0],
        [0.360, 0.370, 0],
        [0.593, 0.042, 0],
        [0.719, 0.103, 0]])
X = data[:,0:2]
y = data[:,2]
# split the data into train and test sets
X_train,X_test,Y_train,Y_test=train_test_split(X,y,test_size=0.25,random_state=33)
def sigmoid(z): 
    s = 1 / (1 + np.exp(-z))
    return s
def initialize_with_zeros(dim):
    """
    This function creates a vector of zeros of shape (dim, 1) for w and initializes b to 0.
    """
    w = np.zeros((dim, 1))
    b = 0
    assert (w.shape == (dim, 1))
    assert (isinstance(b, float) or isinstance(b, int))
    return w, b
def propagate(w, b, X, Y):
    """
    Implement the cost function and its gradient for the propagation explained above
    """
    m = X.shape[1]
    A = sigmoid(np.dot(w.T, X) + b)
    # compute the cross-entropy cost
    cost = -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A))/ m  
  
    dw = np.dot(X, (A - Y).T) / m
    db = np.sum(A - Y) / m
    assert (dw.shape == w.shape)
    assert (db.dtype == float)
    cost = np.squeeze(cost)
    assert (cost.shape == ())
    grads = {"dw": dw,
             "db": db}
    return grads, cost
def optimize(w, b, X, Y, num_iterations, learning_rate, print_cost=False):
    """
    This function optimizes w and b by running a gradient descent algorithm
    """
    costs = []
    for i in range(num_iterations):
        grads, cost = propagate(w, b, X, Y)
        # Retrieve derivatives from grads
        dw = grads["dw"]
        db = grads["db"]
        # update rule 
        w = w - learning_rate * dw
        b = b - learning_rate * db
        # Record the costs
        if i % 100 == 0:
            costs.append(cost)
        # Print the cost every 100 training iterations
        if print_cost and i % 100 == 0:
            print("Cost after iteration %i: %f" % (i, cost))
    params = {"w": w,
              "b": b}
    grads = {"dw": dw,
             "db": db}
    return params, grads, costs
def predict(w, b, X):
    '''
    Predict whether the label is 0 or 1 using learned logistic regression parameters (w, b)
    '''
    m = X.shape[1]
    Y_prediction = np.zeros((1, m))
    w = w.reshape(X.shape[0], 1)
    # Compute vector "A" predicting the probability that each sample is positive
    A = sigmoid(np.dot(w.T, X) + b)
    for i in range(A.shape[1]):
        # Convert probabilities A[0,i] to actual predictions p[0,i]
        if A[0, i] >= 0.5:
            Y_prediction[0, i] = 1
        else:
            Y_prediction[0, i] = 0
        pass
    assert (Y_prediction.shape == (1, m))
    return Y_prediction
def model(X_train, Y_train, X_test, Y_test, num_iterations, learning_rate, print_cost=False):
    # initialize parameters with zeros (≈ 1 line of code)
    w, b = initialize_with_zeros(X_train.shape[0])
    # Gradient descent (≈ 1 line of code)
    parameters, grads, costs = optimize(w, b, X_train, Y_train, num_iterations, learning_rate, print_cost)
    # Retrieve parameters w and b from dictionary "parameters"
    w = parameters["w"]
    b = parameters["b"]
    # Predict test/train set examples (≈ 2 lines of code)
    Y_prediction_test = predict(w, b, X_test)
    Y_prediction_train = predict(w, b, X_train)
    # Print train/test Errors
    print("train accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction_train - Y_train)) * 100))
    print("test accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction_test - Y_test)) * 100))
    d = {"costs": costs,
         "Y_prediction_test": Y_prediction_test,
         "Y_prediction_train": Y_prediction_train,
         "w": w,
         "b": b,
         "learning_rate": learning_rate,
         "num_iterations": num_iterations}
    return d
X_train = X_train.T
Y_train = Y_train.T.reshape(1,X_train.shape[1])
X_test = X_test.T
Y_test = Y_test.T.reshape(1,X_test.shape[1])
d = model(X_train, Y_train, X_test, Y_test, num_iterations = 2000, learning_rate = 0.5, print_cost = True)
# Plot learning curve (with costs)
costs = np.squeeze(d['costs'])
plt.plot(costs)
plt.ylabel('cost')
plt.xlabel('iterations (per hundreds)')
plt.title("Learning rate =" + str(d["learning_rate"]))
plt.show()

 

Reference solution (2):

import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
from sklearn import linear_model


def sigmoid(x):
    s = 1 / (1 + np.exp(-x))
    return s


def J_cost(X, y, beta):
    '''
    :param X:  sample array, shape(n_samples, n_features)
    :param y: array-like, shape (n_samples,)
    :param beta: the beta in formula 3.27 , shape(n_features + 1, ) or (n_features + 1, 1)
    :return: the result of formula 3.27
    '''
    X_hat = np.c_[X, np.ones((X.shape[0], 1))]
    beta = beta.reshape(-1, 1)
    y = y.reshape(-1, 1)

    Lbeta = -y * np.dot(X_hat, beta) + np.log(1 + np.exp(np.dot(X_hat, beta)))

    return Lbeta.sum()


def gradient(X, y, beta):
    '''
    compute the first derivative of J(i.e. formula 3.27) with respect to beta      i.e. formula 3.30
    ----------------------------------
    :param X: sample array, shape(n_samples, n_features)
    :param y: array-like, shape (n_samples,)
    :param beta: the beta in formula 3.27 , shape(n_features + 1, ) or (n_features + 1, 1)
    :return:
    '''
    X_hat = np.c_[X, np.ones((X.shape[0], 1))]
    beta = beta.reshape(-1, 1)
    y = y.reshape(-1, 1)
    p1 = sigmoid(np.dot(X_hat, beta))

    gra = (-X_hat * (y - p1)).sum(0)

    return gra.reshape(-1, 1)


def hessian(X, y, beta):
    '''
    compute the second derivative of J(i.e. formula 3.27) with respect to beta      i.e. formula 3.31
    ----------------------------------
    :param X: sample array, shape(n_samples, n_features)
    :param y: array-like, shape (n_samples,)
    :param beta: the beta in formula 3.27 , shape(n_features + 1, ) or (n_features + 1, 1)
    :return:
    '''
    X_hat = np.c_[X, np.ones((X.shape[0], 1))]
    beta = beta.reshape(-1, 1)
    y = y.reshape(-1, 1)

    p1 = sigmoid(np.dot(X_hat, beta))

    m, n = X.shape
    P = np.eye(m) * p1 * (1 - p1)

    assert P.shape[0] == P.shape[1]
    return np.dot(np.dot(X_hat.T, P), X_hat)


def update_parameters_gradDesc(X, y, beta, learning_rate, num_iterations, print_cost):
    '''
    update parameters with gradient descent method
    --------------------------------------------
    :param beta:
    :param grad:
    :param learning_rate:
    :return:
    '''
    for i in range(num_iterations):

        grad = gradient(X, y, beta)
        beta = beta - learning_rate * grad

        if (i % 10 == 0) & print_cost:
            print('{}th iteration, cost is {}'.format(i, J_cost(X, y, beta)))

    return beta


def update_parameters_newton(X, y, beta, num_iterations, print_cost):
    '''
    update parameters with Newton method
    :param beta:
    :param grad:
    :param hess:
    :return:
    '''

    for i in range(num_iterations):

        grad = gradient(X, y, beta)
        hess = hessian(X, y, beta)
        beta = beta - np.dot(np.linalg.inv(hess), grad)

        if (i % 10 == 0) & print_cost:
            print('{}th iteration, cost is {}'.format(i, J_cost(X, y, beta)))
    return beta


def initialize_beta(n):
    beta = np.random.randn(n + 1, 1) * 0.5 + 1
    return beta


def logistic_model(X, y, num_iterations=100, learning_rate=1.2, print_cost=False, method='gradDesc'):
    '''
    :param X:
    :param y:
    :param num_iterations:
    :param learning_rate:
    :param print_cost:
    :param method: str 'gradDesc' or 'Newton'
    :return:
    '''
    m, n = X.shape
    beta = initialize_beta(n)

    if method == 'gradDesc':
        return update_parameters_gradDesc(X, y, beta, learning_rate, num_iterations, print_cost)
    elif method == 'Newton':
        return update_parameters_newton(X, y, beta, num_iterations, print_cost)
    else:
        raise ValueError('Unknown solver %s' % method)


def predict(X, beta):
    X_hat = np.c_[X, np.ones((X.shape[0], 1))]
    p1 = sigmoid(np.dot(X_hat, beta))

    p1[p1 >= 0.5] = 1
    p1[p1 < 0.5] = 0

    return p1


if __name__ == '__main__':
    # change the data path to match your own environment
    data_path = r'C:\Users\hanmi\Documents\xiguabook\watermelon3_0_Ch.csv'
    data = pd.read_csv(data_path).values

    is_good = data[:, 9] == 'yes'
    is_bad = data[:, 9] == 'no'

    X = data[:, 7:9].astype(float)
    y = data[:, 9]

    y[y == 'yes'] = 1
    y[y == 'no'] = 0
    y = y.astype(int)

    plt.scatter(data[:, 7][is_good], data[:, 8][is_good], c='k', marker='o')
    plt.scatter(data[:, 7][is_bad], data[:, 8][is_bad], c='r', marker='x')

    plt.xlabel('density')
    plt.ylabel('sugar content')

    # train the custom model and plot its decision boundary
    beta = logistic_model(X, y, print_cost=True, method='gradDesc', learning_rate=0.3, num_iterations=1000)
    w1, w2, intercept = beta
    x1 = np.linspace(0, 1)
    y1 = -(w1 * x1 + intercept) / w2

    ax1, = plt.plot(x1, y1, label=r'my_logistic_gradDesc')

    lr = linear_model.LogisticRegression(solver='lbfgs', C=1000)  # in sklearn's logistic regression, larger C means weaker regularization
    lr.fit(X, y)

    lr_beta = np.c_[lr.coef_, lr.intercept_]
    print(J_cost(X, y, lr_beta))

    # plot sklearn's decision boundary
    w1_sk, w2_sk = lr.coef_[0, :]

    x2 = np.linspace(0, 1)
    y2 = -(w1_sk * x2 + lr.intercept_) / w2_sk

    ax2, = plt.plot(x2, y2, label=r'sklearn_logistic')

    plt.legend(loc='upper right')
    plt.show()

Reference solution (2) comes from github.com/han1057578619/MachineLearning_Zhouzhihua_ProblemSets/blob/master/ch3--%E7%BA%BF%E6%80%A7%E6%A8%A1%E5%9E%8B/3.3/3.3-LogisticRegression.py.
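Since `logistic_model` in reference solution (2) also implements the Newton update (via the `gradient` and `hessian` functions above), here is a minimal usage sketch; the iteration count is an arbitrary choice of mine, and Newton's method typically needs far fewer iterations than gradient descent on this small dataset.

```python
# a minimal usage sketch for the Newton branch of logistic_model,
# assuming X and y are loaded as in reference solution (2)
beta_newton = logistic_model(X, y, num_iterations=10,
                             print_cost=True, method='Newton')
print(J_cost(X, y, beta_newton))  # compare with the gradient-descent cost
```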

 

 


Since Chapter 3's solutions involve a lot of code, I will upload them in parts rather than all at once.

 

3.1 Describe a situation in which the bias term b in Equation 3.2 need not be considered.

If some attribute $x_i$ of sample $x$ always takes a fixed value $c$, then $w_i x_i + b = w_i c + b$ is a constant that plays exactly the role of a bias, so it can be absorbed into a new bias $b' = w_i c + b$ and $b$ need not be considered separately.
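In symbols:

$$f(x) = \sum_{j} w_j x_j + b = \sum_{j \neq i} w_j x_j + \underbrace{(w_i c + b)}_{b'}$$

so the model is equivalent to one over the remaining attributes with $b'$ as its merged bias term.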

 

 

3.2 Prove that, with respect to the parameter w, the objective function of logistic regression in Equation 3.18 is non-convex, but its log-likelihood function (Equation 3.27) is convex.

 

If a multivariate function is convex, then its Hessian matrix is positive semi-definite.

For any vector $v$, $v^T xx^T v = (x^T v)^2 \ge 0$, so $xx^T$ is a positive semi-definite matrix.

For the sigmoid $y = 1/(1+e^{-(w^T x + b)})$, the Hessian with respect to $w$ is $\frac{\partial^2 y}{\partial w \partial w^T} = y(1-y)(1-2y)\,xx^T$. When $y \in (0.5, 1)$, the coefficient $y(1-y)(1-2y) < 0$, so the Hessian is negative semi-definite there; since it is not positive semi-definite everywhere, Equation 3.18 is non-convex.

For the log-likelihood of Equation 3.27, the Hessian is $\frac{\partial^2 \ell(\beta)}{\partial \beta \partial \beta^T} = \sum_{i=1}^{m} \hat{x}_i \hat{x}_i^T\, p_1(\hat{x}_i;\beta)(1-p_1(\hat{x}_i;\beta))$ (Equation 3.31). Since $p_1 \in (0,1)$, every coefficient $p_1(1-p_1) > 0$, so the Hessian is positive semi-definite and Equation 3.27 is convex.
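As a quick numerical sanity check of both claims, here is a small sketch (random data, ignoring the bias term for brevity) that evaluates the two Hessians and prints their eigenvalues:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=3)   # one sample with 3 features
w = rng.normal(size=3)

# Hessian of the sigmoid y = 1/(1 + exp(-w.x)) w.r.t. w: y(1-y)(1-2y) * x x^T.
# The sign of its nonzero eigenvalue flips with y, so it is not PSD
# everywhere and Eq. 3.18 is non-convex.
y = 1 / (1 + np.exp(-w @ x))
H_sigmoid = y * (1 - y) * (1 - 2 * y) * np.outer(x, x)
print(np.linalg.eigvalsh(H_sigmoid))

# Hessian of the log-likelihood, Eq. 3.31: sum_i p1_i(1-p1_i) x_i x_i^T.
# Every coefficient is positive, so it is PSD and Eq. 3.27 is convex.
X = rng.normal(size=(20, 3))
p1 = 1 / (1 + np.exp(-X @ w))
H_loglik = (X * (p1 * (1 - p1))[:, None]).T @ X
print(np.linalg.eigvalsh(H_loglik))  # all >= 0, up to numerical noise
```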


 

The GitHub address with the book's source code is below :)

 

GitHub address:

github.com/quant4junior/algoTrade


Hulu, the OTT platform Disney acquired to take on Netflix!

(Image: Hulu)

Let's look at the table of contents of <데이터 과학자와 데이터 엔지니어를 위한 인터뷰 문답집> (an interview Q&A book for data scientists and data engineers), the interview guide written by Hulu's own data scientists and data engineers.

 

Please download the sample file here:

https://jpub.tistory.com/attachment/cfile25.uf@9989694A5EF1BDCE0EC2E4.pdf

 

 

 

It is framed as an interview Q&A book, but it really works as a book for organizing data science concepts.

 

If you have any questions about it, you are always welcome to ask :)


jpub.tistory.com/1057

 

데이터 과학자와 데이터 엔지니어를 위한 인터뷰 문답집

From traditional machine learning such as logistic regression and random forests to the latest algorithms such as GANs and reinforcement learning!

 

