Lecture 1 : Supervised Learning

INTRO : What is Machine Learning ?

In classical computer science, a program is a kind of input. Data and a program are given to the computer as input, and the computer computes and returns an output.

In machine learning, by contrast, the program is a kind of output. Data and the **output (label)** are given as input, and the computer produces a program as its output.

Strictly speaking, this describes supervised learning, but the situation is the same in unsupervised and reinforcement learning.
In this process, the computer's role is to find the model's parameters through learning.
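The contrast between "program as input" and "program as output" can be sketched in a few lines. This is only an illustrative sketch: the temperature-conversion task, the linear model family, and all names (`celsius_to_fahrenheit`, `learned_program`, `w`, `b`) are assumptions made for the example, not part of the lecture.

```python
import numpy as np

# Classical programming: a human writes the rule, so the program is an input.
def celsius_to_fahrenheit(c):
    return 1.8 * c + 32.0

# Machine learning: only data and labels are given; the rule is the output.
x = np.array([-10.0, 0.0, 15.0, 30.0, 100.0])  # data (Celsius)
y = celsius_to_fahrenheit(x)                    # labels (Fahrenheit)

# Assumed model family: y = w * x + b.
# "Learning" here means finding the parameters (w, b) by least squares.
A = np.stack([x, np.ones_like(x)], axis=1)
(w, b), *_ = np.linalg.lstsq(A, y, rcond=None)

# The learned parameters define the program that the computer produced.
def learned_program(c):
    return w * c + b

print(w, b, learned_program(37.0))
```

Because the labels here are noise-free and the assumed model family contains the true rule, the recovered parameters match the hand-written program up to floating-point error. With noisy labels or a mismatched model family, they would only approximate it.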


SETUP : We need to find h ∈ H

The data used in supervised learning are labeled data $\left( \mathbf{x}, y \right)$. Each data point $(\mathbf{x}_i, y_i)$ is assumed to be sampled i.i.d. (independent and identically distributed) from an unknown distribution $\mathcal{P}(\mathbf{X}, Y)$.

Mathematical representation of data points
In training, our goal is to learn $h$ such that, for a data point $(\mathbf{x}, y)$ drawn from $\mathcal{P}(\mathbf{X}, Y)$, it outputs $h(\mathbf{x}) \approx y$ with high probability.
- Mathematically, learning $h$ means the following.
  - [Generative model] Estimate $\mathcal{P}(\mathbf{X}, Y) = \mathcal{P}(\mathbf{X} \mid Y)\,\mathcal{P}(Y)$ from the given data points. This gives the distribution $\mathcal{P}(Y \mid \mathbf{X})$ via Bayes' rule, from which we take $h(\mathbf x) = \hat{y} = \argmax\limits_y \mathcal{P}(Y = y \mid \mathbf{X} = \mathbf{x})$.
  - [Discriminative model] Estimate $\mathcal{P}(Y \mid \mathbf{X})$ directly from the given data points, then take $h(\mathbf x) = \hat{y} = \argmax\limits_y \mathcal{P}(Y = y \mid \mathbf{X} = \mathbf{x})$.
    - Here the red line corresponds to $\mathcal{P}(Y \mid \mathbf x)$.
- In engineering terms, learning $h$ means the following.
Side note : the “perfect h”
Side note : examples of feature vectors and label spaces
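The generative and discriminative routes to $h$ described in this section can be compared on a toy problem. The sketch below is an assumption-laden illustration: the dataset, the single discrete feature, and the count-based estimators are all hypothetical choices made to keep the arithmetic exact.

```python
import numpy as np

# Hypothetical toy dataset: one discrete feature x in {0, 1, 2}, label y in {0, 1}.
X = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2, 2])
Y = np.array([0, 0, 1, 0, 1, 1, 1, 1, 1, 0])
n_x, n_y = 3, 2

# Joint counts: counts[x, y] = number of points with feature x and label y.
counts = np.zeros((n_x, n_y))
np.add.at(counts, (X, Y), 1)

# Generative route: estimate P(Y) and P(X | Y), then apply Bayes' rule.
p_y = counts.sum(axis=0) / counts.sum()                    # P(Y = y)
p_x_given_y = counts / counts.sum(axis=0)                  # P(X = x | Y = y)
joint = p_x_given_y * p_y                                  # P(X = x, Y = y)
posterior_gen = joint / joint.sum(axis=1, keepdims=True)   # P(Y = y | X = x)

# Discriminative route: estimate P(Y | X) directly from the counts.
posterior_disc = counts / counts.sum(axis=1, keepdims=True)

# h(x) = argmax_y P(Y = y | X = x)
def h(x, posterior):
    return int(np.argmax(posterior[x]))

print([h(x, posterior_gen) for x in range(n_x)])
print([h(x, posterior_disc) for x in range(n_x)])
```

With plain counting on a discrete feature, both routes give the same posterior and therefore the same $h$. With parametric models (for example, Gaussian class-conditionals versus logistic regression) the two estimates generally differ.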

[Which H to choose ?] Hypothesis Classes and No Free Lunch Theorem

The No Free Lunch Theorem says that no ML algorithm can learn anything without making assumptions. To learn the function $h$, we therefore have to encode our assumptions by choosing an appropriate hypothesis class $\mathcal{H} \ni h$.

Hypothesis Class $\mathcal{H} \ni h$ : set of possible functions
- Choosing $\mathcal{H}$ encodes our assumptions about the problem we want to learn, or about the data (the distribution $\mathcal{P}$).
- Examples of $\mathcal{H}$ : decision trees, linear classifiers, artificial neural networks, SVMs
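As a rough illustration of how the choice of $\mathcal{H}$ encodes an assumption, the sketch below fits the same data with two polynomial hypothesis classes. The quadratic ground truth, the polynomial classes, and the helper name `best_in_class` are assumptions made for this example and do not come from the lecture.

```python
import numpy as np

# Hypothetical data: the true relationship is quadratic, unknown to the learner.
x = np.linspace(-1.0, 1.0, 21)
y = x ** 2

def best_in_class(degree):
    """Least-squares fit within the class of polynomials of the given degree."""
    coeffs = np.polyfit(x, y, degree)
    mse = np.mean((np.polyval(coeffs, x) - y) ** 2)
    return coeffs, mse

# H_1: linear functions (assumes the relationship is a straight line).
_, mse_linear = best_in_class(1)
# H_2: quadratic functions (assumes curvature is possible).
_, mse_quadratic = best_in_class(2)

print(mse_linear, mse_quadratic)
```

Even the best linear function leaves a clearly nonzero error, because the assumption built into that class does not match the data, whereas the quadratic class contains the true function and fits it up to floating-point error.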

Example

[Which h to choose ?] Loss Functions on Training Data ; Average Loss

A loss function measures how badly $h \in \mathcal{H}$ performs on the **training data $\mathcal D$**. By minimizing it, $h = \argmin\limits_{h \in \mathcal{H}} \mathcal{L}(h)$, we pick a suitable function $h$ from the hypothesis class (set of possible functions) $\mathcal{H}$.
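The minimization over $\mathcal{H}$ can be sketched with a small, finite hypothesis class. Everything specific here is an assumption for illustration: the toy training set, the threshold classifiers that make up $\mathcal{H}$, and the zero-one loss averaged over the training data as the choice of $\mathcal{L}$.

```python
import numpy as np

# Hypothetical training set D: 1-D features with binary labels.
X = np.array([0.5, 1.0, 1.5, 2.0, 2.5, 3.0])
Y = np.array([0, 0, 0, 1, 1, 1])

# Assumed hypothesis class H: threshold classifiers h_t(x) = 1 if x >= t else 0.
def make_threshold_classifier(t):
    return lambda x: (x >= t).astype(int)

# Average zero-one loss of h on the training data: fraction of wrong predictions.
def average_zero_one_loss(h, X, Y):
    return np.mean(h(X) != Y)

# A finite set of candidate thresholds stands in for H.
thresholds = [0.0, 1.25, 1.75, 2.25, 3.5]
losses = [average_zero_one_loss(make_threshold_classifier(t), X, Y)
          for t in thresholds]

# argmin over H: pick the hypothesis with the smallest average training loss.
best_t = thresholds[int(np.argmin(losses))]
print(losses, best_t)
```

For a finite class the argmin can be found by enumeration as above. For continuous parameter spaces the same objective is typically minimized with an optimization method instead.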