Machine Learning by Andrew Ng - Week 8
Clustering
Unsupervised Learning: Introduction
- The data in supervised learning problems comes with labels
- The data in unsupervised learning problems does not come with labels
- Unsupervised learning algorithms are meant to find structure in the dataset
- A clustering algorithm is used to find groups of data points in the dataset

K-Means Algorithm
- This is a clustering algorithm
- Steps -
    - Randomly initialise the cluster centroids
    - Assign each data point to the cluster of its nearest centroid
    - Move each centroid to the average of the points in its new cluster
    - Repeat until the centroids no longer change
- K-Means Algorithm -
    - Input -
        - K - number of clusters
        - Training set
    - Overview of the steps above
    - K-Means can also be applied to non-separated data
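The steps above can be sketched in NumPy (a minimal illustration with function and variable names of my own choosing, not the course's Octave code):

```python
import numpy as np

def kmeans(X, K, n_iters=100, seed=0):
    """Minimal K-Means: assign each point to its nearest centroid, then move centroids."""
    rng = np.random.default_rng(seed)
    # Random initialisation: pick K distinct training examples as the centroids
    centroids = X[rng.choice(len(X), size=K, replace=False)]
    for _ in range(n_iters):
        # Cluster assignment step: index of the nearest centroid for every point
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        c = dists.argmin(axis=1)
        # Move step: each centroid moves to the mean of the points assigned to it
        # (a centroid with no assigned points keeps its old position)
        new_centroids = np.array([X[c == k].mean(axis=0) if np.any(c == k) else centroids[k]
                                  for k in range(K)])
        if np.allclose(new_centroids, centroids):
            break  # centroids stopped changing: converged
        centroids = new_centroids
    return centroids, c

# Two well-separated groups of points -> K-Means recovers the two groups
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 0.1, (20, 2)), rng.normal(5.0, 0.1, (20, 2))])
centroids, c = kmeans(X, K=2)
```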
Optimisation Objective
- Optimisation objective - minimise the distance between each point and the centroid of the cluster it is assigned to:
    J(c^(1), ..., c^(m), u_1, ..., u_K) = (1/m) * sum_i || x^(i) - u_(c^(i)) ||^2
- Algorithm -
    - The first part (cluster assignment) minimises the cost function with respect to c^(i), holding u_k fixed
    - The second part (moving the centroids) minimises the cost function with respect to u_k
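The distortion cost J above can be computed directly from the assignments and centroids (a sketch; the function name is mine):

```python
import numpy as np

def distortion(X, c, centroids):
    """J = (1/m) * sum_i ||x^(i) - u_(c^(i))||^2, the K-Means cost function."""
    return np.mean(np.sum((X - centroids[c]) ** 2, axis=1))

# Moving a centroid to the mean of its assigned points can only lower the cost
X = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 0.0]])
c = np.array([0, 0, 1])
bad = distortion(X, c, np.array([[0.0, 0.0], [10.0, 0.0]]))   # centroid off the mean
good = distortion(X, c, np.array([[1.0, 0.0], [10.0, 0.0]]))  # centroid at the mean
```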
Random Initialisation
- Random initialisation - should have K < m
    - Randomly pick K training examples
    - Set u_1, ..., u_K to these K examples
- Local optima - K-Means can get stuck in local optima
- Random initialisation extended -
    - Run K-Means n times, typically n = 50 to 1000
    - Pick the run that gives the minimum cost function
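Running K-Means from several random initialisations and keeping the lowest-cost run can be sketched as follows (self-contained, with my own names; the inner loop repeats the assignment and move steps described earlier):

```python
import numpy as np

def kmeans_once(X, K, rng, n_iters=50):
    """One K-Means run from one random initialisation; returns (cost, assignments)."""
    mu = X[rng.choice(len(X), size=K, replace=False)]  # init from K random examples
    c = np.zeros(len(X), dtype=int)
    for _ in range(n_iters):
        c = np.linalg.norm(X[:, None] - mu[None], axis=2).argmin(axis=1)
        mu = np.array([X[c == k].mean(axis=0) if np.any(c == k) else mu[k]
                       for k in range(K)])
    return np.mean(np.sum((X - mu[c]) ** 2, axis=1)), c  # (distortion J, labels)

def kmeans_restarts(X, K, n_runs=50, seed=0):
    """Run K-Means n_runs times and keep the run with the lowest distortion."""
    rng = np.random.default_rng(seed)
    runs = [kmeans_once(X, K, rng) for _ in range(n_runs)]
    return min(runs, key=lambda run: run[0])

# Three groups: a single unlucky run can get stuck, the best of 50 finds all three
X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 0.0], [5.0, 1.0], [9.0, 0.0], [9.0, 1.0]])
cost, c = kmeans_restarts(X, K=3)
```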
Choosing the Number of Clusters
- What is the right value of K?
- Elbow method - plot the cost function J against K and pick the K at the "elbow" of the curve
- Other method - choose the value of K based on the downstream application
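The elbow method just records the final distortion J for a range of K values and looks for the bend in the curve; a sketch (names are mine, and the inner K-Means is the same routine as above, compressed):

```python
import numpy as np

def kmeans_cost(X, K, seed=0, n_iters=50):
    """Run K-Means and return the final distortion J, for one point on the elbow plot."""
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), size=K, replace=False)]
    c = np.zeros(len(X), dtype=int)
    for _ in range(n_iters):
        c = np.linalg.norm(X[:, None] - mu[None], axis=2).argmin(axis=1)
        mu = np.array([X[c == k].mean(axis=0) if np.any(c == k) else mu[k]
                       for k in range(K)])
    return np.mean(np.sum((X - mu[c]) ** 2, axis=1))

# Two tight groups: J drops sharply from K=1 to K=2, then flattens -> elbow at K=2
X = np.vstack([np.zeros((10, 2)), np.full((10, 2), 8.0)])
costs = {K: kmeans_cost(X, K) for K in (1, 2, 3)}
```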
Motivation
Motivation 1: Data Compression
- Data compression - reducing the size of the data
- 2D to 1D - project 2-dimensional data onto a line
- 3D to 2D - project 3-dimensional data onto a plane
Motivation 2: Visualisation
- Data visualisation - e.g. a dataset of countries with many features
    - Convert the 50-dimensional data to 2 dimensions
    - Plot the dataset
Principal Component Analysis
Principal Component Analysis Problem Formulation
- PCA - a dimensionality reduction algorithm
    - Problem formulation - find a lower-dimensional surface onto which to project the data so that the sum of squared projection errors is minimised
- PCA is not linear regression -
    - In linear regression, the error is the vertical distance from the data point to the line (measured along the y-axis)
    - In PCA, the projection error is the orthogonal distance from the data point to the line (at 90 degrees to the line)
Principal Component Analysis Algorithm
- Data preprocessing - mean normalisation (and optionally feature scaling)
- PCA algorithm - reducing data from n dimensions to k dimensions
    - Compute the 'covariance matrix': Sigma = (1/m) * X' * X
    - Compute the 'eigenvectors' of Sigma (e.g. with SVD) and keep the first k columns of U as U_reduce
    - Summary: z = U_reduce' * x
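The preprocessing and SVD steps can be sketched in NumPy (variable names are mine; the course presents the same computation in Octave):

```python
import numpy as np

def pca(X, k):
    """PCA: mean-normalise, form the covariance matrix, keep the top-k eigenvectors."""
    mu = X.mean(axis=0)
    Xn = X - mu                       # mean normalisation
    Sigma = (Xn.T @ Xn) / len(X)      # covariance matrix, (1/m) * X' * X
    U, S, _ = np.linalg.svd(Sigma)    # columns of U are the eigenvectors of Sigma
    U_reduce = U[:, :k]               # keep the first k principal components
    Z = Xn @ U_reduce                 # projection: z = U_reduce' * x for each example
    return Z, U_reduce, mu, S

# Points lying exactly on the line y = x in 2D compress to 1D with no loss
X = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
Z, U_reduce, mu, S = pca(X, k=1)
```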
Applying PCA
Reconstruction from Compressed Representation

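The lecture's reconstruction step is x_approx = U_reduce * z (plus the mean, if it was subtracted during preprocessing). A self-contained sketch, with names of my own choosing:

```python
import numpy as np

# Compress 2D points lying near a line down to 1D, then reconstruct them
X = np.array([[0.0, 0.1], [1.0, 0.9], [2.0, 2.1], [3.0, 2.9]])
mu = X.mean(axis=0)
Xn = X - mu
Sigma = (Xn.T @ Xn) / len(X)          # covariance matrix
U, S, _ = np.linalg.svd(Sigma)
U_reduce = U[:, :1]                   # top principal component

Z = Xn @ U_reduce                     # compression:     z = U_reduce' * x
X_approx = Z @ U_reduce.T + mu        # reconstruction:  x_approx = U_reduce * z + mean

# Residual error is small because the points lie close to a line
err = np.mean(np.sum((X - X_approx) ** 2, axis=1))
```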
Choosing the Number of Principal Components
- Choosing k (the number of principal components) -
    - Choose the smallest k such that (average squared projection error) / (total variation in the data) <= 0.01
    - Equivalently, 99% of the variance is retained
- Different algorithms - rerunning PCA from scratch for each candidate k and checking the ratio each time is inefficient
- Recommended method - run SVD once and use the singular values: choose the smallest k with (sum_{i=1}^{k} S_ii) / (sum_{i=1}^{n} S_ii) >= 0.99
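With the singular values S_ii from the SVD in hand, the recommended check is a one-liner; a sketch (function name and example values are mine):

```python
import numpy as np

def choose_k(S, retain=0.99):
    """Smallest k with sum_{i<=k} S_ii / sum_i S_ii >= retain (variance retained)."""
    ratio = np.cumsum(S) / np.sum(S)          # cumulative fraction of variance
    return int(np.argmax(ratio >= retain)) + 1  # first index meeting the threshold

# Hypothetical singular values: the first two components carry ~99.6% of the variance
S = np.array([10.0, 5.0, 0.05, 0.01])
k = choose_k(S)
```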
Advice for Applying PCA
- Supervised learning speedup - run PCA on the training set inputs only, then apply the same mapping (mean and U_reduce) to the cross-validation and test inputs
- Applications -
    - Compression -
        - Reduce memory / disk needed to store data
        - Speed up the learning algorithm
    - Visualisation - choose k = 2 or k = 3 so the data can be plotted
- Bad use - using PCA to prevent overfitting; regularisation is the better tool, since PCA throws away information without looking at the labels
- Where it shouldn't be used - don't put PCA into a system by default; first try running with the raw data, and bring in PCA only if that is too slow or uses too much memory
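The speedup recipe above, fitting the PCA mapping on the training inputs only and reusing it for the other splits, can be sketched as (all names are mine):

```python
import numpy as np

def fit_pca(X_train, k):
    """Learn the mean and U_reduce from the TRAINING inputs only."""
    mu = X_train.mean(axis=0)
    Sigma = ((X_train - mu).T @ (X_train - mu)) / len(X_train)
    U, _, _ = np.linalg.svd(Sigma)
    return mu, U[:, :k]

def apply_pca(X, mu, U_reduce):
    """Apply the SAME mapping to any split (train, cross-validation, or test)."""
    return (X - mu) @ U_reduce

rng = np.random.default_rng(0)
X_train = rng.normal(size=(50, 10))           # stand-in 10-feature training inputs
X_test = rng.normal(size=(20, 10))            # stand-in test inputs
mu, U_reduce = fit_pca(X_train, k=3)
Z_train = apply_pca(X_train, mu, U_reduce)
Z_test = apply_pca(X_test, mu, U_reduce)      # reuse training-set mu and U_reduce
```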