Machine Learning by Andrew Ng - Week 9
Density Estimation
Problem Motivation
- Anomaly Detection Example
    - Plot the dataset and compare the new data point's behaviour against it
    - If the new data point lies in the same range as the dataset, it is identified as OK
    - If it lies outside that range, it is flagged as an anomaly
- Density Estimation
    - If the model's density p(x) at the new data point is less than some threshold epsilon, the point is flagged as an anomaly
    - If p(x) is greater than or equal to epsilon, the point is identified as OK
- Anomaly Detection Applications
    - Fraud detection
    - Manufacturing
    - Monitoring computers in a data center
Gaussian Distribution
- Gaussian Distribution
    - Say x is a real number (x ∈ R)
    - If x is distributed Gaussian with mean mu and variance sigma^2, we write x ~ N(mu, sigma^2)
- Gaussian Distribution Examples
- Parameter Estimation - estimating the mean and variance from the dataset: mu = (1/m) * sum of x^(i), sigma^2 = (1/m) * sum of (x^(i) - mu)^2
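The parameter estimates above can be sketched in NumPy (an illustration; the function name and data are mine):

```python
import numpy as np

def estimate_gaussian(X):
    """Estimate per-feature mean and variance for the anomaly-detection model.

    X is an (m, n) matrix of m training examples with n features.
    Maximum-likelihood estimates from the course:
    mu_j = (1/m) * sum_i x_j^(i),  sigma2_j = (1/m) * sum_i (x_j^(i) - mu_j)^2
    """
    mu = X.mean(axis=0)
    sigma2 = X.var(axis=0)   # NumPy's var divides by m by default (ddof=0)
    return mu, sigma2

# Tiny example with made-up numbers
X = np.array([[1.0, 10.0], [3.0, 14.0]])
mu, sigma2 = estimate_gaussian(X)
# mu = [2.0, 12.0], sigma2 = [1.0, 4.0]
```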
Algorithm
- Density Estimation - p(x) = product over j of p(x_j; mu_j, sigma_j^2); the big pi notation indicates a product over the features
- Algorithm
    - Choose features x_j that might be indicative of anomalous examples
    - Fit parameters - mean mu_j and variance sigma_j^2 for each feature
    - Given a new example x, compute p(x)
    - p(x) ≥ epsilon ⇒ OK
    - p(x) < epsilon ⇒ Anomaly
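The algorithm steps above can be sketched end-to-end (a minimal illustration; the data and the value of epsilon are made up):

```python
import numpy as np

def gaussian_pdf(X, mu, sigma2):
    """Per-feature Gaussian density, multiplied across features: p(x) = prod_j p(x_j)."""
    coeff = 1.0 / np.sqrt(2.0 * np.pi * sigma2)
    exponent = -((X - mu) ** 2) / (2.0 * sigma2)
    return np.prod(coeff * np.exp(exponent), axis=1)

# 1-2. Choose features and fit mu, sigma^2 on (mostly normal) training data
rng = np.random.default_rng(0)
X_train = rng.normal(loc=[5.0, 10.0], scale=[1.0, 2.0], size=(500, 2))
mu, sigma2 = X_train.mean(axis=0), X_train.var(axis=0)

# 3. Given new examples, compute p(x)
X_new = np.array([[5.0, 10.0],     # typical point
                  [50.0, -40.0]])  # obvious outlier
p = gaussian_pdf(X_new, mu, sigma2)

# 4-5. Compare against a threshold epsilon (chosen by hand here;
# in practice it is picked on a labelled cross-validation set)
epsilon = 1e-6
is_anomaly = p < epsilon   # → [False, True]
```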
- Anomaly Detection Example
Building an Anomaly Detection System
Developing and Evaluating an Anomaly Detection System
- Importance of Real Number Evaluation - when developing a learning algorithm (choosing features, etc.), making decisions is much easier if we have a single real-number metric for evaluating the algorithm
- Data Split - 60/20/20 split into training, cross-validation, and test sets
  
- Evaluation
    - Evaluation metrics
        - True positives, false positives, false negatives, true negatives
        - Precision / Recall
        - F1 score
    - Use the cross-validation set to choose the parameter epsilon
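Choosing epsilon on the cross-validation set can be sketched as a scan over candidate thresholds, scoring each with F1 (illustrative; `p_cv` and `y_cv` stand in for the CV-set densities and labels):

```python
import numpy as np

def select_epsilon(p_cv, y_cv):
    """Scan thresholds between min and max density; keep the one with the best F1.

    p_cv: densities p(x) on the cross-validation set
    y_cv: labels (1 = anomaly, 0 = normal)
    """
    best_eps, best_f1 = 0.0, 0.0
    for eps in np.linspace(p_cv.min(), p_cv.max(), 1000):
        pred = (p_cv < eps).astype(int)       # flag as anomaly when p(x) < epsilon
        tp = np.sum((pred == 1) & (y_cv == 1))
        fp = np.sum((pred == 1) & (y_cv == 0))
        fn = np.sum((pred == 0) & (y_cv == 1))
        if tp == 0:
            continue
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        f1 = 2 * precision * recall / (precision + recall)
        if f1 > best_f1:
            best_f1, best_eps = f1, eps
    return best_eps, best_f1

# Toy CV set: the two labelled anomalies have much lower density
p_cv = np.array([0.90, 0.80, 0.70, 0.60, 0.001, 0.002])
y_cv = np.array([0, 0, 0, 0, 1, 1])
eps, f1 = select_epsilon(p_cv, y_cv)
# Any eps between 0.002 and 0.60 separates the two groups perfectly, so f1 = 1.0
```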
Anomaly Detection vs Supervised Learning
- Anomaly Detection vs Supervised Learning

- Anomaly Detection vs Supervised Learning Examples

Choosing What Features to Use
- Non-Gaussian Features
    - Transform non-Gaussian features into a more Gaussian shape (e.g. log(x), log(x + c), or x^(1/2)) before feeding them to the algorithm
    - The algorithm will still work on untransformed features, but it usually performs worse
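A quick sketch of the log transform on a made-up right-skewed feature, measuring skewness before and after (the skewness helper is mine, not from the course):

```python
import numpy as np

# Heavily right-skewed feature (roughly the shape of something like disk accesses/sec)
rng = np.random.default_rng(1)
x = rng.exponential(scale=2.0, size=10_000)

# log(x + 1) compresses the long right tail toward a more Gaussian shape
x_log = np.log1p(x)

def skewness(v):
    """Simple sample skewness: third standardised moment."""
    v = v - v.mean()
    return np.mean(v ** 3) / (np.mean(v ** 2) ** 1.5)

raw_skew = skewness(x)          # large and positive for exponential data
transformed_skew = skewness(x_log)  # much closer to zero
```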
- Error Analysis - find new features by examining the anomalies the algorithm failed to flag and asking what would distinguish them
  
- Example
    - Monitoring computers in a data center
        - Choose features that might take on unusually large or small values in the event of an anomaly
        - x1 = memory use of computer
        - x2 = number of disk accesses / sec
        - x3 = CPU load
        - x4 = network traffic
        - x5 = CPU load / network traffic (new combined feature, e.g. to catch a machine with high CPU load but low network traffic)
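The combined feature x5 can be illustrated with made-up monitoring data; a faulty machine with high CPU load and almost no traffic stands out in the ratio even though neither raw feature alone is extreme:

```python
import numpy as np

# Hypothetical measurements for 4 machines: [CPU load, network traffic]
machines = np.array([
    [0.5, 100.0],   # normal
    [0.6, 120.0],   # normal
    [0.4,  90.0],   # normal
    [0.95,  2.0],   # high CPU but almost no traffic
])

cpu, net = machines[:, 0], machines[:, 1]
x5 = cpu / net                   # new feature: CPU load / network traffic

# The faulty machine's ratio dwarfs the others
suspect = int(np.argmax(x5))     # → 3
```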
Multivariate Gaussian Distribution
Multivariate Gaussian Distribution
- Motivating Example - monitoring machines in a data center
- Multivariate Gaussian Distribution
    - x is a real-valued vector (x ∈ R^n)
    - Don't model p(x1), p(x2), ... separately
    - Model p(x) all in one go
    - Parameters: mean vector mu and covariance matrix Sigma
- Multivariate Gaussian Distribution Examples - effects of altering the covariance matrix and mean:
    - Altering only the first diagonal entry
    - Altering both diagonal entries evenly
    - Altering them unevenly
    - Altering the off-diagonal entries evenly (introduces positive correlation)
    - Altering the off-diagonal entries negatively (introduces negative correlation)
    - Altering the mean vector (shifts the centre of the distribution)
Anomaly Detection using the Multivariate Gaussian Distribution
- Multivariate Gaussian Distribution
    - Formula - fit the parameters as mu = (1/m) * sum of x^(i) and Sigma = (1/m) * sum of (x^(i) - mu)(x^(i) - mu)^T
    - Flow - substitute the fitted parameters into the density formula p(x; mu, Sigma) and flag an anomaly if p(x) < epsilon
    - Relationship to the original model - the original model can be proved to be a special case of the multivariate Gaussian in which Sigma is diagonal, so the contours are aligned with the axes
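Fitting mu and Sigma and evaluating the multivariate density can be sketched directly from the definitions above (pure NumPy, made-up correlated data):

```python
import numpy as np

def fit_multivariate_gaussian(X):
    """mu = (1/m) sum x^(i);  Sigma = (1/m) sum (x^(i) - mu)(x^(i) - mu)^T."""
    mu = X.mean(axis=0)
    diff = X - mu
    Sigma = diff.T @ diff / X.shape[0]
    return mu, Sigma

def multivariate_pdf(X, mu, Sigma):
    """p(x; mu, Sigma) = exp(-0.5 (x-mu)^T Sigma^-1 (x-mu)) / ((2 pi)^(n/2) |Sigma|^(1/2))."""
    n = mu.shape[0]
    diff = X - mu
    inv = np.linalg.inv(Sigma)
    norm = 1.0 / (((2 * np.pi) ** (n / 2)) * np.sqrt(np.linalg.det(Sigma)))
    exponent = -0.5 * np.sum(diff @ inv * diff, axis=1)
    return norm * np.exp(exponent)

# Correlated data: the two features normally rise and fall together
rng = np.random.default_rng(2)
base = rng.normal(size=(1000, 1))
X = np.hstack([5 + base + 0.1 * rng.normal(size=(1000, 1)),
               10 + 2 * base + 0.1 * rng.normal(size=(1000, 1))])

mu, Sigma = fit_multivariate_gaussian(X)

# A point that breaks the correlation (x1 high, x2 low) gets a far lower
# density than a point on the correlation line at a similar distance from mu
p_on_line  = multivariate_pdf(np.array([[6.0, 12.0]]), mu, Sigma)[0]
p_off_line = multivariate_pdf(np.array([[6.0,  8.0]]), mu, Sigma)[0]
```

This is exactly the case the original (diagonal-Sigma) model misses: it would score both test points similarly because each coordinate is individually unremarkable.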
- Differentiation
- Original Model vs Multivariate Gaussian

Predicting Movie Ratings
Problem Formulation
- Example
    - Predicting Movie Ratings
        - n_u ⇒ no. of users
        - n_m ⇒ no. of movies
        - r(i, j) = 1 ⇒ if user j has rated movie i
        - y^(i, j) ⇒ rating given by user j to movie i (defined only if r(i, j) = 1)
Content Based Recommendations
- Content-Based Recommender System - a recommender system that applies a form of linear regression per user
    - Problem Formulation
        - r(i, j) = 1 ⇒ if user j has rated movie i
        - y^(i, j) ⇒ rating given by user j to movie i (defined only if r(i, j) = 1)
        - theta^(j) ⇒ parameter vector for user j
        - x^(i) ⇒ feature vector for movie i
        - Predicted rating of movie i by user j: (theta^(j))^T x^(i)
    - Optimisation Objective
    - Gradient Descent Update
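Given learned theta^(j) vectors and known movie features x^(i), the content-based prediction is just their inner product; a toy sketch with made-up features and parameters:

```python
import numpy as np

# Movie features x^(i): [bias term, degree of romance, degree of action]
X = np.array([
    [1.0, 0.9, 0.0],   # a romance movie
    [1.0, 0.1, 1.0],   # an action movie
])

# Per-user parameter vectors theta^(j), as learned by the per-user regression
Theta = np.array([
    [0.0, 5.0, 0.0],   # user 0 loves romance, is indifferent to action
    [0.0, 0.0, 5.0],   # user 1 loves action
])

# Predicted rating of movie i by user j: (theta^(j))^T x^(i)
predictions = X @ Theta.T    # shape (n_movies, n_users)
# predictions[0, 0] = 4.5 (romance fan on the romance movie)
# predictions[1, 0] = 0.5 (romance fan on the action movie)
```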
Collaborative Filtering
Collaborative Filtering
- Problem Motivation
- Optimisation Algorithm
- Collaborative Filtering
    - Given x, learn theta
    - Given theta, learn x
    - Keep updating both: estimate theta, then x, then theta again, until the values converge
Collaborative Filtering Algorithm
- Combined Cost Function - the two earlier objectives (learning theta given x, and x given theta) are combined into a single cost function J(x, theta)
- Collaborative Filtering Flow
    - Initialise x and theta to small random values
    - Minimise J(x, theta) using gradient descent (or another advanced optimisation algorithm)
    - For a user with parameters theta and a movie with (learned) features x, predict a star rating of theta^T x
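The flow above can be sketched with plain NumPy gradient descent on the combined cost (a toy illustration; the ratings matrix, lambda, and learning rate are made up):

```python
import numpy as np

# Ratings Y (n_movies x n_users); zero entries are unrated, masked out by R
Y = np.array([[5.0, 5.0, 0.0],
              [5.0, 0.0, 0.0],
              [0.0, 0.0, 5.0],
              [0.0, 1.0, 4.0]])
R = (Y != 0).astype(float)            # R[i, j] = 1 if user j rated movie i

n_movies, n_users, n_features = Y.shape[0], Y.shape[1], 2
rng = np.random.default_rng(3)

# Initialise x and theta to small random values
X = rng.normal(scale=0.1, size=(n_movies, n_features))
Theta = rng.normal(scale=0.1, size=(n_users, n_features))

alpha, lam = 0.02, 0.01               # learning rate and regularisation (arbitrary)
for _ in range(5000):
    E = (X @ Theta.T - Y) * R          # errors, counted only on rated entries
    X_grad = E @ Theta + lam * X
    Theta_grad = E.T @ X + lam * Theta
    X -= alpha * X_grad                # minimise J(x, theta) over both at once
    Theta -= alpha * Theta_grad

predictions = X @ Theta.T              # predicted star ratings theta^T x
```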
Low Rank Matrix Factorisation
Vectorisation: Low Rank Matrix Factorisation
- Collaborative Filtering
- Low Rank Matrix Factorisation - stack all the predicted ratings into the single matrix product X * Theta^T
- Finding Related Movies - movies i and j are related if the distance ||x^(i) - x^(j)|| between their feature vectors is small
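Vectorised prediction and finding related movies, given learned matrices X (movies x features) and Theta (users x features); the numbers here are made up:

```python
import numpy as np

# Learned movie feature vectors x^(i), stacked as rows of X
X = np.array([[0.9, 0.0],    # movie 0
              [0.8, 0.1],    # movie 1: close to movie 0 in feature space
              [0.0, 1.0]])   # movie 2: very different
# Learned user parameter vectors theta^(j), stacked as rows of Theta
Theta = np.array([[5.0, 0.0],
                  [0.0, 5.0]])

# Low rank matrix factorisation: all predicted ratings in one product
predictions = X @ Theta.T          # shape (n_movies, n_users)

# Related movies: the smaller ||x^(i) - x^(j)||, the more similar the movies
def most_related(i, X):
    dists = np.linalg.norm(X - X[i], axis=1)
    dists[i] = np.inf              # exclude the movie itself
    return int(np.argmin(dists))

# The movie most related to movie 0 is movie 1
```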
Implementation Detail: Mean Normalisation
- Users who have not rated any movies - without mean normalisation, regularisation drives such a user's theta to zero, so every predicted rating would be 0
- Mean Normalisation - subtract each movie's mean rating mu_i before learning, then predict (theta^(j))^T x^(i) + mu_i; a brand-new user is then predicted each movie's average rating
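Mean normalisation can be sketched on a tiny ratings matrix; for a user with no ratings, the prediction falls back to each movie's mean (illustrative numbers):

```python
import numpy as np

Y = np.array([[5.0, 4.0, 0.0],
              [1.0, 2.0, 0.0]])
R = np.array([[1, 1, 0],
              [1, 1, 0]])                    # the third user has rated nothing

# Per-movie mean over rated entries only
mu = (Y * R).sum(axis=1) / R.sum(axis=1)      # → [4.5, 1.5]

# Learn on the normalised ratings Y - mu (on rated entries only) ...
Y_norm = (Y - mu[:, None]) * R

# ... then add mu back at prediction time: (theta^(j))^T x^(i) + mu_i.
# For the new user, regularisation drives theta to 0, so the prediction
# reduces to mu_i, i.e. each movie's average rating.
pred_new_user = 0.0 + mu                      # → [4.5, 1.5]
```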