Machine Learning by Andrew Ng - Week 9

Jun 23, 2020

Density Estimation

Anomaly Detection Example
- Plot the dataset and compare the new datapoint against it
- If it falls in the same range as the dataset, the new datapoint is identified as OK
- If it does not fall in the same range as the dataset, the new datapoint is flagged as an anomaly
Density Estimation
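- Build a model p(x) of the dataset, i.e. the estimated probability density at x
- If p(x) for the new datapoint is less than some threshold ( epsilon ), it is flagged as an anomaly
- If p(x) for the new datapoint is equal to or greater than epsilon, it is identified as OK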
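
A minimal sketch of the thresholding rule above. The function name `flag_anomalies` and the density values are hypothetical, made up for illustration; in practice the densities would come from a fitted model p(x).

```python
def flag_anomalies(densities, epsilon):
    """Return True for each datapoint whose density p(x) is below epsilon."""
    return [p < epsilon for p in densities]

# Hypothetical density values p(x) for four new datapoints.
densities = [0.21, 0.0004, 0.05, 0.0199]
flags = flag_anomalies(densities, epsilon=0.02)
print(flags)  # True marks an anomaly, False marks an OK datapoint
```

A datapoint whose density is exactly epsilon counts as OK, matching the "equal to or greater than" rule.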
Anomaly Detection Applications
- Fraud Detection
- Manufacturing
- Monitoring computers in a data center

Gaussian Distribution
- Say x is a real number.
- x is Gaussian distributed with mean mu and variance sigma^2, written x ~ N(mu, sigma^2)
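
For reference, the Gaussian density can be evaluated directly from its formula, p(x) = exp(-(x - mu)^2 / (2 sigma^2)) / sqrt(2 pi sigma^2). The sketch below uses only the standard library; the name `gaussian_pdf` is an assumption, not something from the course.

```python
import math

def gaussian_pdf(x, mu, sigma2):
    """Density of N(mu, sigma2) at x, where sigma2 is the variance."""
    exponent = -((x - mu) ** 2) / (2.0 * sigma2)
    return math.exp(exponent) / math.sqrt(2.0 * math.pi * sigma2)

# The density peaks at the mean and falls off symmetrically on both sides.
peak = gaussian_pdf(0.0, mu=0.0, sigma2=1.0)
left = gaussian_pdf(-2.0, mu=0.0, sigma2=1.0)
right = gaussian_pdf(2.0, mu=0.0, sigma2=1.0)
print(peak, left, right)
```

A smaller variance gives a taller, narrower curve; a larger variance gives a shorter, wider one, since the area under the curve is always 1.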
Gaussian Distribution Examples
Parameter Estimation
- Estimating the mean and variance of the Gaussian distribution from the dataset
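
A minimal sketch of parameter estimation, assuming the maximum likelihood form that divides by m rather than m - 1. The function name `estimate_gaussian` and the sample values are illustrative only.

```python
def estimate_gaussian(xs):
    """Estimate the mean and variance of a 1-D dataset (dividing by m)."""
    m = len(xs)
    mu = sum(xs) / m
    sigma2 = sum((x - mu) ** 2 for x in xs) / m
    return mu, sigma2

# Hypothetical values of a single feature.
xs = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mu, sigma2 = estimate_gaussian(xs)
print(mu, sigma2)
```

With several features, the same estimate would be computed per feature.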

Importance of Real Number Evaluation
- When developing a learning algorithm ( choosing features, etc. ), making decisions is much easier if we have a way of evaluating our learning algorithm
Data Split
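- 60 / 20 / 20 split of the normal examples into training, cross validation, and test sets ( the known anomalies are divided between the cross validation and test sets )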
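
One way the split could be sketched, assuming the training set holds only normal examples and the labelled anomalies are shared evenly between the cross validation and test sets. The function name `split_dataset` and the label convention ( 0 = normal, 1 = anomaly ) are assumptions for illustration.

```python
def split_dataset(normal, anomalous):
    """Split normal examples 60/20/20 and halve the anomalies between CV and test."""
    n = len(normal)
    n_train = (n * 6) // 10   # integer arithmetic avoids float rounding
    n_cv = (n * 2) // 10
    half = len(anomalous) // 2

    train = list(normal[:n_train])  # unlabelled, assumed normal
    cv = [(x, 0) for x in normal[n_train:n_train + n_cv]]
    cv += [(x, 1) for x in anomalous[:half]]
    test = [(x, 0) for x in normal[n_train + n_cv:]]
    test += [(x, 1) for x in anomalous[half:]]
    return train, cv, test

# Hypothetical data: 10 normal examples and 4 known anomalies.
train, cv, test = split_dataset(list(range(10)), ["a", "b", "c", "d"])
print(len(train), len(cv), len(test))
```

In practice the examples would be shuffled before slicing; that step is left out here to keep the sketch deterministic.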
Evaluation
- Evaluation metrics
  - True positive, false positive, false negative, true negative
  - Precision / Recall
  - F score
- Use cross validation set to choose parameter epsilon
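
A sketch of choosing epsilon on the cross validation set by maximising the F1 score. The names `f1_score` and `select_epsilon`, the candidate thresholds, and the density values are hypothetical.

```python
def f1_score(predictions, labels):
    """F1 score for binary predictions, where a truthy value means anomaly."""
    pairs = list(zip(predictions, labels))
    tp = sum(1 for p, y in pairs if p and y)        # true positives
    fp = sum(1 for p, y in pairs if p and not y)    # false positives
    fn = sum(1 for p, y in pairs if not p and y)    # false negatives
    if tp == 0:
        return 0.0  # precision and recall are zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def select_epsilon(cv_densities, cv_labels, candidates):
    """Return the candidate epsilon with the highest F1 on the CV set."""
    best_eps, best_f1 = None, -1.0
    for eps in candidates:
        predictions = [p < eps for p in cv_densities]
        score = f1_score(predictions, cv_labels)
        if score > best_f1:
            best_eps, best_f1 = eps, score
    return best_eps, best_f1

# Hypothetical CV densities p(x) and labels (1 = anomaly).
cv_densities = [0.30, 0.25, 0.001, 0.20, 0.004, 0.05]
cv_labels = [0, 0, 1, 0, 1, 0]
best_eps, best_f1 = select_epsilon(cv_densities, cv_labels, [0.0005, 0.01, 0.1])
print(best_eps, best_f1)
```

Accuracy is a poor metric here because anomalies are rare: a model that always predicts "OK" would score high accuracy while catching nothing, which is why precision, recall, and the F score are used instead.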

AD vs SL.png

AD vs SL Example.png

Non Gaussian Features
- Transform non-Gaussian features so they look more Gaussian ( for example with log(x) or a power such as x^(1/2) ), then feed them to the algorithm
- The algorithm will still work without the transformation, but it usually performs worse
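
A small sketch of one such transformation, log(x + c), applied to a right-skewed feature. The helper names, the constant c, and the sample values are all assumptions; sample skewness is used here only as a rough check of how symmetric the values are.

```python
import math

def log_transform(xs, c=1.0):
    """Apply log(x + c); c is a hypothetical offset that keeps the argument positive."""
    return [math.log(x + c) for x in xs]

def skewness(xs):
    """Sample skewness: 0 for a symmetric distribution, positive for a long right tail."""
    m = len(xs)
    mu = sum(xs) / m
    sigma2 = sum((x - mu) ** 2 for x in xs) / m
    third_moment = sum((x - mu) ** 3 for x in xs) / m
    return third_moment / sigma2 ** 1.5

# Hypothetical right-skewed feature values.
raw = [1.0, 2.0, 2.0, 3.0, 3.0, 4.0, 10.0, 50.0, 200.0]
transformed = log_transform(raw)
print(skewness(raw), skewness(transformed))
```

Plotting a histogram before and after the transformation is the usual way to judge which operation works best for a given feature.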
Error Analysis
- Find new features by analysing the mistakes the algorithm makes when flagging anomalies
Example
- Monitoring computers in a data center
  - Choose features that might take on unusually large or small values in the event of an anomaly
  - x1 = memory use of computer
  - x2 = number of disk accesses / sec
  - x3 = CPU load
  - x4 = network traffic
  - x5 = CPU load / network traffic ( new feature )
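
A sketch of why the ratio feature x5 helps, using made-up readings. On normal machines CPU load and network traffic tend to rise together, so the ratio stays in a narrow band; a machine stuck in a loop shows high CPU load with low traffic, and the ratio jumps. The function name and the small constant guarding against division by zero are assumptions.

```python
def cpu_traffic_ratio(cpu_load, network_traffic, tiny=1e-9):
    """x5 = CPU load / network traffic; tiny avoids division by zero."""
    return cpu_load / (network_traffic + tiny)

# Hypothetical (cpu_load, network_traffic) readings.
normal_machines = [(0.2, 0.25), (0.5, 0.55), (0.8, 0.85)]
stuck_machine = (0.9, 0.05)  # high CPU load, almost no traffic

normal_ratios = [cpu_traffic_ratio(c, t) for c, t in normal_machines]
stuck_ratio = cpu_traffic_ratio(*stuck_machine)
print(normal_ratios, stuck_ratio)
```

Neither x3 nor x4 alone is unusual for the stuck machine, which is why the combined feature is needed to make it stand out.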