Monday, March 13, 2017

Andrew Ng machine learning

Gradient Descent vs Normal Equation
Normal equation works well when the number of features n is small; for large n, gradient descent scales better (no O(n^3) matrix inversion)

Normal Equation Noninvertibility
too many features (m <= n, i.e. at least as many features as examples)
redundant features (linearly dependent columns)
pinv vs inv in Octave: pinv still returns a (pseudo-inverse) answer when X'X is singular.
Normal equation with regularization: X'X + lambda*I is invertible even when X'X is not.
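
A quick sketch of both fixes in Python/NumPy (the course itself uses Octave's pinv; the data and variable names here are made up for illustration):

```python
import numpy as np

# Toy design matrix with a redundant column (col 3 = 2 * col 2),
# so X'X is singular and plain inv() would fail.
X = np.array([[1.0, 2.0, 4.0],
              [1.0, 3.0, 6.0],
              [1.0, 4.0, 8.0],
              [1.0, 5.0, 10.0]])
y = np.array([7.0, 9.0, 11.0, 13.0])

# Fix 1: pseudo-inverse (Octave's pinv) still returns a least-squares solution.
theta_pinv = np.linalg.pinv(X.T @ X) @ X.T @ y

# Fix 2: regularization - X'X + lambda*I is invertible for any lambda > 0
# (the bias term is conventionally left unregularized).
lam = 1.0
reg = lam * np.eye(X.shape[1])
reg[0, 0] = 0.0
theta_reg = np.linalg.solve(X.T @ X + reg, X.T @ y)
```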

Sunday, March 5, 2017

Andrew Ng clustering & PCA

randomly initialize K cluster centroids
assign each instance to its nearest centroid
recompute each centroid as the mean of its assigned instances; repeat
example: image compression - treat each pixel's R,G,B values as numerical
features and assign every pixel to a cluster
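
The three steps above, as a minimal NumPy sketch (function and parameter names are my own):

```python
import numpy as np

def kmeans(X, K, iters=20, seed=0):
    """Plain K-means: random init, assign, recompute; repeated."""
    rng = np.random.default_rng(seed)
    # random initialization: pick K distinct data points as centroids
    centroids = X[rng.choice(len(X), K, replace=False)]
    for _ in range(iters):
        # assignment step: each instance goes to its nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # update step: each centroid becomes the mean of its cluster
        for k in range(K):
            if np.any(labels == k):
                centroids[k] = X[labels == k].mean(axis=0)
    # final assignment against the converged centroids
    labels = np.linalg.norm(X[:, None, :] - centroids[None, :, :],
                            axis=2).argmin(axis=1)
    return centroids, labels
```

For the image-compression example, X would be the N x 3 array of per-pixel R,G,B values; each pixel is then replaced by its centroid's color.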

example - data compression - reduce n dimensions to K dimensions
covariance matrix captures variance (spread) along non-axis-aligned directions
reconstruct original data with the same matrix U
U's columns are eigenvectors of the covariance matrix
example - image compression - treat each pixel as a feature and keep the K
most important directions
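
A minimal PCA sketch along those lines (NumPy; this uses an eigendecomposition of the covariance matrix, whereas the course computes U via svd(Sigma) - both give the same directions for a symmetric Sigma):

```python
import numpy as np

def pca(X, K):
    """Project n-dim data to K dims using eigenvectors of the covariance
    matrix, then reconstruct with the same matrix U."""
    mu = X.mean(axis=0)
    Xc = X - mu                              # center the data
    Sigma = Xc.T @ Xc / len(Xc)              # covariance matrix
    eigvals, U = np.linalg.eigh(Sigma)       # eigh: for symmetric matrices
    order = np.argsort(eigvals)[::-1]        # sort by decreasing variance
    U = U[:, order[:K]]                      # keep the K most important directions
    Z = Xc @ U                               # compress: n dims -> K dims
    X_rec = Z @ U.T + mu                     # reconstruct with the same U
    return Z, X_rec, U
```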

scatter3 in octave for 3D-visualization

Andrew Ng collaborative filtering

recommender systems - collaborative filtering

If you know weights for movie attributes, romance, action etc, you can
learn weights for user preferences.

If you know weights for user preferences, you can compute movie attributes.

If you know neither, start with a guess for one, compute the other,
then alternate back and forth until it converges.

But there is a more efficient approach that solves for both simultaneously, by minimizing a single joint cost function.
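
The back-and-forth above is essentially alternating least squares. A sketch (NumPy, made-up toy ratings; the "solve both together" variant instead runs gradient descent on one joint cost over X and Theta):

```python
import numpy as np

def als(Y, R, K=2, lam=0.1, iters=50, seed=0):
    """Alternate: fix movie features X and solve for user weights Theta by
    regularized least squares, then fix Theta and solve for X; repeat."""
    rng = np.random.default_rng(seed)
    n_movies, n_users = Y.shape
    X = rng.standard_normal((n_movies, K))       # movie attributes
    Theta = rng.standard_normal((n_users, K))    # user preferences
    I = lam * np.eye(K)
    for _ in range(iters):
        for j in range(n_users):                 # learn each user from rated movies
            idx = R[:, j] == 1
            A = X[idx]
            Theta[j] = np.linalg.solve(A.T @ A + I, A.T @ Y[idx, j])
        for i in range(n_movies):                # learn each movie from its raters
            idx = R[i, :] == 1
            A = Theta[idx]
            X[i] = np.linalg.solve(A.T @ A + I, A.T @ Y[i, idx])
    return X, Theta
```

Y holds the ratings, R marks which entries were actually rated; predictions are X @ Theta.T.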

Thursday, March 2, 2017

mysql : database size query

SELECT table_schema "DB Name",
       ROUND(SUM(data_length + index_length) / 1024 / 1024, 1) "DB Size in MB"
FROM information_schema.tables
GROUP BY table_schema;

table size query

SELECT TABLE_NAME, table_rows, data_length, index_length,
       ROUND((data_length + index_length) / 1024 / 1024, 2) "Size in MB"
FROM information_schema.TABLES
WHERE table_schema = "schema_name";

Wednesday, March 1, 2017

Octave indexing

>> tk
tk =

1 2 3
4 5 6
7 8 9

>> tk(1,:) % first row, all columns
ans =

1 2 3

>> tk(:,1) % first column, all rows
ans =

1
4
7

>> tk(1,1) % first row, first column
ans = 1
>> tk(1,2:3) % first row, second/third columns
ans =

2 3

>> tk(2:3,1) % 2nd/3rd rows, 1st column
ans =

4
7

>> tk(2:3) % a single subscript does linear (column-major) indexing. Also the result has the shape of the index - a row vector here, unlike tk(2:3,1)
ans =

4 7
% flattening into column/row vectors
>> tk(:)
ans =

1
4
7
2
5
8
3
6
9

>> tk(:)'
ans =

1 4 7 2 5 8 3 6 9

Monday, February 27, 2017

Andrew Ng course

Online learning doesn't require learning rate configuration?

ceiling analysis - machine learning pipeline etc

Anomaly detection - Andrew Ng

Anomaly detection vs supervised learning - when anomalous examples are
too few, go for anomaly detection

Anomaly detection - choosing features: features should be roughly
Normally distributed. Plot a histogram to check. If not, try log(x),
log(x+c), x^0.5, x^0.2, etc. Also try combinations of features:
CPU/network traffic, CPU^2/network traffic, etc.

Multivariate Normal distribution - say memory use is unusually high
for a given CPU load, but each value individually has a good enough
probability of occurring; they just sit on opposite sides of their
respective bell curves. So we would go for the multivariate Normal

Modelling each feature independently as a Gaussian and multiplying the
probabilities is the same as a multivariate Gaussian whose axes are
aligned, i.e. all off-diagonal components of Sigma are zero.
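
That equivalence (diagonal Sigma) is easy to check numerically. A sketch of both densities (NumPy; function names are my own):

```python
import numpy as np

def independent_gaussian_p(x, mu, var):
    """Original model: each feature as its own 1-D Gaussian, multiplied."""
    return np.prod(np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var))

def multivariate_gaussian_p(x, mu, Sigma):
    """Multivariate Gaussian density; needs Sigma to be invertible."""
    n = len(mu)
    d = x - mu
    return (np.exp(-0.5 * d @ np.linalg.solve(Sigma, d))
            / np.sqrt((2 * np.pi) ** n * np.linalg.det(Sigma)))
```

With a diagonal Sigma (axis-aligned case) the two return the same value; with nonzero off-diagonals only the multivariate one captures the correlation.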

Multivariate captures correlations between features automatically.
Otherwise you have to create those unusual features manually.

But the original model is computationally cheaper and scales to a
large number of features. In MV you have to invert a large (n x n)
covariance matrix.

In MV, m > n => the number of examples must exceed the number of
features, otherwise Sigma is singular and can't be inverted. Not so in
the original model.

In MV, the covariance matrix (Sigma) must be invertible. It will not
be invertible if there are redundant features, i.e. duplicates like
x2 = x1 or x3 = x4 + x5, etc.
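
A quick numerical check of that: with duplicated or linearly dependent features, the covariance matrix is singular (rank-deficient). A sketch on synthetic data:

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.standard_normal(100)
x4 = rng.standard_normal(100)
x5 = rng.standard_normal(100)

# redundant features: x2 duplicates x1, and x3 = x4 + x5
X = np.column_stack([x1, x1, x4 + x5, x4, x5])
Sigma = np.cov(X, rowvar=False)

# Sigma is 5x5 but only rank 3, so it has no inverse and the
# multivariate Gaussian fit would fail
print(np.linalg.matrix_rank(Sigma))   # rank 3, not 5
```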
