Sunday, December 9, 2018

R syntax

Vectors:
1. All elements are of the same type, and each element is atomic, i.e. it can't be broken down further.
2. c() to create a vector.
3. If you try to create a vector of vectors using c(), both vectors will be flattened into a single vector.
4. Easy to multiply/add an element to the entire vector. Even sin(), log() etc. apply element-wise (see the sketch after this list).
5. Aggregating - sum(), prod(), mean()
6. Similarly - operations between vectors of same lengths
7. numeric(6) will instantiate a vector of length 6 with all elements initialized to 0.
8. Operations on vectors of different lengths - recycling - the shorter vector is reused as many times as needed - for e.g. c(1, 2, 3, 4, 5, 6) + c(0,1) will give (1,3,3,5,5,7).
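A quick sketch of these element-wise operations and recycling:

v <- c(1, 2, 3, 4, 5, 6)
v * 2          # 2 4 6 8 10 12 - applied to every element
sin(v)         # element-wise sin
sum(v)         # 21
v + c(0, 1)    # 1 3 3 5 5 7 - the shorter vector is recycled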

Generating sequences:
1:10  will give 1 to 10
10:1 will give 10 to 1
2*1:5 will give 2,4,6,8,10

rep() (repeat) generates more complicated sequences
seq() is also useful here
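For example, a small sketch of rep() and seq():

rep(c(1, 2), times = 3)    # 1 2 1 2 1 2
rep(c(1, 2), each = 3)     # 1 1 1 2 2 2
seq(0, 10, by = 2)         # 0 2 4 6 8 10
seq(0, 1, length.out = 5)  # 0.00 0.25 0.50 0.75 1.00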

Using conditions on vectors - each element is checked - for e.g. numberSeq == 2 will output a vector of logicals.
Similarly 2 vectors can be compared.
----------
applying nchar on str_vec
where str_vec <- c('a', 'bc')
will give (1, 2)
----------
Generate a complex sequence using recycling:
A1,B2,C3,D4,E1,F2,G3,H4

simpleSequence <- 1:4
stringSequence <- c("A","B","C","D","E","F","G","H")
out <- paste(stringSequence, simpleSequence, sep="")
paste() will combine any number of vectors into a character vector, element by element, recycling the shorter ones.
Notice that paste() has taken 2 sequences of different types (numeric and character) and coerced both to strings.
------
[ ] are used to select elements of a vector; they are indexing operators.
-----
indexing in R starts from 1 not 0.
stringSequence[-6] will give you all elements except the 6th.
> mySeq <- 3*1:5

> print(mySeq)
[1]  3  6  9 12 15

> print(mySeq[2:4])
[1]  6  9 12

> print(mySeq[-3])
[1]  3  6 12 15

> print(mySeq[rep(c(1,3), times=5)])
 [1] 3 9 3 9 3 9 3 9 3 9
-------------------------
> print(mySeq[c(-1,-3)])
[1]  6 12 15
-------------------
> print(mySeq[c(TRUE, FALSE)])
[1]  3  9 15
-----------
Let's invert the sign of the elements which are not equal to 9:
> mySeq[mySeq != 9] <- -mySeq[mySeq != 9]

> print(mySeq)
[1]  -3  -6   9 -12 -15
-----------------
When you use a logical vector for indexing and it's not the same length as the original vector, it's recycled.
-----------
Each element in a vector can be given a name:
> names(mySeq) <- c("A","B","C")

> print(mySeq[c("A","C")])
 A  C
-3  9
---------------
Arrays:
Arrays are like vectors in that all elements must be of the same type.
They also have dimensions.
An array is a vector with an additional dim attribute.
By assigning dimensions to a vector, you can turn it into an array.
mySeq <- 3*1:6
myArray <- mySeq
dim(myArray) <- c(2,3)
> print(myArray)
     [,1] [,2] [,3]
[1,]    3    9   15
[2,]    6   12   18

The product of the dimensions must equal the number of elements in the array.
When you assign dimensions to a vector, the elements are filled in column-major order (first column first, as seen above).

array() function
anotherArray <- array(c(1:12), dim=c(3,2,2))
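Indexing into that 3D array, as a quick sketch:

anotherArray <- array(c(1:12), dim=c(3,2,2))
anotherArray[3, 2, 1]   # 6 - row 3, column 2 of the first slice
anotherArray[, , 2]     # the second 3x2 slice, containing 7..12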
------------
Solving a set of linear equations using matrices.
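Base R's solve() does this. A minimal sketch for the system 2x + 3y = 8, x - y = -1:

A <- matrix(c(2, 1, 3, -1), nrow = 2)  # filled column-major: rows are (2,3) and (1,-1)
b <- c(8, -1)
solve(A, b)                            # x = 1, y = 2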
--------
Factors:
Answer questions like what are top selling product categories, What are the sales in each city?
City and Category are Categorical variables. They take a limited set of values.
Factor vector is for handling categorical variables.
factor() function.
Internally a factor maps each level (value) to an integer.
It's like an ENUM.
-------------
tapply() and table() functions for Aggregating the data, for e.g. sum and group by
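A minimal sketch with made-up sales data:

city <- factor(c("Goa", "Pune", "Goa", "Delhi", "Pune"))
sales <- c(10, 20, 15, 5, 25)
levels(city)               # "Delhi" "Goa" "Pune"
table(city)                # row counts per city
tapply(sales, city, sum)   # sum(sales) group by city: Delhi 5, Goa 25, Pune 45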
----------------
Lists and Data Frames:
---------
List can have any kind of elements.
---------
A DataFrame is like a SQL table - rows and (named) columns - you can perform aggregations.
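A small sketch:

df <- data.frame(city = c("Goa", "Pune", "Goa"), sales = c(10, 20, 15))
df$sales                                        # one column, as a vector
df[df$city == "Goa", ]                          # rows where city is Goa
aggregate(sales ~ city, data = df, FUN = sum)   # group by city, sum sales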
-----------
Regression:
Predict the value of one variable using other variables.
Example:
CAPM - Capital Asset Pricing Model - Find Beta of Google against NasDaq
Multiple linear regression - multiple independent (predictor) variables.
Summary: read up more on linear regression. How to determine efficacy/robustness of the model? What is T-stat/F-stat/R-squared/Adjusted R-squared etc.?
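A hedged sketch with lm() on made-up return series - for CAPM you would regress Google's returns on NASDAQ's returns and read beta off as the slope:

nasdaq_ret <- c(0.010, -0.020, 0.015, 0.030, -0.010)   # hypothetical index returns
google_ret <- c(0.012, -0.025, 0.020, 0.035, -0.008)   # hypothetical stock returns
fit <- lm(google_ret ~ nasdaq_ret)
summary(fit)    # shows the t-stat, F-stat, R-squared, adjusted R-squared
coef(fit)[2]    # the slope = beta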





Monday, December 3, 2018

git show name and status for a commit

git show --name-status HEAD

R syntax

1. For assignment, the safest operators are <- and ->.

Printing:
2. Using a variable without assigning will print it.
3. print() prints a single expression; show() can also print plots, graphs, tables etc.
4. cat() can print multiple results.
5. paste() returns the resulting string instead of printing it, unlike cat().
6. The result of paste() can be printed by message().
7. message() can also be used like print(), but it adds a newline.
8. message() won't print list indices etc.

Data types:
1. Numeric - all types of numbers
2. class() function tells the datatype
3. is.numeric(), is.integer() will give TRUE/FALSE. In general: is.<datatypename>()
4. Append L after the number to make it an integer, for e.g. 4L
5. Typecasting: val <- as.integer(3+5)
6. An integer is also a numeric. So numeric is like a superclass of integer.
7. Double is a synonym of numeric.
8. Character is the datatype for strings. nchar() is like strlen().
9. For dates: Date
10. For timestamps: POSIXct
11. Logical - TRUE/FALSE

Data structures:
1. Vector(Default)/Array/DataFrame/Matrix/List
2. List can have different kind of elements unlike Vector.
3. Vector can have only simple datatypes.
4. Array - only same type elements. Arrays have dimensions.
5. Matrix - is a 2D array - has different functions exclusively for math operations.
6. DataFrame - is like a SQL table

Thursday, November 29, 2018

R tutorial + Stats + Types of Inferences

Random variables - Continuous, Discrete, Categorical
Their probability distributions can be discrete or continuous.

First type of Inference - Computing Population mean from a Sample mean:
Problem: find mean weight of all football players in the world.
Input data: Weights of 45 players of your college. Mean = 173, StanDev = 15, N(number of samples) = 45
From this input data you have to compute the output.

Sampling distribution is a normal distribution with mu = x bar = sample mean
Sigma = StanDev of the Sampling distribution = the standard error, which can be derived from the StanDev of the sample: SE = Sample StanDev/Sqrt(N)

So if sample mean = 173
number of data points = N = 45
then Sampling Distribution mean estimate = 173

If StanDev for Sample = 15
then Standard error = 15/Sqrt(45) = 2.24

So with 95% (2 Sigma) confidence we can say that the mean of the population lies in 173 +- 2*2.24, i.e. roughly 168.5 to 177.5.
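The same computation in R, as a minimal sketch:

xbar <- 173; s <- 15; n <- 45
se <- s / sqrt(n)                  # 2.24
c(xbar - 2 * se, xbar + 2 * se)    # roughly 168.5 to 177.5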
-------------------------
Second type of Inference - Population Proportion - Identifying the population % - Election polling - Yes/No type of variable
Find out the winning chances of this candidate.
Pollster picks a sample of 2000 voters.
1100 say Yes for this candidate. 55%.
Sample StanDev in this case is different since it's population % case.
 = Sqrt(p*(1-p)/N) = Sqrt(0.55*0.45/2000) ~= 0.011
Here p = 0.55 as computed above
Standard Error = Sample StanDev in this case ~= 0.011, i.e. about 1.1 percentage points

2 Sigma ~= 2.2% here
Summary: 55% of people support the candidate, and we can say with 95% (2 sigma) probability that this number is off by at most about 2.2%.
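In R, a minimal sketch:

p <- 0.55; n <- 2000
se <- sqrt(p * (1 - p) / n)    # roughly 0.011
c(p - 2 * se, p + 2 * se)      # roughly 0.528 to 0.572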
---------------------------------
Third - verifying whether the population mean is equal to a certain value - Hypothesis testing for population mean
A medical study
Let's verify this hypothesis: The average life expectancy of an Indian College Graduate is 70 years.
Let's take a sample.
N = 100
Mean life expectancy of the sample = 65
SD of life expectancy of that sample = 10

So:
Standard Error(the best estimate for the SD of the Sampling Distribution) = 10/Sqrt(100) = 1
Also,
Best point estimate of the population mean = Sample mean = 65

Now, let's perform a test of significance. What does that mean?
We need to figure out whether the difference between 65 and 70 is only due to chance, or whether the difference exists because the population mean is actually very different from 70.
First one is Null Hypothesis.
Second one is Alternative hypothesis.
If we end up accepting the Null Hypothesis (more precisely: failing to reject it) - it means that the mean is indeed 70.
If we accept Alternative Hypothesis - it means that the mean is not 70.

For doing this:
Compute the probability of Null hypothesis being true.
If the probability is too low - reject the Null hypothesis. Else accept it.

For doing this:
We will use the Z-Statistic = normalized distance between the Sample mean (65) and the hypothesized population mean (70) = (Sample mean - Population mean)/Standard Error = (65-70)/1 = -5
What we are doing here is (I am not sure about this part) converting it to a standard normal distribution with mean = 0 and StanDev = 1.
So now we need to check P(|Z| > 5) = P(the difference > 5) = Area under the curve for values beyond 5.

In R, compute it like this: 1 - pnorm(5) + pnorm(-5) = 5.73e-07, which is very low. So the Null Hypothesis is rejected.
What if this value was 0.1, then with 90% confidence we can say that the Null Hypothesis is false.
If it was 0.05 we can say the same thing with 95% confidence
If it was 0.01 then 99% confidence.
and so on....
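The whole test in R, as a minimal sketch:

xbar <- 65; mu0 <- 70; s <- 10; n <- 100
z <- (xbar - mu0) / (s / sqrt(n))   # -5
2 * pnorm(-abs(z))                  # two-sided p-value, roughly 5.73e-07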
-----------------------
4 - Hypothesis testing for population proportion
Verifying whether the population % is equal to a certain value
Example: 40% of the employees check Facebook first thing in the morning.

% of the sample who said yes = 43
N = 100 (number of samples)
StanDev of the sample = Sqrt(p(1-p)/n) = Sqrt(.43*.57/100) = .05
So mean = .43, SD=.05
Null hypothesis p = .4
Alternative hypothesis p != .4
Null and Alternate hypothesis should be MECE(Mutually Exclusive and Collectively Exhaustive)
Z-statistic = (p-hat - p)/SD = (.43 - .40)/.05 = .03/.05 = .6
P(|Z| > .6) = the p-value, i.e. the probability of seeing a difference this large if the null hypothesis is true
Using R:
> print(1 - pnorm(.6) + pnorm(-.6)  )
[1] 0.5485062

So the data is entirely consistent with the Null Hypothesis - we fail to reject it.


pnorm(x) gives the area under the curve from x = -infinity to the given value of x.
So 1 - (pnorm(.6) - pnorm(-.6) ) = Area under the curve before -.6 and after .6 .
----------------------
5 - A/B testing - comparing the means of two populations
Two layouts of a website. Old A and new B.
Null hypothesis: Average minutes/visit on A <= B. B is no worse than A.
Alternate: A > B. B is worse than A.
It's the first time the Null hypothesis has an inequality; in all earlier cases it was an equality. This is called a one-sided test; the earlier ones were two-sided tests.

A has 5 mins per visit.
B has 3 mins per visit.

Null hypothesis says difference between A and B is due to chance - hence not significant.
Alternate hypothesis says it's significant.

Z-statistic in this case(when we are comparing means of 2 distributions) = (X1bar - X2bar)/Sqrt(Sigma_x1_square + Sigma_x2_square)
Sigma = standard error = StanDev/Sqrt(number_of_samples)
Here they are 0.5, 0.4
= (5-3)/sqrt(.5^2 + .4^2) = 2/sqrt(.25+.16) = 3.12
P (Z > 3.12) -notice only one sided test, no absolute value
> print(1 - pnorm(3.12)  )
[1] 0.0009042552
It's same as
pnorm(-3.12) because of symmetry of normal distribution
Probability is too low. Reject the null hypothesis.
So new layout is indeed worse.
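The computation in R, as a minimal sketch using the standard errors 0.5 and 0.4 given above:

z <- (5 - 3) / sqrt(0.5^2 + 0.4^2)   # roughly 3.12
1 - pnorm(z)                         # one-sided p-value, roughly 0.0009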
----------------
Verifying whether 2 population % are different
Customer Surveys
Which users are more satisfied - mobile app or web
A = 100 people who use only the app
B = 100 people who use only the website
% of happy customers in A = 67%
% of happy customers in B = 63%

Standard Error = StanDev = Sqrt(p*(1-p)/100) = .047 for A and .048 for B
Null hypothesis: % voting yes on App <= website
Alternate: App > website

Z-stat = (.67-.63)/sqrt(.047^2 + .048^2) = .04/sqrt(.002211 + .002331) = .04/.067 = .597
> print(1 - pnorm(.597)  )
[1] 0.2752537

So the p-value is 0.27 - far too high to reject the null hypothesis.


Tuesday, November 27, 2018

Thalassemia intermedia is characterized by Hb level between 7 and 10
g/dl, MCV between 50 and 80 fl and MCH between 16 and 24 pg.
Thalassemia minor is characterized by reduced MCV and MCH, with
increased Hb A2 level.
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2893117/

Wednesday, November 21, 2018

R and statistics

https://www.safaribooksonline.com/videos/learn-by-example/9781788996877


r <- c(1,2,3,4,5,5,6,6,8,9)
range(r)

Method 1:
bins <- seq(0,11,by=2)                # bin boundaries: 0 2 4 6 8 10
intervals <- cut(r,bins,right=FALSE)  # assign each value to a [lo,hi) bin
print(intervals)
print(bins)
t <- table(intervals)                 # frequency table of the bins
print(t)
plot(t,type="h",main="rabit",xlab="intervals",ylab="count")

Method 2:
hist(r, breaks=bins)

Mean, Median, Mode:
print(mean(r))
print(median(r))
print(sort(table(r), decreasing = TRUE)[1])  # mode: the most frequent value

Histogram is the bar chart of the frequency table.
-------------
Measuring data spread:

computing IQR - Inter quartile range - a measure of data spread

Divide your sorted data into 4 equal parts. The 3 partition boundaries inside it are Q1,Q2, Q3. 
Q3 - Q1 is IQR.
Q2 is effectively median of the data.
Q1 is the median of the first half of data.
Q3 is the median of the second half of data.
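In R, quantile() gives these boundaries directly, as a quick sketch:

r <- c(1,2,3,4,5,5,6,6,8,9)
quantile(r)   # 0%, 25% (Q1), 50% (median), 75% (Q3), 100%
IQR(r)        # Q3 - Q1 = 2.75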

-------------
Box and whisker plots help us in visualizing IQR and outliers.
How to draw:
Draw a box starting at Q1 and ending at Q3. So width of this box is IQR.
From the ends of the box (Q1 and Q3), draw two whiskers outward (left and right), each extending up to 1.5*IQR.
Now any data points beyond the end of the whiskers are outliers.
---------

Next measure of dispersion is: Standard Deviation(SD) - way more popular than IQR.
In financial markets - it's called volatility.

Essentially - how far away from the mean the points are.

SD = SQRT(Variance)
Variance = sum((mean - datapoint)^2)/n = mean of (squares of deviations)

Why is variance better than the mean of absolute values?

1. Variance is more sensitive:
For the deviations -2,4 and -3,3
Mean of abs = 3, 3
But variance = ((-2)^2 + 4^2)/2 = 10
and ((-3)^2 + 3^2)/2 = 9
So variance is larger for -2,4 as opposed to -3,3, which is good!

2. Variance is computationally cheaper:
An if condition(to check whether the number is negative or not) is more expensive than a square.

3. Variance has cool mathematical properties and is fundamentally tied to the Normal distribution.

4. Unlike IQR, variance is sensitive to outliers - just as the mean is sensitive to outliers but the median is not.
So the median/IQR combo is not sensitive to outliers, but the mean/variance combo is.

Drawback of the variance: it's not in the same units as the dataset and the mean, since it's squared.
That's why we have SD = Sqrt(Variance)
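One caveat worth flagging: base R's var() and sd() divide by n-1 (the sample variance), not by n as in the formula above. A minimal sketch:

r <- c(1,2,3,4,5,5,6,6,8,9)
n <- length(r)
sum((r - mean(r))^2) / n   # population variance, /n as in the formula above
var(r)                     # sample variance, /(n-1)
sd(r)                      # sqrt(var(r))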
-----------------
In R, summary() method gives a good summary of the dataset.
r <- c(1,2,3,4,5,5,6,6,8,9)

print(summary(r))

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   1.00    3.25    5.00    4.90    6.00    9.00 

The IQR() function:
> print(IQR(r))
[1] 2.75
----------------------
Visually compare 2 datasets using boxplot and IQR
r1 <- c(1,1,2,4,5,5,6,6,9,10)
r2 <- c(1,2,3,4,5,5,6,6,8,9)
print(IQR(r1))
print(IQR(r2))
print(mean(r1))
print(mean(r2))
combined <- cbind(r1,r2)
boxplot(combined)
-----------------
Method for computing SD:
sd(r)
-------------






Wednesday, November 14, 2018

Hard-learned lessons in leading design at MailChimp - Aarron Walter VP of Design Education, InVision

1. Your legacy is the people you hire.
2. Bad things happen when people feel left out. Communicate early and often, rather than unveiling the grand design at the end.
3. You don't become VP by keeping your headphones on.
4. Your legs are the best design tool - visit the customers.
5. Try to draw from many designs - ask everyone to draw and put on wall.
6. You need to start formalizing a refinement process—something at
MailChimp we called "Guns forward, guns backward." That might mean
having a team dedicated specifically to product refinement, constantly
sanding off the edges.

"The best product companies in the world have figured out how to make
constant quality improvements part of their essential DNA." –Phil
Libin, former CEO of Evernote

Monday, November 12, 2018

Building an image classifier quickly

Updated on 24th June 2020

Here are my steps:
1. Installed Ubuntu subsystem on my Windows 10 machine.
sudo -i
apt update
apt install python-pip
sudo rm -rf /etc/apt/apt.conf.d/20snapd.conf (had to run this due to an error)
apt install tensorflow
apt install python3-pip
pip3 install --user --upgrade tensorflow
pip install numpy
pip install tensorflow
git clone https://github.com/googlecodelabs/tensorflow-for-poets-2 (HEAD: bc96088a4de86729920e120111f5b208f7f1cbb1)
cd tensorflow-for-poets-2
mkdir -p tf_files/images/useful
mkdir -p tf_files/images/useless
Put images in above folders

Training:
11. IMAGE_SIZE=224
12. ARCHITECTURE="mobilenet_0.50_${IMAGE_SIZE}"
13. python -m scripts.retrain --bottleneck_dir=tf_files/bottlenecks --model_dir=tf_files/models/"${ARCHITECTURE}" --summaries_dir=tf_files/training_summaries/"${ARCHITECTURE}" --output_graph=tf_files/retrained_graph.pb --output_labels=tf_files/retrained_labels.txt --architecture="${ARCHITECTURE}" --image_dir=tf_files/images

Testing:
12. python -m scripts.label_image  --graph=tf_files/retrained_graph.pb  --image=YOUR_PATH_TO_IMAGE_HERE


I built it for classifying WhatsApp images into personal vs generic (good morning messages etc). It works quite well.

Shell script to test and move images into respective folders:

#!/bin/bash
# Classify each image, then move it into the folder named after its top label.
for filename in /mnt/c/images/actual/*.jpg; do
        echo "$filename"
        # label_image prints the top matches; grab the label from the 4th line
        dest=`python -m scripts.label_image --graph=tf_files/retrained_graph.pb --image="$filename" | head -n4 | tail -n1 | grep -o '^\S*'`
        echo "$dest"
        mv "$filename" /mnt/c/images/out/"$dest"
done

Friday, November 9, 2018

design of design essays

Rationalist vs Empiricist - in software the latter wins

Constraints:
are good.
General purpose artifact design is harder than special purpose one.
What's the budgeted resource - may not be dollars. Time/chip size/number of pins/UX real estate.
That resource can change too.


Design divorce:
Design as a separate activity is a fairly recent phenomenon.

User Model - Better wrong than vague - Truth will sooner come out of error than from confusion.

Telecollaboration:

Low tech is often good : document + phone call
Face to face time is crucial. Investing money in it makes sense.


What to design:

The hardest part of design is deciding what to design.

A chief service of a designer is helping clients discover what they want designed.

Requirements:

Any attempt to formulate all possible requirements at the start of a project will fail and would cause considerable delays.

Why does waterfall model still persist:
Contracts

Role of Process:

The trick is to hold "process" off long enough to permit great design
to occur, so that the lesser issues can be debated once the great
design is on the table—rather than smothering it in the cradle.

Thus, product processes are, properly, designed for follow-on
products. For innovation, one must step outside of process.

Thursday, November 8, 2018

Azure Centos 7.3 setup with apache + php7.3 +mongodb

#disable selinux
vim /etc/selinux/config
setenforce 0

#install php7.3
yum install https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
yum install http://rpms.remirepo.net/enterprise/remi-release-7.rpm
yum install yum-utils
yum-config-manager --enable remi-php73
yum install php php-mcrypt php-cli php-gd php-curl php-mysql php-ldap php-zip php-fileinfo
php -v

#httpd(apache)
yum install httpd
vim /etc/httpd/conf/httpd.conf
netstat -punta | grep LISTEN
pkill -9 httpd
service httpd restart

#install mongodb
vim /etc/yum.repos.d/mongodb-org-4.0.repo (copied from mongodb website)
sudo yum install -y mongodb-org
sudo service mongod start
cat /var/log/mongodb/mongod.log | grep waiting
sudo chkconfig mongod on
mongo (client)
mongorestore -d master mongo_dump/master/

#php mongo extension
yum -y install gcc php-pear php-devel
pecl install mongodb
vim /etc/php.ini (enable mongodb.so extension)
php -m | grep -i mongo
service httpd restart
mongo
setsebool -P httpd_can_network_connect on (not required if you disable selinux)


Monday, October 29, 2018

Excel 2016 Pie/Pivot charts

If you have a Gender column in your sheet which has values "Male", "Female" for every row - how do you create a Pie Chart?
In Google Sheets it takes just a couple of clicks and is quite intuitive.
Here is how to do it in Excel:

1. Select the column
2. Insert Tab - Click PivotChart - OK
3. In PivotChart Fields select and drag that column to all sections.
4. Now you have Clustered Column chart.
5. Right click the chart -> Change Chart Type -> Pie -> Ok

Excel 2016 Macros

Enabling Macros:
File -> Options -> Customize Ribbon -> Main Tabs -> Check Developer
Now Developer tab will be visible.

Inserting Macro:
Click: Developer Tab -> Insert ActiveX Controls -> Command Button
Now click anywhere on the sheet. You will see a Command Button. Right click the button and View Code.

There you can enter this code: (Give Score of 1 if Gender is Female). Column B is Gender. Column A is score.

Private Sub CommandButton1_Click()
Dim Gender
Dim Score

For i = 1 To 3
   Score = 0
   Gender = Range("B" & i).Value
   If Gender = "Female" Then
      Score = 1
   End If
   
   Range("A" & i).Value = Score
Next i
End Sub

Monday, September 17, 2018

Inner game of tennis - Concentration

1. Focus on the seams - but don't stare hard
2. Say bounce-hit
3. Focus on the sound of the ball during serve or your strokes
4. Notice the trajectory/angle as the ball bounces/height of the ball over the net
5. How to focus in between points - focus on breathing - mind wanders but gently bring it back to breathing
6. You can slow down time by focusing well
7. Why compete at all?

Sunday, September 16, 2018

Inner game of tennis

It is a painful process to fight one's way out of deep mental grooves. It's like digging yourself out of a trench. But there is a natural and more childlike method. A child doesn't dig his way out of his old grooves; he simply starts new ones! The groove may be there, but you're not in it unless you put yourself there. If you think you are controlled by a bad habit, then you will feel you have to try to break it. A child doesn't have to break the habit of crawling, because he doesn't think he has a habit. He simply leaves it as he finds walking an easier way to get around.
------------
In short, there is no need to fight old habits. Start new ones. It is the resisting of an old habit that puts you in that trench.
----------

Inner game of tennis - Chapter 2 -


Chapter 2:
THE THESIS OF THE LAST CHAPTER WAS THAT THE FIRST STEP IN bringing a greater harmony between ego-mind and body—that is, between Self 1 and Self 2—was to let go of self-judgment. Only when Self 1 stops sitting in judgment over Self 2 and its actions can he become aware of who and what Self 2 is and appreciate the processes by which it works.
-------------------
Make yourself serve vs Let yourself serve
Rather than consciously controlling your serve, visualize the path the ball should take and the way your racket should move to hit a particular spot with your serve.
Then while serving just focus on the seams of the ball and let your Self-2 do the work for you.
---
Asking for form
Ask your Self-2(body) to imitate a particular movement of making a forehand. Give it an image which it should try to emulate. Swing your racket a few times to enact that movement.
---
Similarly Asking for results - where the ball should land then close your eyes to visualize the trajectory.
---

Inner game of tennis - Chapter 2 - Asking for qualities

Asking for qualities
Most players hypnotize themselves into acting the roles of much worse players than they actually are, but interesting results can often be achieved by doing a little role-playing of a different kind.
--
You should look as if you are hitting every ball exactly where you want to. Really get into the role, hit as hard as you like and ignore where the ball is actually going. 
---
There is an important distinction between this kind of role-playing and what is normally called positive thinking. In the latter, you are telling yourself that you are as good as Steffi Graf or Michael Chang, while in the former you are not trying to convince yourself that you are any better than you believe you are. You are quite consciously playing a role, but in the process, you may become more aware of the range of your true capabilities.
----
After they have played tennis for a year or so, most people fall into a particular pattern of play from which they seldom depart. Some adopt a defensive style; they spare no effort to retrieve every ball, lob often, hit deep into the opponent's court and seldom hit the ball hard or go for a winner. The defensive player waits for his opponent to make an error and wears him down by degrees with endless patience. Some Italian clay-court players used to be the prototype for this style.
The opposite of this is the offensive style. In its extreme form the ball is hit for a winner every time. Every serve is designed to be an ace, every return of serve a clean passing shot, while volleys and overheads are all aimed to land within one or two inches of the lines.
----
A third common pattern is what might be called the "formal" style of play. Players in this category don't care so much where their ball goes as long as they look good stroking it. They would rather be seen using flawless form than winning the match.
----
In contrast, there is the competitive style of the player who will do anything to win. He runs hard and hits hard or soft, depending on what seems to bother his opponent most, exploiting his every weakness, mental and physical.
----
Having outlined these basic styles to a group of players, I often suggest that as an experiment they adopt the style that seems most unlike the one they have previously adopted. I also suggest that they act the role of a good player, no matter what style they have chosen. Besides being a lot of fun, this kind of role-playing can greatly increase a player's range. 
----

Saturday, September 15, 2018

The Inner Game of Tennis - Chapter 1

Chapter 1:

So before hitting the next set of balls, I asked Joan, "This time I want you to focus your mind on the seams of the ball. Don't think about making contact. In fact, don't try to hit the ball at all. Just let your racket contact the ball where it wants to, and we'll see what happens." Joan looked more relaxed, and proceeded to hit nine out of ten balls dead center!
----------------
When this happens on the tennis court, we are focused without trying to concentrate. We feel spontaneous and alert. We have an inner assurance that we can do what needs to be done, without having to "try hard." We simply know the action will come, and when it does, we don't feel like taking credit; rather, we feel fortunate, "graced." As Suzuki says, we become "childlike."
-------------------
Perfectly, thoughtlessly executed action, and afterward, no self-congratulations, just the reward inherent in his action: the bird in the mouth.
-----------------
The first skill to learn is the art of letting go the human inclination to judge ourselves and our performance as either good or bad. Letting go of the judging process is a basic key to the Inner Game; its meaning will emerge as you read the remainder of this chapter.
--------------
Mr. A frowns, says something demeaning about himself, and calls the serve "terrible." Seeing the same stroke, Mr. B. judges it as "good" and smiles. The umpire neither frowns nor smiles; he simply calls the ball as he sees it.
-------------
What I mean by judgment is the act of assigning a negative or positive value to an event.
---------
Well, it is the initial act of judgment which provokes a thinking process. First the player's mind judges one of his shots as bad or good. If he judges it as bad, he begins thinking about what was wrong with it. Then he tells himself how to correct it. Then he tries hard, giving himself instructions as he does so. Finally he evaluates again. Obviously the mind is anything but still and the body is tight with trying. If the shot is evaluated as good, Self 1 starts wondering how he hit such a good shot; then it tries to get his body to repeat the process by giving self-instructions, trying hard and so on. Both mental processes end in further evaluation, which perpetuates the process of thinking and self-conscious performance.
--------------
As a result, what usually happens is that these self-judgments become self-fulfilling prophecies. 
--------
letting go of judgments does not mean ignoring errors. It simply means seeing events as they are and not adding anything to them. Nonjudgmental awareness might observe that during a certain match you hit 50 percent of your first serves into the net. It doesn't ignore the fact. It may accurately describe your serve on that day as erratic and seek to discover the causes. Judgment begins when the serve is labeled "bad" and causes interference with one's playing when a reaction of anger, frustration or discouragement follows.
---
But judgmental labels usually lead to emotional reactions and then to tightness, trying too hard, self-condemnation, etc. 
---------
Similarly, the errors we make can be seen as an important part of the developing process. In its process of developing, our tennis game gains a great deal from errors. Even slumps are part of the process. They are not "bad" events, but they seem to endure endlessly as
------
The first step is to see your strokes as they are. They must be perceived clearly. This can be done only when personal judgment is absent. 
--------------
"If the pro is pleased with one kind of performance, he will be displeased by the opposite. If he likes me for doing well, he will dislike me for not doing well." 
------------
Three men in a car are driving down a city street early one morning. For the sake of analogy, suppose that each man represents a different kind of tennis player. 
-------------
In the game of tennis there are two important things to know. The first is where the ball is. The second is where the racket head is.
-------------
THE FIRST INNER SKILL to be developed in the Inner Game is that of nonjudgmental awareness. 
------------------
Acknowledgment of one's own or another's strengths, efforts, accomplishments, etc., can facilitate natural learning, whereas judgments interfere. What is the difference? Acknowledgment of and respect for one's capabilities support trust in Self 2. Self 1's judgments, on the other hand, attempt to manipulate and undermine that trust.
-------------


Tuesday, August 28, 2018

find command find multiple extensions

find ./ -type f \( -iname \*.jpg -o -iname \*.png \)

php sending chunked encoding response

<?php
header("Transfer-encoding: chunked");
@apache_setenv('no-gzip', 1);
@ini_set('zlib.output_compression', 0);
@ini_set('implicit_flush', 1);
for ($i = 0; $i < ob_get_level(); $i++)  ob_end_flush();
ob_implicit_flush(1); flush();

function dump_chunk($chunk)
{
  printf("%x\r\n%s\r\n", strlen($chunk), $chunk);
  flush();
}

for ($i=0;$i<2;++$i) {
        $output = array();
        for ($j=0;$j<20;++$j) { // must not reuse $i here, or the outer loop exits after one pass
                $output[] = str_repeat("-=", 100);
        }
        dump_chunk(implode("\n", $output));
        usleep(500000);
}
dump_chunk("");
?>

Wednesday, August 8, 2018

Azure 70 534 notes

https://www.safaribooksonline.com/library/view/Exam+Ref+70-534+Architecting+Microsoft+Azure+Solutions/9780735697706/

Compute instances:
RDMA capable backend

Azure Batch and TVM
- for long running tasks

How do Homogeneous instances handle session replication?
- Sticky sessions
- External state store (for e.g. redis)

Scheduled vs Reactive scaling (reactive means there will be some delay in scaling)
--------------
Azure Traffic Manager - redirects traffic based on round robin/performance etc. Triggered in DNS phase, so the actual traffic doesn't pass through it.






Friday, July 6, 2018

Mongodb sample queries

Assuming DB name is "master" and table(collection) name is "top"

Command line:
> use master
> db.getCollection("top").count() (or better db.top.count())
> db.getCollection("top").find({"query" : "best goa beaches"}).count();
> db.getCollection("top").find({"query" : "best goa beaches", "is_last_page": true}).count();
> db.getCollection("top").drop()
> db.places.distinct('result.place_id').length //distinct on nested field and length(not count)
> db.dm.find({ dist: { $lt: 5000 } ,type1: 'zoo', type2:'hotel'} ).count(); //less than operator
PHP:  $query_assoc = array();
$query_assoc['type1'] = 'zoo';
$query_assoc['type2'] = 'hotel';
$query_assoc['dist'] = array('$lt'=> $radius);
> db.places.find({ "result.place_id": { $in: ['ChIJYU0mP3W_vzsReXl288rhJ9M'] }},{"result.name":1} ) //find and project with "in" clause

> db.dm.find({g_dist: {$gt: 0}},{g_dist:1} ).sort({"g_dist":1}).limit(1) //find with projection and sort
Response: { "_id" : ObjectId("5b449e47a387a535a00013bb"), "g_dist" : 134 }

> db.places.find({place_type: {$exists: false}}).count() //whether the field exists?
> db.places.updateMany({eplace: 'Goa'}, {$set: {eplace: 'kk'}}) //update where


> import/export
mongodump -d <database_name> -o <directory_backup>
mongorestore -d <database_name> <directory_backup>

PHP:
$manager = new MongoDB\Driver\Manager(); //localhost
$bulk = new MongoDB\Driver\BulkWrite;
$record = array("a" => "b");
$bulk->insert($record);
$manager->executeBulkWrite('master.top', $bulk); //write  
getCount($manager, array('query' => $q, 'page' => $page));//get record count meeting the criteria


function getCount($manager, $q) {
    $query = new MongoDB\Driver\Query($q);
    $rows = $manager->executeQuery('master.top', $query);
    $count = 0;
    foreach ($rows as $doc) {
        ++$count;
    }
    return $count;
}

Friday, June 8, 2018

CAP Theorem - Alternate explanation - nice article summary

https://codahale.com/you-cant-sacrifice-partition-tolerance/


1. Partition tolerance is not optional. It's a given - packets will drop/communication errors are bound to happen between nodes.

2. So all you can choose is Availability or Consistency.

3. Choosing Consistency - You can stop accepting writes or only take writes if the node is "Master" of the data to be written.

4. Choosing Availability - You can take all the writes but clients may get "stale data".

2 more relevant metrics which better capture the performance:

Yield & Harvest
Yield is similar to uptime, with one major difference: if a node is down for 1 second during peak vs off-peak hours, the uptime is the same but the yield is vastly different. Yield directly maps to what the user experienced. So Yield = % of user requests served.

Harvest = available data/total data. If data lies on 3 nodes but the server was able to serve data from only 2 nodes => harvest = 66%.

Now we need to decide whether faults impact yield or harvest.
Replicated systems tend to map faults to reduced yield - since fewer requests will complete.
Partitioned systems map faults to reduced harvest - since less data will be available.

Tuesday, June 5, 2018

vim line margin spacing etc

line spacing: set lsp=10

left margin: :set foldcolumn=3

then
:highlight FoldColumn guibg=white guifg=white
or
:highlight FoldColumn guibg=gray14 guifg=white

Tuesday, April 24, 2018

letsencrypt wildcard ssl certificate on Amazon Linux + Apache

mkdir certbot
cd certbot
wget https://dl.eff.org/certbot-auto
chmod a+x certbot-auto
sudo ./certbot-auto certonly  --server https://acme-v02.api.letsencrypt.org/directory  --manual --preferred-challenges dns  -d *.domainname.com

For putting TXT record in NameCheap:
In HostName, put _acme-challenge, in value put the string given on the command line.

To check whether the txt record is deployed:
dig -t txt _acme-challenge.domainname.com

Then in httpd.conf:

<VirtualHost *:443>
    DocumentRoot "/var/www/html/somepath"
    ServerName other.domainname.com
    ServerAlias *.domainname.com
    SSLCertificateFile /etc/letsencrypt/live/domainname.com-0001/cert.pem
    SSLCertificateKeyFile /etc/letsencrypt/live/domainname.com-0001/privkey.pem
    SSLCertificateChainFile /etc/letsencrypt/live/domainname.com-0001/fullchain.pem
    ErrorLog logs/domainname-error_log
    CustomLog logs/domainname-access_log common

    <Directory "/var/www/html/somepath">
        Options Indexes FollowSymLinks
        AllowOverride All
        Order allow,deny
        Allow from all
    </Directory>
</VirtualHost>


Friday, March 23, 2018

Convolutional neural network course Coursera - Week 1

Edge detection with convolutional filter.
Image is nxn, fxf filter (f is usually odd).

Valid convolution: you don't pad the original image; with an nxn image and an fxf filter you get an n-f+1 x n-f+1 output image, which tells you where the edges are.
Same convolution: you pad the original image so that every pixel gets an equal opportunity to participate in the output, i.e. (n + 2p - f)/s + 1 = n; with stride s = 1 this means p = (f-1)/2.

Strided convolution - when you do the convolution while making a stride. Output image size will be (n + 2p -f)/s + 1.
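A worked example of the output-size formula, with one choice of f and s per layer that is consistent with the sizes quoted in the excerpt below: n = 39, f = 3, p = 0, s = 1 gives (39 - 3)/1 + 1 = 37; then f = 5, s = 2 gives (37 - 5)/2 + 1 = 17; then f = 5, s = 2 again gives (17 - 5)/2 + 1 = 7.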

Convolutions over 3D volumes:

For e.g. RGB image.
Image is 6x6x3 and filter 3x3x3, then you get 4x4 output. First 9 numbers will detect edges in red channel and so on..

Multiple Filters
What if you want to use multiple filters at the same time? For e.g. detect Vertical/Horizontal edges together? Or detect edges at various angles?
In the above example if you apply 2 3x3x3 filters, you will get output as 4x4x2.
Which is n -f + 1 x n - f + 1 x (number of filters).

How to tune parameters
But for now, maybe one thing to take away from this is that as you go deeper in a neural network, typically you start off with larger images, 39 by 39. And then the height and width will stay the same for a while and gradually trend down as you go deeper in the neural network. It's gone from 39 to 37 to 17 to 7. Whereas the number of channels will generally increase. It's gone from 3 to 10 to 20 to 40, and you see this general trend in a lot of other convolutional neural networks as well.

Similar to convolutional layer, there is pooling layer:
for e.g. Max pooling - if a feature is detected anywhere - preserve it.
It has some hyperparameters but no parameters(to learn for gradient descent)
Hyperparameters -> f,s (filter size, stride)

Similarly, average pooling:

Tuesday, January 30, 2018

Cryptography course week 6

Public key encryption

Trapdoor function(TDF)
Secure TDF - (G, F, F-1) is secure if F(pk, .) is a "one-way" function: it can be evaluated but can't be inverted without sk (the secret key). pk is the public key.
Secret key is the trapdoor.

Public key encryption from TDFs

Encryption
1. Choose a random x
2. k <= H(x) where H is a hasher
3. y = F(pk, x) where G,F,F-1 is a secure TDF and pk,sk are generated from G
4. c <= E(k,m) where E,D is symmetric auth. encryption defined over (K,M,C)
5. Output is y,c

Decryption
1. x <= F-1(sk,y)
2. k = H(x)
3. m = D(k,c)

If we applied F directly to m, encryption would become deterministic - there would be no randomness (which x provides).

The RSA Trapdoor permutation
Review: arithmetic mod composites
Let N = p.q where p,q are primes and roughly same size => p,q are almost equal to sqrt(N)
Z_N = (0,1,2...N-1) and Z_N* = set of invertible elements in Z_N
x E Z_N is invertible if gcd(x,N) = 1
number of invertible elements = phi(N) = (p-1)(q-1) = N -p -q +1 ~= N - 2.sqrt(N) ~= N since N is very large(for e.g. 600 digits, so sqrt will be like 300 digits)
So Z_N* ~= Z_N => almost every element in Z_N will be invertible.

Euler's thm For all x E Z_N * => x ^ phi(N) = 1

How RSA works
0. choose random primes p,q roughly 1024 bits, set N = p*q
1. choose e,d s.t. e*d = 1 mod phi(N)
2. pk = (N,e), sk = (N,d) where e is encryption exponent and d is decryption exponent
3. for x E Z_N*, F(pk,x) is RSA(x), RSA(x) = x^e in Z_N
4. to decrypt
5. RSA_1(y) = y^d = (RSA(x))^d = (x^e)^d = x^(e*d), now e*d = 1 mod phi(N) means e*d = k*phi(N) + 1 where k is some integer
6. RSA_1(y)  = x^(k*phi(N) + 1) = x^(k*phi(N))*x, from Euler's thm. x^(phi(N)) = 1 since x E Z_N* => RSA_1(y) = x

Textbook RSA is insecure
Encrypt C = m^e
Decrypt C^d = m
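A toy sketch of textbook RSA in R using the classic p = 61, q = 53 example - purely illustrative, nothing about these sizes is secure:

mod_exp <- function(base, power, mod) {   # modular exponentiation by repeated squaring
  result <- 1
  base <- base %% mod
  while (power > 0) {
    if (power %% 2 == 1) result <- (result * base) %% mod
    base <- (base * base) %% mod
    power <- power %/% 2
  }
  result
}
p <- 61; q <- 53; N <- p * q   # N = 3233
phi <- (p - 1) * (q - 1)       # 3120
e <- 17; d <- 2753             # e*d mod phi(N) = 1
m <- 65
c <- mod_exp(m, e, N)          # encrypt: 2790
mod_exp(c, d, N)               # decrypt: 65 again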

PKCS1
Uses RSA. Insecure since an attacker could check whether the decryption of a ciphertext begins with the expected '02' block marker, and could decode the entire message this way (Bleichenbacher's attack). It's used in HTTPS, so it was fixed by reverting to a random 46 byte string in case of an erroneous message, so that the attacker doesn't get any information about the message.

PKCS1 v2.0 - OAEP (Optimal Asymmetric Encryption Padding)
Improvement over PKCS1

Public key encryption built from Diffie Hellman Protocol
ElGamal
IDH - Interactive Diffie Hellman
Twin ElGamal


Saturday, January 13, 2018

speed test results

Primary router - TP-Link Archer C60 AC1350 - Dual band
Repeater(bridge) - tp link wr841n
ISP: ACT Broadband
Device: MI A1

Format
ping time, download speed, upload speed

5g - primary
1, 43.92, 34.93
1, 60.44, 48.84
1, 65.8, 51.82
1, 56.5, 48.67
1, 60.67, 51.61

2.4g - primary
1, 32.63, 32.53
1, 49.3, 33.92
1, 44.85, 23.27
1, 40.19, 33.38
1, 58.69, 30.95
1, 37.07, 32.14

2.4g- bridge
1, 39.21, 22.05
1, 31.09, 21.7
1, 22.55, 20.02

Sunday, January 7, 2018

Cryptography course - Basic key exchange

Online Trusted third party(TTP)
If A,B want to communicate, Eavesdropper sees E (K_a, "A,B" || K_ab) and E (K_b, "A,B" || K_ab)
Similar mechanism is basis of Kerberos system.
It's only safe against eavesdropping attacks, not against an active attacker.
TTP should always be online.

Active attack
 - If a money transaction is taking place, what if the attacker just replays the request? Since the key is still the same, another transaction would take place.

Key question
 - can we design key exchange protocols without online TTPs?
- Yes! Public key cryptography.

Merkle puzzles
- Quadratic gap between participants and attackers (2^32 vs 2^64)
 - This looks like the best we can achieve from symmetric block ciphers

Diffie Hellman protocol
 - exponential gap
- Fix a large prime p (e.g. 600 digits), Fix an integer g in {1,...,p}
- Alice: choose random a in {1,...,p-1}
- Bob: choose random b in {1,...,p-1}
- A <- g^a mod p
- B <- g^b mod p
- Alice sends A to Bob and Bob sends B back
- Now the shared key is : g^ab mod p since both of them can compute it.
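A toy sketch of the exchange in R with absurdly small numbers (a real p is hundreds of digits):

mod_exp <- function(base, power, mod) {   # modular exponentiation by repeated squaring
  result <- 1
  base <- base %% mod
  while (power > 0) {
    if (power %% 2 == 1) result <- (result * base) %% mod
    base <- (base * base) %% mod
    power <- power %/% 2
  }
  result
}
p <- 23; g <- 5
a <- 6; b <- 15        # Alice's and Bob's secret exponents
A <- mod_exp(g, a, p)  # 8, sent to Bob
B <- mod_exp(g, b, p)  # 19, sent to Alice
mod_exp(B, a, p)       # Alice computes the shared key: 2
mod_exp(A, b, p)       # Bob computes the same shared key: 2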

How hard is DH function mod p?
- suppose Prime p is n bits long
- best known algo (GNFS): run time exp (O(n^1/3)), so exponential in cube root of n.
- to achieve same security as AES 256 bits, we need modulus size 15360 bits in DH
- but only 512 bits if we use Elliptic curves in place of mod p
- as a result there is slow transition away from (mod p) to elliptic curves

The way we have defined it so far it's insecure against MiTM
Public key encryption

Intro. to Number theory
Z_N = {0,1,2,...,N-1}, a ring where addition and multiplication mod N can be done.
x.(y+z) = x.y + x.z
For all ints x,y there exist ints a,b s.t. a.x + b.y = gcd(x,y), and a,b can be found efficiently using the extended Euclid algorithm. For e.g. 2.12 - 1.18 = 6 = gcd(12,18)

If gcd(x,y)=1 we say that x and y are relatively prime.

Modular inversion
Over the rationals, inverse of 2 is 1/2. 
Def: The inverse of x in Z_N is an element y in Z_N s.t. x.y = 1 in Z_N

Lemma: x in Z_N has an inverse iff gcd(x,N) = 1 so Z_N* = set of invertible elements in Z_N = all x s.t. gcd(x,N) = 1

Solving modular linear equations
Solve a.x + b= 0 in Z_N
=> a.x = -b
=> x = -b.a^-1
Find a^-1 in Z_N using extended Euclid. Run time: O(log^2 N)

Fermat's theorem
Let p be a prime. For all x in (Z_p)*: x^(p-1) = 1 in Z_p
Example p=5. 3^4 = 81 = 1 in Z_5

This gives us another way to compute inverses, but less efficient than Euclid
x e (Z_p)* => x.x^(p-2) = 1 => x^(-1) = x^(p-2) in Z_p

but it doesn't work for non primes.
Run time O(log^3 N)
So, less general and less efficient.

Application of Fermat's theorem - Generating random primes
Let's say we want to generate a large random prime
say, prime p of length 1024 bits (i.e. p ~ 2^1024)

Step 1: choose a random integer p e [2^1024, 2^1025 -1]
Step 2: test if 2^(p-1) = 1 in Z_p
If so, output p and stop. Else goto Step 1.

For 1024 bits prime Pr[p not prime] < 2^-60

We can also get False primes through this method.

Structure of (Z_p)*
It's a cyclic group, there exists g e (Z_p)* {1,g,g^2,...g^p-2} = (Z_p)*

g is called a generator of (Z_p)*
Not every element is a generator.

Lagrange theorem: ord_p(g) always divides p-1
ord_p(g) = |<g>| = the size of the subgroup generated by g

Euler's generalization of Fermat
phi(N) = |(Z_N)*|
phi(12) = |{1,5,7,11}| = 4
Phi(p) = p - 1 where p is prime.

If N = p.q where p,q are prime then phi(N) = N-p-q+1  = (p-1)(q-1)

Euler's theorem - For all x in (Z_N)* x^phi(N) = 1 in Z_N - basis of RSA

Example : 5^phi(12) = 5^4 = 625 = 1 in Z_12

Practice questions:
2^10001 % 11: since gcd(2,11) = 1 and 11 is prime, Fermat gives 2^(11-1) = 2^10 % 11 = 1 => 2^10001 % 11 = 2^1 % 11 = 2
2^245 % 35: gcd(2,35) = 1 but 35 is not prime, so use Euler's generalization. N = 35 = 7.5, so phi(N) = 35 - 7 - 5 + 1 = 24 => 2^24 % 35 = 1 => 2^245 % 35 = 2^5 = 32


Modular e'th root
When does the root exist?

e=2, square roots
x, -x => x^2
If p is an odd prime then gcd(2, p-1) != 1
In Z_11 * , (1)^2 = 1 (-1)^2 = 1 where -1 = 10 (since mod 11)
similarly 2 and 9 map to 4, 3,8 map to 9 and so on.

x in Z_p is a quadratic residue if it has a square root in Z_p.
p odd prime => the number of Q.R. (Quadratic Residue) in Z_p is (p-1)/2 + 1 , extra 1 is for 0.

Euler's theorem about when does a number have a Q.R.
This theorem is not constructive, i.e. it tells us about existence but not how to construct it.

Arithmetic algorithms
Addition, subtraction - linear in n (input size)
Division O(n^2)
Multiplication is naively O(n^2) if inputs are n-bits. Karatsuba's algorithm is O(n^1.585).
The best (asymptotic) algorithm is O(n.log n), but it's only practical on very large numbers.
So Karatsuba's is more practical, and most crypto libraries use it.

Modular exponentiation is O(n^3).

Some hard problems
Discrete log base 2 mod p for (1) (Z_p)* for large p, (2) Elliptic curve groups mod p
An application: collision resistance
If H(x,y) = g^x.h^y where g,h are generators of G, and G = (Z_p)* for large p, then finding collisions of H is as difficult as the DLog problem.

Now look at some difficult problems modulo composites (the above were modulo primes).






Wednesday, January 3, 2018

Making deleted files unrecoverable in Windows


2 ways:

For C: (similarly for other drives)
1. Download sdelete from here: https://docs.microsoft.com/en-us/sysinternals/downloads/sdelete then run sdelete -z C: (for a folder: sdelete folder/, for a file: sdelete file)
2. cipher /w:C

Blog Archive