Thursday, August 11, 2016


  1. Three ways to send: fire & forget; sync (send() returns a Future, block on future.get()); async with a callback
  2. Two kinds of errors: retriable (connection error, no leader) & non-retriable (message too large)
  3. RecordMetadata rec = producer.send(new ProducerRecord<>(topic, key, value)).get(); rec.offset() is the offset of the message
  4. ProducerRecord needs a topic name and a value (key optional). The producer config needs key.serializer, value.serializer and bootstrap.servers (list at least 2 brokers so that if one is down the other can take over; the rest will be discovered through them)


  1. One consumer group per application. If #consumers > #partitions, the extra consumers remain idle.
  2. One consumer per thread is the rule.
  3. How does consumer commit offset:
    1. auto commit every 5 seconds (the default interval), done during poll().
    2. or commit manually - consumer.commitSync()
    3. or commitAsync() with callback
    4. commitAsync() for regular commits, commitSync() in finally block
    5. commitSync and commitAsync can be called with explicit topic,partition,offset too.
  4. ConsumerRebalanceListener
    1. onPartitionsRevoked
    2. onPartitionsAssigned
  5. seekToBeginning(), seekToEnd(), seek() to a specific offset
  6. Consumer clean exit: call consumer.wakeup() from a shutdown hook; poll() then throws WakeupException; call consumer.close() on the way out.
  7. A single standalone consumer is also possible, as opposed to a consumer group.


  1. Custom serializers
  2. Avro: Schema can be changed without affecting existing messages

  1. Default: hash of the key
  2. Custom

Friday, July 22, 2016


Basic unit: Message (like a row in a table). A message can have an associated key (metadata).
Message schema: Apache Avro is the most popular. JSON/XML are also fine.
Topic (like a DB table) / Partition: one topic can have multiple partitions.
Consumer (Group): a consumer can select specific partition(s) to listen on.

1. There is no guarantee of time-ordering of messages across the entire topic, just within a single partition. 

2. Each partition can be hosted on a different server, which means that a single topic can be scaled horizontally across multiple servers for higher throughput than a single server can provide.

3. The term stream is often used when discussing data within systems like Kafka. Most often, a stream is considered to be a single topic of data, regardless of the number of partitions. This represents a single stream of data moving from the producers to the consumers. This way of referring to messages is most common when discussing stream processing, where frameworks such as Kafka Streams, Apache Samza, and Storm operate on the messages in real time.

Broker(a single server) - receives messages from producers and services consumers.
Broker cluster - one of the brokers acts as the controller.

Retention of data by time/size.

Topics may also be configured as log compacted, which means that Kafka will retain only the last message produced with a specific key. This can be useful for changelog-type data, where only the last update is interesting.

MirrorMaker - for data replication across clusters.

Thursday, June 23, 2016


docker logs

docker logs -f

docker run -d -p

docker run -it

docker attach

docker port webserver 80

docker diff webserver

docker cp

docker inspect webserver

docker rm -f $(docker ps -aq)

docker search postgres
docker pull postgres:latest

Thursday, June 16, 2016


always safe : git checkout -b new_branch

git reset : unstage all staged changes (git reset has no --cached option)
git rm --cached
git reset -- tracking.txt
git rm
git mv
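
A small sketch of unstaging versus untracking, assuming git is installed. The throwaway repo, the demo identity and the variable names are made up for the illustration; only tracking.txt comes from the notes above.

```shell
set -e
repo=$(mktemp -d)                       # throwaway repository for the demo
cd "$repo"
git init -q
git config user.name "Demo User"        # hypothetical identity, local to this repo
git config user.email "demo@example.com"

echo "v1" > tracking.txt
git add tracking.txt
git commit -q -m "add tracking.txt"

# Stage an edit, then unstage it. The edit itself stays in the working tree.
echo "v2" >> tracking.txt
git add tracking.txt
git reset -q -- tracking.txt
staged_after_reset=$(git diff --cached --name-only)   # empty: nothing is staged

# Untrack the file but keep it on disk.
git rm -q --cached tracking.txt
tracked_after_rm=$(git ls-files)                      # empty: no tracked files left
if [ -f tracking.txt ]; then file_kept=yes; else file_kept=no; fi

echo "staged after reset: [$staged_after_reset]"
echo "tracked after rm --cached: [$tracked_after_rm], file kept: $file_kept"
```

Plain git rm (without --cached) would also delete the file from the working tree.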

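git add -A / git add --all : stage new, modified and deleted files
git add -u / git add --update : stage modified and deleted tracked files only
git add -a : not valid (-a is an option of git commit, not of git add)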
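
To see what the two git add variants pick up, here is a minimal sketch in a throwaway repo. The file names and the identity are hypothetical.

```shell
set -e
repo=$(mktemp -d)                       # throwaway repository for the demo
cd "$repo"
git init -q
git config user.name "Demo User"        # hypothetical identity, local to this repo
git config user.email "demo@example.com"

echo "one" > tracked.txt
git add tracked.txt
git commit -q -m "initial"

echo "two" >> tracked.txt               # modify a tracked file
echo "new" > untracked.txt              # create a brand-new file

git add -u                              # --update: only already-tracked files
after_update=$(git diff --cached --name-only)

git add -A                              # --all: new files are staged as well
after_all=$(git diff --cached --name-only | sort | tr '\n' ' ')

echo "after -u: $after_update"
echo "after -A: $after_all"
```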

git init
git clone
git --version


git clone --bare
git log --abbrev-commit --abbrev=4 --pretty=oneline -10
git log --author="John Resig"
git log --pretty=oneline --since="2012-12-20" --until="2013-01-01" -5
git log --pretty=format:"%an --- %H"
Joe Doe --- 123456...

git log --pretty=oneline | wc -l
git shortlog -s | wc -l

git log --pretty=format:%cd --date=short | uniq | wc -l

du -h -s .git
du -h -s --exclude=.git

git alias

git stat
git info

git shortlog -n -s

    bare = true

git status -s -b

username different for every repo
global vs local config

git init

git reflog
git reset --hard HEAD@{1}
git checkout -b new_branch
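
A sketch of how the reflog undoes a hard reset, assuming a default non-bare repo (where reflogs are enabled). Repo, file and variable names are made up for the demo.

```shell
set -e
repo=$(mktemp -d)                       # throwaway repository for the demo
cd "$repo"
git init -q
git config user.name "Demo User"        # hypothetical identity, local to this repo
git config user.email "demo@example.com"

echo "a" > f.txt
git add f.txt
git commit -q -m "first"
echo "b" >> f.txt
git commit -q -am "second"
lost=$(git rev-parse HEAD)              # the commit we are about to "lose"

git reset -q --hard HEAD~1              # branch now points at "first"
after_reset=$(git rev-parse HEAD)

git reset -q --hard 'HEAD@{1}'          # where HEAD was one reflog entry ago
recovered=$(git rev-parse HEAD)

echo "lost:      $lost"
echo "recovered: $recovered"
```

The recovery only works while the reflog entry still exists, which is why git reflog expire followed by git prune makes it permanent.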

git reflog expire --all --expire=now
git prune --dry-run

2 status

git rm
git mv
git rm --cached
git log --oneline --graph --decorate --all
git show info^{tree}
git show SHA-1^{tree}
git branch -v -> latest revision in every branch
git pack-refs --all
git internal files
cat .git/refs/heads/new_one
cat .git/HEAD
cat .git/packed-refs

git branch -vv
git remote -v
ordinary local branches
local tracking
remote tracking
different pull push urls
git branch -a
git branch -r

create branch : git branch doc
switch branch : git checkout doc
show branches : git branch
git checkout -b new-branch existing-branch
git checkout -b new-branch HEAD
Show all branches : 
git branch -a -vv
git branch -r : remote
Ordinary local branches 
Local tracking branches 
Remote tracking branches 
git remote rm origin
list-remote-branches = "!listRemoteBranches() {
    git branch -r | sed \"/->/d; s/  origin\\///g\";
}; listRemoteBranches"
checkout-remote-branches = "!checkoutRemoteBranches() {
    for name in `git list-remote-branches`; do
        git checkout $name;
    done;
}; checkoutRemoteBranches"

clone-with-branches = "!cloneWithBranches() {
    git clone $1 $2;
    cd $2;
    git checkout-remote-branches;
    git remote rm origin
}; cloneWithBranches"
git remote -v
For untracked files : 
git clean -f
git clean -n
git pack-refs
git cherry-pick <rev> -> creates a new rev
git branch -d
git branch -D
git branch --merged
git branch --no-merged
branch rename
git branch -m info information
git branch -M info information
checkout file from a different revision/branch
git checkout info -- i1.txt
git show doc:m1.txt
git clone won't preserve reflog, cp will
git merge --ff-only feature : merge only if a fast-forward is possible
git log --oneline --merges : show only merge commits
git log --oneline --no-merges : show only non-merge commits
git log --oneline --max-parents=X --min-parents=Y
git merge --no-ff feature : force creation of merge commit even if fast forward is possible
git log --oneline --graph --all
git merge a b c d
git rebase
git rev-parse master
git checkout `git rev-parse master` : to enter detached HEAD
rebase alternative : git checkout `git rev-parse master`; git am *.patch; git checkout -B feature 
rebase can also be done with cherry-pick
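
The --ff-only and --no-ff notes above can be checked with a small sketch. The branch name ff-demo, the file names and the identity are hypothetical, and the initial branch is pinned to master only to keep the demo predictable.

```shell
set -e
repo=$(mktemp -d)                           # throwaway repository for the demo
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master     # pin the initial branch name
git config user.name "Demo User"            # hypothetical identity
git config user.email "demo@example.com"

echo "base" > base.txt
git add base.txt
git commit -q -m "base"

git checkout -q -b feature
echo "work" > feature.txt
git add feature.txt
git commit -q -m "feature work"

# Fast-forward: the branch pointer just moves, no merge commit appears.
git checkout -q -b ff-demo master
git merge -q --ff-only feature
ff_merges=$(git log --oneline --merges | wc -l | tr -d ' ')

# --no-ff: a merge commit with two parents is created anyway.
git checkout -q master
git merge -q --no-ff -m "merge feature" feature
noff_merges=$(git log --oneline --merges | wc -l | tr -d ' ')
noff_parents=$(git log -1 --pretty=%P | wc -w | tr -d ' ')

echo "merge commits after --ff-only: $ff_merges"
echo "merge commits after --no-ff:   $noff_merges (parents: $noff_parents)"
```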

Thursday, June 9, 2016


git log --graph --all --oneline --decorate


git merge --squash feature : stage the combined changes of all the pending commits of branch
<feature> in current branch; a separate git commit then records them as a single commit
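
A minimal sketch of a squash merge, to show that nothing is committed until git commit is run. Repo layout, file names and identity are made up for the demo.

```shell
set -e
repo=$(mktemp -d)                           # throwaway repository for the demo
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master     # pin the initial branch name
git config user.name "Demo User"            # hypothetical identity
git config user.email "demo@example.com"

echo "base" > base.txt
git add base.txt
git commit -q -m "base"

git checkout -q -b feature
echo "1" > one.txt && git add one.txt && git commit -q -m "feature 1"
echo "2" > two.txt && git add two.txt && git commit -q -m "feature 2"

git checkout -q master
git merge -q --squash feature               # stages one.txt and two.txt only
count_before=$(git rev-list --count HEAD)   # master still has just "base"

git commit -q -m "feature, squashed"
count_after=$(git rev-list --count HEAD)
parents=$(git log -1 --pretty=%P | wc -w | tr -d ' ')   # ordinary commit, not a merge

echo "commits before/after: $count_before/$count_after, parents: $parents"
```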

$ git checkout --ours [filename]
$ git checkout --theirs [filename]

git checkout --merge [filename]

git checkout --conflict=diff3 numbers.txt

git commit --no-edit

git merge --abort
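
A sketch of resolving a conflict by taking one side wholesale, assuming both branches changed the same line. Only numbers.txt comes from the notes; the rest is made up for the demo.

```shell
set -e
repo=$(mktemp -d)                           # throwaway repository for the demo
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master     # pin the initial branch name
git config user.name "Demo User"            # hypothetical identity
git config user.email "demo@example.com"

echo "1" > numbers.txt
git add numbers.txt
git commit -q -m "base"

git checkout -q -b feature
echo "from feature" > numbers.txt
git commit -q -am "feature change"

git checkout -q master
echo "from master" > numbers.txt
git commit -q -am "master change"

git merge feature >/dev/null 2>&1 || true   # expected to stop with a conflict
git checkout --theirs -- numbers.txt        # take the version of the branch being merged
git add numbers.txt
git commit -q --no-edit                     # keep the prepared merge message

result=$(cat numbers.txt)
parents=$(git log -1 --pretty=%P | wc -w | tr -d ' ')
echo "numbers.txt: $result (merge commit with $parents parents)"
```

With --ours instead, numbers.txt would keep the content from master.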

echo "numbers.txt binary" > .gitattributes

Wednesday, June 8, 2016


git log --format="%h" --grep=XXX --all

git log --oneline \
    master feature brave-idea \
    ^`git merge-base master feature` \
    ^`git merge-base feature brave-idea`

Partial rebase
git rebase --onto a b c
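
A sketch of the partial rebase: git rebase --onto a b c replays the commits in b..c on top of a. Here a, b, c are played by master, feature and brave-idea. File names, commit messages and identity are hypothetical.

```shell
set -e
repo=$(mktemp -d)                           # throwaway repository for the demo
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master     # pin the initial branch name
git config user.name "Demo User"            # hypothetical identity
git config user.email "demo@example.com"

echo "m" > m.txt && git add m.txt && git commit -q -m "M1"
git checkout -q -b feature
echo "f" > f.txt && git add f.txt && git commit -q -m "F1"
git checkout -q -b brave-idea
echo "b" > b.txt && git add b.txt && git commit -q -m "B1"

# Move only the commits that are on brave-idea but not on feature onto master.
git rebase -q --onto master feature brave-idea

subjects=$(git log --pretty=%s master..brave-idea)       # commits now on top of master
files=$(git ls-tree --name-only HEAD | tr '\n' ' ')      # f.txt is gone from this branch

echo "commits on top of master: $subjects"
echo "files on brave-idea: $files"
```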

git commit --amend

git rebase -i HEAD~3 : for squashing, reordering and removing commits
