Tuesday, May 23, 2017

Probabilistic algorithms

https://www.safaribooksonline.com/oriole/probabilistic-data-structures-in-python

Set Cardinality - HyperLogLog
Set Membership - With False positives but no False negatives - Bloom Filter
Set similarity (document similarity) - MinHash with Jaccard
Frequency Summaries (Leaderboards in games) - Count-min sketch
Streaming Quantiles - stream of data where you have to aggregate metrics without stopping - T-Digest (like min/max/percentile)



No comments:

Blog Archive