## Entropy as a measure of “surprise”

I recently read an article that shed light on how the mathematical definition of “entropy” in Information Theory (the study of quantifying information) came about. It was certainly more informative than plonking down the definition

$H := - \displaystyle\sum_{i} p_i \log_2 p_i$

and expecting students to take that as truth!

I remember the questions running through my head the first time I saw this formula. How in the world did they hit on this definition? Why this definition? Why the negative sign? Why the log? HUHHH??

Now I know better: the information carried by an event with probability $p_i$ can be thought of as its “surprise,” $-\log_2 p_i$ — rare events are more surprising than common ones — and entropy can be thought of as the “average surprise,” i.e. the surprise of each outcome weighted by how often it occurs.
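This “average surprise” reading can be made concrete in a few lines of code. Here is a minimal sketch (the function names are mine, not from the article):

```python
import math

def surprise(p: float) -> float:
    """Surprise (self-information) of an event with probability p, in bits."""
    return -math.log2(p)

def entropy(probs: list[float]) -> float:
    """Entropy: the probability-weighted average surprise, in bits."""
    return sum(p * surprise(p) for p in probs if p > 0)

# A fair coin: each outcome has surprise exactly 1 bit, so the average is 1 bit.
fair = entropy([0.5, 0.5])      # 1.0

# A heavily biased coin is less surprising on average: most of the time you
# see the expected outcome, so its entropy is well under 1 bit.
biased = entropy([0.9, 0.1])    # ≈ 0.47
```

Note the `if p > 0` guard: outcomes with zero probability contribute nothing, matching the convention $0 \log_2 0 = 0$.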