data compression and the entropy limit
Though this video by Computerphile may seem a little dated (think July of 2013), it rests entirely on an even older result, the source coding theorem proved by Claude Shannon in 1948 … as does just about everything else you see on a computer screen. In the video, Professor David Brailsford uses the transmission of San Francisco weather states to illustrate lossless compression. Because transmission is expensive, the aim is to represent the 4 distinct weather states with as few bits as possible. As there are 4 (equally probable) states, the minimum number of bits required to transmit the current state in this example is 2.
Brailsford explains that this is the entropy limit for lossless compression of the information transmitted. If the scenario involved fewer possibilities, perhaps in the desert where there are only 2 possible (equally probable) states, then only 1 bit would be required. Brailsford gives us a formula for determining the minimum bits needed to transmit a given piece of information: take the logarithm, base 2, of the number of equally probable states. When asked specifically about JPEG compression near the end of the 12-minute video, Brailsford answers, "In one respect or another, probability is at the root of all of this. That's particularly true if you want lossless compression. Because you run up against unstoppable entropy limits about how far you can compress and not lose anything in the compression."
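To make that 2-bit figure concrete, here is a minimal sketch in Python. The four state labels and their code assignments are my own illustrative assumptions, not taken from the video.

```python
# A minimal sketch: 4 equally probable weather states need 2 bits apiece.
# The state labels and code assignments below are illustrative assumptions.
states = ["sunny", "foggy", "cloudy", "rainy"]

# Give each state a fixed-length 2-bit code word: 00, 01, 10, 11.
codes = {state: format(i, "02b") for i, state in enumerate(states)}
print(codes)  # {'sunny': '00', 'foggy': '01', 'cloudy': '10', 'rainy': '11'}

# Transmitting a week of weather then costs exactly 2 bits per report.
week = ["foggy", "foggy", "sunny", "rainy", "cloudy", "sunny", "foggy"]
message = "".join(codes[day] for day in week)
print(message, "->", len(message), "bits for 7 reports")
```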
This kind of entropy limit, now commonly known as Shannon entropy, is described in Claude Shannon’s 1948 paper “A Mathematical Theory of Communication”. His source coding theorem proves that entropy places an absolute limit on how well data can be compressed without loss of information. In other words, optimal compression depends more on the statistics of the information being sent than on the hardware used for transmission. The formula Brailsford offers for compression here echoes a more general formula used to compute the amount of information generated by the reduction of n equally likely possibilities to 1. In his 1981 treatise, Knowledge and the Flow of Information, Fred Dretske writes I(s) “to denote the amount of information associated with, or generated by [a source]”, so that
I(s) = log n
where n is the number of possibilities, and the base depends on how we choose to represent those possibilities. In our first example above, we have 4 distinct states to communicate. If we use base 2, then I(s) = log₂ 4 = 2, so the minimum number of bits required is 2.
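Translated into code, Dretske’s formula is a one-liner. The sketch below is mine (the function name is an assumption, not Dretske’s notation), but it reproduces both examples: 4 states needing 2 bits, and the 2-state desert needing 1.

```python
import math

def information_generated(n: int, base: float = 2) -> float:
    """Dretske's I(s) = log n: the information generated when n equally
    likely possibilities are reduced to one, in units set by the base."""
    return math.log(n, base)

print(information_generated(4))  # 2.0 bits -- the San Francisco example
print(information_generated(2))  # 1.0 bit  -- the two-state desert example
```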
When Brailsford tells us that probability is at the root of all of this, he’s referring to the probability of a particular event’s occurrence. By way of illustration, Dretske uses the monsoon season, where (similar to Brailsford’s desert example) there are only two distinct possible states. During monsoon season the probability of rain is 0.9; the rest of the year there’s a 50% chance of rain. Dretske maintains that forecasts during monsoon season convey less information, because rain was already highly likely.
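One way to see Dretske’s point is to compute the surprisal of a single outcome, -log₂ of its probability, which reduces to I(s) = log n when all n outcomes are equally likely. A quick sketch using the probabilities above (the helper name is mine):

```python
import math

def surprisal(p: float) -> float:
    """Information conveyed by an outcome of probability p, in bits."""
    return -math.log2(p)

print(surprisal(0.9))  # ~0.15 bits -- "rain" during monsoon season: hardly news
print(surprisal(0.5))  # 1.0 bit    -- "rain" the rest of the year
```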
Once the transmitted states are no longer equally probable, then, as Dretske tells us, we must use a weighted sum to determine the “average amount of information generated by that source”, also known as the entropy of that source. When probability enters the picture, our original formula becomes a little more complex. This does little to change the fact that a 75-year-old formula is still the beating heart behind every computer screen you see…
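For completeness, the sum Dretske refers to is the Shannon entropy of the source, H = -Σ p(sᵢ) log₂ p(sᵢ): the surprisal of each state weighted by how often it occurs. A last sketch (the helper name is mine) ties the examples together:

```python
import math

def entropy(probabilities) -> float:
    """Average information per state (Shannon entropy), in bits."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

print(entropy([0.25, 0.25, 0.25, 0.25]))  # 2.0 bits   -- four equal weather states
print(entropy([0.5, 0.5]))                # 1.0 bit    -- desert / off-season rain
print(entropy([0.9, 0.1]))                # ~0.47 bits -- monsoon season says less
```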