Calling this an 'alternative' construction seems like coming full circle, since this line of combinatorial argument is how Boltzmann came up with his H-function in the first place, which in turn inspired Shannon's entropy.
rkp8000 162 days ago
Yep! This relationship is well known in statistical mechanics. I was just surprised that in many years of intersecting with information theory in other fields (computational neuroscience in particular) I'd never come across it before, even though IMO it provides an insightful perspective.
The key (which is not in the OP) is not the construction of E log(p), but being able to prove that the “typical set” exists (with arbitrarily high probability) and that the entropy is its size.
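For reference, that claim is the asymptotic equipartition property (the statement below follows the usual textbook form, e.g. Cover & Thomas, in my own notation rather than the OP's): for X_1, ..., X_L drawn i.i.d. from p(x),

    % the per-symbol log-probability concentrates on the entropy
    -\tfrac{1}{L}\log p(X_1,\dots,X_L) \;\longrightarrow\; H(X) \quad \text{in probability},

    % so the typical set
    A_\epsilon^{(L)} = \left\{ (x_1,\dots,x_L) : \left| -\tfrac{1}{L}\log p(x_1,\dots,x_L) - H(X) \right| \le \epsilon \right\}

    % carries probability at least 1-\epsilon for large enough L, and its size satisfies
    (1-\epsilon)\,2^{L(H(X)-\epsilon)} \;\le\; \left|A_\epsilon^{(L)}\right| \;\le\; 2^{L(H(X)+\epsilon)}.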
xigoi 163 days ago
The site is unreadable on mobile because it disables overflow on the equations (which it shows as images, even though it’s 2024 and all modern browsers support MathML).
masfuerte 163 days ago
Today I learned! I'd missed the news that MathML was back in Chrome. I've been publishing web pages using MathML for years, along with a note that the equations don't render in Chrome. I can finally remove the note.
ivan_ah 163 days ago
Nice!
The key step of the derivation is counting the "number of ways" to get the histogram with bar heights L1, L2, ... Ln for a total of L observations.
I had to think a bit about why the provided formula is true:
The story I came up with for the first term is that in a sequence of length L, you need to choose L1 locations that will get the symbol x1, so there are choose(L, L1) ways to do that. Next you have L - L1 remaining spots to fill, and L2 of those need to get the symbol x2, hence the choose(L-L1, L2) term, and so on.
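Written out in full (notation mine, not the OP's), the intermediate factorials in that product telescope, which is why the count collapses to the usual multinomial coefficient:

    \binom{L}{L_1}\binom{L-L_1}{L_2}\cdots\binom{L-L_1-\cdots-L_{n-1}}{L_n}
      = \frac{L!}{L_1!\,(L-L_1)!}\cdot\frac{(L-L_1)!}{L_2!\,(L-L_1-L_2)!}\cdots
      = \frac{L!}{L_1!\,L_2!\cdots L_n!}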
Maro 162 days ago
In Physics, the log part comes in when you use the Stirling approximation for large N.
https://web.stanford.edu/class/ee376a/files/2017-18/lecture_...
Ideal gas: https://bytepawn.com/entropy-of-an-ideal-gas-with-coarse-gra...
Physical gas: https://bytepawn.com/the-physical-sackur-tetrode-entropy-of-...
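For completeness, here is a sketch of that Stirling step (my notation; the standard argument rather than a quote from the posts above): with counts L_i = p_i L and the multinomial count W = L!/(L_1! ... L_n!), applying log L! ≈ L log L - L gives

    \log W = \log L! - \sum_{i=1}^{n} \log L_i!
           \approx (L\log L - L) - \sum_{i=1}^{n} (L_i\log L_i - L_i)
           = L\log L - \sum_{i=1}^{n} L_i\log L_i   % since \sum_i L_i = L
           = -\sum_{i=1}^{n} L_i \log\frac{L_i}{L}
           = -L\sum_{i=1}^{n} p_i\log p_i = L\,H(p),

so (1/L) log W approaches H(p) for large L, which is where the log in the entropy formula comes from.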