Welcome to dbForumz.com!

Entropy and Quantity of Information

 
David Cressey

External


Since: Mar 19, 2007
Posts: 550



(Msg. 1) Posted: Fri Jan 11, 2008 8:02 pm
Post subject: Entropy and Quantity of Information
Archived from groups: comp.databases.theory

Instead of hijacking another topic, I'll start this topic.

David BL suggested that a way to quantify information is the number of bits
needed to encode it, allowing for compression. I said I preferred entropy as
the measure of information, and suggested that the two measures might in
some way be equivalent. Someone else recalled the concept of entropy from
the study of statistical mechanics and thermodynamics in physics.

The use of the word "entropy" definitely comes from physics. The formulas
that Shannon (qv) determined for the amount of information in a message bear
an uncanny resemblance to the formulas for entropy in physics. I personally
don't believe that this is mere coincidence.

Next, I talked a little about entropy being context sensitive. There is a
much more precise way than mine of stating this, but I've forgotten what it
is. I think it goes something like this: the self-information of a
message is context insensitive, but the mutual information between a pair of
messages depends on both of them. My use of the word "database" in
place of "message" in my earlier comment is a little sloppy in this regard.
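
To make that concrete, here is a small Python sketch (an illustration of my
own, with a made-up joint distribution, not anything from Shannon): the
self-information of an outcome depends only on that outcome's probability,
while the mutual information has to be computed from the joint distribution
of the two messages.

import math

# Assumed joint distribution over two binary "messages" X and Y.
# The numbers are invented purely for illustration.
joint = {
    (0, 0): 0.4, (0, 1): 0.1,
    (1, 0): 0.1, (1, 1): 0.4,
}

def marginal(joint, axis):
    """Marginal distribution of X (axis=0) or Y (axis=1)."""
    m = {}
    for xy, p in joint.items():
        m[xy[axis]] = m.get(xy[axis], 0.0) + p
    return m

px = marginal(joint, 0)
py = marginal(joint, 1)

# Self-information of a single outcome: depends only on its own probability.
self_info_x0 = -math.log2(px[0])

# Mutual information: depends on the joint behaviour of both messages.
mutual_info = sum(
    p * math.log2(p / (px[x] * py[y]))
    for (x, y), p in joint.items() if p > 0
)

print("I(x=0) =", self_info_x0, "bits")            # 1.0
print("I(X;Y) =", round(mutual_info, 4), "bits")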

Next, the use of logarithms and the number of bits needed to store
(represent?) a message are closely related. If we have 8 equally likely
messages, it will take precisely 3 bits to store one message. Note that
log base 2 of 8 is 3. If you use the formulas for entropy, and use 2 as
the base for the logarithms, the unit of information comes out to be
precisely the bit.
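
For the curious, that arithmetic is easy to check in a few lines of Python;
this just evaluates Shannon's formula with base-2 logarithms, and the skewed
distribution at the end is made up to show the contrast.

import math

def entropy_bits(probs):
    """Shannon entropy H = -sum(p * log2(p)), in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Eight equally likely messages: H = log2(8) = 3 bits.
uniform8 = [1.0 / 8] * 8
print(entropy_bits(uniform8))            # 3.0

# A skewed ensemble (probabilities invented for illustration)
# carries fewer bits per message on average.
skewed = [0.5, 0.2, 0.1, 0.05, 0.05, 0.04, 0.03, 0.03]
print(round(entropy_bits(skewed), 3))    # about 2.22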

There is a form of compression called Huffman encoding that takes advantage
of mathematical expectation to compress an ensemble of messages, on the
average, when the eight possible messages are not a priori equally likely.
If, for example, message number 1 is 50% likely, you can use a code
consisting of 1 bit to represent it, and use somewhat longer codes for the
other 7 possible messages. I don't want to belabor the point in c.d.t., but
there are ample references for Huffman encoding on the web. If you study
Huffman coding in detail, you'll start to get logarithms into the picture.
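
A minimal sketch of the idea in Python, assuming the probabilities I made up
below (the references on the web do this properly): build the Huffman tree by
repeatedly merging the two least likely groups, then compare the expected
code length with the entropy of the ensemble.

import heapq
import math

def huffman_code_lengths(probs):
    """Code length assigned to each message by Huffman coding.
    probs: dict mapping message -> probability."""
    # Heap entries: (probability, tiebreak, messages in this subtree).
    heap = [(p, i, [m]) for i, (m, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    lengths = {m: 0 for m in probs}
    while len(heap) > 1:
        p1, _, ms1 = heapq.heappop(heap)
        p2, i, ms2 = heapq.heappop(heap)
        # Every message under the merged node gets one more bit.
        for m in ms1 + ms2:
            lengths[m] += 1
        heapq.heappush(heap, (p1 + p2, i, ms1 + ms2))
    return lengths

# Eight messages, message 1 being 50% likely (probabilities invented).
probs = {1: 0.50, 2: 0.15, 3: 0.10, 4: 0.08,
         5: 0.07, 6: 0.05, 7: 0.03, 8: 0.02}
lengths = huffman_code_lengths(probs)
avg_bits = sum(probs[m] * lengths[m] for m in probs)
entropy = -sum(p * math.log2(p) for p in probs.values())
print(lengths[1])                        # 1 -- the 50% message gets a 1-bit code
print(round(avg_bits, 2), ">=", round(entropy, 2))

The expected code length comes out slightly above the entropy, which is the
general pattern: entropy is the lower bound that Huffman coding approaches.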

All of this goes back to the 1960s, and some of it to the 1940s. Is entropy
still widely used in information science? Is it relevant to database
theory?

Tegiri Nenashi

External


Since: Oct 31, 2007
Posts: 57



(Msg. 2) Posted: Fri Jan 11, 2008 8:02 pm
Post subject: Re: Entropy and Quantity of Information
Archived from groups: per prev. post

On Jan 11, 10:41 am, "David Cressey" wrote:
>  Is entropy
> still widely used in information science?  Is it relevant to database
> theory?

http://www.sigmod.org/pods/proc00/papers/dalkilic.pdf

Joe Thurbon

External


Since: Jan 19, 2007
Posts: 48



(Msg. 3) Posted: Sat Jan 12, 2008 2:01 am
Post subject: Re: Entropy and Quantity of Information
Archived from groups: per prev. post

David Cressey wrote:
> Instead of hijacking another topic, I'll start this topic.
>
> David BL suggested that a way to quantify information is the number of bits
> needed to encode, allowing for compression.

I think the term being looked for is

http://en.wikipedia.org/wiki/Kolmogorov_complexity

There's been a _lot_ of work done in this area.

> I said I preferred entropy as
> the measure of information, and suggested that the two measures might in
> some way be equivalent. Someone else recalled the concept of entropy from
> the study of statistical mechanics and thermodynamics in physics.

It also gets a good workout in machine learning. Many of the classical
algorithms (decision trees in their various forms) have
information gain (which is defined in terms of entropy) at their heart.
(The precise definition is closely related to the number of bits
required for representation.)
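
As a rough illustration (a toy Python sketch with invented data, not taken
from any particular library): information gain is the entropy of the class
labels minus the expected entropy left after splitting on an attribute, which
is what ID3-style decision tree learners maximise at each node.

import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attribute):
    """Entropy of the labels minus the expected entropy after
    splitting the rows on the given attribute (ID3-style)."""
    n = len(labels)
    split = {}
    for row, label in zip(rows, labels):
        split.setdefault(row[attribute], []).append(label)
    remainder = sum(len(part) / n * entropy(part) for part in split.values())
    return entropy(labels) - remainder

# Toy data, invented for the example.
rows = [
    {"outlook": "sunny"}, {"outlook": "sunny"},
    {"outlook": "rain"},  {"outlook": "rain"},
    {"outlook": "overcast"}, {"outlook": "overcast"},
]
labels = ["no", "no", "yes", "no", "yes", "yes"]
print(round(information_gain(rows, labels, "outlook"), 3))   # 0.667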

I vaguely recall that some of the more theoretical machine learning
results (like learnability and optimality results) rely on a notion of entropy.

[... snip interesting perspective ...]

> All of this goes back to the 1960s, and some of it to the 1940s. Is entropy
> still widely used in information science?

Yes.

> Is it relevant to database theory?

Good question.

Cheers,
Joe
David Cressey

External


Since: Mar 19, 2007
Posts: 550



(Msg. 4) Posted: Sat Jan 12, 2008 11:08 am
Post subject: Re: Entropy and Quantity of Information
Archived from groups: per prev. post

"Joe Thurbon" wrote in message

[snip good stuff]

> I vaguely recall that some of the more theoretical machine learning
> results (like learnability and optimality results) rely on a notion of
> entropy.


I once wrote a little program to play Mastermind (a code guessing game)
using entropy to measure a move's worth. Mastermind can be played using a
brute force tree search. But my program tried out only about 10 moves, and
picked the one with the best entropy. Its play was only slightly inferior
to the play of a brute force tree searcher.
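
I don't have that program handy any more, but the idea can be sketched in a
few lines of Python (the trial guesses and the scoring below are
illustrative, not my original code): score each candidate guess by the
entropy of the feedback distribution it induces over the codes still
consistent with play, and make the highest-entropy guess.

import math
from collections import Counter
from itertools import product

COLORS = range(6)   # six peg colours, four positions (classic Mastermind)

def feedback(guess, code):
    """Return (black, white): exact matches and colour-only matches."""
    black = sum(g == c for g, c in zip(guess, code))
    common = sum(min(guess.count(col), code.count(col)) for col in COLORS)
    return black, common - black

def entropy_score(guess, candidates):
    """Entropy (bits) of the feedback distribution this guess induces
    over the codes that are still possible."""
    counts = Counter(feedback(guess, code) for code in candidates)
    n = len(candidates)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

# All 6^4 codes are possible before the first guess.
candidates = list(product(COLORS, repeat=4))

# Score a handful of trial guesses and play the most informative one.
trial_guesses = [(0, 0, 1, 1), (0, 1, 2, 3), (0, 0, 0, 1), (0, 1, 1, 2)]
best = max(trial_guesses, key=lambda g: entropy_score(g, candidates))
print(best, round(entropy_score(best, candidates), 3))

After each real guess you would filter the candidate list down to the codes
consistent with the feedback received, and score the next round against that
smaller set.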