3. Bayes' Theorem for Structure Predictions

If we assume conditional independence, so that the different types of data are statistically uncorrelated given the hypothesis, we need only determine Pr(Di|Hj) for each individual type of data (rather than the joint probability distribution for all data simultaneously). This assumption, the basis of the 'naive Bayes' approach, can be a useful approximation even when some of the variables are in fact correlated. In general, it tends to exaggerate the relative support for the best solutions rather than changing the rank order, and it is widely used in pattern classification.
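The exaggeration effect can be illustrated with a small sketch. It is not taken from the original method: the function name naive_bayes_posterior, the two-hypothesis setup, and the numbers are assumptions chosen for illustration. The extreme case of correlation is modeled by counting the same piece of data twice as if the two copies were independent.

```python
from math import prod

def naive_bayes_posterior(likelihoods, prior):
    """Posterior over hypotheses, treating each data type as independent.

    likelihoods[k][j] stands for Pr(D_k | H_j); prior[j] stands for Pr(H_j).
    Both layouts are illustrative assumptions.
    """
    # Under conditional independence the joint likelihood is a simple product.
    joint = [
        prior[j] * prod(lik[j] for lik in likelihoods)
        for j in range(len(prior))
    ]
    total = sum(joint)  # normalizing constant (the marginalization step)
    return [x / total for x in joint]

prior = [0.5, 0.5]   # uniform prior over two hypothetical hypotheses
lik = [0.6, 0.4]     # made-up values of Pr(D | H_1) and Pr(D | H_2)

once = naive_bayes_posterior([lik], prior)        # data counted once
twice = naive_bayes_posterior([lik, lik], prior)  # same data counted twice

print(once)
print(twice)
```

In this toy case the support for the better hypothesis rises from 0.6 to about 0.69 when the duplicated data are wrongly treated as independent, while the rank order of the two hypotheses stays the same.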

We can thus apply Bayes' Theorem with marginalization to each of the types of data in turn (the results do not depend on the order in which the types of data are considered):

[Equation: Bayes' Theorem for Structure Predictions]

... where Pr(Hj|D0) = 1/imax, with imax the number of hypotheses, since we are starting with a uniform prior probability distribution. More details on the above mathematics can be found, for example, in E. T. Jaynes' book Probability Theory as Extended Logic, available at http://omega.albany.edu:8008/JaynesBook.html.
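The sequential update described above can be sketched in a few lines. This is a minimal illustration rather than the original implementation: each step multiplies the previous posterior by the likelihood of the next type of data and renormalizes over all hypotheses. The names update, sequential_posterior, and n_hypotheses (playing the role of imax) and the likelihood values are assumptions made for the example.

```python
from itertools import permutations

def update(previous, likelihood):
    """One Bayes update: previous[j] is the posterior so far for H_j,
    likelihood[j] stands for Pr(D_k | H_j) for the next type of data."""
    unnorm = [l * p for l, p in zip(likelihood, previous)]
    # Marginalization over hypotheses; assumes at least one nonzero term.
    total = sum(unnorm)
    return [u / total for u in unnorm]

def sequential_posterior(likelihoods, n_hypotheses):
    """Start from a uniform prior and fold in each type of data in turn."""
    posterior = [1.0 / n_hypotheses] * n_hypotheses
    for likelihood in likelihoods:
        posterior = update(posterior, likelihood)
    return posterior

# Made-up likelihoods for three types of data and three hypotheses.
likelihoods = [
    [0.7, 0.2, 0.1],
    [0.3, 0.4, 0.3],
    [0.5, 0.25, 0.25],
]

# Apply the data in every possible order and collect the posteriors.
results = [
    sequential_posterior(list(order), 3)
    for order in permutations(likelihoods)
]

for posterior in results:
    print([round(p, 6) for p in posterior])
```

Every ordering prints the same posterior (up to floating-point rounding), which is the order independence noted in the text: the unnormalized result is just the product of the likelihoods and the uniform prior.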