For structural comparisons, the researcher typically seeks to choose among a finite number N of hypotheses Hj about the structure, given an alignment containing n aligned sequences Sk (we use this notation throughout the paper, and use 'structure' interchangeably with 'structural hypothesis'). Each structural hypothesis consists of a list of the positions of bases that must be paired; all other bases are unpaired. The structures must be ranked according to their posterior probabilities (the probability that each is the true structure) once all the data are taken into account. This captures the common situation in which the researcher has folded each of a set of closely related sequences individually, perhaps obtaining a few structures of similar energy for each. When the structures conflict, it is hard to predict objectively which structure is most plausible for all the sequences. Worse yet, the true structure (as revealed by chemical or physical techniques) is often not the least-energy structure for any of the sequences. Since there is little basis for choosing among structures at this early point, we set the prior probability Pr(Hj) of each of the N structural hypotheses Hj to 1/N.
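The hypothesis representation and uniform prior described above can be illustrated with a minimal Python sketch. It is not the BayesFold implementation: the function names `parse_pairs` and `uniform_priors` are hypothetical, and the sketch assumes that structures are supplied in dot-bracket notation, that all structures are written in the same alignment coordinates, and that identical structures returned for different sequences count as a single hypothesis.

```python
from fractions import Fraction


def parse_pairs(dot_bracket):
    """Return the paired positions of a pseudoknot-free dot-bracket string.

    Each pair is a tuple (i, j) of 0-based positions with i < j. Every
    position that does not appear in a pair is treated as unpaired.
    """
    stack = []
    pairs = set()
    for pos, symbol in enumerate(dot_bracket):
        if symbol == "(":
            stack.append(pos)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            pairs.add((stack.pop(), pos))
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {pos}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return frozenset(pairs)


def uniform_priors(structures):
    """Map each distinct structural hypothesis Hj to the prior Pr(Hj) = 1/N.

    Assumption of this sketch: duplicate structures are merged, so N is
    the number of distinct pairing lists.
    """
    hypotheses = []
    seen = set()
    for structure in structures:
        pairs = parse_pairs(structure)
        if pairs not in seen:
            seen.add(pairs)
            hypotheses.append(pairs)
    if not hypotheses:
        raise ValueError("at least one structural hypothesis is required")
    prior = Fraction(1, len(hypotheses))
    return {hypothesis: prior for hypothesis in hypotheses}
```

Exact fractions are used here only so that the priors sum to exactly 1; floating-point values would serve equally well in practice.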
Thus, for a predefined list of structures obtained by any method (automatic or manual), BayesFold assigns each structure a probability by successively taking into account each of the types of data. For this version of BayesFold, we generate the structures Hj by computing suboptimal folds of each sequence using version 1.4 of the Vienna RNA folding package. We use an energy window of 2 kcal/mol and take a maximum of ten suboptimal structures for each sequence to reduce computing time; however, these parameters are adjustable. BayesFold assumes that all the sequences fold into an identical active structure; it therefore works best on relatively short sequences that are at least 90% identical (such as the 'sequence families' routinely isolated from SELEX). However, it may also be useful for folding sequences from closely related organisms. The current version of BayesFold does not allow pseudoknots, but we believe it will be possible to address this issue in future versions.
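The candidate selection described above (an energy window of 2 kcal/mol and at most ten suboptimal structures per sequence) can be sketched as follows. This is an illustration under stated assumptions rather than the BayesFold code: the functions `select_suboptimal` and `pool_hypotheses` are hypothetical, the (structure, energy) pairs are assumed to have been produced beforehand by the folding software, the window is assumed to be inclusive and measured from the lowest-energy structure of each sequence, and all structures are assumed to be written in alignment coordinates so that they can be pooled across sequences.

```python
def select_suboptimal(candidates, energy_window=2.0, max_structures=10):
    """Keep the lowest-energy structures for one sequence.

    candidates: list of (structure, energy) tuples, energy in kcal/mol.
    Returns at most max_structures tuples whose energy lies within
    energy_window of the minimum, sorted from lowest to highest energy.
    """
    if not candidates:
        return []
    ranked = sorted(candidates, key=lambda item: item[1])
    best_energy = ranked[0][1]
    within_window = [
        item for item in ranked if item[1] - best_energy <= energy_window
    ]
    return within_window[:max_structures]


def pool_hypotheses(per_sequence_candidates, energy_window=2.0, max_structures=10):
    """Pool the selected structures from every sequence into one list.

    Assumption of this sketch: a structure found for several sequences
    is listed once, in order of first appearance.
    """
    pooled = []
    for candidates in per_sequence_candidates:
        selected = select_suboptimal(candidates, energy_window, max_structures)
        for structure, _energy in selected:
            if structure not in pooled:
                pooled.append(structure)
    return pooled
```

Both parameters are exposed as keyword arguments to mirror the statement that they are adjustable; widening the window or raising the cap enlarges the hypothesis list at the cost of computing time.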
The following discussion assumes that all of the sequences fold into precisely the same structure (i.e. the positions of the paired and unpaired bases are identical in every sequence). If the sequences do not fold into a common structure, the posterior probabilities are unreliable. However, the results often indicate which sequences cannot share the best overall fold, making it easy to re-fold without these outliers. While only hypotheses about the whole structure can be tested at present, we plan to add the ability to assess local structures (e.g. 'active sites') later. In this version of BayesFold, better results can be obtained by entering a few sequences that definitely fold into the same overall structure than by entering many sequences that might actually have different structures.