2. Understanding Consensus Sequences

BayesFold augments the entered sequences with two consensus sequences.

  1. IUPAC consensus: composed of degenerate nucleotide symbols defined by the International Union of Pure and Applied Chemistry (IUPAC). At each position in the alignment, the symbol represents the set of bases present at that position. These symbols are:

    • R -> A or G (puRine)

    • Y -> C or T (pYrimidine)

    • M -> A or C (aMino)

    • K -> G or T (Keto)

    • S -> G or C (Strong interaction--3 H bonds)

    • W -> A or T (Weak interaction--2 H bonds)

    • H -> A, C, or T

    • B -> G, T, or C

    • V -> G, C, or A

    • D -> G, A, or T

    • N -> G, A, T, or C (aNy)

    For more information, see "Nomenclature for Incompletely Specified Bases in Nucleic Acid Sequences" at http://www.chem.qmul.ac.uk/iubmb/misc/naseq.html.

  2. Majority consensus: composed of the most frequent nucleotide at each position in the alignment. When two or more nucleotides have equal frequency at a given position, one is chosen arbitrarily for inclusion in the majority consensus sequence.
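
The two consensus rules above can be sketched in Python as follows. This is a minimal illustration, not BayesFold's own code: the function names are hypothetical, and it assumes an ungapped alignment of equal-length sequences containing only A, C, G, and T. How BayesFold treats gaps, U, or lowercase input is not described here.

```python
from collections import Counter

# IUPAC symbols and the bases each one stands for (from the list above).
IUPAC_CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT", "S": "CG", "W": "AT",
    "H": "ACT", "B": "CGT", "V": "ACG", "D": "AGT", "N": "ACGT",
}

# Inverted lookup: set of bases seen in a column -> IUPAC symbol.
SYMBOL_FOR = {frozenset(bases): symbol for symbol, bases in IUPAC_CODES.items()}


def columns(alignment):
    """Yield the alignment column by column (assumes equal-length rows)."""
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("all sequences must be non-empty and of equal length")
    return zip(*(seq.upper() for seq in alignment))


def iupac_consensus(alignment):
    """One IUPAC symbol per column, covering every base seen there.

    Raises KeyError if a column contains a character other than A, C, G, T.
    """
    return "".join(SYMBOL_FOR[frozenset(col)] for col in columns(alignment))


def majority_consensus(alignment):
    """Most frequent base per column.

    Ties are broken by whichever tied base appears first in the column;
    that is an artifact of this sketch, standing in for the arbitrary
    choice described above.
    """
    return "".join(Counter(col).most_common(1)[0][0] for col in columns(alignment))


alignment = ["ACGT", "ATGT", "GCGA"]
print(iupac_consensus(alignment))     # RYGW
print(majority_consensus(alignment))  # ACGT
```

In this example the first column contains A, A, and G, so the IUPAC consensus reports R (A or G) while the majority consensus reports A.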

For each consensus sequence, the value in an evidence column is the average of the corresponding values for all non-consensus sequences. The one exception is the best index: for each evidence, it identifies the structure with the highest average score in that evidence across all sequences. See Section 5, “Understanding the Evidence Columns”, for a more detailed description of the evidence columns.
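
The averaging rule and the best-index rule can be illustrated with a short Python sketch. The data layout is an assumption made for illustration only: `scores[i][j]` is taken to be the score of structure `j` for sequence `i` under a single evidence, the function names are hypothetical, and structure indices are 0-based here, which may not match the numbering BayesFold displays.

```python
from statistics import fmean


def consensus_value(per_sequence_values):
    """Average one evidence column over the non-consensus sequences."""
    return fmean(per_sequence_values)


def best_index(scores):
    """Index of the structure with the highest average score.

    scores[i][j] is the (assumed) score of structure j for sequence i.
    The average is taken over sequences, separately for each structure.
    """
    n_structures = len(scores[0])
    averages = [fmean([row[j] for row in scores]) for j in range(n_structures)]
    return max(range(n_structures), key=lambda j: averages[j])


# Three sequences, three candidate structures, one evidence.
scores = [
    [0.2, 0.9, 0.4],
    [0.6, 0.5, 0.7],
    [0.1, 0.8, 0.3],
]
print(best_index(scores))  # 1
```

Structure 1 is not the top scorer for the second sequence, but it has the highest average across all three sequences, so it is the one the best index reports.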