**Supporting Online Tables for "Chromatin-associated
periodicity in genetic variation downstream of transcriptional start sites."**

*Substitution
Models.*

For the **bidirectional and transcribed-strand unidirectional substitution models**, we counted the number of occurrences of each individual substitution in the alignments between the Hd-rR and HNI genomes and assigned the number to sub(

*Bidirectional substitution model (|LXR| = 3, |L| = |R| = 1)*

*Bidirectional substitution model (|LXR| = 5, |L| = |R| = 2)*

*Transcribed-strand unidirectional substitution model (|LXR| = 3, |L| = |R| = 1)*

*Transcribed-strand unidirectional substitution model (|LXR| = 5, |L| = |R| = 2)*
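The counting step can be sketched in Python. This is a minimal sketch under assumptions that are not stated in the text: each alignment is given as two equal-length rows with `-` for gaps, windows of length |*LXR*| that contain a gap or a non-ACGT symbol are skipped, the first row is treated as the source of each substitution, and the two strands are pooled by also counting every window pair as its reverse-complement pair. The names `revcomp` and `count_kmer_pairs` are hypothetical.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")
VALID = set("ACGT")


def revcomp(seq: str) -> str:
    """Reverse complement of an A/C/G/T string."""
    return seq.translate(COMPLEMENT)[::-1]


def count_kmer_pairs(src: str, dst: str, k: int, both_strands: bool = True) -> Counter:
    """Count aligned (source k-mer, target k-mer) pairs in gap-free windows.

    src and dst are the two rows of one pairwise alignment ('-' = gap).
    With both_strands=True each window is also counted as its
    reverse-complement pair, pooling the two strands.
    """
    if len(src) != len(dst):
        raise ValueError("alignment rows must have equal length")
    src, dst = src.upper(), dst.upper()
    counts: Counter = Counter()
    for i in range(len(src) - k + 1):
        a, b = src[i:i + k], dst[i:i + k]
        # Skip windows touched by a gap or an ambiguous base.
        if not (set(a) <= VALID and set(b) <= VALID):
            continue
        counts[(a, b)] += 1
        if both_strands:
            # The same aligned window read on the opposite strand.
            counts[(revcomp(a), revcomp(b))] += 1
    return counts
```

Setting `both_strands=False` keeps a single strand, as one might do for the transcribed-strand unidirectional model if the rows have first been oriented along the transcribed strand (again an assumption). Because |*LXR*| is odd in both models, no k-mer equals its own reverse complement (the central base would have to be its own complement), so the pooling never folds a window onto itself.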

To estimate the substitution rate at the position of *X*, denoted subRate(*LXR*), the above sum is divided by N(*LXR*), the number of occurrences of *LXR* and its reverse complement $\overline{R}\,\overline{X}\,\overline{L}$ (or, the number of occurrences of

$$\mathrm{subRate}(LXR) = \frac{\sum_{Y \in \{A,C,G,T\},\ Y \neq X} \mathrm{sub}(LXR,\, LYR)}{N(LXR)}.$$
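As a worked example of this formula, the sketch below computes subRate(*LXR*) from substitution counts. It assumes, hypothetically, that the counts sit in a dictionary keyed by (*LXR*, *M*) pairs and that N(*LXR*) is supplied by the caller; because |*L*| = |*R*|, *X* is taken to be the central base. The function name `sub_rate` is illustrative only.

```python
from typing import Mapping, Tuple

NUCLEOTIDES = "ACGT"


def sub_rate(lxr: str, sub: Mapping[Tuple[str, str], int], n_lxr: int) -> float:
    """subRate(LXR): sum of sub(LXR, LYR) over Y != X, divided by N(LXR).

    X is the central base of lxr because |L| = |R|.
    """
    if len(lxr) % 2 == 0:
        raise ValueError("|LXR| must be odd when |L| = |R|")
    if n_lxr <= 0:
        raise ValueError("N(LXR) must be positive")
    c = len(lxr) // 2
    left, x, right = lxr[:c], lxr[c], lxr[c + 1:]
    # Only changes at the central position count; changes in L or R do not.
    total = sum(sub.get((lxr, left + y + right), 0) for y in NUCLEOTIDES if y != x)
    return total / n_lxr
```

For example, with counts sub(ACG, AAG) = 3 and sub(ACG, ATG) = 1, and N(ACG) = 100, the estimate is 4/100; a count such as sub(ACG, TCG) is ignored because the change falls in *L*, not *X*.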

To compute N(*LXR*), we would need to consider all alignments of *LXR*, which may involve insertions, deletions, and substitutions; however, the combination of these mutations makes it difficult to enumerate the occurrences of *LXR*. To resolve this issue, we use the fact that most of these alignments are perfect matches or substitutions of *LXR*, whereas indel frequencies are typically less than 1%. Thus, we approximate N(*LXR*) as

$$N(LXR) \approx \sum_{M \in \{A,C,G,T\}^{|LXR|}} \mathrm{sub}(LXR,\, M),$$

that is, the sum runs over all nucleotide strings *M* of the same length as *LXR*.
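The approximation can be sketched by enumerating every string *M* in {A, C, G, T}^|*LXR*|, using the same hypothetical layout as before (counts keyed by (*LXR*, *M*)). The name `approx_n` is illustrative; pairs whose second string contains a gap or an ambiguous base are never reached by the enumeration, which mirrors leaving indel-containing alignments out of the count.

```python
from itertools import product
from typing import Mapping, Tuple


def approx_n(lxr: str, sub: Mapping[Tuple[str, str], int]) -> int:
    """Approximate N(LXR) as the sum of sub(LXR, M) over all M in {A,C,G,T}^|LXR|.

    M = LXR (perfect matches) is included in the sum.
    """
    return sum(sub.get((lxr, "".join(m)), 0) for m in product("ACGT", repeat=len(lxr)))
```

With 4^3 = 64 strings for |*LXR*| = 3 and 4^5 = 1024 for |*LXR*| = 5, the enumeration stays small.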

The values of subRate(*LXR*) for the bidirectional substitution model (|*L*| = |*R*| = 1 or 2) are available in the following table:

subRate(*LXR*) for the bidirectional substitution model (|*LXR*| = 3, |*L*| = |*R*| = 1 or |*LXR*| = 5, |*L*| = |*R*| = 2) (MS Excel)

*Indel
Models.*

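**The bidirectional indel model** employs the above transformations and calculates the frequency of all occurrences of

$$LR \Rightarrow LX^{*}R \quad\text{and}\quad LX^{*}R \Rightarrow LR$$

together with their reverse complements

$$\overline{R}\,\overline{L} \Rightarrow \overline{R}\,\overline{X^{*}}\,\overline{L} \quad\text{and}\quad \overline{R}\,\overline{X^{*}}\,\overline{L} \Rightarrow \overline{R}\,\overline{L}$$

in the alignments between the Hd-rR and HNI genomes. The model assigns this frequency to n(*LR*, *l*), where *l* is the length of the inserted or deleted string $X^{*}$. The indel rate at a position where *LR* occurs is estimated as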

$$\frac{\sum_{l \geq 1} n(LR,\, l)}{N(LR)},$$

where N(*LR*) is the number of occurrences of *LR* and its reverse complement $\overline{R}\,\overline{L}$ in the alignments between the Hd-rR and HNI genomes. As before, we approximate N(*LR*) as

$$N(LR) \approx \sum_{M \in \{A,C,G,T\}^{|LR|}} \mathrm{sub}(LR,\, M).$$
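A minimal sketch of how n(*LR*, *l*) might be tallied from one pairwise alignment follows. The event definition is our assumption, not the text's: an indel is a maximal run of `-` in one row; *L* and *R* are the |*L*| columns immediately before and after the run and must be gap-free, ACGT-only, and identical in both rows; and the opposite strand is pooled by also counting each event under the reverse complement of *LR*. The names `count_indels` and `revcomp` are hypothetical.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")
VALID = set("ACGT")


def revcomp(seq: str) -> str:
    """Reverse complement of an A/C/G/T string."""
    return seq.translate(COMPLEMENT)[::-1]


def count_indels(src: str, dst: str, flank: int, both_strands: bool = True) -> Counter:
    """Tally n(LR, l) from one alignment (equal-length rows, '-' = gap).

    Each maximal run of '-' in one row is one indel of length l; L and R are
    the `flank` columns just before and after the run (|L| = |R| = flank).
    """
    if len(src) != len(dst):
        raise ValueError("alignment rows must have equal length")
    src, dst = src.upper(), dst.upper()
    counts: Counter = Counter()
    n, i = len(src), 0
    while i < n:
        if src[i] != "-" and dst[i] != "-":
            i += 1
            continue
        gapped, other = (src, dst) if src[i] == "-" else (dst, src)
        j = i
        while j < n and gapped[j] == "-":
            j += 1
        # The inserted/deleted string X* lives in `other`; it must be gap-free.
        if i >= flank and j + flank <= n and "-" not in other[i:j]:
            left, right = src[i - flank:i], src[j:j + flank]
            # Flanks must match in both rows and contain only A/C/G/T.
            if (left == dst[i - flank:i] and right == dst[j:j + flank]
                    and set(left + right) <= VALID):
                counts[(left + right, j - i)] += 1
                if both_strands:
                    # On the opposite strand the flanks read rc(R), rc(L).
                    counts[(revcomp(left + right), j - i)] += 1
        i = j
    return counts
```

Since |*LR*| is even, some *LR* (for example ACGT) are their own reverse complement, and then both strand entries land on the same key; whether such events were counted once or twice in the original analysis is not stated, so treat this as a choice to revisit.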

The 1-bp indel rate is estimated by setting *l* to 1; namely,

$$\frac{n(LR,\, 1)}{N(LR)}.$$
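Both estimates, the rate summed over all indel lengths and the 1-bp rate, differ only in which lengths *l* enter the numerator. The sketch below takes that length filter as a parameter; the dictionary layout (n(*LR*, *l*) keyed by (*LR*, *l*)) and the name `indel_rate` are hypothetical, and N(*LR*) is supplied by the caller.

```python
from typing import Callable, Mapping, Tuple


def indel_rate(lr: str, n: Mapping[Tuple[str, int], int], n_lr: int,
               keep: Callable[[int], bool] = lambda length: length >= 1) -> float:
    """Sum n(LR, l) over the indel lengths l accepted by `keep`, divided by N(LR)."""
    if n_lr <= 0:
        raise ValueError("N(LR) must be positive")
    total = sum(count for (key, length), count in n.items() if key == lr and keep(length))
    return total / n_lr
```

With n(CGTT, 1) = 6, n(CGTT, 2) = 3 and N(CGTT) = 1000, the default filter gives 9/1000, while `keep=lambda length: length == 1` gives the 1-bp rate 6/1000.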

The indel rates at positions where *LR* occurs are given in the following table:

Indel rates estimated by the bidirectional indel model (|*LR*| = 4 or 6) (MS Excel)

**The k-mer motif indel model** searches a local region around an indel for a continuous

$$\frac{\sum_{l \geq 1} \mathrm{indel}(M,\, l,\, d)}{\mathrm{sub}(M,\, M)}.$$

The above ratios for |*M*| = 3 and 4 are available in the following table, in which the rows list all 3-mer or 4-mer strings *M* and the columns indicate the distance *d* relative to the occurrence of *M*. The tables give the probabilities for 1 ≤ *l*, *l* = 1, or 1 < *l*.
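The ratio for a motif *M* at distance *d* can be sketched for the three length ranges as follows. The sketch assumes indel(*M*, *l*, *d*) has already been counted and stored in a dictionary keyed by (*M*, *l*, *d*), with sub(*M*, *M*) in a dictionary keyed by (*M*, *M*); motifs with sub(*M*, *M*) = 0 are skipped to avoid division by zero. `LENGTH_RANGES`, `motif_indel_ratio` and `motif_table` are hypothetical names.

```python
from itertools import product
from typing import Callable, Dict, Iterable, List, Mapping, Tuple

# Length filters for the three tables: 1 <= l, l = 1, and 1 < l.
LENGTH_RANGES: Dict[str, Callable[[int], bool]] = {
    "1<=l": lambda length: length >= 1,
    "l=1": lambda length: length == 1,
    "1<l": lambda length: length > 1,
}


def motif_indel_ratio(m: str, d: int,
                      indel: Mapping[Tuple[str, int, int], int],
                      sub: Mapping[Tuple[str, str], int],
                      length_range: str = "1<=l") -> float:
    """Sum indel(M, l, d) over the selected lengths l, divided by sub(M, M)."""
    keep = LENGTH_RANGES[length_range]
    total = sum(count for (motif, length, dist), count in indel.items()
                if motif == m and dist == d and keep(length))
    return total / sub[(m, m)]


def motif_table(k: int, distances: Iterable[int],
                indel: Mapping[Tuple[str, int, int], int],
                sub: Mapping[Tuple[str, str], int],
                length_range: str = "1<=l") -> Dict[str, List[float]]:
    """One row per k-mer M (skipping sub(M, M) == 0), one column per distance d."""
    distances = list(distances)
    table: Dict[str, List[float]] = {}
    for letters in product("ACGT", repeat=k):
        m = "".join(letters)
        if sub.get((m, m), 0) == 0:
            continue
        table[m] = [motif_indel_ratio(m, d, indel, sub, length_range) for d in distances]
    return table
```

Calling `motif_table(3, ...)` or `motif_table(4, ...)` once per key of `LENGTH_RANGES` yields one row per 3-mer or 4-mer and one column per distance, matching the layout described above.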

*k-mer motif indel model (k = 3 or 4; 1 ≤ l, l = 1, or 1 < l)* (MS Excel)