Snippets 18

Spliceosomes, CRISPR repeats, snRNAs and Trypanosome Conserved Sequence Blocks 1, 2, and 3 as seen through a G:U intermediate

Alice C Lichtenstein, M.S.

aclsnippets@gmail.com

 

Introduction

When this author was making comparisons for the Snippets 17 installment, looking for matches to Conserved Sequence Block 3 (CSB-3), which is universally conserved in trypanosome mitochondrial (kinetoplast) minicircles, she also found matches to CSB-1 and CSB-2, two other short sequences that are less stringently conserved in the minicircles. Ray, DS (1989) described the three conserved sequences, CSB-1, CSB-2 and CSB-3, as lying within a 100 base pair region in all minicircles, with CSB-3 being universally conserved (see table below). CSB-3 is also known as the universal minicircle sequence (UMS). It is the binding site of the UMS-binding protein (UMSBP), which has been shown to be involved in the initiation of minicircle replication (Sela, D and Shlomai, J (2008)).

Dr. Ray's summary of the sequences for CSB-1, CSB-2 and CSB-3 is shown below; these are the sequences used for comparisons in this installment of Snippets.

 

Additionally, CSB-3 has another attribute. In T. equiperdum, a smaller sequence, the kinetoplastid gap (TTGGTGTAAT), overlaps and is partially embedded in CSB-3. Judging from the online literature, the kinetoplastid gap was first described by Ntambi, J and Englund, P (1985) as a single-stranded region in newly replicated, double-stranded kinetoplast DNA minicircles. In their next paper, Ntambi, J et al. (1986) reported finding ribonucleotides covalently linked to the 5' end of the new strand in the gap, "perhaps the remnants of a replication primer".

From fig. 6 of Ntambi, J and Englund, P (1985):

            5' Kinetoplastid gap 3'
T T T C C C C - - - - - - - - - - T G T
A A A G G G G T T G G T G T A A T A C A
      |---------------------|
      Conserved Sequence Block 3 (CSB-3)
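As a consistency check on this figure, the short Python sketch below (a hypothetical illustration, not part of the original papers) transcribes the two strands, assuming the new strand reads TTTCCCC, then ten unpaired positions, then TGT, with "-" marking the gap. It confirms that every other column is a Watson-Crick pair and reports which bases of the gap fall inside CSB-3. The names TOP, BOTTOM, unpaired_columns and columns_of are illustrative only.

```python
# Hypothetical check of the Ntambi and Englund (1985) figure; the column alignment is assumed.
TOP = "TTTCCCC----------TGT"      # new strand, "-" = unpaired positions of the gap
BOTTOM = "AAAGGGGTTGGTGTAATACA"   # opposite (template) strand

CSB3 = "GGGGTTGGTGTA"             # CSB-3 as it appears in the bottom strand
GAP = "TTGGTGTAAT"                # kinetoplastid gap sequence quoted in the text

WATSON_CRICK = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}


def unpaired_columns(top: str, bottom: str) -> list[int]:
    """Return 0-based columns where the top strand has no base; fail on a mispair."""
    if len(top) != len(bottom):
        raise ValueError("strands must be drawn with the same number of columns")
    unpaired = []
    for i, (t, b) in enumerate(zip(top, bottom)):
        if t == "-":
            unpaired.append(i)
        elif (t, b) not in WATSON_CRICK:
            raise ValueError(f"column {i}: {t}/{b} is not a Watson-Crick pair")
    return unpaired


def columns_of(motif: str, strand: str) -> set[int]:
    """Columns covered by the first occurrence of motif in strand."""
    start = strand.index(motif)
    return set(range(start, start + len(motif)))


gap_cols = unpaired_columns(TOP, BOTTOM)
gap_seq = "".join(BOTTOM[i] for i in gap_cols)
shared = sorted(set(gap_cols) & columns_of(CSB3, BOTTOM))
shared_seq = "".join(BOTTOM[i] for i in shared)

print(gap_seq, gap_seq == GAP)   # TTGGTGTAAT True
print(shared_seq)                # TTGGTGTA
```

Under that alignment, eight of the gap's ten bases (TTGGTGTA) lie inside CSB-3 and the last two (AT) extend beyond it, which is what "partially embedded" means here.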

       

According to Ray's paper, in kinetoplast mitochondrial minicircles CSB-1 and CSB-2 are separated by roughly 28-30 base pairs, and CSB-2 and CSB-3 are separated by 38-47 base pairs. In other words, the conserved sequences are separated by intervening sequences that are not conserved among kinetoplastid species.

This seemed analogous to hammerhead ribozymes, whose sequences vary but whose conserved sequences are separated by intervening sequences that form three helices. When this author compared miRNAs and hammerhead ribozymes in preparation for Snippets 19, she omitted the intervening helices and got matches for the conserved sequences (Snippets 19 was finished and posted before this installment, Snippets 18). This approach was very useful and fruitful, so she decided to do the same with CSB-1, CSB-2 and CSB-3 and look for matches.
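The idea of dropping the non-conserved spacers before looking for matches can be sketched in a few lines of Python. This is a hypothetical helper, not the author's tool: it assumes the conserved blocks have already been located and are supplied as 0-based, half-open (start, end) coordinates in order along the sequence, and the toy sequence is invented for illustration.

```python
def join_conserved_blocks(seq: str, blocks: list[tuple[int, int]]) -> str:
    """Concatenate the conserved blocks of seq, dropping the intervening sequences.

    Hypothetical helper: blocks are 0-based, half-open (start, end) coordinates,
    listed in order along seq and not overlapping.
    """
    pieces = []
    previous_end = 0
    for start, end in blocks:
        if not previous_end <= start < end <= len(seq):
            raise ValueError(f"block {(start, end)} is out of order or out of range")
        pieces.append(seq[start:end])
        previous_end = end
    return "".join(pieces)


# Toy example: three conserved blocks separated by variable spacers (lower case).
toy = "GGGC" + "nnnn" + "CCCC" + "nnnnnn" + "GGGG"
print(join_conserved_blocks(toy, [(0, 4), (8, 12), (18, 22)]))  # GGGCCCCCGGGG
```

The joined string can then be compared with other sequences as a single "string of beads", while the spacer lengths (which vary between species) play no part in the comparison.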

Using a G:U intermediate, as described under Methods, she found matches to small nuclear RNAs (snRNAs), specifically the portions of snRNAs U6, U2 and U4 involved in the spliceosome of yeast (Madhani, H and Guthrie, C (1992)) and humans (Madhani, H (2013)). Matches were also made to at least one CRISPR repeat that is found, and conserved, in an assortment of bacteria (Kunin, V. et al. (2007)).

 

Methods

When a G:U complement (or G:U intermediate) is used as a "Rosetta Stone", sequences that seem disparate can sometimes turn out to be remarkably similar to one another (matches). First, a complement to the sequence of interest is made using only G and U. Then another set of complements is made to the G:U intermediate using all four bases. When comparing a sequence of interest with conserved sequences to find matches, many of the usual conventions used when looking for homologies are dispensed with:

Treat sequences as a "string of beads": they can be read 5'-3' or 3'-5' during comparisons, and they can be DNA or RNA.

T and U are interchangeable in this method. The pairing rules are:

    T complements G and A

    U complements G and A

    C complements G

    A complements U or T

    G complements U, T or C

Don't worry (for the moment) about where the sequence came from, i.e., what species or biological mechanism.

Do not worry about methylation or modified bases. For instance, dihydrouridine (D) is considered uridine, because the method of making matches uses only complementation.

Also, matches are made using both the positive strand of interest and the G:U complement strand, not just one strand (see the last line in the example below).

It is important to understand that, unlike a homology, which has exactly the same bases (give or take one or two), there can be more than one match to a given sequence of interest.

Proteins are not considered when generating matches, only nucleotides. However, conserved nucleotide sequences that are protein binding sites are analyzed for matches.

Consensus sequences are avoided as they mask a variety of different sequences.
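The pairing rules above can be written as a small lookup table, which also fixes the G:U complement: every base has exactly one G or U partner (A and G pair with U, C and T/U pair with G). The Python sketch below is an illustration under that reading, not the author's software; PARTNERS, can_pair and gu_complement are hypothetical names, and the complement is written position by position without reversing it, following the "string of beads" convention.

```python
# Pairing rules from the list above: Watson-Crick pairs plus G:U (and G:T) wobble.
PARTNERS = {
    "A": {"U", "T"},
    "C": {"G"},
    "G": {"C", "U", "T"},
    "T": {"A", "G"},
    "U": {"A", "G"},
}


def can_pair(x: str, y: str) -> bool:
    """True if bases x and y complement each other under the rules above."""
    return y.upper() in PARTNERS[x.upper()]


def gu_complement(seq: str) -> str:
    """Two-base complement that uses only G and U, position by position.

    Strand direction is ignored ("string of beads"), so the result is written
    in the same left-to-right order as seq.
    """
    out = []
    for base in seq.upper():
        (partner,) = PARTNERS[base] & {"G", "U"}   # exactly one G/U partner per base
        out.append(partner)
    return "".join(out)


print(gu_complement("CCAGGG"))                  # GGUUUU
print(can_pair("G", "T"), can_pair("C", "A"))   # True False
```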

 

Example of generating matches, not homologies, using a G:U intermediate:

 

First, a two-base complement of G and U is made to the sequence of interest, and then four-base complements are made from the G:U intermediate to generate matches. The G:U complement is simply a device for figuring out the matches. In short, one substitutes a pyrimidine for a pyrimidine or a purine for a purine to make a match.
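Because each G:U letter has exactly two four-base partners (G pairs with C or T/U, U pairs with A or G), the second step can be enumerated directly. The hypothetical Python sketch below does so for the six-base sequence of interest used in the example table and checks the statement that every resulting match keeps purines and pyrimidines in the same positions. The helper names are illustrative, and output is written with T rather than U, as in the table.

```python
from itertools import product

# G:U intermediate, letter by letter (T is treated as U; direction is ignored).
TO_GU = {"A": "U", "G": "U", "C": "G", "T": "G", "U": "G"}
# Four-base partners of each G:U letter, written as DNA (T stands for U).
FROM_GU = {"G": "CT", "U": "AG"}
PURINES = set("AG")


def gu_intermediate(seq: str) -> str:
    return "".join(TO_GU[b] for b in seq.upper())


def four_base_matches(seq: str) -> set[str]:
    """All four-base complements of the G:U intermediate of seq."""
    choices = [FROM_GU[g] for g in gu_intermediate(seq)]
    return {"".join(letters) for letters in product(*choices)}


def same_pattern(a: str, b: str) -> bool:
    """True when a and b have purines and pyrimidines at the same positions."""
    return len(a) == len(b) and all(
        (x in PURINES) == (y in PURINES) for x, y in zip(a.upper(), b.upper())
    )


seq = "CCAGGG"
matches = four_base_matches(seq)
print(len(matches))                                          # 64
print(all(same_pattern(m, seq) for m in matches))            # True
print({"TTAGGG", "CTGGGA", "CCAAAG", "TCGGAG"} <= matches)   # True
# Applying the same step to the G:U complement itself gives its own matches.
print({"AACCTC", "GATTTC"} <= four_base_matches(gu_intermediate(seq)))  # True
```

A six-base sequence of interest therefore has 2^6 = 64 candidate matches from its own strand, of which the homology (CCAGGG) is just one.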

 

C C A G G G    Sequence of interest
G G U U U U    G:U complement
C C A G G G    Homology
T T A G G G    Example of match
C T G G G A    Example of match
C C A A A G    Example of match
T C G G A G    Example of match
A A C C T C    Match from G:U complement
G A T T T C    Match from G:U complement
G A A A G G    Match from sequence and G:U complement ("hybrid")

This author arbitrarily accepted a "hybrid" match, drawn partly from the sequence of interest and partly from the G:U complement, only if each part of the hybrid was at least 4 bases long.
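Every base of an aligned candidate either keeps the purine/pyrimidine class of the sequence of interest or takes the class of the G:U complement, which is always the opposite one. A candidate can therefore be labelled position by position and split into parts, which makes the 4-base rule for hybrids straightforward to apply. The Python sketch below is a hypothetical implementation of that reading (provenance, accept and min_part are illustrative names) that tries both reading directions; the 8-base sequences in the last line are invented for illustration.

```python
from itertools import groupby

PURINES = set("AG")


def provenance(candidate: str, seq: str) -> str:
    """Label each aligned position of candidate as 'S' or 'G'.

    'S': same purine/pyrimidine class as the sequence of interest.
    'G': same class as the G:U complement (always the opposite class).
    """
    if len(candidate) != len(seq):
        raise ValueError("candidate and sequence of interest must have equal length")
    return "".join(
        "S" if (c in PURINES) == (s in PURINES) else "G"
        for c, s in zip(candidate.upper(), seq.upper())
    )


def accept(candidate: str, seq: str, min_part: int = 4) -> bool:
    """Accept a pure match, or a hybrid whose every part is at least min_part long.

    Both reading directions are tried ("string of beads").
    """
    for oriented in (candidate, candidate[::-1]):
        runs = [len(list(group)) for _, group in groupby(provenance(oriented, seq))]
        if len(runs) == 1 or min(runs) >= min_part:
            return True
    return False


seq = "CCAGGG"
print(provenance("TTAGGG", seq))        # SSSSSS  (match from the sequence of interest)
print(provenance("AACCTC", seq))        # GGGGGG  (match from the G:U complement)
print(provenance("GAAAGG", seq))        # GGSSSS  (hybrid)
print(accept("AGCTAGTG", "CCAGGGCA"))   # True: a 4 + 4 hybrid
```

Because every position always receives one of the two labels, any sequence of the right length could be called a hybrid if parts of any length were allowed; the minimum part length is what keeps the method selective. The six-base hybrid in the table is labelled GGSSSS, so its G:U part is only two bases long and it illustrates the idea rather than the 4-base cut-off.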

 

Results and Discussion

Spliceosome Configurations

The spliceosome is the site where introns are excised from pre-mRNA and the neighboring exons are ligated together to form a mature RNA: usually an mRNA that will be translated into protein, but sometimes a non-coding RNA. Splicing involves a number of small nuclear RNAs (snRNAs), but to simplify the displayed results, the snRNA sequences shown (Madhani, H and Guthrie, C (1992 and 1994)) are snU4, snU6 and snU2. The configuration of snU2 and snU6 is described below because it is generally accepted as the catalytic site of the spliceosome.

Madhani, H and Guthrie, C (1992) posit that the catalytic reaction starts when base pairing between snU4 and snU6 (U4/U6) somehow "readies" U6 for its role in catalysis. Just before catalysis, they propose, U6 dissociates from U4 and associates with U2 to form the catalytic site of the spliceosome. Whatever the exact role of the U4/U6 complex, the U4/U6 graphic below is useful to show the matches with CSB-1, CSB-2 and CSB-3.

Also involved is the "lariat" (branch-point) sequence in the pre-mRNA, which binds to a complementary sequence in snU2 and facilitates intron excision and exon ligation.

Proteins are also involved in the intron excision but are not shown below.  Most molecular biology textbooks have a schematic graphic of the series of events in pre-mRNA intron excision and you are referred to those for a more detailed explanation of splicing.

The nucleotide sequences shown below are derived from the papers referenced at the top of each graphic.

snRNAU6/snRNAU4 (U6/U4) complementation vs CSB-1, CSB-2 and CSB-3

 

 

Yellow highlight: CSB-1
Green highlight: CSB-2
Blue highlight: CSB-3

 

The 5' part of the U6 sequence shown above is a good match for CSB-3 (blue highlighted letters), but that part of U6 is not carried over to the U6/U2 catalytic site of the spliceosome after U6 dissociates from U4. The 3' part of U6 shown, which contains the overlapping CSB-1 (yellow highlighted letters) and CSB-2 (green highlighted letters) matches, is eventually used in the catalytic site of the spliceosome.

Some readers may object that the CSB-1 and CSB-2 matches overlap one another on both the U4 and the U6 sequences. However, in the U4/U6 complementation, the CSB-2 match on U6 is largely complementary to the CSB-1 match on U4, and those two do not overlap. (Please remember that the sequences are treated as "strings of beads" and that the CSB-1 and CSB-2 sequences are simply a handle for finding matches.)
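Whether one segment is "pretty much the complement" of another can be made explicit column by column. The hypothetical Python sketch below aligns two segments antiparallel and marks Watson-Crick and G:U wobble pairs; pairing_profile is an illustrative name, and since the U4 and U6 sequences appear only in the graphic, the example uses CSB-3 and its reverse complement instead.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def pairing_profile(a: str, b: str) -> str:
    """Pair a (read 5'->3') against b (read 3'->5') column by column.

    '|' marks a Watson-Crick pair, ':' a G:U wobble pair, '.' no pair.
    T is read as U so that DNA and RNA segments can be compared.
    """
    a = a.upper().replace("T", "U")
    b = b.upper().replace("T", "U")[::-1]   # reverse b so the strands are antiparallel
    if len(a) != len(b):
        raise ValueError("segments must have equal length")
    marks = []
    for x, y in zip(a, b):
        if (x, y) in WATSON_CRICK:
            marks.append("|")
        elif (x, y) in WOBBLE:
            marks.append(":")
        else:
            marks.append(".")
    return "".join(marks)


print(pairing_profile("GGGGTTGGTGTA", "TACACCAACCCC"))   # ||||||||||||
print(pairing_profile("GGGG", "UUCC"))                   # ||::
```

Counting the '|' and ':' marks gives a simple measure of how complete the complementation between two segments is.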

Importantly, two strands of RNA are involved. 

The part of U6 that carries the overlapping CSB-1 and CSB-2 matches becomes part of the spliceosome catalytic region in S. cerevisiae (Madhani, H and Guthrie, C (1992)) and in humans (Madhani, H (2013)). In the spliceosome, the part of the catalytic region that matches CSB-3 comes from U2, not U6. The U2 sequence anchors the pre-mRNA lariat (branch-point) sequence to the catalytic site. Proteins are, of course, also involved in splicing the pre-mRNA, but they are ignored here.

 

 

 

The Catalytic Region of the Yeast (S. cerevisiae) Spliceosome

 

 

In the spliceosome catalytic region shown above, the overlapping CSB-1 and CSB-2 matches lie in the U6 stem-loop, and CSB-3 matches the U2 region that binds to the pre-mRNA lariat sequence. This does not mean that kinetoplast minicircles are somehow responsible for spliceosomes or vice versa; it only means that a match can be made for conserved "strings of beads" (sequences).

Note also that the kinetoplastid gap (TTGGTGTAAT), which overlaps CSB-3, is single stranded, and here its match lies in the region that joins the pre-mRNA lariat sequence with snRNA U2.

 

CRISPR Repeats

CRISPR repeats were also compared to CSB-1, CSB-2 and CSB-3, as well as to the portion of snU6 involved in the catalytic site, after this author noticed that CSB-1 and CSB-2 matches could be found in a configuration similar to that of the spliceosome catalytic site. The graphic below is based on the CRISPR repeat in figure 2 of Kunin, V. et al. (2007). They analyzed clusters of CRISPR repeat sequences in at least nine bacteria and found that the sequences were mostly conserved. The CRISPR sequence below is from Syntrophus aciditrophicus but is very similar to those of eight other bacterial clusters.

 

A CRISPR Repeat versus CSB-1,-2,-3 and the U6 portion of the spliceosome.

 

 

 

Because both the CRISPR repeat and U6 had matches to CSB-1, CSB-2 and CSB-3, the next comparison was made directly between the CRISPR repeat and U6. The resulting match is notable in that it is very good and has no bases derived directly from the G:U complement.

 

 

 

 

snU6 portion of spliceosome vs. Consensus Hammerhead Ribozyme Sequence

Although consensus sequences are usually avoided in these comparisons, the consensus hammerhead ribozyme sequence below was used in a search for matches to the stem-loop portion of U6 that is part of the yeast spliceosome. If the tulane.edu graphic is no longer online, the same sequence can be found through Google Images.

http://www.tulane.edu/-biochem/nolan/lectures/rna.htm

                      - -
                    N     N   HELIX 3
                    N     N
                      N N'
                      C G
                      A U
    N N       G A A       H ↓     N N
  I     G N C               N N N     I
  I     C N'G               N'N'N'    I
    N N       A           C       N N
                G       U
    HELIX 2         U   G         HELIX 1
                    A

Ribozyme sequence without helices

↓   cleavage site

H   A, T or C (not G)

     

 

 

The first result is without any intervening sequence, and the second result has one.

Please ignore that T and U are used interchangeably in the graphic. This has no bearing on the matching results, although, since two RNAs are being compared, the result should probably have been displayed with U instead of T. More importantly, unlike the match with Kunin, V. et al.'s CRISPR repeat, the first part of the conserved hammerhead ribozyme match is derived directly from the G:U complement and the second part from the sequence of interest. The second result was included because it is approximately the same length as the CRISPR repeat.

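The minimal statement that can be made is that the hammerhead ribozyme, the CRISPR repeat and the U6 portion of the spliceosome are all involved in RNA processing: the spliceosome in splicing, and the hammerhead ribozyme and the CRISPR repeat in RNA cleavage.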

If the matches shown in this Snippets installment reflect anything real, the different matches bring up many, many questions. One can't really conclude anything for now, but it is interesting to ponder the history (over billions of years) of the sequences. 

The author would like to thank the New York Public Library for accepting her as a MaRLI "independent scholar", which gives her access to the journal databases of a consortium of libraries.

 

 

References

Kunin, V. et al. (2007), Evolutionary conservation of sequence and secondary structures in CRISPR repeats, Genome Biology, vol 8, no 4, R61

Madhani, H. and Guthrie, C. (1992), A Novel Base-Pairing Interaction between U2 and U6 snRNAs Suggests a Mechanism for the Catalytic Activation of the Spliceosome, Cell, vol 71, pp 803-817

Madhani, H. and Guthrie, C. (1994), Dynamic RNA-RNA Interactions in the Spliceosome, Annual Review of Genetics, vol 28, pp 1-26

Madhani, H. (2013), snRNA Catalysts in the Spliceosome's Ancient Core, Cell, vol 155, pp 1213-1215

Ntambi, J.M. and Englund, P.T. (1985), A Gap at a Unique Location in Newly Replicated Kinetoplast DNA Minicircles from Trypanosoma equiperdum, The Journal of Biological Chemistry, vol 260, pp 5574-5579

Ntambi, J.M. et al. (1986), Ribonucleotides Associated with a Gap in Newly Replicated Kinetoplast DNA Minicircles from Trypanosoma equiperdum, The Journal of Biological Chemistry, vol 261, pp 11890-11895

Ray, D.S. (1989), Conserved Sequence Blocks in Kinetoplast Minicircles from Diverse Species of Trypanosomes, Molecular and Cellular Biology, vol 9, no 3, pp 1365-1367

Ryan, D. and Abelson, J. (2002), The conserved central domain of yeast U6 snRNA: Importance of U2-U6 helix 1a in spliceosome assembly, vol 8, pp 997-1010

Sela, D. and Shlomai, J. (2008), Regulation of UMSBP activities through redox-sensitive protein domains, Nucleic Acids Research, vol 37, no 1, pp 279-288