The Structure of DNA

The year 1953 was a momentous one for biology. James Watson and Francis Crick announced their discovery of the structure of DNA, the molecule that encodes the instructions for life. This discovery transformed our view of biology and genetics and heralded the arrival of the ‘molecular age’ in biology.

Please read the original article by Watson and Crick. It is posted as a PDF file on the course webpage (DNA Structure (JDW & FC)).

DNA is a beautiful molecule, elegant in its chemistry and befitting the grandeur of life.

The X-ray diffraction pattern of DNA fibers (obtained by Rosalind Franklin and Maurice Wilkins) indicated a helical structure. Model building exercises led Watson and Crick to the double helical structure that one finds everywhere on textbook covers and T-shirts.

Structural features of B form DNA

We will describe below the essential features of the most common biological form of DNA, B-DNA. You can find differences between B form and the other two forms, A-DNA and Z-DNA, listed on the course web page.

1. The backbones that intertwine to form the double helix are made of repeating ‘sugar-phosphate’ units. The sugar is a pentose, 2’-deoxyribose. The 3’-hydroxyl group of one sugar and the 5’-hydroxyl group of the adjacent sugar are both esterified with the same phosphoric acid molecule, forming a 3’,5’-phosphodiester linkage. The sugar-phosphate unit is repeated over and over along each strand.

Note that the 3’,5’ linkage gives the sugar-phosphate chain directionality. That is, you could move along the chain in the 5’ to 3’ direction or in the 3’ to 5’ direction.

2. One of the backbones runs in the 5’ to 3’ direction; the other runs in the 3’ to 5’ direction. Because the two strands run in opposite directions, DNA is an antiparallel double helix. The chains are intertwined in a right-handed fashion.

The intertwining of the strands makes it impossible to separate them simply by pulling them apart. Strand separation requires unwinding, that is, rotation in the left-handed sense. This type of double helix is called a ‘plectonemic’ double helix. If the two strands (each one a right-handed helix) were instead lying side by side, they could be separated by pulling them away from each other. Such a double helix would be called a ‘paranemic’ double helix.

3. Four bases provide the coding alphabet of DNA: two purines, A (adenine) and G (guanine), and two pyrimidines, C (cytosine) and T (thymine). By the pairing rule, a purine is hydrogen bonded to a pyrimidine: A pairs with T through two hydrogen bonds, and G pairs with C through three hydrogen bonds. The bases lie inside the double helix, with complementary bases from opposite strands taking part in hydrogen bonding.
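The pairing rule and the antiparallel arrangement together fix the partner strand completely. A minimal Python sketch, assuming each strand is written 5’ to 3’ as a string of the letters A, C, G and T (the function names are hypothetical, chosen for illustration). Because the strands are antiparallel, the partner of a strand read 5’ to 3’ is its complement read backwards; the hydrogen-bond tally uses the two-versus-three count given above.

```python
# Watson-Crick pairing rule: A-T (2 hydrogen bonds), G-C (3 hydrogen bonds)
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
H_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}

def partner_strand(seq):
    """Return the complementary strand, also written 5' to 3'.

    Because the strands are antiparallel, the partner read 5'->3' is the
    complement of seq read backwards (the reverse complement).
    """
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def count_hydrogen_bonds(seq):
    """Total inter-strand hydrogen bonds in the duplex formed by seq and its partner."""
    return sum(H_BONDS[base] for base in seq)

top = "ATGCGC"
print(partner_strand(top))        # GCGCAT
print(count_hydrogen_bonds(top))  # 16 (two A-T pairs x 2, four G-C pairs x 3)
```

Applying `partner_strand` twice returns the original strand, as expected for a duplex.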

4. The helix axis runs perpendicular to the planes of the base pairs, which are stacked one above the other like the rungs of a ladder. Stacking interactions (hydrophobic forces) between base pairs and hydrogen bonds between paired bases provide the stabilizing forces for the structure. Note that the hydrophobic stacking interactions provide the major share of the stabilizing forces.

Evidence: the melting temperature of DNA is lower in 15% ethanol than in aqueous solution. We will discuss this experimental result in class.

5. DNA has a diameter of approximately 2 nm, or 20 Å. The distance between two neighboring base pairs is 3.4 Å. There are about 10.5 base pairs (bp) per turn of B-DNA in solution; for simplicity we will assume 10 bp per turn. The pitch of the helix, the distance one travels along the DNA axis for one complete turn, is then 34 Å (10 × 3.4 Å). The sugar conformation in B-DNA is C2’-endo (we will discuss this in class).
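The numbers in point 5 are linked by simple arithmetic: pitch = (base pairs per turn) × (rise per base pair), and the rotation between neighboring base pairs is 360° divided by the number of base pairs per turn. A small Python sketch using the 3.4 Å rise from the text, comparing the idealized 10 bp per turn with the roughly 10.5 bp per turn measured for B-DNA in solution; the function names are illustrative.

```python
RISE_ANGSTROM = 3.4  # distance between neighboring base pairs (from the text)

def helix_geometry(bp_per_turn):
    """Return (pitch in Å, twist per base pair in degrees) for a given helical repeat."""
    pitch = bp_per_turn * RISE_ANGSTROM
    twist = 360.0 / bp_per_turn
    return pitch, twist

def length_angstrom(n_bp):
    """Length along the helix axis occupied by n_bp stacked base pairs."""
    return n_bp * RISE_ANGSTROM

for repeat in (10, 10.5):
    pitch, twist = helix_geometry(repeat)
    print(f"{repeat} bp/turn: pitch {pitch:.1f} Å, twist {twist:.1f}° per bp")
# 10 bp/turn: pitch 34.0 Å, twist 36.0° per bp
# 10.5 bp/turn: pitch 35.7 Å, twist 34.3° per bp
```

By the same arithmetic, 1,000 bp of B-DNA spans about 3,400 Å (0.34 µm) along the axis.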

6. The strands winding around each other form two types of grooves through which one can see the base pairs in the interior. These grooves are called major and minor grooves. When one looks at DNA from one side, one sees alternating major and minor grooves. Every major groove has a minor groove behind it; and every minor groove has a major groove behind it. If two of us stand on opposite sides of a DNA model and direct our eyes to the same base pair, one of us will be seeing it from the major groove side; the other from the minor groove side.

We will define major and minor grooves in class. In B form DNA, they are easily distinguished. The rims of the major groove are more widely separated than those of the minor groove.

7. Interesting ways to think about DNA:

1. If one wants to make contact with a specific base pair or set of base pairs in DNA, one could do so by approaching them from the major groove side or from the minor groove side. Proteins that regulate the function of genes do contact base pairs in this manner.

2. Specificity of recognition, or discrimination, is the ability to tell one base pair from the other possible pairs: AT from TA, GC, and CG. Proteins recognize DNA through hydrogen bonding and hydrophobic (van der Waals) contacts. The hydrogen bonding possibilities on the major groove side allow more stringent discrimination than those on the minor groove side. Hence, specific DNA-protein interactions are most often mediated through the major groove.

3. The rims of each major groove (and each minor groove) are formed by the two oppositely oriented strands. Hence the two grooves also wind around each other in a right-handed fashion. You can imagine them forming a plectonemic double helix.

4. You can think of two skiers perched on top of a B-form DNA tower. Let us say that one is on the major groove side, the other on the opposite minor groove side. If they start skiing at the same time and at the same speed, each would be coming down the intertwining slopes, one negotiating the major groove slope and the other the minor groove slope, positioned on opposite faces of the DNA at every instant. At the bottom of the slopes (as they end their runs), they would be headed away from each other.

In fact, protein machines move along DNA over the major and minor grooves, performing important functions. They burn energy by hydrolyzing ATP and travel in one direction along the DNA track. In our skiing example, we imagined a tower, so gravity was on the side of the skiers. They could have tumbled down clumsily without expending energy, but that would not be pretty. They have to burn calories to stay the course and make the correct turns. One can see the grace and elegance of the adept skier in the biological machines that traverse DNA at great speeds while carrying out their biological tasks.
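The claim in point 2 above, that major-groove hydrogen bonding discriminates more stringently than minor-groove hydrogen bonding, can be made concrete by listing the hydrogen-bond acceptors (A), donors (D), thymine methyl groups (M) and nonpolar hydrogens (H) that each base pair presents at the floor of each groove. In this Python sketch the patterns follow the scheme commonly drawn in biochemistry textbooks (treat them as an assumption to check against your text). Grouping base pairs by pattern shows that all four pairs look different from the major groove, whereas from the minor groove AT cannot be told from TA, nor GC from CG.

```python
# Groove-edge features of each base pair, read across the groove floor.
# A = H-bond acceptor, D = H-bond donor, M = methyl (thymine), H = nonpolar H.
# Patterns assumed from the standard textbook scheme.
MAJOR_GROOVE = {"AT": "ADAM", "TA": "MADA", "GC": "AADH", "CG": "HDAA"}
MINOR_GROOVE = {"AT": "AHA", "TA": "AHA", "GC": "ADA", "CG": "ADA"}

def distinguishable_pairs(groove):
    """Group base pairs that present identical patterns in a groove."""
    groups = {}
    for pair, pattern in groove.items():
        groups.setdefault(pattern, []).append(pair)
    return groups

print(distinguishable_pairs(MAJOR_GROOVE))
# {'ADAM': ['AT'], 'MADA': ['TA'], 'AADH': ['GC'], 'HDAA': ['CG']}
print(distinguishable_pairs(MINOR_GROOVE))
# {'AHA': ['AT', 'TA'], 'ADA': ['GC', 'CG']}
```

Four distinct major-groove patterns versus two minor-groove patterns is the sense in which the major groove offers more stringent discrimination.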

Life is Beautiful. That beauty is derived ultimately from the structure of DNA that makes life possible.

A little bit of DNA physics and chemistry:

1. When DNA is heated, the hydrogen bonds break, base stacking is disrupted, and the strands separate. The process is called ‘melting’ or denaturation. The melting curve has a typical shape: a flat line (native DNA), a sharp rise (the helix-to-coil transition) and another flat line (completely denatured DNA).

DNA absorbs UV light (maximally near 260 nm) because of the conjugated double bonds in the bases. In the stacked, hydrogen-bonded double helix, the mobility of the π electron cloud is restricted compared with that in free bases. As a result, double-stranded DNA absorbs less UV light than single-stranded DNA, so ‘melting’ of the strands can be followed by the increase in UV absorbance.
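The flat-rise-flat shape of the melting curve, and its readout as absorbance, can be sketched with a simple two-state model. In this Python sketch the sigmoid form, the 85 °C midpoint, the transition width and the 40% rise in absorbance on strand separation are all illustrative assumptions, not measured values. The melting temperature (Tm) is estimated as the temperature at which absorbance lies halfway between the two plateaus.

```python
import math

def fraction_denatured(temp_c, tm_c=85.0, width_c=1.5):
    """Two-state sigmoid: 0 = native duplex, 1 = fully separated strands.

    tm_c (midpoint) and width_c (sharpness) are illustrative assumptions.
    """
    return 1.0 / (1.0 + math.exp(-(temp_c - tm_c) / width_c))

def absorbance_260(temp_c, a_native=1.0, hyperchromicity=0.4, **sigmoid_params):
    """A260 rises as strands separate; the 40% increase is an assumed value."""
    return a_native * (1.0 + hyperchromicity * fraction_denatured(temp_c, **sigmoid_params))

def estimate_tm(temps, absorbances):
    """Temperature where absorbance crosses the midpoint between the lower
    (native) and upper (denatured) plateaus, by linear interpolation."""
    midpoint = (min(absorbances) + max(absorbances)) / 2.0
    for i in range(len(temps) - 1):
        a0, a1 = absorbances[i], absorbances[i + 1]
        if a0 <= midpoint <= a1 and a1 > a0:
            t0, t1 = temps[i], temps[i + 1]
            return t0 + (midpoint - a0) * (t1 - t0) / (a1 - a0)
    return None  # curve never crosses the midpoint

temps = list(range(60, 101))  # 60 to 100 °C in 1 °C steps
curve = [absorbance_260(t) for t in temps]
print(f"native plateau {curve[0]:.3f}, denatured plateau {curve[-1]:.3f}")
print(f"estimated Tm ≈ {estimate_tm(temps, curve):.1f} °C")
# native plateau 1.000, denatured plateau 1.400
# estimated Tm ≈ 85.0 °C
```

Passing a smaller `width_c` gives a sharper transition; passing a lower `tm_c` shifts the whole curve to lower temperature.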

If the melted strands are rapidly chilled, they stay denatured. If they are cooled slowly, the strands collide and, depending on their complementarity, renature (or reanneal) to form a double helix. Hence, hybridization can be used to measure the extent of relatedness between DNAs from different sources.
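A toy illustration of why hybridization reports relatedness: the more positions at which two strands from different sources can form Watson-Crick pairs, the more stable the hybrid duplex. This Python sketch assumes both strands are written 5’ to 3’, have equal length and are already in register with no gaps (real DNAs need not satisfy either assumption); it simply counts the fraction of correctly paired positions. The sequences are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def fraction_paired(strand_1, strand_2):
    """Fraction of positions at which two antiparallel strands form Watson-Crick pairs.

    Both strands are written 5'->3' and assumed equal in length and in register.
    """
    if len(strand_1) != len(strand_2):
        raise ValueError("strands must have the same length in this sketch")
    # Antiparallel: position i of strand_1 faces position -(i + 1) of strand_2.
    paired = sum(COMPLEMENT[b1] == b2 for b1, b2 in zip(strand_1, reversed(strand_2)))
    return paired / len(strand_1)

source_a = "ATGGCCATTG"          # hypothetical strand from source A
perfect_partner = "CAATGGCCAT"   # its exact complementary strand
related_partner = "CAATGGCGAT"   # hypothetical strand from source B, one difference
print(fraction_paired(source_a, perfect_partner))  # 1.0
print(fraction_paired(source_a, related_partner))  # 0.9
```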

We will discuss more details in class.

2. Methods of DNA separation

DNA molecules can be separated by centrifugation or by gel electrophoresis. Centrifugation can be based on sedimentation velocity or on density. One can intentionally decrease the density of DNA by intercalating flat molecules (chloroquine or ethidium bromide) between base pairs. The gel matrices used are agarose (for relatively large molecules) or polyacrylamide (for relatively small molecules). Very large molecules (for example, yeast chromosomes) can be separated by pulsed-field gel electrophoresis.
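Fragment sizes are usually estimated from a gel by running marker fragments of known size alongside the sample: over the gel's useful range, the logarithm of fragment size falls roughly linearly with migration distance. A minimal Python sketch of that semi-log standard curve, fitted by ordinary least squares; the marker distances and sizes are hypothetical, chosen to lie exactly on a line.

```python
import math

def fit_semilog(ladder):
    """Least-squares fit of log10(size) = slope * distance + intercept.

    ladder: list of (migration_distance_mm, size_bp) for the marker bands.
    """
    xs = [d for d, _ in ladder]
    ys = [math.log10(size) for _, size in ladder]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def estimate_size(distance_mm, slope, intercept):
    """Size (bp) of an unknown band, read off the standard curve."""
    return 10 ** (slope * distance_mm + intercept)

# Hypothetical marker bands: (distance in mm, size in bp)
ladder = [(10, 10000), (30, 1000), (50, 100)]
slope, intercept = fit_semilog(ladder)
print(f"{estimate_size(20, slope, intercept):.0f} bp")  # 3162 bp
```

Larger fragments migrate less, so the fitted slope is negative; sizes outside the marker range should not be extrapolated.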

More on these methods in class.

3. DNA sequencing

The first method of sequencing DNA molecules (the chemical method) was developed by Maxam and Gilbert. The idea is to label a DNA strand at one end and then use chemical reactions to break the chain at specific bases (in practice, the standard reactions cleave at G, A+G, C+T and C). Each broken fragment carries the label placed at the end. When the fragments are electrophoresed in a polyacrylamide gel, they separate according to size, with single-base resolution between neighboring fragments. One can read off the sequence starting from the smallest fragment and proceeding stepwise to the largest.
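In the standard Maxam-Gilbert protocol the four reactions cleave at G, at A+G, at C+T and at C, so each base is identified by which lanes share a band: a band in the G lane is G, a band only in A+G is A, a band in the C lane is C, and a band only in C+T is T. A minimal Python sketch, assuming one cleavage per molecule and that a labeled fragment n nucleotides long means the next base (position n+1 from the labeled end) was cleaved; as a consequence the first base cannot be read. The sequence is hypothetical.

```python
# Bases cleaved in each of the four standard reactions (lanes).
LANE_SPECIFICITY = {"G": "G", "A+G": "AG", "C": "C", "C+T": "CT"}

def maxam_gilbert_lanes(seq):
    """Band lengths (nucleotides from the labeled end) seen in each lane.

    Cleaving the base at 0-based index i leaves a labeled fragment i nucleotides
    long, so a band of length n reports the base at index n. Cleavage at index 0
    leaves no labeled nucleotides, so it is skipped.
    """
    return {
        lane: {i for i, base in enumerate(seq) if base in cleaved and i > 0}
        for lane, cleaved in LANE_SPECIFICITY.items()
    }

def read_maxam_gilbert(lanes):
    """Read the gel from the smallest band to the largest, using the lane logic."""
    calls = []
    for n in sorted(set().union(*lanes.values())):
        if n in lanes["G"]:
            calls.append("G")  # band in G lane (and also in A+G)
        elif n in lanes["A+G"]:
            calls.append("A")  # band in A+G only
        elif n in lanes["C"]:
            calls.append("C")  # band in C lane (and also in C+T)
        else:
            calls.append("T")  # band in C+T only
    return "".join(calls)

sequence = "TACGGATC"  # hypothetical strand, written from the labeled end
print(read_maxam_gilbert(maxam_gilbert_lanes(sequence)))  # ACGGATC
```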

The more efficient and widely used enzymatic method for DNA sequencing was developed by Fred Sanger. Here, instead of chemically breaking preexisting DNA chains at specific bases, new chains are allowed to grow but are terminated at specific base positions, A, G, T or C, by incorporation of chain-terminating dideoxynucleotides. Once again one ends up with a whole set of fragments, the minimum difference between two fragments in the population being a single base. Gel electrophoresis is again used to separate these fragments and read off the sequence.
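The chain-termination idea can be simulated by listing, for each of the four dideoxy reactions, the lengths at which new chains stop. In this Python sketch the template is written 5’ to 3’, so the newly synthesized strand is its reverse complement; lengths count only the newly added nucleotides (the primer is ignored), and each position is assumed to terminate in exactly one lane. The function names and template are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def sanger_lanes(template):
    """Chain lengths produced in each dideoxy reaction (lane) for a template written 5'->3'."""
    # The new strand is synthesized antiparallel and complementary to the template.
    new_strand = "".join(COMPLEMENT[b] for b in reversed(template))
    lanes = {base: [] for base in "ACGT"}
    for i, base in enumerate(new_strand):
        lanes[base].append(i + 1)  # a chain of length i + 1 ends in a dideoxy-`base`
    return lanes

def read_gel(lanes):
    """Read bands from shortest to longest; the result is the new strand, 5'->3'."""
    band_to_base = {n: base for base, lengths in lanes.items() for n in lengths}
    return "".join(band_to_base[n] for n in sorted(band_to_base))

lanes = sanger_lanes("AACGT")
print(lanes)            # {'A': [1], 'C': [2], 'G': [3], 'T': [4, 5]}
print(read_gel(lanes))  # ACGTT
```

Because the gel reports the newly made strand, the template sequence is recovered by taking the reverse complement of the read.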

Further details of the two methods will be discussed in class.

Genome Projects:

The ability to rapidly sequence large fragments of DNA laid the foundation for genome projects, that is, sequencing the genomes of organisms down to the last nucleotide. Many bacterial and fungal species have been completely sequenced. Some of these are model organisms, for example Escherichia coli and Saccharomyces cerevisiae. Others are pathogenic and are important from a medical standpoint.

The crown jewel among these projects is the human genome project, completed over several years by a publicly funded multinational effort (led by the National Institutes of Health) complemented by a privately funded effort at Celera Genomics.

The NIH-led project first created a detailed physical map of each human chromosome. To do this, large chunks of human DNA must be cloned into suitable vectors to create what are called bacterial or yeast artificial chromosomes (BACs or YACs). DNA hybridization can be used to identify overlapping and non-overlapping clones. This information can be used to order the clones sequentially into ‘contigs’ (contiguous sequences) that recreate the organization of individual chromosomes. Known DNA sequences (for example, previously studied genes, or sequences obtained at random from different clones), referred to as sequence-tagged sites (STSs), provide landmarks on the chromosomes. The individual clones that made up the contigs, that is, the physical map of each chromosome, were then sequenced, and the sequences were assembled in overlapping fashion to obtain the complete sequence of the genome.
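Grouping clones into contigs can be pictured as finding connected groups of overlapping clones. In this Python sketch, two clones are assumed to overlap whenever they contain the same STS landmark (a stand-in for the hybridization test described above), and overlapping clones are merged with a disjoint-set (union-find) structure. It only groups clones; it does not work out their order within a contig, and clones in each group are listed alphabetically. All clone and STS names are hypothetical.

```python
def group_into_contigs(clone_markers):
    """Group clones sharing at least one STS into contigs (connected components).

    clone_markers: dict mapping a clone name to the set of STS landmarks it contains.
    """
    parent = {clone: clone for clone in clone_markers}

    def find(clone):
        # Follow parent links to the group representative, halving the path as we go.
        while parent[clone] != clone:
            parent[clone] = parent[parent[clone]]
            clone = parent[clone]
        return clone

    first_clone_with = {}
    for clone, markers in clone_markers.items():
        for sts in markers:
            if sts in first_clone_with:
                # Shared STS: the two clones overlap, so merge their groups.
                parent[find(clone)] = find(first_clone_with[sts])
            else:
                first_clone_with[sts] = clone

    contigs = {}
    for clone in clone_markers:
        contigs.setdefault(find(clone), []).append(clone)
    return sorted(sorted(group) for group in contigs.values())

clones = {
    "BAC1": {"sts1", "sts2"},
    "BAC2": {"sts2", "sts3"},
    "BAC3": {"sts3", "sts4"},
    "BAC4": {"sts7", "sts8"},
    "BAC5": {"sts8"},
}
print(group_into_contigs(clones))  # [['BAC1', 'BAC2', 'BAC3'], ['BAC4', 'BAC5']]
```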

The Celera effort used enormous sequencing power to obtain sequences of clones without first constructing detailed physical maps. Powerful computational methods were used to align the sequences using their overlaps. This approach was faster, but it left gaps in many places where even the most advanced algorithms could not produce gap-free alignments.
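Assembling sequences through their overlaps can be illustrated with a toy greedy assembler: repeatedly merge the two pieces with the longest suffix-prefix overlap until no overlap remains. This Python sketch is not the Celera assembler; it assumes error-free reads from a single strand, does not handle reads contained within other reads, and uses a hypothetical minimum overlap of 3 bases. When no overlaps are left, the remaining pieces stay separate, which is how gaps arise.

```python
from itertools import permutations

def overlap(a, b, min_len):
    """Length of the longest suffix of a that equals a prefix of b (at least min_len), else 0."""
    start = 0
    while True:
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1

def greedy_assemble(reads, min_len=3):
    """Merge the best-overlapping pair until no overlaps remain; return the contigs."""
    reads = list(reads)
    while len(reads) > 1:
        best_len, best_a, best_b = 0, None, None
        for a, b in permutations(reads, 2):
            olen = overlap(a, b, min_len)
            if olen > best_len:
                best_len, best_a, best_b = olen, a, b
        if best_len == 0:
            break  # no overlaps left: the remaining contigs are separated by gaps
        reads.remove(best_a)
        reads.remove(best_b)
        reads.append(best_a + best_b[best_len:])
    return reads

# Hypothetical overlapping reads from a short stretch of DNA
reads = ["ATTAGACCTG", "AGACCTGCCG", "CCTGCCGGAA", "GCCGGAATAC"]
print(greedy_assemble(reads))  # ['ATTAGACCTGCCGGAATAC']
```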

By combining the information from the publicly and privately supported projects, gaps were filled, mistakes were corrected, and the complete human genome sequence was assembled and annotated.

Animated presentation of DNA structure

After listening to the class lecture and reading the notes, please go to the website given below for an animated presentation of DNA structure:

http://www.sumanasinc.com/webcontent/animations/content/DNA_structure.html

If clicking on the link does not work, you may have to cut and paste the URL address into the window of your browser.

Finally, address the ‘Points to Consider’ section to make sure that your concepts are clear.

Static snapshots of DNA structure and organization

See also the series of slides put together on ‘DNA Structure and Organization’. See if these make sense in the light of what we discussed in class.

The slides towards the end anticipate what we will focus on next: DNA higher order structure, organization and topology. Don’t pay too much attention to these at the moment. They preview some of the more interesting aspects to be covered later.

1. The chromatin in higher cells is DNA wound around histone proteins (nucleosomes) and then superwound in a higher order compact structure.

2. DNA can bend and form interesting strand connections: three-way and four-way junctions. A three-way junction bound by three monomers of a sequence-specific protein is shown in a space-filling model.

3. Proteins bound at specific sites on DNA molecules (for example, a replication origin) can be viewed by electron microscopy (EM). EM is also a powerful tool for viewing the path of DNA, for example the crossings in knots, links or plain supercoils.

4. Like a ball of wool or a long and unwieldy garden hose, DNA can get entangled easily because of its large size. The entanglements can often result in formation of DNA knots and links. The formation of linked circles during replication is a direct consequence of the interwound structure of the DNA duplex.

5. The cell must have enzymes that can unknot and unlink DNA. Unlinking is an important step in the segregation of replicated genomes into daughter cells. These enzymes can also change the degree of interwinding of the two strands of DNA (changing twist and writhe). They change the linking number of DNA and thereby alter DNA topology.

6. Negative supercoiling (a topological term indicating underwound DNA strands) aids biological information processing, for example DNA replication and transcription. Single-stranded RNA molecules are handled by the ribosome, a ribonucleoprotein machine, which converts ‘one-dimensional’ nucleic acid information into ‘three-dimensional’ protein information. The simple topology of single strands (RNA molecules) makes them easier to manipulate biochemically and biophysically.

7. Enzymes that change the topology of DNA are called topoisomerases. The action of one such enzyme involves cutting the DNA strands, passing a DNA segment through the gate formed by the cut, and resealing the strands to close the gate again. One such enzyme in E. coli, DNA gyrase, plays a critical role during DNA replication.
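The bookkeeping behind twist, writhe and linking number can be sketched numerically. For a covalently closed DNA, Lk = Tw + Wr; the relaxed linking number is Lk0 = N / (bp per turn), and the superhelical density is σ = (Lk − Lk0) / Lk0, negative for underwound DNA. This Python sketch assumes 10.5 bp per turn for relaxed B-DNA and uses a hypothetical 5,250 bp plasmid; the numbers are illustrative only.

```python
def relaxed_linking_number(n_bp, bp_per_turn=10.5):
    """Lk0 for relaxed, covalently closed B-DNA (assumed 10.5 bp per turn)."""
    return n_bp / bp_per_turn

def superhelical_density(linking_number, n_bp, bp_per_turn=10.5):
    """sigma = (Lk - Lk0) / Lk0; negative values mean underwound (negatively supercoiled) DNA."""
    lk0 = relaxed_linking_number(n_bp, bp_per_turn)
    return (linking_number - lk0) / lk0

def writhe(linking_number, twist):
    """From Lk = Tw + Wr: whatever change in Lk is not taken up as twist appears as writhe."""
    return linking_number - twist

# Hypothetical 5,250 bp plasmid: Lk0 = 500. Suppose an enzyme such as gyrase lowers Lk to 470.
n_bp, lk = 5250, 470
print(relaxed_linking_number(n_bp))                  # 500.0
print(f"sigma = {superhelical_density(lk, n_bp):.2f}")  # sigma = -0.06
# If the twist stays at its relaxed value, the whole deficit appears as writhe:
print(writhe(lk, 500))                               # -30
```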