Molecular keys to speciation: DNA polymorphism and the control of genetic exchange in enterobacteria

  1. Marin Vulić,
  2. Francisco Dionisio,
  3. François Taddei, and
  4. Miroslav Radman


  1. Laboratoire de Mutagenèse, Institut Jacques Monod, 2 Place Jussieu, 75251 Paris Cedex 05, France
  1. Communicated by M. S. Meselson, Harvard University, Cambridge, MA (received for review May 6, 1997)



Speciation involves the establishment of genetic barriers between closely related organisms. The extent of genetic recombination is a key determinant and a measure of genetic isolation. The results reported here reveal that genetic barriers can be established, eliminated, or modified by manipulating two systems which control genetic recombination, SOS and mismatch repair. The extent of genetic isolation between enterobacteria is a simple mathematical function of DNA sequence divergence. The function does not depend on hybrid DNA stability, but rather on the number of blocks of sequences identical in the two mating partners and sufficiently large to allow the initiation of recombination. Further, there is no obvious discontinuity in the function that could be used to define a level of divergence for distinguishing species.

A species may be defined as a population of organisms capable of sharing their gene pool through mating and genetic recombination. The inability to undergo genetic recombination with each other isolates related species independently of geographic isolation. The structural basis of the barrier to genetic recombination on the molecular level is the difference in their DNA sequences (1, 2).

Intergenomic recombination between the enterobacteria Escherichia coli and Salmonella typhimurium (official designation Salmonella enterica serovar typhimurium) is controlled negatively by mismatch repair (MMR) proteins (principally MutS and MutL) and positively by the induction of the SOS system (principally through the overproduction of the RecA protein) (2). A consequence of the activity of these two systems is the establishment of a potent genetic barrier through a 10⁵-fold reduction in recombination frequency between the two ≈16% divergent genomes, whereas these two systems have no (MMR) or little (SOS) influence on recombination in isogenic crosses.

In yeast, DNA sequence divergence inhibits both intra- and interchromosomal recombination in mitosis and meiosis through the activities of MutS and MutL homologs (3–6), implying mechanisms of genetic barriers similar to those studied in bacteria (see also ref. 7 for effects of mouse MutS homolog on mitotic recombination in mice).

Two main questions concerning the establishment of genetic barriers were left unanswered by previous studies: (i) At what level of sequence divergence do the mismatch repair and SOS systems start to exert their effects; in other words, how diverged must two genomes be for an efficient genetic barrier to be established? (ii) Given that MMR deficiency and strong SOS induction disrupt the genetic barrier (2) and thus “reverse” the process of species separation, is manipulating these two systems in the opposite direction sufficient to speed up the process and create new genetic barriers, mimicking a speciation event? To address these questions we have performed a quantitative analysis relating recombination frequencies in conjugational crosses to genomic sequence divergence between different enterobacterial strains and species under different levels of expression of the key components of the MMR and SOS systems.

Because these two DNA repair systems are influenced by environmental and physiological factors, and because they control both genetic barriers and the mutation rate (8), they may be thought of as molecular links between environmental changes and the creation of genetic diversity, influencing directly the rate of bacterial evolution, including pathogenicity (9) and speciation (2).


Two sets of conjugational crosses were performed: (i) using one donor (Hfr E. coli K-12) and different enterobacterial F⁻ recipients or (ii) using one recipient (F⁻ E. coli K-12) and different enterobacterial Hfr donors with the same origin of transfer. The strains used in one-donor crosses were E. coli K-12 “maria” [isogenic derivative of HfrPK3 (10), rifampicin-resistant, argE::Tn10, malB2::Tn9], E. coli K-12 “m+” [F⁻ derivative of Hfr maria; argE+, malB+, nalidixic acid-resistant (nalR)], nalR derivatives of F⁻ strains of E. coli B, Shigella flexneri 5 BS176 (from the collection of P. J. Sansonetti, Institut Pasteur, Paris), Escherichia fergusonii ATCC 35471 (11), and S. typhimurium SL4213 (12). The donor strains used in one-recipient crosses were rifampicin-resistant derivatives of the following Hfr strains: E. coli K-12 BW113 (13), E. coli C c-1073 (14), Sh. flexneri 256 (15), and S. typhimurium SA965 (16). The recipient strain referred to as wild type was a nalR derivative of F⁻ E. coli K-12 AB1157 (17). Other strains were its derivatives carrying the mutS201::Tn5, lexA1 malB2::Tn9, recAo98 srlC300::Tn10 (2), or lexA3 (18) alleles, or the multicopy plasmids pBA40 and pMQ339, which overproduce wild-type MutS and MutL, respectively (19) (referred to as pmutSL), or pJWL118 (20), which carries the lexA3(Ind⁻) gene (18) (referred to as plexA3).

Logarithmic-phase bacteria (2–4 × 10⁸) were mixed in a 1:1 (Hfr:F⁻) ratio, deposited on a 0.45-μm pore size filter (Schleicher & Schuell), and incubated on prewarmed rich-medium agar. After (i) 45 or (ii) 60 min at 37°C, the conjugants were resuspended in 10⁻² M MgSO₄ and separated by swirling with a Vortex mixer. The exconjugants were plated either on (i) rich-medium agar plates supplemented with chloramphenicol (30 μg/ml) to select for chloramphenicol-resistant recombinants in one-donor crosses or on (ii) M63 medium supplemented with histidine, leucine, proline, threonine, methionine, aspartic and nicotinic acids (100 μg/ml each), thiamin (30 μg/ml), and glucose (0.4%) and lacking arginine, to select for arg⁺ recombinants in one-recipient crosses. Donors were counterselected with 40 μg/ml nalidixic acid. For strains containing plasmids conferring antibiotic resistance, the appropriate antibiotics were included in the medium at concentrations of 100 μg/ml (ampicillin), 30 μg/ml (chloramphenicol), and 12.5 μg/ml (tetracycline). Recombinants were scored after 36 and 48 hr in one-donor and one-recipient crosses, respectively. Recombination frequencies per donor were calculated after subtracting the number of unmated revertants. The log10 values of the medians of at least three independent crosses per mating pair were plotted against overall genomic sequence divergence, expressed as the fraction (percentage) of two parental genomic DNAs that does not hybridize at 60°C under standard conditions (11, 21) [the value used for E. coli C is the mean value of 28 E. coli strains (21)].
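The scoring arithmetic described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' analysis code: the function names are hypothetical, and it assumes that the unmated-revertant count refers to the same plated volume as the recombinant count, so it can be subtracted directly before dividing by the number of donors.

```python
import math
from statistics import median


def frequency_per_donor(recombinants, unmated_revertants, donors):
    """Recombinants per donor after subtracting revertants that arose without mating.

    Assumption (illustrative): both colony counts refer to the same plated volume.
    """
    corrected = max(recombinants - unmated_revertants, 0)
    return corrected / donors


def log10_median_frequency(crosses):
    """log10 of the median per-donor frequency over independent crosses.

    `crosses` is a list of (recombinants, unmated_revertants, donors) tuples,
    one tuple per independent cross of the same mating pair.
    """
    if len(crosses) < 3:
        raise ValueError("at least three independent crosses per mating pair are required")
    freqs = [frequency_per_donor(*cross) for cross in crosses]
    med = median(freqs)
    if med <= 0:
        raise ValueError("median frequency must be positive to take a logarithm")
    return math.log10(med)


if __name__ == "__main__":
    # Hypothetical colony counts for three crosses of one mating pair
    crosses = [(1200, 20, 1_000_000), (1500, 20, 1_000_000), (900, 20, 1_000_000)]
    print(log10_median_frequency(crosses))
```

Using the median rather than the mean keeps a single aberrant cross from dominating the plotted value.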


Conjugational crosses to measure the frequency of genetic recombination were performed between enterobacteria with DNA sequence divergence of up to ≈16%, under different levels of expression of the key components of the MMR and SOS systems. In the absence of adequate knowledge of the genomic sequences of all mating partners, we consider the global genomic sequence divergence as the fraction of two parental genomic DNAs that does not hybridize at 60°C (11, 21). The observed frequencies of recombination fit the log–linear regressions whose parameters are listed in Table 1.

Table 1

Parameters of log–linear regressions: log (frequency of recombination) vs. genomic sequence divergence

The results of crosses with either the same donor or the same recipient show that recombination frequency decreases exponentially with increasing sequence divergence (Fig. 1). The close correspondence of the two classes of crosses suggests a similar recombinational capacity of all tested bacteria.
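An exponential decrease of frequency with divergence is a straight line on a log scale, so the regressions summarized in Table 1 amount to an ordinary least-squares fit of log10(frequency) against divergence. The sketch below shows one way to compute the intercept, slope, and coefficient of determination; the function name is hypothetical and the test data are synthetic points, not the measurements reported here.

```python
import math


def fit_log_linear(divergence, frequency):
    """Ordinary least-squares fit of log10(frequency) = intercept + slope * divergence.

    Returns (intercept, slope, r_squared).
    """
    x = [float(v) for v in divergence]
    y = [math.log10(f) for f in frequency]
    n = len(x)
    if n < 3 or len(y) != n:
        raise ValueError("need at least three paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Coefficient of determination: 1 - residual SS / total SS (in log10 space)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1.0 - ss_res / ss_tot
    return intercept, slope, r_squared


if __name__ == "__main__":
    # Synthetic, illustrative points lying exactly on log10(f) = -2 - 0.05 * x
    xs = [0, 5, 10, 20, 40]
    fs = [10 ** (-2 - 0.05 * v) for v in xs]
    print(fit_log_linear(xs, fs))
```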

Figure 1

Relationships between frequency of recombination and genomic sequence divergence. The global genomic sequence divergence is indicated as the fraction of two parental genomic DNAs that does not hybridize at 60°C. The scale on the upper x-axis is the approximate corresponding percentage of sequence divergence [conversion according to the formula given in ref. 40, and using 1.18% sequence mismatch per 1°C of ΔTm depression (31)]. Shown below are regressions with the corresponding coefficients of determination (r²). Frequencies of recombination in one-donor (E. coli K-12) crosses and one-recipient (E. coli K-12) crosses are represented by ○ and □, respectively.

Inactivation of the MMR system (Fig. 2A, mutS) increases recombination by a factor that grows with sequence divergence, i.e., it makes the slope of the graph shallower without changing the intercept, whereas overproduction of mismatch-binding proteins (pmutSL) makes the slope steeper. When MMR is functional, recombination is reduced by the same factor at every level of divergence examined by the presence of an uncleavable mutant SOS repressor (Fig. 2B, lexA1), and even more so by overproduction of such a repressor (plexA3); both change the intercept but not the slope of the graph. Similarly, overproduction of RecA (recAo98) shifts the curve upward without a major effect on the slope.

Figure 2

Effect of MMR (A) and SOS (B) systems on frequency of recombination in one-recipient crosses. The recipient strain indicated as wild type was AB1157 nalR (○). All other recipients were AB1157 nalR derivatives carrying (A) MutS deficiency (□) or multicopy plasmids that overproduce wild-type MutS and MutL proteins (▵); or (B) a block to induction of the SOS regulon caused by the lexA1 allele, which encodes an uncleavable LexA repressor (□); constitutive overproduction of RecA conferred by the recAo98 allele (◊); or a plasmid carrying the lexA3(Ind⁻) gene (▵). In the case of plasmid-bearing strains, vectors were confirmed to have no relevant phenotypes of their own (data not shown). The log–linear regressions and corresponding coefficients of determination are given in Table 1. The Salmonella point in the MutSL-overproducing background (A, ▵) was omitted from the linear regression analysis (see text).


The experiments described here were designed to examine the nature of the genetic barriers among enterobacteria, in particular to determine the relationships between DNA sequence divergence, extent of genetic recombination, and speciation.

The extent of genetic isolation we observe between enterobacteria is a simple mathematical function of DNA sequence divergence: it increases exponentially with increasing sequence divergence. There is no point on the sequence-divergence axis at which an abrupt drop in gene exchange would delineate a species boundary, because the frequency of genetic recombination varies continuously with sequence divergence. The data obtained from Bacillus species (22) and yeast (41) can be described by the same function, suggesting that it may be a general relationship.

Although sequence divergence is clearly the structural basis for the genetic barrier, the effectiveness of this barrier is under the control of cellular systems, particularly those responsible for the initiation of genetic recombination and those responsible for editing recombinational intermediates (1, 2), SOS and MMR, respectively (Figs. 2 and 3). By changing these parameters it is possible to disrupt or modify the genetic barriers between different species and to create new ones between closely related strains of the same species.

Figure 3

Maximal and minimal frequencies of recombination vs. sequence divergence in one-recipient crosses. The recipient strain indicated as wild type was AB1157 nalR (○). Other recipients were AB1157 nalR derivatives carrying either a block to induction of the SOS regulon caused by the lexA3 allele, which encodes an uncleavable LexA repressor, together with multicopy plasmids that overproduce wild-type MutS and MutL proteins (□), or MutS deficiency together with constitutive overproduction of RecA (▵). The log–linear regressions and corresponding coefficients of determination are given in Table 1.

When mismatch-binding activity is enhanced and SOS induction is blocked (Fig. 3, lexA3 pmutSL), the genetic isolation is extremely sensitive to sequence divergence, creating an efficient genetic barrier between closely related strains such as E. coli K-12 and E. coli C, or E. coli K-12 and Sh. flexneri. This corresponds formally to a speciation event. On the other hand, inactivation of MMR and overproduction of RecA (Fig. 3, mutS recAo98) relaxes the genetic barrier in the range of divergence examined in these experiments, allowing efficient recombination between bacteria as diverged as E. coli and S. typhimurium. For example, the same frequency of recombination, ≈3 × 10⁻³ (Fig. 3), can be found between E. coli K-12, Sh. flexneri, and S. typhimurium donors and E. coli K-12 recipient, depending solely on the genetic background of the recipient.

This shows that the effectiveness of a genetic barrier is not a constant of a given species, raising the question of its regulation.

The MMR and SOS systems are major determinants of mutation rate. Fully operative MMR and noninduced SOS keep the mutation rate low and set an upper limit on the sensitivity of genetic isolation to sequence divergence. Although the overproduction of MutS and MutL can greatly increase this limit (Fig. 3), it does not lower the mutation rate (R. S. Harris, G. Feng, K. J. Ross, R. Sidhu, C. Thulin, S. Longerich, S. K. Szigety, M. E. Winkler, and S. M. Rosenberg, personal communication). Therefore, increased levels of these proteins are not likely to be under positive selective pressure. Inactivation of the MMR components and/or induction of the SOS response confer an increase in mutation rate and the relaxation of intra- and intergenomic recombinational barriers [between similar repetitive sequences (23) in the former case and between entire similar chromosomes (2) in the latter case].

Thus, barriers to recombination are the result of both long-term and immediate effects of these two cellular systems. Their long-term effects determine, through the control of mutation rate, the rate of formation of the barriers' structural basis, DNA sequence polymorphism within an evolutionary lineage. Their immediate effects, on an individual level, control the efficiency of genetic barriers through their effects on recombination.

In yeast, DNA sequence divergence inhibits both intra- and interchromosomal recombination in mitosis and meiosis through the activity of yeast homologs of bacterial MutS and MutL proteins (3–6). In intrachromosomal mitotic recombination between two repeated sequences the relationship between frequency of recombination and sequence divergence is log–linear and follows the rules described here, although in a smaller range of divergence, presumably because of a higher concentration, or efficiency, of MMR proteins (41).

The evolutionary conservation of the key MMR and recombination components encourages the extension of the ideas discussed to the eukaryotic world (see also ref. 7 for similar effects of Msh2 function on gene targeting recombination in mouse cells).

A number of environmental and physiological factors have been found to affect the state of the SOS and MMR systems in bacteria (refs. 24–26 and R. S. Harris, G. Feng, K. J. Ross, R. Sidhu, C. Thulin, S. Longerich, S. K. Szigety, M. E. Winkler, and S. M. Rosenberg, personal communication). Under conditions of metabolic stress, the SOS system is induced, whereas the MMR system is inhibited, which leads to an increase in point mutations, intragenomic plasticity, and horizontal gene transfer, all resulting in rapid genetic diversification (for review see ref. 27). Acquisitions of pathogenicity and chromosomal resistance to antibiotics result from such genetic mechanisms (9, 28–30). When an environmental stress is overcome by an adaptation, then the return to MMR proficiency and repressed SOS maintains genomic stability and separates different genetic lineages (“species”) by restricting gene exchange. Therefore these two systems can be regarded as mechanisms of “evolutionary homeostasis.” Their action, in response to changing environments, contributes to the variability necessary for adaptation, and when the adaptation is reached, they restore genetic stasis. Such control of genetic diversity implies a stepwise mechanism of evolution, because each environmental change of significant amplitude and persistence would be met by a diversification within a given lineage, setting the basis for the speciation event(s).


We thank I. Matić for advice and criticism in the course of this work, R. Wagner, M. Bianchetta, S. Rosenberg, D. D’Ari, and F. Iris for help in the writing of this paper, S. Jinks-Robertson, M. Lipsitch, A. Datta, M. Hendrix, and S. Rosenberg for helpful discussion and for sharing their results prior to publication, and G. Bertani, J. Lawrence, and P. J. Sansonetti for bacterial strains. This work was supported by the Centre National de la Recherche Scientifique (Actions Concertées Coordonnées–Sciences du Vivant) and the French Genome Project (Groupement de Recherches et d’Etudes sur les Genomes). M.V. was in part supported by the Open Society Institute (Supplementary Grant Program), F.D. was supported by the Gulbenkian Foundation and Program PRAXIS XXI. F.T. is on leave of absence from the Ecole Nationale du Génie Rural des Eaux et des Forêts.


DNA Sequence Divergence and Mechanism of Recombination.

The observed recombination–divergence relationship is not a simple reflection of the decrease in stability of hybrid DNA, which is linear (31). Instead, the exponential decrease observed may reflect the number (N) of mismatch-free blocks of a given length (H). This number is given by

$$N \approx (L - H + 1)\,e^{-Hd} \qquad (1)$$

where H is the length of the block of sequence identity in bp and L is the length of potentially recombining DNA sequences of divergence d (see below for the derivation). Assuming that all blocks of length H have equal probability of serving as initiators of recombination, recombinational events between sequences of length L will happen at a frequency equal to hN, where h is the probability of recombination of a single block of length H. [Note that the frequency of recovery of recombinants in a conjugational cross is the probability of completing two presumably independent recombinational events (crossovers). Therefore the actual H parameters are half of the value calculated using the slope of the experimental curves.]
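The halving rule for H can be made concrete with a small calculation. This sketch assumes, as the text suggests, that a recombinant requires two independent crossovers, each occurring with frequency hN, and that d is a per-base mismatch fraction; the values of L and h are hypothetical placeholders, and the function names are illustrative. Under these assumptions log10(frequency) falls with slope −2H/ln 10, so H is recovered by halving the value obtained from the slope.

```python
import math


def mismatch_free_blocks(d, L, H):
    """Eq. 1: expected number of mismatch-free blocks of length H in L bp at per-base divergence d."""
    return (L - H + 1) * math.exp(-H * d)


def cross_frequency(d, L, H, h):
    """Hypothetical two-crossover model: each crossover occurs with frequency h * N,
    and the two are assumed independent, so recovery frequency is (h * N) ** 2."""
    return (h * mismatch_free_blocks(d, L, H)) ** 2


def block_length_from_log10_slope(slope):
    """Invert slope = -2 * H / ln(10), i.e. halve the single-event estimate of H."""
    return -slope * math.log(10) / 2


if __name__ == "__main__":
    L, H, h = 100_000, 62, 1e-6  # hypothetical values chosen for illustration
    d1, d2 = 0.0, 0.05
    slope = (math.log10(cross_frequency(d2, L, H, h))
             - math.log10(cross_frequency(d1, L, H, h))) / (d2 - d1)
    print(block_length_from_log10_slope(slope))  # recovers the H used above
```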

According to Eq. 1, in the log–linear relationship of N vs. divergence a change in the length of the target DNA (L) affects the intercept (Fig. 2B; Table 1), whereas a change in the length of the mismatch-free block (H) affects the slope (Fig. 2A; Table 1) and possibly the intercept when L is comparable to H (Table 1, lexA1 vs. lexA3 pmutSL).
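Taking base-10 logarithms of Eq. 1, N ≈ (L − H + 1)e^{−Hd}, makes the separate roles of L and H explicit. The short derivation below assumes nothing beyond Eq. 1 itself; the subscripts 1 and 2 simply denote two conditions being compared.

```latex
\begin{aligned}
\log_{10} N &= \log_{10}(L - H + 1) - \frac{H}{\ln 10}\, d \\
\log_{10} N(L_2) - \log_{10} N(L_1) &= \log_{10}\frac{L_2 - H + 1}{L_1 - H + 1} \approx \log_{10}\frac{L_2}{L_1} \qquad (L_1, L_2 \gg H) \\
\log_{10} N(H_2) - \log_{10} N(H_1) &= \log_{10}\frac{L - H_2 + 1}{L - H_1 + 1} - \frac{H_2 - H_1}{\ln 10}\, d
\end{aligned}
```

The second line does not depend on d, so a change in L moves the curve up or down in parallel. The third line grows linearly with d, so a change in H alters the slope, and its first term shows why H can also shift the intercept when L is not much larger than H.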

It follows that the concentration of mismatch-binding proteins alters the apparent length of the segment of sequence identity needed for successful recombination (H), and that of RecA alters the total length of DNA available for recombination (L). This could be explained by recalling their respective roles in vivo (1) and their biochemical activities in vitro (32).

The initiation of recombination in bacteria is largely under the control of RecA, the level of which is controlled by the SOS system; i.e., induction of the SOS system increases the intracellular level of RecA. Because DNA is recombinagenic when coated with the RecA protein (33), an increase in RecA concentration could increase the total length of DNA available for recombination, i.e., L in Eq. 1 (Fig. 4A). Although a short segment of sequence identity is necessary for the initiation of recombination (34–36), RecA-catalyzed elongation of heteroduplex is only weakly affected by sequence divergence up to quite high levels of divergence (32, 37).

Figure 4

Increasing sequence identity requirement for recombination with increasing concentration of mismatch-binding proteins. (A) After the transfer into the recipient cell and subsequent replication of the donor DNA, single-stranded tails are produced by the action of the RecBCD enzyme. The RecA protein (shaded ellipses) polymerizes on single-stranded DNA, which becomes the substrate for the homology search and subsequent strand-exchange reactions. L is the length of that nucleoprotein filament available for recombination. (B) Shown are heteroduplexes formed by strand transfer after the pairing of the short mismatch-free sequence (H0) at three different loci (a, b, and c). H0 corresponds to the minimal sequence identity sufficient to initiate recombination or MEPS (minimum efficient processing segment) as defined by Shen and Huang (34). If the concentration of functional mismatch-binding proteins is zero, all such products will give recombinant molecules (a, b, and c). If these proteins are functional, only a certain fraction of such heteroduplexes will yield recombinant molecules, depending on the average number of mismatched bases within the heteroduplex region. The higher the concentration of these proteins, the longer the segment of mismatch-free DNA required to complete the event. As a result, only those heteroduplexes with high sequence identity will give rise to recombinant molecules (b and c with a moderate concentration of MutSL, and only c with a high concentration of MutSL), which is reflected in an apparent increase in the length of H (H*).

Editing of recombinational intermediates is performed by the methyl-directed MMR system, which recognizes mispaired and unpaired bases in the joint heteroduplex regions and aborts recombination events between nonidentical sequences (1, 2, 32).

After the initial strand transfer, requiring sequence identity, extension of the strand exchange process produces mismatches, the substrates for mismatch-binding proteins (MutSL). For a given concentration of MutSL there will be a corresponding value of local sequence divergence (within the heteroduplex region) for which mismatches are rare enough not to be recognized by MutSL before recombinational resolution occurs (Fig. 4B). It follows that the length of the segment of sequence identity needed for successful recombination is the sum of the length of the segment necessary for initiation and of the average distance between two mismatches in the heteroduplex region consisting of two parental strands of such local sequence divergence. Thus, H in Eq. 1 may have at least two components: H0, the length of the segment required for the initiation step, which is constant, and the variable H*, which is a function of the concentration of mismatch-binding proteins capable of preventing strand exchange and is inversely proportional to the highest local sequence divergence that escapes abortion by those proteins (Fig. 4). [A weak influence of SOS induction on parameter H*, presumably through the overproduction of the RuvAB enzymes required for resolution, cannot be excluded (Fig. 2B, recAo98).]

However, when sequence divergence is so great as to saturate MMR (38, 39), N will decrease only as a function of −H0d, the product −H*d becoming a constant. That level of divergence will be inversely proportional to the concentration of MutSL. Indeed, at high concentrations of MutSL (Fig. 2A, pmutSL) and in the range of divergence between Shigella and Salmonella, the frequency of recombination decreases with a slope similar to that in the mutS genetic background.
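The decomposition H = H0 + H* together with MMR saturation can be sketched as a piecewise log-linear model. This is an illustration only: the saturation divergence d_sat and all parameter values are hypothetical, and the model simply freezes the H*·d term at H*·d_sat above saturation so that only −H0·d continues to vary, as described above.

```python
import math


def log_blocks_with_saturation(d, L, H0, H_star, d_sat):
    """Natural log of N under Eq. 1 with H = H0 + H*, where the H* * d term
    stops growing once divergence exceeds a hypothetical MMR-saturation level d_sat."""
    H = H0 + H_star
    return math.log(L - H + 1) - H0 * d - H_star * min(d, d_sat)


def local_slope(f, d_a, d_b):
    """Finite-difference slope of f between two divergence values."""
    return (f(d_b) - f(d_a)) / (d_b - d_a)


if __name__ == "__main__":
    # Hypothetical parameters: H0 = 20 bp, H* = 40 bp, saturation at d = 0.1
    def model(d):
        return log_blocks_with_saturation(d, 10_000, 20, 40, 0.1)

    print(local_slope(model, 0.02, 0.05))  # below saturation: slope -(H0 + H*)
    print(local_slope(model, 0.15, 0.20))  # above saturation: slope -H0 only
```

In this sketch a larger H* (more MutSL) steepens the curve only up to d_sat, after which it runs parallel to the H* = 0 (mutS-like) curve.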

Recombination–Sequence Divergence Relationship.

Let L be the length of DNA in base pairs available for recombination and H be the minimal length of uninterrupted sequence identity required to initiate recombination. Let n be the number of base differences within L, and N the number of possible sites to initiate recombination.

If n = 0, N = N_0 = L − H + 1. If n > 0, the number of sites to initiate recombination is N_0 − E_1 + E_2 − E_3 + E_4 − … + (−1)^{min{H,n}} E_{min{H,n}}, where E_j is the expected number of blocks of length H containing j mismatches, each block being counted once for every combination of j mismatched bases it contains:

$$E_j = (L - H + 1)\binom{H}{j}\frac{\binom{L-j}{n-j}}{\binom{L}{n}},$$

with j ∈ {1, 2, … , min{H,n}} and min{H,n} being the smaller of the two values. Then N is given by

$$N = \sum_{j=0}^{\min\{H,n\}} (-1)^j E_j, \qquad E_0 = N_0 .$$

This equation can be simplified to

$$N = (L - H + 1)\frac{\binom{L-H}{n}}{\binom{L}{n}} \approx (L - H + 1)(1 - d)^H,$$

in which d = n/L is the divergence between the two DNA strands. With n/L sufficiently small, and using the Taylor series for an exponential function, the last equation approximates to Eq. 1:

$$N \approx (L - H + 1)\,e^{-Hd}.$$

This equation describes the experimental curves. The H parameters that give the best fit are estimated at approximately 15, 62, and 212 bp in the mutS, mutS+, and MutSL-overproducing backgrounds, respectively. The relative changes of L with respect to the wild type that fit the observed intercept changes are an approximately 1.5-fold increase for recAo98 and 11- and 30-fold decreases for lexA1 and plexA3, respectively.
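The counting argument can be checked numerically. The sketch below assumes that the n mismatches fall at uniformly random positions within L (an assumption of this illustration, not a claim about real genomes). It computes N by the inclusion–exclusion sum with E_j = (L − H + 1)·C(H, j)·C(L − j, n − j)/C(L, n), compares it with the closed form (L − H + 1)·C(L − H, n)/C(L, n) and with the approximations (L − H + 1)(1 − d)^H and (L − H + 1)e^{−Hd}, and estimates N by direct simulation. All function names and sizes are hypothetical.

```python
import math
import random


def inclusion_exclusion_sum(L, H, n):
    """Integer sum over j of (-1)^j * C(H, j) * C(L - j, n - j), for j = 0..min(H, n)."""
    return sum((-1) ** j * math.comb(H, j) * math.comb(L - j, n - j)
               for j in range(min(H, n) + 1))


def blocks_inclusion_exclusion(L, H, n):
    """N = sum_j (-1)^j E_j with E_j = (L-H+1) C(H,j) C(L-j,n-j) / C(L,n)."""
    return (L - H + 1) * inclusion_exclusion_sum(L, H, n) / math.comb(L, n)


def blocks_closed_form(L, H, n):
    """Exact expectation: (L-H+1) * C(L-H, n) / C(L, n)."""
    return (L - H + 1) * math.comb(L - H, n) / math.comb(L, n)


def blocks_power(L, H, d):
    """Approximation (L-H+1) * (1-d)^H."""
    return (L - H + 1) * (1 - d) ** H


def blocks_exponential(L, H, d):
    """Eq. 1: (L-H+1) * exp(-H d)."""
    return (L - H + 1) * math.exp(-H * d)


def count_identity_blocks(L, H, mismatches):
    """Number of windows of length H (start positions 0..L-H) containing no mismatch."""
    total = 0
    previous = -1
    for position in sorted(mismatches) + [L]:  # L acts as a sentinel after the last base
        run = position - previous - 1            # identical bases between two mismatches
        total += max(0, run - H + 1)
        previous = position
    return total


def simulate_mean_blocks(L, H, n, trials, seed=0):
    """Monte Carlo estimate with n mismatches placed uniformly at random in L bases."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        total += count_identity_blocks(L, H, rng.sample(range(L), n))
    return total / trials


if __name__ == "__main__":
    L, H, n = 2000, 50, 20  # hypothetical sizes; d = n / L = 0.01
    d = n / L
    print(blocks_inclusion_exclusion(L, H, n), blocks_closed_form(L, H, n))
    print(blocks_power(L, H, d), blocks_exponential(L, H, d))
    print(simulate_mean_blocks(L, H, n, trials=5000))
```

The alternating sum collapses exactly to the closed form, and for small d both approximations stay close to it, which is the regime in which Eq. 1 is used.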





