Different structures were detected between the Brazilian and US genetic bases
Principal component analysis (PCA) revealed that most Brazilian cultivars (red circle) were grouped with a subgroup of US cultivars (green circle). Most of them belonged to MG VI, VII, VIII and IX (Fig.1A). Based on the Evanno criterion (Fig.1B), the structure results based on four groups (K=4) showed a high Delta K value (312.35), but the uppermost level of structure was in two groups (K=2; Delta K=1885.43).
Population structure analysis between Brazilian and US germplasms. (A) Principal component analysis of Brazilian and US soybean cultivars based on SNP markers; (B) Delta K as a function of the number of groups (K); (C) assignment coefficients of individual cultivars (bar plots) considering K=2; and (D) considering K=4.
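The Delta K statistic behind the Evanno criterion can be illustrated with a short sketch. It assumes several replicate STRUCTURE runs per K and computes Delta K from the mean log-likelihood across runs, divided by the standard deviation at that K. The function name and the log-likelihood values are hypothetical and are not taken from the study.

```python
from statistics import mean, stdev

def delta_k(lnp_runs):
    """Evanno-style Delta K from replicate log-likelihoods.

    lnp_runs maps K -> list of Ln P(D) values, one per replicate run.
    Delta K is only defined where both K-1 and K+1 were also run.
    """
    ks = sorted(lnp_runs)
    avg = {k: mean(lnp_runs[k]) for k in ks}
    result = {}
    for k in ks:
        if k - 1 in avg and k + 1 in avg:
            # Absolute second-order difference of the mean log-likelihood
            second_diff = abs(avg[k + 1] - 2 * avg[k] + avg[k - 1])
            result[k] = second_diff / stdev(lnp_runs[k])
    return result

# Made-up values for illustration only
runs = {
    1: [-1000.0, -1002.0, -998.0],
    2: [-800.0, -801.0, -799.0],
    3: [-780.0, -784.0, -776.0],
    4: [-770.0, -775.0, -765.0],
}
dk = delta_k(runs)
best_k = max(dk, key=dk.get)
```

With these toy numbers the largest Delta K falls at K=2, which mirrors how the uppermost level of structure is read from such a plot.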
Considering K=2 (Fig.1C), the Brazilian cultivars jointly presented an assignment to the Q1 group (green) of 86.7%, which was much higher than that observed for the US cultivars (43.9%). Considering K=4 (Fig.1D), the Brazilian cultivars jointly presented an assignment to the Q2 group (red) of only 4.7%, while the US cultivars jointly presented an assignment to the Q2 group of 27.4%. The Q1 group (green) had a lower assignment in Brazilian cultivars than in US accessions (11.1% and 30.1%, respectively). These results demonstrate that the set of Brazilian cultivars has a narrower genetic base than the US cultivars.
When we compared the cultivars between maturity groups, we observed a clear differentiation between early and late groups. The highest genetic distance (0.4158) was observed between MG 000 and MG VIII-IX cultivars (Supplementary Table S1).
To examine the influence of maturity groups on population structure, we analyzed the average assignment coefficients (K=4) of Brazilian and US cultivars for each maturity group (Supplementary Figure S1). Brazilian cultivars from maturity group V presented Q1, Q2, Q3, and Q4 equal to 30.4%, 1.9%, 32.1%, and 32.0%, respectively; US cultivars from this same maturity group (V) presented means of Q1, Q2, Q3, and Q4 equal to 9.2%, 8.2%, 65.1%, and 17.6%, respectively. This result indicates that, although they belong to the same maturity group, the Brazilian group V cultivars present considerably different allelic frequencies from the US group V cultivars, especially for Q3 and Q4. US cultivars belonging to earlier maturity groups (00, 0, I, and II) had a significantly higher mean assignment coefficient to the Q2 group (red) than the later maturity groups (V=8.2%, VI=8.1%, VIII=5.0%, and IX=13.6%). In the case of Brazilian cultivars, the average assignment coefficients for Q2 were much lower (V=1.9%, VI=4.2%, VII=5.6%, VIII=4.9% and IX=4.9%). These results demonstrate that Q2 contains an important allelic pool that distinguishes early from late genetic materials.
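Averaging assignment coefficients per origin and maturity group, as described above, amounts to a grouped mean over the per-cultivar Q vectors. The following is a minimal sketch under that assumption; the record layout, function name, and values are illustrative and do not come from the study's data.

```python
from collections import defaultdict

def mean_assignment(records):
    """Mean Q vector per (origin, maturity group).

    records: iterable of (origin, maturity_group, [q1, q2, q3, q4]),
    where each Q vector sums to 1 for one cultivar.
    """
    sums = {}
    counts = defaultdict(int)
    for origin, mg, q in records:
        key = (origin, mg)
        if key not in sums:
            sums[key] = [0.0] * len(q)
        for i, value in enumerate(q):
            sums[key][i] += value
        counts[key] += 1
    return {key: [s / counts[key] for s in total] for key, total in sums.items()}

# Hypothetical cultivars, for illustration only
records = [
    ("BR", "V", [0.30, 0.02, 0.32, 0.36]),
    ("BR", "V", [0.50, 0.00, 0.30, 0.20]),
    ("US", "V", [0.10, 0.08, 0.65, 0.17]),
]
means = mean_assignment(records)
```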
In general, the Brazilian germplasm showed few differences between maturity groups (Supplementary Table S1 and Fig.2A). This was also observed when we generated a population structure analysis exclusively with these cultivars (Fig.2C). In contrast, the US germplasm showed a high variation of genetic distance when we analyzed its maturity groups (Supplementary Table S1), with a clear clustering of cultivars (Fig.2B) that is more obvious in its exclusive population structure analysis (Fig.2D). The results show that early cultivars tend to be genetically distant from late cultivars in the US. The maturity groups from the southern breeding program of the US (V, VI, VII, VIII, and IX) tend to be less genetically divergent than the northern groups (00, 0, I, II, III, and IV). This agrees with previous studies indicating distinct Northern and Southern genetic pools in the US [6]. There is a low divergence among US soybean cultivars from maturity groups higher than V (Fig.2B). In contrast, cultivars from MG 00 and 0 were more genetically distant from cultivars of MG III and IV, while maturity groups I-II formed an intermediate group. The population structure analysis showed a high influence of Q2 in cultivars with MG 00-II. For cultivars in MG III and IV, we observed an increase of Q1. Finally, there is a high influence of Q3 in cultivars with maturity groups higher than V, which agrees with the genetic distance data.
Population structure analysis of Brazilian and US cultivars according to their maturity groups. Principal component analysis (PCA) within Brazilian (A) and US (B) germplasms for each maturity group; population structure of the Brazilian (C) and the US (D) genetic bases arranged according to their maturity groups.
The results demonstrate that both genetic bases showed little increase in genetic distance among modern genetic materials (released after 2000) when compared to cultivars from the 1950s to 1970s (Supplementary Table S2). According to the mean IBS genetic distance, the Brazilian genetic base was more diverse over the decades than the US germplasm, especially when cultivars released before the 1970s were compared with those released after the 2000s (Supplementary Table S2).
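An identity-by-state (IBS) distance of the kind summarized above can be sketched as follows. This assumes biallelic SNP genotypes coded as alternative-allele counts (0, 1, 2) and defines the distance as one minus the mean proportion of alleles shared; the exact software and coding used in the study are not stated here, so treat the function names and data as illustrative.

```python
from itertools import combinations

def ibs_distance(a, b):
    """1 - IBS similarity between two genotype vectors (0/1/2, None = missing)."""
    diffs = [abs(x - y) / 2 for x, y in zip(a, b) if x is not None and y is not None]
    if not diffs:
        raise ValueError("no loci genotyped in both accessions")
    return sum(diffs) / len(diffs)

def mean_pairwise_distance(group):
    """Average IBS distance over all pairs of accessions in a group."""
    pairs = list(combinations(group, 2))
    return sum(ibs_distance(a, b) for a, b in pairs) / len(pairs)

# Hypothetical genotypes
d = ibs_distance([0, 2, 1, 2, None], [0, 0, 1, 1, 2])
group_mean = mean_pairwise_distance([[0, 0], [2, 2], [0, 2]])
```

Comparing `mean_pairwise_distance` between cultivars grouped by decade of release is one way to obtain per-decade means like those reported in a distance table.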
Average assignment coefficients (Q1, Q2, Q3, and Q4) from genetic structure results were calculated for both germplasm pools. All accessions were sorted according to their origin and decade of release (Fig.3). We observed high genomic modifications over the decades in the Brazilian germplasm. Modern genetic materials (2000-2010) had Q1, Q2, Q3, and Q4 values of 36.8%, 2.3%, 31.7%, and 26.0%, respectively, while old accessions (1950-1960s) had means of Q1, Q2, Q3, and Q4 equal to 1.6%, 6.6%, 7.0%, and 84.7%, respectively. A sharp decrease was observed for Q4 starting in the 1990s, whereas Q1 and Q3 increased strongly during the same period. For the US genetic base, we observed an increase of Q3 and a decrease of Q2 over time. Old cultivars (1950-1970) had Q1, Q2, Q3, and Q4 values of 36.0%, 33.7%, 12.3%, and 18.1%, respectively, while modern cultivars (2000-2010) had Q1, Q2, Q3, and Q4 of 24.3%, 17.5%, 40.3%, and 17.8%, respectively.
Mean assignment coefficients of the Brazilian and US cultivars belonging to the different decades of release (1950 to 2010) to STRUCTURE groups (Q1, Q2, Q3, and Q4) considering K=4.
Modification during the 1990s became more evident upon analysis of the PCA and genetic structure results of the Brazilian genetic base considering the decades of release (Fig.4A and C). We observed an increase in the influence of Q2 in modern genetic materials (2000-2010) when we compared the results to old genetic materials (1950-1970). In contrast, the US genetic base showed few variations over time according to the average genetic distance (Supplementary Table S2), PCA, and the exclusive population structure analysis (Fig.4B and D). These results suggest a large influence of new alleles in the Brazilian germplasm after the 1990s.
Population structure of Brazilian and US cultivars according to their decade of release. Principal component analysis (PCA) within Brazilian (A) and US (B) germplasm for each decade; population structure of the Brazilian (C) and the US (D) genetic bases arranged according to their decade of release.
Seventy-two SNPs with FST values higher than 0.4 between Brazilian and US cultivars were identified (Supplementary Table S3). These SNPs are located on chromosomes 1, 4, 6, 7, 9, 10, 12, 16, 18, and 19 (Supplementary Figure S2). Twenty-six 100-Kbp genomic regions with a high degree of diversification between Brazilian and US genetic bases were also found (Table 1). The results for Tajima's D showed that these regions had balancing events that maintained the diversity of their bases. Two regions on chromosome 6 (47.3-47.4 Mbp and 47.3-47.4 Mbp) and another on chromosome 16 (31.10-31.20 Mbp) had few variations in Brazilian accessions (Supplementary Table S4). In contrast, the allele variance for most of the SNPs present in these genomic regions was higher in the US germplasm than in the Brazilian germplasm. The opposite scenario was observed for three other regions located on chromosomes 7 (6.30-6.40 Mbp), 16 (30.70-30.80 Mbp), and 19 (3.00-3.10 Mbp) (Supplementary Table S4). The allele variance was higher in the Brazilian genetic base than in the US germplasm for these three intervals.
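To make the FST threshold concrete, here is a minimal sketch of a per-SNP FST between two groups, followed by a filter for values above 0.4. It uses the simple heterozygosity-based definition, (HT - HS) / HT, weighted by sample size, which is an assumption made for illustration: it is not the Weir and Cockerham estimator and may not match the estimator used in the study. SNP names and genotypes are hypothetical.

```python
def alt_freq(genotypes):
    """Alternative-allele frequency and number of called genotypes (0/1/2 coding)."""
    called = [g for g in genotypes if g is not None]
    return sum(called) / (2 * len(called)), len(called)

def fst(pop1, pop2):
    """Heterozygosity-based FST for one biallelic SNP in two populations."""
    p1, n1 = alt_freq(pop1)
    p2, n2 = alt_freq(pop2)
    p_bar = (n1 * p1 + n2 * p2) / (n1 + n2)
    h_total = 2 * p_bar * (1 - p_bar)
    if h_total == 0:
        return 0.0  # monomorphic in the pooled sample
    h_within = (n1 * 2 * p1 * (1 - p1) + n2 * 2 * p2 * (1 - p2)) / (n1 + n2)
    return (h_total - h_within) / h_total

# Hypothetical SNPs: (Brazilian genotypes, US genotypes)
snps = {
    "snp_a": ([2, 2, 2, 2], [0, 0, 0, 0]),  # fixed for different alleles
    "snp_b": ([1, 1, 1, 1], [0, 0, 0, 0]),
    "snp_c": ([1, 0, 1, 0], [0, 1, 0, 1]),  # same frequency in both groups
}
high_fst = [name for name, (br, us) in snps.items() if fst(br, us) > 0.4]
```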
Six SNPs located close to maturity loci E1 (Chr06: 20,207,077 to 20,207,940 bp) [14], E2 (Chr10: 45,294,735 to 45,316,121 bp) [15], and FT2a (Chr16: 31,109,999 to 31,114,963 bp) [16] had a large influence on the differentiation of the Brazilian and US genetic bases (Fig.5). For the SNPs ss715607350 (Chr10: 44,224,500), ss715607351 (Chr10: 44,231,253), and ss715624321 (Chr16: 30,708,368), we found that the alternative allele was barely present in US germplasm, whereas the Brazilian genetic base had an equal distribution between reference and alternative alleles. For the SNPs ss715624371 (Chr16: 31,134,540) and ss715624379 (Chr16: 31,181,902), the frequency of the alternative allele also remained low in the US germplasm. However, in contrast to the previous three SNPs, the alternative alleles of these two SNPs were present in more than 78% of the Brazilian accessions. Finally, the alternative alleles for SNPs ss715593836 (Chr06: 20,019,602) and ss715593843 (Chr06: 20,353,073) were extremely rare in Brazilian germplasm, with only 2% of the accessions carrying them. In contrast, the US germplasm had an equal distribution of reference and alternative alleles in its accessions. However, all accessions with the alternative alleles belonged to MGs lower than VI, with fewer than five cultivars in MG V.
The allele frequency distribution for SNPs close to loci (A) E1 (chromosome 6), (B) E2 (chromosome 10), and (C) FT2a (chromosome 16) in Brazilian and US germplasms.
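The allele distributions compared here can be summarized per germplasm with two simple quantities: the alternative-allele frequency and the fraction of accessions carrying at least one alternative allele. The sketch below assumes genotypes coded as alternative-allele counts (0, 1, 2); the function name and the example genotypes are hypothetical.

```python
def allele_summary(genotypes):
    """Return (alt-allele frequency, fraction of accessions carrying the alt allele).

    genotypes: alt-allele counts (0, 1, 2) per accession; None marks missing calls.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called genotypes")
    alt_freq = sum(called) / (2 * len(called))
    carriers = sum(1 for g in called if g > 0) / len(called)
    return alt_freq, carriers

# Hypothetical genotypes at one SNP in two groups of accessions
group_one = allele_summary([2, 2, 2, 0, None])
group_two = allele_summary([0, 0, 1, 0])
```

For inbred cultivars, which are mostly homozygous, the two quantities are usually close, so a statement such as "present in more than 78% of accessions" maps fairly directly onto allele frequency.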
Ten SNPs related to gene-modifying mutations present in Brazilian and US germplasm were identified; these were distributed on chromosomes 4, 6, 10, 12, 16, and 19 (Supplementary Table S5). These SNPs had differing allele frequencies and could distinguish both genetic bases. Six modifications had a clear influence on the maturity of the accessions, whereas two of these had a large influence in some decades of breeding (Supplementary Figure S3). The SNP ss715593833 had a haplotype similar to that of two SNPs described as close to the E1 locus (ss715593836 and ss715593843) due to the linkage disequilibrium (LD) among them. At the end of this chromosome, we also observed another three relevant SNPs in LD: ss715594746, ss715594787, and ss715594990. In the US germplasm, we observed a decrease in the alternative allele in accessions with MG values lower than IV. We detected other relevant modifications on chromosome 12 for SNPs ss715613204 and ss715613207. Both SNPs had a minor allele frequency higher than 0.35 in Brazilian germplasm, with an increase in the alternative allele in cultivars with MGs higher than VII. In contrast, alternative alleles for both SNPs were extremely rare in the US germplasm except for accessions with MG higher than VII.
There were 312 genomic regions that differentiate northern (MG 00-IV) and southern (MG V-IX) cultivar groups (Supplementary Table S6), including the Dt1 locus. We compared the SNPs observed in the genomic region close to the Dt1 gene (Chr19: 45.20-45.30 Mbp) with the growth habit phenotype data available for 284 lines at the USDA website (www.ars-grin.gov). The phenotypic data suggest that these SNPs are associated with growth habit. Moreover, our diversity analysis demonstrated a putative selective sweep for the Dt1 gene in the northern germplasm, in which the dominant Dt1 allele is fixed; the southern lines tend to be more diverse than the northern US cultivars (Supplementary Table S7). In contrast, other genomic regions have lower nucleotide diversity in southern accessions than in northern accessions. An important disease resistance gene cluster was observed on chromosome 13 bearing four loci: Rsv1, Rpv1, Rpg1, and Rps3 [17,18,19,20]. In this interval, we observed two genomic regions (29.70-29.80 Mbp and 31.90-32.00 Mbp) under putative selective sweeps in the southern germplasm (Supplementary Table S8).
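Comparing nucleotide diversity between groups over fixed genomic intervals can be sketched as a windowed sum of per-SNP diversity. The example assumes biallelic SNPs on a single chromosome, 100-kbp windows, and the usual n/(n-1) small-sample correction; names and values are hypothetical. With array-based SNP data the absolute values depend on marker choice, so they are best read as relative comparisons between groups typed with the same markers.

```python
from collections import defaultdict

WINDOW = 100_000  # window size in bp (assumed to match 100-kbp regions)

def site_pi(alt_count, n_chrom):
    """Per-site diversity: 2p(1-p) with the n/(n-1) small-sample correction."""
    p = alt_count / n_chrom
    return 2 * p * (1 - p) * n_chrom / (n_chrom - 1)

def windowed_pi(snps, window=WINDOW):
    """snps: iterable of (position_bp, alt_count, n_chrom) on one chromosome.

    Returns {window_start_bp: diversity per bp} for windows containing SNPs.
    """
    totals = defaultdict(float)
    for pos, alt_count, n_chrom in snps:
        totals[(pos // window) * window] += site_pi(alt_count, n_chrom)
    return {start: total / window for start, total in sorted(totals.items())}

# Hypothetical SNPs typed in 10 sampled chromosomes
snps = [(45_210_000, 5, 10), (45_250_000, 0, 10), (45_310_000, 1, 10)]
pi_by_window = windowed_pi(snps)
```

Running the same function separately on two cultivar groups and comparing values window by window is one way to flag intervals where a group has lost diversity.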
Besides these regions, 1,401 SNPs with FST values higher than 0.40 between northern and southern US cultivars were also identified (Supplementary Table S9). In addition, there were 23 SNPs with FST values higher than 0.70 spread across chromosomes 1, 3, 6, and 19. Seven of them were located close to another important soybean locus, E1, which is involved in soybean maturity control (Supplementary Table S10). These SNPs clearly differentiate northern and southern US cultivars, with the reference allele fixed in northern genetic materials and the alternative alleles in southern accessions. Gene modification in US germplasm was also detected in our study: the FST analysis identified 126 SNPs modifying 125 genes (Supplementary Table S11).
Finally, we detected 1,557 SNPs with FST values higher than 0.40 between super-early cultivars (MG 00-0) and early cultivars (MG III-IV) (Supplementary Table S12). Seventeen SNPs had FST values higher than 0.70, spread across chromosomes 4, 7, 8, and 10. The SNPs identified on chromosome 10 were close to the E2 locus. We also detected 168 SNPs associated with modifications in 164 genes (Supplementary Table S13).
We observed two SNPs with large differences in allelic frequencies in the Brazilian germplasm (Supplementary Figure S4). On chromosome 4, SNP ss715588874 (50,545,890 bp) had a decrease of allele A in cultivars released after 2000, with only nine of the 45 Brazilian cultivars carrying this allele. A similar situation was observed on chromosome 19 for ss715633722 (3,180,152 bp), with half of the modern accessions carrying allele C. Both SNPs had a similar distribution across decades in the US genetic base, with a large influence of reference alleles.
There were 126 genomic regions spread on almost all soybean chromosomes in Brazilian cultivars. The only exception was chromosome 20 (Supplementary Table S14). Our analysis between cultivars released before and after 1996 identified 30 putative regions under breeding sweep events. Thirteen regions had a decrease in diversity in modern cultivars according to the Tajima's D and nucleotide diversity (π) results. Two genomic regions observed were close to important disease resistance loci: one on chromosome 13 (30.30-30.40 Mbp) close to the resistance gene cluster (with Rsv1, Rpv1, Rpg1, and Rps3) [17,18,19,20] and another on chromosome 14 (1.70-1.80 Mbp) with a southern stem canker resistance locus [21,22]. In contrast, thirty-one genomic regions had an increase in diversity in modern cultivars, which suggested putative introgression events in these accessions. Two of these genomic regions, on chromosomes 2 (40.90-40.10 Mbp) and 9 (40.30-40.40 Mbp), were previously reported to have an association with ureide content and iron nutrient content, respectively [23,24].
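Tajima's D, used above to judge changes in diversity, contrasts two estimates of diversity: mean pairwise differences and the number of segregating sites scaled by sample size. The sketch below follows the standard formula and assumes biallelic SNPs within one region and a known number of sampled chromosomes; the function name and inputs are illustrative only.

```python
from math import sqrt

def tajimas_d(alt_counts, n):
    """Tajima's D for one region.

    alt_counts: alternative-allele count at each SNP in the region.
    n: number of sampled chromosomes (must be at least 4 here).
    Returns None when there are no segregating sites or n is too small.
    """
    seg = [c for c in alt_counts if 0 < c < n]  # keep segregating sites only
    s = len(seg)
    if s == 0 or n < 4:
        return None
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    # Mean number of pairwise differences summed over sites
    pi = sum(2 * c * (n - c) / (n * (n - 1)) for c in seg)
    theta_w = s / a1  # Watterson's estimator
    return (pi - theta_w) / sqrt(e1 * s + e2 * s * (s - 1))

# Hypothetical regions sampled in 10 chromosomes
d_rare = tajimas_d([1, 1, 1, 1], 10)    # excess of rare alleles
d_common = tajimas_d([5, 5, 5, 5], 10)  # excess of intermediate-frequency alleles
```

Negative values, as in the first toy region, are the pattern commonly associated with sweeps or expansion, while positive values are commonly associated with balancing selection or population structure.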
Besides these regions, there were also 409 SNPs with FST values higher than 0.40, distributed across all soybean chromosomes. There were 73 SNPs with FST values higher than 0.70 (Supplementary Table S15). Some of these SNPs were also reported to be associated with important soybean traits such as plant height, seed mass, water use efficiency, nutrient content, and ureide content [23,24,25,26,27].
We also identified gene modifications with a high impact on the Brazilian genetic base when we compared cultivars according to their decade of release. Of the 409 SNPs identified in FST analysis, we observed 40 SNPs causing modifications in 39 soybean genes (Supplementary Table S16). Three SNPs with FST values higher than 0.70 were associated with non-synonymous modifications: ss715588896 (Glyma.04G239600, a snoaL-like polyketide cyclase), ss715607653 (Glyma.10g051900, a gene with a methyltransferase domain), and ss715632020 (Glyma.18G256700, a PQQ enzyme repeat).