通过全基因组测序数据进行脊髓性肌萎缩症诊断和携带者筛查

Xiao Chen and Michael A Eberle

Introduction

Spinal muscular atrophy (SMA), an autosomal recessive neuromuscular disorder characterized by loss of alpha motor neurons, causes severe muscle weakness and atrophy presenting at or shortly after birth¹. SMA is the leading genetic cause of infant death after cystic fibrosis. The incidence of SMA is 1 in 6000-10,000 live births, and the carrier frequency is 1:40-80 among different ethnic groups^2–4. Early detection of SMA can be critical for long term quality of life due to the availability of two early treatments, Nusinersen⁵ and Zolgensma⁶, which have received FDA approval for the amelioration of SMA symptoms.

The disease-causing gene, SMN1, together with its paralog SMN2, reside in a ~2Mb region on chromosome 5 with a large number of complicated segmental and inverted segmental duplications. SMN2 was created by an ancestral gene duplication that is unique to the human lineage⁷. The genomic region around SMN1 and SMN2 is subject to unequal crossing-over and gene conversion, resulting in variable copy numbers (CNs) of SMN1 and SMN2. Importantly, SMN2 has >99.9% sequence identity to SMN1 and one of the base differences, c.840C>T in exon 7, has a critical functional consequence. By interrupting a splicing enhancer, c.840T promotes skipping of exon 7, resulting in the vast majority of SMN2-derived transcripts being unstable and not fully functional⁸. Approximately 95% of SMA cases result from biallelic absence of the functional c.840C nucleotide caused by either a deletion of SMN1 or gene conversion to SMN2 (c.840T)⁹. In the remaining 5% of SMA cases, patients also have other pathogenic variants in SMN1¹⁰. SMN2 can produce a small amount of functional protein, and the number of SMN2 copies in an individual is inversely correlated with the disease severity¹¹.

Due to the high incidence rate and disease severity, population-wide SMA screening is recommended by the American College of Medical Genetics. The key to screening for SMA is: 1) determining the copy number of SMN1 for SMA diagnosis and carrier testing and 2) determining the copy number of SMN2 for clinical classification and prognosis. Traditionally, SMA testing and carrier testing are done with polymerase chain reaction (PCR) based assays, such as quantitative PCR (qPCR), multiplex ligation-dependent probe amplification (MLPA) and digital PCR. These methods primarily determine the copy number of SMN1 based on the c.840C>T site that differs between SMN1 and SMN2. Enabling SMA testing with WGS would greatly benefit precision medicine initiatives, but it faces challenges including the almost perfect sequence identity between SMN1 and SMN2, and frequent gene conversion between the two genes leading to hybrid genes. These challenges demand an informatics method specifically designed to overcome the difficulties of this region.

To solve this problem, we developed SMNCopyNumberCaller¹², a novel method that detects the CN of both SMN1 and SMN2 based on WGS data. This method was developed using population data from the 1000 Genomes Project¹³ (1kGP). In our publication of this method, we characterized SMN1 and SMN2 in 12,747 genomes, identified 1568 samples with SMN1 gains or losses and 6615 samples with SMN2 gains or losses, and calculated a pan-ethnic carrier frequency of 2%, consistent with previous studies. Additionally, 99.8% of our SMN1 and 99.7% of SMN2 CN calls agreed with orthogonal methods, with a recall of 100% for SMA and 97.8% for carriers, and a precision of 100% for both SMA and carriers. This caller enables SMA testing to be offered as a comprehensive test in neonatal care and an accurate carrier screening tool in WGS sequencing projects.

Here, we describe how, by using multi-ethnic samples, we were able to identify higher genetic variability in African populations and exclude variable sites that cannot reliably differentiate SMN1 from SMN2, optimizing accuracy across all populations. This highlights the importance of using ethnically diverse populations when developing novel informatic methods. Additionally, we describe and present a visualization tool that produces static images which allow users to review the evidence supporting the copy number calls made in these genes. This information will be an essential tool for clinical labs that wish to implement WGS-based SMA calling.

SMNCopyNumberCaller

To address the challenge of high sequence similarity throughout the two genes, SMNCopyNumberCaller first calculates the summed copy number of SMN1 and SMN2, collectively termed SMN, by analyzing the sequencing coverage of reads aligned to either gene. Since there exists a truncated form of SMN2, termed SMN2∆7-8, with exons 7-8 deleted, we calculate the copy number of intact and truncated SMN forms by dividing the genes into two regions: the 22.2kb region that includes exons 1-6 and the 6.3kb region that includes exons 7-8. The CN calculated from the exons 7-8 region provides the number of intact SMN genes. Samples with SMN2∆7-8 have a higher CN of the exons 1-6 region compared to the CN of the exons 7-8 region, and this difference represents the CN of SMN2∆7-8 (Figure 1).

Copy numbers of total and intact SMN (SMN1+SMN2) are calculated from read depth — Figure 1. Copy numbers of total and intact *SMN* (*SMN1* + *SMN2*) are calculated from read depth

After calculating the summed copy number, we differentiate SMN1 from SMN2 using supporting read counts at base differences between SMN1 and SMN2. The individual CN of SMN1 (SMN2) at each site is calculated by considering the summed SMN CN and the fraction of SMN1 (SMN2) supporting reads out of all SMN1+SMN2 supporting reads. During the development of the caller, we called the CNs of SMN1 and SMN2 at the 16 base difference sites between them, extracted from the reference genome, in 1kGP samples, and determined if the CN calls for each position were concordant with the CN calls at the c.840C>T splice variant site. There was a notable difference between the concordance of calls in the African and non-African populations (Figure 2). Excluding the African samples, there were 13 sites that had high (>85%) CN concordance with the splice variant site. Conversely, for the African samples there were only seven sites that had high CN concordance with the splice variant site, and the concordance values were lower at all sites than in non-African populations. This is consistent with within-gene variation at many of these positions in the African population. This analysis highlights the importance of using ethnically diverse populations when developing novel informatic methods to resolve difficult clinically relevant regions of the genome.

We selected eight SMN1/2 base differences, including the splice variant site and the seven positions that were highly concordant with the splice variant site in both African and non-African populations. By selecting just these sites, this caller should perform consistently, independent of ethnicity. SMNCopyNumberCaller makes SMN1 and SMN2 CN calls based on the consensus of CN calls at the eight selected sites.

Figure 2. Multi-ethnic analysis shows difference between populations

Visualizing the caller result

An important component of variant calling in clinical settings is the need to review the supporting evidence when signing out a clinical report. Since the release of SMNCopyNumberCaller, we have developed a visualization tool to produce static images that represent the data and QC the calls (Figure 3). The summed CNs of total (exons 1-6, Figure 3A) and intact (exons 7-8, Figure 3B) SMN (SMN1+SMN2) are plotted against the population distribution. The difference between the total and intact SMN CN represents the CN of SMN2∆7-8. Individual CNs of SMN1 and SMN2 are calculated based on the summed intact CN and the supporting read counts at eight base differences between SMN1 and SMN2 (#7-8 and #10-15, Figure 3C). Figure 3D provides a rough estimate of the SMN1 and SMN2 copy numbers based purely on read counts at the differentiating sites.

Figure 3. Visualization of SMNCopyNumberCaller result, using HG03458 as an example

Performance validation

To demonstrate the accuracy of this method, we compared CN calls using digital PCR and MLPA with our WGS-based calls and showed a concordance of 99.8% for SMN1 and 99.7% for SMN2, with a recall of 100% for SMA and 97.8% for carriers, and a precision of 100% for both SMA and carriers (Table 1).

Table 1. Validation against samples with known SMN1/SMN2 copy numbers.

	CN by orthogonal method	Total	Concordant	Discordant	Agreement
SMN1	0	64	64	0	100.0%
	1	45	44	1	97.8%
	2	897	897	0	100.0%
	3	174	174	0	100.0%
	4	43	43	0	100.0%
	6	1	0	1	0.0%
	Total	1224	1222	2	99.8%
SMN2	0	117	117	0	100.0%
	1	486	465	1	99.8%
	2	541	539	2	99.6%
	3	60	60	0	100.0%
	4	9	8	1	88.9%
	Total	1193	1189	4	99.7%
SMN2∆7-8	0	1089	1089	0	100.0%
	1	80	80	0	100.0%
	2	4	4	0	100.0%
	Total	1173	1173	0	100.0%

Copy number of SMN1, SMN2 and SMN2∆7-8 by population

We applied SMNCopyNumberCaller to 2504 unrelated samples from the 1000 Genomes Project (1kGP) and 10,243 unrelated samples from the NIHR BioResource Project¹⁴ to report on the population distributions of SMN1 and SMN2 copy numbers (Figure 4). The carrier frequencies for SMA (samples with one copy of SMN1) using this method agreed with those reported by previous PCR-based studies^2,4. Particularly, the variability of SMN1 copy number is much lower than that of SMN2 copy number in most populations, and Africans have a much higher SMN1 copy number than other populations.

Figure 4. Distribution of SMN1/SMN2/SMN2Δ7–8 copy numbers in diverse populations. — Figure 4. Distribution of *SMN1/SMN2/SMN2*Δ7–8 copy numbers in diverse populations.

Summary

Our SMNCopyNumberCaller can be used to identify both carrier and affected status of SMA, enabling SMA testing to be offered as a comprehensive test in neonatal care and an accurate carrier screening tool in WGS sequencing projects. While there exist difficult regions in the genome where normal WGS pipelines do not deliver variant calls, here we demonstrate the ability to apply WGS paired with a targeted informatics approach to solve one such difficult region. WGS provides a valuable opportunity to assess the entire genome for genetic variation and the continued development of more targeted informatics solutions for difficult regions with WGS data will help bring the promise of personalized medicine one step closer to a reality.

Acknowledgements

We thank our co-authors Alba Sanchis-Juan, Courtney French, Isabelle Delon and Lucy Raymond at University of Cambridge, Andrew Connell and Matthew Butchbach at Nemours Alfred I. duPont Hospital for Children, as well as Zoya Kingsbury, Aditi Chawla, Aaron Halpern, Ryan Taft and David Bentley at Illumina. We thank Andrew Warren at Illumina for developing the visualization tool.

External links

Publication: https://www.nature.com/articles/s41436-020-0754-0
Software: https://github.com/Illumina/SMNCopyNumberCaller

References

Lunn MR, Wang CH. Spinal muscular atrophy Lancet. 2008; 371(9630):2120–2133.
Sugarman EA, Nagan N, Zhu H, Akmaev VR, Zhou Z, Rohlfs EM, et al. Pan-ethnic carrier screening and prenatal diagnosis for spinal muscular atrophy: clinical laboratory analysis of >72 400 specimens. Eur J Hum Genet. 2012;20(1):27–32.
MacDonald, WK, Hamilton, D, Kuhle S. SMA carrier testing: a meta-analysis of differences in test performance by ethnic group. Prenat Diagn. 2014;34(12):1219-1226.
Hendrickson BC, Donohoe C, Akmaev VR, Sugarman EA, Labrousse P, Boguslavskiy L, et al. Differences in SMN1 allele frequencies among ethnic groups within North America. J Med Genet. 2009;46(9):641–644.
Finkel RS, Chiriboga CA, Vajsar J, Day JW, Montes J, De Vivo DC, et al. Treatment of infantile-onset spinal muscular atrophy with nusinersen: a phase 2, open-label, dose-escalation study. Lancet. 2016;388(10063):3017–3026.
Mendell JR, Al-Zaidy S, Shell R, Arnold WD, Rodino-Klapac LR, Prior TW, et al. Single-dose gene-replacement therapy for spinal muscular atrophy N Engl J Med. 2017;377(18);1713–1722.
Rochette CF, Gilbert N, Simard LR. SMN gene duplication and the emergence of the SMN2 gene occurred in distinct hominids: SMN2 is unique to Homo sapiens. Hum Genet. 2001;108(3):255–266.
Lorson CL, Hahnen E, Androphy EJ, Wirth B. A single nucleotide in the SMN gene regulates splicing and is responsible for spinal muscular atrophy. Proc Natl Acad Sci U S A. 1999;96(11):6307–6311.
Wirth B. An update of the mutation spectrum of the survival motor neuron gene (SMN1) in autosomal recessive spinal muscular atrophy (SMA). Hum Mutat. 2000;15(3):228–237.
Burghes AH, Beattie CE. Spinal muscular atrophy: why do low levels of survival motor neuron protein make motor neurons sick? Nat Rev Neurosci. 2009;10(8):597-609.
Butchbach ME. Copy number variations in the survival motor neuron genes: implications for spinal muscular atrophy and other neurodegenerative diseases. Front Mol Biosci. 2016;3:7.
Chen X, Sanchis-Juan A, French CE, Connell AJ, Delon I, Kingsbury Z, et al. Spinal muscular atrophy diagnosis and carrier screening from genome sequencing data. Genet Med. 2020;22(5):945-953.
1000 Genomes Project Consortium. A global reference for human genetic variation. Nature. 2015;526(7571):68-74.
Turro E, Astle WJ, Megy K, Gräf S, Greene D, Shamardina O, et al. Whole-genome sequencing of patients with rare diseases in a national health system. Nature. 2020;583(7814):96-102.

上一篇文章

下一篇文章

NovaSeq X 创新路线图

因美纳 5-碱基解决方案

NGS 流程助手——现已支持肿瘤工作流

Illumina Connected Multiomics

基于新一代测序的蛋白质组学服务

创新路线图

创新路线图

创新路线图

创新路线图

创新路线图

创新路线图

创新路线图

SOMAmer 蛋白质组学的未来

SOMAmer 蛋白质组学的未来

SOMAmer 蛋白质组学的未来

SOMAmer 蛋白质组学的未来

SOMAmer 蛋白质组学的未来

SOMAmer 蛋白质组学的未来

SOMAmer 蛋白质组学的未来

MiSeq i100 系列

MiSeq i100 系列

MiSeq i100 系列

MiSeq i100 系列

MiSeq i100 系列

MiSeq i100 系列

深入探索癌症，助力精准测试

深入探索癌症，助力精准测试

深入探索癌症，助力精准测试

深入探索癌症，助力精准测试

深入探索癌症，助力精准测试

Illumina COVIDSeq Test

Illumina COVIDSeq Test

Illumina COVIDSeq Test

Illumina COVIDSeq Test

基因组与甲基化组一次检测同时获得

基因组与甲基化组一次检测同时获得

基因组与甲基化组一次检测同时获得

基因组与甲基化组一次检测同时获得

基因组与甲基化组一次检测同时获得

分秒必争。无PCR的新制备方法可加快全基因组测序

分秒必争。无PCR的新制备方法可加快全基因组测序

分秒必争。无PCR的新制备方法可加快全基因组测序

分秒必争。无PCR的新制备方法可加快全基因组测序

Illumina COVIDSeq Test

Illumina COVIDSeq Test

Illumina COVIDSeq Test

Illumina COVIDSeq Test

Illumina COVIDSeq Test

基因组与甲基化组一次检测同时获得

基因组与甲基化组一次检测同时获得

基因组与甲基化组一次检测同时获得

基因组与甲基化组一次检测同时获得

基因组与甲基化组一次检测同时获得

Hear about VeriSeq NIPT from Our Customers

Hear about VeriSeq NIPT from Our Customers

Hear about VeriSeq NIPT from Our Customers

Hear about VeriSeq NIPT from Our Customers

时间就是生命—全新PCR-Free Prep建库试剂加速全基因组测序

时间就是生命—全新PCR-Free Prep建库试剂加速全基因组测序

时间就是生命—全新PCR-Free Prep建库试剂加速全基因组测序

时间就是生命—全新PCR-Free Prep建库试剂加速全基因组测序

因美纳实验流程解决方案

通过全基因组测序数据进行脊髓性肌萎缩症诊断和携带者筛查

Introduction

SMNCopyNumberCaller

Visualizing the caller result

Performance validation

Copy number of SMN1, SMN2 and SMN2∆7-8 by population

Summary

Acknowledgements

External links

References