Pan-Genomic Analysis
Author:
Sunil Thomas
Source:
Vaccine Design: Methods and Protocols: Volume 1: Vaccines for Human Diseases
Pages:
pp. 90-91
2025-05-18
Apart from being a valuable approach to investigating the characteristics of a specific phylogenetic clade, pan-genomic analysis is indispensable for identifying conserved target proteins across the genomes of pathogenic strains within a single clade. The term "pan-genome" was coined by Tettelin et al. [1] and is defined as the entire genomic repertoire accessible to the clade studied. It comprises two subsets: the "core genome" and the "dispensable" or "accessory genome." The former is the intersection of the genes (or ORFs) shared by all strains of the clade, whereas the latter comprises genes found only in a subset of strains. This classification is biologically meaningful because it distinguishes (core) genes considered essential for growth from (accessory) genes encoding, e.g., supplementary pathways and functions that confer a selective advantage, such as antibiotic resistance or virulence genes restricted to certain strains [2].
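Once proteins have been grouped into gene families, the core/accessory split reduces to set operations. The following Python sketch assumes that this grouping has already been done and that each strain is represented as a set of family identifiers; the function name, the strain names, and the identifiers g1 to g5 are hypothetical.

```python
def partition_pan_genome(
    strain_genes: dict[str, set[str]],
) -> tuple[set[str], set[str], set[str]]:
    """Split gene families into pan-, core and accessory genome.

    strain_genes maps each strain to the set of gene-family IDs it carries
    (hypothetical input; families are assumed to be precomputed).
    """
    gene_sets = list(strain_genes.values())
    if not gene_sets:
        return set(), set(), set()
    pan = set().union(*gene_sets)        # every family seen in any strain
    core = set.intersection(*gene_sets)  # families present in all strains
    accessory = pan - core               # families missing from at least one strain
    return pan, core, accessory


# Toy example with three hypothetical strains.
strains = {
    "strain_A": {"g1", "g2", "g3"},
    "strain_B": {"g1", "g2", "g4"},
    "strain_C": {"g1", "g2", "g3", "g5"},
}
pan, core, accessory = partition_pan_genome(strains)
print(sorted(core))       # ['g1', 'g2']
print(sorted(accessory))  # ['g3', 'g4', 'g5']
```

For target screening, the core set is the natural starting point, since it holds the families shared by every strain of the clade.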
Similarity between proteins is usually determined by pairwise alignment, with thresholds set on the percentage of sequence identity over a minimum percentage of the aligned sequence length. However, depending on the phylogenetic resolution and on the quality and quantity of available genomes, it may be necessary to increase sensitivity. This can be done by incorporating additional methods such as orthology prediction, i.e., the prediction of genes in different species or strains that descend vertically from a single gene of their last common ancestor, as well as structural alignments. Relying solely on pairwise sequence alignments, Tettelin et al. [1] required a minimum of 50 % identity over 50 % of the sequence length to identify similar proteins within strains of Streptococcus agalactiae, while Hiller et al. [3] used a 70 % threshold for S. pneumoniae. For the purpose of identifying target proteins, however, it is beneficial to choose considerably higher thresholds so that false positives are excluded early in the workflow. The potential loss of immunogenic sequences due to such stringent thresholds is relatively small, and given the high specificity of immune system receptors, it is a good trade-off for reducing the number of proteins to be analyzed in subsequent steps.
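The identity-over-length criterion can be made concrete with a short Python sketch. It assumes the pairwise alignment has already been computed by an external aligner and is given as two equal-length gapped strings. The function names, the toy sequences, and the exact definitions used (identity counted over all alignment columns, coverage taken as the smaller of the two proteins' covered fractions) are illustrative assumptions, not the criteria implemented in [1] or [3].

```python
def identity_and_coverage(aligned_a: str, aligned_b: str,
                          len_a: int, len_b: int) -> tuple[float, float]:
    """Return (percent identity, percent coverage) for one pairwise alignment.

    aligned_a / aligned_b are the aligned segments of proteins A and B, with
    '-' marking gaps; len_a / len_b are the full, unaligned protein lengths.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned segments must have equal length")
    if not aligned_a:
        return 0.0, 0.0
    # Identity over all alignment columns; gap columns count as mismatches.
    identical = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    identity = 100.0 * identical / len(aligned_a)
    # Fraction of each full-length protein covered by the alignment; the
    # smaller value is used so that both proteins must be covered.
    cov_a = 100.0 * sum(c != "-" for c in aligned_a) / len_a
    cov_b = 100.0 * sum(c != "-" for c in aligned_b) / len_b
    return identity, min(cov_a, cov_b)


def is_similar(aligned_a: str, aligned_b: str, len_a: int, len_b: int,
               min_identity: float = 50.0, min_coverage: float = 50.0) -> bool:
    """Apply an identity-over-coverage threshold (defaults: 50 % / 50 %)."""
    identity, coverage = identity_and_coverage(aligned_a, aligned_b, len_a, len_b)
    return identity >= min_identity and coverage >= min_coverage


# Hypothetical toy alignment: 8 columns, one gap in A, 6 identical residues.
a, b = "MKT-LLAV", "MKTALIAV"
identity, coverage = identity_and_coverage(a, b, len_a=10, len_b=12)
print(f"{identity:.1f} % identity over {coverage:.1f} % coverage")  # 75.0 % identity over 66.7 % coverage
print(is_similar(a, b, 10, 12))                                    # True  (50 % / 50 %)
print(is_similar(a, b, 10, 12, min_identity=70, min_coverage=70))  # False (stricter)
```

Raising min_identity and min_coverage, as recommended above for target identification, makes the same pair fail the test. Because tools define identity and coverage differently, thresholds are only comparable when their definitions match.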
References
-------------
[1] Tettelin H, Masignani V, Cieslewicz MJ et al (2005) Genome analysis of multiple pathogenic isolates of Streptococcus agalactiae: implications for the microbial pan-genome. Proc Natl Acad Sci USA 102:13950–13955
[2] Vernikos G, Medini D, Riley DR et al (2014) Ten years of pan-genome analyses. Curr Opin Microbiol 23C:148–154
[3] Hiller NL, Janto B, Hogg JS et al (2007) Comparative genomic analyses of seventeen Streptococcus pneumoniae strains: insights into the pneumococcal supragenome. J Bacteriol 189:8186–8195