Link Your Sites (LYS) Scripts: Automated Search of Protein Structures and Mapping of Sites Under Positive Selection Detected by PAML

Visualizing the molecular context of an amino acid mutation in a protein structure is crucial for assessing its functional impact and understanding its evolutionary implications. Currently, searches for fast-evolving amino acid positions using codon substitution models, such as those implemented in PAML (Yang and Nielsen in Estimating synonymous and nonsynonymous substitution rates under realistic evolutionary models. Mol Biol Evol 17(1):32–43, 2000; Zhang et al. in Evaluation of an improved branch-site likelihood method for detecting positive selection at the molecular level. Mol Biol Evol 22(12):2472–2479, 2005), are performed on nearly complete proteomes. This generates large numbers of candidate proteins and makes the analysis of individual protein structures and models very time-consuming. Here we present Link Your Sites (LYS), a package that reduces the set of analysed targets to those for which structural information can be retrieved. LYS consists of two Python wrapper scripts. The first one (i) mines the RCSB Protein Data Bank (Berman et al. in The Protein Data Bank. Nucleic Acids Res 28(1):235–242, 2000) with the BLAST alignment tool to find the best-matching homologous sequences, (ii) retrieves their domain positions from PROSITE (Hamelryck and Manderick in PDB file parser and structure class implemented in Python. Bioinformatics 19(17):2308–2310, 2003; Sigrist et al. in PROSITE: a documented database using patterns and profiles as motif descriptors. Brief Bioinform 3(3):265–274, 2002; Sigrist et al. in New and continuing developments at PROSITE. Nucleic Acids Res 41(D1):D344–D347, 2012), (iii) parses the PAML output, extracting the positions of fast-evolving sites and transforming them into the coordinate system of the protein structure, and (iv) writes one file per gene listing the correspondence between positions in the input sequence and in the homologous structure.
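Steps (iii) and (iv) above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the LYS implementation: it assumes the Bayes Empirical Bayes (BEB) table in the codeml output has lines of the form `position residue probability`, that the PAML positions have already been converted to ungapped positions of the query sequence, and that the structure's residue numbering is contiguous from a known start (real PDB files may have gaps and insertion codes). The names `parse_beb_sites`, `map_positions`, and `EXAMPLE_BEB` are hypothetical, and `EXAMPLE_BEB` is a toy input rather than real PAML output.

```python
import re

# Hypothetical pattern for one site line of a codeml BEB table, e.g.
# "    27 K      0.962*": position, residue, posterior probability.
# Real codeml output differs between site and branch-site models; adjust as needed.
SITE_LINE = re.compile(r"^\s*(\d+)\s+([A-Z])\s+([01]\.\d+)")

# Toy input for illustration only (not real PAML output).
EXAMPLE_BEB = """Bayes Empirical Bayes (BEB) analysis (Yang, Wong & Nielsen 2005)
Positively selected sites (*: P>95%; **: P>99%)

    27 K      0.962*
    45 S      0.993**
    60 T      0.701
The grid (see ternary graph for p0-p1)
"""


def parse_beb_sites(text, min_prob=0.95):
    """Return (position, residue, probability) for BEB sites with P >= min_prob."""
    sites = []
    in_beb = False     # True once the BEB header has been seen
    seen_site = False  # True once at least one site line has been parsed
    for line in text.splitlines():
        if "Bayes Empirical Bayes" in line:
            in_beb = True
            continue
        if not in_beb:
            continue  # skip everything before BEB, including any NEB table
        m = SITE_LINE.match(line)
        if m:
            seen_site = True
            prob = float(m.group(3))
            if prob >= min_prob:
                sites.append((int(m.group(1)), m.group(2), prob))
        elif seen_site:
            break  # the first non-site line after the table ends it
    return sites


def map_positions(query_aln, struct_aln, struct_start=1):
    """Map 1-based query positions to structure residue numbers.

    Both arguments are rows of the same gapped pairwise alignment ('-' = gap).
    Assumes contiguous structure numbering starting at struct_start.
    """
    mapping = {}
    q_pos, s_pos = 0, struct_start - 1
    for q, s in zip(query_aln, struct_aln):
        if q != "-":
            q_pos += 1
        if s != "-":
            s_pos += 1
        if q != "-" and s != "-":  # only aligned residue pairs are equivalent
            mapping[q_pos] = s_pos
    return mapping
```

For example, `map_positions("MK-LSV", "MKALS-", struct_start=10)` returns `{1: 10, 2: 11, 3: 13, 4: 14}`: query residue 5 (V) has no structural equivalent because it is aligned to a gap, so a positively selected site there could not be displayed on this structure. Writing each gene's `mapping` to a tab-separated file would give a per-gene equivalence table of the kind described in step (iv).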
The second script produces figures suitable for publication that highlight the positively selected sites mapped onto regions known to be functionally relevant.
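The check behind such a figure, namely whether a mapped site falls inside a functional region, can be sketched as a simple interval-containment test. This sketch is an assumption-laden stand-in, not the LYS plotting code (which produces graphical figures): the `Domain` type, the domain names such as `PS_A`, and the text track drawn by the hypothetical `ascii_track` are illustrative only. Domain bounds are taken to be inclusive and expressed in the same structure numbering as the mapped sites.

```python
from typing import NamedTuple


class Domain(NamedTuple):
    """A functional region in structure numbering (inclusive bounds)."""
    name: str   # hypothetical label, e.g. for a PROSITE hit
    start: int
    end: int


def sites_in_domains(sites, domains):
    """Split mapped site positions into per-domain lists and an 'outside' list."""
    hits = {d.name: [] for d in domains}
    outside = []
    for pos in sorted(sites):
        inside = False
        for d in domains:
            if d.start <= pos <= d.end:  # inclusive interval containment
                hits[d.name].append(pos)
                inside = True
        if not inside:
            outside.append(pos)
    return hits, outside


def ascii_track(length, sites, domains):
    """Draw a 1-based residue track: '=' domain, '*' selected site, '-' other."""
    track = ["-"] * length
    for d in domains:
        for p in range(d.start, d.end + 1):
            track[p - 1] = "="
    for p in sites:
        track[p - 1] = "*"  # sites are drawn last so they stay visible
    return "".join(track)
```

With two toy domains, `Domain("PS_A", 3, 6)` and `Domain("PS_B", 9, 10)`, and sites at positions 4, 8 and 10, `sites_in_domains` reports site 4 in `PS_A`, site 10 in `PS_B` and site 8 outside any domain, while `ascii_track(12, ...)` returns `--=*==-*=*--`. A plotting library would replace the text track in practice.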