Tuesday, October 28, 2014

Poly Peak Parser can be useful for identifying new #CRISPR indels in F1s using PCR + Sanger chromatogram data.

Thank you to my colleague Max for pointing this out to me. If you are doing CRISPR on mice, fish or whatever creatures you are working on, and you are direct-Sanger-sequencing PCR products from #CRISPR animals, you know that animals with more than one type of allele will generate confusing, overlapping sequence traces, usually extending past the CRISPR cut site. This is because CRISPR (or TALENs and ZFNs, for that matter) usually generates indel mutations, which shift one allele out of register with the other downstream of the indel. Although the Sanger data is fine for confirming that something got altered by CRISPR, the overlapping peaks make it hard to identify exactly what the indel is. Poly Peak Parser is an easy-to-use web interface for pulling the alternate allele (e.g. the newly generated indel) out of an .abi or .scf file that has double peaks due to indel heterozygosity. It is designed to be used on PCR Sanger data from F1 heterozygote animals.
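To see why a single indel scrambles everything downstream of it, here is a toy Python sketch that overlays two alleles base by base, the way a direct Sanger read of a heterozygote effectively does. The sequence is made up, and the function name `mix_alleles` is just for illustration; real chromatograms have peak heights and noise that this ignores.

```python
# Toy model: overlay two alleles position by position and report each
# position as an IUPAC code (single base if they agree, ambiguity code if not).
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}


def mix_alleles(allele_a, allele_b):
    """Return the IUPAC overlay of two alleles read from the same primer."""
    return "".join(IUPAC[frozenset((a, b))] for a, b in zip(allele_a, allele_b))


# Hypothetical wild type sequence and a 1 bp deletion derived from it.
wild_type = "ACGTTGCAAGGCTTACGGATC"
deletion = wild_type[:8] + wild_type[9:]  # drop the base at index 8

mixed = mix_alleles(wild_type, deletion)
print(mixed)
```

Upstream of the deletion the two alleles agree, so the calls are clean. From the deletion onward the alleles are offset by one base, so most positions show two different bases on top of each other.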

Poly peak parser: Method and software for identification of unknown indels using Sanger sequencing of polymerase chain reaction products. Hill JT, Demarest BL, Bisgrove BW, Su YC, Smith M, Yost HJ. Dev Dyn. 2014 Aug 27. doi: 10.1002/dvdy.24183. [Epub ahead of print]

Web tool:   http://spark.rstudio.com/yostlab/PolyPeakParser/

The catch is that you have to supply the reference allele sequence (such as wild type), and it really only works well if there are two and only two alleles embedded in the Sanger data, one of which is the reference sequence. Then it will extract the alternate allele from the double peak data. I tried it out using Sanger data from a mouse that was confirmed to be a heterozygote for the wild type allele plus a new, short deletion; Poly Peak Parser quickly returned the alternate allele, confirming the 1 bp deletion.
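The reason the reference is required can be shown with another toy Python sketch: if every double peak is a mix of exactly two bases and one of them is the reference base, the other must belong to the alternate allele. This is only my simplified illustration of the idea, not the algorithm from the paper; the function name `subtract_reference` and the sequences are hypothetical, and the mixed read is assumed to be already in register with the reference.

```python
# IUPAC codes for one or two bases; anything else is treated as unusable.
CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
}


def subtract_reference(mixed, reference):
    """Remove the reference base from each mixed call to get the other allele.

    Positions where the reference base is not among the called bases are
    returned as 'N', since they break the two-allele assumption.
    """
    alt = []
    for code, ref_base in zip(mixed, reference):
        bases = set(CODES.get(code, ""))
        if ref_base not in bases:
            alt.append("N")          # third allele, noise, or bad alignment
        elif len(bases) == 1:
            alt.append(ref_base)     # both alleles agree here
        else:
            (other,) = bases - {ref_base}
            alt.append(other)        # the non-reference base of the pair
    return "".join(alt)


# Hypothetical data: a mixed read from a wild type / 1 bp deletion heterozygote.
reference = "ACGTTGCAAGGCTTACGGATC"
mixed_read = "ACGTTGCARGSYTWMSGRWY"

alternate = subtract_reference(mixed_read, reference)
print(alternate)
```

With three or more alleles in the mix, or with neither allele matching the reference, the subtraction no longer has a unique answer, which is why the tool struggles on anything other than a reference-plus-one-indel heterozygote.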

In practice, however, founder animals from CRISPR injections are usually not simply heterozygous for wild type and a new indel allele.  They usually have at least two mutated alleles and sometimes more, if they are mosaic.  As the authors of this tool state in their paper, Poly Peak Parser is really designed for analyzing F1 animals.   So here is a suggested workflow:

1.  If you have Sanger files from PCRs of founder animals and they clearly have double peaks, try inputting the .scf along with a wild type reference sequence into Poly Peak Parser and see if it returns an alternate allele that has mostly unambiguous base calls.

2. If the "alternate allele" has lots of ambiguous bases, the animal may be mosaic, or may simply have two new but distinct mutations.

Either way, breed founder animals to wild type to get F1s and the data will be much clearer.
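If you are screening a lot of founders, the "mostly unambiguous" check in steps 1 and 2 can be roughed out in a few lines of Python once you have copied the returned alternate allele into a string. The 10% cutoff below is an arbitrary number I picked for illustration, not something from the paper, and the function names are hypothetical.

```python
# Assumed cutoff: flag an alternate allele if more than 10% of its calls
# are not plain A/C/G/T. Adjust to taste; this value is not from the paper.
MAX_AMBIGUOUS = 0.10


def ambiguous_fraction(seq):
    """Fraction of positions that are not an unambiguous A, C, G or T."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base not in "ACGT" for base in seq) / len(seq)


def looks_clean(alt_allele):
    """True if the alternate allele is mostly unambiguous (step 1 outcome)."""
    return ambiguous_fraction(alt_allele) <= MAX_AMBIGUOUS


# Hypothetical examples of returned alternate alleles.
print(looks_clean("ACGTTGCAGGCTTACGGATC"))  # clean calls throughout
print(looks_clean("ACGTTGCANNRYTWNSGRWY"))  # many ambiguous calls
```

A founder that fails this kind of check is a candidate for step 2: probably mosaic or carrying two different new alleles, and worth breeding out before trying to call the indel.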

