Recent papers prompted me to write about this issue some more - this is building on the April 23 post.
Will your CRISPR target work efficiently? First, does Cas9 “prefer” any bases within the protospacer sequence?
1. Wang et al, Science 2014, examined a library of protospacers to determine bases that were either generally enriched on Cas9 after loading of the guide RNA, or generally depleted from the cellular guide RNA pool after loading, which in theory should give similar results. They found (Fig. 3F) that U in positions -4, -2 and -1 (relative to the PAM) lowered Cas9 affinity, and similarly reduced depletion from the free guide RNA pool. G’s in these positions had the opposite effect. C in position -3 helped both loading and depletion of free guide RNAs. These are trends, but clearly not rules, as simple examination of many published, highly efficient protospacers shows obvious exceptions to those data. Moreover, the trends were not observed in the paper below.
2. The new paper from Gagnon et al, PLoS ONE 2014, examines base preferences in a slightly different way, by actually sequencing mutations caused by Cas9 cleavage. These data do not support most of the base preferences as I interpret them from Wang et al above, except that a G in the -1 position relative to the PAM (= position #21 in the Gagnon paper Fig 1) had a positive effect on indel generation. Keep in mind that these papers used different systems (cell culture vs fish injections).
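If you want to see how a candidate stacks up against the Wang et al trends, a small lookup will do. This is only a sketch under my own assumptions: the function name `pam_proximal_trends` and the table `WANG_TRENDS` are made up for illustration, the table encodes just the trends summarized in point 1, and the protospacer is assumed to be written 5’ to 3’ with the PAM immediately after its last base. Treat the output as a note, not a score, since these are trends and not rules.

```python
# Trends as summarized in point 1 (Wang et al, Fig. 3F); positions are
# counted relative to the PAM, so -1 is the PAM-adjacent base.
# This table is my own encoding of the text, not code from the paper.
WANG_TRENDS = {
    (-4, "U"): "unfavorable", (-2, "U"): "unfavorable", (-1, "U"): "unfavorable",
    (-4, "G"): "favorable", (-2, "G"): "favorable", (-1, "G"): "favorable",
    (-3, "C"): "favorable",
}

def pam_proximal_trends(protospacer: str) -> dict:
    """Annotate the four PAM-proximal bases of a 5'->3' protospacer."""
    rna = protospacer.upper().replace("T", "U")  # accept DNA or RNA letters
    if len(rna) < 4:
        raise ValueError("need at least 4 bases")
    return {
        pos: (rna[pos], WANG_TRENDS.get((pos, rna[pos]), "no reported trend"))
        for pos in (-4, -3, -2, -1)
    }

# Hypothetical protospacer, invented for illustration only.
notes = pam_proximal_trends("ACGTACGTACGTACGTGCTG")
```

Per point 2, only the G at -1 was also supported by the Gagnon et al data, so that is the entry I would weigh most.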
Some more important points:
3. GC richness overall is a good thing for protospacer effect. See Gagnon et al Fig 1B. This general observation had been reported before, but that’s a nice data figure. Finally - a reason to be happy that your gene is so GC rich!
4. More about extending the 5' end of the protospacer: This one bears repeating. Ran et al, Cell 2014 - one of the first “Nickase” papers - actually tested extending the 5’ sequence of the protospacer portion of the guide RNA, in hopes it might increase efficiency and/or, crucially, specificity. That would be nice, wouldn’t it? Alas, adding more 5’ bases doesn’t seem to have any effect (Fig 1A,B). But a take-home is this: Since RNA Pol III requires a G to initiate, simply add a G to the 5’ end of the protospacer if one doesn’t exist already. In other words, there is no need to choose protospacers that have a G at the 5’ end. Just add it if it’s not there. Bam - I just increased your target choices 4-fold (you’re welcome). Now, truth be told, many target choice search programs don’t require a G in the first position anyway. But they may not suggest that you add the G, which will enhance transcription of the guide RNA.
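Points 3 and 4 are easy to fold into a target-picking script. Here is a minimal Python sketch, not taken from any of the papers: `gc_fraction` and `add_5prime_g` are names I made up, and the two candidate sequences are invented for illustration. Ranking by GC fraction only reflects the general trend described above; it is not a validated efficiency score.

```python
def gc_fraction(protospacer: str) -> float:
    """Fraction of G/C bases in a protospacer (DNA or RNA letters)."""
    seq = protospacer.upper()
    if not seq:
        raise ValueError("empty protospacer")
    return sum(base in "GC" for base in seq) / len(seq)

def add_5prime_g(protospacer: str) -> str:
    """Prepend a G for Pol III initiation unless the 5' base is already G."""
    seq = protospacer.upper()
    return seq if seq.startswith("G") else "G" + seq

# Hypothetical 20-nt protospacers, made up for this example.
candidates = ["ATTATAAGTTCAATTGAACA", "GACGTCCGGCAGCTCGGCCA"]

# Most GC-rich first, then make sure each guide starts with a G.
ranked = sorted(candidates, key=gc_fraction, reverse=True)
guides = [add_5prime_g(p) for p in ranked]
```

Note that the second guide ends up 21 bases long: the extra G is added in front of the full-length protospacer rather than replacing its first base.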
Caveat: shortening the 5’ end of the protospacer increases specificity, but if you use that strategy, you do need to select targets with an actual G base at the truncated 5’ end of the genomic target. Otherwise, if you just add a G to the truncated protospacer, you will be adding it within the span of a normal-length protospacer, thus creating a mismatch. This would likely be tolerated at the 5’ end of a 20-base protospacer, as was reported by Gagnon et al, but maybe not for a shorter one. This is yet another thing that needs testing, but it would be a useful tweak.
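The caveat can be checked mechanically before you order oligos. The sketch below is my own illustration, with a made-up function name (`check_truncated`) and a made-up example sequence. It assumes the full-length genomic protospacer is written 5’ to 3’ and that truncation removes bases from the 5’ end, keeping the PAM-proximal bases. It reports whether the truncated target starts with a genomic G, and whether an added G would face a non-G genomic base (i.e. a mismatch).

```python
def check_truncated(genomic_protospacer: str, length: int) -> dict:
    """Check the 5' end of a protospacer truncated to `length` bases."""
    seq = genomic_protospacer.upper()
    if not 1 <= length < len(seq):
        raise ValueError("length must be shorter than the full protospacer")
    truncated = seq[-length:]    # keep the PAM-proximal bases
    upstream = seq[-length - 1]  # genomic base an added 5' G would face
    return {
        "truncated": truncated,
        "native_5prime_g": truncated[0] == "G",
        "added_g_would_mismatch": upstream != "G",
    }

# Hypothetical 20-nt genomic protospacer, invented for illustration.
full = "ACGTACGTACGTACGTACGT"
ok = check_truncated(full, 18)   # truncated target starts with a genomic G
bad = check_truncated(full, 19)  # starts with C; an added G would mismatch
```

In the `bad` case the safer choice, per the caveat, is to pick a different truncation point or a different target rather than tack on a G.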