1 Running PinAPL-Py

Step 1: SET UP A RUN

1.2.1 Parameters

1.2.2 Uploading a custom library

2 Description of the PinAPL-Py Analysis output

2.1 Enrichment/Depletion

2.2 Statistics

2.3 Scatter Plots

2.4 Heatmap

2.5 Run Info

3 References

1 Running PinAPL-Py


Step 1: SET UP A RUN

Enter a project name for your analysis run. This name will help you identify your results if you do multiple runs in a row. Providing an email address is optional, but it lets you safely close the browser during the analysis and receive a notification when the analysis completes.


Upload your files via the drag-and-drop frame. Uncompressed format (.fastq) is supported, but compressed (.fastq.gz) is recommended.
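If your files are still uncompressed, they can be gzipped before upload (the command-line tool gzip does the same). A minimal sketch using only the Python standard library; the function name compress_fastq is illustrative:

```python
import gzip
import shutil
from pathlib import Path


def compress_fastq(path):
    """Write a gzip-compressed copy of a .fastq file next to the original."""
    src = Path(path)
    dst = src.with_name(src.name + ".gz")  # sample.fastq -> sample.fastq.gz
    with src.open("rb") as fin, gzip.open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)  # copy in chunks instead of loading the whole file
    return dst


# Usage (compresses every .fastq file in the current directory):
# for fastq in Path(".").glob("*.fastq"):
#     compress_fastq(fastq)
```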


Enter the name of the condition each file represents. Replicates of the same condition must be given exactly the same name. Do not number your replicates; numbering is done automatically by the program and displayed on the results page after the analysis has completed.
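Because replicates are grouped by their condition name, the names must match character for character. The following hypothetical sketch illustrates grouping by exact string match followed by automatic numbering; the "_R1", "_R2" suffix is an assumed format, not necessarily the one PinAPL-Py displays:

```python
from collections import defaultdict


def number_replicates(file_to_condition):
    """Map each file to a sample name: its condition plus a replicate number.

    Hypothetical sketch; the "<condition>_R<n>" format is an assumption.
    """
    groups = defaultdict(list)
    for fname, condition in file_to_condition.items():
        groups[condition].append(fname)  # exact match: "Control" and "control" differ
    samples = {}
    for condition, files in groups.items():
        for i, fname in enumerate(sorted(files), start=1):
            samples[fname] = f"{condition}_R{i}"
    return samples


print(number_replicates({"a.fastq.gz": "Control", "b.fastq.gz": "Control", "c.fastq.gz": "control"}))
# {'a.fastq.gz': 'Control_R1', 'b.fastq.gz': 'Control_R2', 'c.fastq.gz': 'control_R1'}
```

The differently capitalized "control" ends up as a separate condition with a single replicate.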

Please mark all control replicates with the checkbox to the right.


First, choose the screen type: “enrichment” (e.g. a drug resistance screen) or “depletion” (e.g. a gene-essentiality screen), depending on whether your screen aims at finding sgRNAs of high or low abundance, respectively.

Next, choose the sgRNA library used in your screen from the dropdown menu. If your screen uses a library not present in the list or a custom library, see “Uploading a custom library” in the Advanced Options below.

Optional: If you would like to edit the default parameter settings, click Advanced Options. For instructions on these parameters, see “Parameter description” in the Advanced Options section.


You can follow the program’s execution log by refreshing the page periodically. If another run was started shortly before yours, your run will be queued and will start once the previous run has completed.

If you provided an email address, you can close the browser; you will be notified by email and sent a link to the results after completion. Otherwise, please leave the progress screen open.

The results will remain on the server for 5 days. You can download all content shown on the results page in a single ZIP archive.


1.2.1 Parameters


sgRNA Sequence Length (default = 20)

The length of your sgRNA sequence in the reads.

Adapter error rate (default = 0.1)

Error rate (mismatches and indels) allowed when identifying the 5’ adapter (refer to the cutadapt manual for more details). Increasing this rate can help compensate for poor sequence quality.
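As described in the cutadapt manual, the error rate is applied relative to the length of the matching region, and the resulting number of tolerated errors is rounded down. A rough sketch for a full-length match of the example adapter given in the custom library section (wildcard handling and partial matches are left to the cutadapt manual):

```python
import math


def max_adapter_errors(adapter, error_rate=0.1):
    """Errors (mismatches + indels) tolerated for a full-length adapter match.

    Sketch of the rule: floor(error_rate * length of the matching region).
    """
    return math.floor(error_rate * len(adapter))


adapter = "TCTTGTGGAAAGGACGAAACACCN"  # 24 nt
print(max_adapter_errors(adapter))       # 2
print(max_adapter_errors(adapter, 0.2))  # 4
```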

Matching threshold (default = 40)

Minimum alignment score required for a read to be considered successfully matched. A perfect match scores twice the sgRNA sequence length, so to accept only perfect matches, set this threshold to double the sgRNA sequence length (refer to the Bowtie2 manual for more details on how the alignment score is calculated). Decreasing this threshold also accepts reads that match a library entry less than perfectly, which can help increase sensitivity or compensate for poor sequence quality.
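The positive default suggests that scores are computed in Bowtie2's local mode, where each matching base earns a bonus of 2 (--ma) and a mismatch on a high-quality base costs 6 (the maximum of --mp). Assuming that mode, a simplified sketch of the score of a gapless alignment shows why 40 corresponds to a perfect 20-nt match; soft clipping, gaps, quality-scaled penalties and N positions are ignored:

```python
def local_alignment_score(read, ref, match_bonus=2, mismatch_penalty=6):
    """Score of a gapless, unclipped alignment of two equal-length sequences.

    Simplified sketch assuming Bowtie2 local-mode defaults for high-quality bases.
    """
    if len(read) != len(ref):
        raise ValueError("sequences must have equal length")
    matches = sum(a == b for a, b in zip(read, ref))
    return match_bonus * matches - mismatch_penalty * (len(ref) - matches)


ref = "ACGGAGGCTAAGCGTCGCAA"           # 20-nt sgRNA from the library example
mutant = ref[:10] + "T" + ref[11:]     # one mismatch in the middle
print(local_alignment_score(ref, ref))     # 40: meets the default threshold
print(local_alignment_score(mutant, ref))  # 19 * 2 - 6 = 32: below 40
```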

Ambiguity threshold (default = 2):

Minimum difference between the scores of the primary (best) and secondary (second-best) alignment required for a read to be considered successfully matched. Reads with a smaller difference are considered ambiguous and discarded. Increasing this threshold increases stringency; decreasing it increases sensitivity. With a threshold of 0, the program accepts reads even if they match multiple library entries equally well.
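In Bowtie2's SAM output, the primary and secondary alignment scores are reported in the AS:i and XS:i tags (XS:i is absent when no other alignment was found). A hypothetical filter combining the matching and ambiguity thresholds could look like this; accept_read is an illustrative name, not a PinAPL-Py function:

```python
from typing import Optional


def accept_read(best: int, second_best: Optional[int],
                match_threshold: int = 40, ambiguity_threshold: int = 2) -> bool:
    """Hypothetical read filter.

    best        -- primary alignment score (AS:i)
    second_best -- secondary alignment score (XS:i), or None if absent
    """
    if best < match_threshold:
        return False  # not a good enough match to any library entry
    if second_best is None:
        return True   # only one library entry matched
    # a difference below the threshold means the read is ambiguous
    return best - second_best >= ambiguity_threshold


print(accept_read(40, 32))  # True
print(accept_read(40, 39))  # False: ambiguous
```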

Seed length (default = 11):

Seed length parameter for Bowtie2 alignment (-L, refer to the Bowtie2 manual for more details). Changing this parameter is generally not required.

Seed number (default = 1):

Number of allowed mismatches for Bowtie2 seed alignment (-N, refer to the Bowtie2 manual for more details). Changing this parameter is generally not required.

Seed interval function (default = ‘S,1,0.75’):

Bowtie2 seed interval function (-i, refer to the Bowtie2 manual for more details). Changing this parameter is generally not required.
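Bowtie2 function options consist of a type letter, a constant term and a coefficient, with x set to the read length; the types include L (linear), S (square root) and G (natural log). A small sketch that evaluates such a string, e.g. to see the seed interval the default implies for a 20-nt read (see the Bowtie2 manual for how the result is converted to an integer interval):

```python
import math


def parse_bowtie2_func(spec):
    """Turn a Bowtie2 function string such as 'S,1,0.75' into a callable f(x).

    f(x) = constant + coefficient * g(x), where g depends on the type letter:
    L = linear, S = square root, G = natural log (constant type C not handled).
    """
    kind, constant, coefficient = spec.split(",")
    g = {"L": lambda x: float(x), "S": math.sqrt, "G": math.log}[kind]
    constant, coefficient = float(constant), float(coefficient)
    return lambda x: constant + coefficient * g(x)


seed_interval = parse_bowtie2_func("S,1,0.75")
print(round(seed_interval(20), 2))  # 1 + 0.75 * sqrt(20) ≈ 4.35
```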


Normalization (default = ‘cpm’):

Method of read count normalization.
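For the default ‘cpm’ (counts per million) method, each sgRNA count is divided by the sample's total count and multiplied by 10^6, which makes samples with different sequencing depths comparable. A sketch, assuming the total is the sum of all sgRNA counts in the sample:

```python
def normalize_cpm(counts):
    """Counts per million: scale each sgRNA count by the sample's total count."""
    total = sum(counts.values())
    return {sg: c / total * 1e6 for sg, c in counts.items()}


sample = {"sg1": 500, "sg2": 1500, "sg3": 0}
print(normalize_cpm(sample))  # {'sg1': 250000.0, 'sg2': 750000.0, 'sg3': 0.0}
```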

Cutoff (default = 0):

Cutoff threshold (given in cpm) to filter out low sgRNA counts. sgRNAs with counts lower than the cutoff will be set to 0 counts. If low counts are of minor interest for the experiment (e.g. in an enrichment screen), this can be helpful to reduce noise in the data.
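Applied to cpm-normalized counts, the cutoff can be sketched as follows (a hypothetical helper; counts equal to the cutoff are kept, since only lower counts are set to 0):

```python
def apply_cutoff(norm_counts, cutoff=0.0):
    """Set normalized counts (cpm) below the cutoff to 0."""
    return {sg: (c if c >= cutoff else 0.0) for sg, c in norm_counts.items()}


print(apply_cutoff({"sg1": 5.0, "sg2": 20.0}, cutoff=10))  # {'sg1': 0.0, 'sg2': 20.0}
```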

Round counts (default = No):

Round counts after normalization to avoid fractional counts. Rounding only affects visualization, but not significance analysis.


Gene Metric (default = ‘αRRA’):

Method to combine the sgRNA enrichment/depletion data for ranking of genes:

For more details on these methods, please refer to the original publications.

Number of permutations (default = 1000):

Number of permutations for p-value estimation of the gene ranking score. CAUTION: STARS is more computationally demanding than aRRA, so reducing the number of permutations is recommended in this case.
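One common way to turn permutations into a p-value is the empirical estimate (b + 1)/(n + 1), where b is the number of permuted scores at least as extreme as the observed one. This generic sketch (not necessarily PinAPL-Py's exact estimator) also shows that the smallest reachable p-value is 1/(n + 1), so reducing the number of permutations makes the p-values coarser:

```python
import random


def permutation_pvalue(observed, null_scores, smaller_is_better=True):
    """Empirical p-value (b + 1) / (n + 1) of an observed score.

    b counts permuted scores at least as extreme as the observed one;
    which direction counts as extreme depends on the gene metric.
    """
    if smaller_is_better:
        b = sum(s <= observed for s in null_scores)
    else:
        b = sum(s >= observed for s in null_scores)
    return (b + 1) / (len(null_scores) + 1)


rng = random.Random(0)
null = [rng.random() for _ in range(1000)]  # stand-in for 1000 permuted gene scores
print(permutation_pvalue(-1.0, null))  # 1/1001 ≈ 0.000999, the smallest possible value
```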

sgRNA percentage (STARS only) (default = 10):

Percentage of sgRNAs to be included in the ranking analysis. Only relevant if the “STARS” method is chosen.

P0 (aRRA only) (default = 0.0005):

Critical p-value for individual sgRNAs to be included in the ranking analysis. Only relevant if “aRRA” method is chosen.
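In the αRRA description of the MAGeCK publication (Li et al., 2014), a gene's sgRNAs are sorted by normalized rank, each rank is compared with the order statistics of uniform random values, and only the top-ranked j sgRNAs contribute, where j is the number of sgRNAs passing the p-value threshold (here P0). A sketch written from that published description rather than from PinAPL-Py's code; assigning 1.0 when no sgRNA passes is a choice of this sketch:

```python
from math import comb


def beta_order_cdf(k, n, x):
    """P(U_(k) <= x): CDF of the k-th smallest of n uniform(0, 1) values."""
    return sum(comb(n, j) * x**j * (1 - x) ** (n - j) for j in range(k, n + 1))


def alpha_rra_score(ranks, pvalues, p0=0.0005):
    """Sketch of the alpha-RRA score for one gene (smaller = more significant).

    ranks   -- normalized ranks (rank / total sgRNAs) of the gene's sgRNAs
    pvalues -- the matching sgRNA p-values
    """
    pairs = sorted(zip(ranks, pvalues))  # best-ranked sgRNA first
    n = len(pairs)
    j = sum(p <= p0 for _, p in pairs)   # number of sgRNAs passing P0
    if j == 0:
        return 1.0
    return min(beta_order_cdf(k, n, pairs[k - 1][0]) for k in range(1, j + 1))


print(alpha_rra_score([0.5, 0.01, 0.02], [0.3, 1e-5, 1e-4]))  # ≈ 0.001184
```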


Significance level (sgRNAs) (default = 0.001)

Significance threshold for the fold-change enrichment/depletion of sgRNAs.

Significance level (genes) (default = 0.01)

Significance threshold for the gene ranking score.

p-value adjustment (default = ‘fdr_bh’):

Method for p-value adjustment for multiple tests.
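The default name ‘fdr_bh’ matches the Benjamini-Hochberg option of the multipletests function in statsmodels. A dependency-free sketch of the same adjustment:

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values (FDR), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # walk from the largest p-value down, keeping the adjusted values monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


print(bh_adjust([0.01, 0.04, 0.03, 0.005]))  # ≈ [0.02, 0.04, 0.04, 0.02]
```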


Cluster by… (default = ‘variance’):

Criterion for sample clustering.

Number of sgRNAs for clustering (default = 25):

Specify how many sgRNAs are used for clustering with the method selected above. In case of clustering by counts, the top x sgRNAs from each sample are combined.
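The two selection criteria can be sketched as follows, assuming normalized counts stored per sample and that "top" means the highest counts (hypothetical helpers; the clustering itself is not shown):

```python
from statistics import pvariance


def top_variable(counts_by_sample, n=25):
    """The n sgRNAs with the highest variance of normalized counts across samples."""
    sgrnas = next(iter(counts_by_sample.values())).keys()
    var = {sg: pvariance([s[sg] for s in counts_by_sample.values()]) for sg in sgrnas}
    return sorted(var, key=var.get, reverse=True)[:n]


def top_counts_union(counts_by_sample, n=25):
    """Union of the n highest-count sgRNAs from each sample."""
    selected = set()
    for counts in counts_by_sample.values():
        selected.update(sorted(counts, key=counts.get, reverse=True)[:n])
    return selected


data = {"A": {"x": 1, "y": 100, "z": 50}, "B": {"x": 1, "y": 0, "z": 60}}
print(top_variable(data, 2))  # ['y', 'z']
```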


Dotsize (default = 10):

Size of dots in replicate scatterplots.

Transparency level (default = 0.1):

Transparency of points in scatterplots. A low level is helpful to visualize density.

sgRNA annotation (default = No):

Annotate sgRNAs with their IDs when highlighting individual genes in scatterplots.

Highlight non-targeting controls (default = No):

Highlight non-targeting control sgRNAs in scatterplots.

Table format (default = Text only):

File format for sgRNA and gene tables in the download archive. Use “Text only” for optimal workflow speed. Text files (.tsv) can be manually opened and converted with Excel. Use “Excel” to have the workflow automatically convert all text tables into .xlsx format (WARNING: This increases computation time).

PNG resolution (default = 300):

Resolution for PNG output (dpi).

1.2.2 Uploading a custom library

Prepare your library file (e.g. in Excel) as a spreadsheet with 3 columns and a header row, containing the gene ID, the sgRNA ID and the sgRNA sequence.

You can choose other header names for these columns. See the example below:

gene_ID sgRNA_ID Seq

Save the spreadsheet as either tab-separated format (.tsv) or comma-separated format (.csv). You can use the "Save As" menu item in Excel to do this.

Use the file browser to select and upload your library file.
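Before uploading, a quick check of the file can catch common problems such as duplicate sgRNA IDs or sequences of unexpected length. A hypothetical helper (not PinAPL-Py's own validation), assuming the column order of the example (gene ID, sgRNA ID, sequence) and the default sgRNA sequence length of 20:

```python
import csv
import re
from collections import Counter
from pathlib import Path


def load_library(path, seq_len=20):
    """Read a 3-column library file (.tsv or .csv) and run basic sanity checks."""
    delimiter = "\t" if Path(path).suffix.lower() == ".tsv" else ","
    with open(path, newline="") as fh:
        rows = list(csv.reader(fh, delimiter=delimiter))
    header, body = rows[0], [r for r in rows[1:] if r]  # skip blank lines
    if len(header) != 3:
        raise ValueError(f"expected 3 columns, found {len(header)}")
    # columns are used by position, so the header names themselves are free
    dupes = [i for i, c in Counter(r[1] for r in body).items() if c > 1]
    if dupes:
        raise ValueError(f"duplicate sgRNA IDs: {dupes[:5]}")
    pattern = f"[ACGT]{{{seq_len}}}"  # e.g. [ACGT]{20}
    bad = [r[1] for r in body if not re.fullmatch(pattern, r[2].upper())]
    if bad:
        raise ValueError(f"unexpected sequences for: {bad[:5]}")
    return body
```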

Next, specify the following parameters:


Enter the sequence of the 5’ adapter. Adapters are simply the sequences lying 5’ or 3’ of the 20-bp sgRNA. There is no restriction on the length of your adapter definition, but it is generally recommended to define the 20-25 bp immediately 5’ of the sgRNA sequence (see image below). It is also recommended to end the adapter sequence in an ‘N’ to allow for possible mismatches (see example below). A sequence viewer such as SnapGene Viewer is helpful for defining the adapter. Defining the 3’ adapter is not necessary.

Example: If your reads have the following structure


you can, for example, define TCTTGTGGAAAGGACGAAACACCN as the 5’-adapter.
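The adapter search can be illustrated with a simplified stand-in for cutadapt that allows only mismatches (no indels) and treats ‘N’ as matching any base. The read below is made up from the example adapter, an sgRNA sequence taken from the library example in this section, and arbitrary flanking bases:

```python
import math


def extract_sgrna(read, adapter, sg_len=20, error_rate=0.1):
    """Return the sgRNA following the first 5'-adapter match, or None.

    Simplified sketch: full-length matches only, mismatches only,
    and 'N' in the adapter matches any base.
    """
    max_err = math.floor(error_rate * len(adapter))
    for start in range(len(read) - len(adapter) - sg_len + 1):
        window = read[start:start + len(adapter)]
        errors = sum(a != "N" and a != b for a, b in zip(adapter, window))
        if errors <= max_err:
            end = start + len(adapter)
            return read[end:end + sg_len]
    return None


adapter = "TCTTGTGGAAAGGACGAAACACCN"
read = "ACGT" + "TCTTGTGGAAAGGACGAAACACC" + "G" + "ACGGAGGCTAAGCGTCGCAA" + "GTTTTAGAGC"
print(extract_sgrna(read, adapter))  # ACGGAGGCTAAGCGTCGCAA
```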

Identifier for non-targeting controls:

If your library contains non-targeting controls, enter an identifier that marks them in the library spreadsheet. The identifier is a part of the gene_ID that is unique to the non-targeting controls (see example below). If your library does not contain non-targeting controls, enter “none”.

gene_ID sgRNA_ID Seq
Non_Target_0001 CustomLib34556 ACGGAGGCTAAGCGTCGCAA
Non_Target_0002 CustomLib34557 CGCTTCCGCGGCCCGTTCAA
Non_Target_0003 CustomLib34558 ATCGTTTCCGCTTAACGGCG

Example: An identifier in this case would be “Non_Target”.
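Identifying the controls amounts to a substring test on the gene ID. A hypothetical sketch, with library rows given as (gene_ID, sgRNA_ID, Seq) tuples and "GeneA" as a made-up targeting entry:

```python
def split_controls(library_rows, identifier="Non_Target"):
    """Split library rows into (non-targeting controls, targeting sgRNAs)."""
    if identifier.lower() == "none":
        return [], list(library_rows)  # library without non-targeting controls
    controls = [r for r in library_rows if identifier in r[0]]
    targeting = [r for r in library_rows if identifier not in r[0]]
    return controls, targeting


rows = [("Non_Target_0001", "CustomLib34556", "ACGGAGGCTAAGCGTCGCAA"),
        ("GeneA", "GeneA_1", "ATCGTTTCCGCTTAACGGCG")]
controls, targeting = split_controls(rows)
print(len(controls), len(targeting))  # 1 1
```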

Number of sgRNAs per gene:

Specifies the number of sgRNAs targeting a single gene (excluding non-targeting controls, miRNAs and other non-genes in your library).
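If you are unsure of the value, it can be read off the library itself. A hypothetical sketch that counts sgRNAs per gene among the targeting entries and reports the most common count (how PinAPL-Py treats genes with a different number of sgRNAs is not described here):

```python
from collections import Counter


def sgrnas_per_gene(targeting_rows):
    """Count sgRNAs per gene and return (per-gene counts, most common count)."""
    per_gene = Counter(r[0] for r in targeting_rows)
    most_common_n, _ = Counter(per_gene.values()).most_common(1)[0]
    return per_gene, most_common_n
```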

2 Description of the PinAPL-Py Analysis output

The PinAPL-Py output is organized logically into tabs and subtabs on the results page. In addition, all output can be downloaded as a single .zip file via the “Download Results Archive” button. Images are saved both as high-resolution .png files and as .svg vector graphics, which can be further processed in Adobe Illustrator or similar image processing software. Tables are saved as raw text (.tsv), but can be opened in Excel and saved as Excel spreadsheets. For convenience, PinAPL-Py can convert tables on the fly (see the “Table format” parameter on the configuration page), at the cost of additional computation time. NOTE for Windows users: Notepad++ is recommended for viewing text files (.txt/.tsv/.csv).

NOTE: When the analysis is run with two or more replicate samples for a condition, PinAPL-Py will show an additional sample for that condition (named "<condition name>_avg") where results are averaged across the replicates.

2.1 Enrichment/Depletion

Gene Rankings:

This tab contains the results of the gene ranking analysis in a sortable table. The columns are:

Results are sorted by number of significant sgRNAs by default.

NOTE: When the analysis is run with two or more replicate samples for a condition, PinAPL-Py will show an additional sample for that condition (named "<condition name>_combined") where p-values from the individual replicates are combined according to Fisher's method.
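Fisher's method sums -2 ln p over the k replicate p-values; under the null hypothesis this statistic follows a chi-squared distribution with 2k degrees of freedom, whose survival function has a closed form for even degrees of freedom. A standard-library sketch (scipy.stats.combine_pvalues with method='fisher' performs the same computation); p-values must be greater than 0:

```python
import math


def fisher_combine(pvalues):
    """Combine independent p-values with Fisher's method.

    X = -2 * sum(ln p_i) ~ chi-squared with 2k degrees of freedom, and
    P(X > x) = exp(-x/2) * sum_{j<k} (x/2)^j / j! for even degrees of freedom.
    """
    k = len(pvalues)
    half_x = -sum(math.log(p) for p in pvalues)  # X / 2
    return math.exp(-half_x) * sum(half_x**j / math.factorial(j) for j in range(k))


print(round(fisher_combine([0.05, 0.05]), 4))  # 0.0175
```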

sgRNA Rankings:

This tab contains the results of the sgRNA enrichment/depletion analysis. The columns are:


This plot shows information about the overall efficacy of sgRNAs targeting the same gene. Genes are categorized by the number of targeting sgRNAs reaching statistical significance. Genes having no significant sgRNAs are omitted.


This tab contains various plots visualizing the fraction of sgRNAs and genes that reached statistical significance in the ranking.

2.2 Statistics

Read Count Distribution:

This tab contains information about the statistical distribution of sgRNA read counts.

Read Count Dispersion:

This tab shows the distribution of read counts in the control samples. The data shown is used to estimate the parameters for the negative binomial distribution describing the read counts of each sgRNA.
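The exact estimator is not described here; methods following Anders and Huber (2010) typically model the variance as a function of the mean across many sgRNAs. As a simpler illustration of how a mean and a variance determine the negative binomial parameters, a per-sgRNA method-of-moments sketch using the relation Var = μ + μ²/r:

```python
from statistics import mean, variance


def nb_moments(counts):
    """Method-of-moments negative binomial fit (r, p) for one sgRNA.

    r = mu^2 / (Var - mu) and p = mu / Var; returns None when the counts
    are not overdispersed (Var <= mu).
    """
    mu, var = mean(counts), variance(counts)
    if var <= mu:
        return None
    return mu**2 / (var - mu), mu / var


print(nb_moments([10, 20, 30]))  # (5.0, 0.2)
```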


This tab summarizes the read alignment process.

Adapter Trimming:

This tab shows the log of the adapter trimming process, as reported by cutadapt. The output is explained in detail in the cutadapt manual.

Sequence Quality:

This tab contains graphs for sequence quality control (produced by FastQC). For the full FastQC output, click the “See full report” link.

Sequencing Depth:

This tab shows the sequencing depth (number of total reads) per sample. Results from the alignment analysis are superimposed on each bar.

2.3 Scatter Plots

Treatment vs Control:

Scatterplots of normalized sgRNA counts in each sample versus the average normalized count in the controls. sgRNAs reaching significant enrichment or depletion (depending on the screen type) compared to the control are marked in green.

Replicate Correlation:

Scatterplots showing the normalized sgRNA counts in one replicate of each condition versus another. Pearson and Spearman correlation coefficients are reported.
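Both coefficients can be computed with the standard library (statistics.correlation, available from Python 3.10, computes Pearson's r). A self-contained sketch, with Spearman's coefficient obtained as the Pearson correlation of ranks, where tied values receive their average rank:

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values get the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


rep1, rep2 = [1, 2, 3, 4], [1, 4, 9, 16]
print(round(pearson(rep1, rep2), 3), spearman(rep1, rep2))  # monotone but not linear
```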

2.4 Heatmap

Clustering of all samples in the dataset, based on the most variable or most abundant/depleted sgRNAs (as set on the configuration page). Log10 normalized read counts are color-coded from lowest (yellow) to highest (red).

2.5 Run Info

Output Log:

This shows the program execution log.


This file shows the parameter settings used in the run.

Sample Names:

This table links file names and sample names (replicates of the same condition are numbered automatically).

3 References

Anders,S. and Huber,W. (2010) Differential expression analysis for sequence count data. Genome Biol., 11, R106.

Doench,J.G. et al. (2016) Optimized sgRNA design to maximize activity and minimize off-target effects of CRISPR-Cas9. Nat. Biotechnol., 34, 184-191.

Li,W. et al. (2014) MAGeCK enables robust identification of essential genes from genome-scale CRISPR/Cas9 knockout screens. Genome Biol., 1-12.