import igv_notebook
igv_notebook.init()
Basic Rendseq Usage
In this Notebook we run through a simple example of how to use rendseq
for peak calling, demoing both HMM peak fitting and a simple thresholding approach, on some raw reads stored in wig format. These raw reads are included as part of our package and can be found in the Example Data
folder. Note that we will make use of the library igv-jupyter
for visualizations here - but this is not inlcuded as part of rendseq
and will need to be seperately installed.
Step 0 - Misc Installs
For all the rendseq
related functionality - we will install it bit by bit below so it is easier to see what functionality comes from what part of the package. This section is just to import other useful libraries for this demo:
Step 1 - Load data
In rendseq.file_funcs
we have some functions for opening/creating relevant files. We will use that to extract our raw reads from the wig files in Example Data
. Wig files are files with read count-density as a function of nucleotide density. These files are generated from Rend-seq data by recording the 5’-mapping and 3’-mapping read density into two files. In total this means a single Rend-seq experiment gives rise to 4 wig files: the 3’ and 5’ mapping reads for both the forward and reverse strands. The different direction strands and ends can be dealt with seperately - in this tutorial we will focus on the forward strand 5’-end mapping reads.
from rendseq.file_funcs import open_wig
= '../Example_Data/example_wig_Jean_5f_subset.wig'
example_wig = open_wig(example_wig) # chrom information will be used when we save either our z-scores or peaks as wig files again and for the IGV browser:
reads, chrom
= igv_notebook.Browser(
igv_browser
{"reference":
{"id": chrom,
"name": "B. subtilus (NC_000964.3)",
"fastaURL": "https://storage.googleapis.com/rendseq_genomes/NC_000964.3.fasta"
},"locus" : f"{chrom}:1,930,147-2,285,461",
"tracks": [
{"name": "Genes",
"type": "annotation",
"format": "gff3",
"url": "https://storage.googleapis.com/rendseq_genomes/NC_000964.3_B_subtilis_168_genes.gff3",
}]
} )
Step 2 - Transform the raw data
Our peak finding algorithms require that the raw data is transformed using our modified z-score transformation first. (Visit our algorithm page to learn more about our modified z score). The function we use to do this is in rendseq.zscores
.
from rendseq.zscores import z_scores
= z_scores(reads) # if desired z_score_transformed_data can be saved as a wig file and can be loaded and viewed in igv or a similar browser. z_score_transformed_data
Step 3 - Fit some peaks! (Part 1)
Now that we have z_score transformed data, we can find our peaks. We will do this using two approaches - thresh
- where all positions with a score above a threshold are called as peaks and hmm_peaks
which will use a Hidden Markov Model to annotate our sequence with as peak/not peak. Lets start with the thresholded approach:
from rendseq.make_peaks import thresh_peaks
= thresh_peaks(z_score_transformed_data, thresh = 12) thresh_peaks_found
from rendseq.make_peaks import hmm_peaks
= hmm_peaks(z_score_transformed_data) hmm_peaks_found
{"name": "Raw 5' Mapping Reads",
"path": example_wig,
"format": "wig",
"color": "orange",
}]