Basic Rendseq Usage

In this Notebook we run through a simple example of how to use rendseq for peak calling, demoing both HMM peak fitting and a simple thresholding approach, on some raw reads stored in wig format. These raw reads are included as part of our package and can be found in the Example Data folder. Note that we will make use of the library igv-jupyter for visualizations here - but this is not inlcuded as part of rendseq and will need to be seperately installed.

Step 0 - Misc Installs

For all the rendseq related functionality - we will install it bit by bit below so it is easier to see what functionality comes from what part of the package. This section is just to import other useful libraries for this demo:

import igv_notebook
igv_notebook.init()

Step 1 - Load data

In rendseq.file_funcs we have some functions for opening/creating relevant files. We will use that to extract our raw reads from the wig files in Example Data. Wig files are files with read count-density as a function of nucleotide density. These files are generated from Rend-seq data by recording the 5’-mapping and 3’-mapping read density into two files. In total this means a single Rend-seq experiment gives rise to 4 wig files: the 3’ and 5’ mapping reads for both the forward and reverse strands. The different direction strands and ends can be dealt with seperately - in this tutorial we will focus on the forward strand 5’-end mapping reads.

from rendseq.file_funcs import open_wig

example_wig = '../Example_Data/example_wig_Jean_5f_subset.wig'
reads, chrom = open_wig(example_wig)  # chrom information will be used when we save either our z-scores or peaks as wig files again and for the IGV browser:

igv_browser= igv_notebook.Browser(
    {
        "reference":
        {
            "id": chrom,
            "name": "B. subtilus (NC_000964.3)",
            "fastaURL": "https://storage.googleapis.com/rendseq_genomes/NC_000964.3.fasta"
        },
        "locus" : f"{chrom}:1,930,147-2,285,461",
        "tracks": [
        {
            "name": "Genes",
            "type": "annotation",
            "format": "gff3",
            "url": "https://storage.googleapis.com/rendseq_genomes/NC_000964.3_B_subtilis_168_genes.gff3",
        }]
    }
)

Step 2 - Transform the raw data

Our peak finding algorithms require that the raw data is transformed using our modified z-score transformation first. (Visit our algorithm page to learn more about our modified z score). The function we use to do this is in rendseq.zscores.

from rendseq.zscores import z_scores

z_score_transformed_data = z_scores(reads)  # if desired z_score_transformed_data can be saved as a wig file and can be loaded and viewed in igv or a similar browser.

Step 3 - Fit some peaks! (Part 1)

Now that we have z_score transformed data, we can find our peaks. We will do this using two approaches - thresh - where all positions with a score above a threshold are called as peaks and hmm_peaks which will use a Hidden Markov Model to annotate our sequence with as peak/not peak. Lets start with the thresholded approach:

from rendseq.make_peaks import thresh_peaks

thresh_peaks_found = thresh_peaks(z_score_transformed_data, thresh = 12)

from rendseq.make_peaks import hmm_peaks

hmm_peaks_found = hmm_peaks(z_score_transformed_data)

        {
            "name": "Raw 5' Mapping Reads",
            "path": example_wig,
            "format": "wig",
            "color": "orange",
        }]