scRNAseq Ambient RNA Correction
The first step in the workflow, prior to analysis in R, is to correct the raw count matrices for ambient RNA contamination. This is performed using CellBender (Fleming et al. 2023).
Environment Setup
The exact conda environment used in this run can be recreated from this yml file with conda env create -f cellbender.yml
Alternatively, you can create the environment as follows:
## Installing cellbender
conda create -n cellbender python=3.7
conda activate cellbender
pip install cellbender==0.3.0
Running CellBender
The following shell script is submitted to a high-performance computing (HPC) cluster as a SLURM job array to process each of the 48 samples in parallel.
The script reads a cellbender_manifest.txt file from the data directory. The file has two tab-separated columns and no header row: sample_id and the path to the Cell Ranger output for each sample, as follows:
| sample_1 | path/to/sample_1/cellranger_output |
| sample_2 | path/to/sample_2/cellranger_output |
| sample_3 | path/to/sample_3/cellranger_output |
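The per-task lookup against this manifest can be sketched on its own, outside SLURM. This is a minimal illustration, assuming a tab-separated file with no header row; the temporary file, the fixed task_id (standing in for SLURM_ARRAY_TASK_ID) and the variable names are placeholders for demonstration and not part of the original script.

```shell
# Build a small example manifest (tab-separated, no header); paths are placeholders.
manifest=$(mktemp)
printf 'sample_1\tpath/to/sample_1/cellranger_output\n' >  "$manifest"
printf 'sample_2\tpath/to/sample_2/cellranger_output\n' >> "$manifest"
printf 'sample_3\tpath/to/sample_3/cellranger_output\n' >> "$manifest"

# Stand-in for ${SLURM_ARRAY_TASK_ID}; task N reads line N of the manifest.
task_id=2

# Pick column 1 (sample ID) and column 2 (Cell Ranger path) from the matching line.
sample=$(awk -F'\t' -v n="$task_id" 'NR == n {print $1}' "$manifest")
sample_path=$(awk -F'\t' -v n="$task_id" 'NR == n {print $2}' "$manifest")

echo "task ${task_id}: sample=${sample} path=${sample_path}"
rm -f "$manifest"
```

Because each array task reads the line whose number matches its task ID, a header row in the manifest would be treated as the first sample.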
#!/bin/bash
#SBATCH --account=
#SBATCH --job-name=Cellbender_scRNAseq
#SBATCH --output=logs/Cellbender_scRNAseq_%a.log
#SBATCH --partition=gpu
#SBATCH --mem=128G
#SBATCH --cpus-per-task=16
#SBATCH --gres=gpu:1
#SBATCH --time=04:00:00
#SBATCH --array=1-48
## Setting up environment
echo -e ">>> Start time $(date) <<<"
start_time=$(date +%s)
source activate cellbender
outputDir=../outputs/scRNASeq_Analysis/cellbender
mkdir -p "$outputDir"
## Importing samples info
samples_info=../data/cellbender_manifest.txt
sample=$(cut -f1 "$samples_info" | sed -n "${SLURM_ARRAY_TASK_ID}p")
path=$(cut -f2 "$samples_info" | sed -n "${SLURM_ARRAY_TASK_ID}p")
outputDir=${outputDir}/${sample}
mkdir -p $outputDir
echo -e ">>> Processing $sample <<<"
echo -e ">>> Sample path $path <<<"
echo -e ">>> Output in $outputDir <<<"
## Running CellBender
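## outputDir is a relative path, so after cd the output file is given relative to the new working directory
cd "$outputDir"
cellbender remove-background \
--cuda \
--input "${path}/raw_feature_bc_matrix.h5" \
--output "${sample}.h5"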
## Reporting time
echo -e ">>> End time $(date) <<<"
end_time=$(date +%s)
runtime_seconds=$((end_time - start_time))
runtime_minutes=$((runtime_seconds / 60))
echo "Total runtime: $runtime_minutes minutes"
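The array range in the script (1-48) has to match the number of rows in the manifest. One way to keep the two in sync is to derive the range from the manifest at submission time, since an --array value passed on the sbatch command line takes precedence over the #SBATCH directive. The sketch below is illustrative only: the script name run_cellbender.sh is hypothetical, the manifest is a temporary three-row example, and the command is printed rather than executed so the snippet runs without SLURM.

```shell
# Temporary three-row example manifest; the real file would be ../data/cellbender_manifest.txt.
manifest=$(mktemp)
printf 'sample_1\tpath/a\nsample_2\tpath/b\nsample_3\tpath/c\n' > "$manifest"

# Count non-empty lines: one array task per sample.
n_samples=$(grep -c . "$manifest")

# run_cellbender.sh is a hypothetical name for the job script shown above.
submit_cmd="sbatch --array=1-${n_samples} run_cellbender.sh"
echo "$submit_cmd"

rm -f "$manifest"
```

Before submitting, the logs directory named in the --output directive should already exist, because SLURM does not create missing directories for log files.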
References
Fleming, Stephen J, Mark D Chaffin, Alessandro Arduini, Amer-Denis Akkad, Eric Banks, John C Marioni, Anthony A Philippakis, Patrick T Ellinor, and Mehrtash Babadi. 2023. “Unsupervised Removal of Systematic Background Noise from Droplet-Based Single-Cell Experiments Using CellBender.” Nature Methods 20 (9): 1323–35.