Friday, March 12, 2021

Generate Manhattan plots with ggplot2

Since the Manhattan plot draws millions (sometimes tens of millions) of points, it is not wise to output the plots in vector format. To generate and save Manhattan plots, I usually output them in PNG format, basically using an R package 'qqman.' But it's not beautiful when enlarged, and more importantly, it's obviously inferior when lined up with other raster format graphics.
Finally, today I decided to combine ggplot2 and ggrastr to generate a beautiful Manhattan plot. Using ggrastr, we can rasterize plots but keep lines, text, and labels in vector format (link).

Launch R.
R
Load packages.
library(data.table)
library(ggplot2)
library(ggrastr)
library(patchwork)
Load EWAS result (consist of chr, pos, and p-value).
df=as.data.frame(fread("EWAS.res",header=T),stringsAsFactors=F)
head(df,4)
# chr   pos   pval
#   1 12345 0.9882
#   1 12349 0.6454
#   1 12368 0.8322
#   1 12552 0.2339
df$chr=factor(df$chr,levels=as.character(1:22))
Calculate cummulative value of position. This will be used as the position on X-axis on Manhattan plot.
df$pos2=df$pos
for(chr in 2:22){
 tmp=chr-1
 tmp2=max(as.numeric(df[df$chr==tmp,"pos2"]),na.rm=T)
 df[df$chr==chr,"pos2"]=df[df$chr==chr,"pos"]+tmp2
}
Calculate center positions of each chromosome on X-axis. This will be used to set the positions of chromosome numbers.
df2=data.frame(chr=1:22,center=NA,stringsAsFactors=F)
for(chr in 1:22){
 start=min(df[df$chr==chr,"pos2"],na.rm=T)
 end=max(df[df$chr==chr,"pos2"],na.rm=T)
 center=mean(c(start,end))
 df2[df2$chr==chr,"center"]=center
}
Make a repeating pattern of two colors. My favorite is:
colour1="#2e4057"
colour2="#8c8c8c"
Get Bonferroni-corrected threshold.
bonf=0.05/nrow(df)
Ready to plot.
g=ggplot(df,aes(x=pos2,y=-log10(pval),colour=as.character(chr)))+geom_hline(yintercept=-log10(bonf),linetype="dashed")+geom_point_rast(size=0.5)+scale_colour_manual(values=c("1"=colour1,"2"=colour2,"3"=colour1,"4"=colour2,"5"=colour1,"6"=colour2,"7"=colour1,"8"=colour2,"9"=colour1,"10"=colour2,"11"=colour1,"12"=colour2,"13"=colour1,"14"=colour2,"15"=colour1,"16"=colour2,"17"=colour1,"18"=colour2,"19"=colour1,"20"=colour2,"21"=colour1,"22"=colour2))+theme_classic()+theme(legend.position="NONE")+scale_x_continuous(breaks=df2$center,labels=c(1:15,"",17,"",19,"",21,""))+xlab("Chromosome")+ylab(expression(paste(-log[10],"(",italic(P), " value)")))+geom_hline(yintercept=-log10(bonf),linetype="dashed")
ggsave("mahnattan.pdf",g,width=6,height=3)
Points are generated in raster format while lines and texts are in vector format.

Tuesday, February 9, 2021

Trinity: salmon ... died with ret (256)

I had some trouble running de novo assembly using Trinity.
Error, cmd:
salmon --no-version-check index -t .../trinity_out_dir/read_partitions/Fb_0/CBin_161/c16184.trinity.reads.fa.out/Trinity.fasta.tmp -i .../trinity_out_dir/read_partitions/Fb_0/CBin_161/c16184.trinity.reads.fa.out/Trinity.fasta.tmp.salmon.idx --type quasi -k 25 -p 1
 died with ret (256) at .../trinityrnaseq-Trinity-v2.6.6/util/support_scripts/../../PerlLib/Process_cmd.pm line 19.
	Process_cmd::process_cmd("salmon --no-version-check index -t ...) called at .../trinityrnaseq-Trinity-v2.6.6/util/support_scripts/salmon_runner.pl line 23
Trinity run failed. Must investigate error above.
Trinity was installed with "brew install -v trinity". Version was 2.6.6 which is old. According to Google, it was probably due to a mismatch with the salmon version.
First, I tried docker and singularity but another error appeared.
Error, cmd: jellyfish count -t 8 -m 25 -s 100000000  --canonical  both.fa died with ret 9 at /usr/local/bin/trinityrnaseq/util/insilico_read_normalization.pl line 793.
Error, cmd: /usr/local/bin/trinityrnaseq/util/insilico_read_normalization.pl --seqType fq --JM 80G  --max_cov 200 --min_cov 1 --CPU 8 --output .../trinity_out_dir/insilico_read_normalization --max_CV 10000  --left FASTAfiles --right FASTAfiles --pairs_together  --PARALLEL_STATS   died with ret 512 at /usr/local/bin/trinityrnaseq/Trinity line 2826.
	main::process_cmd("/usr/local/bin/trinityrnaseq/util/insilico_read_normalization"...) called at /usr/local/bin/trinityrnaseq/Trinity line 3379
	main::normalize(..., ARRAY(0x55bbade24fd8), ARRAY(0x55bbade24f78)) called at /usr/local/bin/trinityrnaseq/Trinity line 3319
	main::run_normalization(200, ARRAY(0x55bbade24fd8), ARRAY(0x55bbade24f78)) called at /usr/local/bin/trinityrnaseq/Trinity line 1372
I have no idea to solve them. Then I tried another way: install Trinity using conda install. A newer version of Trinity (v2.11.0) is available.
conda install -c bioconda trinity
Finally, Trinity de novo assembly successfully worked on my mac.

"sleuth" hdf5. file accessibilty. unable to open file (Mac)

I was informed that an error occurs when loading kallisto files into sleuth using sleuth_prep().
"sleuth" hdf5. file accessibilty. unable to open file
When I tried to run the program, I got the same error. I couldn't find any useful solutions on google, so I tried to figure it out by myself.

In my case, this error seems to be caused by kallisto, not sleuth. Because kallisto's quant() command gave a warning message and the .hd5 file was not created.
Warning: kallisto was not compiled with HDF5 support so no bootstrapping will be performed. Run quant with --plaintext option or recompile with HDF5 support to obtain bootstrap estimates.

I think the problem was caused by a system update, but I couldn't figure out why. Probably, it can be handled if we can prepare a clean environment.

1. Create an independent clean conda environment. Here, I named it kallisto. And activate the environment. You can delete the environment when you no longer need it.
conda create -n kallisto -y
conda activate kallisto

2. Install kallisto and sleuth in the new environment.
conda install kallisto
conda install -c bioconda r-sleuth

3. Run kallisto as you always do. Then, you will probably get .hd5 and .tsv files.
Make sure that you use kallisto which was installed by above command.
kallisto quant --index=${rootdir}/ref/kallisto.idx --output-dir=SRR1551005 --bootstrap-samples=2 --threads=2 ${rootdir}/seq/SRR1551005_1.fastq ${rootdir}/seq/SRR1551005_2.fastq

4. Run sleuth as you always do.
R
library(sleuth)
so=sleuth_prep(...) # kallisto data was successfully loaded in my case 
...
q()

5. Deactivate the environment. Files will be retained.
conda deactivate

Hello blogger!

Hello blogger.

Generate Manhattan plots with ggplot2

Since the Manhattan plot draws millions (sometimes tens of millions) of points, it is not wise to output the plots in vector format. To gene...