Not very good at blogging for sure. But here is a trick I just found for which I have been fighting for a long time: how to align boxplot and violin plots in ggplot2? Nothing really tricky a priori thanks to the great “geom” system of ggplot2, but problems start to arise when grouping factors are involve. geom_boxplot will “stack” boxes side by side within a group, while violin_plot will set them apart. I finally found out this is due to default “dodging” behaviors, which can be fixed as follow:

p <- ggplot(data = dat, aes(x=group1, y=value))
p <- p + geom_violin(aes(fill=group2), position = position_dodge(width = 0.8))
p <- p + geom_boxplot(notch = FALSE, alpha = 0.5, aes(colour = group2), position = position_dodge(width = 0.8), width = 0.5)
p <- p + theme_bw() + guides(color = "none")
p

Sorry, no picture… but if you came across that problem you’d know what I’m talking about. And the trick is “position = position_dodge(width = 0.8))”!

Advertisements

Jumping between genomes

When it turns to do comparative genomics, a very common task is to look at homologous positions in several genomes. While this may sound as a trivial task, in practice it is not! Two challenges are to be faced when attempting to do so:

  1. Genomes to be compare have to be aligned,
  2. Looking up for individual coordinates can end up very inefficient when several thousands coordinates are to be searched.

Several tools have been proposed for tackling that issue. The most versatile one is the LiftOver utility from the UCSC genome browser utilities. Using LiftOver requires however multiple steps, and several (rather complex) protocoles are available online. One important note is that LiftOver should in principle be used only form comparing distinct assemblies of the same genome, not to compare genomes from distinct species for instance.

Confronted to the task, I recently implemented a simple approach in our MafFilter software (http://biopp.univ-montp2.fr/forge/maffilter). MafFilter can do all sort of processing of MAF files. With this new tool, one can simply convert between coordinates of one species to another based on a pairwise genome alignment of the two species (provided, of course, as a MAF file!). MafFilter takes coordinates from feature files in GFF, GTF or BedGraph, and outputs a table with the original coordinates and the translated ones. The syntax is rather short:

DATA=pairwise_aln
input.file=$(DATA).maf.gz
input.file.compression=gzip
input.format=Maf
output.log=$(DATA).maffilter-liftover.log
maf.filter= \
    LiftOver( \
        ref_species=species1, \
        target_species=species2, \
        feature.file=features_species1.bed, \
        feature.file.compression=none, \
        feature.format=BedGraph, \
        file=sp1_to_sp2.tln) \

One will typically want to combine coordinates, for instance in R:

features <- read.table("features_species1.bed")
tln <- read.table("sp1_to_sp2.tln")
features2 <- merge(features, tln, by.x = c("Chr", "Start", "Stop", "Strand"), by.y = c("chr.ref", "strand.ref", "begin.ref", "end.ref"), all = TRUE)

The conversion is relatively fast for hundreds of thousands of features to convert. The only requirement is a MAF file sorted according to the reference species (for instance using maf_project from the TBA package). Pairwise alignments can be obtained with BlastZ (for instance).