Jumping between genomes

When it turns to do comparative genomics, a very common task is to look at homologous positions in several genomes. While this may sound as a trivial task, in practice it is not! Two challenges are to be faced when attempting to do so:

  1. Genomes to be compare have to be aligned,
  2. Looking up for individual coordinates can end up very inefficient when several thousands coordinates are to be searched.

Several tools have been proposed for tackling that issue. The most versatile one is the LiftOver utility from the UCSC genome browser utilities. Using LiftOver requires however multiple steps, and several (rather complex) protocoles are available online. One important note is that LiftOver should in principle be used only form comparing distinct assemblies of the same genome, not to compare genomes from distinct species for instance.

Confronted to the task, I recently implemented a simple approach in our MafFilter software (http://biopp.univ-montp2.fr/forge/maffilter). MafFilter can do all sort of processing of MAF files. With this new tool, one can simply convert between coordinates of one species to another based on a pairwise genome alignment of the two species (provided, of course, as a MAF file!). MafFilter takes coordinates from feature files in GFF, GTF or BedGraph, and outputs a table with the original coordinates and the translated ones. The syntax is rather short:

maf.filter= \
    LiftOver( \
        ref_species=species1, \
        target_species=species2, \
        feature.file=features_species1.bed, \
        feature.file.compression=none, \
        feature.format=BedGraph, \
        file=sp1_to_sp2.tln) \

One will typically want to combine coordinates, for instance in R:

features <- read.table("features_species1.bed")
tln <- read.table("sp1_to_sp2.tln")
features2 <- merge(features, tln, by.x = c("Chr", "Start", "Stop", "Strand"), by.y = c("chr.ref", "strand.ref", "begin.ref", "end.ref"), all = TRUE)

The conversion is relatively fast for hundreds of thousands of features to convert. The only requirement is a MAF file sorted according to the reference species (for instance using maf_project from the TBA package). Pairwise alignments can be obtained with BlastZ (for instance).


4 thoughts on “Jumping between genomes

  1. When I used the LiftOver in MafFilter v1.1.2, I got the following error. Could you help me to solve this problem?
    Command: maffilter input.file=test.maf input.file.compression=none input.format=Maf output.log=test.maffilter.log maf.filter= \
    LiftOver( \
    ref_species=hg19, \
    target_species=mm9, \
    feature.file=test.bed, \
    feature.file.compression=none, \
    feature.format=BedGraph, \
    file=test.out.bed) \
    Error: bash: syntax error near unexpected token `(‘


  2. I have run the above command and got an error: “Unknown filter: LiftOver”.
    Would you mind helping to resolve this issue? Thank you.

    The detailed output:
    Parsing options:
    Parsing file option_new.bpp for options.
    Reading file………………………: ref_HetGla1mm10_swap.maf as Maf
    Output log file……………………: ref_HetGla1mm10_swap.maffilter-liftover.log
    Adding filter……………………..: LiftOver
    WARNING!!! Parameter verbose not specified. Default used instead: 1
    — Verbose………………………..: yes
    Unknown filter: LiftOver


    • Dear Ben,

      The LiftOver option is a recently developed feature and is not present in the current version. It will be included in the next one, in the mean time you have to compile maffilter (and the Bio++ libraries) from the GIT repository to use it.



Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s