SonicParanoid


SonicParanoid is a stand-alone software tool for the identification of orthologous relationships among multiple species.

For more details refer to the paper below:

From version 1.2, SonicParanoid uses Markov Clustering by default to infer ortholog groups. To use single-linkage clustering, as in the published version of SonicParanoid, pass the --single-linkage parameter.
Features

Fast

SonicParanoid, executed in the fast mode, predicted orthologous relationships for 40 eukaryotic proteomes in about 70 minutes, or in less than 5 minutes for 26 prokaryotes, using only 8 CPUs. Moreover, it processed the InParanoid8 input dataset, composed of 273 proteomes (246 eukaryotes), in about one and a half days (38 hours).

Accurate

SonicParanoid was tested using a benchmark proteome dataset from the Quest for Orthologs consortium, and the correctness of its predictions was evaluated using a public Orthology Benchmarking service. When compared to 13 other orthology prediction tools, SonicParanoid showed a balanced trade-off between precision and recall, with an accuracy comparable to that of well-established inference methods.

Easy to use

SonicParanoid only requires the Python programming language and the MMseqs2 alignment tool to be installed on your laptop or server. Its low hardware requirements make it possible to run SonicParanoid on modern laptop computers, while the "update" feature lets users maintain collections of orthologs by adding or removing species.


Get it from PyPI Source Code @


Installation

Hardware requirements

SonicParanoid requires a system with a 64-bit multi-core (at least 4) CPU and 8 Gigabytes of memory.
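
As a rough pre-flight check, the CPU part of this requirement can be sketched in Python. The function name and the threshold constant are hypothetical helpers, not part of SonicParanoid; the 64-bit test only inspects the Python interpreter, and the 8 Gigabytes of memory are not checked because the standard library offers no portable way to query them.

```python
import os
import sys

MIN_CPUS = 4  # "at least 4" cores, as stated in the requirement above


def meets_cpu_requirements(cpu_count, is_64bit):
    """Return True if the core count and 64-bit condition are both satisfied.

    Hypothetical helper: cpu_count may be None when it cannot be determined.
    """
    return bool(is_64bit and cpu_count is not None and cpu_count >= MIN_CPUS)


# sys.maxsize > 2**32 is True on a 64-bit Python build (an approximation
# of "64-bit CPU"); os.cpu_count() reports logical cores.
print(meets_cpu_requirements(os.cpu_count(), sys.maxsize > 2**32))
```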


Supported operating systems


Software requirements

Before installing SonicParanoid make sure that the following software is installed in your system:


Installation and test

Linux using a Python3 virtual environment (no root privileges required)

Apple MacOSX High Sierra using a Python3 virtual environment (no root privileges required)

Install GCC and Python3 (ver. 3.6 or above) on Linux
Skip this step if this software is already installed on your system.

Install GCC and Python3 (ver. 3.6 or above) on Apple MacOSX High Sierra

Usage

SonicParanoid can be executed through the command line by running the program sonicparanoid.
The command:

sonicparanoid --help
provides extra information on the command line parameters.


Input format

SonicParanoid input files must be valid FASTA formatted files containing protein sequences.
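
A quick sanity check of an input file before a long run might look like the sketch below. It is not SonicParanoid's own validation: the function name and the accepted alphabet (the 20 standard amino acids plus common ambiguity codes, `*` and `-`) are assumptions made for illustration.

```python
from typing import Iterable

# Assumed alphabet for illustration only; SonicParanoid and MMseqs2
# may accept or reject a different set of symbols.
PROTEIN_ALPHABET = set("ACDEFGHIKLMNPQRSTVWYBJOUXZ*-")


def check_protein_fasta(lines: Iterable[str]) -> int:
    """Return the number of records, or raise ValueError if malformed."""
    count = 0
    has_residues = True  # no header is pending before the first record
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue  # blank lines are ignored
        if line.startswith(">"):
            if not has_residues:
                raise ValueError(f"line {number}: previous header has no sequence")
            if len(line) == 1:
                raise ValueError(f"line {number}: empty header")
            count += 1
            has_residues = False
        else:
            if count == 0:
                raise ValueError(f"line {number}: sequence data before first header")
            bad = set(line.upper()) - PROTEIN_ALPHABET
            if bad:
                raise ValueError(f"line {number}: unexpected symbols {sorted(bad)}")
            has_residues = True
    if count == 0:
        raise ValueError("no FASTA records found")
    if not has_residues:
        raise ValueError("last header has no sequence")
    return count
```

Because the function takes any iterable of lines, it can be applied directly to an open file handle, for example `with open(path) as handle: check_protein_fasta(handle)`.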


Disk space requirements

To further speed up the computation of all-vs-all alignments, MMseqs2 generates index files for the input proteome files. These index files are relatively big (about 1 Gigabyte per input proteome), but SonicParanoid removes them automatically once the execution is completed.
Nevertheless, when running SonicParanoid on a laptop computer the available storage might be an issue. For this case SonicParanoid provides the optional parameter '--no-indexing', which disables the creation of these index files.
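
The arithmetic behind this decision can be sketched as follows, taking the "about 1 Gigabyte per input proteome" figure above as a rough estimate rather than a measured value. Both function names are hypothetical helpers and not part of SonicParanoid.

```python
import shutil

GB = 10 ** 9
INDEX_GB_PER_PROTEOME = 1.0  # rough figure from the text, not a measurement


def estimated_index_gb(n_proteomes: int) -> float:
    """Approximate temporary space needed by the MMseqs2 index files."""
    return n_proteomes * INDEX_GB_PER_PROTEOME


def should_skip_indexing(n_proteomes: int, output_dir: str = ".") -> bool:
    """Return True if the estimated index size exceeds the free disk space."""
    free_gb = shutil.disk_usage(output_dir).free / GB
    return free_gb < estimated_index_gb(n_proteomes)


# The 273 proteomes of the InParanoid8 dataset would need roughly 273 GB.
print(estimated_index_gb(273))
```

If `should_skip_indexing` returns True for the planned run, adding '--no-indexing' to the command line is the option the text describes.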


Execution example

SonicParanoid comes with a test input set composed of 4 bacterial proteomes. To test if SonicParanoid has been successfully installed type the following commands:

$ sonicparanoid-get-test-data -o .
# Retrieves the test proteomes and creates the test directories
$ cd sonicparanoid_test
$ sonicparanoid -i ./test_input -o ./test_output -m fast -t 4 --project-id my_first_run
The last of the above commands infers the orthologous relationships among the species whose proteomes, in FASTA format, are stored in test_input, using 4 CPUs in the fast mode, and stores the output in the directory /test_output/runs/my_first_run.
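
When SonicParanoid is launched from a script rather than typed by hand, the same command line can be assembled in Python. This is a minimal sketch: the helper name is hypothetical, and it only uses the parameters shown in the example above (-i, -o, -m, -t, --project-id).

```python
import shlex
from typing import List


def build_sonicparanoid_command(input_dir: str, output_dir: str, project_id: str,
                                threads: int = 4, mode: str = "fast") -> List[str]:
    """Assemble the argument list for the run shown in the example above."""
    return [
        "sonicparanoid",
        "-i", input_dir,
        "-o", output_dir,
        "-m", mode,
        "-t", str(threads),
        "--project-id", project_id,
    ]


command = build_sonicparanoid_command("./test_input", "./test_output", "my_first_run")
print(shlex.join(command))  # requires Python 3.8 or above
```

Passing the list to `subprocess.run(command, check=True)` would start the run and raise an exception if the program exits with an error.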


Update an existing run

SonicParanoid allows the update of a previously computed set of ortholog relations by adding and/or removing proteome files from the original input set. Suppose in the previous example we computed the ortholog relations amongst species A, B, C, and D and that we now want to remove C from the analysis.
This is done by copying A, B, and D into a new directory my_new_input (or by removing C from the original input directory) and using the same output directory, as follows:

$ sonicparanoid -i ./my_new_input -o ./test_output -m fast -t 4 --project-id three_species
The updated results will be stored in the /test_output/runs/three_species directory.
To add a new proteome to the analysis simply copy it to the directory containing the input files and run SonicParanoid again as above.
SonicParanoid will re-use the previously computed alignments and pairwise ortholog tables to minimize the required computation.
If some proteome files need to be modified (e.g., by adding or removing sequences, or just by changing the file name) but a complete new run is not wanted, set the --update-input-names parameter when running SonicParanoid. The database and input information will then be updated automatically.
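
Preparing the reduced input directory for the "remove C" scenario described above could be scripted as in the sketch below. The helper name is hypothetical, and the file names in the usage line are placeholders; the script only copies files and does not call SonicParanoid.

```python
import shutil
from pathlib import Path
from typing import Iterable, List


def prepare_update_input(original_dir: str, new_dir: str,
                         exclude: Iterable[str]) -> List[str]:
    """Copy every proteome file except the excluded ones into new_dir.

    Returns the sorted names of the copied files.
    """
    source = Path(original_dir)
    target = Path(new_dir)
    target.mkdir(parents=True, exist_ok=True)
    skipped = set(exclude)
    copied = []
    for path in sorted(source.iterdir()):
        if path.is_file() and path.name not in skipped:
            shutil.copy2(path, target / path.name)  # keeps file timestamps
            copied.append(path.name)
    return copied
```

For example, `prepare_update_input("./test_input", "./my_new_input", exclude=["C"])` would leave A, B, and D in my_new_input, ready for the update command shown earlier in this section.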


Output

At each execution SonicParanoid stores the execution information and results in a directory named /output/runs/my_project/, where my_project can optionally be set using the --project-id parameter.
For example, given the following execution of SonicParanoid

$ sonicparanoid -i ./my_input -o ./test_output --project-id my_first_run -t 4
The output directory structure will be as follows:
The directory alignments contains the computed alignment files, while orthologs_db contains pairwise ortholog tables that can be re-used at each update run.
These directories should never be manually modified, since these are used for updating ortholog tables.

Run directory

At each execution a main output directory is generated under /output/runs/ (my_first_run in our example). This directory contains information on the SonicParanoid execution settings (run_info.txt) and input files (species.tsv).
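
For scripts that post-process the results, the locations named in this section can be derived from the output directory and the project id. The sketch below assumes exactly the layout described in this document (runs/<project-id>/ containing run_info.txt, species.tsv, ortholog_groups and pairwise_orthologs); the helper name is hypothetical.

```python
from pathlib import Path
from typing import Dict


def run_layout(output_dir: str, project_id: str) -> Dict[str, Path]:
    """Return the expected paths for one run, following the layout in the text."""
    run_dir = Path(output_dir) / "runs" / project_id
    return {
        "run_dir": run_dir,
        "run_info": run_dir / "run_info.txt",       # execution settings
        "species": run_dir / "species.tsv",         # input files
        "ortholog_groups": run_dir / "ortholog_groups",
        "pairwise_orthologs": run_dir / "pairwise_orthologs",
    }


layout = run_layout("./test_output", "my_first_run")
print(layout["run_info"].as_posix())
```

The function only builds paths; checking `layout["run_info"].exists()` after a run is one way to confirm that the run directory was created where expected.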

Ortholog groups

The orthologs shared among the input species are stored in the directory named ortholog_groups under the main run directory (my_first_run in our example).
Following are the relevant output files related to the ortholog groups:

Pairwise ortholog tables

In addition to the ortholog groups, SonicParanoid provides an ortholog table for each pair of proteomes.
For example, given a run with N input proteomes, the directory pairwise_orthologs (under /output/runs/my_first_run) will contain an ortholog table for each of the N * (N - 1) / 2 possible proteome-proteome combinations.
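
The N * (N - 1) / 2 count can be checked by enumerating the unordered pairs directly. This sketch uses the species names A to D from the update example; the helper name is hypothetical, and how SonicParanoid names the table files is not assumed here.

```python
from itertools import combinations
from typing import Iterable, List, Tuple


def proteome_pairs(names: Iterable) -> List[Tuple]:
    """Return every unordered pair of distinct proteomes, in sorted order."""
    return list(combinations(sorted(names), 2))


pairs = proteome_pairs(["A", "B", "C", "D"])
print(len(pairs))  # 4 * 3 / 2 = 6
```

The quadratic growth is worth keeping in mind: for the 273 proteomes of the InParanoid8 dataset mentioned earlier, the same formula gives 37128 pairwise tables.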

Command line parameters

You can list all the available parameters by typing:

$ sonicparanoid --help
Following is a list of SonicParanoid's parameters and their use: