SonicParanoid is a stand-alone software tool for the identification of orthologous relationships among multiple species.

For more details refer to the paper below:



SonicParanoid, executed in the fast mode, predicted orthologous relationships for 40 eukaryotic proteomes in about 70 minutes, or in less than 5 minutes for 26 prokaryotes, using only 8 CPUs. Moreover, it processed the InParanoid8 input dataset, composed of 273 proteomes (246 eukaryotes), in about one and a half days (38 hours).



SonicParanoid was tested using a benchmark proteome dataset from the Quest for Orthologs consortium, and the correctness of its predictions was evaluated using a public Orthology Benchmarking service. When compared to 13 other orthology prediction tools, SonicParanoid showed a balanced trade-off between precision and recall, with an accuracy comparable to that of well-established inference methods.


Easy to use

SonicParanoid only requires the Python programming language and the MMseqs2 alignment tool to be installed on your laptop or server. The low hardware requirements make it possible to run SonicParanoid on modern laptop computers, while the "update" feature lets users maintain collections of orthologs that can be updated by adding or removing species.



Hardware requirements

SonicParanoid requires a system with a 64-bit multi-core (at least 4) CPU and 8 Gigabytes of memory.
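The stated minimums can be checked before installing. The following is only a rough sketch, not part of SonicParanoid: the function names are hypothetical, memory detection relies on `os.sysconf` (available on Linux and usually on macOS), and the 64-bit test looks at the Python interpreter rather than at the CPU itself.

```python
import os
import sys

MIN_CPUS = 4        # minimum core count stated above
MIN_MEMORY_GB = 8   # minimum memory stated above


def meets_requirements(cpus, memory_gb, is_64bit):
    """Return True if the given figures satisfy the stated minimums."""
    return bool(is_64bit) and cpus >= MIN_CPUS and memory_gb >= MIN_MEMORY_GB


def detect_memory_gb():
    """Best-effort physical memory in decimal gigabytes, or None if unknown."""
    try:
        pages = os.sysconf("SC_PHYS_PAGES")
        page_size = os.sysconf("SC_PAGE_SIZE")
    except (AttributeError, ValueError, OSError):
        return None  # e.g. on platforms without os.sysconf
    # Decimal GB is used so that a nominal 8 GB machine is not rejected
    # because the OS reports slightly less than 8 GiB.
    return pages * page_size / 1e9


if __name__ == "__main__":
    cpus = os.cpu_count() or 1
    memory_gb = detect_memory_gb()
    is_64bit = sys.maxsize > 2 ** 32  # True for a 64-bit Python interpreter
    print(f"CPUs: {cpus}, memory (GB): {memory_gb}, 64-bit Python: {is_64bit}")
    if memory_gb is None:
        print("Could not determine the amount of memory on this platform")
    elif meets_requirements(cpus, memory_gb, is_64bit):
        print("Meets the stated minimum requirements")
    else:
        print("Below the stated minimum requirements")
```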

Supported operating systems

Software requirements

Before installing SonicParanoid, make sure that the following software is installed on your system:

Installation and test

Linux using a Python3 virtual environment (no root privileges required)

Apple MacOSX High Sierra using a Python3 virtual environment (no root privileges required)

Install GCC and Python3 (ver. 3.6 or above) on Linux
Skip this step if this software is already installed on your system.

Install GCC and Python3 (ver. 3.6 or above) on Apple MacOSX High Sierra


SonicParanoid can be executed through the command line by running the program sonicparanoid.
The command:

$ sonicparanoid --help
provides extra information on the command line parameters.

Input format

SonicParanoid input files must be valid FASTA formatted files containing protein sequences.
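The source does not spell out which checks SonicParanoid itself applies, so the following is only a generic sanity check one could run on an input file beforehand. The function name and the accepted alphabet (the 20 standard amino acids plus the common codes B, J, O, U, X, Z and `*`) are assumptions made for this sketch.

```python
# Standard amino acids plus common ambiguity/extension codes and the stop symbol.
# This alphabet is an assumption of the sketch, not SonicParanoid's own rule.
PROTEIN_ALPHABET = set("ACDEFGHIKLMNPQRSTVWYBJOUXZ*")


def check_protein_fasta(path):
    """Return a list of problems found in a FASTA file (empty list: none found)."""
    problems = []
    header = None   # header of the record currently being read
    residues = 0    # residues seen so far in the current record
    n_records = 0
    with open(path) as handle:
        for line_no, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if header is not None and residues == 0:
                    problems.append(f"record '{header}' has no sequence")
                header = line[1:].strip()
                if not header:
                    problems.append(f"line {line_no}: empty header")
                residues = 0
                n_records += 1
            elif header is None:
                problems.append(f"line {line_no}: sequence data before first header")
            else:
                bad = set(line.upper()) - PROTEIN_ALPHABET
                if bad:
                    symbols = "".join(sorted(bad))
                    problems.append(f"line {line_no}: unexpected symbols '{symbols}'")
                residues += len(line)
    if header is not None and residues == 0:
        problems.append(f"record '{header}' has no sequence")
    if n_records == 0:
        problems.append("no FASTA records found")
    return problems
```

An empty return value only means that this sketch found nothing suspicious; it does not guarantee that the file will be accepted by SonicParanoid or MMseqs2.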

Disk space requirements

To further speed up the computation of all-vs-all alignments, MMseqs2 generates index files for the input proteome files. These index files are relatively big (about 1 Gigabyte per input proteome), but SonicParanoid removes them automatically once the execution is completed.
Nevertheless, when running SonicParanoid on a laptop computer the available storage might be an issue. For this case SonicParanoid provides an optional parameter called '--no-indexing', which skips the creation of the index files.
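A back-of-the-envelope check can help decide whether '--no-indexing' is worth using. The sketch below multiplies the number of input files by the rough 1 Gigabyte figure quoted above and compares the result with the free space on the target disk. The function names are hypothetical, and the assumption that index files end up on the same disk as the output directory is not confirmed by the text.

```python
import os
import shutil

INDEX_GB_PER_PROTEOME = 1.0  # rough figure quoted above; real sizes vary


def count_input_files(input_dir):
    """Count regular, non-hidden files in the input directory."""
    return sum(
        1
        for entry in os.scandir(input_dir)
        if entry.is_file() and not entry.name.startswith(".")
    )


def estimated_index_gb(n_proteomes, gb_per_proteome=INDEX_GB_PER_PROTEOME):
    """Rough temporary disk space (in GB) needed for the index files."""
    return n_proteomes * gb_per_proteome


def indexing_may_not_fit(input_dir, output_dir):
    """Return True if the estimated index size exceeds the free space.

    Assumes (unverified) that the index files are written to the disk
    that holds output_dir.
    """
    needed_gb = estimated_index_gb(count_input_files(input_dir))
    free_gb = shutil.disk_usage(output_dir).free / 1e9
    return needed_gb > free_gb
```

If `indexing_may_not_fit` returns True, adding '--no-indexing' to the command line is the option described above.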

Execution example

SonicParanoid comes with a test input set composed of 4 bacterial proteomes. To test whether SonicParanoid has been successfully installed, type the following commands:

$ sonicparanoid-get-test-data -o .   # retrieves the test proteomes and creates the test directories
$ cd sonicparanoid_test
$ sonicparanoid -i ./test_input -o ./test_output -m fast -t 4 --project-id my_first_run
The last of the above commands infers the orthologous relationships among the species whose proteomes (in FASTA format) are stored in test_input, using 4 CPUs in the fast mode, and stores the output in the directory ./test_output/runs/my_first_run.
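When runs are launched from a script, the same command line can be assembled in Python. This is a minimal sketch: `build_command` and `run_sonicparanoid` are hypothetical helper names, and only the flags shown in the example above (`-i`, `-o`, `-m`, `-t`, `--project-id`) are used.

```python
import shlex
import subprocess


def build_command(input_dir, output_dir, mode="fast", threads=4, project_id=None):
    """Assemble the sonicparanoid command line as an argument list."""
    cmd = [
        "sonicparanoid",
        "-i", input_dir,
        "-o", output_dir,
        "-m", mode,
        "-t", str(threads),
    ]
    if project_id:
        cmd += ["--project-id", project_id]
    return cmd


def run_sonicparanoid(cmd):
    """Run the command; raises CalledProcessError on a non-zero exit status."""
    return subprocess.run(cmd, check=True)


if __name__ == "__main__":
    cmd = build_command("./test_input", "./test_output", project_id="my_first_run")
    # Print the command instead of running it, so the sketch works even
    # when sonicparanoid is not installed.
    print(shlex.join(cmd))
```

Passing the arguments as a list avoids shell quoting problems when directory names contain spaces.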

Update an existing run

SonicParanoid can update a previously computed set of ortholog relations when proteome files are added to and/or removed from the original input set. Suppose that in the previous example we computed the ortholog relations among species A, B, C, and D, and that we now want to remove C from the analysis.
This is done by copying A, B, and D into a new directory my_new_input (or by removing C from the original input directory) and running SonicParanoid with the same output directory, as follows:

$ sonicparanoid -i ./my_new_input -o ./test_output -m fast -t 4 --project-id three_species
The updated results will be stored in the ./test_output/runs/three_species directory.
To add a new proteome to the analysis, copy it to the directory containing the input files and run SonicParanoid again as above.
SonicParanoid will reuse the previously computed alignments and pairwise ortholog tables to minimize the required computation. If some proteome files need to be modified (e.g., sequences added or removed, or just the file name changed) but a complete new run is not wanted, the --update-input-names parameter can be set when running SonicParanoid. The database and input information will then be updated automatically.
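Preparing the reduced input directory from the example (A, B, and D without C) can be scripted. The sketch below copies every file except the excluded ones into a new directory; the helper name `copy_inputs_except` is hypothetical, and it assumes that each proteome is a single file directly inside the input directory.

```python
import os
import shutil


def copy_inputs_except(src_dir, dst_dir, exclude):
    """Copy all regular files from src_dir to dst_dir, skipping names in exclude.

    Returns the sorted list of copied file names.
    """
    os.makedirs(dst_dir, exist_ok=True)
    copied = []
    for entry in sorted(os.scandir(src_dir), key=lambda e: e.name):
        if entry.is_file() and entry.name not in exclude:
            shutil.copy2(entry.path, os.path.join(dst_dir, entry.name))
            copied.append(entry.name)
    return copied


if __name__ == "__main__":
    # Example usage, assuming ./test_input holds files named A, B, C, and D.
    if os.path.isdir("./test_input"):
        print(copy_inputs_except("./test_input", "./my_new_input", exclude={"C"}))
```

The new directory can then be passed with `-i ./my_new_input` while keeping the same `-o` directory, as in the update command above.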


At each execution SonicParanoid stores the execution information and results in a directory named /output/runs/my_project/, where my_project can optionally be set using the --project-id parameter.
For example, given the following execution of SonicParanoid

$ sonicparanoid -i ./my_input -o ./test_output --project-id my_first_run -t 4
The output directory structure will be as follows:

The directory alignments contains the computed alignment files, while orthologs_db contains the pairwise ortholog tables that can be reused at each update run.
These directories should never be modified manually, since they are used for updating the ortholog tables.

Run directory

At each execution a main output directory is generated under /output/runs/ (my_first_run in our example). This directory contains information on the SonicParanoid execution settings (run_info.txt) and input files (species.tsv).

Ortholog groups

The orthologs shared among the input species are stored in the directory named ortholog_groups under the main run directory (my_first_run in our example).
Following are the relevant output files related to the ortholog groups:

Pairwise ortholog tables

In addition to the ortholog groups, SonicParanoid provides an ortholog table for each pair of proteomes.
For example, given a run with N input proteomes, the directory pairwise_orthologs (under /output/runs/my_first_run) will contain an ortholog table for each of the N * (N - 1) / 2 possible proteome-proteome combinations.
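To make the N * (N - 1) / 2 count concrete, the short sketch below enumerates the unordered proteome pairs for a given list of names. The function names are hypothetical, and nothing is assumed here about how SonicParanoid names the table files.

```python
from itertools import combinations


def expected_table_count(n_proteomes):
    """Number of unordered proteome pairs: N * (N - 1) / 2."""
    return n_proteomes * (n_proteomes - 1) // 2


def proteome_pairs(names):
    """List every unordered pair of proteome names, in sorted order."""
    return list(combinations(sorted(names), 2))


if __name__ == "__main__":
    species = ["A", "B", "C", "D"]
    pairs = proteome_pairs(species)
    for first, second in pairs:
        print(first, second)
    # The enumeration and the closed formula must agree.
    assert len(pairs) == expected_table_count(len(species))
```

With the 4 test proteomes this gives 6 pairs; the 273 proteomes of the InParanoid8 dataset mentioned earlier give 273 * 272 / 2 = 37128 pairs.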

Command line parameters

You can list all the available parameters by typing:

$ sonicparanoid --help
Following is a list of SonicParanoid's parameters and their use:


SonicParanoid was benchmarked using the Orthology Benchmarking service from the QfO consortium.

Test data

SonicParanoid was tested using a benchmark proteome dataset from the Quest for Orthologs consortium (QfO), composed of 66 proteomes: 40 from eukaryotes, 5 from archaea, and 21 from bacteria.

QfO 2011 test dataset


Copyright © 2017, Salvatore Cosentino, The University of Tokyo. All rights reserved.
Licensed under the GNU GENERAL PUBLIC LICENSE, Version 3.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.


Salvatore Cosentino
Wataru Iwasaki