SonicParanoid


SonicParanoid is a stand-alone software tool for the identification of orthologous relationships among multiple species.

For more details refer to the paper below:

Features

Fast

In fast mode, using only 8 CPUs, SonicParanoid predicted the orthologous relationships among 40 eukaryotic proteomes in about 70 minutes, and among 26 prokaryotic proteomes in less than 5 minutes. Moreover, it processed the InParanoid8 input dataset, composed of 273 proteomes (246 of them eukaryotic), in about one and a half days (38 hours).


Accurate

SonicParanoid was tested using a benchmark proteome dataset from the Quest for Orthologs consortium, and the correctness of its predictions was evaluated using a public Orthology Benchmarking service. When compared with 13 other orthology prediction tools, SonicParanoid showed a balanced trade-off between precision and recall, with an accuracy comparable to that of well-established inference methods.


Easy to use

SonicParanoid only requires the Python programming language and the MMseqs2 alignment tool to be installed on your laptop or server. Its low hardware requirements make it possible to run SonicParanoid on modern laptop computers, while the "update" feature allows users to easily maintain collections of orthologs that can be updated by adding or removing species.


Get it from PyPI using Python pip:

$ pip install sonicparanoid

The source code is also available on Bitbucket.


Installation

Hardware requirements

SonicParanoid requires a system with a 64-bit multi-core CPU (at least 4 cores) and 8 gigabytes of memory.
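If you are unsure whether a machine meets these requirements, the following Python sketch performs a rough pre-flight check. It is not part of SonicParanoid and the function names are hypothetical. It assumes a POSIX system (Linux or macOS) for the memory query, and it uses the bitness of the running Python interpreter as a proxy for a 64-bit CPU.

```python
import os
import struct
from typing import Dict, Optional

# Thresholds taken from the hardware requirements above.
MIN_CORES = 4
MIN_MEMORY_GB = 8


def total_memory_gb() -> Optional[float]:
    """Physical memory in decimal GB via POSIX sysconf, or None if unavailable."""
    try:
        return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1e9
    except (AttributeError, ValueError, OSError):
        # os.sysconf does not exist on Windows; some platforms lack these names.
        return None


def check_hardware() -> Dict[str, bool]:
    """Hypothetical pre-flight check; True means the requirement looks satisfied."""
    memory = total_memory_gb()
    return {
        # Pointer size of this Python build: a 64-bit build implies a 64-bit CPU,
        # while a 32-bit build may under-report a 64-bit machine.
        "64-bit": struct.calcsize("P") * 8 == 64,
        # os.cpu_count() reports logical CPUs, which may include hyper-threads.
        "at least 4 CPUs": (os.cpu_count() or 0) >= MIN_CORES,
        "at least 8 GB of memory": memory is not None and memory >= MIN_MEMORY_GB,
    }


if __name__ == "__main__":
    for requirement, ok in check_hardware().items():
        print(f"{requirement}: {'OK' if ok else 'NOT MET'}")
```

A "NOT MET" for memory can also mean that the memory size could not be determined on your platform.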


Supported operating systems


Software requirements

Before installing SonicParanoid, make sure that the following software is installed on your system:


Installation and testing on tested operating systems

Linux using a Python3 virtual environment (no root privileges required)

Apple MacOSX High Sierra using a Python3 virtual environment (no root privileges required)

Ubuntu (ver. 17.10 or above)

Known Issues

SonicParanoid installs without errors, but the sonicparanoid program does not exist or does not appear in the command line.

CentOS (ver. 7 or above) [the installation on Red Hat Linux should be very similar]

Install software required by MMseqs2
Skip this step if you already have GCC 7 (or above) and cmake (3.10 or above) installed.
$ sudo yum groupinstall 'Development Tools' --assumeyes
$ sudo yum install centos-release-scl --assumeyes
$ sudo yum install epel-release --assumeyes
$ sudo yum install 'devtoolset-7-gcc*' --assumeyes    # install GCC 7
$ scl enable devtoolset-7 bash    # enable GCC 7
$ sudo yum install cmake3 --assumeyes

Install Python3 (ver. 3.6) if required
$ sudo yum install https://centos7.iuscommunity.org/ius-release.rpm --assumeyes    # requires root privileges
$ sudo yum install python36u python36u-pip python36u-devel --assumeyes    # requires root privileges

Install SonicParanoid
$ pip3.6 install -U pip setuptools cython sh numpy pandas biopython    # add 'sudo' in front of the command if you get a permission error
$ pip3.6 install sonicparanoid    # add 'sudo' in front of the command if you get a permission error

Compile MMseqs2 (the version supported by SonicParanoid)
$ sonicparanoid-get-mmseqs2 -o .    # extracts the MMseqs2 source code into the current directory
$ cd mmseqs2_src/build
$ cmake3 -DCMAKE_BUILD_TYPE=RELEASE -DCMAKE_INSTALL_PREFIX=. ..    # if the program is called just cmake, make sure its version is 3.10 or above
$ make    # compiles MMseqs2; this might take a few minutes
$ make install    # copies the MMseqs2 binary to mmseqs2_src/build/bin/mmseqs
$ sonicparanoid-set-mmseqs2 -i ./bin/mmseqs    # copies the MMseqs2 binary into the SonicParanoid package; add 'sudo' in front of the command if you get a permission error

Test the installation
$ sonicparanoid-get-test-data -o .    # retrieves the test proteomes and creates the test directories
$ cd sonicparanoid_test
$ sonicparanoid -i ./test_input -o ./test_output -m fast -t 4    # add 'sudo' in front of the command if you get a permission error (this can happen only at the first run)
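The first CentOS step can be skipped when GCC 7 (or above) and cmake 3.10 (or above) are already present. The Python sketch below, which is not shipped with SonicParanoid (all names are hypothetical), checks this by parsing the output of 'gcc -dumpversion' and 'cmake3 --version'; on systems where the binary is called cmake, pass that name instead.

```python
import re
import shutil
import subprocess
from typing import Optional, Sequence, Tuple

Version = Tuple[int, ...]


def parse_version(text: str) -> Optional[Version]:
    """Return the first dotted number in text, e.g. 'cmake3 version 3.17.5' -> (3, 17, 5)."""
    # The word boundaries skip digits glued to a name, such as the 3 in 'cmake3'.
    match = re.search(r"\b\d+(?:\.\d+)*\b", text)
    return tuple(int(part) for part in match.group().split(".")) if match else None


def tool_version(command: Sequence[str]) -> Optional[Version]:
    """Run a version command; None if the tool is not on PATH or prints no version."""
    if shutil.which(command[0]) is None:
        return None
    result = subprocess.run(list(command), capture_output=True, text=True)
    return parse_version(result.stdout)


def meets(found: Optional[Version], minimum: Version) -> bool:
    """Tuple comparison: (3, 17, 5) >= (3, 10) is True, (3, 9) >= (3, 10) is False."""
    return found is not None and found >= minimum


if __name__ == "__main__":
    print("GCC 7+:", meets(tool_version(["gcc", "-dumpversion"]), (7,)))
    print("cmake 3.10+:", meets(tool_version(["cmake3", "--version"]), (3, 10)))
```

Comparing tuples of integers avoids the classic string-comparison pitfall where "3.9" would sort after "3.10".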
Apple MacOSX

Xcode install and setup
Skip this step if you already have Xcode installed on your Mac.
Download and install the latest stable version of Xcode from https://developer.apple.com/download
$ xcode-select --install

Install Homebrew, Python3, gcc, and cmake
Skip this step if you have already installed this software.
$ ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
$ export PATH=/usr/local/bin:/usr/local/sbin:$PATH
$ brew install gcc    # make sure to install version 8.1 or higher
$ brew install cmake
$ brew install git python3    # skip if Python3 and pip3 are already installed

Install SonicParanoid
$ pip3 install -U cython    # skip if Cython is already installed on your Mac; add 'sudo -H' in front of the command if you get a permission error
$ pip3 install sonicparanoid    # add 'sudo -H' in front of the command if you get a permission error

Compile MMseqs2 (the version supported by SonicParanoid)
$ sonicparanoid-get-mmseqs2 -o .    # extracts the MMseqs2 source code into the current directory
$ cd mmseqs2_src/build
$ CXX="$(brew --prefix)/bin/g++-8" cmake -DCMAKE_BUILD_TYPE=RELEASE -DCMAKE_INSTALL_PREFIX=. ..    # if your GCC major version is not 8, replace the 8 in "g++-8" with the version number you are using
$ make    # compiles MMseqs2; this might take a few minutes
$ make install    # copies the MMseqs2 binary to mmseqs2_src/build/bin/mmseqs
$ sonicparanoid-set-mmseqs2 -i ./bin/mmseqs    # copies the MMseqs2 binary into the SonicParanoid package; add 'sudo' in front of the command if you get a permission error

Test the installation
$ sonicparanoid-get-test-data -o .    # retrieves the test proteomes and creates the test directories
$ cd sonicparanoid_test
$ sonicparanoid -i ./test_input -o ./test_output -m fast -t 4    # add 'sudo' in front of the command if you get a permission error (this can happen only at the first run)

Usage

SonicParanoid is executed from the command line by running the program sonicparanoid.
The command

$ sonicparanoid --help

provides extra information on the command-line parameters.


Input format

SonicParanoid input files must be valid FASTA formatted files containing protein sequences.

  1. File names must not contain dash (-) symbols, and must not have extensions.
  2. SonicParanoid automatically replaces blanks, tabs, and any greater-than (>) symbols other than the leading one in the FASTA headers with pipe (|) symbols.
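The two rules above can be checked before a run with a small Python sketch. The helper names are hypothetical and not part of SonicParanoid. The header rewrite only approximates the behaviour described above: it assumes each blank, tab, or non-leading '>' is replaced by its own '|' (consecutive blanks are not merged), and SonicParanoid's own implementation may differ.

```python
import re
from pathlib import Path
from typing import List


def check_proteome_name(path: str) -> List[str]:
    """Return the problems with a proteome file name (an empty list means OK)."""
    name = Path(path).name
    problems = []
    if "-" in name:
        problems.append("contains a dash (-)")
    # Heuristic: any dot is treated as the start of an extension.
    if "." in name:
        problems.append("has an extension")
    return problems


def sanitize_header(header: str) -> str:
    """Replace blanks, tabs, and non-leading '>' symbols with '|' (approximation)."""
    if not header.startswith(">"):
        raise ValueError("a FASTA header must start with '>'")
    return ">" + re.sub(r"[ \t>]", "|", header[1:])


if __name__ == "__main__":
    print(check_proteome_name("test_input/ecoli-k12.fasta"))
    # ['contains a dash (-)', 'has an extension']
    print(sanitize_header(">sp|P0A7V8 RS4_ECOLI\t30S>x"))
    # >sp|P0A7V8|RS4_ECOLI|30S|x
```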


Disk space requirements

To further speed up the computation of all-vs-all alignments, MMseqs2 generates index files for the input proteomes. These index files are relatively large (about 1.2 gigabytes per input proteome), but SonicParanoid removes them automatically once the execution is completed.
Nevertheless, when running SonicParanoid on a laptop computer, the available storage might be an issue. For such cases SonicParanoid provides an optional parameter called '--no-indexing'.
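Using the figure quoted above (about 1.2 gigabytes of index files per input proteome), a rough Python sketch can suggest when to pass '--no-indexing'. It assumes that the index files are written to the file system holding the output directory and that they all exist at the same time, so treat the result as a rough guide rather than a measurement; the helper names are hypothetical.

```python
import shutil
from typing import List

# Approximate size quoted above; decimal gigabytes (1e9 bytes) are assumed.
INDEX_GB_PER_PROTEOME = 1.2


def estimate_index_gb(n_proteomes: int) -> float:
    """Rough disk space taken by the MMseqs2 index files, in GB."""
    return round(n_proteomes * INDEX_GB_PER_PROTEOME, 1)


def indexing_flags(n_proteomes: int, output_dir: str = ".") -> List[str]:
    """Return ['--no-indexing'] when free space looks too small, otherwise []."""
    free_gb = shutil.disk_usage(output_dir).free / 1e9
    if free_gb < estimate_index_gb(n_proteomes):
        return ["--no-indexing"]
    return []


if __name__ == "__main__":
    print(estimate_index_gb(66))  # 79.2, e.g. for the 66-proteome QfO dataset
    print(indexing_flags(66))
```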


Execution example

SonicParanoid comes with a test input set composed of 4 bacterial proteomes. To test whether SonicParanoid has been installed successfully, type the following commands:

$ sonicparanoid-get-test-data -o .    # retrieves the test proteomes and creates the test directories
$ cd sonicparanoid_test
$ sonicparanoid -i ./test_input -o ./test_output -m fast -t 4

The last command infers, in fast mode and using 4 CPUs, the orthologous relationships among the species whose proteomes (in FASTA format) are stored in test_input, and writes the output to the test_output directory.
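The same run can be scripted from Python. The sketch below only assembles the flags used in this section ('-i', '-o', '-m', '-t') plus the '--no-indexing' option described earlier, and calls the sonicparanoid program through subprocess. The wrapper names are hypothetical, and it assumes that the mode names listed in the Benchmarks section (fast, default, sensitive) are the values accepted by '-m'; run 'sonicparanoid --help' to confirm them for your version.

```python
import shutil
import subprocess
from typing import List

# Assumption: these are the values accepted by -m (see the Benchmarks section).
MODES = ("fast", "default", "sensitive")


def build_command(input_dir: str, output_dir: str, mode: str = "fast",
                  threads: int = 4, no_indexing: bool = False) -> List[str]:
    """Assemble a sonicparanoid command line as a list of arguments."""
    if mode not in MODES:
        raise ValueError(f"unknown mode {mode!r}; expected one of {MODES}")
    if threads < 1:
        raise ValueError("threads must be a positive integer")
    cmd = ["sonicparanoid", "-i", input_dir, "-o", output_dir,
           "-m", mode, "-t", str(threads)]
    if no_indexing:
        cmd.append("--no-indexing")
    return cmd


def run_sonicparanoid(*args, **kwargs) -> None:
    """Hypothetical wrapper: run SonicParanoid and raise if it exits with an error."""
    if shutil.which("sonicparanoid") is None:
        raise FileNotFoundError("sonicparanoid is not on PATH")
    subprocess.run(build_command(*args, **kwargs), check=True)


if __name__ == "__main__":
    print(" ".join(build_command("./test_input", "./test_output")))
    # sonicparanoid -i ./test_input -o ./test_output -m fast -t 4
```

Calling run_sonicparanoid("./test_input", "./test_output") from inside sonicparanoid_test performs the same run as the last command above. Passing the arguments as a list (rather than a single shell string) avoids quoting problems with directory names.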


Output

Given a run with N input proteomes, the main output directory will have the following structure and content:


Command line parameters

You can list all the available parameters by typing:

$ sonicparanoid --help

The following is a list of SonicParanoid's parameters and their use:

Additional programs

SonicParanoid includes additional programs to help you post-process your output.
sonicparanoid-extract is a program to filter your ortholog groups by different criteria (e.g., group size or group IDs); it can also construct FASTA files from the selected groups.

You can use sonicparanoid-extract for the following tasks:


Type sonicparanoid-extract --help to obtain the complete list of parameters.
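To illustrate the kind of filtering sonicparanoid-extract performs, here is a small Python sketch that works on an in-memory, hypothetical representation of ortholog groups (group ID, then species, then protein IDs). It is not sonicparanoid-extract's implementation and does not read SonicParanoid's output files, whose layout is not described here. Group size is assumed to be the number of proteins in a group, and the FASTA header style is likewise an assumption.

```python
from typing import Dict, Iterable, List, Optional

# Hypothetical model: group ID -> species name -> protein IDs in that group.
Groups = Dict[str, Dict[str, List[str]]]
# Hypothetical sequence store: species name -> protein ID -> amino-acid sequence.
Sequences = Dict[str, Dict[str, str]]


def group_size(group: Dict[str, List[str]]) -> int:
    """Number of proteins in a group, summed over all species."""
    return sum(len(ids) for ids in group.values())


def filter_groups(groups: Groups, min_size: int = 0, max_size: Optional[int] = None,
                  ids: Optional[Iterable[str]] = None) -> Groups:
    """Keep groups whose size lies in [min_size, max_size] and, if given, whose ID is in ids."""
    wanted = set(ids) if ids is not None else None
    selected: Groups = {}
    for gid, group in groups.items():
        if wanted is not None and gid not in wanted:
            continue
        size = group_size(group)
        if size < min_size or (max_size is not None and size > max_size):
            continue
        selected[gid] = group
    return selected


def group_to_fasta(group: Dict[str, List[str]], sequences: Sequences) -> str:
    """Write one group as FASTA text, using '>species|protein' headers."""
    lines = []
    for species, protein_ids in group.items():
        for pid in protein_ids:
            lines.append(f">{species}|{pid}")
            lines.append(sequences[species][pid])
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    groups = {"1": {"ecoli": ["a", "b"], "bsub": ["c"]},
              "2": {"ecoli": ["d"], "bsub": ["e"]}}
    seqs = {"ecoli": {"d": "MKV"}, "bsub": {"e": "MAL"}}
    print(list(filter_groups(groups, min_size=3)))  # ['1']
    print(group_to_fasta(groups["2"], seqs), end="")
```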

Benchmarks

SonicParanoid was benchmarked in its three execution modes (fast, default, and sensitive), using the Orthology Benchmarking service from the QfO consortium.

Test data

SonicParanoid was tested using a benchmark proteome dataset from the Quest for Orthologs consortium (QfO), composed of 66 proteomes: 40 from eukaryotes, 5 from archaea, and 21 from bacteria.

QfO 2011 test dataset

License

Copyright © 2017, Salvatore Cosentino, The University of Tokyo. All rights reserved.
Licensed under the GNU GENERAL PUBLIC LICENSE, Version 3.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
https://www.gnu.org/licenses/gpl-3.0.en.html
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Contact

Salvatore Cosentino
salvocos@bs.s.u-tokyo.ac.jp
salvo981@gmail.com
Wataru Iwasaki
iwasaki@bs.s.u-tokyo.ac.jp
IWASAKI Lab.