Composition Profiler - version 1.1 (April 2007)

Copyright (c) 2007 Vladimir Vacic, Vladimir N. Uversky, A. Keith Dunker,
Stefano Lonardi.

Composition Profiler incorporates portions of code from the Cephes Math
Library; detailed licensing information can be found in the LICENSE.txt
file.


CONTENTS:

cgi-bin  - This directory contains Ruby scripts for generating
           composition profiles. The main script for the web (CGI)
           application is profiler.cgi, and the main scripts for the
           command line application are cdiscover.rb and cprofile.rb.

datasets - Datasets used to build the examples on the web page and to
           compute standard protein database statistics.

html     - Help, examples and credits HTML pages.

cc       - C code used for bootstrapping and to compute statistical
           significance of the difference and relative entropy between
           two samples.


SYSTEM REQUIREMENTS:

Composition Profiler requires the Ruby interpreter, GhostScript (a
PostScript interpreter) and ImageMagick. All three programs are installed
by default on most Linux systems; if they are not installed, they can be
downloaded free of charge from:

Ruby        - http://www.ruby-lang.org
GhostScript - http://www.cs.wisc.edu/~ghost
ImageMagick - http://www.imagemagick.org

The Ruby interpreter is normally in the system path (type "ruby -v" at the
system prompt to verify this). GhostScript and ImageMagick are usually in
the system path as well; if they are not, the locations of the binaries
can be specified in the cprof.conf file.

In addition to these three, the web version requires a running web
server. Composition Profiler has been tested on Apache, using the Ruby
module, which can be downloaded from:

mod_ruby    - http://www.modruby.net

Composition Profiler was tested with Ruby version 1.8, GhostScript 8.56,
ImageMagick 6.3.4 and mod_ruby 1.2.5 on Fedora Core and Ubuntu Linux
distributions and on Mac OS X.

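
The system path check described above can be scripted. The following is
a minimal sketch, not part of Composition Profiler: the helper name
find_executable is hypothetical, and the executable names "gs"
(GhostScript) and "convert" (ImageMagick 6) are assumptions about a
typical installation that may differ on your system.

```ruby
# Hypothetical helper (not part of Composition Profiler): returns the full
# path of the first executable called `name` found in `path`, or nil.
def find_executable(name, path = ENV["PATH"].to_s)
  path.split(File::PATH_SEPARATOR).each do |dir|
    candidate = File.join(dir, name)
    return candidate if File.file?(candidate) && File.executable?(candidate)
  end
  nil
end

# Assumed executable names: "ruby", "gs" (GhostScript) and "convert"
# (ImageMagick 6); adjust them if your installation uses different names.
%w[ruby gs convert].each do |tool|
  location = find_executable(tool)
  puts format("%-8s %s", tool, location || "not found in PATH")
end
```

Any tool reported as not found would need its location specified in the
cprof.conf file, as noted above.
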

C PROGRAMS:

Composition Profiler uses three programs written in C ("pvalue",
"frequency" and "rentropy") for computationally intensive calculations.
The source code of these three programs can be found in the /cc
directory. Before they can be used, they need to be compiled on the
platform on which they will be run. A Makefile is provided; it suffices
to type "make" at the command prompt in the /cc directory and copy the
executables into the directory with the Ruby scripts.


COMMAND LINE ARGUMENTS:

Usage: cdiscover.rb -Q [options]

Looks for statistically significant composition differences between two
sets.

Mandatory arguments:
  -Q

Optional arguments:
  -B or -D  One of the following:
              disprot       Disordered regions from DisProt 3.4
              pdbs25        PDB Select 25
              sprot         Proteins from SwissProt
              surface       Surface residues of monomers from PDB
            Defaults to sprot.
  -A        Significance value for the statistical test.
            Defaults to 0.05.
  -b        Bonferroni correction. Off by default.

------------

Usage: cprofile.rb -Q -O [options]

Creates a composition profile for the input FastA file.

Mandatory arguments:
  -Q
  -O        Output file name.

Optional arguments:
  -B or -D  One of the following:
              disprot       Disordered regions from DisProt 3.4
              pdbs25        PDB Select 25
              sprot         Proteins from SwissProt
              surface       Surface residues of monomers from PDB
            Defaults to sprot.
  -C        One of the following:
              alpha_n       Alpha helix frequency (N)
              amino         Amino color scheme
              aromatics     Aromatics
              beta_n        Beta structure frequency (N)
              bw            Black and white
              bulkiness_z   Bulkiness (Z)
              charge        Charge
              coil_n        Coil propensity (N)
              disorder_d    Disorder propensity (D)
              flex_v        Flexibility
              hydro_e       Hydrophobicity (E)
              hydro_kd      Hydrophobicity (K-D)
              hydro_fp      Hydrophobicity (F-P)
              interface_jt  Interface propensity (J-T)
              linker_gh     Linker propensity (G-H)
              polarity_z    Polarity (Z)
              shapley       Shapley color scheme
              size_d        Size (D)
              surface_j     Surface exposure (J)
              solvation_jt  Solvation potential (J-T)
              weblogo       Weblogo color scheme
            Defaults to bw.
  -F        Format of output (EPS, GIF, PDF, PNG, TXT).
            Defaults to PNG.
  -H        Height of output image. Defaults to 3.5".
  -I        Number of bootstrap iterations. Defaults to 10000.
  -R        Bitmap resolution. Defaults to 96.
  -S        Sorts residues in the increasing order of one of the
            physico-chemical or structural properties:
              alpha         Alphabetical order
              alpha_n       Alpha helix frequency (Nagano)
              diff          By observed differences
              beta_n        Beta structure frequency (Nagano)
              bulkiness_z   Bulkiness (Zimmerman)
              coil_n        Coil propensity (Nagano)
              flex_v        Flexibility (Vihinen)
              hydro_e       Hydrophobicity (Eisenberg)
              hydro_kd      Hydrophobicity (Kyte-Doolittle)
              hydro_fp      Hydrophobicity (Fauchere-Pliska)
              interface_jt  Interface propensity (Jones-Thornton)
              linker_gh     Linker propensity (George-Heringa)
              polarity_z    Polarity (Zimmerman)
              size_d        Size (Dawson)
              surface_j     Surface exposure (Janin)
              solvation_jt  Solvation potential (Jones-Thornton)
            Defaults to alphabetical order.
  -U        Chart dimensions units (cm, inch, pixel, point).
            Defaults to cm.
  -W        Width of output image. Defaults to 5".
  -X        Resolution units when bitmap resolution is specified
            (ppi, ppc, ppp). Defaults to ppi.
  -Y        Y-axis label.

Optional toggles (no values associated):
  -a        Toggle antialiasing.


COMMAND LINE EXAMPLES:

Simple command line examples for discovery and plotting of composition
anomalies for alpha-MoRF residues:

./cdiscover.rb -Q ../datasets/alpha_morf.fa -D pdbs25

------------

./cprofile.rb -Q ../datasets/alpha_morf.fa -O alpha.png -F PNG -D pdbs25 \
    -S flex_v -C disorder_d -Y "(Alpha MoRF - PDBS25) / PDBS25" -a

------------

./rentropy ../datasets/heterodimers.fa ../datasets/homodimers.fa 10000

The first line of the output is the relative entropy; the second line is
the p-value (details of estimating the p-value are given in the paper).


WEB APPLICATION SETTINGS:

Due to security concerns, we have separated the CGI scripts from the HTML
documents and images. CGI scripts are in a subdirectory of cgi-bin, and
HTML documents are in a subdirectory of the Apache web document root.
For an Apache web server running on a Linux system, assuming the default
Apache settings, those are "/var/www/cgi-bin" and "/var/www/html",
respectively.

For the Composition Profiler CGI script to be able to link to HTML
documents (such as help files, images, etc.), the relative path from the
CGI scripts directory to the HTML documents has to be specified in the
"path" variable in profiler.cgi. For example:

    path = "../../profiler/"

In addition to this, the profiler.cgi script needs to be configured to
write the output images into an Apache-writable directory under the
Apache web document root, so that they can be displayed to the user. This
is done using the "temp" variable. For example:

    temp = "/var/www/html/temp/"
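
Before starting the web application, it may help to confirm that the
directory assigned to "temp" exists and is writable. The following is a
minimal sketch, not part of profiler.cgi: the helper name
writable_directory? is hypothetical, and the check applies to the user
running the script, so it is only meaningful when run as the user that
Apache runs under on your system.

```ruby
# Hypothetical helper (not part of profiler.cgi): reports whether `dir`
# exists and is writable by the user running this script.
def writable_directory?(dir)
  File.directory?(dir) && File.writable?(dir)
end

# Example value taken from the text above; change it to match your setup.
temp = "/var/www/html/temp/"

if writable_directory?(temp)
  puts "#{temp} exists and is writable by the current user"
else
  puts "#{temp} is missing or not writable by the current user"
end
```
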