IBA LABORATORY, THE UNIVERSITY OF TOKYO
EGPC (v1.0): a powerful tool for data classification and identification of important features
Terms and conditions for use of the software: The owners (authors) of the software grant you a non-exclusive and non-transferable license to use the software and to modify it for your needs. However, whenever you use the software, you are requested to cite the journal paper listed below. This software is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Limitation of Liability:
IN NO EVENT SHALL THE AUTHORS (OWNERS) OF THE SOFTWARE BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE SOFTWARE (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE SOFTWARE TO OPERATE WITH ANY OTHER SOFTWARE).

Should you have any questions or comments, please contact: topon@ibalab or iba@ibalab. [ibalab=iba.k.u-tokyo.ac.jp].
Citation of the Paper:
@ARTICLE{Paul:2007:EGPC,
AUTHOR = {Topon Kumar Paul and Hitoshi Iba},
TITLE = {Prediction of cancer class with majority voting genetic programming classifier using gene expression data},
JOURNAL = {IEEE/ACM Transactions on Computational Biology and Bioinformatics},
YEAR = {2007},
VOLUME = {},
NUMBER = {},
PAGES = {}
}
OR
Topon Kumar Paul and Hitoshi Iba (2007). Prediction of cancer class with majority voting genetic programming classifier using gene expression data. To appear in IEEE/ACM Transactions on Computational Biology and Bioinformatics.
 
Download: All Files (in WinZip format) | EGPCpre.jar | EGPCcom.jar | EGPCgui.jar | Example Data Files (WinZip file)
Download Manual: Readme.pdf | Readme.html | Readme.txt
Note:
  • Before you execute the programs, read the manual: Readme.pdf | Readme.html | Readme.txt.
  • To run this program, you need the Java Runtime Environment (JRE) installed on your computer.
  • To execute EGPCgui.jar on the example data files, you must put those data files in the subdirectory "DataFile/" under the current working directory. If you launch the jar file by double-clicking on it, EGPC may sometimes not work properly.
 
EGPC is a multi-class classifier based on genetic programming and majority voting. Its main features are:
  • It runs in command line interface (CLI) and graphical user interface (GUI) modes;
  • It can be used for binary and multi-class classification;
  • It can handle microarray gene expression data as well as UCI machine learning (ML) databases (with no missing values);
  • It can handle numeric, nominal and Boolean (converted to numbers) features;
  • It can evolve rules with arithmetic and/or logical functions;
  • It can handle training subsets constructed by fixed or random split of data; and
  • It can handle training and validation data stored in two separate files, provided that they have the same number of attributes and the attributes are in the same order.
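
The majority-voting step mentioned above can be illustrated with a minimal Python sketch. It is not EGPC's actual code: the function name majority_vote and the tie-breaking rule (the first label to reach the top count wins) are assumptions made for illustration only.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label predicted by the most ensemble members.

    Ties go to the label that appears first in `predictions`
    (an assumption; the EGPC page does not state a tie-breaking rule).
    """
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label


# Three hypothetical ensemble members voting on a single sample.
print(majority_vote([1, 0, 1]))
```

With an ensemble of three members (the default given for -m), a binary problem can never tie, which is one reason odd ensemble sizes are convenient.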

Execution of EGPCpre.jar (for preprocessing of data) in CLI mode:
To run the program from the command prompt, type:
       java [-Xmx<heap size>] -jar EGPCpre.jar [arguments...]

Command line arguments and formats:
-Xmx<heap size>: maximum heap size; some data sets may require a larger heap. Example: -Xmx512m (m or M for megabytes).
-f <input file>: input data file name (with path if not in the current working directory); <input file> must be provided.
-o <output file>: output file name (with path if not in the current working directory); default: DataOut.txt.
-p <l:h:d:f>: preprocessing parameters; l=lower threshold, h=higher threshold, d=difference, f=fold change.
-n <normalization info>: normalization info; for log normalization, type G followed by the base (e.g. G10 or Ge); for linear normalization, type La:b, where a:b is the range.
-h <header info>: header info; G: first column contains gene IDs; S: first row contains sample IDs; GS or SG for both.
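
To make the -n format concrete, here is a small Python sketch of how a string such as G10, Ge or La:b could be parsed and applied. It is only an illustration of the notation: the helper names parse_normalization and normalize are hypothetical, and whether EGPCpre.jar rescales per gene or over the whole data set is not stated on this page (the sketch rescales the list it is given).

```python
import math


def parse_normalization(spec):
    """Parse a -n style string: 'G10', 'Ge' or 'La:b'."""
    if spec.startswith("G"):
        base = math.e if spec[1:] == "e" else float(spec[1:])
        return ("log", base)
    if spec.startswith("L"):
        low, high = (float(x) for x in spec[1:].split(":"))
        return ("linear", (low, high))
    raise ValueError(f"unrecognized normalization spec: {spec!r}")


def normalize(values, spec):
    """Apply log or linear (min-max) normalization to a list of numbers."""
    kind, params = parse_normalization(spec)
    if kind == "log":
        # Log normalization requires strictly positive values.
        return [math.log(v, params) for v in values]
    low, high = params
    vmin, vmax = min(values), max(values)
    if vmax == vmin:
        # Constant input: map everything to the lower bound.
        return [low for _ in values]
    scale = (high - low) / (vmax - vmin)
    return [low + (v - vmin) * scale for v in values]


print(normalize([2.0, 4.0, 6.0], "L0:1"))
```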

Example:
java -jar EGPCpre.jar -f "DataFile/BrainPre.txt" -o BrainPro.txt -p 20:16000:100:3 -n Ge -h GS
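
The page does not spell out how the four -p values are used. One common reading in microarray preprocessing, assumed here and not confirmed for EGPCpre.jar, is: clip every expression value to the range [l, h], then discard genes whose maximum minus minimum does not exceed d or whose maximum divided by minimum does not exceed f. The Python sketch below follows that reading with the values from the example command; the function name preprocess and the toy data are hypothetical.

```python
def preprocess(rows, lower, upper, min_diff, min_fold):
    """Clip values to [lower, upper], then keep only rows (genes) whose
    max - min exceeds min_diff and whose max / min exceeds min_fold.

    Assumes lower > 0 so that the fold-change division is safe.
    """
    kept = []
    for row in rows:
        clipped = [min(max(v, lower), upper) for v in row]
        lo, hi = min(clipped), max(clipped)
        if hi - lo > min_diff and hi / lo > min_fold:
            kept.append(clipped)
    return kept


# Parameters taken from the example command: -p 20:16000:100:3
lower, upper, min_diff, min_fold = (float(x) for x in "20:16000:100:3".split(":"))

genes = [
    [5.0, 30.0, 25.0],       # clipped to [20, 30, 25]: difference 10, dropped
    [50.0, 400.0, 20000.0],  # clipped to [50, 400, 16000]: kept
    [100.0, 180.0, 250.0],   # difference 150 but fold change 2.5, dropped
]
print(preprocess(genes, lower, upper, min_diff, min_fold))
```

Whether the comparisons are strict or inclusive is also an assumption; Readme.pdf is the authoritative description.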

Execution of EGPCcom.jar in CLI mode:
To run the program from the command prompt, type:
   java [-Xmx<heap size>] -jar EGPCcom.jar [arguments...]

Command line arguments and formats:
-u: UCIML format; default (if it is omitted) is Microarray format.
-d <data file>: data file name (with path if not in the current working directory); <data file> must be provided.
-v <validation file>: validation file name (with path if not in the current working directory); if it is not provided, the training information must be provided with the "-t" option below.
-s <sample size>: number of samples; must be provided.
-a <attribute size>: number of attributes; must be provided.
-A <attribute info>: attribute information; default is that all attributes are numeric. Refer to Readme.pdf file for details.
-t <training info>: training subset information; either the name of a file (with path if not in the current working directory) containing the indexes of the training samples, or the training size of each type of sample, delimited by colons (e.g. 179:106).
-c <classes>: number of classes; default is 2.
-m <ensemble size>: ensemble size; default is 3.
-F <functions>: functions to be used, delimited by colons (:); the default functions are "+:-:/:*:sqr:sqrt". Note that the function string must be enclosed in double quotation marks (" ").
-p <population size>: population size; default is 1000.
-g <max gen>: maximum number of generations; default is 50.
-r <max run>: number of trials (repetitions); default is 20.
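
As an illustration of how the options above fit together, the following Python sketch assembles an EGPCcom.jar command line from the documented flags. Everything specific in it is hypothetical: the helper build_egpc_command, the file name DataFile/Example.txt, and the sample and attribute counts. It handles only the colon-delimited form of -t (not the index-file form), and its two sanity checks (one training size per class, total not exceeding the number of samples) are assumptions rather than documented rules.

```python
import shlex


def build_egpc_command(data_file, samples, attributes, training_info,
                       classes=2, ensemble_size=3, uci_format=False,
                       heap=None):
    """Return the argument list for one EGPCcom.jar run."""
    sizes = [int(s) for s in training_info.split(":")]
    if len(sizes) != classes:
        raise ValueError("expected one training size per class")
    if sum(sizes) > samples:
        raise ValueError("training sizes exceed the number of samples")

    cmd = ["java"]
    if heap:
        cmd.append(f"-Xmx{heap}")  # JVM option, must come before -jar
    cmd += ["-jar", "EGPCcom.jar"]
    if uci_format:
        cmd.append("-u")
    cmd += ["-d", data_file, "-s", str(samples), "-a", str(attributes),
            "-t", training_info, "-c", str(classes),
            "-m", str(ensemble_size)]
    return cmd


# Hypothetical two-class UCI-format data set with 569 samples and 30 attributes.
cmd = build_egpc_command("DataFile/Example.txt", samples=569, attributes=30,
                         training_info="179:106", uci_format=True,
                         heap="512m")
print(shlex.join(cmd))
```

The printed line can be pasted into a command prompt; options left out (-F, -p, -g, -r) fall back to the defaults listed above.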
 

Execution of EGPCgui.jar in GUI mode:

Go to the command prompt and type:
      java [-Xmx<heap size>] -jar EGPCgui.jar

Some screenshots of the EGPC GUI are given below.
 
Copyright@2006, IBA Laboratory. All rights reserved. Last update: July 30, 2007 03:09:54 PM (Tokyo time).