by Topon Kumar Paul and Hitoshi Iba
9th MPS Symposium on Evolutionary Computation, IPSJ
Abstract. Estimation of Distribution Algorithms (EDAs) are a new area of Evolutionary Computation. In EDAs there is neither a crossover nor a mutation operator. A new population is generated by sampling a probability distribution, which is estimated from a database containing selected individuals of the previous generation. Different approaches have been proposed for the estimation of this probability distribution. In this paper I review different EDA approaches and present three solutions based on the Univariate Marginal Distribution Algorithm (UMDA) to optimization in binary and non-binary search spaces. Three well-known problems, the Subset Sum problem, the OneMax function and the n-Queen problem, are solved, and experimental results comparing UMDA (with and without local heuristics) and GAs are given. From my experiments it seems that UMDA may perform better for linear problems without dependencies among variables.
Keywords: Estimation of Distribution Algorithm, 2-opt local heuristic
1. Introduction

Genetic Algorithms (GAs) are optimization techniques based on selection and recombination of promising solutions. The collection of candidate solutions is called the population of a Genetic Algorithm, and the candidate solutions themselves are called individuals or chromosomes. Each individual is an encoded representation of the variables of the problem at hand, and each component (variable) of an individual is termed a gene. Sometimes the components (genes) are independent of one another, and sometimes they are correlated. In either case, communication and information exchange among the individuals of a population is maintained through the selection and recombination operators of the Genetic Algorithm. This kind of exchange helps to combine partial solutions into high-quality partial solutions called building blocks (BBs) (Holland, 1975; Goldberg, 1989). The behavior of a GA depends on the choice of the genetic operators (selection, crossover, mutation), the probabilities of crossover and mutation, the population size, the rate of generational reproduction, the number of generations, and so on. But the problem-specific interactions among the variables are seldom considered. As a result, the fixed two-parent recombination and evolution sometimes converge to a local optimum and provide solutions of inferior quality. To avoid the disruption of partial solutions, two-parent recombination can be replaced by generating new solutions according to the probability distribution of all promising solutions of the previous generation. This new approach is called Estimation of Distribution Algorithm (EDA). EDAs were introduced into the field of Evolutionary Computation by Mühlenbein and Paaß (1996).
2. Estimation of Distribution Algorithm (EDA)
In EDAs the problem-specific interactions among the variables of individuals are taken into consideration. In conventional Evolutionary Computation the interactions are kept in mind implicitly, whereas in EDAs the interrelations are expressed explicitly through the joint probability distribution associated with the individuals selected at each generation. The probability distribution is calculated from a database of selected individuals of the previous generation; the selection methods used in Genetic Algorithms may be used here. Sampling this probability distribution then generates offspring. Neither crossover nor mutation is applied in EDAs. But the estimation of the joint probability distribution associated with the database containing the selected individuals is not an easy task. The following is pseudocode for the EDA approach:
Step 1: Generate M individuals (the initial population) at random
Step 2: Repeat Steps 3-5 for l = 1, 2, … until the stopping criterion is met
Step 3: Select N ≤ M individuals from D_{l-1} according to the selection method
Step 4: Estimate the probability distribution p_{l}(x) of an individual being among the selected individuals
Step 5: Sample M individuals (the new population) from p_{l}(x)
3. Different EDA approaches

3.1 Without Dependencies

The easiest way to estimate the probability distribution is to consider all the variables of a problem as univariate. Then the joint probability distribution becomes the product of the marginal probabilities of the n variables, i.e.,

p_{l}(x) = ∏_{i=1}^{n} p_{l}(x_{i}).
Univariate Marginal Distribution Algorithm (UMDA) (Mühlenbein, 1998), Population Based Incremental Learning (PBIL) (Baluja, 1994) and Compact Genetic Algorithm (cGA) (Harik et al., 1998) consider no interaction among variables. In UMDA the joint probability distribution is factorized as a product of independent univariate marginal distributions, which are estimated from marginal frequencies:

p_{l}(x) = ∏_{i=1}^{n} p_{l}(x_{i}), with p_{l}(x_{i}) = Σ_{j=1}^{N} δ_{j}(X_{i} = x_{i} | D^{Se}_{l-1}) / N,

where δ_{j}(X_{i} = x_{i} | D^{Se}_{l-1}) = 1 if in the j-th case of D^{Se}_{l-1}, X_{i} = x_{i}; = 0 otherwise.
In PBIL the population of individuals is represented by a vector of probabilities: p_{l}(x) = (p_{l}(x_{1}), …, p_{l}(x_{i}), …, p_{l}(x_{n})), where p_{l}(x_{i}) refers to the probability of obtaining a 1 in the i-th component of D_{l}, the population of individuals in the l-th generation. At each generation M individuals are generated by sampling p_{l}(x) and the best N individuals are selected. The selected individuals are used to update the probability vector by a Hebbian-inspired rule:

p_{l+1}(x_{i}) = (1 − α) p_{l}(x_{i}) + α (1/N) Σ_{k=1}^{N} x^{k}_{i},

where α ∈ (0, 1] and x^{k}_{i} represents the value of x_{i} in the k-th selected individual. The update rule shifts the vector towards the best of the generated individuals.
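Assuming binary individuals, the update rule can be sketched in a few lines of Python (the toy probability vector, selected individuals and value of α below are illustrative, not from the paper):

```python
def pbil_update(p, selected, alpha=0.1):
    """One PBIL probability-vector update: shift p towards the
    component-wise average of the selected individuals by rate alpha."""
    n, N = len(p), len(selected)
    return [(1 - alpha) * p[i] + alpha * sum(ind[i] for ind in selected) / N
            for i in range(n)]

# toy example: two selected individuals, learning rate 0.5
p = pbil_update([0.5, 0.5, 0.5], [[1, 1, 0], [1, 0, 0]], alpha=0.5)
# p[0] moves up to 0.75, p[2] moves down to 0.25
```

Components where the selected individuals mostly carry a 1 drift towards 1, and vice versa, which is exactly the "shift towards the best individuals" described above.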
In cGA the vector of probabilities is initialized with a probability of 0.5 for each variable. Then two individuals are generated randomly using this vector of probabilities and ranked by evaluation. The probability vector p_{l}(x) is then updated towards the better one. This process of adaptation continues until the vector of probabilities converges.
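A minimal cGA sketch is given below. The step size (playing the role of 1/population-size), the convergence tolerance and the seed are my own illustrative choices:

```python
import random

def cga(fitness, n, step=0.05, seed=0):
    """Compact GA sketch: generate two individuals per step, compare them,
    and shift the probability vector towards the winner until every
    component has converged to (almost) 0 or 1."""
    rng = random.Random(seed)
    p = [0.5] * n                       # initialize every probability to 0.5
    while any(1e-3 < pi < 1 - 1e-3 for pi in p):
        a = [1 if rng.random() < pi else 0 for pi in p]
        b = [1 if rng.random() < pi else 0 for pi in p]
        if fitness(b) > fitness(a):
            a, b = b, a                 # a is now the better individual
        for i in range(n):              # move p towards the winner where they differ
            if a[i] != b[i]:
                p[i] = min(1.0, p[i] + step) if a[i] else max(0.0, p[i] - step)
    return [round(pi) for pi in p]

best = cga(sum, 10)                     # OneMax: fitness is the number of ones
```

On OneMax most components drift to 1, although, as with any cGA, individual bits can occasionally fix at the wrong value by drift.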
All the above-mentioned algorithms provide good results for problems with no significant interaction among variables (Mühlenbein, 1998; Harik et al., 1998; Pelikan and Mühlenbein, 1999), but they cannot provide good results for higher-order interactions.
3.2 Pairwise Dependencies

To handle pairwise interactions among variables, the population-based Mutual Information Maximizing Input Clustering (MIMIC) algorithm (De Bonet et al., 1997), Combining Optimizers with Mutual Information Trees (COMIT) (Baluja and Davies, 1997) and the Bivariate Marginal Distribution Algorithm (BMDA) (Pelikan and Mühlenbein, 1999) were introduced. Where there is at most second-order dependency among variables these provide good results, but that is far from the real world, where multiple interactions occur.
3.3 Multiple Dependencies
Factorized Distribution Algorithm (FDA) (Mühlenbein et al., 1998), Extended Compact Genetic Algorithm (ECGA) (Harik, 1999), Bayesian Optimization Algorithm (BOA) (Pelikan et al., 2000) and Estimation of Bayesian Network Algorithm (EBNA) (Larrañaga et al., 2000) can capture multiple dependencies among variables.
FDA uses a fixed factorization of the distribution of a uniformly scaled additively decomposed function to generate new individuals. It efficiently optimizes a class of binary functions which are too difficult for traditional GAs. FDA incrementally computes Boltzmann distributions by using Boltzmann selection. FDA converges in polynomial time if the search distribution can be formed so that the number of parameters used is polynomially bounded in n (Mühlenbein and Mahnig, 2002). But the problem of FDA is its requirement of a fixed distribution and an additively decomposed function.
BOA uses techniques for modeling data by Bayesian networks to estimate the joint probability distribution of the selected individuals. A new population is then generated based on this estimate. It uses the BDe (Bayesian Dirichlet equivalence) metric to measure the goodness of each structure. This Bayesian metric has the property that two structures reflecting the same conditional dependencies or independencies receive the same score. In BOA prior information about the problem can be incorporated to enhance the estimation and improve convergence (Pelikan et al., 2000). In order to reduce the cardinality of the search space, BOA imposes a restriction on the number of parents a node may have. For problems where a node may have more than 2 parents, the situation becomes complicated.
In ECGA the factorization of the joint probability is calculated as a product of marginal distributions of variable size. These marginal distributions of variable size are related to the variables that are contained in the same group and to the probability distribution associated with them. The grouping is carried out by a greedy forward algorithm that obtains a partition of the n variables. Each group of variables is assumed to be independent of the rest. So the factorization of the joint probability on n variables is:

p_{l}(x) = ∏_{c ∈ C_{l}} p_{l}(x_{c}),
where C_{l} denotes the set of groups in the l-th generation and p_{l}(x_{c}) represents the marginal distribution of the variables X_{c}, that is, the variables that belong to the c-th group in the l-th generation. ECGA uses model complexity and population complexity to measure the quality of the marginal distributions. It can cover any number of interactions among variables, but the problem is that it does not consider conditional probabilities, which is insufficient for highly overlapping cases.
In EBNA the joint probability distribution encoded by a Bayesian network is learnt from the database containing the selected individuals in each generation. The factorization can be written as:

p_{l}(x) = ∏_{i=1}^{n} p_{l}(x_{i} | pa_{i}),

where pa_{i} is the set of parents of the variable X_{i}. Different algorithms can be obtained by varying the structural search method. Two structural search methods are usually considered: score+search and detecting conditional (in)dependencies (EBNA_{PC}). In the score+search approach two scores are particularly used, among them the Bayesian Information Criterion (BIC) score (EBNA_{BIC}).
4. Univariate Marginal Distribution Algorithm (UMDA)
In UMDA it is assumed that there is no interrelation among the variables of the problem. Hence the n-dimensional joint probability distribution is factorized as a product of n univariate and independent probability distributions:

p_{l}(x) = ∏_{i=1}^{n} p_{l}(x_{i}).

Each univariate marginal distribution is estimated from marginal frequencies:

p_{l}(x_{i}) = Σ_{j=1}^{N} δ_{j}(X_{i} = x_{i} | D^{Se}_{l-1}) / N,

where δ_{j}(X_{i} = x_{i} | D^{Se}_{l-1}) = 1 if in the j-th case of D^{Se}_{l-1}, X_{i} = x_{i}; = 0 otherwise.
The pseudocode for UMDA is as follows:

Step 1: D_{0} ← Generate M individuals (the initial population) at random
Step 2: Repeat Steps 3-5 for l = 1, 2, … until the stopping criterion is met
Step 3: D^{Se}_{l-1} ← Select N ≤ M individuals from D_{l-1} according to the selection method
Step 4: Estimate the joint probability distribution p_{l}(x)
Step 5: D_{l} ← Sample M individuals (the new population) from p_{l}(x)
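The pseudocode above can be turned into a compact Python sketch. The population size, number of generations, truncation to the best half, and the use of OneMax as the test function are illustrative assumptions, not the paper's exact settings:

```python
import random

def umda(fitness, n, M=100, generations=100, seed=0):
    """UMDA sketch: the joint distribution is the product of univariate
    marginal frequencies estimated from the best half of the population
    (truncation selection); one elite individual survives each generation."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(M)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)        # best individuals first
        selected = pop[:M // 2]                    # N = M/2 selected individuals
        # univariate marginal frequencies p_l(x_i)
        p = [sum(ind[i] for ind in selected) / len(selected) for i in range(n)]
        # elitism: carry the best individual over, sample the rest from p_l(x)
        pop = [pop[0]] + [[1 if rng.random() < p[i] else 0 for i in range(n)]
                          for _ in range(M - 1)]
    return max(pop, key=fitness)

best = umda(sum, 30)                               # OneMax: fitness = number of ones
```

With these settings the marginals of a linear function such as OneMax quickly drift towards 1, illustrating why UMDA suits problems without dependencies among variables.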
González et al. have shown that an instance of UMDA with p_{l}(x) ≥ δ > 0 visits a population D* containing the global optimum infinitely often with probability 1, and that if the selection is elitist, UMDA converges to a population that contains the global optimum. But the joint probability distribution of UMDA can be zero for some x; for example, when the selected individuals at the previous step are such that x^{j}_{i} ≠ x_{i} for all j = 1, 2, …, N, then p_{l}(x_{i}) = 0 and hence p_{l}(x) = 0. So UMDA sometimes may not visit a global optimum (González et al.).
To overcome this problem the way of calculating the probabilities should be changed. One possible solution is to apply the Laplace correction (Cestnik, 1990), which keeps every estimated probability strictly positive.
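For a binary variable the Laplace correction simply adds 1 to the count and 2 to the denominator, so the estimate can never be exactly 0 or 1. A minimal sketch:

```python
def laplace_marginal(selected, i):
    """Laplace-corrected marginal for binary X_i (cf. Cestnik, 1990):
    (count + 1) / (N + 2), so no probability collapses to 0 or 1."""
    count = sum(ind[i] for ind in selected)
    return (count + 1) / (len(selected) + 2)

# even if every selected individual has X_0 = 0, p(X_0 = 1) stays positive
p = laplace_marginal([[0, 1], [0, 0], [0, 1]], 0)   # (0 + 1) / (3 + 2) = 0.2
```

Because the corrected probability is bounded away from zero, every point of the search space keeps a positive sampling probability in each generation.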
5. Applications of UMDA

Here I have applied the Univariate Marginal Distribution Algorithm to some well-known problems, namely the Subset Sum problem, the n-Queen problem and the OneMax function, with some local heuristics.
5.1 Subset Sum Problem

This is the problem of finding which subset of a list of integers has a given sum. Subset Sum is an integer relation problem where the relation coefficients are either 0 or 1. If there are n integers, the solution space consists of the 2^{n} subsets of the n integers. For small n exact solutions can be found by backtracking, but for larger n the state-space tree grows exponentially. It is an NP-complete problem.
In UMDA each individual (solution) is represented by an n-tuple (x_{1}, x_{2}, x_{3}, …, x_{n}) such that x_{i} ∈ {0, 1}, 1 ≤ i ≤ n. Then x_{i} = 0 if the i-th integer is not selected and x_{i} = 1 if it is selected. Each individual is evaluated by finding the difference between the expected sum and the sum of the selected integers in the individual; the smaller the difference, the better. The marginal probability distribution is calculated from the best half of the solutions with truncation selection.
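The evaluation just described can be sketched directly; the list of integers and the target below are a toy instance, not from the paper's experiments:

```python
def subset_sum_fitness(ind, weights, target):
    """Evaluation for the Subset Sum problem: the absolute difference
    between the expected sum and the sum of the selected integers
    (smaller is better; 0 means an exact solution)."""
    return abs(target - sum(w for x, w in zip(ind, weights) if x == 1))

weights = [3, 34, 4, 12, 5, 2]                            # toy instance
d = subset_sum_fitness([1, 0, 0, 0, 1, 1], weights, 10)   # 3 + 5 + 2 = 10, so d = 0
```

Since UMDA maximizes, one would either minimize this difference or negate it before ranking individuals.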
Using the same representation, initialization, evaluation and replacement strategy as for UMDA, I apply a GA to the problem. I use simple one-point crossover and mutation for the generation of offspring. Parents are selected randomly for crossover.
5.2 OneMax Function

The OneMax function returns the number of ones in an input string, i.e.,

F(X) = Σ_{i=1}^{n} X_{i}, where X = {X_{1}, X_{2}, …, X_{n}} and X_{i} ∈ {0, 1}.

It is a unimodal function with its optimum at X_{Opt} = {1, 1, …, 1}. It is a trivial function which is used as a test function for evaluating the performance of Genetic Algorithms and EDAs.
In my experiment I initialize the population randomly, select the best half for the calculation of the probability distribution, and use elitism for replacement. The results are satisfactory.
5.3 n-Queen Problem

The n-Queen problem is a classic combinatorial problem. The task is to place n queens on an n×n chessboard so that no two queens attack each other; that is, no two queens are on the same row, column, or diagonal. Let us number the rows and columns of the chessboard 1 through n, and the queens likewise. Since each queen must be placed on a different row, we can assume that queen i is to be placed on row i.
Let the solution be X = {X_{1}, X_{2}, …, X_{n}}, where X_{i} represents the column position in row i where queen i can be placed. As all queens must be on different columns, all X_{i}'s must be different. So the solution will be a permutation of the numbers 1 through n, and the solution space is drastically reduced to n!.
We can easily fulfill the first two constraints (no two queens on the same row or column) by allowing a distinct value for each X_{i}. But how do we test whether two queens at positions (i, j) and (k, l) [i, k = rows; j, l = columns] are on the same diagonal? We can test it by two observations:

Every element on a diagonal that runs from upper left to lower right has the same (row - column) value.

Every element on a diagonal that runs from upper right to lower left has the same (row + column) value.

So two queens are on the same diagonal if i - j = k - l or i + j = k + l.
These two equations can be rewritten as j - l = i - k and j - l = k - i. That is, if abs(j - l) = abs(i - k), then the two queens are on the same diagonal. So the pseudocode for testing the constraints for the queens at positions X_{i} and X_{j} is:

If (X_{i} = X_{j}) OR (abs(i - j) = abs(X_{i} - X_{j})) then return NOT_FEASIBLE;

This test requires n(n - 1)/2 comparisons to calculate the fitness of an individual.
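The diagonal test over all pairs can be sketched as follows (this variant counts attacking pairs, using 0-based indices, rather than counting non-attacking queens as the paper's fitness does):

```python
def conflicts(X):
    """Count attacking pairs for a permutation individual: queen i sits at
    row i, column X[i]; rows and columns are already distinct, so only
    the diagonal condition abs(i - j) == abs(X[i] - X[j]) is checked.
    Uses n(n-1)/2 comparisons."""
    n, bad = len(X), 0
    for i in range(n):
        for j in range(i + 1, n):
            if abs(i - j) == abs(X[i] - X[j]):
                bad += 1
    return bad

print(conflicts([1, 3, 0, 2]))   # a valid 4-queen placement: prints 0
print(conflicts([0, 1, 2, 3]))   # main diagonal: every pair attacks, prints 6
```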
5.3.1 n-Queen Problem by UMDA

With UMDA an individual in the population is represented as {X_{1}, X_{2}, …, X_{n}}, where each X_{i} represents the column in row i where the i-th queen is placed. The fitness of an individual is calculated as the number of queens at non-attacking positions.
The initial population of size M is generated randomly with the constraint that all values in an individual are distinct numbers from the set {1, 2, …, n}. By doing this I have implicitly satisfied the constraint that no two queens are on the same row or column. Fitness calculation is then just a check of whether two queens are on the same diagonal.
In each generation the best half of the individuals of the previous generation are selected for the calculation of the joint probability distribution using the marginal frequency of each X_{i}. Then M new individuals are generated. During the generation of each individual I have applied probabilistic modification to enforce the first constraint of n-Queen: when a variable is selected for a position, its probabilities for the remaining positions are set to zero and the probabilities of the non-selected variables for those positions are increased proportionately. Consider the following example:
Before selection of X_1               After selection of X_1
Position/Variable   1    2    3       1    2     3
X_1                 0.7  0.1  0.5     0.7  0     0
X_2                 0.1  0.6  0.1     0.1  0.65  0.35
X_3                 0.2  0.3  0.4     0.2  0.35  0.65

Table 1: Probabilistic modification example
If X_{1} is selected for the first position, then the probability of X_{1} for the 2nd and 3rd positions is set to zero and the probabilities of X_{2} and X_{3} increase correspondingly. The temporary table should look like the one on the right above. By doing this we can ensure that a distinct value is generated for each component of an individual if Roulette Wheel selection is used.
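One way to realize this sampling scheme is sketched below, using 0-based values and letting the roulette wheel renormalize the remaining probabilities implicitly; the handling of degenerate all-zero rows is my own assumption:

```python
import random

def sample_individual(P, rng=random):
    """Sample one permutation individual by probabilistic modification:
    for each variable a value is drawn by roulette wheel, and the chosen
    value is zeroed out for all remaining variables."""
    n = len(P)
    p = [row[:] for row in P]              # working copy of the probability table
    used, individual = set(), []
    for i in range(n):
        total = sum(p[i])
        if total == 0:                     # degenerate row: pick any unused value
            choice = rng.choice([v for v in range(n) if v not in used])
        else:
            r, acc, choice = rng.random() * total, 0.0, None
            for v in range(n):             # roulette wheel over the remaining values
                acc += p[i][v]
                if r < acc:
                    choice = v
                    break
            if choice is None:             # guard against floating-point shortfall
                choice = max(v for v in range(n) if p[i][v] > 0)
        used.add(choice)
        individual.append(choice)
        for k in range(i + 1, n):          # the chosen value is no longer available
            p[k][choice] = 0.0
    return individual

P = [[0.7, 0.1, 0.2], [0.1, 0.6, 0.3], [0.2, 0.3, 0.5]]   # rows: variables, cols: values
ind = sample_individual(P)                # always a permutation of 0, 1, 2
```

Because each chosen value is removed from the wheel for later variables, the result is guaranteed to be a permutation, which is exactly the distinct-value property described above.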
For the replacement strategy I have used elitism. The algorithm stops when the fitness of the best individual is n, that is, when all queens are at non-attacking positions, or when a certain number of generations has passed.
Fitness improvement heuristics can be applied to the problem and perform much better than blind search using UMDA. For combinatorial optimization the 2-opt algorithm (Croes, 1992) or Partially Matched Crossover (PMX) (Goldberg et al., 1987) is widely used. I have used the 2-opt algorithm. The resulting algorithm is:
Figure 1. Algorithm for solution of the n-Queen problem by UMDA with 2-opt local heuristic
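The 2-opt local heuristic for this representation can be sketched as follows (a sketch of the idea, not a reproduction of the paper's exact Figure 1): swap a pair of columns whenever the swap reduces the number of attacking pairs, and repeat until no improving swap remains.

```python
def diagonal_conflicts(X):
    # number of attacking (diagonal) pairs for a permutation individual
    return sum(1 for i in range(len(X)) for j in range(i + 1, len(X))
               if abs(i - j) == abs(X[i] - X[j]))

def two_opt(X, conflicts=diagonal_conflicts):
    """Greedy 2-opt-style improvement: swaps preserve the permutation
    property, so rows and columns remain conflict-free throughout."""
    X, best, improved = X[:], conflicts(X), True
    while improved and best > 0:
        improved = False
        for i in range(len(X)):
            for j in range(i + 1, len(X)):
                X[i], X[j] = X[j], X[i]        # tentative swap
                c = conflicts(X)
                if c < best:
                    best, improved = c, True   # keep the improving swap
                else:
                    X[i], X[j] = X[j], X[i]    # undo
    return X

X = two_opt([0, 1, 2, 3])   # starts with 6 attacking pairs, ends conflict-free
```

Being greedy, this local search can stall in a local optimum for larger n, which is why it is combined with UMDA's global sampling rather than used alone.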
5.3.2 n-Queen Problem by Genetic Algorithm
Using the same representation, initial population and fitness function, I apply a Genetic Algorithm with Partially Matched Crossover and swap mutation. The replacement strategy is elitism. The genetic algorithm for the problem is:
Step 1: Create the initial population of size M randomly, with the constraint that all elements in an individual are distinct, and evaluate each individual
Step 2: Repeat Steps 3-8 until the fitness of the best individual = n
Step 3: Randomly select two parents
Step 4: Apply Partially Matched Crossover with crossover probability p_{c}
Step 5: Mutate each offspring with mutation probability p_{m}
Step 6: Go to Step 3 until the number of offspring = M′ (M′ < M)
Step 7: Evaluate the M′ offspring
Step 8: Generate the new population applying elitism

Figure 2: Algorithm for solution of the n-Queen problem by GA
6. Experimental Results

In this section I present experimental results obtained by applying UMDA to the Subset Sum problem, the OneMax function and the n-Queen problem. I have run the programs on a computer with a 902 MHz AMD Athlon processor and 512 MB of RAM, in the Borland C++ Builder 6.0 environment.
6.1 Subset Sum Problem

Here I have generated positive integers randomly. The expected sum has then been generated by randomly selecting some of those integers, so that there is always a solution. For the trivial case I have chosen the expected sum equal to the sum of all generated integers; in this case the solution is {1, 1, 1, …, 1}. The parameters for the problem are:
Total Run=50; Population Size=1000; Crossover Rate=0.7; Mutation Rate=0.1; Elite=10%; Truncation Selection for UMDA.
The results are shown below:

Figure 3. Average no. of generations required for the trivial solution (expected sum = sum of all integers) of the Subset Sum problem

Figure 4. Average time (sec) required for the trivial solution (expected sum = sum of all integers) of the Subset Sum problem

Figure 5. Average no. of generations required for a random sum (expected sum < sum of all integers) of the Subset Sum problem

Figure 6. Average time (sec) required for a random sum (expected sum < sum of all integers) of the Subset Sum problem
6.2 OneMax Function

The OneMax function is a trivial one; it has only one solution, consisting of all ones. For this function I chose the following parameters:
Total Run=10; Population Size=10 * problem size; Crossover Rate=0.7; Mutation Rate=0.1; Elite=10%; Truncation Selection for UMDA.
I get the following results:
Figure 7. Average time (sec) required until solution for the OneMax function
Figure 8. Average number of generations required for OneMax function
6.3 n-Queen Problem
For the n-Queen problem the solution space is n!, only some of which are feasible solutions. For n = 2 and n = 3 no solutions exist. The following table gives an idea of the number of solutions for different n:
No. of Queens   Total Feasible Solutions   No. of Queens   Total Feasible Solutions
4               2                          10              724
5               10                         11              2680
6               4                          12              14200
7               40                         13              73712
8               92                         14              365596
9               352                        15              2279184

Table 2: No. of feasible solutions for the n-Queen problem
For small n one possible solution can be found quickly by the backtracking method. But for larger n backtracking may not be the right choice, as the recursion may consume a lot of memory, nor would it be possible to try all permutations of the solution space. So an EDA can be a choice for obtaining a solution to the problem within a reasonable time limit. I apply the simplest one, UMDA, with the 2-opt algorithm as a local heuristic, with the following parameters:

Total Run=50; Population Size=10*Variable Size; Crossover Rate=0.7; Mutation Rate=0.1; Elitism=10%.

The results are:

Figure 9. Average time required for solution of the n-Queen problem
7. Discussion

7.1 Subset Sum Problem

From the experimental results we find that for medium problem sizes UMDA provides better results with respect to the number of generations when the expected sum is the sum of a proper subset of the set of integers, although the calculation of the probability distribution takes time. For the trivial solution (when the final solution consists of all the integers) UMDA outperforms the GA in both respects. This is due to the fact that the variables of the problem are less interdependent.
7.2 OneMax Function
The OneMax function is a trivial one and its variables are independent of one another. So UMDA provides better results than the GA.
7.3 n-Queen Problem
From the experimental results we see that UMDA with 2-opt as a local heuristic produces solutions more slowly than the two other methods, while the GA with Partially Matched Crossover produces results very fast for higher variable sizes. This is due to the fact that the variables, which represent the positions of the queens on a chessboard, are highly correlated. As I have said, UMDA cannot capture the interdependencies among the variables of a problem. For highly correlated problems we may capture the dependencies among the variables with Bayesian networks or other methods, which is my future work. On the other hand, the number of generations required until a solution is found is relatively low compared to the GA with PMX. This is because the algorithm spends much time in the 2-opt procedure, which searches each individual to determine whether a legal improving move is possible.
8. Conclusion

In this paper I have discussed Estimation of Distribution Algorithms. It seems reasonable to apply EDAs in place of GAs. But the estimation of the joint probability distribution associated with the database containing the selected individuals is a bottleneck of this new heuristic; there is no easy method to calculate it. If the distribution is more general we get better results, but the calculation of such a distribution is time-consuming and complicated, and sampling new instances from it is not an easy task.
For simplicity I have applied UMDA. It provides good results for linear problems, but only when there is no dependency among the variables. The n-Queen problem has positional dependencies among its variables; a Bayesian network may be a possible structure-learning approach for estimating their probability distribution, which is my future work.
9. Future Works

- Estimation of the probability distribution by applying the Factorized Distribution Algorithm and the Bayesian Optimization Algorithm
- Applying EDAs to other optimization problems

References
[1] Baluja, S. (1994). Population-based incremental learning: A method for integrating genetic search based function optimization and competitive learning. Technical Report No. CMU-CS-94-163, Carnegie Mellon University, Pittsburgh, Pennsylvania.
[2] Baluja, S. and Davies, S. (1997). Using optimal dependency-trees for combinatorial optimization: Learning the structure of the search space. Technical Report CMU-CS-97-107, Carnegie Mellon University, Pittsburgh, Pennsylvania.
[3] Baluja, S. and Caruana, R. (1995). Removing the genetics from the standard genetic algorithm. In A. Prieditis and S. Russell, editors, Proceedings of the International Conference on Machine Learning, Morgan Kaufmann, 38-46.
[4] Cestnik, B. (1990). Estimating probabilities: A crucial task in machine learning. Proceedings of the European Conference on Artificial Intelligence, 147-149.
[5] Cooper, G.F. and Herskovits, E.A. (1992). A Bayesian method for the induction of probabilistic networks from data. Machine Learning, 9: 309-347.
[6] Croes, G.A. (1992). A method for solving traveling salesman problems. Operations Research, 6: 791-812.
[7] De Bonet, J.S., Isbell, C.L. and Viola, P. (1997). MIMIC: Finding optima by estimating probability densities. Advances in Neural Information Processing Systems, Vol. 9.
[8] González, C., Lozano, J.A. and Larrañaga, P. (2001). Mathematical modeling of discrete estimation of distribution algorithms. In P. Larrañaga and J.A. Lozano, editors, Estimation of Distribution Algorithms: A New Tool for Evolutionary Computation. Kluwer Academic Publishers, Boston.
[9] Harik, G. (1999). Linkage learning via probabilistic modeling in the ECGA. IlliGAL Report No. 99010, Illinois Genetic Algorithms Laboratory, University of Illinois, Urbana, Illinois.
[10] Harik, G.R., Lobo, F.G. and Goldberg, D.E. (1998). The compact genetic algorithm. In Proceedings of the IEEE Conference on Evolutionary Computation, 523-528.
[11] Holland, J.H. (1975). Adaptation in Natural and Artificial Systems. University of Michigan Press, Ann Arbor, Michigan.
[12] Horowitz, E., Sahni, S. and Rajasekaran, S. (2000). Fundamentals of Computer Algorithms. Galgotia Publications Pvt. Ltd., New Delhi.
[13] Goldberg, D.E. (1989). Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley, Reading, Massachusetts.
[14] Goldberg, D.E. and Lingle, R. (1987). Alleles, loci, and the traveling salesman problem. In John J. Grefenstette, editor, Proceedings of the International Conference on Genetic Algorithms and Their Applications, Morgan Kaufmann Publishers, Inc.
[15] Larrañaga, P. and Lozano, J.A. (2001). Estimation of Distribution Algorithms: A New Tool for Evolutionary Computation. Kluwer Academic Publishers, Boston.
[16] Larrañaga, P., Etxeberria, R., Lozano, J.A. and Peña, J.M. (2000). Combinatorial optimization by learning and simulation of Bayesian networks. In Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence, Stanford, 343-352.
[17] Mühlenbein, H. (1998). The equation for response to selection and its use for prediction. Evolutionary Computation, 5(3): 303-346.
[18] Mühlenbein, H. and Mahnig, T. (1999). The Factorized Distribution Algorithm for additively decomposed functions. Proceedings of the 1999 Congress on Evolutionary Computation, IEEE Press, 752-759.
[19] Mühlenbein, H. and Mahnig, T. (2002). Evolutionary optimization and the estimation of search distributions with applications to graph bipartitioning.
[20] Mühlenbein, H. and Paaß, G. (1996). From recombination of genes to the estimation of distributions I. Binary parameters. In Lecture Notes in Computer Science 1411: Parallel Problem Solving from Nature - PPSN IV, 178-187.
[21] Pelikan, M., Goldberg, D.E. and Cantú-Paz, E. (2000). Linkage problem, distribution estimation, and Bayesian networks. Evolutionary Computation, 8(3): 311-340.
[22] Pelikan, M. and Mühlenbein, H. (1999). The bivariate marginal distribution algorithm. Advances in Soft Computing - Engineering Design and Manufacturing, 521-535.