by Topon Kumar Paul and Hitoshi Iba
9th MPS Symposium on Evolutionary Computation, IPSJ
Abstract. Estimation of Distribution Algorithms (EDAs) are a new area of Evolutionary Computation. In EDAs there is neither a crossover nor a mutation operator. A new population is generated by sampling a probability distribution, which is estimated from a database containing selected individuals of the previous generation. Different approaches have been proposed for the estimation of this probability distribution. In this paper I review different EDA approaches and present three solutions based on the Univariate Marginal Distribution Algorithm (UMDA) to optimization in binary and non-binary search spaces. Three well-known problems, the Subset Sum problem, the OneMax function and the n-Queen problem, are solved, and experimental results comparing UMDA (with and without local heuristics) and GAs are given. From my experiments it seems that UMDA may perform better for linear problems without dependencies among variables.
Keywords: Estimation of Distribution Algorithm, 2-opt local heuristic
1. Introduction

Genetic Algorithms (GAs) are optimization techniques based on selection and recombination of promising solutions. The collection of candidate solutions is called the population of a Genetic Algorithm, and the candidate solutions themselves are called individuals or chromosomes. Each individual is an encoded representation of the variables of the problem at hand, and each component (variable) of an individual is termed a gene. Sometimes the components (genes) are independent of one another, and sometimes they are correlated. In either case, communication and information exchange among the individuals of a population is maintained through the selection and recombination operators of the Genetic Algorithm. This kind of exchange helps to combine partial solutions into high-quality partial solutions called building blocks (BBs) (Holland, 1975; Goldberg, 1989). The behavior of a GA depends on the choice of the genetic operators (selection, crossover, mutation), the probabilities of crossover and mutation, the population size, the rate of generational reproduction, the number of generations, and so on. But the problem-specific interactions among the variables are seldom considered. As a result, the fixed two-parent recombination and evolution sometimes converge to a local optimum and provide solutions of inferior quality. To avoid the disruption of partial solutions, two-parent recombination can be replaced by generating new solutions according to the probability distribution of all promising solutions of the previous generation. This new approach is called Estimation of Distribution Algorithm (EDA). EDAs were introduced into the field of Evolutionary Computation by Mühlenbein and Paaß (1996).
2. Estimation of Distribution Algorithm (EDA)
In EDAs the problem-specific interactions among the variables of individuals are taken into consideration. In conventional Evolutionary Computation the interactions are kept in mind implicitly, whereas in EDAs the interrelations are expressed explicitly through the joint probability distribution associated with the individuals selected at each generation. The probability distribution is calculated from a database of selected individuals of the previous generation; the selection methods used in Genetic Algorithms may be used here. Sampling this probability distribution then generates offspring. Neither crossover nor mutation is applied in EDAs. But the estimation of the joint probability distribution associated with the database containing the selected individuals is not an easy task. The following is pseudocode for the EDA approach:
Step 1: Generate M individuals (the initial population) at random
Step 2: Repeat Steps 3-5 for l = 1, 2, … until the stopping criterion is met
Step 3: Select N ≤ M individuals from D_{l-1} according to the selection method
Step 4: Estimate the probability distribution p_{l}(x) of an individual being among the selected individuals
Step 5: Sample M individuals (the new population) from p_{l}(x)
3. Different EDA approaches

3.1 Without Dependencies

The easiest way to estimate the probability distribution is to consider all the variables of a problem as univariate. Then the joint probability distribution becomes the product of the marginal probabilities of the n variables, i.e.,

p_{l}(x) = ∏_{i=1}^{n} p_{l}(x_{i}).
Univariate Marginal Distribution Algorithm (UMDA) (Mühlenbein, 1998), Population Based Incremental Learning (PBIL) (Baluja, 1994) and Compact Genetic Algorithm (cGA) (Harik et al., 1998) consider no interaction among variables. In UMDA the joint probability distribution is factorized as a product of independent univariate marginal distributions, which are estimated from marginal frequencies:

p_{l}(x) = ∏_{i=1}^{n} p_{l}(x_{i}), with p_{l}(x_{i}) = Σ_{j=1}^{N} δ_{j}(X_{i} = x_{i} | D^{Se}_{l-1}) / N,

where δ_{j}(X_{i} = x_{i} | D^{Se}_{l-1}) = 1 if in the j-th case of D^{Se}_{l-1}, X_{i} = x_{i}; = 0 otherwise.
In PBIL the population of individuals is represented by a vector of probabilities: p_{l}(x) = (p_{l}(x_{1}), …, p_{l}(x_{i}), …, p_{l}(x_{n})), where p_{l}(x_{i}) refers to the probability of obtaining a 1 in the i-th component of D_{l}, the population of individuals in the l-th generation. At each generation M individuals are generated by sampling p_{l}(x) and the best N individuals are selected. The selected individuals are used to update the probability vector by a Hebbian-inspired rule:

p_{l+1}(x_{i}) = (1 − α) p_{l}(x_{i}) + α (1/N) Σ_{k=1}^{N} x^{k}_{i},

where α ∈ (0, 1] and x^{k}_{i} represents the value of x_{i} in the k-th selected individual. The update rule shifts the vector towards the best of the generated individuals.
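Assuming binary individuals, the update rule can be sketched in a few lines of Python (the toy probability vector, selected individuals and value of α below are illustrative, not from the paper):

```python
def pbil_update(p, selected, alpha=0.1):
    """One PBIL probability-vector update: shift p towards the
    component-wise average of the selected individuals by rate alpha."""
    n, N = len(p), len(selected)
    return [(1 - alpha) * p[i] + alpha * sum(ind[i] for ind in selected) / N
            for i in range(n)]

# toy example: two selected individuals, learning rate 0.5
p = pbil_update([0.5, 0.5, 0.5], [[1, 1, 0], [1, 0, 0]], alpha=0.5)
# p[0] moves up to 0.75, p[2] moves down to 0.25
```

Components where the selected individuals mostly carry a 1 drift towards 1, and vice versa, which is exactly the "shift towards the best individuals" described above.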
In cGA the vector of probabilities is initialized with a probability of 0.5 for each variable. Then two individuals are generated randomly using this vector of probabilities and ranked by evaluation. The probability vector p_{l}(x) is then updated towards the better one. This process of adaptation continues until the vector of probabilities converges.
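A minimal cGA sketch is given below. The step size (playing the role of 1/population-size), the convergence tolerance and the seed are my own illustrative choices:

```python
import random

def cga(fitness, n, step=0.05, seed=0):
    """Compact GA sketch: generate two individuals per step, compare them,
    and shift the probability vector towards the winner until every
    component has converged to (almost) 0 or 1."""
    rng = random.Random(seed)
    p = [0.5] * n                       # initialize every probability to 0.5
    while any(1e-3 < pi < 1 - 1e-3 for pi in p):
        a = [1 if rng.random() < pi else 0 for pi in p]
        b = [1 if rng.random() < pi else 0 for pi in p]
        if fitness(b) > fitness(a):
            a, b = b, a                 # a is now the better individual
        for i in range(n):              # move p towards the winner where they differ
            if a[i] != b[i]:
                p[i] = min(1.0, p[i] + step) if a[i] else max(0.0, p[i] - step)
    return [round(pi) for pi in p]

best = cga(sum, 10)                     # OneMax: fitness is the number of ones
```

On OneMax most components drift to 1, although, as with any cGA, individual bits can occasionally fix at the wrong value by drift.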
All the above-mentioned algorithms provide good results for problems with no significant interaction among variables (Mühlenbein, 1998; Harik et al., 1998; Pelikan and Mühlenbein, 1999), but they cannot provide good results for higher-order interactions.
3.2 Pairwise Dependencies

To handle pairwise interactions among variables, the population-based Mutual Information Maximizing Input Clustering (MIMIC) algorithm (De Bonet et al., 1997), Combining Optimizers with Mutual Information Trees (COMIT) (Baluja and Davies, 1997) and the Bivariate Marginal Distribution Algorithm (BMDA) (Pelikan and Mühlenbein, 1999) were introduced. Where there is at most second-order dependency among variables these provide good results, but that is far from the real world, where multiple interactions occur.
3.3 Multiple Dependencies
Factorized Distribution Algorithm (FDA) (Mühlenbein et al., 1998), Extended Compact Genetic Algorithm (ECGA) (Harik, 1999), Bayesian Optimization Algorithm (BOA) (Pelikan et al., 2000) and Estimation of Bayesian Network Algorithm (EBNA) (Larrañaga et al., 2000) can capture multiple dependencies among variables.
FDA uses a fixed factorization of the distribution of a uniformly scaled additively decomposed function to generate new individuals. It efficiently optimizes a class of binary functions which are too difficult for traditional GAs. FDA incrementally computes Boltzmann distributions by using Boltzmann selection. FDA converges in polynomial time if the search distribution can be formed so that the number of parameters used is polynomially bounded in n (Mühlenbein and Mahnig, 2002). But the problem of FDA is its requirement of a fixed distribution and an additively decomposed function.
BOA uses techniques for modeling data by Bayesian networks to estimate the joint probability distribution of the selected individuals. A new population is then generated based on this estimate. It uses the BDe (Bayesian Dirichlet equivalence) metric to measure the goodness of each structure. This Bayesian metric has the property that two structures reflecting the same conditional dependencies or independencies receive the same score. In BOA prior information about the problem can be incorporated to enhance the estimation and improve convergence (Pelikan et al., 2000). In order to reduce the cardinality of the search space, BOA imposes a restriction on the number of parents a node may have. For problems where a node may have more than 2 parents, the situation becomes complicated.
In ECGA the factorization of the joint probability is calculated as a product of marginal distributions of variable size. These marginal distributions of variable size are related to the variables that are contained in the same group and to the probability distribution associated with them. The grouping is carried out by a greedy forward algorithm that obtains a partition of the n variables. Each group of variables is assumed to be independent of the rest. So the factorization of the joint probability on n variables is:

p_{l}(x) = ∏_{c ∈ C_{l}} p_{l}(x_{c}),
where C_{l} denotes the set of groups in the l-th generation and p_{l}(x_{c}) represents the marginal distribution of the variables X_{c}, that is, the variables that belong to the c-th group in the l-th generation. ECGA uses model complexity and population complexity to measure the quality of the marginal distributions. It can cover any number of interactions among variables, but the problem is that it does not consider conditional probabilities, which is insufficient for highly overlapping cases.
In EBNA the joint probability distribution encoded by a Bayesian network is learnt from the database containing the selected individuals in each generation. The factorization can be written as:

p_{l}(x) = ∏_{i=1}^{n} p_{l}(x_{i} | pa_{i}),

where pa_{i} is the set of parents of the variable X_{i}. Different algorithms can be obtained by varying the structural search method. Two structural search methods are usually considered: score+search and detecting conditional (in)dependencies (EBNA_{PC}). In the score+search approach two scores are particularly used, among them the Bayesian Information Criterion (BIC) score (EBNA_{BIC}).
4. Univariate Marginal Distribution Algorithm (UMDA)
In UMDA it is assumed that there is no interrelation among the variables of the problem. Hence the n-dimensional joint probability distribution is factorized as a product of n univariate and independent probability distributions:

p_{l}(x) = ∏_{i=1}^{n} p_{l}(x_{i}).

Each univariate marginal distribution is estimated from marginal frequencies:

p_{l}(x_{i}) = Σ_{j=1}^{N} δ_{j}(X_{i} = x_{i} | D^{Se}_{l-1}) / N,

where δ_{j}(X_{i} = x_{i} | D^{Se}_{l-1}) = 1 if in the j-th case of D^{Se}_{l-1}, X_{i} = x_{i}; = 0 otherwise.
The pseudocode for UMDA is as follows:

Step 1: D_{0} ← Generate M individuals (the initial population) at random
Step 2: Repeat Steps 3-5 for l = 1, 2, … until the stopping criterion is met
Step 3: D^{Se}_{l-1} ← Select N ≤ M individuals from D_{l-1} according to the selection method
Step 4: Estimate the joint probability distribution p_{l}(x)
Step 5: D_{l} ← Sample M individuals (the new population) from p_{l}(x)
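The pseudocode above can be turned into a compact Python sketch. The population size, number of generations, truncation to the best half, and the use of OneMax as the test function are illustrative assumptions, not the paper's exact settings:

```python
import random

def umda(fitness, n, M=100, generations=100, seed=0):
    """UMDA sketch: the joint distribution is the product of univariate
    marginal frequencies estimated from the best half of the population
    (truncation selection); one elite individual survives each generation."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(M)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)        # best individuals first
        selected = pop[:M // 2]                    # N = M/2 selected individuals
        # univariate marginal frequencies p_l(x_i)
        p = [sum(ind[i] for ind in selected) / len(selected) for i in range(n)]
        # elitism: carry the best individual over, sample the rest from p_l(x)
        pop = [pop[0]] + [[1 if rng.random() < p[i] else 0 for i in range(n)]
                          for _ in range(M - 1)]
    return max(pop, key=fitness)

best = umda(sum, 30)                               # OneMax: fitness = number of ones
```

With these settings the marginals of a linear function such as OneMax quickly drift towards 1, illustrating why UMDA suits problems without dependencies among variables.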
González et al. have shown that an instance of UMDA with p_{l}(x) ≥ δ > 0 visits a population D* containing the global optimum infinitely often with probability 1, and that if the selection is elitist, UMDA converges to a population that contains the global optimum. But the joint probability distribution of UMDA can be zero for some x; for example, when the selected individuals at the previous step are such that x^{j}_{i} ≠ x_{i} for all j = 1, 2, …, N, then p_{l}(x_{i}) = 0 and hence p_{l}(x) = 0. So UMDA sometimes may not visit a global optimum (González et al.).
To overcome this problem the way of calculating the probabilities should be changed. One possible solution is to apply the Laplace correction (Cestnik, 1990), which keeps every estimated probability strictly positive.
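For a binary variable the Laplace correction simply adds 1 to the count and 2 to the denominator, so the estimate can never be exactly 0 or 1. A minimal sketch:

```python
def laplace_marginal(selected, i):
    """Laplace-corrected marginal for binary X_i (cf. Cestnik, 1990):
    (count + 1) / (N + 2), so no probability collapses to 0 or 1."""
    count = sum(ind[i] for ind in selected)
    return (count + 1) / (len(selected) + 2)

# even if every selected individual has X_0 = 0, p(X_0 = 1) stays positive
p = laplace_marginal([[0, 1], [0, 0], [0, 1]], 0)   # (0 + 1) / (3 + 2) = 0.2
```

Because the corrected probability is bounded away from zero, every point of the search space keeps a positive sampling probability in each generation.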
5. Applications of UMDA

Here I have applied the Univariate Marginal Distribution Algorithm to some well-known problems, namely the Subset Sum problem, the n-Queen problem and the OneMax function, with some local heuristics.
5.1 Subset Sum Problem

This is the problem of finding which subset of a list of integers has a given sum. Subset Sum is an integer relation problem where the relation coefficients are either 0 or 1. If there are n integers, the solution space consists of the 2^{n} subsets of the n integers. For small n exact solutions can be found by backtracking, but for larger n the state-space tree grows exponentially. It is an NP-complete problem.
In UMDA each individual (solution) is represented by an n-tuple (x_{1}, x_{2}, x_{3}, …, x_{n}) such that x_{i} ∈ {0, 1}, 1 ≤ i ≤ n. Then x_{i} = 0 if the i-th integer is not selected and x_{i} = 1 if it is selected. Each individual is evaluated by finding the difference between the expected sum and the sum of the selected integers in the individual; the smaller the difference, the better. The marginal probability distribution is calculated from the best half of the solutions with truncation selection.
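The evaluation just described can be sketched directly; the list of integers and the target below are a toy instance, not from the paper's experiments:

```python
def subset_sum_fitness(ind, weights, target):
    """Evaluation for the Subset Sum problem: the absolute difference
    between the expected sum and the sum of the selected integers
    (smaller is better; 0 means an exact solution)."""
    return abs(target - sum(w for x, w in zip(ind, weights) if x == 1))

weights = [3, 34, 4, 12, 5, 2]                            # toy instance
d = subset_sum_fitness([1, 0, 0, 0, 1, 1], weights, 10)   # 3 + 5 + 2 = 10, so d = 0
```

Since UMDA maximizes, one would either minimize this difference or negate it before ranking individuals.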
Using the same representation, initialization, evaluation and replacement strategy as for UMDA, I apply a GA to the problem. I use simple one-point crossover and mutation for the generation of offspring. Parents are selected randomly for crossover.
5.2 OneMax Function

The OneMax function returns the number of ones in an input string, i.e.,

F(X) = Σ_{i=1}^{n} X_{i}, where X = {X_{1}, X_{2}, …, X_{n}} and X_{i} ∈ {0, 1}.

It is a unimodal function with its optimum at X_{Opt} = {1, 1, …, 1}. It is a trivial function which is used as a test function for evaluating the performance of Genetic Algorithms and EDAs.
In my experiment I initialize the population randomly, select the best half for the calculation of the probability distribution, and use elitism for replacement. The results are satisfactory.
5.3 n-Queen Problem

The n-Queen problem is a classic combinatorial problem. The task is to place n queens on an n×n chessboard so that no two queens attack each other; that is, no two queens are on the same row, column, or diagonal. Let us number the rows and columns of the chessboard 1 through n, and the queens likewise. Since each queen must be placed on a different row, we can assume that queen i is to be placed on row i.
Let the solution be X = {X_{1}, X_{2}, …, X_{n}}, where X_{i} represents the column position in row i where queen i can be placed. As all queens must be on different columns, all X_{i}'s must be different. So the solution will be a permutation of the numbers 1 through n, and the solution space is drastically reduced to n!.
We can easily fulfill the first two constraints (no two queens on the same row or column) by allowing a distinct value for each X_{i}. But how do we test whether two queens at positions (i, j) and (k, l) [i, k = rows; j, l = columns] are on the same diagonal? We can test it by two observations:

Every element on a diagonal that runs from upper left to lower right has the same (row - column) value.

Every element on a diagonal that runs from upper right to lower left has the same (row + column) value.

So two queens are on the same diagonal if i - j = k - l or i + j = k + l.
These two equations can be rewritten as j - l = i - k and j - l = k - i. That is, if abs(j - l) = abs(i - k), then the two queens are on the same diagonal. So the pseudocode for testing the constraints for the queens at positions X_{i} and X_{j} is:

If (X_{i} = X_{j}) OR (abs(i - j) = abs(X_{i} - X_{j})) then return NOT_FEASIBLE;

This test requires n(n - 1)/2 comparisons to calculate the fitness of an individual.
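The diagonal test over all pairs can be sketched as follows (this variant counts attacking pairs, using 0-based indices, rather than counting non-attacking queens as the paper's fitness does):

```python
def conflicts(X):
    """Count attacking pairs for a permutation individual: queen i sits at
    row i, column X[i]; rows and columns are already distinct, so only
    the diagonal condition abs(i - j) == abs(X[i] - X[j]) is checked.
    Uses n(n-1)/2 comparisons."""
    n, bad = len(X), 0
    for i in range(n):
        for j in range(i + 1, n):
            if abs(i - j) == abs(X[i] - X[j]):
                bad += 1
    return bad

print(conflicts([1, 3, 0, 2]))   # a valid 4-queen placement: prints 0
print(conflicts([0, 1, 2, 3]))   # main diagonal: every pair attacks, prints 6
```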
5.3.1 n-Queen Problem by UMDA

With UMDA an individual in the population is represented as {X_{1}, X_{2}, …, X_{n}}, where each X_{i} represents the column in row i where the i-th queen is placed. The fitness of an individual is calculated as the number of queens at non-attacking positions.
The initial population of size M is generated randomly with the constraint that all values in an individual are distinct numbers from the set {1, 2, …, n}. By doing this I have implicitly satisfied the constraint that no two queens are on the same row or column. Fitness calculation is then just a check of whether two queens are on the same diagonal.
In each generation the best half of the individuals of the previous generation are selected for the calculation of the joint probability distribution using the marginal frequency of each X_{i}. Then M new individuals are generated. During the generation of each individual I have applied probabilistic modification to enforce the first constraint of n-Queen: when a variable is selected for a position, its probabilities for the remaining positions are set to zero and the probabilities of the non-selected variables for those positions are increased proportionately. Consider the following example:
Before selection of X_1               After selection of X_1
Position/Variable   1    2    3       1    2     3
X_1                 0.7  0.1  0.5     0.7  0     0
X_2                 0.1  0.6  0.1     0.1  0.65  0.35
X_3                 0.2  0.3  0.4     0.2  0.35  0.65

Table 1: Probabilistic modification example
If X_{1} is selected for the first position, then the probability of X_{1} for the 2nd and 3rd positions is set to zero and the probabilities of X_{2} and X_{3} increase correspondingly. The temporary table should look like the one on the right above. By doing this we can ensure that a distinct value is generated for each component of an individual if Roulette Wheel selection is used.
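One way to realize this sampling scheme is sketched below, using 0-based values and letting the roulette wheel renormalize the remaining probabilities implicitly; the handling of degenerate all-zero rows is my own assumption:

```python
import random

def sample_individual(P, rng=random):
    """Sample one permutation individual by probabilistic modification:
    for each variable a value is drawn by roulette wheel, and the chosen
    value is zeroed out for all remaining variables."""
    n = len(P)
    p = [row[:] for row in P]              # working copy of the probability table
    used, individual = set(), []
    for i in range(n):
        total = sum(p[i])
        if total == 0:                     # degenerate row: pick any unused value
            choice = rng.choice([v for v in range(n) if v not in used])
        else:
            r, acc, choice = rng.random() * total, 0.0, None
            for v in range(n):             # roulette wheel over the remaining values
                acc += p[i][v]
                if r < acc:
                    choice = v
                    break
            if choice is None:             # guard against floating-point shortfall
                choice = max(v for v in range(n) if p[i][v] > 0)
        used.add(choice)
        individual.append(choice)
        for k in range(i + 1, n):          # the chosen value is no longer available
            p[k][choice] = 0.0
    return individual

P = [[0.7, 0.1, 0.2], [0.1, 0.6, 0.3], [0.2, 0.3, 0.5]]   # rows: variables, cols: values
ind = sample_individual(P)                # always a permutation of 0, 1, 2
```

Because each chosen value is removed from the wheel for later variables, the result is guaranteed to be a permutation, which is exactly the distinct-value property described above.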
For the replacement strategy I have used elitism. The algorithm stops when the fitness of the best individual is n, that is, when all queens are at non-attacking positions, or when a certain number of generations has passed.
Fitness improvement heuristics can be applied to the problem and perform much better than blind search using UMDA. For combinatorial optimization the 2-opt algorithm (Croes, 1992) or Partially Matched Crossover (PMX) (Goldberg et al., 1987) is widely used. I have used the 2-opt algorithm. The resulting algorithm is:
Figure 1. Algorithm for solution of the n-Queen problem by UMDA with 2-opt local heuristic
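The 2-opt local heuristic for this representation can be sketched as follows (a sketch of the idea, not a reproduction of the paper's exact Figure 1): swap a pair of columns whenever the swap reduces the number of attacking pairs, and repeat until no improving swap remains.

```python
def diagonal_conflicts(X):
    # number of attacking (diagonal) pairs for a permutation individual
    return sum(1 for i in range(len(X)) for j in range(i + 1, len(X))
               if abs(i - j) == abs(X[i] - X[j]))

def two_opt(X, conflicts=diagonal_conflicts):
    """Greedy 2-opt-style improvement: swaps preserve the permutation
    property, so rows and columns remain conflict-free throughout."""
    X, best, improved = X[:], conflicts(X), True
    while improved and best > 0:
        improved = False
        for i in range(len(X)):
            for j in range(i + 1, len(X)):
                X[i], X[j] = X[j], X[i]        # tentative swap
                c = conflicts(X)
                if c < best:
                    best, improved = c, True   # keep the improving swap
                else:
                    X[i], X[j] = X[j], X[i]    # undo
    return X

X = two_opt([0, 1, 2, 3])   # starts with 6 attacking pairs, ends conflict-free
```

Being greedy, this local search can stall in a local optimum for larger n, which is why it is combined with UMDA's global sampling rather than used alone.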
5.3.2 n-Queen Problem by Genetic Algorithm
Using the same representation, initial population and fitness function, I apply a Genetic Algorithm with Partially Matched Crossover and swap mutation. The replacement strategy is elitism. The genetic algorithm for the problem is:
Step 1: Create the initial population of size M randomly, with the constraint that all elements in an individual are distinct, and evaluate each individual
Step 2: Repeat Steps 3-8 until the fitness of the best individual = n
Step 3: Randomly select two parents
Step 4: Apply Partially Matched Crossover with crossover probability p_{c}
Step 5: Mutate each offspring with mutation probability p_{m}
Step 6: Go to Step 3 until the number of offspring = M′ (M′ < M)
Step 7: Evaluate the M′ offspring
Step 8: Generate the new population applying elitism

Figure 2: Algorithm for solution of the n-Queen problem by GA
6. Experimental Results

In this section I present experimental results obtained by applying UMDA to the Subset Sum problem, the OneMax function and the n-Queen problem. I have run the programs on a computer with a 902 MHz AMD Athlon processor and 512 MB of RAM, in the Borland C++ Builder 6.0 environment.
6.1 Subset Sum Problem

Here I have generated positive integers randomly. The expected sum has then been generated by randomly selecting some of those integers, so that there is always a solution. For the trivial case I have chosen the expected sum equal to the sum of all generated integers; in this case the solution is {1, 1, 1, …, 1}. The parameters for the problem are:
Total Run=50; Population Size=1000; Crossover Rate=0.7; Mutation Rate=0.1; Elite=10%; Truncation Selection for UMDA.
The results are shown below:

Figure 3. Average no. of generations required for the trivial solution (expected sum = sum of all integers) of the Subset Sum problem

Figure 4. Average time (sec) required for the trivial solution (expected sum = sum of all integers) of the Subset Sum problem

Figure 5. Average no. of generations required for a random sum (expected sum < sum of all integers) of the Subset Sum problem

Figure 6. Average time (sec) required for a random sum (expected sum < sum of all integers) of the Subset Sum problem
6.2 OneMax Function

The OneMax function is a trivial one; it has only one solution, consisting of all ones. For this function I chose the following parameters:
Total Run=10; Population Size=10 * problem size; Crossover Rate=0.7; Mutation Rate=0.1; Elite=10%; Truncation Selection for UMDA.
I get the following results:
Figure 7. Average time (sec) required until solution for the OneMax function
Figure 8. Average number of generations required for OneMax function
6.3 n-Queen Problem
For the n-Queen problem the solution space is n!, only some of which are feasible solutions. For n = 2 and n = 3 no solutions exist. The following table gives an idea of the number of solutions for different n:
No. of Queens   Total Feasible Solutions   No. of Queens   Total Feasible Solutions
4               2                          10              724
5               10                         11              2680
6               4                          12              14200
7               40                         13              73712
8               92                         14              365596
9               352                        15              2279184

Table 2: No. of feasible solutions for the n-Queen problem
For small n one possible solution can be found quickly by the backtracking method. But for larger n backtracking may not be the right choice, as the recursion may consume a lot of memory, nor would it be possible to try all permutations of the solution space. So an EDA can be a choice for obtaining a solution to the problem within a reasonable time limit. I apply the simplest one, UMDA, with the 2-opt algorithm as a local heuristic, with the following parameters:

Total Run=50; Population Size=10*Variable Size; Crossover Rate=0.7; Mutation Rate=0.1; Elitism=10%.

The results are:

Figure 9. Average time required for solution of the n-Queen problem
7. Discussion

7.1 Subset Sum Problem

From the experimental results we find that for medium problem sizes UMDA provides better results with respect to the number of generations when the expected sum is the sum of a proper subset of the set of integers, although the calculation of the probability distribution takes time. For the trivial solution (when the final solution consists of all the integers) UMDA outperforms the GA in both respects. This is due to the fact that the variables of the problem are less interdependent.
7.2 OneMax Function
The OneMax function is a trivial one and its variables are independent of one another. So UMDA provides better results than the GA.
7.3 n-Queen Problem
From the experimental results we see that UMDA with 2-opt as a local heuristic produces solutions more slowly than the two other methods, while the GA with Partially Matched Crossover produces results very fast for higher variable sizes. This is due to the fact that the variables, which represent the positions of the queens on a chessboard, are highly correlated. As I have said, UMDA cannot capture the interdependencies among the variables of a problem. For highly correlated problems we may capture the dependencies among the variables with Bayesian networks or other methods, which is my future work. On the other hand, the number of generations required until a solution is found is relatively low compared to the GA with PMX. This is because the algorithm spends much time in the 2-opt procedure, which searches each individual to determine whether a legal improving move is possible.
8. Conclusion

In this paper I have discussed Estimation of Distribution Algorithms. It seems reasonable to apply EDAs in place of GAs. But the estimation of the joint probability distribution associated with the database containing the selected individuals is a bottleneck of this new heuristic; there is no easy method to calculate it. If the distribution is more general we get better results, but the calculation of such a distribution is time-consuming and complicated, and sampling new instances from it is not an easy task.
For simplicity I have applied UMDA. It provides good results for linear problems, but only when there is no dependency among the variables. The n-Queen problem has positional dependencies among its variables; a Bayesian network may be a possible structure-learning approach for estimating their probability distribution, which is my future work.
9. Future Works

- Estimation of the probability distribution by applying the Factorized Distribution Algorithm and the Bayesian Optimization Algorithm
- Applying EDAs to other optimization problems

References
[1] Baluja, S. (1994). Population-based incremental learning: A method for integrating genetic search based function optimization and competitive learning. Technical Report No. CMU-CS-94-163, Carnegie Mellon University, Pittsburgh, Pennsylvania.
[2] Baluja, S. and Davies, S. (1997). Using optimal dependency-trees for combinatorial optimization: Learning the structure of the search space. Technical Report CMU-CS-97-107, Carnegie Mellon University, Pittsburgh, Pennsylvania.
[3] Baluja, S. and Caruana, R. (1995). Removing the genetics from the standard genetic algorithm. In A. Prieditis and S. Russell, editors, Proceedings of the International Conference on Machine Learning, Morgan Kaufmann, 38-46.
[4] Cestnik, B. (1990). Estimating probabilities: A crucial task in machine learning. Proceedings of the European Conference on Artificial Intelligence, 147-149.
[5] Cooper, G.F. and Herskovits, E.A. (1992). A Bayesian method for the induction of probabilistic networks from data. Machine Learning, 9: 309-347.
[6] Croes, G.A. (1992). A method for solving traveling salesman problems. Operations Research, 6: 791-812.
[7] De Bonet, J.S., Isbell, C.L. and Viola, P. (1997). MIMIC: Finding optima by estimating probability densities. Advances in Neural Information Processing Systems, Vol. 9.
[8] González, C., Lozano, J.A. and Larrañaga, P. (2001). Mathematical modeling of discrete estimation of distribution algorithms. In P. Larrañaga and J.A. Lozano, editors, Estimation of Distribution Algorithms: A New Tool for Evolutionary Computation. Kluwer Academic Publishers, Boston.
[9] Harik, G. (1999). Linkage learning via probabilistic modeling in the ECGA. IlliGAL Report No. 99010, Illinois Genetic Algorithms Laboratory, University of Illinois, Urbana, Illinois.
[10] Harik, G.R., Lobo, F.G. and Goldberg, D.E. (1998). The compact genetic algorithm. In Proceedings of the IEEE Conference on Evolutionary Computation, 523-528.
[11] Holland, J.H. (1975). Adaptation in Natural and Artificial Systems. University of Michigan Press, Ann Arbor, Michigan.
[12] Horowitz, E., Sahni, S. and Rajasekaran, S. (2000). Fundamentals of Computer Algorithms. Galgotia Publications Pvt. Ltd., New Delhi.
[13] Goldberg, D.E. (1989). Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley, Reading, Massachusetts.
[14] Goldberg, D.E. and Lingle, R. (1987). Alleles, loci, and the traveling salesman problem. In John J. Grefenstette, editor, Proceedings of the International Conference on Genetic Algorithms and Their Applications, Morgan Kaufmann Publishers, Inc.
[15] Larrañaga, P. and Lozano, J.A. (2001). Estimation of Distribution Algorithms: A New Tool for Evolutionary Computation. Kluwer Academic Publishers, Boston.
[16] Larrañaga, P., Etxeberria, R., Lozano, J.A. and Peña, J.M. (2000). Combinatorial optimization by learning and simulation of Bayesian networks. In Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence, Stanford, 343-352.
[17] Mühlenbein, H. (1998). The equation for response to selection and its use for prediction. Evolutionary Computation, 5(3): 303-346.
[18] Mühlenbein, H. and Mahnig, T. (1999). The Factorized Distribution Algorithm for additively decomposed functions. Proceedings of the 1999 Congress on Evolutionary Computation, IEEE Press, 752-759.
[19] Mühlenbein, H. and Mahnig, T. (2002). Evolutionary optimization and the estimation of search distributions with applications to graph bipartitioning.
[20] Mühlenbein, H. and Paaß, G. (1996). From recombination of genes to the estimation of distributions I. Binary parameters. In Lecture Notes in Computer Science 1411: Parallel Problem Solving from Nature - PPSN IV, 178-187.
[21] Pelikan, M., Goldberg, D.E. and Cantú-Paz, E. (2000). Linkage problem, distribution estimation, and Bayesian networks. Evolutionary Computation, 8(3): 311-340.
[22] Pelikan, M. and Mühlenbein, H. (1999). The bivariate marginal distribution algorithm. Advances in Soft Computing - Engineering Design and Manufacturing, 521-535.