List of publications


 FDA (Factorized Distribution Algorithm)

I wrote my Ph.D. thesis on this subject (in German)

 BibTeX entries for my publications
 

All these papers were written together with Heinz Mühlenbein

2002

Mathematical Analysis of Evolutionary Algorithms, Heinz Mühlenbein, Thilo Mahnig
Essays and Surveys in Metaheuristics, Kluwer Academic Publishers, 2002

We analyze evolutionary algorithms which use a population of search points at each step. A popular example is the genetic algorithm. We approximate genetic algorithms by an algorithm using Univariate Marginal Distributions (UMDA). UMDA generates the search points from a probability distribution. We derive difference equations for the marginal distributions which completely describe the dynamic behavior of UMDA. The key result is that UMDA transforms the discrete optimization problem to a continuous one. The continuous optimization problem is solved by gradient ascent. We show that UMDA solves many difficult multi-modal optimization problems. UMDA is extended to the Factorized Distribution Algorithm (FDA) which uses a general factorization of the search distribution into conditional and marginal distributions. We prove that FDA with Boltzmann selection converges to the set of global optima. A new adaptive selection schedule for Boltzmann selection is derived. FDA is extended to an algorithm LFDA, which computes the factorization from promising search points. LFDA is based on learning of Bayesian networks.
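The UMDA step described in the abstract can be sketched as follows. This is our illustration, not the authors' code: truncation selection replaces the selection scheme, and the population size, truncation rate and generation count are arbitrary illustrative values.

```python
import numpy as np

def umda(fitness, n, pop_size=100, trunc=0.5, generations=50, seed=0):
    """Minimal UMDA sketch: sample from the product of univariate marginals,
    select the best points, re-estimate the marginals."""
    rng = np.random.default_rng(seed)
    p = np.full(n, 0.5)                       # univariate marginal distributions
    for _ in range(generations):
        pop = (rng.random((pop_size, n)) < p).astype(int)
        f = np.array([fitness(x) for x in pop])
        elite = pop[np.argsort(f)[::-1][: int(trunc * pop_size)]]
        p = elite.mean(axis=0)                # new marginals from selected points
    return p

# OneMax: the marginals should move towards 1.
p = umda(lambda x: x.sum(), n=20)
```

On OneMax this reproduces the qualitative behavior described above: the marginals p_i converge towards the optimum, i.e. the discrete problem is driven through the continuous space of marginal distributions.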

2001

Optimal Mutation Rate Using Bayesian Priors for Estimation of Distribution Algorithms, Thilo Mahnig, Heinz Mühlenbein
Proceedings of the 1st Symposium on Stochastic Algorithms, Foundations and Applications, Springer, 2001

UMDA (the univariate marginal distribution algorithm) was derived by analyzing the mathematical principles behind recombination. Mutation, however, was not considered. The same is true for the FDA (factorized distribution algorithm), an extension of the UMDA which can cover dependencies between variables. In this paper mutation is introduced into these algorithms by a technique called Bayesian prior. We theoretically derive an estimate of how to choose the Bayesian prior. The recommended Bayesian prior turns out to be a good choice in a number of experiments. These experiments also indicate that mutation in many cases increases the performance of the algorithms and decreases the dependence on a good choice of the population size.
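The idea can be sketched for the binary case: instead of the raw frequency m_i/N, the marginal is estimated with a Laplace-type prior, which keeps it away from 0 and 1 and thus acts like mutation. The prior weight r = 1 below is only illustrative; the paper derives a recommended choice theoretically.

```python
import numpy as np

def marginals_with_prior(selected, r=1.0):
    """Estimate p_i = P(x_i = 1) from N selected binary points with a
    Bayesian (Laplace-type) prior: p_i = (m_i + r) / (N + 2r)."""
    N = selected.shape[0]
    m = selected.sum(axis=0)          # counts of x_i = 1 among selected points
    return (m + r) / (N + 2 * r)      # shrinks estimates away from 0 and 1

selected = np.array([[1, 1, 0],
                     [1, 0, 0],
                     [1, 1, 1],
                     [1, 0, 0]])
p = marginals_with_prior(selected)    # first bit: (4 + 1) / (4 + 2) = 5/6, not 1
```

Note that without the prior the first marginal would be exactly 1 and that allele could never be lost again; with the prior, every point retains a small sampling probability, which is the mutation-like effect.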

(c) Springer-Verlag, LNCS

Evolutionary Computation and Beyond, Heinz Mühlenbein, Thilo Mahnig
Foundations of Real-World Intelligence, CSLI Publications, 2001, 123-188

This is an overview article of the theoretical results derived in the Real World Computing Partnership.


Evolutionary Synthesis of Bayesian Networks for Optimization, Heinz Mühlenbein, Thilo Mahnig
Advances in the Evolutionary Synthesis of Intelligent Agents, MIT Press, 2001, 429-455

We briefly review our theoretical analysis of genetic algorithms and provide some new results. The theory has led to the design of three different algorithms, all based on probability distributions instead of recombination of strings. In order to be numerically tractable, the probability distribution has to be factored into a small number of factors. Each factor should depend on a small number of variables only. For certain applications the factorization can be explicitly determined. In general it has to be determined from the search points used for the optimization. Computing the factorization from the data leads to learning Bayesian networks. The problem of finding a minimal structure which explains the data is discussed in detail. It is shown that the Bayesian Information Criterion is a good score for this problem. The algorithms are extended to probabilistic prototype trees used for synthesizing programs.
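The Bayesian Information Criterion mentioned in the abstract has the standard form (the notation here is ours: B is the network structure, \hat{\theta} its maximum-likelihood parameters, d its number of free parameters, N the number of data points):

```latex
\mathrm{BIC}(B, D) \;=\; \log P\bigl(D \mid B, \hat{\theta}\bigr) \;-\; \frac{d}{2}\,\log N
```

The penalty term (d/2) log N is what favors the minimal structure explaining the data: adding edges raises the likelihood but also raises d.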


A new adaptive Boltzmann selection schedule SDS, Thilo Mahnig, Heinz Mühlenbein
Proceedings of the 2001 Congress on Evolutionary Computation, 2001, 183-190

FDA (the Factorized Distribution Algorithm) is an evolutionary algorithm that combines mutation and recombination by using a distribution. The distribution is estimated from a set of selected points. It is then used to generate new points for the next generation. In general a distribution defined for n binary variables has 2^n parameters. Therefore it is too expensive to compute. For additively decomposed discrete functions (ADFs) there exists an algorithm that factors the distribution into conditional and marginal distributions, each of which can be computed in polynomial time. Previously, we have shown a convergence theorem for FDA. But it is only valid using Boltzmann selection. Boltzmann selection was not used in practice because a good annealing schedule was lacking. Using a Taylor expansion of the average fitness of the Boltzmann distribution, we have developed an adaptive annealing schedule called SDS (standard deviation schedule) that is introduced in this work. The inverse temperature beta is changed inversely proportional to the standard deviation.
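A sketch of the selection step with an SDS-style annealing schedule (our illustration, not the paper's implementation): the increment of the inverse temperature beta is inversely proportional to the standard deviation of the fitness values. The constant c is an arbitrary illustrative choice.

```python
import numpy as np

def sds_step(f, c=1.0):
    """Increment for beta, inversely proportional to the fitness std (SDS-style)."""
    return c / max(f.std(), 1e-12)

def boltzmann_select(pop, f, delta_beta, rng):
    """Resample the population with Boltzmann weights exp(delta_beta * f)."""
    w = np.exp(delta_beta * (f - f.max()))   # subtract max for numerical stability
    w /= w.sum()
    idx = rng.choice(len(pop), size=len(pop), p=w)
    return pop[idx]

rng = np.random.default_rng(0)
pop = (rng.random((50, 10)) < 0.5).astype(int)   # random binary population
f = pop.sum(axis=1).astype(float)                # OneMax fitness
new_pop = boltzmann_select(pop, f, sds_step(f), rng)
```

The adaptivity lies in sds_step: when the population's fitness spread is large, beta grows slowly (weak selection); as the spread shrinks near convergence, beta grows faster.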


Mathematical Analysis of Evolutionary Algorithms for Optimization, Heinz Mühlenbein, Thilo Mahnig
Proceedings of the Third International Symposium on Adaptive Systems, Havana, 2001, 166-185

Simulating evolution as seen in nature has been identified as one of the key computing paradigms for the new decade. Today evolutionary algorithms have been successfully used in a number of applications. These include discrete and continuous optimization problems, synthesis of neural networks, synthesis of computer programs from examples (also called genetic programming) and even evolvable hardware. But in all application areas problems have been encountered where evolutionary algorithms performed badly. In this survey we concentrate on the analysis of evolutionary algorithms for optimization. We present a mathematical theory based on probability distributions. It gives the reasons why evolutionary algorithms can solve many difficult multi-modal functions and why they fail on seemingly simple ones. The theory also leads to new sophisticated algorithms for which convergence is shown.


Comparing the adaptive Boltzmann selection schedule SDS to truncation selection, Thilo Mahnig, Heinz Mühlenbein
Proceedings of the Third International Symposium on Adaptive Systems, Havana, 2001, 121-128

FDA (the Factorized Distribution Algorithm) is an evolutionary algorithm that combines mutation and recombination by using a distribution. The distribution is estimated from a set of selected points. It is then used to generate new points for the next generation. FDA uses a factorization to be able to compute the distribution in polynomial time. Previously, we have shown a convergence theorem for FDA. But it is only valid using Boltzmann selection. Boltzmann selection was not used in practice because a good annealing schedule was lacking. Using a Taylor expansion of the average fitness of the Boltzmann distribution, we have developed an adaptive annealing schedule called SDS. The inverse temperature beta is changed inversely proportional to the standard deviation. In this work, we compare the resulting scheme to truncation selection both theoretically and experimentally with a series of test functions. We find that it behaves similarly in terms of complexity, robustness and efficiency.


Wright's Equation and Evolutionary Computation, Heinz Mühlenbein, Thilo Mahnig
Advances in Fuzzy Systems and Evolutionary Computation, WSES, 2001, 259-264

In this paper Wright's equation, formulated in 1931, is proven and applied to evolutionary computation. Wright's equation shows that evolution is doing gradient ascent in the landscape defined by the average fitness of the population. The average fitness W is defined in terms of marginal gene frequencies pi. Wright's equation is only approximately valid in population genetics, but it exactly describes the behavior of our Univariate Marginal Distribution Algorithm (UMDA).
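In the notation of the abstract (marginal frequencies p_i, average fitness W), the gradient-ascent statement above corresponds to the following standard form of Wright's equation:

```latex
p_i(t+1) \;=\; p_i(t) \;+\; \frac{p_i(t)\,\bigl(1 - p_i(t)\bigr)}{W}\,\frac{\partial W}{\partial p_i}
```

The factor p_i(1 - p_i)/W scales the gradient, so the gene frequencies stay in [0, 1] and the boundary points are fixed points of the dynamics.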


Application of Wright's Equation in Evolutionary Computation, Heinz Mühlenbein, Thilo Mahnig
Advances in Fuzzy Systems and Evolutionary Computation, WSES, 2001, 265-270

In this paper we apply Wright's equation to a specific fitness function defined by Wright himself. We introduce mutation into Wright's equation and our Univariate Marginal Distribution Algorithm (UMDA). We show that mutation moves the stable attractors from the boundary into the interior. We compute optimal mutation rates for proportionate selection and truncation selection.


2000

Evolutionary Algorithms: From Recombination to Search Distributions, Heinz Mühlenbein, Thilo Mahnig
Theoretical Aspects of Evolutionary Computing, Springer, 2000, 137-176

First we show that all genetic algorithms can be approximated by an algorithm which keeps the population in linkage equilibrium. Here the genetic population is given as a product of univariate marginal distributions. We describe a simple algorithm which keeps the population in linkage equilibrium. It is called the Univariate Marginal Distribution Algorithm (UMDA). Our main result is that UMDA transforms the discrete optimization problem into a continuous one defined by the average fitness W(p1,...,pn) as a function of the univariate marginal distributions pi. For proportionate selection UMDA performs gradient ascent in the landscape defined by W(p). We derive a difference equation for pi which has already been proposed by Wright in population genetics. We show that UMDA solves difficult multimodal optimization problems. For functions with highly correlated variables UMDA has to be extended. The Factorized Distribution Algorithm (FDA) uses a factorization into marginal and conditional distributions. For decomposable functions the optimal factorization can be explicitly computed. In general it has to be computed from the data. This is done by LFDA. It uses a Bayesian network to represent the distribution. Computing the network structure from the data is called learning in Bayesian network theory. The problem of finding a minimal structure which explains the data is discussed in detail. It is shown that the Bayesian Information Criterion is a good score for this problem.


Mathematical analysis of optimization methods using search distributions, Thilo Mahnig, Heinz Mühlenbein
2000 Genetic and Evolutionary Computation Conference, Workshop Program, 2000, 205-208

We show that UMDA transforms the discrete optimization problem f(x) into a continuous one defined by the average fitness W(p). For proportionate selection, UMDA performs gradient ascent in this landscape. For functions with highly correlated variables UMDA has to be extended to an algorithm FDA which uses more complex search distributions. FDA also transforms the discrete optimization problem into a continuous one defined by W(p), where W(p) now depends on the factorization. The differences between UMDA and FDA are discussed for a deceptive function.


Evolutionary Optimization using Graphical Models, Heinz Mühlenbein, Thilo Mahnig
New Generation Computing, 18, 2000, 157-166

We have previously shown that a genetic algorithm can be approximated by an evolutionary algorithm using the product of univariate marginal distributions of selected points as search distribution. This algorithm (UMDA) successfully optimizes difficult multi-modal optimization problems. For correlated fitness landscapes more complex factorizations of the search distribution have to be used. These factorizations are used by the Factorized Distribution Algorithm FDA. In this paper we extend FDA to an algorithm which computes a factorization from the data. The factorization can be represented by a Bayes network. The Bayes network is used to generate the search points.


1999

The Factorized Distribution Algorithm for Additively Decomposed Functions, Heinz Mühlenbein, Thilo Mahnig
Proceedings of the 1999 Congress on Evolutionary Computation, 1999, 752-759

FDA - the Factorized Distribution Algorithm - is an evolutionary algorithm that combines mutation and recombination by using a distribution. First the distribution is estimated from a set of selected points. It is then used to generate new points for the next generation. In general a distribution defined for n binary variables has 2^n parameters. Therefore it is too expensive to compute. For additively decomposed discrete functions (ADFs) there exists an algorithm that factors the distribution into conditional and marginal distributions, each of which can be computed in polynomial time.  The scaling of FDA is investigated theoretically and numerically. The scaling depends on the ADF structure and the specific assignment of function values. Difficult functions on a chain or a tree structure are optimized in about O(n sqrt(n)) function evaluations. More standard genetic algorithms are not able to optimize these functions. FDA is not restricted to exact factorizations. It also works for approximate factorizations.
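For the chain-structured case from the abstract, the factorization is p(x) = p(x_1) prod_i p(x_i | x_{i-1}), and the sampling step can be sketched as follows (our illustration, not the paper's implementation; a Laplace correction keeps all conditionals strictly between 0 and 1):

```python
import numpy as np

def sample_chain(selected, n_new, rng):
    """Estimate p(x_1) and p(x_i | x_{i-1}) from the selected binary points,
    then generate n_new points variable by variable along the chain."""
    N, n = selected.shape
    new = np.empty((n_new, n), dtype=int)
    p1 = (selected[:, 0].sum() + 1) / (N + 2)          # marginal of x_1
    new[:, 0] = rng.random(n_new) < p1
    for i in range(1, n):
        for v in (0, 1):                               # condition on x_{i-1} = v
            mask = selected[:, i - 1] == v
            p = (selected[mask, i].sum() + 1) / (mask.sum() + 2)
            rows = new[:, i - 1] == v
            new[rows, i] = rng.random(rows.sum()) < p
    return new

rng = np.random.default_rng(1)
sel = (rng.random((40, 8)) < 0.5).astype(int)          # stand-in selected points
pts = sample_chain(sel, n_new=20, rng=rng)
```

Each factor involves at most two variables, so the cost is linear in n per generated point, which is the point of the factorization: the full 2^n-parameter distribution is never touched.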

 

Convergence Theory and Applications of the Factorized Distribution Algorithm, Heinz Mühlenbein, Thilo Mahnig
Journal of Computing and Information Technology - CIT 7:19-32, 1999

The paper investigates the optimization of additively decomposable functions (ADF) by a new evolutionary algorithm called Factorized Distribution Algorithm (FDA). FDA is based on a factorization of the distribution to generate search points. First, separable ADFs are considered. These are mapped to generalized linear functions with metavariables defined for multiple alleles. The mapping transforms FDA into a Univariate Marginal Distribution Algorithm (UMDA). For UMDA the exact equation for the response to selection is computed under the assumption of proportionate selection. For truncation selection an approximate equation for the time to convergence is used, derived from an analysis of the OneMax function. FDA is also numerically investigated for non-separable functions. The time to convergence is very similar to separable ADFs. FDA outperforms the genetic algorithm with recombination of strings by far.

It is shown that additively decomposed functions with disjoint defining sets can be mapped to generalized linear functions with metavariables defined for multiple alleles. The mapping transforms the Factorized Distribution Algorithm in the original space into a Univariate Marginal Distribution Algorithm (UMDA) in the space of the metavariables. For UMDA with proportionate selection the exact equation for the response is computed. For generalized linear functions the response is given by the additive genetic variance divided by the average fitness of the population. This was conjectured by Fisher in the Fundamental Theorem of Natural Selection.
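The response statement above can be written compactly (our notation: R(t) is the response to selection, V_A(t) the additive genetic variance, W-bar(t) the average fitness of the population):

```latex
R(t) \;=\; \frac{V_A(t)}{\bar{W}(t)}
```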


Schemata, Distributions and Graphical Models in Evolutionary Optimization, Heinz Mühlenbein, Thilo Mahnig, Alberto Ochoa Rodriguez
Journal of Heuristics, 5, 1999, 215-247, Kluwer Academic Publishers, Boston
 

The paper deals with the optimization of additively decomposable discrete functions where genetic algorithms have exhibited a poor performance. First, the schema theory of genetic algorithms is reformulated in probability theory terms. A schema defines the structure of a marginal distribution. A new algorithm FDA is presented which is based on factorizing a probability model into marginal distributions. The probability model captures the structure of the given function. The theory of conditional independence graphs is used to classify the fitness functions which can be optimized in polynomial time. For the test functions considered, the performance of FDA - in the number of generations until convergence - is similar to that of the simple genetic algorithm for the OneMax function. This result is theoretically explained.

 

FDA - A scalable evolutionary algorithm for the optimization of additively decomposed functions, Heinz Mühlenbein, Thilo Mahnig
Evolutionary Computation, Volume 7, 1999, 353-376
 
FDA - the Factorized Distribution Algorithm - is an evolutionary algorithm which combines mutation and recombination by using a distribution instead. First the distribution is estimated from a set of selected points. Then it is used to generate the new points for the next generation. In general a distribution defined for n binary variables has 2^n parameters. Therefore it is too expensive to compute. For additively decomposed discrete functions (ADFs) there exists an algorithm which factors the distribution into conditional and marginal distributions, each of which can be computed in polynomial time.

The scaling of FDA is investigated theoretically and numerically. The scaling depends on the ADF structure and the specific assignment of function values. Difficult functions on a chain or a tree structure are solved in about O(n sqrt(n)) function evaluations. More standard genetic algorithms are not able to optimize these functions. FDA is not restricted to exact factorizations. It also works for approximate factorizations as is shown for a circle and a grid structure.

 
 


My diploma thesis as a postscript file

Papers describing the research at GMD on genetic algorithms can be found here.

 Back to the homepage