Genetic Programming

My approach is to use Genetic Programming, an approach to creating programs through processes inspired by biological evolution.

Genetic Programming is an Artificial Intelligence technique that is ideally suited to needle-in-a-haystack search and optimisation tasks that would otherwise be impractical to solve. Here’s how it works :-

  • Create a population of randomly built algorithms,
  • Assess each algorithm against a fitness function and give it a score,
  • Rank the algorithms according the score,
  • Discard the bottom half of the population,
  • Interbreed specimens in the top half to create new ‘child’ algorithms,
  • Go back and repeat the assessment etc.

In our case, the fitness function will be a trading simulator where the algorithms specify the trade entry criteria and direction, the stop loss value and then the exit criteria. The score will be the total profit from many trades over a range of historic data.

Breeding consists of taking a function at random from each of the parent e.g. the entry function might come from the ‘father’ and the exit function might come from the ‘mother’ except of course there is no gender. This is called ‘crossover’ but is very simplistic and will never modify the functions from their original random construction.

Additionally, aspects of each function are changed through ‘mutation’. This could be a simple change like changing a numeric constant or it could be more complex such as introducing a conditional or arithmetic computation.

After many generations of breeding and mutation, you should end up with algorithms that are extremely well suited to the task you intended. This does, however, take a lot of CPU time and there are some pitfalls to watch out for such as ‘overfitting’ where you end up with an algorithm that is fantastic on the data that that it has been trained with but lacks any general ability on other data.

There is some GP terminology that deserves explanation that I may use in posts on here :-

  • Crossover (breeding) is a technique of creating offspring programs by taking random top level elements from each parent to construct a new individual
  • Mutation is the process of subtly modifying a program by changing one or more elements at random
  • Prefix notation is the human readable form of the abstract syntax tree where brackets () represent a functional node e.g. (+ 3 2) is a node that adds the two constants 3 and 2. The function name ‘+’ prefixes the arguments.
  • Abstract syntax tree is a heirarchical structure representing the program e.g. (+ 5 (* x 3)) – there is a top level addition node which adds a constant (5) with the results of a lower node which is a multiplication.
  • Terminal Set is the palette of low level data items that the programs can be built from. These are constants (e.g. 0.5), variables (e.g. bid) or functions (e.g. (volume 2))
  • Fitness function is the evaluator that assesses the program and gives it a score. In our case the fitness function takes a candidate program and makes it trade over historic data. The score is the profit or loss that it makes.
  • Population is the collection of candidate programs
  • Generation is one round of assessment where every member of the population is assessed and ranked by score. The losers are discarded and the winners are used to repopulate the discarded set through crossover and mutation.

I’d thoroughly recommend A Field Guide to Genetic Programming as this is the book that I’ve used to teach myself GP.

Comments

7 Responses to “Genetic Programming”

  1. Guilherme says:

    Andrew,

    I´ve been testing some different strategies using GP and results on long term does not satisfies my intentions, it´s rare to have some algos that make more than 60% success rates during trades and the results very too much from one currency to another using same strategies. I´ve developed myself on GP program for EXCEL (it´s really slow, but for first trying it will do). I´ve found that criteria used to exit and enter does not make sense and my fitness function uses delay, antecipation and amount of pip´s acquired by the trade related to the swing it was entered, I´ve found this to be more productive than only the results itself. Another fact that I´ve found is that using a trailing stop, will give you a lower std deviation value when checking each trade result on a program. I hope you can find succes, up till now I´m not convinced of strategy that appeared to me, I´m using 5 years of that to back test and 1 year to foward test. Seems if you use more that to back test and create the strategy, it would give you better results and avoid overfitting.

    Best wishes,

  2. andrew says:

    Hi Guilherme,

    Thanks for the comment. So far my results have not been inspiring, the latest only shows a very marginal profit on unseen data. My latest post ‘Initial results are disappointing’ has the detail. At this stage I am not optimistic but I have one last trick to try.

    I’m impressed that you’ve managed to do GP in Excel as I wouldn’t have thought that possible ! I intend to publish my C code on this site so you could have a look at that as it would give you a huge performance boost over Excel I would expect.

    Andrew

  3. Guilherme says:

    Andrew,

    It was a really difficult dealing with some parts, but in the end it works (slowly,but works). My main focus at the beggining was to use nothing but price (O,H,L,C data), using a Lag function to create comparisons among a lot of data, iI´m trying not to use more than exp averages and some arithimetical functions. In my intents I´ve seen that less is better and more generalized. Don´t give up, I´ll try using currency futures data that include volume, maybe this can bring some light at the end of the tunnel.

  4. Nicolas says:

    In the last issue of “Automated Trade” there is an interview of guys who are using systems based GP. They don’t go into the details of course, but it seems to work for them.
    The two biggest errors according to me ar overfitting and thinking too much as a programmer/mathematiciand and not enough as a trader (data to feed, rules to implement, etc).

  5. andrew says:

    Thanks Nicolas – that’s encouraging. Is the article online as I would like to read it ?

  6. Nicolas says:

    Not online but you can suscribe for free and have access to one issue. You can also have access online to some old issues and I found them interesting.

Leave a Reply