Genetic Algorithms with Processing

Genetic Algorithms are something I’ve found intriguing for a long time, but I’d never got around to actually implementing one. A recent flurry of blog posts and forum chatter about them – such as this program for evolving the Mona Lisa as a vector image, or this one which breeds vehicles, rekindled my interest. I quickly found a tutorial outlining the basics, and posing a sample problem: given a series of disks in a given area, find the largest disk which fits within the area without overlapping any of the others. I’d been planning to work, as usual, with SAGE, but the graphical nature of this problem gave me an opportunity to dive once again in to processing, which as it happens has recently left beta.

You can play with the results here; for an afternoon’s coding, I’m quite pleased with it as a proof of concept even though it doesn’t always do a fantastic job of solving the problem! As implemented, each run generates a fixed obstacle set of a dozen disks (in black) and an initial population of 25 candidate solutions. The ‘genome’ is 28 bit, split 10/9/9 for x co-ordinate, y co-ordinate and radius. Disks that overlap an obstacle or go off screen are awarded a fitness of zero and painted red; otherwise the fitness function is simply the radius. In line with the tutorial, I pick pairs by roulette selection based on this fitness (so red disks shouldn’t survive into the next generation), breed by crossover at bit 20, and mutate about 1 in 200 bits.

However, for this particular set up the crossover approach doesn’t seem ideal- offspring will inherit almost exactly the co-ordinates of one parent and the radius of the other. This often means large areas of the screen are missed unless a chance mutation moves a disk, which then has to be lucky enough to produce a few offspring in the next generation. It might work better if I generated offspring bit by bit (say, taking bits from each parent a certain percentage of the time), rather than the simple cut currently in use. Fortunately, it shouldn’t be hard to experiment in this way; ideally, I’d build several algorithms into the applet and offer a choice at runtime, as well as control over the assorted variables. If nothing else, it’s teaching me a lot about processing, which I continue to be impressed with.

iSquared: The limits of Computation

The winter issue of iSquared magazine is now available, with the cover article ‘The limits of Computation’ being by, well, me! I discuss Turing Machines, undecidability, intractability, and (briefly) Artificial Intelligence.

I’ve not received my own copy so can’t comment on the other articles yet, although I’ve attended a talk by interviewee Keith Briggs before which was very good, so I expect the interview will be interesting too. So I hope you’ll pick up a copy- it’s a venture well worth supporting!

Distributed Computing

Eddie, a computing cluster at the University of Edinburgh.

Mathematical study is often thought of as ‘purer’ than scientific research- instead of labs full of chemicals, fruit flies or lasers, our work could in theory proceed with nothing more than a chalkboard; rather than believing theories through weight of evidence and an absence of counterexamples, we prove theorems as undeniable consequences of our base assumptions.

But theorems rarely spring into our minds fully formed, simply awaiting proof. Instead mathematical research can often mimic the scientific method- experimenting with ideas, following promising leads, looking for tests that would break or support hunches, until they look convincing enough to attempt a proof.

This has very much been the nature of my work over the past few months- and although the actual proofs have mostly been worked out by pen and paper, with supporting calculations on a MacBook laptop, the exploratory phase requires something rather more powerful. One of the perks of being a University of Edinburgh researcher is access to the ECDF – a cluster of over 1400 processors, as illustrated above. Provided a problem can be split up into many independent sub-problems, such a resource can offer staggering reductions in computation time, by farming them out to multiple machines and running them in parallel.

For instance, I usually wish to generate very, very long lists of matrices and test them for a certain property, keeping only the much smaller subset of those with the property. A typical calculation of this type might take five days on my little Mac, assuming it doesn’t simply run out of memory in the process. But by splitting it into ten smaller calculations (each exploring a subset of the matrices), I can have the answer from the grid in 12 hours- or, splitting to 20 jobs, I can load it up overnight, and collect the answer the next morning just in time for a supervisor meeting!

Of course, a twenty-fold reduction in calculation time only hints at the power of the cluster; another 70 users could also get such a job done that night. Running all-out on a single task, ECDF could complete four years worth of calculation in a day!

Not all researchers have access to systems like Eddie, but there is one network almost everyone has access to- the internet. The idea of harnessing the power of idle home computers is nothing new: over ten years ago, I was contributing spare cycles of my (then powerful) Pentium 200 to a distributed.net project that succeeded in demonstrating the insecurity of 56-bit encryption keys, which at the time was the maximum allowed by the US government in software exports. Other famous distributed efforts include the alien-seeking SETI and the biological research project folding@home.

The potential for highly parallelised computing has surged as processor counts have grown and always-on internet connectivity becomes increasingly ubiquitous. Modern PCs often have multiple processor cores (and many homes have multiple PCs), and there has been work on exploiting highly powerful (yet specialised) graphics card technology for general purpose computation too: you can even contribute to folding@home with a PlayStation3!

However, except for a few long-range ‘big ticket’ schemes such as the ones I’ve mentioned, setting up, managing and publicising your own distributed computing project might be more daunting than solving your original problem! Fortunately, there is an increasing range of ways to connect researchers to resources. At ANTS (in connection with another cryptographic challenge) I learnt of BOINC, a general platform that has grown out of SETI: you can submit a project to the volunteer community, or manage a grid within your own organisation. All modern Apple computers include the Xgrid feature, allowing again for local grid computing with your own machines or harnessing donated cycles through the openmacgrid project. Recently, Fedora Nightlife was announced, to create a community grid from machines running the popular linux distribution.

There are limitations, of course. Not all calculations fall in to the ‘embarassingly parallel’ category that lends itself so well to distributed computing. There are also concerns about the other costs of such schemes – home PCs are not necessarily cut out for 24/7 running, and those that are will probably add a bit to their owner’s power bill. However, Bryan’s blog makes a good case for the environmental merits of distributed computing over data centres, assuming that the calculations are worthwhile and thus going to be done somewhere anyway – a claim that is perhaps easier to defend for cancer studies than the search for alien signals. Or you can think of it as a very direct form of charity in support of science- donating some electricity, processing power and wear-and-tear rather than cash. So, if like me your home features more computers than people, why not devote some of their time to volunteer computing?

Geometry Club Talk: Computational aspects of ECDLP

On Friday I gave a geometry club seminar, speaking about some of the computational aspects of discrete-logarithm cryptography in general and as implemented for elliptic curves. My notes supplement rather than completely describe the talk, being heavier on the formalities and lighter on the narrative.

The topics covered are: Diffie-Hellman and one-way functions for key exchange; the generic Discrete Logarithm Problem and BSGS algorithm; scalar multiplication- addition chains, fast exponentiation, m-ary methods and windowing; group law implementations, Side-channel attacks and the Edwards form.

I also refered to a recent real-world example of a side-channel attack; see this story from The Register for details.

Greedy Pig

This entry first appeared as a writeup for Everything2.

Greedy pig is a simple maths game for groups that serves as an introduction to probability. I used it recently as a warm-up activity for a maths hour with local primary school children (around 11 years old), where it was well-received. For older students, it could provide the starting point for a discussion of topics such as the gambler’s fallacy or for a statistical investigation.

How to Play

A pair of dice are thrown, and their total recorded as a starting score for all participants. Play then proceeds in rounds. Before each round, players decide whether to stick with their current score, or continue playing. To play a round, roll a die; each player who is still in adds that many points to their score- unless a two is thrown, in which case they lose all their points. Play proceeds until all participants have decided to keep their score, or a two eliminates all remaining players. The winners of the game are the players with the highest score; it’s worth playing around three games and taking a combined total.

Practical issues when running the game

Keeping track of who’s in or out is most easily done by having students stand up if they wish to gamble or sit down if they wish to stick with their score. Apart from making it easy to spot when a player is trying to sneak back into the game, this is also good as it gives the students an idea of how confident their peers are to continue, and you’ll get lots of them wavering up and down as they try to decide!

Recording scores of players as they drop out is harder, but vital- children may try to cheat, or accuse each other of doing so, when it comes to declaring their final total. It’s definitely worth keeping a running tally of throws and totals on the blackboard -with the students doing the adding up! For smaller groups, you might be able to give out tokens or numbered cards as players save their score, but with larger groups (we had around 30 students per session) this would probably slow things down a lot. Perhaps give each student a piece of paper and a pen to write their score on (nice and large!) to hold up once they’ve sat down.

Some children are very risk adverse, and sit down almost immediately; others just stay standing until they get knocked out by a two. To make sure this isn’t due to misunderstanding the game, it’s worth doing a practice run first. It’s interesting to watch how strategies adapt as the players get more experience- particularly if the two is thrown surprisingly early or late in a game (we hit a total in the 70s for one session, which skewed things somewhat!)

Strategy

Can we say anything mathematically about when a player should sit down? That is, should we gamble a given total or not? You might want to think about this yourself before reading on.

To model this game, we can consider the expectation of a round- that is, the average outcome in the long run. Suppose then we have a total of N. Obviously, it’s only worth playing if our expected increase in the total offsets the risk of losing it all. One sixth of the time, a one will be thrown in the next round, leading to a gain of 1; with equal probability we might gain 3,4,5, or 6. So five sixths of the time, we gain some amount. But the remaining sixth of the time, we’ll hit a two and lose everything; this can be thought of as a ‘gain’ of minus N. So our expected gain by staying in is:

(1/6)*1+(1/6)*(-N)+(1/6)*3+(1/6)*4+(1/6)*5+(1/6)*6 = (1/6)*(1+3+4+5+6-N) = (19-N)/6.

Hence, for a play to be worthwhile, we need (19-N)/6 (and thus 19-N) to be positive. That is, we should be willing to gamble on a total of 19 or less, but a total of 20 or above should be banked.

Game Theory

However, as is always the case with expectation theory, this analysis depends on playing a large number of games and considering the total (or average) score across them. Playing just a few games tends to encourage an ‘all or nothing’ approach wherein players are more interested in winning in absolute terms (that is, being best in class) than the score attained in the progress.

Of course, the ideal time to bank is just before the two is thrown, thus leaving you with the maximum possible score (anyone who sat out earlier has less; anyone who stayed in scores zero). The problem is that by banking a score in a given round with the hope of winning that particular game, you are effectively gambling on it being the next throw being a two, and you’ll only be right one sixth of the time. The remaining five sixths of the time, you’ll be wrong and the others get a higher score- in a group situation, any of them could now retire with a better score than you, and in a 1-on-1 duel your opponent can bank immediately to guarantee the win.

But then, if everyone adopts this brinksmanship strategy of always staying in, then eventually they will all go over the brink and score zero. Depending on how the payoffs are modelled (which is of course crucial!) this two-player version of a round of Greedy Pig can be interpreted in game-theoretic terms as follows. If neither player has an advantage over the other, either by ending the game with a tied score, or by proceeding to another round, then assign them a score of 1, unless both players tied with zero, in which case score 0. Else, if one player wins the game this round and the other loses, the winner scores 2 and the loser 0. Mixing in the probabilities of winning or losing depending on a play of stick or gamble, we get a payoff bimatrix:


Gamble Stick
Gamble 5/6,5/6 10/6,2/6
Stick 2/6,10/6 1,1

Notice that for player 1 the ‘gamble’ row dominates the ‘stick’ row (and equivalently for player 2 in columns), and thus each player must gamble despite the fact that they each prefer the outcome of both sticking (score 1) to both gambling (score 5/6). Thinking of sticking as cooperating, and gambling as defecting, this is precisely the famous prisoner’s dilemma!

Variations

More advanced versions of Greedy Pig, and the resulting changes in optimal strategy, can be explored. For instance, you could cap the number of rounds to be played. A pair of dice could be used, scoring by either adding the total of both or taking their difference; this also allows for a range of elimination conditions: ending the game on a double, when either die is a 2, for a particular total etc. You could also vary the frequency with which players choose to gamble, such as commiting them to two throws of the die each round. But, particularly for younger children, beware of making the game too complicated at the expense of fun!


Based on my experiences running primary school workshops as part of the Science Communcation in Action scheme at the University of Edinburgh. Unfortunately, I do not know who deserves the original credit for this game.

What I’m working on…

So it’s been over two months since a post; more attentive readers will have noticed that there was one, but now there isn’t. I’ve moved away from thinking about cryptography to generalising some number/graph theoretic results of my supervisor, concerning matrices with constrained eigenvalues. However, this creates a problem: unless I ‘blog every up and down of the research process (which could be interesting, but would slow me down!) information on here becomes decreasingly accurate or relevant as I revise my thinking on the topic. Certainly it would be premature to present firm results at the moment.

But I can at least set the stage for more technically-minded readers (a friendlier explanation/illustration will hopefully follow once I truly understand all this!). Chris has characterised all symmetric integer matrices with the property that their eigenvalues are at most 2 in modulus; under a suitable transformation of their characteristic polynomials, these give cyclotomic polynomials and thus are referred to as cyclotomic matrices. Conveniently, any submatrix of a cyclotomic matrix is itself cyclotomic, so it suffices to find maximal examples. Although there are infinite families of these matrices, there are only a few ‘types’ possible.

These types are best understood by considering not the matrix, but an associated graph, where values in the matrix determine the weights on edges and nodes of the graph. This introduces a notion of equivalence, since many matrices will correspond to the same graph or certain well-defined variations on it. Further, we can adjoin nodes and edges to the corresponding graph to try and ‘grow’ towards maximal examples.

The motivation comes from finding polynomials of small Mahler measure- whilst a cyclotomic polynomial has measure 1, all others seem to be pushed away, with the smallest known value being 1.176… The question is how to generate small examples, and these matrices provide a way: by adjoining a single extra node to a maximal cyclotomic graph, a non-cyclotomic graph/matrix is obtained and thus a non-cyclotomic polynomial. The minimal graphs with this property (non-cyclotomic, but all subgraphs cyclotomic) often correspond to polynomials with some of the smallest known Mahler measures.

But some examples are not generated in this way, which is where I’ve stepped in. There is no reason to restrict attention to integer matrices, and I’ve established which imaginary quadratic extensions of the rationals give rise to rings of integers over which suitable matrices can be found. For a couple of fields, there are very few new (non rational-integer) cyclotomic matrices, and I have a complete description of them, but in others there are again infinite families as well as occasional examples that don’t generalise.

So I explore this behaviour by growing graphs/matrices, and try to spot patterns as they emerge from the fragments. I use the university’s parallel computing cluster Eddie for brute force work in SAGE, but such is the nature of the combinatorial explosion that even this doesn’t suffice without some mathematical insight along the way, as I try to refine my growing algorithms and capture equivalence as early as possible. I’m hopefully nearing the point where all examples fit into known families, at which point I’ll need to switch into serious mathematician mode and try and prove why this should be so. But for now I need to make sure that nothing unexpected tumbles out of each batch of calculations!

On a completely unrelated note, I’ve dragged modulo errors up to date with wordpress 2.5 and switched themes; please shout if you find I’ve broken something along the way.

The Extended Euclidean Algorithm

I promised some of my tutorial students a demonstration of how the ‘highschool’ approach to Euclid’s algorithm can be reversed to give rise to the extended Euclidean algorithm – as opposed to the version in their lecture notes, which finds both gcd(a,b) and x,y such that ax+by=gcd(a,b) in one pass, at the price of some notational complexity. To do so, it seemed worth recapping some of the properties of divisibility that make Euclid’s algorithm tick, and give an application for its extended form. That ended up taking four pages, so I figured I’d post it here as well… you can read it behind the cut, or download the LaTeX-formatted pdf version.

Continue reading ‘The Extended Euclidean Algorithm’ »

Understanding Public Key Cryptography with Paint

(this post roughly corresponds to the narrative from part of a talk I’m preparing for S5 (approximately 16 years old) students, intended to portray a modern research area in an accessible manner. So I’d very much appreciate feedback in the comments!)

For centuries, cryptography – literally, `secret writing’ – has been used to securely send and receive messages. But although the sophistication of these systems increased, the core idea remained the same: combining a secret encryption rule with the plaintext message yields a ciphertext, from which the message is recovered by a corresponding decryption rule. Thus the secrecy of messages depended on preserving the secrecy of the cryptographic system (or at least certain parameters).

While this might be feasible for governments or armies, it leads to a fatal flaw when trying to communicate securely with a stranger, a task that underpins the millions of ecommerce transactions that take place every day, for instance. To share secrets, you must first share a secret, the particulars of the cryptographic system you wish to protect the message with. This presents a seemingly impossible hurdle: how can you share that first secret with a previously-uncontacted individual, if any instructions you give will also be available to your adversaries?

Public Key encryption is the solution to this problem; to get a feel for how this is achieved, we’ll consider a non-mathematical formulation in terms of mixing paint, before abstracting to the properties that make it work.

Secret Sharing with Paint

Suppose, then, that our protagonists, Alice and Bob, wish to share a secret: but all their communication is intercepted by an eavesdropper, Eve. How can Alice and Bob arrive at a colour without Eve also knowing it?

Alice and Bob are assumed to know a public base colour- and there’s no problem with Eve knowing this too. They then choose a private colour of their own, and combine some of that with the base colour to create a public mix. They can then send these mixtures to each other: Eve sees both public colours, but (since it’s a lot harder to unmix paint), has no idea what private colours were used to produce them.

Having received each others’ mix, Alice and Bob can then mix in their own private colour again, to produce a blend of three colours. But, each of them will have the same colour, since the order in which we mix paint is irrelevant. Eve, however, has no idea what this new mixture looks like.

The following table summarises who sees what, for a particular set of chosen private colours.

Useful Properties

What are the key ideas that make the example above work? and how can we mimic them mathematically?

Unmixing is necessary…

No combination of the colours Eve has seen will mix to give the fetching shade of mustard yellow that Alice and Bob know. Since they had to agree on the procedure, Eve would be able to recreate the desired shade if she knew either of the private colours, since then she could mix it with the corresponding public colour just like Alice and Bob. Unfortunately, the private colours are never disclosed, only their combination with the base colour. So Eve must analyse the public colours in the hope of extracting a private colour.

…but unmixing is hard!

Without an encyclopaedic knowledge of all combinations of paint, Eve cannot know what private colours have been used to generate the public ones. So her only apparent option is to keep trying candidates, mixing each of them with the base coat until she arrives at one of the public colours by sheer luck. This brute force approach will obviously take a very long time!

Fortunately, Alice and Bob don’t need to unmix.

For Alice and Bob, this is irrelevant- they’re only ever required to mix, which is much easier than unmixing. However, it’s vital that their two routes through the colours lead to the same result: that is, that Blue+Red+Green is the same as Blue+Green+Red.

Secret Sharing with Maths

This leads us in search of a ‘one-way function’: roughly speaking, a mathematical function with the property that it’s much more difficult to recover the inputs from the outputs (reverse the function) than it is to compute those outputs in the first place, thus satisfying the second property above. Moreover, we need a procedure by which Alice and Bob can make use of such functions to independently arrive at a mutual secret which cannot be obtained by Eve. To do so, we therefore require that the only way to deduce a shared secret is indeed to reverse the function (the first property). Finally, to make the whole thing work as described above, we require that two applications of the function can be performed in either order to give the same result. This is the third property, described mathematically as commutativity.

Unfortunately, no-one has been able to demonstrate a genuinely one-way function. Fortunately, there are a few candidates, for which even the best publically-known techniques for the reversal are painfully slow. But there remains a risk that someone, somewhere, will devise a smarter way to perform this mathematical unmixing, rendering the function useless for cryptographic application.

A candidate One-Way Function: Modular Exponentiation

For a number x, we say x is congruent to y modulo N (written x=y mod N) if y is the remainder after dividing x by N. This might seem a strange idea, but it’s something we do every day: a clock face works “mod 12″, so that if you add 6 to 7 you get 1, as 13 = 12*1 +1 = 1 mod 12.

So, for a fixed base g and modulus N (our equivalent of the base colour), we can compute the modular exponent of any x, defined as gx mod N (that is, multiply g by itselfx times, subtracting lots of N until the answer is at most N-1).

For instance, with a base of 2 and a modulus of 11, the modular exponent of x=6 is 26 mod 11. Working that out, we get 2*2*2*2*2*2 mod 11 = 64 mod 11 = 9 mod 11 since 64 = 5*11 +9.

The possibly surprising (but desired) result is that going backwards- that is, given y, finding x such that gx mod N =y mod N – is apparently hard for decently-sized N. This reversal is known as the Discrete Logarithm Problem, and generalises to some very abstract mathematical objects, known as finite groups, with varying difficulty.

For our example function, with N=21024+643 and assuming a computer capable of a billion tests per second, naively trying each possible private key in turn can take up to a staggering 3, 671743, 063000, 000000, 000000, 000000, 000000, 000000, 000000, 000000, 000000 years to match with a public key. This is significantly longer than the life of the universe so far – some 13700000000 years – and definitely longer than anyone is prepared to wait. Of course, there are smarter ways to try keys- and finding yet smarter ones for a given DLP is a very active area of research- but for such values of N none of the publically known ones fall within feasible time scales.

Further, we have the commutativity property we required: working out (ga)b is the same as (gb)a. So we should be able to share secret numbers via modular exponentiation just as we were able to share secret colours with paint mixing. This process is known as Diffie-Hellman Key Exchange.

Diffie-Hellman Key Exchange

  • Alice and Bob agree (in public) on a modulus N and a base g (the public base colour)
  • Each chooses a private key between 1 and N-1; a and b respectively (their private colour)
  • They each construct a public key by computing A=ga mod N and B=gb mod N (mixing private with base)
  • These can be safely exchanged, as it’s hard to get back a or b from the public keys A and B (unmixing hard)
  • Each then performs another modular exponentiation on the public key received:
    • Alice computes Ba mod N = (gb)a mod N = (gab) mod N = S mod N for some S between 0 and N-1.
    • Bob computes Ab mod N = (ga)b mod N = (gab) mod N = S mod N, the same S.

Hence (assuming N is chosen for the DLP to be sufficiently hard) Alice and Bob know a secret,S, which Eve does not.

Man-in-the-Middle Attacks

But is Diffie-Hellman truly secure? If we alter the ‘intruder power’, replacing our eavesdropper Eve with a more powerful character, the malicious Mallory, then we can construct a scenario in which Alice and Bob think they share a secret with each other, but instead share one with Mallory.

To achieve this, Mallory must be able to not just listen in on messages, but replace them with messages of his own. Given that power, he can pick a private key of his own, m, and generate the corresponding public key M, since the choice of g and N must be made in the clear.

Then, when Alice attempts to retrieve Bob’s public key B, Mallory instead supplies her with M, keeping A; and when Bob asks for Alice’s key A, revealing B, he is given M as well. Alice then computes S=Ma mod N, but Mallory has seen A and thus can compute S as Am mod N; Bob meanwhile computes T=Mb mod N which is also known to Mallory, being Bm mod N.

Thus, if Alice the tries to use a classical encryption system depending on the secrecy of S, then Mallory will be able to decode the ciphertext. Even more cleverly, he can re-encrypt it using T and forward it to Bob- who can decrypt it with the secret he thinks he shares with Alice, T. Thus no suspicion is raised, yet Mallory has also read the message, without ever having to reverse the one-way function.

Fortunately, this attack depends on near-total control of Alice and Bob’s communication. If Alice ever looks up Bob’s public key without Mallory intervening, then she’ll notice that it’s B, not M. Or, if she sends Bob a message encrypted with S that Mallory doesn’t intercept and repack, Bob will recieve a message that cannot be decrypted with the key he has, T: and knows that the Diffie-Hellman key exchange must have been compromised.

Conclusions

Identifying and preventing such attacks is part of cryptanalysis, and even with perfect cryptography, forms a vital part of designing secure communication systems. Since we don’t yet have a provably one-way function, cryptography itself remains an active field of mathematical research, drawing on a range of topics from both pure and applied areas to assess the difficulty of functions. Together, these two fields are known as cryptology; a subject which is becoming increasingly vital as computers and communication systems work themselves further into our everyday lives.

The Anatomy of a Maths Degree

Ever wondered what a mathematics degree involves? As another excuse to play with graphviz, and before I forget entirely, I decided to map out the contents of my undergraduate degree- a four year masters at the University of Bath, England. Here’s the result:

(clickthrough for legible, hi-res version)

The colours denote the year in which I took the course: green for year 1, blue for year 2, yellow for year 3, and red for year 4. The rating of the courses themselves is denoted by the course code- the first digit specifies the intended level of study. 1 is ‘certificate’, 2 is ‘intermediate’, 3 is ‘honours’ and 4 ‘masters’. The intention is that you collect at least 10 each at levels 3 and 4.

For a fuller description of what the courses involve, you can look up their course code in the unit handbook, available online via the university.