Archive for the 'Algorithms' Category

Geometry Club Talk: Computational aspects of ECDLP

Wednesday, April 23rd, 2008

On Friday I gave a geometry club seminar, speaking about some of the computational aspects of discrete-logarithm cryptography in general and as implemented for elliptic curves. My notes supplement rather than completely describe the talk, being heavier on the formalities and lighter on the narrative.

The topics covered are: Diffie-Hellman and one-way functions for key exchange; the generic Discrete Logarithm Problem and BSGS algorithm; scalar multiplication- addition chains, fast exponentiation, m-ary methods and windowing; group law implementations, Side-channel attacks and the Edwards form.

I’ve discussed several of these ideas elsewhere on this blog, as well as the cryptanalysis ideas I mentioned on the day but which are not in the notes. I also refered to a recent real-world example of a side-channel attack; see this story from The Register for details.

The Extended Euclidean Algorithm

Thursday, January 17th, 2008

I promised some of my tutorial students a demonstration of how the ‘highschool’ approach to Euclid’s algorithm can be reversed to give rise to the extended Euclidean algorithm - as opposed to the version in their lecture notes, which finds both gcd(a,b) and x,y such that ax+by=gcd(a,b) in one pass, at the price of some notational complexity. To do so, it seemed worth recapping some of the properties of divisibility that make Euclid’s algorithm tick, and give an application for its extended form. That ended up taking four pages, so I figured I’d post it here as well… you can read it behind the cut, or download the LaTeX-formatted pdf version.

Read the rest of this entry »

A variable block length algorithm for Elliptic Nets

Wednesday, January 9th, 2008

(updated 10/i/08)

In an earlier post I described Stange’s algorithm for efficiently finding terms in elliptic nets (with a view to pairing computation). I also made the observation that a shorter block structure could be used for doubling- but once employed, it was not possible to perform a double-and-add. This meant that unless the desired term had a higher power of two as a factor then savings would be minor.

However, for a cost it is possible to ‘upgrade’ these short blocks to long blocks, since they contain enough information to recover the missing (k+4,0) term:

Better still, this only introduces an additional two multiplications and one inversion, since some of the terms feature in the precomputation (and assuming (2,0)2 is computed once and stored for subsequent use):

Thus, given a short block centred at k, we can obtain the short block centred at 2k+1 (that is, perform a shortDoubleAdd). The dependencies are as follows:

Notice that a shortDoubleAdd is more expensive than DoubleAdd, even though it gives a short rather than long block! Thus a purely short-block algorithm would perform worse than the standard algorithm for binary strings with a high Hamming weight, since for each Double-and-add an inversion is introduced in place of a multiplication. However, when the Hamming weight is low, then the occasional cost of an inversion is balanced by the savings accrued during short doublings. To exploit this, whilst guarding against too many inversions, we introduce an algorithm that uses a mixture of standard (‘long’) and short blocks. Since only a single inversion is required to switch to long blocks, doing so allows us to more efficiently compute long runs of 1s in the binary representation by spreading the cost across several DoubleAdds.

We consider the generation of long or short blocks with centre 2k (double) or 2k + 1 (double-and-add) from long or short blocks of centre k. The cheapest such operation is the generation of the short block with centre 2k from a short or long block with centre k, at a cost of 31 multiplications, 6 squarings and no inversions. Using this as a base line, each procedure introduces the following additional operations:


Procedure M S I
DoubleShortFromShort 0 0 0
DoubleLongFromShort 6 1 1
DoubleAddShortFromShort 4 1 1
DoubleAddLongFromShort 6 1 1
DoubleShortFromLong 0 0 0
DoubleLongFromLong 4 1 0
DoubleAddShortFromLong 2 1 0
DoubleAddLongFromLong 4 1 0

We adopt a windowing approach with two-bit windows: that is, bit bi informs whether we are to double or double-and-add, but bi-1 is also examined to determine whether we should generate a long or short block.

  • For bibi-1=00, the short block approach clearly minimises the cost through these two bits.
  • For bibi-1=11, one should stay with long blocks if these are already in use, to avoid inversion. If short, adopting the long block immediately will mean only a single inversion is required for the following run of 1s.
  • For bibi-1=10, if short, then an inversion is required whether you go long or not: since being long is not necessary for the following double, we keep the multiplication count down by 2 by staying short. Similarly for long: no inversion is required to perform the Double-and-add for either length, but as the next operation will be a double, we go short to avoid the unnecessary 2 multiplications.
  • For bibi-1=01, then it is always worth staying short if you already are, defering the inversion until it is strictly required for bi-1=1 (possibly choosing to go long then based on bi-2). If currently long, going short will save 4 multiplications and a squaring (approximately 5 multiplications). Even if it proves necessary to upgrade to long for the very next bit, that will only cost around 3.6 multiplications (based on 1I=1.6M, see performance section). Thus even a single zero bit is worth going short for.

Hence the approach is to always go to (or stay with, if already the case) short blocks, unless bibi-1=11 in which case one should go to (or stay with) long blocks. Thus a 2-bit window is sufficient to determine appropriate block length, leading to the following algorithm.

2-Bit Window Algorithm

Double-and-add Mixed-blocks Algorithm

INPUT: Integer n and long block centred at 1.

OUTPUT: Block centred at n.

  1. Compute binary digits di of n such that n=(dkdk-1…d1)2 with dk=1.
  2. Set c=1 (centre), Set status=’long’
  3. For i=k-1 down to 2 do:
    • If status=’long’
      • If d_i=1
        • If d_{i-1}=1 Compute block with centre 2c+1 via DoubleAddLongFromLong; Set c to 2c+1.
        • Else Compute short block with centre 2c+1 via DoubleAddShortFromLong; Set c to 2c+1; Set status=’short’.
      • Else
        Compute short block with centre 2c via DoubleShortFromLong; Set c to 2c; Set status=’short’.
    • Else
      • If d_i=1
        • If d_{i-1}=1 Compute block with centre 2c+1 via DoubleAddLongFromShort; Set c to 2c+1; Set status=’long’.
        • Else Compute block with centre 2c+1 via DoubleAddShortFromShort; Set c to 2c+1.
      • Else
        Compute short block with centre 2c via DoubleShortFromShort; Set c to 2c.
    • If d_1=1
      • If status=’short’ Compute block with centre 2c+1 via DoubleAddShortFromShort.
      • Else Compute block with centre 2c+1 via DoubleAddShortFromLong.
    • Else Compute short block with centre 2c via DoubleShortFromShort.
  4. Return final block.

Performance

As described before, the maximum possible gain is when n is a power of two, in which case the algorithm proceeds entirely by short doubles. In this case, there is a 12 percent reduction in the number of multiplications/squarings performed, with no inversions required.

Brute-force analysis of all possible 16-bit strings gives an average reduction of around 9 percent in the number of multiplications/squarings performed. Costing each inversion at 1.6 multiplications (based on average performance in SAGE for a 256-bit prime field), this leads to an average reduction of around 5 percent in the number of multiplications required for such strings. Testing several hundred 256-bit strings gives a similar figure.

Clearly, inversion is not viable if it will lead to a division-by-zero error. However, since the first zero along (i,0) will arise at (m,0), no such error will occur when performing Tate pairing computations.


A summary of this post and the earlier one on Stange’s algorithm is available as pdf, containing (hopefully) clearer copies of the dependency graphs.

Addition Chains

Thursday, November 29th, 2007

Suppose you have a rule for addition of some objects (most generally, elements of a semigroup), and you wish to compute the sum of n copies of the same object. How could you achieve this?

The primary school answer is to do just that: given g, compute 2g=g+g; then compute 3g=2g+g and so on, for n steps. But this rapidly becomes tedious, and isn’t (I hope) how we’d perform such a calculation in our heads: instead, we’d try to take bigger jumps. For instance, to compute the number 16g, we can find 2g, add that to itself to reach 4g, and again for 8g, hitting 16g after just 4 steps. Had we wanted 17, we’d then just add g again.

The sequence of intermediate values correspond to an addition chain for n: this is a sequence starting at a0=1 and ending at al=n with the property that for any term ak there exist terms ai,aj with ak=ai+aj. That is, every term is the sum of two earlier terms (not necessarily distinct), so we can reach ng by computing each akg in turn without needing anything fancier than our addition law.

As a side effect, an addition chain also tells us how to exponentiate if we know how to multiply, since xa+b=xaxb so we can build up to xn by finding the powers xak in turn.

Thus our tedious sequence for 16 is 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16, with the more sophisticated one being 1,2,4,8,16. If I asked you to compute 16x for a number x, you’d probably do something different: compute 10x (easy in base 10!) and 6x, then add those. With access to a multiplication procedure, this is definitely more efficient: but without that luxury it can still be used to give an addition chain for 16, by finding chains for 10 and 6. These are 1,2,4,5,10 and 1,2,3,6 respectively, so we can merge them into the chain 1,2,3,4,5,6,10,16.

We describe the length of the chain as l: that is, there are l terms after the 1. Notice that at worst, therefore, we need n-1 terms. If we move as quickly as possible by taking ak+1=ak+ak=2ak then the greatest value we can reach after m terms is 2m, as demonstrated with the chain for 24=16. However, if we can compute n and m in ln,lm terms, it needn’t take as many as ln+lm+1 terms to compute n+m, since there may be common terms in the chains. Finally, as defined an addition chain needn’t make use of all it’s terms: for example, 1,2,4,6,8,16 is still a chain for 16, although the 6 isn’t needed. We will obviously be interested in chains without any such wasted terms.

The observation for powers of two motivates a first attempt at an efficient algorithm for building addition chains. if n is 1, then we have nothing to do: the chain 1 suffices. Otherwise, we can work recursively: for an even number n=2k, we ask for the chain for k and then add 2k to the end; whereas for an odd number n=2k+1, we ask for the chain for k and add 2k then 2k+1 to the end. This will require at most log2n calls, and each step adds 1 or 2 entries to the chain.

This can also be captured by considering the binary representation of n, which will also indicate whether we hit an even or odd number at each stage. Suppose n is of the form

where each di is either 0 or 1, and dk=1 so there are no leading zeros. Then we start the chain with 1, and read from left to right, ignoring the leading 1: if we see a 0, then we double the last term of the chain; if we see a 1, add 1 to this new term as well. More formally:

Input: the binary expansion of n.

Output: an addition chain for n.

Set a0=1
Set t=1
for i from k-1 down to 0:
    Set at=2at-1 and t=t+1.
    If d_i=1:
        Set at=at-1+1 and t=t+1
return a0…at

For instance, with n a power of 2 we will see only a run of zeros, and thus perform the appropriate number of doublings, for a chain of length log2n. For n one less than a power of two, the expansion consists entirely of 1s, costing 2log2n. For an ‘average’ number where each digit has around a 50% chance of being a 1 or a 0, the chain will therefore be of approximate length (3/2)log2n. More formally, we can define the Hamming Weight v(n) to be the number of 1s in the binary expansion of n; then the chain will have length floor(log2n) + v(n) -1.

Is this optimal? A counterexample suffices to show it is not, the first being 15, which is 1111 in binary. Using the binary expansion we get the sequence 1,2,3,6,7,14,15 of length 6 (=3+4-1 as claimed). But the sequence would be 1,2,3,5,10,15 of length 5 and hence 1 shorter.

Nonetheless, our binary chain gives a much better upper bound than the naive chain of (n-1) steps, and is very simple to implement both in logic and memory. The idea also generalises to arbitrary bases, as follows.

Suppose n is of the form

where each di is in the range 0…m-1, dk non-zero. Then we can compute an m-ary addition chain for n:

Input: the m-ary expansion of n.

Output: an addition chain for n.

Set a=0
Start the chain with 1,2,3,…,m-1.
for i from k down to 0:
    Extend the chain to m*a (if not already present) and set a=m*a.
    Add a+di to the chain (if not already present), and set a=a+di.
return chain.

Notice that a tracks the last term computed, and that extending the chain to m*a is itself an addition chain problem: this makes some 2k a good choice for m, since this extension can then be accomplished in k additions.

The shorter chain for 15 arises from its ternary expansion: considered as (120)3 1*9+2*3+0*1, we have an initial chain of 1,2 then apply the loop. With a=0,d2=1 we set a=3a=0 then a=a+1=1; neither of these need to be added to the chain; for a=1, d1=2 we set a=3a=3 which requires the addition of 3 to the chain; then a=a+2=5 which extends the chain to 1,2,3,5; finally for a=5,d2=0 we set a=3a=15, adding first 10 then 15 to the chain; then a=a+0=15 so we terminate with chain 1,2,3,5,10,15.

The optimal choice of m depends on n, since although there will be less steps (logmn) for greater m, each step will require more additions in the computation of m*a. Nor are m-ary expansions the end of the story- there are various other tricks available to finesse addition chains. However, there is no known algorithm for finding shortest addition chains- and a generalisation of the problem to finding shortest addition sequences (chains that contain a desired list of values n1,n2,…,nk) is NP-complete. Searching for short chains is therefore an interesting challenge, and is constrained by the fact that unless you are precomputing them for future use, any shorter chain must offer a speedup in computation greater than the time taken to find it.

There are, however, bounds for the length of the chain:

Where v(n) is the Hamming weight as before.

However, if you have access to basic operations other than addition, all bets are off. In some contexts (such as elliptic curves), finding the inverse of an element is easy and so we can achieve subtraction, giving rise to addition-subtraction chains. For the pairing computations I’ve been interested in, I’ve never actually described a block addition. It is possible, following an algorithm of Shipsey, to do so, although it is much more expensive than the block double. But even if it cost no more, and we had an oracle for optimal chains, such an approach would only rarely beat the current algorithm. This is because access to the double-add procedure makes the Hamming weight irrelevant: for instance, with 15 we could form the chain 1,3,7,15. This means that we can compute in log2n block operations, and comparing to the lower bound for addition chains above shows that this is almost always better.

Still, ideas from addition chains can motivate construction of other operations: access to triple, tripleadd and tripleadd2 procedures would allow for chain of log3n length, although it is likely that these block operations would be more expensive than the double and doubleadds. Even if they don’t turn out to be useful for my work, the subtleties of this innocent-looking problem still makes for interesting mathematics!

References/Further Reading

Knuth- The Art of Computer Programming Vol. 2: Seminumerical Algorithms.

Bos and Coster- Addition Chain Heuristics in Advances in Cryptology- Crypto’89 (Lecture Notes in Computer Science) (Available via SpringerLink.)

Gordon- A Survey of Fast Exponentiation Methods. (Available via citeseer.)

Stange’s Algorithm for EDS

Thursday, November 8th, 2007

Since EDS are the rank 1 case of elliptic nets, Stange’s approach to the rank 2 case can be specialised to them: we are only concerned with terms of the form (x,0), so only the bottom row of a block need be calculated. Such sequences are considered when pairing a point with itself.

Thus I’ve simplified the ellnet class to give another eds class. I’ve changed the internals so that this class can be initialized and interacted with in the same way as my previous eds class based on Shipsey’s algorithm. (You retrieve the n term from a rank 1 net X using X[n] rather than the X[(n,0)] required for a general elliptic net, for instance.) There will probably be a few discrepencies to be ironed out, but I’ve thrown in the ward_point methods necessary for height calculation, and hope to have the two versions interchangeable in due course.

This requires only 1 initial inversion, and continues to use the step-by-step precomputation speedups given by Stange. It doesn’t support the larger jumps possible with Shipsey’s algorithm, although my existing code doesn’t exploit these as fully as it could anyway. Final doublings are still performed with smaller blocks, so for height estimation purposes it’s probably wise to choose a power of two for the term used.

The next post illustrates pairing computation with rank 1 nets.

Stange’s Algorithm for Elliptic Nets

Thursday, November 8th, 2007

In Shipsey’s algorithm1 for EDS, the septuples centred on terms mk and (m+1)k can be used to construct the septuple about 2mk, (2m+1)k or (2m+2)k from that about k. With k=1 this gives rise to a double-and-add algorithm, such that term n can be obtained in logarithmic time; knowledge of larger k introduces an additional speedup.

However, Shipsey’s approach makes use of various inverses, potentially giving division-by-zero errors for a sequence corresponding to a point of finite order (any point, in the finite field cases of interest for pairing computations). Further, the recurrence relation for a rank 2 elliptic net is more complicated than for an EDS, and we ultimately wish to find terms (m,0) and (m,1). Stange provides an algorithm to achieve this, replacing the septuple with an 11-element block centred on k as follows:


(k-1,1) (k,1) (k+1,1)
(k-3,0) (k-2,0) (k-1,0) (k,0) (k+1,0) (k+2,0) (k+3,0) (k+4,0)

Notice that the blocks centred at any of m, m-1 or m+1 will contain the information needed to compute the pairing. Thus we generate such a block by a double-and-add algorithm.

Algorithm

Double-and-add Algorithm

INPUT: Integer n and block centred at 1.

OUTPUT: Block centred at n.

  1. Compute binary digits di of n such that n=(dkdk-1…d1)2 with dk=1.
  2. Set c=1 (centre)
  3. For i=k-1 down to 1 do:
    • If d_i=0 Compute block with centre 2c; Set c to 2c
    • Else, Compute block with centre 2c+1; Set c to 2c+1
  4. Return final block.

Clearly, this requires procedures to generate new blocks from old- Stange2 gives formulae for these ((12)-(17)). A speed-up (Id. S5.1) can be achieved by computing commonly used squares/products at each step, denoted by the A(i) (squares) and B(i) (products) below.

Double

DoubleAdd

The inversions in the recurrence formulae are independent of k, so are precomputed and made available to Double and DoubleAdd as multipliers. Including multiplication by these values, the total cost of a block double/double-and-add is 35 multiplications, 7 squarings, 11 additions and no inversions.

A minor refinement

Define a 10-element short block centred at k by


(k-1,1) (k,1) (k+1,1)
(k-3,0) (k-2,0) (k-1,0) (k,0) (k+1,0) (k+2,0) (k+3,0)

shortDouble

Notice from the Double dependencies that the (k+4,0) term is necessary only to generate (2k+4,0). Thus given a short block centred at k, we can double it to obtain the short block centred at 2k, with the following dependencies:

The inversions in the recurrence formulae are independent of k, so are precomputed and made available to shortDouble as multipliers. Including multiplication by these values, the total cost of a short block double is 31 multiplications, 6 squarings, 10 additions and no inversions.

shortDoubleAdd

From the DoubleAdd dependencies it can be seen that the missing (k+4,0) is required for both (2k+4,0) and (2k+5,0)- although the latter can be dropped for a short block, the former cannot. Thus it is not possible to adapt this choice of recurrence formulae to give a DoubleAdd for short blocks. Nonetheless, the basic algorithm can be ammended to exploit the simpler short block Double once no further DoubleAdds are required; as the short block is sufficient for finding the Tate Pairing.

Algorithm

Amended Double-and-add Algorithm

INPUT: Integer n and block centred at 1.

OUTPUT: Block centred at n.

  1. Compute s,p such that p is odd and n=2sp.
  2. Compute binary digits di of p such that p=(dkdk-1…d1)2 with dk=1.
  3. Set c=1 (centre)
  4. For i=k-1 down to 1 do:
    • If d_i=0 Compute block with centre 2c using Double; Set c to 2c
    • Else, Compute block with centre 2c+1 using DoubleAdd; Set c to 2c+1
  5. This gives the block with centre p, and thus the short block with centre p.
  6. For i from 1 to s do:
    • Compute short block with centre 2c using shortDouble; Set c to 2c
  7. Return final block.

The efficiency gain of this variant depends on n- if n is a power of 2, then we may proceed entirely by shortDouble with 47 instead of 53 operations, a saving of around 11 percent (this will vary depending on the precise cost of multiplication, squaring and addition). If, however, n is odd then there is no gain. Thus pairing computation can be improved in the case of points with order a high power of two.

Implementation

A SAGE ellnet class is available, which implements all these ideas for rank 2 (Stange’s algorithm, with precomputation of inverses, caching of commonly-used factors at each step, and finishing by short blocks). See this post for how to generate nets from points and compute pairings.

References

1: Shipsey- Elliptic Divisibility Sequences (Download)

2: Stange- The Tate Pairing via Elliptic Nets (Preprint)

Baby Steps, Giant Steps, and element orders

Monday, April 2nd, 2007

The discrete logarithm problem is the vital part of elliptic curve cryptography, but can be defined (to varying cryptographic strength) for any cyclic group:

Discrete Logarithm Problem (DLP)
Let G be a cyclic group, with operation ⊕. Let [n] represent (n-1)-fold application of ⊕ i.e., [2]g=g⊕g, [3]g=g⊕g⊕g etc.

Given g,h from G, the discrete logarithm problem is to find t such that [t]g=h.

The baby-step, giant step (BSGS) algorithm is a generic algorithm for solving the DLP- that is to say, it makes no appeal to properties of the group involved, merely calculating abstractly with ⊕. There is a theoretical upper bound to the effectiveness of generic algorithms, and BSGS approaches are of that order of magnitude.

The simplest demonstration of BSGS (the original by Shanks) assumes that g generates G, with both of known order n: recall that the group order is simply the number of elements it contains, whereas the order of an element g is the least n such that [n]g=idG, should such a value exist. The order n of an element always divides the order of the group N, with equality when g generates G. Between the two there is also the exponent, the least value e such that [e]g=idG for all g in G; for cyclic groups this is N, but for products of cyclic groups (the structure of groups of rational points) it is the lowest common multiple of their respective orders, and thus, although a divisor of N, may be significantly smaller than it.

So, if we can find an order m rational point on a curve, we know that the cardinality of the group of rational points is zero modulo m. By testing random points and taking the lowest common multiple of their order we can usually find the exponent e of the group. With luck, but not always, this will be large enough that when combined with bounds on the cardinality the latter is established exactly (as with modular information in the SEA algorithm).

But it is no good attempting to do so using an algorithm that requires the order of the point! There are BSGS algorithms for the DLP which can handle unknown order, for a given g we can apply these to solve for h=idG in the cyclic group generated by g. Provided the algorithm gives the minimal t such that [t]g=h=idG in G=<g>, then t is the order of g.

One such algorithm, by Terr, is given in Cohen and Frey’s Handbook of Elliptic and Hyperelliptic curve cryptography (my current bible!) It relies on the following observation:

Let Tn be the nth triangular number: that is, defined by the recursion T1=0, Tn+1=Tn+n.

Then any non-negative integer t, there are unique integers j and k with t=Tj+1-k with 0≤k<j.

To see how, notice that there is a unique j such that t lies in the interval (Tj , … , Tj+1]=(Tj+1-j , … , Tj+1-0] and that this interval has width j so there is an appropriate choice of k.

We then suppose that t satisfies [t]g=h. Then we have [Tj+1]g=h⊕[k]g. So, instead of testing each [t] in turn until we hit equality, for a given j we need only test the ‘giant step’ [Tj+1]g against the set of ‘baby steps’ β={βi}={h⊕[0]g,…,h⊕[i]g,…,h⊕[j-1]g. If that j yields no matches, we move to j’=j+1: by keeping track of [j]g at each iteration, the new Tj’+1 and the extra h⊕[j’-1]g are found by a single group operation each. Hence this is attractive from both storage and time complexity perspectives: we need only record the baby steps β, [j]g and a single giant step at any given iteration j, whilst the time complexity is of order t1/2.

Terr’s BSGS variant for the DLP
Finds t such that [t]g=h for g generating G of unknown order, h from G.

Initialise β={β0=h}, γ=δ=idG, j=0. ((For each iteration, we have γ=[Tj+1]g and δ=[j]g)
Then loop over j as follows:

  • Increment j by 1, then update:
  • Set δ=δ⊕g=[j-1]g⊕g=[j]g.
  • Set γ=γ⊕δ=[Tj]g+[j]g=[Tj+1]g.
  • If j≥2 then
    • For s from 0 to j, if γ=βs then return Tj+1-s
  • Add βjj-1⊕g to β

I’ve implemented this generic DLP algorithm and an order-finding version (which requires you to catch g=idG) as Maple procedures for arbitrary groups- details on that will be in the next post! BSGS-based approaches become impractical at finite field sizes well within the grasp of SEA for counting on elliptic curves; but are of interest in the higher genus case which lacks an Elkies procedure.

The SEA algorithm

Wednesday, March 14th, 2007

I’ve spent the last few weeks getting my head around the SEA algorithm- a series of techniques for computing cardinality of elliptic curves over finite fields. The basic Schoof algorithm (the S of SEA) is based on the Weil conjectures, from which to find the cardinality it suffices to determine the trace of Frobenius, t. This can be shown to fall within certain bounds and thus (by the chinese remainder theorem) the problem can be solved by establishing t modulo l for a series of smaller primes l. The contributions by Elkies and Atkin make these subcalculations feasible, but trade computational complexity for mathematical difficulty, and it’s these I’ve been grappling with recently- particularly the Elkies approach.

Fortunately there is an excellent book - Blake, Seroussi and Smart’s Elliptic Curves in Cryptography which covers this topic thoroughly. As usual, attempting to implement the procedures myself has uncovered (and repaired) many gaps in my comprehension of the algorithms involved. I’m finally getting correct answers, albeit for small examples, so I’ve written up some summary notes on the book and a worked example in Maple. It still requires a high degree of interaction to get to the final result; my understanding is that SAGE (on linux) implements the SEA algorithm, so it doesn’t seem worth trying to tie it all together into a standalone Maple procedure- this would be a programming challenge, but wouldn’t add anything to my mathematical understanding!

Notes: [SEA.dvi]
Appendix (Maple procedures): [SEA_tools.mpl]
Example (Maple worksheet): [SEA.mw]