Archive for January, 2008

The Extended Euclidean Algorithm

Thursday, January 17th, 2008

I promised some of my tutorial students a demonstration of how the ‘highschool’ approach to Euclid’s algorithm can be reversed to give rise to the extended Euclidean algorithm - as opposed to the version in their lecture notes, which finds both gcd(a,b) and x,y such that ax+by=gcd(a,b) in one pass, at the price of some notational complexity. To do so, it seemed worth recapping some of the properties of divisibility that make Euclid’s algorithm tick, and give an application for its extended form. That ended up taking four pages, so I figured I’d post it here as well… you can read it behind the cut, or download the LaTeX-formatted pdf version.

Read the rest of this entry »

Tate pairing computation in SAGE III

Thursday, January 10th, 2008

The latest version of my ellnet class is ellnet2d_lowmem.spyx. It combines all the tricks I know of:

  • The use of precomputed inverses for all steps, and precomputed squares/products for each step, as described by Stange,
  • computation with a local vector to avoid overhead from function calls to keep the dictionary up-to-date,
  • mixed block lengths as described in the previous post,
  • and compilation to pyrex.

Thus it’s the fastest implementation I currently have for finding Tate pairings in SAGE (about twice as fast as accessing Stange’s PARI script from SAGE). Attach it in the normal way; example calculations are here.

A variable block length algorithm for Elliptic Nets

Wednesday, January 9th, 2008

(updated 10/i/08)

In an earlier post I described Stange’s algorithm for efficiently finding terms in elliptic nets (with a view to pairing computation). I also made the observation that a shorter block structure could be used for doubling- but once employed, it was not possible to perform a double-and-add. This meant that unless the desired term had a higher power of two as a factor then savings would be minor.

However, for a cost it is possible to ‘upgrade’ these short blocks to long blocks, since they contain enough information to recover the missing (k+4,0) term:

Better still, this only introduces an additional two multiplications and one inversion, since some of the terms feature in the precomputation (and assuming (2,0)2 is computed once and stored for subsequent use):

Thus, given a short block centred at k, we can obtain the short block centred at 2k+1 (that is, perform a shortDoubleAdd). The dependencies are as follows:

Notice that a shortDoubleAdd is more expensive than DoubleAdd, even though it gives a short rather than long block! Thus a purely short-block algorithm would perform worse than the standard algorithm for binary strings with a high Hamming weight, since for each Double-and-add an inversion is introduced in place of a multiplication. However, when the Hamming weight is low, then the occasional cost of an inversion is balanced by the savings accrued during short doublings. To exploit this, whilst guarding against too many inversions, we introduce an algorithm that uses a mixture of standard (‘long’) and short blocks. Since only a single inversion is required to switch to long blocks, doing so allows us to more efficiently compute long runs of 1s in the binary representation by spreading the cost across several DoubleAdds.

We consider the generation of long or short blocks with centre 2k (double) or 2k + 1 (double-and-add) from long or short blocks of centre k. The cheapest such operation is the generation of the short block with centre 2k from a short or long block with centre k, at a cost of 31 multiplications, 6 squarings and no inversions. Using this as a base line, each procedure introduces the following additional operations:


Procedure M S I
DoubleShortFromShort 0 0 0
DoubleLongFromShort 6 1 1
DoubleAddShortFromShort 4 1 1
DoubleAddLongFromShort 6 1 1
DoubleShortFromLong 0 0 0
DoubleLongFromLong 4 1 0
DoubleAddShortFromLong 2 1 0
DoubleAddLongFromLong 4 1 0

We adopt a windowing approach with two-bit windows: that is, bit bi informs whether we are to double or double-and-add, but bi-1 is also examined to determine whether we should generate a long or short block.

  • For bibi-1=00, the short block approach clearly minimises the cost through these two bits.
  • For bibi-1=11, one should stay with long blocks if these are already in use, to avoid inversion. If short, adopting the long block immediately will mean only a single inversion is required for the following run of 1s.
  • For bibi-1=10, if short, then an inversion is required whether you go long or not: since being long is not necessary for the following double, we keep the multiplication count down by 2 by staying short. Similarly for long: no inversion is required to perform the Double-and-add for either length, but as the next operation will be a double, we go short to avoid the unnecessary 2 multiplications.
  • For bibi-1=01, then it is always worth staying short if you already are, defering the inversion until it is strictly required for bi-1=1 (possibly choosing to go long then based on bi-2). If currently long, going short will save 4 multiplications and a squaring (approximately 5 multiplications). Even if it proves necessary to upgrade to long for the very next bit, that will only cost around 3.6 multiplications (based on 1I=1.6M, see performance section). Thus even a single zero bit is worth going short for.

Hence the approach is to always go to (or stay with, if already the case) short blocks, unless bibi-1=11 in which case one should go to (or stay with) long blocks. Thus a 2-bit window is sufficient to determine appropriate block length, leading to the following algorithm.

2-Bit Window Algorithm

Double-and-add Mixed-blocks Algorithm

INPUT: Integer n and long block centred at 1.

OUTPUT: Block centred at n.

  1. Compute binary digits di of n such that n=(dkdk-1…d1)2 with dk=1.
  2. Set c=1 (centre), Set status=’long’
  3. For i=k-1 down to 2 do:
    • If status=’long’
      • If d_i=1
        • If d_{i-1}=1 Compute block with centre 2c+1 via DoubleAddLongFromLong; Set c to 2c+1.
        • Else Compute short block with centre 2c+1 via DoubleAddShortFromLong; Set c to 2c+1; Set status=’short’.
      • Else
        Compute short block with centre 2c via DoubleShortFromLong; Set c to 2c; Set status=’short’.
    • Else
      • If d_i=1
        • If d_{i-1}=1 Compute block with centre 2c+1 via DoubleAddLongFromShort; Set c to 2c+1; Set status=’long’.
        • Else Compute block with centre 2c+1 via DoubleAddShortFromShort; Set c to 2c+1.
      • Else
        Compute short block with centre 2c via DoubleShortFromShort; Set c to 2c.
    • If d_1=1
      • If status=’short’ Compute block with centre 2c+1 via DoubleAddShortFromShort.
      • Else Compute block with centre 2c+1 via DoubleAddShortFromLong.
    • Else Compute short block with centre 2c via DoubleShortFromShort.
  4. Return final block.

Performance

As described before, the maximum possible gain is when n is a power of two, in which case the algorithm proceeds entirely by short doubles. In this case, there is a 12 percent reduction in the number of multiplications/squarings performed, with no inversions required.

Brute-force analysis of all possible 16-bit strings gives an average reduction of around 9 percent in the number of multiplications/squarings performed. Costing each inversion at 1.6 multiplications (based on average performance in SAGE for a 256-bit prime field), this leads to an average reduction of around 5 percent in the number of multiplications required for such strings. Testing several hundred 256-bit strings gives a similar figure.

Clearly, inversion is not viable if it will lead to a division-by-zero error. However, since the first zero along (i,0) will arise at (m,0), no such error will occur when performing Tate pairing computations.


A summary of this post and the earlier one on Stange’s algorithm is available as pdf, containing (hopefully) clearer copies of the dependency graphs.

Topics in Algebra, Analysis and Geometry.

Sunday, January 6th, 2008

Last summer I spent two weeks at the very rewarding Utrecht Summerschool in Mathematics, so I thought I’d spread the word about this year’s course. It’s entitled Topics in Algebra, Analysis and Geometry; analysis is a new inclusion this year (in place of number theory) and will be the main emphasis. Abstracts for the three courses are not yet available, but the titles are QRT and elliptic surfaces, Distributions,and Lie algebras and Integrable Systems.

As last year, the course runs for two weeks in August, with a fairly intensive schedule of lectures and problem classes; when I attended, the students also spent a couple of days preparing a presentation for the final day. The pace is reasonably demanding, and the ideal audience would be students just finishing undergrad and about to enter study for an MSc or PhD (although I went after a year of postgrad study).

There are also social activities organised by both the department and the university - there are fifty courses scheduled across the summer in a wide range of subjects, so you’ll have the opportunity to mix with students from outside of mathematics too. Utrecht itself is a beautiful city - night canoeing through the canals is highly recommended! - and daytrips further afield are also offered.

For further details on the summerschool programme, see here; specifics for the mathematics course are being made available on the department pages. I also took some photos during my stay. Feel free to leave any questions you have in the comments!