This chapter describes all the data structures used in this package. We start with a section on finite fields and what is already there in the GAP kernel and library. Then we describe compressed vectors and compressed matrices.
Throughout the package, elements in the field GF(p) of p elements are represented by numbers 0,1,...,p-1 and arithmetic is just the standard arithmetic in the integers modulo p.
Bigger finite fields are done by calculating in the polynomial ring GF(p)[x] in one indeterminate x modulo a certain irreducible polynomial. By convention, we use the so-called Conway polynomials
(see http://www.math.rwth-aachen.de:8001/~Frank.Luebeck/data/ConwayPol/index.html) for this purpose, because they provide a standard way of embedding finite fields into their extension fields. Because Conway polynomials are monic, we can store coset representatives by storing polynomials of degree less than d, where d is the degree of the finite field over its prime field.
As of this writing, GAP has two implementation of finite field elements built into its kernel and library, the first of which stores finite field elements by storing the discrete logarithm with respect to a primitive root of the field. Although this is nice and efficient for small finite fields, it becomes unhandy for larger finite fields, because one needs a lookup table of length p^d for doing efficient addition. This implementation thus is limited to fields with less than or equal to 65536 elements. The other implementation using polynomial arithmetic modulo the Conway polynomial is used for fields with more than 65536 elements. For prime fields of characteristic p with more than that elements, there is an implementation using integers modulo p.
For this section, we assume that the base field is GF(p^d), the finite field with p^d elements, where p is a prime and d is a positive integer. This is realized as a field extension of degree d over the prime field GF(p) using the Conway polynomial c ∈ GF(p)[x]. We always represent field elements of GF(p^d) by polynomials a = \sum_{{i=0}}^{{d-1}} a_i x^i where the coefficients a_i are in GF(p). Because c is monic, this gives a one-to-one correspondence between polynomials and finite field elements.
The memory layout for compressed vectors is determined by two important constants, depending only on p and the word length of the machine. The word length is 4 bytes on 32bit machines (for example on the i386 architecture) and 8 bytes on 64bit machines (for example on the AMD64 architecture). More concretely, a
is an Word
unsigned long int
in C and the length of a Word
is sizeof(unsigned long int)
.
Those constants are bitsperel
(bits per prime field element) and elsperword
(prime field elements per Word
). bitsperel
is 1 for p=2 and otherwise the smallest integer, such that 2^bitsperel > 2⋅ p-1. This means, that we can store the numbers from 0 to 2⋅ p - 1 in bitsperel
bits. elsperword
is 32/bitsperel rounded down to the next integer and multiplied by 2 if the length of a Word
is 8 bytes. Note that we thus store as many prime field elements as possible into one Word
, however, on 64bit machines we store only twice as much as would fit into 32bit, even if we could pack one more into a Word
. This has technical reasons to make conversion between different architectures more efficient.
These definitions imply that we can put elsperword
prime field elements into one Word
. We do this by using the bitsperel
least significant bits in the Word
for the first prime field element, using the next bitsperel
bits for the next prime field element and so on. Here is an example that shows how the 6 finite field elements 0,1,2,3,4,5 of GF(11) are stored in that order in one 32bit Word
. bitsperel
is here 4, because 2^4 < 2⋅ 11 - 1 = 21 < 2^5. Therefore elsperword
is 5 on a 32bit machine.
most significant xx|.....|.....|.....|.....|.....|..... least significant 00|00101|00100|00011|00010|00001|00000 | 5| 4| 3| 2| 1| 0
Here is another example that shows how the 20 finite field elements 0,1,2,0,0,0,1,1,1,2,2,2,0,1,2,2,1,0,2,2 of GF(3) are stored in that order in one 64bit Word
. bitsperel
is here 3, because 2^2 < 2⋅ 3 - 1 = 5 < 2^3. Therefore elsperword
is 20 on a 32bit machine. Remember, that we only put twice as many elements in one 64bit Word
than we could in one 32bit Word
!
xxxx..!..!..!..!..!..!..!..!..!..!..!..!..!..!..!..!..!..!..!..! 0000010010000001010010001000010010010001001001000000000010001000 2 2 0 1 2 2 1 0 2 2 2 1 1 1 0 0 0 2 1 0
(The exclamation marks mark the right hand side of the prime field elements.)
Note that different architectures store their Word
s in a different byte order in memory (endianess
). We do not specify how the data is distributed into bytes here! All access is via Word
s and their arithmetic (shifting, addition, multiplication, etc.). See Section 3.4 for a discussion of this with respect to our external representation.
Now that we have seen how prime field elements are packed into Word
s, we proceed to the description how compressed vectors are stored over arbitrary extension fields:
Assume a compressed vector has length l with l ≥ 0. If d=1 (prime field), it just uses elsperword/l Word
s (division rounded up to the next integer), where the first Word
stores the leftmost elsperword
field elements in the first Word
as described above and so on. This means, that the very first field element is stored in the least significant bits of the first Word
.
In the extension field case GF(p^d), a vector of length l uses (elsperword/l)*d Word
s (division rounded up to the next integer), where the first d Word
s store the leftmost elsperword
field elements. The very first word stores all the constant coefficients of the polynomials representing the first elsperword
field elements in their order from left to right, the second Word
stores the coefficients of x^1 and so on until the d'th Word
, which stores the coefficients of x^{d-1}. Unused entries behind the end of the actual vector data within the last Word
has to be zero!.
The following example shows, how the 9 field elements x^2+x+1, x^2+2x+2, x^2+3x+3, x^2+4x+4, 2x^2+x, 2x^2+3x+1, 2x^2+4x+2, 3x^2+1, and 4x^2+x+3 of GF(5^3) occurring in a vector of length 9 in that order are stored on a 32bit machine:
^^^ lower memory addresses ^^^ ....|....|....|....|....|....|....|.... (least significant bit) 1st Word: 0001|0010|0001|0000|0100|0011|0010|0001| (first 2nd Word: 0000|0100|0011|0001|0100|0011|0010|0001| 8 field 3rd Word: 0011|0010|0010|0010|0001|0001|0001|0001| elements) --------------------------------------------------- 4th Word: 0000|0000|0000|0000|0000|0000|0000|0011| (second 5th Word: 0000|0000|0000|0000|0000|0000|0000|0001| 8 field 6th Word: 0000|0000|0000|0000|0000|0000|0000|0100| elements) VVV higher memory addresses VVV
A
(one of our compressed vectors) is a GAP cvec
Data object
(that is with TNUM
equal to T_DATOBJ
). The first machine word in its bag is a pointer to its type, from the second machine word on the Word
s containing the above data are stored. The bag is exactly long enough to hold the type pointer and the data Word
s.
But how does the system know, over which field the vector is and how long it is? The type of a GAP object consists of 3 pieces: A family, a bit list (for the filters), and one GAP object for defining data
. The additional information about the vector is stored in the third piece, the defining data, and is called a
.cvecclass
A cvecclass
is a positional object with 5 components:
Position | Name | Content |
1 | IDX_fieldinfo |
a fieldinfo object, see below |
2 | IDX_len |
the length of the vector as immediate GAP integer |
3 | IDX_wordlen |
the number of Word s used as immediate GAP integer |
4 | IDX_type |
a GAP type used to create new vectors |
5 | IDX_GF |
a GAP object for the base field |
6 | IDX_lens |
a list holding lengths of vectors in cvecclasses for the same field |
7 | IDX_classes |
a list holding cvecclass es for the same field with lengths as in entry number 6 |
In position 5 we have just the GAP finite field object GF(p,d)
. The names appear as symbols in the code.
The field is described by the GAP object in position 1, a so-called fieldinfo
object, which is described in the following table:
Position | Name | Content |
1 | IDX_p |
p as an immediate GAP integer |
2 | IDX_d |
d as an immediate GAP integer |
3 | IDX_q |
q=p^d as a GAP integer |
4 | IDX_conway |
a GAP string containing the coefficients of the Conway Polynomial as unsigned int [] |
5 | IDX_bitsperel |
number of bits per element of the prime field (bitsperel ) |
6 | IDX_elsperword |
prime field elements per Word (elsperword ) |
7 | IDX_wordinfo |
a GAP string containing C data for internal use |
8 | IDX_bestgrease |
the best grease level (see Section 5.8) |
9 | IDX_greasetabl |
the length of a grease table using best grease |
10 | IDX_filts |
a filter list for the creation of new vectors over this field |
11 | IDX_tab1 |
a table for GF(q) to [0..q-1] conversion if q ≤ 65536 |
12 | IDX_tab2 |
a table for [0..q-1] to GF(q) conversion if q ≤ 65536 |
13 | IDX_size |
0 for q ≤ 65536, otherwise 1 if q is a GAP immediate integer and 2 if not |
14 | IDX_scafam |
the scalars family |
Note that GAP has a family for all finite field elements of a given characteristic p, vectors over a finite field are then in the collections family of that family and matrices are in the collections family of the collections family of the scalars family. We use the same families in the same way for compressed vectors and matrices.
The following limits come from the above mentioned data format or other internal restrictions (an immediate integer
in GAP can take values between 2^{-28} and (2^{28})-1 inclusively on 32bit machines and between 2^{-60} and (2^{60})-1 on 64bit machines):
The prime p must be an immediate integer.
The degree d must be smaller than 1024 (this limit is arbitrarily chosen at the moment and could be increased easily).
The Conway polynomial must be known to GAP.
The length of a vector must be an immediate integer.
The implementation of cmats
(compressed matrices) is done mainly on the GAP level without using too many C parts. Only the time critical parts for some operations for matrices are done in the kernel.
A cmat
object is a positional object with at least the following components:
Component name | Content |
len |
the number of rows, can be 0 |
vecclass |
a cvecclass object describing the class of rows |
scaclass |
a cscaclass object holding a reference to GF(p,d) |
rows |
a list containing the rows of the matrix, which are cvec s |
greasehint |
the recommended greasing level |
The cvecclass
in the component vecclass
determines the number of columns of the matrix by the length of the rows.
The length of the list in component rows
is len+1
, because the first position is equal to the integer 0 such that the type of the list rows
can always be computed efficiently. The rows are then stored in positions 2 to len+1
.
The component greasehint
contains the greasing level for the next matrix multiplication, in which this matrix occurs as the factor on the right hand side (if greasing is worth the effort, see Section 5.8).
A cmat
can be pre-greased
, which just means, that a certain number of linear combinations of its rows is already precomputed (see Section 5.8). In that case, the object is in the additional filter HasGreaseTab
and the following components are bound additionally:
Component name | Content |
greaselev |
the grease level |
greasetab |
the grease table, a list of lists of cvecs |
greaseblo |
the number of greasing blocks |
spreadtab |
a lookup table for indexing the right linear combination |
This section describes the external representation of matrices. A special data format is needed here, because of differences between computer architectures with respect to word length (32bit/64bit) and endianess. The term endianess
refers to the fact, that different architectures store their long words in a different way in memory, namely they order the bytes that together make up a long word in different orders.
The external representation is the same as the internal format of a 32bit machine with little endianess, which means, that the lower significant bytes of a Word
are stored in lower addresses. The reasons for this decision are firstly that 64bit machines can do the bit shifting to convert between internal and external representation easier using their wide registers, and secondly, that the nowadays most popular architectures i386 and AMD64 use both little endianess, such that conversion is only necessary on a minority of machines.
cmat
fileThe header of a cmat
file consists of 5 words of 64bit each, that are stored in little endian byte order:
the magic value as ASCII letters (8 bytes) in this order |
the value of p to describe the base field |
the value of d to describe the base field |
the number of rows of the matrix |
the number of columns of the matrix |
After these 40 bytes follow the data words as described above using little endian 32bit Word
s as in the internal representation on 32bit machines.
Note that the decision to put not more than twice as many prime field elements into a 64bit Word
than would fit into a 32bit Word
makes the conversion between internal and external representation much easier to implement.
generated by GAPDoc2HTML