encode scheme object in 64 bits
Encode scheme objects in a word(64bits) is fun and important for dynamic
language! The idea is using 3 bits for tagging, 61 bits for data. For
small types like integer
, boolean
, char
, how to use that 61 bits
is clear , but how to handle complex type like pair
, closure
? Let's
distinguish them to two parts, small type and
complex type.
0.1. Tag
But first, you need some tag utility
(define fixnum-shift 2) (define fixnum-mask 3) (define ptr-mask 7) ; mask for pointer type tag (define ptr-mask-inv #xfffffff8) ; mask for pointer value (define pair-tag 1) (define vec-tag 2) (define str-tag 3) (define struct-tag 4) (define void-tag 5) (define closure-tag 6) (define char-mask 255) ; character type mask (define char-shift 8) (define char-tag 7) (define bool-mask 255) (define bool-shift 8) (define bool-tag 15)
0.2. Small type
Generate immediate representation for small type is not quite complex.
Basically, it's (v << type-shift) | type-tag
or type-tag
.
(define (immediate-rep x) (cond [(integer? x) (bitwise-and (arithmetic-shift x fixnum-shift) #xffffffff)] [(char? x) (bitwise-ior (arithmetic-shift (char->integer x) char-shift) char-tag)] [(boolean? x) (if x (bitwise-ior (arithmetic-shift 1 bool-shift) bool-tag) bool-tag)] [(null? x) pair-tag] [(void? x) void-tag]))
0.3. Complex type
0.3.1. Pair
If the object is a pair(cons
in Lisp), the idea is put it into heap
with two words size.
car | cdr |
---|---|
64bits | 64bits |
Then encode it's pointer as pair pointer. Above desciprtion is the following function.
export fn _scheme_cons(car: i64, cdr: i64) callconv(.C) i64 { const p: [*]i64 = @intToPtr([*]i64, @ptrToInt(c.GC_malloc(2 * rep.WORDSIZE))); p[0] = car; p[1] = cdr; return @intCast(i64, @ptrToInt(p)) | rep.PAIR_TAG; }
0.3.2. Vector
Vector is important structure for speed up access elements heavily and don't change size frequently usage. The idea is allocating enough size for elements(a chunk of memory).
size | data |
---|---|
64bits | 64bits * size |
Then encode it's pointer as vector pointer. Above desciprtion is the following function.
export fn _scheme_make_vector(length: i64, filled_by: i64) callconv(.C) i64 { const len: i64 = length >> rep.FIXNUM_SHIFT; const p: [*]i64 = @intToPtr([*]i64, @ptrToInt(c.GC_malloc(rep.WORDSIZE * (1 + @intCast(usize, len))))); p[0] = len; const p2: [*]i64 = p + 1; var i: usize = 0; while (i <= len) : (i += 1) { p2[i] = filled_by; } return @intCast(i64, @ptrToInt(p)) | rep.VEC_TAG; }
0.3.3. String
String is simple, because it's a vector! But to optimize memory usage,
you don't use encoded char objects, but char itself. Since char takes
8bits(u8
) only, data takes 8 * size bits.
size | data |
---|---|
64bits | 8bits * size |
Thus, the following function shows.
export fn _scheme_make_string(length: i64, filled_by: i64) callconv(.C) i64 { const len: i64 = length >> rep.FIXNUM_SHIFT; const p: [*]i64 = @intToPtr([*]i64, @ptrToInt(c.GC_malloc(rep.WORDSIZE + @intCast(usize, len)))); p[0] = len; var p2: [*]u8 = @ptrCast([*]u8, p + 1); var i: usize = 0; while (i <= len) : (i += 1) { p2[i] = @intCast(u8, filled_by >> rep.CHAR_SHIFT); } return @intCast(i64, @ptrToInt(p)) | rep.STR_TAG; }
0.3.4. Closure
Closure is the most important runtime object for any dynamic funcitonal langauges, the idea is same as pair encoding, allocates two words for two objects then encodes pointer as closure pointer. And here encode environment as a vector, which is done from compiler side.
function pointer | environment pointer |
---|---|
64bits | 64bits |
Then you get the following function.
export fn _scheme_make_closure(fn_pointer: i64, env: i64) callconv(.C) i64 { const p: [*]i64 = @intToPtr([*]i64, @ptrToInt(c.GC_malloc(2 * rep.WORDSIZE))); p[0] = fn_pointer; p[1] = env; return @intCast(i64, @ptrToInt(p)) | rep.CLOSURE_TAG; }
Well, if you look back to pair section, definitions are same internally! In fact, use pair for closure is ok! You only have to track closure use points to do proper conversion for them in compiler.
0.3.5. Struct
Struct is a chunk a memory with meaningful reference way, since every object in this implementation way is one word, a definition like the following takes 3 words.
(struct date (year month day))
It's simple to convert (date 1582 10 15)
to the following pesudo
program.
var date_instance = @intToPtr([*]i64, _scheme_make_struct(3)); date_instance[0] = encode_integer(1582); date_instance[1] = encode_integer(10); date_instance[2] = encode_integer(15);
Then you can figure out the definition of _scheme_make_struct
.
export fn _scheme_make_struct(len: usize) callconv(.C) i64 { const p: [*]i64 = @intToPtr([*]i64, @ptrToInt(c.GC_malloc(rep.WORDSIZE * @intCast(usize, len)))); return @intCast(i64, @ptrToInt(p)) | rep.STRUCT_TAG; }
Conversion for all struct constructor calls should be done by compiler, since all access to struct fields is known, you have the following safe conversion.
(date-year (date 1582 10 15))
To pesudo program.
var date_instance = @intToPtr([*]i64, _scheme_make_struct(3)); // ... _ = date_instance[0];
1. Conclusion
Hope this article shows a lot to help you have idea about how dynamic language handles runtime objects, there are still something missing in there, let me discuss them here.
- 61 bits for integer is not enough, so in scheme this concept called
fixnum
, any number exceed the size will be handled by a big number structure. Floating number will join this fun, some implementation also supports complex number. - closure needs closure conversion step to make them, that's also in my posting plan to explain it.
- related project shows some known bugs about vector, welcome to solve them for me XD.
Now, take a break and have a nice tea time for yourself! See you in next article!