closure conversion
Closure conversion is an important transformation in compilers, especially for functional programming languages. To know why this is important, you should know what closure is. To understand closure, lambda calculus is a prerequisite, I assume you are already familiar with it, in the following introduction.
1. Closure
The closure is a lambda that has captured external variables, to move forward, I must introduce the definition
- external variables is the variable not defined in the lambda body, also not a parameter
- captured a variable is having expression which is using it
An example always helps us learn a concept better, in the following
lambda, z
is captured, but x
is not captured.
(let ([z 123]) (lambda (x) (+ z x)))
Now, you should know what's a closure roughly, but why do we need to convert it?
2. Assembly Model
That's because, in assembly or similar models(for example, C programming language), one cannot keep an external variable, at least not generally. For example, you can refer to a global variable(an external of course) in C.
int c = 1; void foo() { c = 2; }
how about a variable in the stack?
int bar() { // what to do here? return c + 1; } typedef int (*bar_t)(); bar_t foo(int n) { int c = n; return &bar; }
In this case, when the caller uses this result, c
is already gone!
This shows you need to save it into the heap, and this is what you will
do in closure conversion!
3. Why not put them in global variable?
You might pop up an idea, what if we reuse global variables for captured variables? For example, you might write the below code.
int c; int bar() { return c + 1; } typedef int (*bar_t)(); bar_t foo(int n) { c = n; return &bar; } int main() { bar_t b1 = foo(1); bar_t b2 = foo(2); printf("%d\n", b1()); printf("%d\n", b2()); }
However, this produces 3 3
rather than 2 3
, this is because foo(1)
and foo(2)
affect the same c
! Another way to say this is that
you're using the same environment for different call, or you at most
has one closure in this encoding.
4. Lambda lifting
A rough idea is not storing variables somewhere, but extending all lambda's parameters, as much as captured variables it used! This process is lambda lifting. The idea is simple, let's have an example to show.
(lambda (x y z) (let ((f (lambda (a b) (+ (* a x) (* b y))))) (- (f 1 2) (f 3 4))))
goes to
(lambda (x y z) (let ((f (lambda (x y a b) (+ (* a x) (* b y))))) (- (f x y 1 2) (f x y 3 4))))
But this approach cannot handle a returned lambda! Since
- Stack will be destroyed, just like what happened in C
- The caller cannot sure how to give this captured variable to this returned lambda, because both are in its callee. Of course, for some simple languages global analytics should work, but in a language with side effects? I don't believe things can be that easy.
Thus, you should already be convinced that closure conversion is needed, let's take a look at it.
5. Closure conversion
First of all, we need a simple target to track our understanding, here we go.:
(begin (define (make-adder n) (lambda (m) (+ m n))) ((make-adder 2) 3))
Then we need to define our range of language, let me show the nanopass
(define (primitive? x) (member x '(+ - * / cons car cdr vector vector-ref))) (define-language L0 (terminals (primitive (p)) (symbol (x)) (number (n))) (Expr (e body) x n p (begin body* ... body) (lambda (x* ...) body* ... body) (let ([x* e*] ...) body* ... body) (define (x x* ...) body* ... body) (define x e) (e e* ...)))
5.1. Step one: remove define lambda form
Using nanopass's idea, you will simplify language step by step to get an easier language to handle, here the point is converting define lambda form from a special form to define form. After this step, there will leave only define form.
(define-language L1 (extends L0) (Expr (e body) (- (define (x x* ...) body* ... body)))) (define-pass remove-define-lambda : L0 (e) -> L1 () (Expr : Expr (e) -> Expr () [(define (,x ,x* ...) ,[body*] ... ,[body]) `(define ,x (lambda (,x* ...) ,body* ... ,body))]))
5.2. Step two: wrap all expressions in a begin form
In this step, you make all expressions in different forms be wrapped by
begin
!
(define-language L2 (extends L1) (Expr (e body) (- (lambda (x* ...) body* ... body) (let ([x* e*] ...) body* ... body)) (+ (lambda (x* ...) body) (let ([x* e*] ...) body)))) (define-pass wrap-with-begin : L1 (e) -> L2 () (definitions (define (wrap body* body) (if (empty? body*) body `(begin ,body* ... ,body)))) (Expr : Expr (e) -> Expr () [(lambda (,x* ...) ,[body*] ... ,[body]) `(lambda (,x* ...) ,(wrap body* body))] [(let ([,x* ,[e*]] ...) ,[body*] ... ,[body]) `(let ([,x* ,e*] ...) ,(wrap body* body))]))
5.3. Find free variables
Now, you are ready to touch on the core idea, finding free variables.
(define-pass freevars : L2 (e) -> * () (Expr : Expr (e) -> * () [,x (set x)] [(lambda (,x* ...) ,body) (set-subtract (freevars body) (list->set x*))] [(let ([,x* ,e*] ...) ,body) (apply set-union (cons (set-subtract (freevars body) (list->set x*)) (map freevars e*)))] [(begin ,body* ... ,body) (apply set-union (map freevars (cons body body*)))] [(,p ,e* ...) (apply set-union (map freevars e*))] [(,e ,e* ...) (apply set-union (map freevars (cons e e*)))] [(define ,x ,e) (freevars e)] [else (set)]))
Explanation
- variable is free
- bounds in
let
andlambda
form will be removed from freeslambda
is quite simple, get freevars in body then remove parameters from that setlet
have to counte*
in too, but cannot eliminate bounds from them, because left-hand bounds inlet
didn't occur at right-hand expressions- but we can have
let*
andletrec
, they would be much more complicate
- map this function to each expression in
begin
form - map this function to application form
- handle primitive calls especially, don't count function part as freevars
- normal call count function part in
- recur on
define
form's expression part. Notice that this is not enough, we would very likely removex
for following expression ofdefine
form, but we keep simple here since we didn't use this in any function. - rest are ignored
5.4. Step three: closure conversion
In this step, you will find all lambdas and do the following to each
- add a variable
env
for them - encode environment as
vector
- replace free variables in expression with
vector-ref
to environment index - encode closure as a pair of the converted lambda and the environment
(define-pass replace-free : L2 (e $env fvs) -> L2 () (Expr : Expr (e) -> Expr () [,x (guard (set-member? fvs x)) `(vector-ref ,$env ,(index-of (set->list fvs) x))])) (define-pass closure-conversion : L2 (e) -> L2 () (Expr : Expr (e) -> Expr () [(lambda (,x* ...) ,[body]) (define $env (gensym 'env)) (define fvs (freevars e)) ; convert free-vars in body by using reference to $env (if (set-empty? fvs) `(cons (lambda (,x* ... ,$env) ,body) (vector)) `(cons (lambda (,x* ... ,$env) ,(replace-free body $env fvs)) (vector ,(set->list fvs) ...)))]))
NOTE: This relies on the fact that set->list
is stable.
Then you have to convert the call into two kinds of call:
- call to primitive
- call to closure: take lambda out and put to the head of call, take environment out and put to the end of the call
(define-pass closure-call : L2 (e) -> L2 () (Expr : Expr (e) -> Expr () [(,p ,[e*] ...) `(,p ,e* ...)] [(,[e] ,[e*] ...) `(let ([clos ,e]) ((car clos) ,e* ... (cdr clos)))]))
6. Conclusion
Now, you already get a workable closure conversion, all lambda has no free variables now! Thus, you can go further to get assembly without the model barrier now.
6.1. Challenge
Some challenges here are for you to have more fun!
- Some lambdas have no free variables before converting, can you eliminate these closure conversions? How to mark them? In module sense?
- To get assembly, you need to generate a name for converted lambda,
for example,
lambda51289
. But then you might get(define f lambda51289)
after conversion, can you eliminate these?