In some circles, you will hear proper tail calls referred to as “tail call optimization”. I think this name is terrible: omitting unnecessary and stupid behaviour (like accumulating stack at every tail call) is hardly what I’d consider an “optimization”: it’s simply good principled design. — Ron Garcia
What are tail calls?
Tails calls are the recursive calls that are executed as the last statement of procedure. Consider the following Racket (PLAI) program to sum up the elements of a list of numbers.
After the first call to sum
, our computation can be expressed as (+ 1 (sum (list 2 3)))
. After the second call, we get (+ 1 (+ 2 (sum (list 3))))
and so on until the base case (i.e the empty list) is reached. Notice how on each call, the +
operation needs to wait for the recursive call to return a value, resulting in this ever growing context of pending computations. This becomes clearer when we trace the execution of sum
.
This stack buildup is not ideal. This was just a small example, but imagine if our list was long enough to fill all available memory - we would run out of stack space well before we are able to evaluate our result. Understanding tail recursion can help us avoid this unneccesary build up on the stack. Consider the following modified sum
procedure.
Here, we use an accumlator to store the sum so far. Instead of using the stack to store the context of pending computations, we use the accumlator variable. Thus at any given time we are occupying a only a single stack frame since our recursive calls don’t need to wait. The trace of sum-acc
reflects this change.
Racket has “proper” tail calls, i.e it does not accumulate any stack space when making a recursive call that appears at a tail position. This means that the following tail-recursive Racket program will simply run forever:
C, on the other hand, accumulates stack space even with tail recursive procedures. This means that an equivalent tick tock
program in C will terminate with a stack overflow error. This article discusses how we can implement “proper” tail calls in C so that procedures like tick-tock
will run forever instead of blowing the stack. Our discussion will consider the Euclidean algorithm, a fundamental recursive algorithm to find the greatest common divisor of two integers. Following is an implementation of the Euclidean algorithm in Racket.
Trampolining
We will employ a technique called “trampolining” to implement “proper” tail calls in C. A trampoline is nothing but a wrapper around the recursive function. In regular recursion, we have a function calling itself from within. In trampolined recursion, the function doesn’t call itself, instead it returns another function (thus completing the current execution of our function) and this returned function again calls our recursive function. Then we develop a trampoline loop that keeps calling the function returned by our trampolined recursive function until computation is completed. In other words, control flow goes from our recursive function to the trampoline loop and back to the recursive function again and again (hence the name trampolining!). Here is a trampolined implementation of euclid-alg
in Racket:
Notice how all recursive calls are represented by the bounce
data type and the base case is represented by dismount
. Using the information encapsulated by these data representations, the loop-trampoline
procedure is able to determine whether or not the computation is complete.
Defunctionalization
Notice that we use lambda functions in the bounce
constructor. C does not have lambdas, and so we aren’t quite ready to translate this Racket code to C. We need to remove any usage of first-class of functions, a process refered to as “defunctionalization”. Defunctionalization entails two key steps:
- Create abstractions for all places a lambda is being applied or constructed
- Then replace the lambda with a data representation
The only place were we apply the bounce
lambdas is in the loop-trampoline
procedure. So, we create a new apply/th
(read: apply thunk) procedure to abstract away lambda application.
A thunk refers to an argument-less function that mimics lazy evaluations. For example, the procedure
(define (lazy-sum) (+ 1 2))
is a thunk. By capturing the state of our recursive function at each step, we are creating thunks that get evaluated by theloop-trampoline
procedure.
The bounce
lambdas are created in the euclid-alg/tr
procedure. We introduce new procedures to abstract away these constructions.
Now we are setup to get rid of these first-class functions altogether. To do so, we introduce a new data type — thunk
. This will encapsulate the same information that the lambdas did.
Now we update our apply/th
procedure to handle these two thunk variants.
Now we have a trampolined, defunctionalized version of the Euclidean algorithm. This can easily be translated to C.
Euclidean algorithm in C with proper tail calls
Continuation Passing Style
This is great, we can now run any tail recursive function in C without blowing the stack. But what if our function cannot easily be expressed as tail recursive? A common exmaple of such functions are those which aren’t singly recursive (i.e make two or more recursive invocations at a time). One such case occurs in the optimal solution to the famous towers of Hanoi problem. Here is a solution to the towers of Hanoi problem in Racket:
Notice how we need to make two recursive calls in order to solve this problem. Since we have two recursive calls, it is clearly not possible to have both of them be in a tail position. However, we can achieve tail recursion by converting this procedure to use continuation passing style. There are numerous resources online that will be able to explain CPSing much better than I can (for instance, see this article). Here is what a CPSed solution to the towers of Hanoi look like:
Now we can use trampolining and defunctionalization to be able to implement a “proper” tail recursive solution to the towers of Hanoi in C (or really any other language that doesn’t have proper tail calls).