Tail Recursion: From Python to OCaml

Practical exploration of TCO

Oct 08, 2023

DALL-E 3 thought this image illustrated the article well. It’s kinda cute.

I’ve been learning about compilers lately and about how call stacks work. This got me wondering how functional programming languages, where everything is handled recursively deal with maximum call stack size.

Recursion in Python

Let’s start with something familiar and write a valid albeit impractical Python function to sum numbers from 1 to n.

def recursive_sum(n: int) -> int:
    if n <= 0:
        return 0
    return n + recursive_sum(n-1)

It feels weird but it’s pretty much exactly how we’d write this in a functional language like Haskell or OCaml. When I run this with sufficiently large input, 1000 is enough on my machine, I get an error:

RecursionError: maximum recursion depth exceeded.

Every time a function call is made, CPython pushes a new stack frame to the call stack. Among other things, each frame contains function arguments and local variables. When the function returns, the stack frame can be popped from the call stack.

Here’s what the call stack looks like for n = 5.

recursive_sum(5)
5 + recursive_sum(4)
5 + (4 + recursive_sum(3))
5 + (4 + (3 + recursive_sum(2)))
5 + (4 + (3 + (2 + recursive_sum(1))))
5 + (4 + (3 + (2 + (1 + recursive_sum(0))))
5 + (4 + (3 + (2 + (1))))
5 + (4 + (3 + (3)))
5 + (4 + (6))
5 + (10)
15

So why do we get a RecursionError?

Python assumes that a function that exceeds the recursion limit is incorrect — maybe the user forgot to add a base case. That’s a good thing and lets users handle the exception. But in our case, we know that recursive_sum is valid and would terminate if CPython didn’t stop it.

The other part is memory usage. Each stack frame takes up memory. It’s more graceful to let the user handle the exception than for the program to unexpectedly crash because it ran out of memory.

Turns out, there is a way to get pseudo-tail recursion in Python by leveraging exceptions to unwind the stack. Thanks to Chris Penner’s article for explaining this well.

Tail recursion

Functional programming languages do something pretty smart for recursive functions that meet a certain form— the current stack frame is reused for the next call.

First, let’s rewrite our Python function into a tail-recursive form.

def tail_recursive_sum(n: int, accumulator: int) -> int:
    if n <= 0:
        return accumulator
    return tail_recursive_sum(n-1, accumulator + n)

All I did was introduce an accumulator to the function parameters. Then whenever it makes a recursive call, the accumulator is incremented.

Let’s illustrate how this works.

tail_recursive_sum(5, 0)
| tail_recursive_sum(4, 0 + 5)
| | tail_recursive_sum(3, 5 + 4)
| | | tail_recursive_sum(2, 9 + 3)
| | | | tail_recursive_sum(1, 12 + 2)
| | | | | tail_recursive_sum(0, 14 + 1)
| | | | | 15
| | | | 15
| | | 15
| | 15
| 15
15

Notice that once the function reaches the base case, it immediately has the result. However, Python does not take advantage of this. If we run the function above with n = 1000, we get the familiar maximum recursion depth exceeded.

What we can notice though, is that previous stack frames don’t serve any purpose. Compilers that optimize tail recursion take advantage of this fact to reuse the existing stack frame! The call stack then looks something like this:

tail_recursive_sum(5, 0)
tail_recursive_sum(4, 0 + 5)
tail_recursive_sum(3, 5 + 4)
tail_recursive_sum(2, 9 + 3)
tail_recursive_sum(1, 12 + 2)
tail_recursive_sum(0, 14 + 1)
15

Next, let’s see how this works in a language that supports it!

Tail recursion in OCaml

Here’s our tail_recursive_sum written in OCaml.

To explain the syntax, we define a function with the tail_recursive_sum name with two parameters n and accumulator. We use the rec keyword to tell the compiler that it’s a recursive function. Note that there’s no explicit return statement but the last expression is implicitly returned.

let rec tail_recursive_sum n accumulator =
    if n <= 0 then accumulator
    else tail_recursive_sum (n-1) (accumulator + n)

For good measure, let’s test it with10^9,which correctly returns 500000000500000000

Woohoo! That’s pretty remarkable. It would work for larger values as well, but we’d soon need to start worrying about integer overflow.

There’s one last loose end to tie up. To verify that this behavior is indeed due to tail recursion optimization and not a fluke, let’s try using the non-tail recursive form that we used before.

let rec sum n =
    if n <= 1 then 1
    else n + sum (n-1)

As we would hope, we start getting Fatal error: exception Stack_overflow errors starting at around n = 10^6. Great!

If you enjoyed this shorter write-up, consider subscribing. I write technical deep dives, often about implementing complex software from scratch to properly understand its inner workings.

Maybe take a look at one of my other posts?