Tail Call Elimination in Scala Monads

Update: Friday, August 23, 2013

This post is from 2011, but has seen a lot of traffic lately and drawn some comments that the solution given here is incomplete or “doesn’t work”. This post is definitely incomplete, but the solution does work. For the most up-to-date code, please see my paper Stackless Scala with Free Monads as well as the source code for scalaz.Free.


Continue reading


Monads are Dominoes

It’s about time I made my contribution to the zoo of monad tutorials. After explaining the concept of monads to somebody who is not a programmer or a mathematician, I was told that what I had explained sounded exactly like a game of dominoes.

However, instead of explaining monads in terms of a cute metaphor about dominoes, I’m just going to throw you into the deep end. Deep breath!

Definition: A category   is given by objects and arrows between these objects. Each arrow   has a unique source object   and a target object  . For every three objects  ,  , and  , and arrows   and  , there is a composite arrow

such that

Composition is associative: if  ,  , and  , then

There exists a unique identity arrow for each object For every object  , there exists an arrow     such that for every arrow  , we have    

A category for dominoes:

The objects of the category are the natural numbers. An arrow in the category is a domino piece with two numbers on it:  . A composition of arrows   is two domino pieces placed together end-to-end by the following equation:

The identity arrow for each object   is a domino piece   with the same number   on both ends.



A monad for dominoes

Definition: A monad is given by a functor   (from a category   to itself, i.e. an endofunctor in  ), equipped with natural transformations:


where   is the identity functor and x is functor composition. This must satisfy the axioms that     and   are identity transformations, and     (i.e.   is associative).

The dominoes need not be annotated with numbers, but might be annotated with any set of objects. Therefore, there exists an endofunctor   on the category of sets, such that for all sets  , we can construct the set of dominoes  .

For every object   in  , we can construct a domino  . This is   for our monad. Then, for the set of dominoes  , we can construct   which is the set of dominoes annotated with other dominoes. For any pair of dominoes in  , composed end-to-end, we can see it as a single domino in  . The   operation in our monad is then the fact that for any such pair of dominoes  , it behaves as the domino   with regard to composition with other dominoes in  .




Exercises left to the reader:

1. What does a monoid in the category of dominoes look like?

2. How would the game of dominoes change if it allowed products and coproducts?

3. Can you come up with a coherent theory describing the behavior of “spinners” in the game of dominoes?

Imperative vs Functional Programming

Recently I had a quasi-private discussion about philosophy in programming where somebody asked a question about functional programming. I’d like to relay part of the discussion here since it might be of interest to the community at large.

Jason wrote:

I have come out strongly in support of OO, and I think it has a very clear and strong basis in the facts of how computers work and how programmers think. In brief: Computers function by executing a series of instructions, at the level of machine language, which perform calculations and manipulate memory and I/O. In the earliest days of computing, programmers gave instructions to the computer directly at this level. As programs grew more complex, programmers needed ways to achieve greater unit-economy.

Often a set of variables truly do go together, such as a record representing an item for sale in a database, and certain functions operate primarily on this data. [This choice is not arbitrary.] Objects formalize this pattern and allow programmers to express it clearly and directly.

In general, imperative programming is rooted in the fact that computers actually run programs by executing a series of instructions.

I’d love to understand better the advantages of functional programming are. So to its advocates, I ask: What are the facts of reality that give rise to it? Note that it is not enough to explain, for instance, why it is mathematically *possible* to represent any given thing in any given mathematical way. What I want to know is: How is functional style rooted in the *purpose* of the software and the *nature of the thinking* that programmers do?

On the validity of the FP approach

The fact of reality that give rise to the FP approach is that computation is essentially a process of inferring quantities. Its purpose is to aid in human thinking. The conceptual faculty is a quantitative mechanism. This is laid out eloquently in David Harriman’s book “The Logical Leap”. See also “Concept-formation as a mathematical process” in OPAR. So the computer is in essence a quantifying device. Its invention was motivated by man’s need to quantify.

Calculating with numbers (quantities of something) gives rise to algebra, and importantly the abstraction “function”. And functional programming is simply writing programs using this abstraction. Functional programming was being done well before there were computers.

Imperatives are nonessential to computing. A mechanical computer has levers, wheels, and the like. We can imagine programming Babbage’s inference engine by turning a few wheels to set the premises and then moving a lever to make it infer the consequent. Similarly, in a digital computer we fill some registers with the premises and then pull a virtual lever in the form of a CPU instruction. But it’s important to note that “instruction” here is a metaphor. What’s really going on is the selection of a function which computes a quantity based on the input given.

So in summary, the purpose of the computer is to calculate with quantities as functions of other quantities. It is a mathematical tool. Even (especially?) for things like games. Drawing a complex 3D scene on the screen in real-time involves calculating a particular quantity of light of each color to be used to illuminate each pixel on the screen, given a particular time.

On the utility of the FP approach

Referential transparency buys you modularity (the extent to which components of a program can be separated and recombined) and compositionality (the extent to which the whole program can be understood by understanding the components and the rules used to combine them).

Type systems also get more sophisticated to the extent that programs are pure. It’s easier to infer types for pure programs than it is for impure ones, for example. You also gain things like type constructor polymorphism (a.k.a. higher-kinded types) and higher-rank types. It’s my experience that in a sufficiently sophisticated program (such as a compiler for a programming language), you either use these abstractions or you repeat yourself.

Sophisticated type systems then in turn aid in program inference, which is the extent to which the computer (or the programmer with the aid of the computer) can infer the correct program from type information (think tab completion, only moreso). Program inference is currently an active area of research.

As a side-note a lot of OO folks are discovering the functional approach as a tool to aid in modular design. They call it “dependency injection”, “inversion of control”, “interpreter pattern” and the like.

On the way programmers think

I believe that the way programmers think is deeply influenced by their chosen tools. They learn to think in the concepts that pertain to the tool. I think ultimately the argument that OO mirrors how programmers think is an appeal to intuition. But as we know, there’s no such thing as intuition. When people say “intuitive”, they usually just mean “familiar”.

A word on OO and the arbitrary

In OO, as commonly practiced, the choice of distinguished argument is arbitrary. Consider this function:

K(x, y) = x

If we were to write this in OO style, then on which object, x or y, should the function K be dispatched? Should it be x.K(y), or y.K(x)? It’s arbitrary. And before you say that the function K itself is an arbitrary invention, I say no. It’s actually a constructor of emtpy lists in this implementation of a singly linked list data structure:

Empty(x, y) = x
Cons(a, b, x, y) = y(a, b(x, y))

On “types of languages”

I want to get clear on some concepts. First the question of “types of programming languages”. I don’t think it’s helpful to divide programming languages into “functional”, “object-oriented”, “imperative”, etc. Sure, certain languages are better suited to certain styles of programming, but it’s important to differentiate on essentials and not mere aesthetics.

Functional programming is a restriction that the programmer puts on his programs. Having a language with a compiler that will warn you if you’re breaking referential transparency is helpful, but not essential. I do functional programming in Java, for example, which most people consider an “imperative OO” language.

We can also do “OO” in Haskell, a purely functional language (in the sense that all Haskell programs are referentially transparent expressions). Haskell is also a very good imperative language. Here’s a Haskell program in imperative style:

main = do {
  putStrLn "What is your name?";
  name <- readLn;
  putStrLn ("Hello, " ++ name ++ "!");

Is this a sequence of instructions, or a referentially transparent expression? It’s both! Another way of writing the same program is:

main = putStrLn "What is your name?" >> readLn >>= (\name -> putStrLn "Hello, " ++ name ++ "!")

It’s important to note that these are not two different programs in any sense. The Haskell compiler literally translates the former into the latter.

The >> and >>= functions concatenate IO actions in exactly the same sense that the ++ function concatenates Strings. It’s possible for us to do this because e.g. calling readLn doesn’t immediately read from standard input. It is a referentially transparent expression that returns an IO action. The operating environment translates this IO action into instructions for the underlying machine at some convenient time. Or it may in fact not do that. The implementation of the IO data structure is abstracted away in a library, and there’s no reason that somebody couldn’t supply a different implementation with the same interface. This may be useful if you’re programming a missile silo and you need to be able to test the launchTheMissile action without starting a nuclear war.

Another benefit is that main can be composed with other programs. We could, for example, write this program:

mainTwice = main >> main

Or this one:

mainForever = main >> mainForever

I hope that helps.

Towards an Effect System in Scala, Part 1: ST Monad

Referentially Transparent Mutable State

In their paper “Lazy Functional State Threads”, John Launchbury and Simon Peyton-Jones present a way of securely encapsulating stateful computations that manipulate mutable objects. The result is Haskell’s ST monad. Its definition is very similar to the State data type. In Haskell, the ST monad is used to thread the manipulation of mutable state in such a way that the mutation is completely referentially transparent, because it is a type error for a mutable object to escape the monad.

I would like to present an implementation of this in Scala, which I recently committed to the Scalaz library. I was inspired to write this by Tim Carstens last summer, but never found a way of encoding the requisite rank-2 types in Scala’s type system in such a way that what should work does and what shouldn’t doesn’t. But Geoff Washburn got me going again. Following the technique on his blog, of representing universal quantifiers as doubly negated existentials, I was able to encode ST in a way that’s surprisingly nice to use, and actually does give you type errors if you try to access a naked mutable reference. And as Mark Harrah has pointed out, we end up not having to use the double negation after all. I’m surprised to find that doing this in the obvious way in Scala, just works.

OK, let’s get to the money. In Scala, we can declare the ST data type as follows:

case class World[A]()

case class ST[S, A](f: World[S] => (World[S], A)) {
  def apply(s: World[S]) = f(s)
  def flatMap[B](g: A => ST[S, B]): ST[S, B] =
    ST(s => f(s) match { case (ns, a) => g(a)(ns) })
  def map[B](g: A => B): ST[S, B] =
    ST(s => f(s) match { case (ns, a) => (ns, g(a)) })

def returnST[S, A](a: => A): ST[S, A] = ST(s => (s, a))

This is a monad in the obvious way. The flatMap method is monadic bind and returnST is monadic unit.

The World type represents some state of the world, and the ST type encapsulates a state transformer which receives the state of the world and returns a value which depends on that state together with a new state. Here, we are representing the world state by nothing at all. It turns out that for what we want to do with the ST monad, the contents of the state are not important, but its type very much is. A much more detailed explanation of how and why this works is given in the paper, but the punchline is that we are going to “transform the state” by mutating objects in place, and in spite of this the state transformer is going to be a pure function. This is achieved by guaranteeing that the type S for a given state transformer is unique. More on that in a second.

Purely Functional Mutable References

A simple object that we can mutate in place is one that holds a reference to another object through a mutable variable.

case class STRef[S, A](a: A) {
  private var value: A = a

  def read: ST[S, A] = returnST(value)
  def write(a: A): ST[S, STRef[S, A]] = ST((s: World[S]) => {value = a; (s, this)})
  def mod[B](f: A => A): ST[S, STRef[S, A]] = for {
    a <- read
    v <- write(f(a))
  } yield v

def newVar(a: => A) = returnST(STRef(a))

So we have monadic combinators to construct, read, write, and modify references. Note that the implementation of write blatantly mutates the object in place. The definition of mod shows how to compose state transformers in sequence, using monad comprehensions.

It’s important that an STRef is parameterized on a type S which represents the state thread that created it. This makes variables allocated by different state threads have incompatible types. Therefore, state threads cannot ever see each other’s mutable variables. Because state transformers can only be composed sequentially (with flatMap), it’s guaranteed that two of them can never simultaneously mutate the same STRef.

Running a State Transformer as a Pure Function

Note that the type of a reference to a value of type A in a state thread S is ST[S, STRef[S, A]]. If ST had a run function of type ST[S, A] => A, we would be able to get the reference out. But this type is more general than we want. What we want is for the compiler to reject code like newVar(10).run, which would give you access to the naked STRef, but to accept code like newVar(10).flatMap(_.mod(x => x + 1).flatMap(read)).run, which simply accesses an integer.

In Haskell, the type of runST is:

runST :: forall a. (forall s. ST s a) -> a.

This is a rank-2 type which Scala’s type system does not directly support.

To see why this type would prevent the leaking of a mutable reference, consider the type you would need in order to get an STRef out of the ST monad.

forall a. (forall s. ST s (STRef s a)) -> STRef ??? a

What type should go in place of the three question marks? There is no type that could possibly fit the bill because the type s is bound (introduced) by the universal quantifier to the left of the arrow. It’s a local type variable in the domain of the function, so it can’t escape to the codomain. This is why ST state transformers are referentially transparent.

Of course, if you get the value out of a reference, then you can run that just fine. In Scala terms, you can always go from ST[S, A] to A, but you can never go from ST[S, F[S]] to F[S] for any F[_].

Writing runST in Scala

So the problem becomes how to represent a rank-2 polymorphic type in Scala. I’ve shown before how we can represent a rank-2 function type by encoding it as a natural transformation. And Mark has posted on how to write natural transformations using universally quantified values. (And I just now realized that he’s using functional state threads for non-observable mutation!)

First, we need a representation of universally quantified values:

trait Forall[P[_]] {
  def apply[A]: P[A]

Now that we have rank-2 polymorphism, the implementation of runST is straightforward:

  def runST[A](f: Forall[({type λ[S] = ST[S, A]})#λ]): A =

I’m using the “type lambda” trick here to declare the type constructor inline. The realWorld object is just a dummy value.

Some Examples

Here’s a simple example of a computation that creates a mutable reference and mutates it:

def e1[S]: ST[S, STRef[S, Int]] = for {
  r <- newVar[S, Int](0)
  x <- r.mod(_ + 1)
} yield x

And this expression creates a reference, mutates it, and then reads the value out:

def e2[A] = e1[A].flatMap(_.read)

Running the latter expression is fine, since it just returns an Int:

runST(new Forall[A] { def apply[A] = e2 })

But running the former fails at compile-time because it exposes a mutable reference. Or rather, because when the compiler tries to unify with our existential type, it’s out of scope:

runST(new Forall[({type λ[S] = ST[S, STRef[S, Int]]})#λ] { def apply[A] = e1 })

found   : scalaz.Forall[[S(in type λ)]scalaz.ST[S(in type λ),scalaz.STRef[S(in type λ),Int]]]
required: scalaz.Forall[[S(in type λ)]scalaz.ST[S(in type λ),scalaz.STRef[_ >: (some other)S(in type λ) with (some other)S(in type λ), Int]]]

What are the practical implications of this kind of compile-time checking? I will just quote Peyton-Jones and Launchbury:

It is possible to encapsulate stateful computations so that they appear to the rest of the program as pure (stateless) functions which are guaranteed by the type system to have no interactions whatever with other computations, whether stateful or otherwise (except via the values of arguments and results, of course).

Complete safety is maintained by this encapsulation. A program may contain an arbitrary number of stateful sub-computations, each simultaneously active, without concern that a mutable object from one might be mutated by another.

This can be taken much further than these simple examples. In Scalaz, we have STArrays, which are purely functional mutable arrays. There’s an example of a pure binsort which uses a mutable array for sorting.

This technique can be extrapolated to implement Monadic Regions (currently underway for Scalaz), which allows compile-time tracking of not just mutable arrays and references, but file handles, database connections, and any other resource we care to track.

What we have here then is essentially the beginnings of an effect system for Scala. This allows us to compose programs from referentially transparent components which are internally implemented with mutation and effects, while those effects are guaranteed by the type system to be transparent to the rest of the program.

Simple SKI Combinator Calculus in Scala’s Type System

I would like to make two claims:

  1. The Scala type system is turing complete.
  2. This is not a bug.

The SKI combinator calculus is known to be turing equivalent:

  trait λ { type ap[_<:λ]<:λ }
  type I = λ{type ap[X<:λ] = X }
  type K = λ{type ap[X<:λ] = λ{type ap[Y<:λ] = X }}
  type S = λ{type ap[X<:λ] = λ{type ap[Y<:λ] = λ{type ap[Z<:λ] = X#ap[Z]#ap[Y#ap[Z]] }}}

And we can blow up the compiler with unbounded recursion, without resorting to circular definitions:

type Y = S#ap[I]#ap[I]#ap[S#ap[I]#ap[I]]

The latter should crash the compiler with a stack overflow error.

Hat tip: Edward Kmett

Functional Programming for Beginners

Video of a talk given at Boston Area Scala Enthusiasts’ Jan 2011 meeting at Google Cambridge. Covers what FP actually means in unambiguous terms, and how to perform various programming tasks while staying pure.

Slides are here:


Code for the IO data structure mentioned but not shown at the end of the talk is given here:

  sealed trait IO[A] 
  case object ReadLn extends IO[String]
  case class WriteLn(s: String) extends IO[Unit]
  case class Const[A](a: A) extends IO[A]
  case class Compose[A, B](a: IO[A], f: A => IO[B]) extends IO[B]

  // An example program which reads a line and writes it out again
  def readThenWrite: IO[Unit] =
    Compose(ReadLn, WriteLn)

  // An implementation for IO programs which turns them into side-effects
  def run[A](program: IO[A]): A = program match {
    case ReadLn => readLine
    case WriteLn(s) => println(s)
    case Const(a) => a
    case Compose(a, f) => run(f(run(a)))

  // An implementation for IO programs which turns them into list transitions
  def run[A](program: IO[A], in: List[String], out: List[String]): (A, List[String], List[String]) =
    program match {
      case ReadLn => (in.head, in.tail, out)
      case WriteLn(x)  => ((), in, x :: out)
      case Const(a) => (a, in, out)
      case Compose(ioa, f) => {
        val (a, tmpi, tmpo) = run(ioa, in, out)
        run(f(a), tmpi, tmpo)

Type-Level Programming in Scala, Part 8c: KList ZipWith

One use of KList and the ‘transform’ and ‘down’ methods from 8b is to implement methods like ‘zipWith’ for arbitrary tuple lengths. To start with, the signature of zipWith for a pair of Streams, operating on a fixed arity of 2, looks like:

def zipWith2[A,B,C](t2: (Stream[A], Stream[B]))(f: (A,B) => C): Stream[C] =
   t2 match
      case (ha #:: ta, hb #:: tb) => Stream.cons( f(ha, hb), zipWith2( (ta, tb) )(f) )
      case _ => Stream.empty

Example usage:

val nats = Stream.from(1)
val random = Stream.continually( math.random )
val seq = zipWith2( (nats, random) ) { (n, r) => if(r > 0.3) n else -n }

scala> seq.take(10).toList
res0: List[Int] = List(-1, 2, 3, 4, 5, 6, -7, -8, 9, 10)

For the implementation of zipWith2, if either Stream in the pair is empty, the resulting Stream is empty. Otherwise, there is a head element for each stream in the pair. We apply the provided function to these elements and make the result the head of a new Stream. The tail of this new Stream will be the result of recursing on the tails of the input pair.

To generalize this to arbitrary arity, we will operate on a KList of Streams. Because we want to abstract over arity, we use a heterogeneous list. We use KList instead of HList because we want to constrain each element in the tuple to be a Stream and we don’t care what the specific types of Streams the elements are, but we do want to preserve those types. When we take the head element of each Stream, the resulting list is the underlying HList type of the input KList. For example, given an input of type KList[Stream, A :: B :: HNil], when we take the head of each Stream in the KList we will get an HList of type A :: B :: HNil. This is like going from (Stream[A], Stream[B]) to (A,B).

So, if we end up with the underlying HList type, the function we will apply to the input KList must be a function from that HList type to some other type. In the example above, the function type would be A :: B :: HNil => T for some type T, which will be the type of the output Stream. With this, we have our signature for a generalized zipWith:

def zipWith[HL <: HList, T](kl: KList[Stream, HL])(f: HL => T): Stream[T]

To implement this function, we again break the problem into two parts. If any Stream is empty, the resulting stream is empty. Otherwise, we get all the head elements of the Streams as an HList, apply the input function to it, and make this the new head. For the tail, we get all of the tails of the Streams and recurse. To get the head elements, we use ‘down’ because we want KList[Stream, HL] => HL. For the tails, we use 'transform' because we need a mapping KList[Stream, HL] => KList[Stream, HL]. The implementation looks like:

def zipWith[HL <: HList, T](kl: KList[Stream, HL])(f: HL => T): Stream[T] =
      Stream.cons( f( kl down heads ), zipWith(kl transform tails )( f ) )

def anyEmpty(kl: KList[Stream, _]): Boolean = kl.toList.exists(_.isEmpty)

val heads = new (Stream ~> Id) { def apply[T](s: Stream[T]): T = s.head }
val tails = new (Stream ~> Stream) { def apply[T](s: Stream[T]): Stream[T] = s.tail }

The toList function on KList has type KList[M, HL] => List[M[_]] and has a trivial implementation. Since List is homogeneous, we can’t preserve each individual cell’s type, but we can at least use the common type constructor. In ‘zipWith’, this means we can call the ‘isEmpty’ method on the elements of the list but we would not get a very specific type if we called ‘head’, for example. ‘heads’ and ‘tails’ are natural transformations that map a Stream[T] to its head of type T and its tail of type Stream[T], respectively.

The original example translated to use the generalized zipWith looks like:

val nats = Stream.from(1)
val random = Stream.continually( math.random )
val seq = zipWith( nats :^: random :^: KNil ) {
   case n :: r :: HNil => if(r > 0.3) n else -n

scala> seq.take(10).toList
res0: List[Int] = List(-1, 2, -3, -4, -5, 6, 7, 8, 9, 10)

We can implement the related ‘zipped’ function in terms of ‘zipWith’.

def zipped[HL <: HList](kl: KList[Stream, HL]): Stream[HL] =
   zipWith(kl)(x => x)

Or, we could have implemented zipWith in terms of zipped. In any case, we can implement several other functions using zipped:

def foreach[HL <: HList, T](kl: KList[Stream, HL])(f: HL => T): Unit =

def collect[HL <: HList, T](kl: KList[Stream, HL])(f: HL => Option[T]): Stream[T] =

def flatMap[HL <: HList, T](kl: KList[Stream, HL])(f: HL => Stream[T]): Stream[T] =

def forall[HL <: HList](kl: KList[Stream, HL])(f: HL => Boolean): Boolean =
def exists[HL <: HList](kl: KList[Stream, HL])(f: HL => Boolean): Boolean =

An example using ‘foreach’:

val a = Stream(1,2,5,3,9,10,101)
val b = Stream("one", "two", "three", "four")
val c = Stream(true, false, false, true, true)

zipped(a :^: b :^: c :^: KNil) foreach {
   case n :: s :: b :: HNil =>
      println( s * (if(b) 1 else n) )