Heterogeneous Lists and the Limits of the Java Type System

Today, we’re going on a journey. It is a sojourn to the outer limits of the expressiveness of the Java type system, and to the edge of what can be considered sane programming. This is definitely one for the power users. You will need a firm grasp of the Java language, and an iron constitution for type annotations. But the reward will be something far greater than any treasure: understanding, entertainment, and perhaps even enlightenment. Remember that we choose to do these things in Java, not because they are easy, but because they are hard. Now then, to the ships.

A Most Versatile Vessel

In Java, we can create a list that contains values of type A by constructing a value of type List<A>. The type system will enforce that each element in the list is in fact of type A. But sometimes we want lists of values that aren’t necessarily of the same type. Normally, for such a purpose, we would use a heterogeneous list, which in Java is just List<?> or List<Object>. Since every class in Java is a subclass of Object (and now that we have autoboxing), such a list can contain any Java value. There are many situations where this is necessary. For example, a row of database results will comprise values that are not all of the same type.

However, there’s a problem with the raw list approach. In using the List<?> type, we are dispensing with the type system. When you get a value from the list, how do you know what it is? How do you know which operations it supports? Well, you will have to defer that discovery until runtime, and use explicit type casting. Most will shrug at this and say: “So what?” After all, this is what we did anyway, before generics. Ah, but what if we don’t have to? Can we create generic heterogeneous collections that are type-safe? Yes, we can. Sort of.

Products of Types

What we would like to see is if it’s possible to declare some constraints on the types of a heterogeneous collection, to achieve essential type-safety while maintaining the extensibility of a list. Of course, it’s easy to create types that are the product of two or more types:

public abstract class P2<A, B> {
  public abstract A _1();
  public abstract B _2();
}

But the length of this kind of product is as fixed as the length of a string in Pascal. It isn’t extensible, so it’s more like a type-safe heterogeneous array than a list. If you want products of different lengths, you will need to declare separate classes for P3<A, B, C>, P4<A, B, C, D>, etc. What we’re trying to achieve is a product of arbitrary length, whose length might even vary at runtime. There’s no reason we couldn’t create products of products in a chain, like P2<A, P2<B, P2<C, D>>>, and this is more or less the approach that we will take.
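
To make the chaining idea concrete, here is a minimal sketch. The p2 factory method is hypothetical, added only so that we can construct values of the P2 class above:

// A hypothetical factory method for the P2 class above.
public static <A, B> P2<A, B> p2(final A a, final B b) {
  return new P2<A, B>() {
    public A _1() { return a; }
    public B _2() { return b; }
  };
}

// Three differently typed values, encoded as a chain of nested pairs.
P2<String, P2<Integer, Boolean>> x = p2("One", p2(2, false));
String s  = x._1();       // statically known to be a String
Boolean b = x._2()._2();  // statically known to be a Boolean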

Introducing HList

To achieve our goal, we’re going to implement linked lists in the type system. Let’s remind ourselves what a linked list looks like. A List<T> is essentially either the empty list or a value of type T paired with a List<T>. In Java, using the List<A> type from Functional Java, an unsafe heterogeneous list might be constructed in a manner like the following:

List<?> x = cons("One", cons(2, cons(false, nil())));

The cons method constructs a list, and the nil method returns the empty list. With just these two methods, we can create any list. A list has two methods to access its members: head(), which returns the first element, and tail(), which returns the rest of the list. Getting the head or tail of the empty list is an error at runtime.
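
As a quick illustration of that failure mode (assuming the same static imports as above):

List<Object> empty = nil();
Object h = empty.head(); // compiles fine, but fails at runtime: the empty list has no head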

Let’s now take a step up into the type system, and say that a list of types is either the empty list or a type paired with a list of types. This gives rise to our heterogeneous list type:

public abstract class HList<A extends HList<A>> {
 private HList() {}

 private static final HNil nil = new HNil();

 public static HNil nil() {
   return nil;
 }

 public static <E, L extends HList<L>> HCons<E, L> cons(final E e, final L l) {
   return new HCons<E, L>(e, l);
 }

 public static final class HNil extends HList<HNil> {
   private HNil() {}
 }

 public static final class HCons<E, L extends HList<L>> extends HList<HCons<E, L>> {
   private E e;
   private L l;

   private HCons(final E e, final L l) {
     this.e = e;
     this.l = l;
   }

   public E head() {
     return e;
   }

   public L tail() {
     return l;
   }
 }
}

That’s not a lot of code, and it’s all relatively straightforward Java. The HList class is parameterised with a parameterised subclass of itself. There are only two concrete subclasses of HList that can possibly occupy that slot: the type HNil and the type constructor HCons. These represent the empty list and the list constructor, respectively. HCons takes two type parameters, the first representing the first element of the list, and the second being another HList, allowing us to form a chain of them. HNil does not take type parameters, so it terminates the chain.

As with regular old lists, you can access the head() and tail() of the list. Note, however, that the fact that you cannot get the head or tail of the empty list is now enforced by the type system. There’s a nil method to get the empty list, and a cons method to construct a nonempty list, just like with regular lists.
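
Here is a small sketch of what that enforcement looks like, using nothing but the HList code above:

HCons<String, HNil> y = cons("just one", nil());

String h = y.head(); // the type system knows the head is a String
HNil t = y.tail();   // and that the tail is the empty list
// t.head();         // does not compile: HNil has no head() or tail() method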

Here’s an example of how we would construct a heterogeneous list using this new type:

HCons<String, HCons<Integer, HCons<Boolean, HNil>>> x = cons("One", cons(2, cons(false, nil())));

This is more verbose than the unsafe version before, but not by much. Obviously, the HList example assumes a static import of HList.cons and the List<?> example assumes a static import of List.cons. Using the type-safe version is, however, much nicer. Compare these two contrived examples:

if (x.tail().tail().head()) {
  return x.head().length() == x.tail().head();
}

if ((Boolean) x.index(2)) {
  return ((String) x.head()).length() == (Integer) x.index(1);
}

The latter, of course, offers no static guarantees and may throw ClassCastExceptions, or we might inadvertently get the head or tail of the empty list at runtime. The former will always work as long as it compiles, guaranteed.

Concatenating HLists

Now let’s do something more interesting with these lists. Notice that the cons methods for both type-safe and unsafe lists prepend an element to a list rather than appending. Sometimes we want to append a list to the end of another. This is unsurprisingly uncomplicated for unsafe lists:

List<?> c = a.append(b);

Behind the scenes, we can think of append as reversing the first list and consing each element to the second list in reverse order. Doing that for HList is a little more involved. We have to construct a chain of types in exactly the right way, at compile-time.

Appending an HList to another is a function that takes two HList-valued arguments and returns an HList. Using first-class functions from Functional Java, the append operation for HLists of specific types L and R would be a function of the following type:

F2<L extends HList<L>, R extends HList<R>, LR extends HList<LR>>

Where LR is the type of the concatenated HList. Now, since we necessarily have the two arguments, we know the specific types of L and R. Since Java doesn’t have type inference, it cannot automatically figure out the specific type of LR. We will have to supply it as a type annotation. Not to worry. Even though Java doesn’t infer types, it can be coerced into doing some type arithmetic. All we have to do is a little inductive reasoning.

Types as Formulae

According to the Curry-Howard isomorphism, a program is a proof, and the hypothesis that it proves is a type for the program. In this sense, Java’s type system is a kind of crude theorem prover. Put another way, a type is a predicate, and values of that type represent the terms for which the predicate holds. The function type above therefore asserts that for any two HLists, L and R, there exists some program to derive the HList LR. The function type by itself does not put any constraints on LR, however. It can be derived by any function, not just the concatenation function. We will remedy that presently. We need a formula that states that the two types L and R imply a third type LR which is the HList concatenation of L and R, given some concatenation function. Here is the type that represents that formula:

public static final class HAppend<L, R, LR> {
    private final F2<L, R, LR> append;

    private HAppend(final F2<L, R, LR> f) {
      append = f;
    }

    public LR append(final L l, final R r) {
      return append.f(l, r);
    }
}

At this point, HAppend is still just a hypothesis without evidence. Remember that a value of a type is proof of the formula that it represents. So we will need to supply two proofs in the form of constructors for values of this type; one for the base case of appending to the empty list HNil, and another for the case of appending to an HCons. The base case is easy. Appending anything to the empty list should result in that same thing. So the HAppend constructor for appending to the empty list looks like this:

    public static <L extends HList<L>> HAppend<HNil, L, L> append() {
      return new HAppend<HNil, L, L>(new F2<HNil, L, L>() {
        public L f(final HNil hNil, final L l) {
          return l;
        }
      });
    }

The case for the nonempty list is not quite as easy. Consider its type:

    public static <X, A extends HList<A>, B, C extends HList<C>, H extends HAppend<A, B, C>>
                         HAppend<HCons<X, A>, B, HCons<X, C>> append(final H h)

Read the return type first. This returns an HAppend that appends some B to an HCons<X, A>. The type of the head of the first list (X) becomes the type of the head of the concatenated list. The tail of the concatenated list is C. The type constraints state that C must be an HList, and that there must exist some way to append B (the second list) to A (the tail of the first list) so that they make C. We must supply proof that this last constraint holds, and you’ll see that such a proof is in fact supplied as an argument (in the form of the value h).
What this is saying is that, given the premise that A and B can be concatenated, the concatenation of HCons<X, A> and B can be inferred. A value of type HAppend<A, B, C> is precisely proof of the hypothesis that A and B can be concatenated, since there are only these two cases and we’ve supplied a proof for both. In other words, if we can append to the empty list, then that’s proof enough that we can append to a list of one element, which proves that we can append to a list of two elements, and so on. Given this, we can construct a chain of proofs. This concatenated proof, then, is a function that concatenates lists of the corresponding types.
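
For completeness, here is one way to write that induction step, following the same pattern as the base case: cons the head of the first list onto the result of the smaller append, which is obtained from the supplied proof h. (This sketch assumes it lives inside HAppend, next to the private constructor shown above.)

    public static <X, A extends HList<A>, B, C extends HList<C>, H extends HAppend<A, B, C>>
                         HAppend<HCons<X, A>, B, HCons<X, C>> append(final H h) {
      return new HAppend<HCons<X, A>, B, HCons<X, C>>(new F2<HCons<X, A>, B, HCons<X, C>>() {
        public HCons<X, C> f(final HCons<X, A> c, final B l) {
          // Cons the head of the first list onto the concatenation of its tail with the second list.
          return cons(c.head(), h.append(c.tail(), l));
        }
      });
    }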

OK, so how do we use this? Well, here’s an example program that appends one list to another:

public class HList_append {
  public static void main(final String[] args) {
    // The two lists
    final HCons<String, HCons<Integer, HCons<Boolean, HNil>>> a =
      cons("Foo", cons(3, cons(true, nil())));
    final HCons<Double, HCons<String, HCons<Integer[], HNil>>> b =
      cons(4.0, cons("Bar", cons(new Integer[]{1, 2}, nil())));

    // A lot of type annotation
    final HAppend<HNil, HCons<Double, HCons<String, HCons<Integer[], HNil>>>,
      HCons<Double, HCons<String, HCons<Integer[], HNil>>>> zero = append();
    final HAppend<HCons<Boolean, HNil>, HCons<Double, HCons<String, HCons<Integer[], HNil>>>,
      HCons<Boolean, HCons<Double, HCons<String, HCons<Integer[], HNil>>>>> one = append(zero);
    final HAppend<HCons<Integer, HCons<Boolean, HNil>>, HCons<Double, HCons<String, HCons<Integer[], HNil>>>,
      HCons<Integer, HCons<Boolean, HCons<Double, HCons<String, HCons<Integer[], HNil>>>>>> two = append(one);
    final HAppend<HCons<String, HCons<Integer, HCons<Boolean, HNil>>>,
      HCons<Double, HCons<String, HCons<Integer[], HNil>>>,
      HCons<String, HCons<Integer, HCons<Boolean, HCons<Double, HCons<String, HCons<Integer[], HNil>>>>>>>
      three = append(two);

    // And all of that lets us append one list to the other.
    final HCons<String, HCons<Integer, HCons<Boolean, HCons<Double, HCons<String, HCons<Integer[], HNil>>>>>>
      x = three.append(a, b);

    // And we can access the components of the concatenated list in a type-safe manner
    if (x.tail().tail().head())
      System.out.println(x.tail().tail().tail().tail().tail().head()[1] * 2); // 4
  }
}

Holy pointy brackets, Batman! Do we really need all of that? Well, look at what it’s doing. It’s constructing a concatenation function of the appropriate type, by supplying the premise at each step. If this seems mechanical, then that’s because it is. There is only one possible implementation for the HAppend we need, but Java does not have any mechanism for figuring this out, nor does it provide a facility for the programmer to tell it how.
Contrast that to Scala. The above is a clear example of where Scala’s implicit arguments come in handy. If we port this to Scala, we can make both of the append functions implicit, and we can further make the H argument to the append function for nonempty lists implicit as well. There can be only one possible implementation of each function, so it can be declared once and used implicitly wherever proofs of the corresponding types are required. Jesper Nordenberg has implemented an HList library for Scala that demonstrates this well. With implicits and Scala, the whole middle section of our program is condensed from 12 lines of type annotations to just:

 val x = a.append(b)

Now, if you’re really into this Java stuff, you’re probably thinking: “implicits are just dependency injection”. Well, in a sense, you would be right. Both dependency injection and inheritance are degenerate forms of implicits. However, there is currently no dependency injection framework for Java that can abstract over type constructors such that it provides injection of parameterised types with injection of type parameters also. If you can prove me wrong, by all means send me evidence in the form of working code.

Conclusion

Clearly, Java is not very useful for this kind of type-safe programming. I was actually quite surprised that you can do this in Java at all, but we’ve definitely hit the outer boundary of what can be considered reasonably expressible.

The code you’ve seen in this article uses the new HList package that was released with Functional Java 2.16, and is based on the Haskell HList library by Oleg Kiselyov.

Higher-Order Java Parallelism, Part 3: Threadless Concurrency With Actors

Multi-core processing is changing the way we write programs. Modern computers come with multi-core processors, or several processors, and modern applications no longer do just one thing at a time. But all of this has left Java a little bit behind. While languages like Erlang and Haskell, and even other JVM languages like Scala, have powerful concurrency abstractions, concurrent programming in Java is still mostly based around the threading practices of way back in the time of applets.

Threading in Java has long been the domain of the brave or foolish, weaving a brittle and intricate web of manual synchronization between threads, or threads waiting on other threads to feed them work. We all know the examples from old Java books where there might be a thread running in a loop, sleeping for a second, and waking up to check if work needs to be done. This kind of implementation is still being taught to newcomers to the Java language. And when they start writing their own applications in Java, they find that this approach does not scale. Not along the axis of program complexity, nor the axis of the number of threads. In a complex application or one with many threads, you may end up with a program that stops doing anything at all for long periods of time, or worse, hangs forever, while still consuming all the operating system resources that go with those threads.

Blocking vs. Nonblocking Concurrency

The new concurrency library in Java 5 helps this situation a great deal. At least some advancement has been made towards better concurrency abstractions. But there are still pitfalls. For instance, as we’ve seen, the Future and Callable interfaces are really rather jolly useful, and they can take us a long way towards the kinds of things possible in those other languages. But at the end of the day, something has to call Future.get(), and that something will block, if the value isn’t already available, until such time as it is. This can result in deadlocks or starvation, as you may end up in a situation where all threads are blocking on Futures and none are available to advance your program forward. This would be bad. In fact, we could say that in a highly concurrent system, blocking on shared state is a catastrophic event.
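
To make the problem concrete, here is a small self-contained sketch using nothing but the standard java.util.concurrent API (the computation itself is just a stand-in):

  import java.util.concurrent.Callable;
  import java.util.concurrent.ExecutorService;
  import java.util.concurrent.Executors;
  import java.util.concurrent.Future;

  public class BlockingExample
    {public static void main(final String[] args) throws Exception
      {final ExecutorService pool = Executors.newFixedThreadPool(2);

       final Future<Integer> f = pool.submit(new Callable<Integer>()
         {public Integer call()
           {return 6 * 7;}}); // a stand-in for some long-running computation

       // The crux of the problem: get() blocks this thread until the value is ready.
       // If every thread in a pool is waiting on some Future like this, no thread
       // is left to actually produce the values, and the program stalls.
       System.out.println(f.get());

       pool.shutdown();}}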

The Java libraries are a veritable minefield of methods that, ultimately, block on shared memory monitors. There’s Thread.join(), Future.get(), BlockingQueue.take(), and the list goes on. But what if we could solve this problem with a simple abstraction? Could we solve the blocking problem once and re-use our solution for different situations? Let’s see.

Instead of sleeping or blocking, what we really want is the ability for our code to say: “When X happens, wake up a thread to execute me.” That is, in some sense we want threads to go on doing useful work, and have them be notified when an event happens that requires more work to be done. For example, when Bob comes into my office and asks me for this month’s TPS reports, he doesn’t stand around and wait for them, nor does he check periodically to see if I have them done, and he certainly doesn’t go to sleep to periodically wake up and poll me for them. He will continue with his work (or whatever it is he does all day), since I have instructions to send him those TPS reports as soon as they’re ready. We’re independent and concurrent actors, so we don’t wait on each other.

There is a model of computation, called the Actor Model, that works very much the same way. This is the model employed in the design of Erlang, and it was a major influence on the early development of Scheme. An implementation of Actors comes with the standard Scala library, and as it happens an implementation of it for Java has just been released as part of the 2.8 version of the Functional Java library. You can read more about the actor model on your own, but I will explain in some detail how it works in Functional Java.

Algorithm + Strategy = Parallelism

The first thing to explain is Parallel Strategies. I’ve talked about them before, but I’ll do so again here since they’re important to what I’m trying to demonstrate. The idea of the Parallel Strategy is that we can capture a threading pattern, or some method of concurrency in general, in a separate abstraction called a Strategy, and write our concurrent programs independently of the particular threading pattern.

In Java terms, a Strategy<A> is a way of turning any Java expression, whose evaluated type is A, into a value of type A. How this happens is implementation-specific, but we’d like it to be concurrent. First, we need a way of capturing the idea of an unevaluated A. It’s a bit like the idea behind Runnable, only that we can get a value out of it. Java 5 has the Callable interface, which is really close to representing “any expression”, since we can create Callables anonymously, but its call() method throws Exceptions, creating a kind of barrier for the programmer. Instead, we’ll use fj.P1<A>. It’s an abstract class with one abstract method: A _1()

The Strategy class has an instance method called par(P1<A>). This will evaluate the given P1 concurrently, immediately returning another P1 that can get the value once it’s ready. Behind the scenes, it actually creates a Callable, turns it into a Future, then returns a P1 that calls Future.get(), but all of that is implementation detail. Strategy comes with some static construction methods that implement basic threading patterns, and you can implement your own by passing a Future-valued function to the strategy(F<P1<A>, Future<A>>) method.
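
As a minimal sketch of what this looks like at the call site (simpleThreadStrategy is the simple factory also used in the actor example below, and the loop is just a stand-in for an expensive computation):

  final Strategy<Integer> s = Strategy.simpleThreadStrategy();

  // par() hands the unevaluated expression to the Strategy and returns another P1 immediately.
  final P1<Integer> answer = s.par(new P1<Integer>()
    {public Integer _1()
      {int acc = 0; // a stand-in for some expensive computation
       for (int i = 0; i < 100000000; i++)
           acc += i % 7;
       return acc;}});

  // ... go and do other useful work here ...

  // Behind the scenes this calls Future.get(), so it blocks only if the value isn't ready yet.
  final int result = answer._1();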

You might be concerned that P1 can’t throw any checked exceptions. This is by design. You’ll find that once you start working with concurrent effects, the value of checked exceptions goes out the window. Besides, if our P1 did need to throw errors, we would use Lazy Error Handling and declare this in the P1's type.

Strategy + Effect = Actor

So now we have concurrent evaluation and parallel transformations. But we’re still ending up with objects that require us to call a blocking method to get their value. The only way around that is to not return any values at all. Instead, we’re going to let our P1s have side-effects. So instead of returning a P1 that lets us wait for the value, our P1s are now going to update a variable or send the value somewhere.

To model a P1 that doesn’t return a useful value, I’m going to use the Unit type. It’s another one of those trivial but useful classes. The Unit type is part of Functional Java, and it’s simply a type that only has one possible value. It’s a lot like the standard library’s Void type, except Void is even further impoverished. When you use this type, you’re saying that the actual value is immaterial. So to describe a P1 that doesn’t return anything useful, we use the type P1<Unit>. We can also describe a transformation from some type T to Unit, with F<T, Unit>.

Functional Java comes with a more honest version of F<A, Unit>, which is Effect<A>. It’s more honest in the sense that it doesn’t pretend to be a transformation, which properly should have no side-effects. Effect<A> is explicit about having side-effects, which is what we intend for Actors. It has one method that doesn’t return a value at all: void e(A a).

We now have what we need to instantiate the Actor class. Here’s an example of how to create and use a simple actor:


  Strategy<Unit> s = Strategy.simpleThreadStrategy();

  Actor<String> a = Actor.actor(s, new Effect<String>()
    {public void e(String s)
      {System.out.println(s);}});

  a.act("Hello, actors!");


The actor receives “messages” on its act method, and the Strategy serves as a kind of mailbox for it. You will note that there’s no dependency at all on any particular threading strategy or any part of the concurrency library by the Actor class. The strategy could be sending our Effects to be evaluated by Threads, remote server farms, by ForkJoin tasks, or even by a Mechanical Turk.

An Actor With its Own Mailbox

The Actor class by itself doesn’t yet quite achieve the kind of actors model you see implemented in Erlang or Scala. The important difference is that the Actor above can process multiple “messages” simultaneously. This solution, then, is more general, although if an actor such as the above mutates some state, we’re likely to run into race conditions. Not to worry, it’s easy to construct the more specific case. We just add a queue and ensure that only one message is processed at a time. This construct is available as part of Functional Java, and it’s called QueueActor.

Of course, the “one thread at a time” requirement is not implemented using any blocking or synchronization. Instead, the QueueActor has two possible states, “suspended” or “running”, and behind the scenes this is enforced with an AtomicBoolean to keep it consistent in the face of concurrency. If the actor is suspended when it receives a message, it becomes running and its Effect is immediately handed off to its Strategy. If it’s already running, then callers will leave a message on its queue. The QueueActor's Effect is a concurrent, threadless recursion (i.e. it uses a Strategy rather than a Thread) that completely empties the queue, then puts the QueueActor's state back to “suspended”.

QueueActor puts some sanity into managing locally mutable state within an actor’s Effect, since it’s ensured that the state can only be mutated by one thread at a time. It is guaranteed to act on its messages in some unspecified order, but is otherwise semantically equivalent to Actor.
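
Here is a small sketch of a QueueActor that safely accumulates local state; it relies only on the queueActor and asActor methods that also appear in the example program below:

  final Strategy<Unit> s = Strategy.simpleThreadStrategy();

  // The running total is safe to mutate, because QueueActor guarantees that
  // its Effect handles only one message at a time.
  final Actor<Integer> counter = QueueActor.queueActor(s, new Effect<Integer>()
    {private int total = 0;
     public void e(final Integer i)
       {total += i;
        System.out.println("Running total: " + total);}}).asActor();

  counter.act(1);
  counter.act(2);
  counter.act(3); // messages may be handled in any order, but never two at once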

The Obligatory Example

We now have a light-weight implementation of the actor model, in Java. Don’t believe it? Have a look at this implementation of the canonical Ping-Pong example (imports omitted), and compare it to similar examples for Erlang and Scala.

First, a Ping actor that sends a number of pings to a given Pong actor, waiting for a pong reply each time before sending the next ping.


  public class Ping
    {private final Pong pong;
     private final Actor<Pong> ping;
     private final Actor<Integer> cb;
     private volatile int n;

     public Ping(final Strategy<Unit> s, final int i, final Pong pong, final int id, final Actor<Integer> callback)
       {n = i;
        this.pong = pong;
        cb = callback;
        ping = actor
          (s, new Effect<Pong>() {public void e(final Pong pong)
            {n--;
             if (n > 0)
                pong.act(Ping.this);
             else // Done. Notify caller.
                cb.act(id);}});}

    // Commence pinging
    public P1<Unit> start()
      {return pong.act(this);}

    // Receive a pong
    public P1<Unit> act(final Pong p)
      {return ping.act(p);}}

The Pong actor simply receives ping messages and responds.


  public class Pong
    {private final Actor<Ping> p;

    public Pong(final Strategy<Unit> s)
      {p = actor(s, new Effect<Ping>()
        {public void e(final Ping m)
          {m.act(Pong.this);}});}

    // Receive a ping
    public P1<Unit> act(final Ping ping)
      {return p.act(ping);}}

And here’s the main program that uses the Ping and Pong actors. There’s only one Pong actor that responds to any number of Ping actors pinging it concurrently. There’s also a further QueueActor that is contacted by each Ping actor once that Ping actor has done its work. The example uses a thread pool to back its Strategy. When all the Ping actors have sent all their pings and received all their pongs, the program is terminated by shutting down the thread pool.


  public class PingPong
    {private final int actors;
     private final int pings;
     private final Strategy<Unit> s;
     private final Actor<Integer> callback;
     private volatile int done;

     public PingPong(final ExecutorService pool, final int actors, final int pings)
       {this.actors = actors;
        this.pings = pings;
        s = Strategy.executorStrategy(pool);

        // This actor gives feedback to the user that work is being done
        // and also terminates the program when all work has been completed.
        callback = QueueActor.queueActor
          (s, new Effect<Integer>() {public void e(final Integer i)
            {done++;
             if (done >= actors)
               {System.out.println("All done.");
                pool.shutdown();}
             else if (actors < 10 || done % (actors / 10) == 0)
                     System.out.println(MessageFormat.format
                       ("{0} actors done ({1} total pongs).", done, pings * done));}})
          .asActor();}

    public static void main(final String[] args)
      {final int actors  = Integer.parseInt(args[0]);
       final int pings   = Integer.parseInt(args[1]);
       final int threads = Integer.parseInt(args[2]);
       new PingPong(Executors.newFixedThreadPool(threads), actors, pings).start();}

    public void start()
      {// We will use one Pong actor...
       final Pong pong = new Pong(s);
       // ...and an awful lot of Ping actors.
       for (int i = 1; i <= actors; i++)
           {new Ping(s, pings, pong, i, callback).start();
            if (actors < 10 || i % (actors / 10) == 0)
               System.out.println(MessageFormat.format("{0} actors started.", i));}}}

What follows is an example run of this Java program, with a million concurrent Ping actors pinging 7 times each. Each actor takes about 300 bytes of memory, so we need a sizable heap for one million of them, but 19 real Java Threads handle this quite nicely on my 8-core machine.

$ time java -Xmx600m -cp ../../../.build/classes/src:. concurrent.PingPong 1000000 7 19
100,000 actors started.
200,000 actors started.
300,000 actors started.
400,000 actors started.
500,000 actors started.
600,000 actors started.
700,000 actors started.
800,000 actors started.
900,000 actors started.
1,000,000 actors started.
100,000 actors done (700,000 total pongs).
200,000 actors done (1,400,000 total pongs).
300,000 actors done (2,100,000 total pongs).
400,000 actors done (2,800,000 total pongs).
500,000 actors done (3,500,000 total pongs).
600,000 actors done (4,200,000 total pongs).
700,000 actors done (4,900,000 total pongs).
800,000 actors done (5,600,000 total pongs).
900,000 actors done (6,300,000 total pongs).
All done.

real    1m16.376s
user    3m53.612s
sys     0m10.924s

As you see, these simple tools, built on basic components of the Java 5 concurrency library and paired with powerful abstractions from programming in functional style, make it seem like we have millions of tasks running concurrently. We get a virtually unbounded degree of concurrency while never seeing any locks nor performing any blocking calls. The number of concurrent tasks is limited only by the size of your heap.

Again, everything you see here, and more, has been released just recently as part of the Functional Java library. So head over to their download page and put those extra cores to use with Java, today!

Higher-Order Java Parallelism, Part 2: Parallel List Transformations

With a Callable monad and Parallel Strategies in hand, we’re ready to construct some general-purpose parallel functions of a higher order.

Something we’ll want to do quite often in parallel programming is run a function over a whole list of values, in parallel. So often, in fact, that it would be very tedious and repetitive to write a loop every time we want to do this, sparking off threads for each function call, and collecting the results. Instead, what we’re going to develop next is a kind of parallel list functor. That is, a higher-order function that will turn any function into a function that operates on every element of a list simultaneously.

First, let’s get some terminology out of the way, since I’ve been accused of not defining my terms.

A higher-order function is simply a function that takes another function as an argument (or returns one as its result).

A functor is any type T for which there exists a higher-order function, call it fmap, that transforms a function of type F<A,B> into a function F<T<A>,T<B>>. This fmap function must also obey the laws of identity and composition such that the following expressions return true for all x, p, and q:

  fmap(identity()).f(x) == x
  fmap(compose(p, q)).f(x) == fmap(p).f(fmap(q).f(x))

Here, identity is a first-class function of type F<A,A> that just returns its argument, and compose is function composition. Composition is defined to mean that compose(p, q).f(x) is the same as p.f(q.f(x)).

For example, Functional Java’s List class is a functor, because it comes equipped with a method List<B> map(F<A,B> f) which obeys the laws above. The implementation of List’s map method is trivial: It iterates over the list, calling the given function for each element, and returns the list of the results.
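
As a small sketch of that (List.list is Functional Java’s static factory for building a list from the given elements):

  final List<Integer> ints = List.list(1, 2, 3);

  // map is fmap for List: it lifts F<Integer, String> to F<List<Integer>, List<String>>.
  final List<String> strings = ints.map(new F<Integer, String>() {
    public String f(final Integer i) {
      return "#" + i;
    }
  });
  // strings is now the list ["#1", "#2", "#3"]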

There’s a lot of information about functors and monads on the Internet, and I’ve written about them before, with examples in Java. Tony Morris also wrote a great post where he explains functors as “function application in an environment”. That’s a good way of looking at things.

I also want to make it clear, if you’re just now tuning in, that I’m using classes from Functional Java in the code examples, in addition to classes from the J2SE concurrency library. Basically only Callable and Future are from the J2SE library. Strategy, List, and the Callables utility class are from Functional Java. The code used here is from the latest trunk head revision of that library and may differ from the current official release.

Now, let’s get to the money.

Mapping Over a List in Parallel

We’re going to make a very similar kind of thing to a list functor, except it’s going to map a function over a list in parallel. Our parallel list transformation will take the form of a static method called parMap. It will take any existing function and turn it into a parallel function on lists. That is, if we already have written a function that takes an Integer and returns a String, we can pass that function to parMap, and it will return us a function that takes a list of Integers and returns a list of Strings. This means that nothing has to be rewritten or refactored to support the mapping of a function over a list in parallel. The parMap function will convert our existing code to parallel code, at runtime.

Here’s parMap:

  public <B> F<List<B>, Callable<List<A>>> parMap(final F<B, A> f) {
    return new F<List<B>, Callable<List<A>>>() {
      public Callable<List<A>> f(final List<B> as) {
        return sequence(as.map(compose(par(), callable(f))));
      }
    };
  }

That’s a lot of power for so little code. Let’s read through what it does. It takes a function f and returns a new function that takes a list called as. This new function transforms f with Callables.callable to wrap its return type in a callable (or: callable(f) applies the Kleisli arrow for Callables to f), and composes that with a function called par(). It then maps the resulting composition over the list as, which will result in a list of Callables, which we turn into a Callable of a List with the sequence function. The only new thing here is par(), which is a very simple instance method on Strategy:

  public F<Callable<A>, Callable<A>> par() {
    return compose(Strategy.<A>obtain(), f());
  }

So par() is just the Strategy’s function composed with the obtain() method that we created in Part 1. This turns the Future back into a Callable so that we can manipulate it in a lazy manner. This little guy is the meat of parMap. That is, parMap is basically just mapping par() over a list.

The result of parMap(myFun).f(myList), then, is a Callable of a list that, when called, will give us the results of calling myFun on every element of myList in parallel. What’s more, it will work for any kind of parallelisation Strategy, and it will return immediately (remember, Callable is lazy), ready for us to make further calculations on the results even while those results are being calculated.
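
Here is a brief sketch of what that looks like at the call site. The function and list are purely illustrative, and simpleThreadStrategy is assumed as the backing Strategy:

  final Strategy<String> s = Strategy.simpleThreadStrategy();

  final F<Integer, String> toWords = new F<Integer, String>() {
    public String f(final Integer i) {
      return Integer.toBinaryString(i); // a stand-in for some expensive transformation
    }
  };

  // Returns immediately; the elements are transformed in parallel in the background.
  final Callable<List<String>> result = s.parMap(toWords).f(List.list(7, 42, 255));

  // Later, when we actually need the values (call() is declared to throw Exception):
  final List<String> strings = result.call();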

I think that’s pretty cool.

Nested Parallelism

Sometimes the problem at hand is not quite so flat that it can be solved with just a map over a list. You might have a need for nested traversal of lists, where for every element in a list you traverse another list. We can model that behaviour with a higher-order function very similar to parMap, except that it takes a function that generates a list. This higher-order function is parFlatMap:

  public static <A, B> Callable<List<B>> parFlatMap(final Strategy<List<B>> s,
                                                    final F<A, List<B>> f,
                                                    final List<A> as) {
    return fmap(List.<B>join_()).f(s.parMap(f).f(as));
  }

In this definition, fmap is from the Callable monad, as described in Part 1. Notice that parFlatMap provides an example use of parMap, using it to turn the given function into a parallel function. The call to parMap with that function will actually yield a Callable<List<List<B>>> (can you see why?), so we use a final join lifted into the Callable environment with fmap, to turn the List of Lists into just a List.

The parFlatMap function is analogous to a parallel list monad, and it works very much like a parallel version of nested loops. For example, given a list of Points pts, a distance function dist that takes two Points and returns the distance between them, and a Strategy<List<Double>> s, we can use parFlatMap to calculate the distance between every Point and every other, in parallel:

  Callable<List<Double>> distances = parFlatMap(s, new F<Point, List<Double>>() {
    public List<Double> f(final Point p) {
      return pts.map(dist.f(p));
    }
  }, pts);

The inner “loop” is represented by List.map, so the distances for each outer element are computed in parallel, while each inner traversal runs in sequence. I.e. the outer loop is parallel and the inner loop is serial. It’s quite possible to replace map with parMap, and I’ll leave that as an exercise for the reader (hint: you will need both fmap and join). Note that the call to parFlatMap will return immediately, and the computation will be carried out in the background.

Position-Wise Parallelism and Concurrent Folding

Another function I want to talk about is the parZipWith function. By contrast to parFlatMap, it is like a parallel version of a single loop over two lists. Values from each list are taken at matching positions and the pairs are fed through the argument function all at the same time:

  public static <A, B, C> F2<List<A>, List<B>, Callable<List<C>>> parZipWith(final Strategy<C> s,
                                                                             final F2<A, B, C> f) {
    return new F2<List<A>, List<B>, Callable<List<C>>>() {
      public Callable<List<C>> f(final List<A> as, final List<B> bs) {
        return sequence(as.zipWith(bs, compose(Callables.<B, C>arrow(), curry(f))).map(s.par()));
      }
    };
  }

You will note that parZipWith simply uses zipWith from Functional Java’s List class. There’s some manoeuvring required to turn the argument function into the right form. The curry method turns the F2<A,B,C> into the equivalent F<A,F<B,C>> (sometimes called a curried function). The arrow function is the first-class Kleisli arrow for Callables. That just means that composing it with f yields a version of f which returns a Callable.

Let’s take an example of using parZipWith to do useful work. If we had a list of Points representing a path, and we wanted to get the total length of the path, we could zip the list and its tail (every element except the first) with the dist function and take the sum of the results:

  Double length = parZipWith(s, dist).f(pts, pts.tail()).call().foldLeft(curry(sum), 0.0);

For brevity’s sake, I’m pretending that there’s already a function called sum, which is just an F2<Double, Double, Double> that returns the sum of its two arguments. If foldLeft is something new to you, then have a look at one implementation of it. It’s a more general abstraction of a for-loop than map() is. Folding a list {1,2,3} with the sum function (and 0) can be thought of as replacing all the commas with plusses to yield [0+1+2+3]. We could have just as well iterated over the list, adding each value to a total, instead of using a foldLeft function. But we didn’t, for a reason that will become apparent presently.
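
As an aside, if foldLeft is unfamiliar, here is a rough sketch of what an implementation for Functional Java’s List might look like, written against head(), tail(), and an isEmpty() test (the isEmpty method is an assumption about the List API; the curried function type matches the foldLeft usage above):

  public static <A, B> B foldLeft(final F<B, F<A, B>> f, final B b, final List<A> as) {
    B acc = b;
    List<A> rest = as;
    while (!rest.isEmpty()) {          // apply f to the accumulator and each element, left to right
      acc = f.f(acc).f(rest.head());
      rest = rest.tail();
    }
    return acc;
  }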

There’s something about the above example that’s dissatisfying. We’re getting the distances in parallel, but then call() sticks out like a sore thumb. The computation will block until the list of distances is ready, then it will fold the list with the sum function. What’s worse, call() might throw an Exception. There’s a way we can avoid all that by traversing the list of distances while they’re being calculated. What we need to do is somehow lift the foldLeft function into the Callable environment. The call to foldLeft above turns a List<Double> into a Double. What we want is to avoid the call to call(), and turn our Callable<List<Double>> into a Callable<Double> directly.

This is really easy to do, and it’s a good thing we’re using a folding function rather than a for-loop. Here’s what we do:

  Callable<List<Double>> edges = parZipWith(s, dist).f(pts, pts.tail());
  Callable<Double> length = par(fmap(List.<Double, Double>foldLeft().f(curry(sum)).f(0.0)).f(edges));

There. No exceptions, and no waiting for results. This code will return immediately and perform the whole computation in the background. The part that calculates the edge lengths is the same as before, and we’re keeping that in a variable edges for clarity. The other line may require some explanation. We’re getting a first-class foldLeft function from the List class, partially applying it to yield a function that folds a List<Double> into a Double. Then we’re promoting it with fmap into the Callable monad. The promoted function then gets applied to edges, and finally the whole fold is evaluated concurrently by passing it to par.

Hopefully you can see what an immensely powerful tool for parallel programming functional style and higher-order functions are, even in Java. We were able to develop a couple of one-liners above that are terser, clearer, more maintainable, more re-usable (and more type-safe) than anything we could write in imperative object-oriented style using producer/consumer, factories, inversion of control, etc. There’s no secret sauce, no special syntax, no functional fairy dust. It’s just plain old Java with a handful of very useful interfaces.

In the third installment of this series, we will develop a tiny light-weight library for threadless Actors that can juggle millions of simultaneous computations in only a few threads. Stay close.

Note: The code herein is adapted from a Haskell library by Phil Trinder, Hans-Wolfgang Loidl, Kevin Hammond et al.