Computing Thoughts – Bruce Eckel’s Programming Blog

Are Java 8 Lambdas Closures?

(Significantly rewritten 11/25/2015)

Based on what I’ve heard, I was surprised to discover that the short answer is “yes, with a caveat that, after explanation, isn’t terrible.” So, a qualified yes.

For the longer answer, we must first explore the question of “why, again, are we doing all this?”

Abstraction over Behavior

The simplest way to look at the need for lambdas is that they describe what computation should be performed, rather than how it should be performed. Traditionally, we’ve used external iteration, where we specify exactly how to step through a sequence and perform operations:

// InternalVsExternalIteration.java
import java.util.*;

interface Pet {
    void speak();
}

class Rat implements Pet {
    public void speak() { System.out.println("Squeak!"); }
}

class Frog implements Pet {
    public void speak() { System.out.println("Ribbit!"); }
}

public class InternalVsExternalIteration {
    public static void main(String[] args) {
        List<Pet> pets = Arrays.asList(new Rat(), new Frog());
        for(Pet p : pets) // External iteration
            p.speak();
        pets.forEach(Pet::speak); // Internal iteration
    }
}

The for loop performs external iteration and specifies exactly how it is done. This kind of code is boilerplate, duplicated throughout our programs. With forEach, however, we tell it to call speak for each element (here, using a method reference, which is more succinct than a lambda), but we don’t have to specify how the loop works. The iteration is handled internally, inside the forEach.
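
For comparison, here’s a minimal sketch of my own (assuming it lives in the same package as the Pet classes above) showing the explicit-lambda form next to the method reference; the two forEach calls are interchangeable:

// PetLambda.java
// My variation on the example above; assumes the Pet, Rat, and
// Frog classes from InternalVsExternalIteration.java.
import java.util.*;

public class PetLambda {
    public static void main(String[] args) {
        List<Pet> pets = Arrays.asList(new Rat(), new Frog());
        pets.forEach(p -> p.speak()); // Explicit lambda
        pets.forEach(Pet::speak);     // Equivalent method reference
    }
}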

This “what not how” is the basic motivation for lambdas. But to understand closures, we must look more deeply, into the motivation for functional programming itself.

Functional Programming

Lambdas/Closures are there to aid functional programming. Java 8 is not suddenly a functional programming language, but (like Python) now has some support for functional programming on top of its basic object-oriented paradigm.

The core idea of functional programming is that you can create and manipulate functions, including creating functions at runtime. Thus, functions become another thing that your programs can manipulate (instead of just data). This adds a lot of power to programming.

A pure functional programming language imposes other restrictions, notably data immutability: you don’t have variables, only unchangeable values. This sounds overly constraining at first (how can you get anything done without variables?), but it turns out you can accomplish everything with values that you can with variables (you can prove this to yourself using Scala, which is not itself a pure functional language but gives you the option to use values everywhere). Pure functions take arguments and produce results without modifying their environment, and are thus much easier to use for parallel programming, because a pure function doesn’t have to lock shared resources.
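
As a rough illustration of why purity matters for parallelism, here’s a sketch of my own (not from the original text): the function passed to mapToInt depends only on its argument and touches no shared state, so the parallel stream can split the work across threads without any locking:

// PureParallel.java
// A pure function applied across a parallel stream: no shared
// mutable state, so no locks are needed and the result is
// deterministic regardless of how the work is partitioned.
import java.util.*;

public class PureParallel {
    public static void main(String[] args) {
        List<Integer> nums = Arrays.asList(1, 2, 3, 4, 5);
        int sumOfSquares = nums.parallelStream()
            .mapToInt(n -> n * n) // Pure: depends only on its argument
            .sum();
        System.out.println(sumOfSquares); // 55
    }
}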

Before Java 8, the only way to create functions at runtime was through bytecode generation and loading (which is quite messy and complex).

Lambdas provide two basic features:

  1. More succinct function-creation syntax.

  2. The ability to create functions at runtime, which can then be passed/manipulated by other code.

Closures concern this second issue.
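
Before digging into closures, here’s a minimal sketch of my own showing both features at once: the anonymous-inner-class form that approximated function objects before lambdas, next to the lambda equivalent. Both create a function object that can be passed around at runtime, but the lambda is far more succinct:

// SuccinctSyntax.java
// The same function object created two ways: as an anonymous
// inner class, and as a lambda.
import java.util.function.*;

public class SuccinctSyntax {
    public static void main(String[] args) {
        // The old, verbose form (the Function interface itself is
        // new in Java 8, but the anonymous-class ceremony is not):
        Function<Integer, Integer> oldStyle =
            new Function<Integer, Integer>() {
                public Integer apply(Integer x) { return x + 1; }
            };
        // The lambda form: same behavior, far less ceremony.
        Function<Integer, Integer> lambda = x -> x + 1;
        System.out.println(oldStyle.apply(1)); // 2
        System.out.println(lambda.apply(1));   // 2
    }
}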

What is a Closure?

A closure uses variables that are outside of the function scope. This is not a problem in traditional procedural programming – you just use the variable – but when you start producing functions at runtime it does become a problem. To see the issue, I’ll start with a Python example. Here, make_fun() is creating and returning a function called func_to_return, which is then used by the rest of the program:

# Closures.py

def make_fun():
    # Outside the scope of the returned function:
    n = 0

    def func_to_return(arg):
        nonlocal n
        # Without 'nonlocal' n += arg produces:
        # local variable 'n' referenced before assignment
        print(n, arg, end=": ")
        arg += 1
        n += arg
        return n

    return func_to_return

x = make_fun()
y = make_fun()

for i in range(5):
    print(x(i))

print("=" * 10)

for i in range(10, 15):
    print(y(i))

""" Output:
0 0: 1
1 1: 3
3 2: 6
6 3: 10
10 4: 15
==========
0 10: 11
11 11: 23
23 12: 36
36 13: 50
50 14: 65
"""

Notice that func_to_return manipulates n, a variable outside its own scope (arg, by contrast, is a parameter, so it’s a local name bound to whatever object is passed in). The nonlocal declaration is required because of the way Python scoping works: if you assign to a variable inside a function, Python assumes that variable is local. Here, the compiler (yes, Python has a compiler, and it does some admittedly quite limited static analysis of scopes) sees that n += arg assigns to n, so it marks n as local to func_to_return; since that local n is never initialized, the increment fails with “local variable ‘n’ referenced before assignment.” But if we declare n nonlocal, Python knows we’re using the n that’s defined outside the function scope, and which has been initialized, so it’s OK.

Now we encounter the problem: if we simply return func_to_return, what happens to n, which is outside the scope of func_to_return? Ordinarily we’d expect n to go out of scope and become unavailable, but if that happens then func_to_return won’t work. In order to support dynamic creation of functions, func_to_return must “close over” and keep alive n when the function is returned, and that’s what happens – thus the term closure.

To test make_fun(), we call it twice and capture the resulting function in x and y. The fact that x and y produce completely different results shows that each call to make_fun() produces a completely independent func_to_return with completely independent closed-over storage for n.

Java 8 Lambdas

Now let’s see what the same example looks like with lambdas:

// AreLambdasClosures.java
import java.util.function.*;

public class AreLambdasClosures {
    public Function<Integer, Integer> make_fun() {
        // Outside the scope of the returned function:
        int n = 0;
        return arg -> {
            System.out.print(n + " " + arg + ": ");
            arg += 1;
            // n += arg; // Produces error message
            return n + arg;
        };
    }
    public void try_it() {
        Function<Integer, Integer>
            x = make_fun(),
            y = make_fun();
        for(int i = 0; i < 5; i++)
            System.out.println(x.apply(i));
        for(int i = 10; i < 15; i++)
            System.out.println(y.apply(i));
    }
    public static void main(String[] args) {
        new AreLambdasClosures().try_it();
    }
}
/* Output:
0 0: 1
0 1: 2
0 2: 3
0 3: 4
0 4: 5
0 10: 11
0 11: 12
0 12: 13
0 13: 14
0 14: 15
*/

It’s a mixed bag: we can indeed access n, but we immediately run into trouble when we try to modify n. The error message is: local variables referenced from a lambda expression must be final or effectively final.

It turns out that, in Java, lambdas close over values, but not variables. Java requires those values to be unchanging, as if they had been declared final. So they must be final whether you declare them that way or not; thus, “effectively final.” Java therefore has “closures with restrictions,” which might not be “perfect” closures, but are nonetheless still quite useful.

If we create heap-based objects, we can modify the object, because the compiler only cares that the reference is not modified. For example:

// AreLambdasClosures2.java
import java.util.function.*;

class MyInt {
    int i = 0;
}

public class AreLambdasClosures2 {
    public Consumer<Integer> make_fun2() {
        // n itself is effectively final; the object it refers
        // to can still be modified inside the lambda:
        MyInt n = new MyInt();
        return arg -> n.i += arg;
    }
}

This compiles without complaint; you can verify that only the reference matters by putting the final keyword on the definition of n (it still compiles). Of course, if you use this with any kind of concurrency, you have the problem of mutable shared state.
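
If you do need to mutate captured state in the presence of concurrency, one common approach (my sketch, not from the original article) is to capture a java.util.concurrent.atomic.AtomicInteger: the reference is still effectively final, but the object it names is designed for safe concurrent updates:

// AreLambdasClosures3.java
// A hypothetical variation: closing over a thread-safe mutable
// object instead of a bare int field.
import java.util.function.*;
import java.util.concurrent.atomic.*;

public class AreLambdasClosures3 {
    public Consumer<Integer> make_fun3() {
        AtomicInteger n = new AtomicInteger(0);
        return arg -> n.addAndGet(arg); // Atomic update; no explicit locking
    }
    public static void main(String[] args) {
        Consumer<Integer> f = new AreLambdasClosures3().make_fun3();
        f.accept(10); // Safe even if invoked from multiple threads
    }
}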

Lambda expressions accomplish – at least partially – the desired goal: it’s now possible to create functions dynamically. If you step outside the bounds, you get an error message, but there’s generally a way to solve the problem. It’s not as straightforward as the Python solution, but this is Java, after all, and we’ve been trained to take what we are given. And ultimately the end result, while somewhat constricted (face it, everything in Java is somewhat constricted) is not too shabby.

I asked why the feature wasn’t just called “closures” instead of “lambdas,” since it has the characteristics of a closure. The answer I got was that “closure” is a loaded and ill-defined term, likely to create more heat than light. When someone says “real closures,” it too often means “what closure meant in the first language I encountered with something called closures.”

I don’t see an OO versus FP (functional programming) debate here; that is not my intention. Indeed, I don’t really see a “versus” issue. OO is good for abstracting over data (and just because Java forces objects on you doesn’t mean that objects are the answer to every problem), while FP is good for abstracting over behavior. Both paradigms are useful, and mixing them together has been even more useful for me, both in Python and now in Java 8. (I have also recently been using Pandoc, written in the pure FP Haskell language, and I’ve been extremely impressed with that, so it seems there is a valuable place for pure FP languages as well).


JET Conference Slides

Here are the slides for my opening keynote at the JET Conference in Minsk, Belarus, Sept 28, 2015. Note that many of the slides also have notes.

The evening before the conference, I will be presenting “Creating Trust Organizations,” and you can find the slides for that here.


What I Do

I have a very narrow set of skills, which can be summarized as “delving into a language and helping others understand and (sometimes) solve problems involving that language.” That’s too simple, however – because I ultimately seek “the best” language, for my own personal definition of “best.”

I also leave languages behind. While I once knew all the ins and outs of C++, for example, I stopped studying that language after C++98, searching for a language that gave me more.

I have always sought the most powerful languages. For me, the most important aspect of “power” is programmer productivity, and this is predominantly determined by simplicity and clarity. So I seek languages that emphasize simplicity and clarity above all. I am very aware of the cognitive overhead of language features, and the limitations of the human mind in managing complexity. The more of the mind that is used on arbitrary complexity, the less is available to solve the problems at hand. Thus, I seek languages that don’t force the programmer to jump through arbitrary complexity hoops in order to compensate for language decisions that were made for reasons other than “simplest for the programmer.” I’ve found that once I discover the compromises and what they cost – and especially when they cost more than the language benefits, in terms of programmer time and effort – I start losing interest in that language.

Over time, my perception of “the most powerful language” has been a moving and evolving target. When I long ago began working with C++, it seemed like the best combination of features supporting program abstraction while at the same time enabling C programmers to migrate from C (and reuse their C code in the process). C++’s constraint of “backwards compatibility with C” produced both that evolution path and the added complications that slow programmers down. Although I was heavily immersed in C++ during its early years – depending on how you count, I wrote three or four books on the topic and I was on the C++ Standards Committee from its inception and for the first eight years (for selfish reasons: I found it the best way to learn the intricacies of the language) – I did not study the language after the C++98 standard.

Languages like D and Go have emerged to attempt to solve the C++ complexity problem while maintaining the C++ speed of direct hardware connection. I have great admiration for Go, but I personally find the lack of classes to be too limiting. Classes are certainly not the answer to all problems, but there are times when they are an excellent solution and not having them feels overconstrained.

Java brought virtual machines and garbage collection into the mainstream, but in its headlong rush to dominate the programming world it made irrevocable decisions that have slowly but inexorably squeezed it out of most of the very areas it sought to dominate (for example: Applets, J2EE, Jini, and user-interface programming).

I’ve spent a couple of years immersed in Scala (longer in calendar time, but safe to say two years of full immersion) and feel like I have only a surface understanding of that language. I like to think I understand what’s in Atomic Scala, but even then I am unsurprised when someone points out some feature that turns out to have far greater complexity than I thought.

I’ve come to view Scala as a landscape of cliffs – you can start feeling pretty comfortable with the language and think that you have a reasonable grasp of it, then suddenly fall off a cliff that makes you realize that no, you still don’t get it. Scala has been an amazing experiment and a terrifically valuable learning experience for me, but I feel like I put a lot of effort into writing Atomic Scala so as to present the language in a simple fashion while hiding those cliffs (and I don’t feel too comfortable with that realization). The language left me feeling like I would never be able to completely understand it. On top of that, the reliance on the JVM (and its design decisions) adds some of the same needless complexity that encumbers Java.

One of the issues I had with Scala is the constant feeling of being unable to detect whether I just wasn’t “getting” a particular aspect, or if the language was too complex. This is a hard thing to know until you have a deep understanding of a language. A large portion of the Scala community seems quick to declare that yes, you don’t get it. But if you’re considering using Scala, you owe it to yourself to first watch this presentation by Paul Phillips (the video doesn’t show the slides so you need to get them here). Paul makes a compelling argument that he has written more Scala code than anyone (he worked on the compiler for years), and he also shows how broken the language is – while at the same time stating that it’s possible to be very effective with Scala. He also declares that the language is, in fact, too complex. This doesn’t mean you can’t or shouldn’t use Scala, or that you can’t have a perfectly good experience by staying within a self-imposed box of simplicity.

So I investigate these languages and, when I begin to find them too limiting for various reasons, move on. A language in which I might once have been considered an expert, I leave behind and in so doing lose that expertise. But ultimately I can’t tolerate the thought of programmers wasting time understanding and compensating for decisions that were made for efficiency or expedience or anything other than programmer productivity.

What you consider productive may differ from my definition. For me, it’s “How easy is it for the programmer to think about this problem using the programming language? Where does the language get in the way of expressing the solution in the simplest form?”

The one language that has kept calling me during all of my explorations is Python. To me, the philosophy of Python can be summed up like this: “Nothing is more important than simplicity and clarity.” (In 2009, I created the PyCon (Python Conference) T-shirt, which said “Elegance Begets Simplicity.”) While Python will do its best to be fast and to solve the numerous technical problems faced by programming language designers, it will never do so by compromising simplicity and clarity. Sometimes this means a feature will not appear in the language for a long time, while the community absorbs and understands the problem it’s meant to solve, but when that feature does appear it’s almost always the clearest and simplest way to think about that problem (and if it isn’t, eventually it gets fixed). For example, the new coroutine support in Python 3.5 took over a decade to evolve.

Whenever I’ve had to solve a problem, I’ve always reached for Python because it’s the fastest way to get to a solution. When I consider using one of the other languages I’ve studied, the overhead of development is so much greater that it just doesn’t make sense. And, during my study of organizations I’ve seen how important culture is, and the Python community has done a masterful job of creating a culture and a community, like nothing I’ve seen for any other programming language.

From these realizations, after my current project (collecting all my Java writings into an eBook), my plan is to write “Atomic Python” and go dwell in the world of Python; it has become clear that this is the right place for me. I still seek an optimal solution for the user-interface problem, and right now the most likely candidate is Elm, precisely because its philosophy follows that of Python (however, right now I don’t understand Elm well enough to have written anything in it, though I do know of one UI designer who uses it exclusively).

My mission is to try to find the most productive language or combination of languages, using my own definition of “most productive,” which is a rather pure pursuit and does not take into account the various corporate contexts and constraints under which many folks must work.



Pull Requests: The Linchpin of Open Source

When Linus Torvalds started creating Linux, he managed the code base himself. People would email him patches and he would either include them or not.

To maintain the code base – to have checkpoints and be able to back up to an earlier known point – he used a Distributed Version Control System (DVCS) which, as the name implies, is for managing versions.

So there were these two seemingly-separate things: incorporating patches and managing versions.

Linus originally used a proprietary DVCS called BitKeeper, but at some point became dissatisfied with it and created the open-source Git system instead. But even then he continued to take patches via email, incorporating them and creating new versions using Git.

Later, GitHub formed to provide hosted repositories for Git projects (Bitbucket did the same thing, but started out using Mercurial, which is written in Python; more recently Bitbucket has changed to predominantly support Git, although it still supports Mercurial).

GitHub’s innovation was in incorporating the mechanism of the patch request into the DVCS process, rather than relying on emailed patches and hand-processing. Now you can review a patch and, if you accept it, incorporate it into your project with the push of a button.

For some reason, they chose to use the term “pull request” rather than “patch request.” For years this confused me and I just ignored it, thinking “well, if you want to pull something, pull it – don’t bother asking me about it!” (This confusion is one big reason for the failure of the Python Patterns book).

At OSCON, in the speaker room, I overheard someone declaring that the pull request is the cornerstone of the open-source process. Along with my studies of emerging organizational structures, this created an epiphany for me.

The project owner is the one with commit privileges (he or she can also grant others commit privileges and they can work out some internal way of making decisions, but let’s keep it simple for now). Anytime someone submits a pull/patch request, the committer can incorporate it or deny it, depending on whether it fits the committer’s vision/standards/requirements. It’s a yes/no decision, and at that point the committer re-asserts their leadership on the project.

Taken in isolation, this system produces a single autocratic ruler, and relies on that ruler’s benevolence. If the ruler decides to behave badly you end up with the scenario we’re all familiar with: “my way or the highway,” and this is what makes so many companies unpleasant to work for – even if they start out as nice places to work, the door is always open for them to become unpleasant, dictatorial environments, so simple Brownian motion means they’ll probably end up there.

Open source leaves a second door open, however. If your pull requests keep getting ignored and you feel strongly enough about a project, you can turn your fork into a new project and take it in another direction. If your leadership is better than the original, people will begin to prefer your version. Effectively, it creates a marketplace of genetic variations around a single project, rather than arbitrarily forcing that project to have a single implementation, even when something around that project is broken.

The pull request allows an individual to express their vision, and to make it clear to contributors whether they’re on the same page as the project owner. It prevents the leaden morass that is consensus (unless a group of committers choose to practice it). The fork allows genetic variation within an open-source project, so the marketplace can choose the best-suited version.

Here’s a very nice overview of the process.

(James Ward explained most of this to me)
