Computing Thoughts Bruce Eckel's Programming Blog

Gophercon And The Concurrent Python Developer Retreat

I found Gophercon to be valuable and it restarted my interest in the Go language. I’m currently working my way through The Go Programming Language Phrasebook and plan to explore ways to call Go from Python (described later in this post).

If I hadn’t been attending with my friend Luciano Ramalho it would have been a different experience. The conference is clearly commercial and I had a strong sense of having my experience decided for me. Most of the time there was only one activity going on, perhaps a selection of talks (often too crowded to get into), but a number of times only a single keynote or something happening that was intended to force the attendees to hang out around the vendor booths. This served the conference organizers and the vendors, but it wasn’t all that valuable for the attendees.

I would strongly recommend that the Gophercon organizers attend and study Pycon to experience an attendee-centered conference. At Pycon, there’s always a multitude of things to do and experience. With Gophercon, what you see on the resulting online videos is not that different from what you get by attending. Sure, there are the occasional opportunities for meeting new folks. Meals are probably the best for this, because you need to sit somewhere and that tends to produce new connections. The conference parties are intended to achieve this, but these basically throw you into a big noisy space with food and possibly games, and rely on you to start conversations. A few thoughtful (and free) touches would catalyze the connections that people at the conference clearly want to make.

Here’s an example. I mentioned my observations to Luciano, and before I knew it he had spontaneously created an open-spaces conference at one of the tables, simply by making a sign on that table reading “Pythonistas who love Go” and tweeting about it. The people who showed up were hungry for this conversation and I think there could have been a lot more of this. Instead, if people weren’t interested in what was going on or the talk they wanted to see was too crowded, they ended up working on their own at one of the tables. I loved Luciano’s tactic and will try something like that myself if I end up in a similar situation – I’ll call it guerilla open spaces.

We stayed for part of the “community day,” an optional post-conference event when everyone really started to connect. This was, to my perception, the highest-energy and most collaborative day of the conference, and showed what kind of potential this conference really has. It was also an opportunity for open-source project sprinting, but as was discovered long ago at Pycon, one day doesn’t do it. In fact, you’re lucky if you get people up to speed with tools and the project on the first day, which is why Pycon has four post-conference sprint days (I only sprinted for a couple of days last year, three days this year, and next year I will stay the whole four days – the experience and learnings I’ve gotten from sprinting have just been fantastic).

A lot of people had either completely come over from Python or were programming in both Python and Go. A number of speakers told this story. My impression was that the Python-Go crossover was the most common one, but that’s probably because I was particularly interested in it. I did find it a little unfortunate that several of the speakers said mildly deprecating things about Python, especially since I’d like to explore using Python everywhere possible, for the power and development speed, and to see if Go can be used to solve many of Python’s concurrency issues.

Indeed, I went to Gophercon not just because Luciano asked me to, but because I hoped to understand more about Go’s concurrency model. I came away with a much better but far-from-perfect understanding. Everything I’ve learned so far continues to support the idea that Go can be used in conjunction with Python, at least for some set of concurrency problems. So I want to continue that experiment.

I had a very interesting discussion with one person who is a big fan of both Python and Go, and had a strong understanding of concurrency. He made a fascinating observation, which is that there are areas where Go is “more Pythonic” than Python, as per Tim Peters’ The Zen of Python. In particular, “There should be one–and preferably only one–obvious way to do it.” When it comes to concurrency, there are many competing ways to solve that problem, and the main fork happens between async (when you’re waiting around for something external to happen) and parallelism (when your problem takes lots of processing power). The basic solutions in Python are async/await and multiprocessing. But in Go, you don’t have to think about the type of concurrency problem you’re solving; you just use goroutines for everything. Thus, more Pythonic: one obvious way to do it.
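To make that fork concrete, here is a minimal sketch of the two Python approaches side by side. It is only an illustration under stated assumptions: the function names are invented for this example, the "external" waits are simulated with asyncio.sleep, and the CPU-heavy work is a naive prime count.

```python
import asyncio
from concurrent.futures import ProcessPoolExecutor


async def wait_for_reply(name: str, delay: float) -> str:
    """Stand-in for an external wait (network, disk); the delay is simulated."""
    await asyncio.sleep(delay)
    return f"{name} done"


async def gather_replies() -> list[str]:
    # The three waits overlap on a single thread, so this takes about as
    # long as the slowest wait rather than the sum of all three.
    return await asyncio.gather(
        wait_for_reply("a", 0.03),
        wait_for_reply("b", 0.02),
        wait_for_reply("c", 0.01),
    )


def count_primes(limit: int) -> int:
    """Deliberately CPU-bound: trial division for every number below limit."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def count_primes_in_processes(limits: list[int]) -> list[int]:
    # Separate processes each have their own interpreter, so the work
    # can be spread across several cores.
    with ProcessPoolExecutor() as pool:
        return list(pool.map(count_primes, limits))


if __name__ == "__main__":
    print(asyncio.run(gather_replies()))
    print(count_primes_in_processes([20_000, 20_000, 20_000]))
```

The point of the sketch is that the programmer has to pick the tool to match the problem: async/await for the waiting case, a process pool for the number-crunching case.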

I was especially interested in the gRPC cross-language bridge. There was a presentation that used this, and the gRPC intro pages have working examples, one of which is Python to Go, which I tested on my Windows 10 machine. The gRPC unboxing experience was seamless and exceptionally good, which means I’ll probably explore it further when I need high-speed foreign-function calls. There’s a certain attraction to using something supported by Google.

However, during the developer retreat, Jim Fulton told us about MessagePack which has a cross-language RPC implementation. This also seems worthy of exploration.

The conference made Go a particularly interesting way to offload concurrency problems from Python. Using Go is the one solution we didn’t get to during the developer retreat, but I see a lot of possibility here for Concurrent Python and I intend to explore this avenue further.

The Concurrent Python Developer Retreat

I’ll say that the “core” members of the event were myself, Luciano and Jim Fulton of Zope fame. We had others who were attending and contributing while still remotely working at their jobs (un-named in case that’s an issue) and one intern from the local college who was attempting to drink from the firehose.

In this retreat, we were too ambitious. I’ve had plenty of experiences trying to do too much, but the topic was interesting and we had some good minds working on it so I got drawn into the enthusiasm.

We were too ambitious but we learned a great deal. We usually take more breaks and do more outside activities, but our ambition drove us more this time so we exhausted ourselves on a couple of occasions. I need to pay more attention to this in the future and perhaps even add a little structure around breaks and activities.

We did manage to get one walk in on the third day, after we had all gotten mentally exhausted.

At Gophercon, Luciano and I talked to the folks at the Twitch.tv booth, who convinced us that Twitch is about more than gaming, and that it would be a good way to livestream programming events. This idea was interesting enough that after Luciano told me Thoughtworks’ choice for streaming camera and microphone, I ordered them from Amazon so we could use them for part of the conference. Alas, Amazon failed: the days kept passing and I kept hearing about delays in delivery, so it never happened. Next time we’ll give it a try.

As an experiment I bought a Google Chromecast perhaps a year ago. I’ve hardly used it for anything, but during workshops like this it’s priceless to be able to have anyone in the room quickly and easily cast their desktop to the TV. We used it almost constantly during this developer retreat.

The Projects

All these projects reside in the Concurrent Python Github Repository.

As previously mentioned, Python tends to partition concurrency into async and parallel, so we came up with projects to test those approaches.

The first was based on something I’ve been thinking of for a while. I enjoy webcomics but the reading tools require too much mousing-and-clicking. In addition, while they originally might open the comic directly in the reader, it turns out that the artists often only get paid if you open the page (and the associated ads) on the supporting site.

So the first project used async calls to open a list of pages, opening each in a tab when it became available. In WebComicReader you’ll see multiple Python files, each containing one of the different approaches we tried, including basic async/await, concurrent futures, and gevent. Not everything worked, or worked easily, or produced the results we expected (remember, we were in full exploration mode; pull requests welcome).
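The general shape of the async/await approach can be sketched as follows. This is not the code from the repository: fetch_page and open_when_ready are hypothetical names, the download is simulated with a random sleep, and print stands in for the step that opens a browser tab (the standard library's webbrowser.open_new_tab could be passed instead).

```python
import asyncio
import random


async def fetch_page(url: str) -> str:
    """Hypothetical stand-in for a real download; sleeps instead of using the network."""
    await asyncio.sleep(random.uniform(0.01, 0.05))
    return url


async def open_when_ready(urls, open_tab):
    """Start every fetch at once and call open_tab(url) as each one finishes."""
    tasks = [asyncio.create_task(fetch_page(url)) for url in urls]
    opened = []
    # as_completed yields results in completion order, not list order.
    for finished in asyncio.as_completed(tasks):
        url = await finished
        open_tab(url)
        opened.append(url)
    return opened


if __name__ == "__main__":
    comics = [
        "https://example.com/comic-a",
        "https://example.com/comic-b",
        "https://example.com/comic-c",
    ]
    asyncio.run(open_when_ready(comics, print))
```

Because the pages are handled in completion order, a slow site never holds up the ones that are already available.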

To explore parallelism, Luciano came up with the idea of grabbing some images and performing CPU-intensive image processing on them – we chose the application of a blur filter. Various solutions are in the Parallel Image Processing repository. linear.py just does them one at a time, to show the longest possible processing duration, while the other two experiment with parallel versions.
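As a rough sketch of the one-at-a-time versus parallel comparison, assuming nothing about the repository's actual code: the blur below is a plain 3x3 box blur over small synthetic grayscale grids (no image library, no files), and every name in it is invented for the illustration.

```python
from concurrent.futures import ProcessPoolExecutor

Image = list[list[int]]  # grayscale pixels, one inner list per row


def box_blur(image: Image) -> Image:
    """Replace each pixel with the integer mean of its 3x3 neighbourhood."""
    height, width = len(image), len(image[0])
    blurred = []
    for y in range(height):
        row = []
        for x in range(width):
            # Neighbourhoods are clipped at the edges of the image.
            neighbours = [
                image[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
            ]
            row.append(sum(neighbours) // len(neighbours))
        blurred.append(row)
    return blurred


def blur_one_at_a_time(images):
    """Baseline: process the images sequentially in one process."""
    return [box_blur(image) for image in images]


def blur_in_parallel(images):
    """Hand each image to a worker process so several cores can be used."""
    with ProcessPoolExecutor() as pool:
        return list(pool.map(box_blur, images))


def make_image(size: int, seed: int) -> Image:
    """Deterministic synthetic image so the sketch needs no input files."""
    return [[(x * y + seed) % 256 for x in range(size)] for y in range(size)]


if __name__ == "__main__":
    images = [make_image(64, seed) for seed in range(8)]
    assert blur_in_parallel(images) == blur_one_at_a_time(images)
    print("parallel and sequential results match")
```

Whether the parallel version is actually faster depends on image size and core count, since each image must be copied to and from a worker process.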

One of the more illuminating days was the last one. As we implemented the photo-blurring problem using various techniques, Jim kept mentioning Rust. Jim had implemented a ZODB server in Rust a year before, so it seemed like an ideal opportunity to learn more about this language. Lots of people seem to use the refrain “I don’t know much about Rust but it sounds interesting and I’d like to explore it more.” So I asked Jim if he’d be willing to work through the photo-blurring problem in Rust and he agreed.

The first interesting thing was setup. Jim worked on this for a while before he just built the environment in a Docker container. This showed that Rust is still in flux; there seemed to be language feature error messages that popped up around libraries unless you had exactly the right combination of compiler and tools; I think we ended up using the nightly build. It seems like just getting set up could be quite daunting for a novice.

Jim hadn’t done much with Rust in the past year, but before that he had built a fairly complex piece of software. In the intervening year, many of the details of Rust had slipped away from him and he had to recover them. I’ve had this same experience with C++. When you have so many facets you need to keep in your head, and when the features in question are not language-intuitive but rather are low-level details necessary to satisfy some other need, then it’s easy to lose track of those details.

In the case of C++, the extra mental baggage came from both the requirement of backwards compatibility with C, and performance. This threw in lots of special cases you had to remember. In the case of Rust, there’s no backwards-compatibility issue, which was a relief, but that simplification was more than overwhelmed by all the performance issues.

From my limited exposure to the language, I feel safe in saying that the entire raison d’etre of Rust is performance. You’d choose this language if you are solving a problem fundamentally driven by performance. One oft-repeated use case is rewriting Firefox in Rust. Considering how many people use web browsers, something like this could, for example, significantly reduce electric power usage while at the same time increasing the responsiveness of the browser. For that kind of problem it makes a lot of sense.

But the mental overhead and the amount of work necessary is enormous. In Rust you must constantly pay attention to memory management. When you call a function, you must also understand how your memory management is going to interact with that function, and make decisions about it. This is on top of the complexity of the language. My perception is that if you decide to create a project in Rust, you must live and breathe that language and it will completely occupy your brain, with no room for any other language.

Before committing to Rust for a project, you must be very clear that the costs are worth the benefits. Rewriting Firefox, for sure. But for a problem like ours it wasn’t clear that the Rust version was faster than the other approaches. This might well be due to a poor implementation of the Rust image processing library we used, but after all the work it would be pretty disappointing to discover that a solution in Go or even Python was around the same speed or even faster.

You can find the Rust implementation of the image processor here. Jim put this up a few days after the conference, and it now looks deceptively simple compared to what we were seeing during development.

Rust is not the language I’m seeking because it’s not a rapid-development solution, so I’m putting it on the back shelf for the time being. I can definitely imagine projects where it could be a good solution, but I doubt I’d want to work on those projects.

Once I get far enough through the Go book, I want to try implementing the image-processing problem in Go and see what kind of performance comes from that.

Keep in mind that I have always had a particular orientation towards achieving desired results while minimizing development effort. If you have different objectives and needs, you will probably come to different conclusions than I do.

On Java 8 And The Concurrent Python Developer Retreat

For many years I’ve been getting requests for some kind of sequel to Thinking in Java, 4th Edition. Over two years ago, I finally decided to “pull together something quickly.” After all, how much time have I spent writing about the language? I should be getting pretty fast by now.

Self-delusion knows no bounds. No matter how many books I write, every one seems to take longer than the previous ones, not shorter. I hope this is because I’m getting more meticulous.

Another factor is that Java 8 is a dramatic departure from previous versions of Java. It has pulled a major rabbit out of a hat with the introduction of lambdas and functional programming—perhaps not as pure as you expect from a real functional language, but a huge step forward nonetheless. Along with CompletableFutures and Streams, Java 8 is a radical improvement in the experience of programming with Java.

I’ve rolled the book out very slowly; for the last couple of months it’s been in beta to make sure there were no glitches in the delivery or reading experience, but it is now officially released.

You can find it at www.OnJava8.com.

This book is far too large to publish as a single print volume, and my intent has always been to only publish it as an eBook. Color syntax highlighting for code listings is, alone, worth the cost of admission. Searchability, font resizing or text-to-voice for the vision-impaired, the fact you can always keep it with you—there are so many benefits to eBooks it’s hard to name them all.

Anyone buying this book needs a computer to run the programs and write code, and the eBook reads nicely on a computer (I was also surprised to discover it reads tolerably well on a phone). However, the best reading experience is on a tablet computer. Tablets are inexpensive enough you can now buy one for less than you’d pay for an equivalent print version of this book (which, note, does not exist). It’s much easier to read a tablet in bed (for example) than trying to manage the pages of a physical book, especially one this big. When working at your computer, you don’t have to hold the pages open when using a tablet at your side. It might feel different at first, but I think you’ll find the benefits far outweigh the discomfort of adapting.

I’ve done the research, and Google Play Books provides a very nice reading experience on every platform, including Linux and iOS devices. As an experiment, I’ve decided to try publishing exclusively through Google Play Books.

The Concurrent Python Developer Retreat

I will be attending GopherCon in Denver with my friend Luciano Ramalho (author of Fluent Python), and afterwards he’ll be coming with me to my home in Crested Butte, Colorado where we’ll be holding the Concurrent Python Developer Retreat, July 16-19, 2017. This will focus on all the aspects of concurrency in Python, as part of the development work for my book project Concurrent Python. This will be a free ebook, available now and throughout development (it might also turn into a print book if the interest is there). The target audience is people who know Python but don’t know anything about concurrency, so if that’s you, please consider joining the retreat. You can find out more, and register, here.

Pycon 2017

This year I repeated my strategy of “Don’t go to any recorded sessions,” and also did more volunteering (one of these was at the green room, and again one of the session chairs failed to show up so you’ll see me introducing three of the talks).

Dinners

In the past I’ve kind of stumbled into dinner groups, but not always. This time I made more of an effort and was rewarded with dinner friends every night. I think, however, that I’d like to make more of an effort to pre-plan these next year somehow, perhaps using some online tool. Admittedly this could reduce the chances for serendipity, but once a core of people are going to dinner it’s not usually hard to add others.

I wonder if there’s already an app for setting up group dinners. One of the hopeful things that happened is that I discovered that someone I have started to know is a web developer and we’ve agreed to do some experiments. I’ve been approached by any number of people over time who want to build web projects, but these are invariably people who have figured out how to configure a technology (often Wordpress, which I’ve come to loathe) and recommend that I use that particular technology. But in this case, I’ve got a real programmer who can adapt to my arcane needs. As a result, I’ve started imagining asking him to create all kinds of new experiments, and one of these could be a dinner planner app for conferences. The possibilities for this and other ideas are very exciting.

Hunting for Self-Guided Tutorials

For years, many people in Crested Butte have expressed to me an interest in learning to program, while declaring they know nothing about it. Perhaps the idea gestated long enough, or some other experience triggered this, but I recently began formulating a free class for new programmers that relies completely on self-guided tutorials. The benefit of this is that people do not get driven away by the class going either too fast or too slow; by its nature it adapts to the learning experience of each individual. In addition, once anyone has solved one of the steps, they can help their classmates solve that step (indeed, they are probably better at it, being closer to “beginner’s mind”).

The site is EveryoneProgram.com and might still be rough when you look at it, but the content is basically there, including links to the tutorials.

While at Pycon I talked to various folks who are in education to try to see if there were other, better tutorials and ended up with the ones you see on the site. Of course it’s an experiment so we’ll adjust it depending on how things work and new discoveries.

One of my goals is to create new coaches from the people that attend the class. There are other folks in town who can help and take over if I’m traveling, as well.

Open Spaces

Although I attended more open spaces than the ones described here, these were the ones I found most engaging.

Inclusion

This was convened by a young woman who had a poor experience in a workshop. From the context, I don’t think it was one of the pre-conference tutorials; I think it had happened a while ago but it had left a mark and she needed to discuss it. Others in the session shared similar experiences.

I have not found these kinds of discussions at other conferences. One of the unique things about Pycon is that it (intentionally) has a very high percentage of women attendees and speakers. Guido’s stated goal is 50% and it seems like that number is fairly close in terms of speakers, and I might guess the women attendees could be as high as 35% or more. Minorities of various kinds are also becoming more represented. I wonder if one reason Python’s popularity keeps rising is its inclusiveness.

What Confuses You About Concurrency?

I held this session to gather more ideas to support the development of Concurrent Python. You can find my notes from this session here by scrolling down to that heading.

Improve Your Writing

This was requested directly to me by one person (who ended up not being able to make the session). I had actually submitted a session proposal for something similar, but it wasn’t accepted. We had about 10 people.

To keep it simple I gave three core ideas I think make the biggest impact on someone’s writing:

  1. Less is More. The more words you have, the more work your reader must do. Shorter sentences are easier to read and understand. Multiple editing passes are usually required to achieve short sentences.

  2. Active Voice. You can find some examples here, although these are rather basic. Perhaps we need another category of “super-active” or “direct” voice, as it’s usually possible to make your sentences even more immediate – although it takes even more work to get there. Active voice helps a lot with point 1, because active sentences are almost always shorter than passive sentences.

  3. Read It Aloud. This is a trick I learned from a friend many years ago. When you read silently, your mind tends to skip and skim. When you read every single word out loud, it forces you to scrutinize your prose. Problems jump out at you and you’re surprised you missed such glaring errors. It takes a lot of time (especially for book chapters) but it pays off.

I tried to keep the description of these points short and take questions. Afterward there was general Q&A and discussion.

Starting Teal Organizations

I never know what’s going to happen when I hold one of these sessions. Usually I get curious folks and it gives me practice in explaining the concepts, which you can find at Reinventing Business.

The turnout was larger than I expected, perhaps 8 folks, and we had a stimulating discussion. At this point I’m trying to discover the details, things not covered in the Reinventing Organizations book, such as bringing people into and out of the organization, payment, investing, and things like that. Frederic Laloux (the author) says these topics are different from one Teal organization to another and chooses not to cover them for that reason. But if you’re starting such an organization, you need those structures, even if you just use them as a starting point. I think I can make an important contribution to this field by discovering, collecting and passing on such information.

While I was in Portland I was able to attend a separate one-day open-spaces event around non-violent communication, and there I also held a “Starting Teal Organizations” session, which produced more good discussion.

Sprint days

Last year was my first experience sprinting (if you don’t include what was arguably the very first sprint, coached by Jim Fulton at Zope Corp in Virginia after one of the D.C. Pycons – but I can’t remember much from that one; I do know he tried to teach us Git). Last year I went to many different projects and contributed a little to each one, primarily either documentation or testing their onboarding process, but in a few cases I added code. I learned a great deal through this sampling process, but this year I ended up spending all three days (and wishing I had the fourth) on a single project: Beeware.

I’ve been experimenting with and using a small decorator framework for building command-line applications, much like (for example) Click although mine is much simpler. I have a friend or two who would like some automation, so I’ve been imagining that a command-line system would work for them.

Then a friend pointed out that most people have no idea what the shell is, and are unfamiliar with command-line applications. At Pycon it occurred to me that it might be possible to use the same decorator approach, but instead of producing a command-line option, the decorator would insert a menu item in a windowed program, and thus be easier and more familiar for the vast majority of users, while at the same time making the creation of such a program far easier. I went to the Beeware booth and asked the creator, Russell Keith-Magee (whom I had met and spent time with at PyCaribbean), if he thought this was possible and he said yes.
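The decorator idea can be illustrated with a very small registry. This is a hypothetical sketch, not the framework described above and not Toga's API: the command decorator, the COMMANDS dictionary, and the menu_entries helper are all invented here to show how one registration step could feed either a command-line front end or a menu.

```python
import sys

COMMANDS = {}  # maps a command name to the function that implements it


def command(func):
    """Register func under its own name and return it unchanged."""
    COMMANDS[func.__name__] = func
    return func


@command
def greet(name="world"):
    """Say hello."""
    return f"Hello, {name}!"


@command
def shout(text):
    """Upper-case the text."""
    return text.upper()


def run(argv):
    """Command-line front end: the first argument picks the command."""
    if not argv or argv[0] not in COMMANDS:
        return "commands: " + ", ".join(sorted(COMMANDS))
    return COMMANDS[argv[0]](*argv[1:])


def menu_entries():
    """What a windowed front end could read instead: (label, callable) pairs."""
    return [(name.capitalize(), func) for name, func in sorted(COMMANDS.items())]


if __name__ == "__main__":
    print(run(sys.argv[1:]))
```

The functions themselves never learn which front end invoked them; only the code that reads the registry changes.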

It turns out his answer was a bit … premature. When you go to the Beeware site, it can be hard to discover where to start. There’s a reason for that: this is a collection of projects, so you must first (A) know that and (B) decide which project fits your needs. And even then, some parts of the project are still more visionary than completely fleshed out, so you won’t always find a tutorial or example of what you’re looking for.

I quite like the vision of BeeWare: make Python work everywhere. I think it will get there. We even had some discussions about funding so Russell (and ideally other programmers) could work on it full time, and get it there faster. But be warned that at this writing the vision is only partly there.

The sub-project that met my needs is called Toga, with Docs here. The first thing I discovered was that there were menus for the Mac and I think Linux, but not for Windows. Thus, my effort was spent on learning the Toga architecture and figuring out how to add menus for Windows apps. This was tremendously educational but I’m still not close to creating my “decorators for menus.” However, I’d rather continue working on Beeware until it gets to the point where I can, rather than building something from scratch. This way I’ll produce results that work across platforms, by taking advantage of the Beeware architecture.

Virtual Environments Revisited

Setting up the virtual environment for this project was one of my learning experiences. When you install a package that you want to modify and test, it turns out you must make it editable using the -e flag, as in pip install -e. Otherwise your changes won’t be used, and only the original installation will ever be seen when you run your program. Without knowing this I experienced some frustration until Russell walked me through it. If I hadn’t been doing this in a sprint, I probably would have given up.

In the process, I ended up making multiple installations and building the virtual environment multiple times. I also realized that I sometimes hesitate to build a virtual environment because I must look up and re-learn the configuration commands, and running the activate script is always a bother. If only we could get the computer to do annoying, repetitive things for us! Here’s a Windows batch file that does the trick (anyone with basic shell skills can create the equivalent for Mac/Linux):

@echo off
rem venv.bat
rem "call" returns control to this script after the other batch file finishes.

rem Works if you're outside the starting directory:
if defined VIRTUAL_ENV (
  call deactivate.bat
) else (
  rem Only works inside starting directory, otherwise
  rem creates a new virtual environment:
  if exist virtualenv (
    call virtualenv\Scripts\activate.bat
  ) else (
    python -m venv virtualenv
    call virtualenv\Scripts\activate.bat
  )
)

I’m choosing to call my virtual environment directory virtualenv, and so can use that directory’s existence to indicate whether there’s an installed virtual environment.

You can place venv.bat somewhere in your Windows PATH and use it anytime you want to create and use virtual environments. Note that for it to work correctly, you must call it from the directory where the virtualenv subdirectory will be created or already exists.

activate.bat sets VIRTUAL_ENV and deactivate.bat clears it, so a defined VIRTUAL_ENV means that the virtual environment has been activated. This way, running venv just turns it on if it’s off, and off if it’s on, so that’s two fewer commands I need to remember.

If the virtual environment doesn’t exist, it is created and activated. But while working on Toga, I needed the special -e editable installations in that new environment, so I created a local venv.bat and added the following lines after the virtual environment creation and activation:

  cd src\core
  pip install -e .
  cd ..\winforms
  pip install -e .
  cd ..\..
  pip install -e .
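For readers not on Windows, the create-then-install-editable part can also be sketched in Python itself. This is only a sketch under assumptions: the script name make_env.py and the helper names are invented, and because a child process cannot change its parent shell's environment, the script cannot activate anything. It runs pip through the environment's own interpreter instead.

```python
import subprocess
import sys
import venv
from pathlib import Path


def env_python(env_dir: Path) -> Path:
    """Path of the interpreter inside the environment on this platform."""
    if sys.platform == "win32":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def ensure_env(env_dir: Path, with_pip: bool = True) -> bool:
    """Create the environment if it is missing; return True if it was created."""
    if (env_dir / "pyvenv.cfg").exists():
        return False
    venv.create(env_dir, with_pip=with_pip)
    return True


def install_editable(env_dir: Path, package_dirs) -> None:
    """Run `pip install -e` for each directory with the environment's own pip."""
    for package_dir in package_dirs:
        subprocess.run(
            [str(env_python(env_dir)), "-m", "pip", "install", "-e", str(package_dir)],
            check=True,
        )


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("usage: python make_env.py ENV_DIR [PACKAGE_DIR ...]")
    else:
        env = Path(sys.argv[1])
        if ensure_env(env):
            # Editable installs are only needed right after creation.
            install_editable(env, sys.argv[2:])
        print(f"interpreter: {env_python(env)}")
```

For the Toga layout described above, an invocation might look like python make_env.py virtualenv src/core src/winforms . (run from the project's top-level directory).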

Notes

The rest of the time I sought out experiences and interactions. And this is a place where the smart phone really has enhanced my life: whenever anyone said something interesting, I pulled out the phone and entered it in a list in Google Keep. I used to write notes and then set them aside and forget them, but entering them in Keep and knowing they will show up on all my other devices seems to really make the key difference.

Here are my notes, in no particular order:

The Concurrent Python Project

I’ve started working on a big, ambitious Python project: a book called Concurrent Python which assumes you know Python but that you don’t know anything about concurrency. I considered writing yet another introductory book, but realized there are already plenty of good ones and that I wouldn’t contribute much there. Concurrency, however, is a topic I’ve struggled with over the years and I know I could add some value to that discussion. It also presents opportunities for much more interesting training, conferences, speaking and consulting (these days I’m far more invested in discovering stimulating experiences, instead of just any experiences).

The first chapter has mostly been taken from the Concurrency chapter in On Java 8, which is now available in beta (as in “nearly finished”) form but still requires a little more work. The Atomic Kotlin book has also inserted itself in my schedule. So don’t expect a lot of progress right now, although I am thinking about it and watching for ideas, tools and libraries which I’m capturing in the 00_Notes.md file – feel free to make pull requests or add issues if you think something belongs there.

PyCaribbean Keynote On YouTube

I gave the closing keynote at PyCaribbean, the Python conference held in Puerto Rico. It’s called Science is What Works, and you can see it on YouTube. The slides are available here.

I think a closing keynote should provide perspective and look at things from a higher point of view. By then, the audience is full of technical information and, I believe, looking for relief rather than more code.

Even though I was a physics major as an undergraduate, I’ve begun to realize over the last ten years or so that I didn’t understand what science was, and that’s probably why I was only a mediocre physics student, at best. I thought that science was about “the truth,” so I had a very hard time when the teacher manipulated models, throwing away higher-order terms and arguing that a billiard-balls-and-springs model that produced a specific heat within an order of magnitude was “pretty darn good.”

Had I understood that science is just coming up with a model that fits the data, I wouldn’t have been stuck on ideas of the truth. That is not to say that science doesn’t disprove models, because it certainly does—indeed, that’s all you can know with any certainty, and falsifiability is a requirement for any theory. A model works until it doesn’t, and then you must either make adjustments or throw it away altogether and come up with a new one.

One way to think about science is that it is built on doubt, down to the point that we don’t even bother trying to believe whether our models are true or not. They just fit the data…so far. Whereas what came before the scientific revolution was belief without evidence, and doubt had to be crushed lest the whole operation topple.

Once you understand that science is just models representing parts of the world, you can start observing how those models are created. Some models are purely observational: cell theory in biology (“all organisms are made of cells”) required us to look at every living thing we could get our hands on, through a microscope. In physics, we’re fond of equations, but we have a limited set of approaches to use in order to formulate those equations, because our brains can only deal with so much complexity. The algorithms produced through machine learning have no such limitations, which may produce breakthroughs in science that we have up until now been unable to achieve.

In this presentation, I briefly look at a lot of different sciences and see how they work and when they don’t, and finally ask the question of whether computer science is actually a science (aspects of it certainly seem to be, while other parts are clearly not).

I did learn something important from seeing the video. I used one of Google’s standard slide formats (including color choices), and the video is just a single camera pointed at both me and the screen. This is certainly the easiest way to capture a presentation, and I can’t count the number of videos I’ve watched where a hired “professional” kept pointing the camera at the speaker while the speaker was describing code, and at the screen when nothing interesting was going on there. In the future I will keep the one-static-camera capture in mind and make sure that the font and background have enough contrast and, of course, that the text is large enough. In the past this wasn’t even an option because pointing a camera at a video projection would cause all kinds of interference; technology has improved to the point where this is no longer an issue.


Constructors Are Not Thread-Safe

When you imagine the construction process, it can be easy to think that it’s thread-safe. After all, no one can even see the new object before it finishes initialization, so how could there be contention over that object? Indeed, the Java Language Specification (JLS) confidently states:

“There is no practical need for a constructor to be synchronized, because it would lock the object under construction, which is normally not made available to other threads until all constructors for the object have completed their work.”

Unfortunately, object construction is as vulnerable to shared-memory concurrency problems as anything else. The mechanisms can be more subtle, however.

Consider the automatic creation of a unique identifier for each object using a static field. To test different implementations, we’ll start with an interface:

// HasID.java

public interface HasID {
  int getID();
}

Then implement that interface in an obvious way:

// StaticIDField.java

public class StaticIDField implements HasID {
  private static int counter = 0;
  private int id = counter++;
  public int getID() { return id; }
}

This is about as simple and innocuous a class as you can imagine. It doesn’t even have an explicit constructor to cause problems. To see what happens when we make multiple concurrent tasks that create these objects, here’s a test harness:

// IDChecker.java
import java.util.*;
import java.util.function.*;
import java.util.stream.*;
import java.util.concurrent.*;
import com.google.common.collect.Sets;

public class IDChecker {
  public static int SIZE = 100_000;
  static class MakeObjects
  implements Supplier<List<Integer>> {
    private Supplier<HasID> gen;
    public MakeObjects(Supplier<HasID> gen) {
      this.gen = gen;
    }
    @Override
    public List<Integer> get() {
      return
        Stream.generate(gen)
          .limit(SIZE)
          .map(HasID::getID)
          .collect(Collectors.toList());
    }
  }
  public static void test(Supplier<HasID> gen) {
    CompletableFuture<List<Integer>>
      groupA = CompletableFuture
        .supplyAsync(new MakeObjects(gen)),
      groupB = CompletableFuture
        .supplyAsync(new MakeObjects(gen));
    groupA.thenAcceptBoth(groupB, (a, b) -> {
      System.out.println(
        Sets.intersection(
          Sets.newHashSet(a),
          Sets.newHashSet(b)).size());
    }).join();
  }
}

The MakeObjects class is a Supplier with a get() that produces a List<Integer>. This List is generated by extracting the id from each HasID object. The test() method creates two parallel CompletableFutures that run MakeObjects suppliers, then takes the results of each and uses the Guava library Sets.intersection() to find out how many ids are common between the two List<Integer> (Guava is much faster than using retainAll()).
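
If Guava is not on your classpath, the same count can be produced with the JDK alone. The following is a minimal sketch, not part of the test harness above: CommonIDs and countCommon() are hypothetical names invented for this example, and it uses the retainAll() approach that the harness avoids for speed.

```java
// CommonIDs.java
import java.util.*;

class CommonIDs {
  // Counts the distinct values that appear in both lists.
  static int countCommon(List<Integer> a, List<Integer> b) {
    Set<Integer> common = new HashSet<>(a);
    // Keep only the elements that are also in b:
    common.retainAll(new HashSet<>(b));
    return common.size();
  }
  public static void main(String[] args) {
    System.out.println(countCommon(
      Arrays.asList(1, 2, 3, 4),
      Arrays.asList(3, 4, 5)));
  }
}
/* Output:
2
*/
```

Copying both lists into HashSets first means duplicates within a single list are counted only once, which matches what Sets.intersection() reports.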

Now we can test the StaticIDField:

// TestStaticIDField.java

public class TestStaticIDField {
  public static void main(String[] args) {
    IDChecker.test(StaticIDField::new);
  }
}
/* Output:
47643
*/

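That’s a rather large number of duplicates. Clearly, a plain static int is not safe to use for construction: counter++ is a separate read, increment, and write, so two threads can read the same value before either one writes it back. Let’s make it thread-safe using an AtomicInteger: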

// GuardedIDField.java
import java.util.concurrent.atomic.*;

public class GuardedIDField implements HasID {
  private static final AtomicInteger counter =
    new AtomicInteger();
  private int id = counter.getAndIncrement();
  public int getID() { return id; }
  public static void main(String[] args) {
    IDChecker.test(GuardedIDField::new);
  }
}
/* Output:
0
*/
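
What AtomicInteger adds is indivisibility: no other thread can slip in between reading the old value and storing the new one. As a rough illustration of that idea (and not a description of how the JDK actually implements it), the sketch below builds the same get-and-increment behavior from a compareAndSet() retry loop. CasCounter, next(), and distinctIDs() are hypothetical names invented for this example.

```java
// CasCounter.java
import java.util.*;
import java.util.concurrent.*;
import java.util.concurrent.atomic.*;

class CasCounter {
  private final AtomicInteger value = new AtomicInteger();
  // Hand-written stand-in for getAndIncrement(): retry until
  // no other thread changed the value between read and write.
  int next() {
    while (true) {
      int current = value.get();
      if (value.compareAndSet(current, current + 1))
        return current;
    }
  }
  // Runs two concurrent tasks that each take n ids, and
  // returns how many distinct ids were handed out in total.
  static int distinctIDs(int n) {
    CasCounter counter = new CasCounter();
    Set<Integer> seen = ConcurrentHashMap.newKeySet();
    Runnable task = () -> {
      for (int i = 0; i < n; i++)
        seen.add(counter.next());
    };
    CompletableFuture.allOf(
      CompletableFuture.runAsync(task),
      CompletableFuture.runAsync(task)).join();
    return seen.size();
  }
  public static void main(String[] args) {
    System.out.println(distinctIDs(100_000));
  }
}
/* Output:
200000
*/
```

Every call to next() returns a value that no other call has returned, so the two tasks together always produce exactly 2 * n distinct ids.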

Constructors have an even more subtle way to share state, through constructor arguments:

// SharedConstructorArgument.java
import java.util.concurrent.atomic.*;

interface SharedArg {
  int get();
}

class Unsafe implements SharedArg {
  private int i = 0;
  public int get() { return i++; }
}

class Safe implements SharedArg {
  private static final AtomicInteger counter =
    new AtomicInteger();
  public int get() {
    return counter.getAndIncrement();
  }
}

class SharedUser implements HasID {
  private final int id;
  public SharedUser(SharedArg sa) {
    id = sa.get();
  }
  @Override
  public int getID() { return id; }
}

public class SharedConstructorArgument {
  public static void main(String[] args) {
    Unsafe unsafe = new Unsafe();
    IDChecker.test(() -> new SharedUser(unsafe));
    Safe safe = new Safe();
    IDChecker.test(() -> new SharedUser(safe));
  }
}
/* Output:
47747
0
*/

Here, the SharedUser constructors share the same argument. Even though SharedUser is using its argument in a completely innocent and reasonable fashion, the way the constructor is called causes collisions. SharedUser cannot even know it is being used this way, much less control it!

The language does not support synchronized constructors, but it’s possible to create your own using a synchronized block. Although the JLS states that “… it would lock the object under construction”, a lock on the new object would not help here: each thread constructs a different object, so no two threads would ever compete for the same lock. To serialize construction, the lock must be shared by every constructor call, as the class object is. We can get this effect by creating our own static object and locking on that:

// SynchronizedConstructor.java

class SyncConstructor implements HasID {
  private final int id;
  private static Object constructorLock = new Object();
  public SyncConstructor(SharedArg sa) {
    synchronized(constructorLock) {
      id = sa.get();
    }
  }
  @Override
  public int getID() { return id; }
}

public class SynchronizedConstructor {
  public static void main(String[] args) {
    Unsafe unsafe = new Unsafe();
    IDChecker.test(() -> new SyncConstructor(unsafe));
  }
}
/* Output:
0
*/

The shared use of the Unsafe class is now safe.
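
The reason this works is that constructorLock is static, so every constructor call competes for the same monitor. A lock on each new instance would not have that effect, because every thread would be holding a different monitor. Here is a minimal sketch of the difference, using a one-second wait as a crude way to decide whether a thread is blocked. LockScope and otherGetsIn() are hypothetical names invented for this example.

```java
// LockScope.java

class LockScope {
  // True if a second thread can enter a block guarded by
  // `theirs` while the calling thread still holds `mine`.
  static boolean otherGetsIn(Object mine, Object theirs) {
    synchronized (mine) {
      Thread other = new Thread(() -> {
        synchronized (theirs) { /* entered */ }
      });
      other.start();
      try {
        other.join(1000); // Wait up to one second
      } catch (InterruptedException e) {
        throw new IllegalStateException(e);
      }
      // Still alive means it is stuck waiting for the lock:
      return !other.isAlive();
    }
  }
  public static void main(String[] args) {
    Object a = new Object(), b = new Object();
    System.out.println(otherGetsIn(a, b)); // Per-object locks
    System.out.println(otherGetsIn(a, a)); // One shared lock
  }
}
```

With two different lock objects the second thread should get in right away, so the first call is expected to report true. With a single shared lock the second thread cannot enter until the caller leaves its synchronized block, so the second call reports false after the one-second wait.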

An alternative approach is to make the constructors private (thus preventing inheritance) and provide a static Factory Method to produce new objects:

// SynchronizedFactory.java

class SyncFactory implements HasID {
  private final int id;
  private SyncFactory(SharedArg sa) {
    id = sa.get();
  }
  @Override
  public int getID() { return id; }
  public static synchronized
  SyncFactory factory(SharedArg sa) {
    return new SyncFactory(sa);
  }
}

public class SynchronizedFactory {
  public static void main(String[] args) {
    Unsafe unsafe = new Unsafe();
    IDChecker.test(() ->
      SyncFactory.factory(unsafe));
  }
}
/* Output:
0
*/

By synchronizing the static Factory Method you lock on the class object during construction.
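
You can check which lock a static synchronized method takes by asking Thread.holdsLock() from inside the method. This is a small sketch for verification only; FactoryLock and its two methods are hypothetical names invented for this example.

```java
// FactoryLock.java

class FactoryLock {
  // Inside a static synchronized method the calling thread
  // owns the monitor of the class object.
  static synchronized boolean insideSynchronized() {
    return Thread.holdsLock(FactoryLock.class);
  }
  // Without synchronized, no such lock is taken.
  static boolean insideUnsynchronized() {
    return Thread.holdsLock(FactoryLock.class);
  }
  public static void main(String[] args) {
    System.out.println(insideSynchronized());
    System.out.println(insideUnsynchronized());
  }
}
/* Output:
true
false
*/
```

Because the lock belongs to the class object, it is shared with any other static synchronized method in the same class and with any block that synchronizes on FactoryLock.class.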

These examples emphasize how insidiously difficult it is to detect and manage shared state in concurrent Java programs. Even if you take the “share nothing” strategy, it’s remarkably easy for accidental sharing to take place.
