October 15, 2010

Maven 3 and Plugin Mysteries

You probably know that Maven 3 has landed. Before testing it with our projects, I was curious about the plugins that are defined in the Maven master POM's pluginManagement section and hence are locked down with respect to their version. Since all projects inherit from this master POM, they will use the respective versions of those plugins unless explicitly overridden somewhere in the project's POM hierarchy.

Maven 3 is a bit more strict concerning automatic version resolution of invoked plugins. Unlike Maven 2, it will always use the latest release (i.e. non-SNAPSHOT) version of a plugin if no version was specified explicitly in the POM or on the command line. Moreover, it will issue a warning when missing plugin versions are detected "to encourage the addition of plugin versions to the POM or one of its parent POMs". This is to increase the reproducibility of builds.

Thus, in Maven 3 the desired build stability is ensured by urging the POM author to specify explicit plugin versions; it no longer relies on a full list of plugins (with versions) defined in the master POM. That's why I expected to find a small or even empty pluginManagement section. Well, let's see.

To find out what's in the pluginManagement section of the master POM, you just have to create a minimal POM and display the effective POM (which results from applying interpolation and inheritance, including the master POM and active profiles) by calling help:effective-pom for this simple project.

So, what do we get? The following list shows the plugin versions that are defined in the Maven 2.2.1 master POM, the Maven 3 master POM, as well as the most recent version of those plugins.

Well, we can see some interesting details here:

  • The number of plugins defined in the master POM's pluginManagement section is drastically smaller for Maven 3 than for Maven 2.2.1 – that's what we expected. However, there are still a few.

  • Which plugins are listed and which are not? It seems like the plugins for the most basic lifecycle phases (like clean, install, deploy) are predefined, but others (like compile or jar) are not. Is there any policy behind this?

  • What is really odd: for some of the predefined plugins, a newer version is available than the one listed in the Maven 3 master POM (colored red). Why could that be? I have not checked, but Maven 3 has been out for a few days now, so I suspect that for most of those plugins the newer version was already available before the release. Is that intentional? Are the new versions not considered "good" or "stable" by the Maven guys? Or did they just forget to upgrade? Or did they simply not consider it important?

  • Another thing I can't explain: when you look at the Maven 3 Project Plugin Management site, a lot more plugins are listed there, and some even have different versions than what we got by showing the effective POM for a minimal project POM. How can this be? I have no clue...

In a previous post, I have listed the plugins predefined by Maven 3.0-alpha5. Interestingly, there were a lot more of them (similar to Maven 2.2.1), but the "stale version" question was the same...

October 13, 2010

World of Versioning

Today, we had a discussion on how to name a hotfix release of our framework product, built with Maven (you knew I'm a fan of Maven, didn't you?). It's a very basic question, but still an interesting one and it opens a whole universe of ideas, opinions and rules...

The previous versions of our product had been named like this:

1.3.0, 1.3.1, ... 1.4.0, 1.4.1, ... 1.5.0, 1.5.1, 1.5.2, ... 1.5.6

They all are based on a release plan and contain bugfixes as well as improvements and new features. For each of those versions, we have written release notes and built a site.

Now, what do we do when there is the need to release a bugfix version of a regular release we built a few days ago? There are some options:

  1. 1.5.7 – i.e. increment the last number; however, this doesn't seem to fit well because the bugfix release has a different character than our standard releases
  2. 1.5.6.1 – i.e. add an additional numerical identifier
  3. 1.5.6.a – i.e. add an additional non-numerical identifier
  4. 1.5.6-patch1 – i.e. add a qualifier indicating that it's actually a patch release

When searching the Net for version number rules in the Maven world, you'll stumble upon the DefaultArtifactVersion class in the core of Maven which expects that version numbers will follow a specific format:

<MajorVersion [> . <MinorVersion [> . <IncrementalVersion ] ] [> - <BuildNumber | Qualifier ]>

Where MajorVersion, MinorVersion, IncrementalVersion and BuildNumber are all numeric and Qualifier is a string. If your version number does not match this format, then the entire version number is treated as being the Qualifier (see Versions Maven Plugin).
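
Just to see these rules in action, here is a small throwaway sketch (assuming the maven-artifact library is on the classpath; the class name VersionParsingDemo is made up) that parses the four candidate numbers from above and prints what Maven's DefaultArtifactVersion makes of them:

import org.apache.maven.artifact.versioning.DefaultArtifactVersion;

public class VersionParsingDemo {
    public static void main(String[] args) {
        // The four candidate hotfix version numbers from the list above
        String[] candidates = { "1.5.7", "1.5.6.1", "1.5.6.a", "1.5.6-patch1" };
        for (String candidate : candidates) {
            DefaultArtifactVersion version = new DefaultArtifactVersion(candidate);
            // According to the rules quoted above, "1.5.6.1" and "1.5.6.a" do not match
            // the expected format, so the whole string should end up in the qualifier.
            System.out.printf("%-13s -> major=%d, minor=%d, incremental=%d, buildNumber=%d, qualifier=%s%n",
                    candidate, version.getMajorVersion(), version.getMinorVersion(),
                    version.getIncrementalVersion(), version.getBuildNumber(), version.getQualifier());
        }
    }
}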

This means options 1 and 4 from above would be viable alternatives in the Maven world. However, note that there is some discussion about this Maven scheme: it suffers from inconsistent/unintuitive parsing, lexical sorting of qualifiers and some other flaws, which can lead to unexpected comparison results, especially when using Maven SNAPSHOT versions. The proposal given on that page seems to have been integrated into Maven 3.

Actually, we wouldn't be having this discussion if the third level were not called the incremental version in the Maven world, but rather the bugfix or patch version. There is a Semantic Versioning Specification (SemVer) that recommends this version scheme:

A normal version number MUST take the form X.Y.Z where X, Y, and Z are integers. X is the major version, Y is the minor version, and Z is the patch version. Each element MUST increase numerically. For instance: 1.9.0 < 1.10.0 < 1.11.0.

There are some rules describing when to increase which part. The main idea is that the first number (major version) indicates backwards-incompatible changes to the public API, while an increment of the last number (patch version) signals that only backwards-compatible bug fixes have been introduced.

This SemVer scheme is fully compatible with Maven (regardless of SNAPSHOT versions). If we had used it, we would probably have ended up with a "higher" version number like 5.4.0, but the upcoming patch would now simply get the version number 5.4.1, no discussion needed.

By the way, a lot of public recommendations for software versioning follow this <major>.<minor>.<patch> schema. See this question and Wikipedia for more information on Software Versioning.

So, what do we do now? We'll release a version 1.5.6-patch1 for the patch, but we'll think about changing our versioning to follow SemVer, i.e. bumping the major number when introducing incompatible changes and the minor number in most other cases.

September 27, 2010

Fix Foreign Code

Well, finally, I'm back! I have been busy working on-site for a customer of my company, helping to fix their project and increase quality in order to successfully conduct the rollout. Additionally, I spent my evenings working as a release manager and keeper of the Maven-based infrastructure for several projects developed in-house. So this was more than a full-time job, and unfortunately no time was left to read or write blog posts :-(

However, that project assignment is nearly over now, and I intend to write more regularly about my findings, trials and tribulations.

Foreign Code Dilemma

My main task when working for our customer was to fix bugs and improve the quality of their application, which was nearly completely implemented with respect to use cases and business requirements. This is a situation that is probably familiar to most developers: you are thrown into a project you don't know much about, lots of source code is already implemented, quality is, well, varying, and some important milestone or release date is right ahead. This is what I call the Foreign Code Dilemma.

What do you do to quickly get up to speed and rescue the project? Well, there are some things that I find quite useful in situations like this. In no particular order...

Introduce Continuous Integration

It should be common sense these days that Continuous Integration (CI) is able to improve software quality and reduce integration issues as well as overall risks. CI is a software development practice where changes are integrated frequently – usually at least daily – and the result is verified by an automated build and test to detect issues as quickly as possible. The distinguished article about Continuous Integration by Martin Fowler is a must-read.

Fortunately, the customer's project already provided automated Ant build scripts to checkout, build and test the software. Moreover, they were running on a Cruise Control server each night, so we were quite close.

The first thing I did was move to Hudson, the best integration server available today (if you ask me). The transition was quite smooth and done within a few hours, including setting up a brand new build server. If you're still using Cruise Control, you really should consider moving over to Hudson... I think I should post about the cool distribution feature of Hudson soon.

One issue with the project was the build time: a full build takes 3-4 hours, mainly due to long-running unit and Selenium test cases. Of course, this inhibits doing real CI. All we could do for now was to split the build into its four main tasks, creating a Hudson job for each of them: (1) checkout & compile & package, (2) static code checks, (3) unit tests, (4) Selenium tests. Since (1) and (2) run rather quickly (about 10 minutes), those jobs qualify for CI builds. This is not perfect, but still better than doing no CI at all.

Introduce Test Cases

Test cases are an essential part of a software development project these days, and I never consider a task finished unless there are test cases ensuring that the functionality is implemented correctly. I'm sure you agree ;-)

The project I was working on had lots of JUnit test cases, as well as hundreds of Selenium tests checking the web application in the browser. That's not bad, really. Nevertheless, there were two issues:

  • The number of test cases does not tell you anything about the test coverage. For example, the Selenium tests all exercised a "happy day" scenario, moving through the wizard pages of the web application straight from the first page to the last. But does it still work, for instance, if you step to the fourth page, choose some options there, step back two pages, change an option, and go to the fourth page again? Nobody had tested that.

  • Selenium web tests are slow, which is no surprise given that the tests are running in a browser and need to connect to the deployed web application. In my project, the full test suite took more than 3 hours to run... What's even worse is that some of the JUnit tests had not been designed as unit tests, i.e. they required a full service stack to run successfully, making them integration rather than unit tests. As a consequence, those tests require starting up all services, which takes a lot of time.

Thus, the task for this project was actually not to introduce unit tests, but to improve them: increase code coverage and separate unit tests from integration tests. This way, the unit tests can be run within CI builds, providing quick feedback on the quality of committed code.
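
One possible way to draw that line in code (a sketch only; JUnit 4.8 categories are just one option, and the class and category names here are made up) is to mark the slow, stack-dependent tests so the CI build can skip them:

import org.junit.Test;
import org.junit.experimental.categories.Category;

// Marker interface for tests that need the full service stack (made-up name)
interface IntegrationTests { }

public class OrderServiceTest {

    @Test
    public void calculatesTotalForEmptyCart() {
        // fast, in-memory unit test: suitable for every CI build
    }

    @Category(IntegrationTests.class)
    @Test
    public void persistsOrderToDatabase() {
        // slow test requiring the deployed services: run it in the nightly build only
    }
}

The CI job would then run a suite that excludes this category (or you simply rely on a naming convention such as *IntegrationTest with a separate Surefire execution), while the nightly build runs everything.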

Introduce Code Metrics

When more than a few people are working on a project, establishing a coding standard is usually a rewarding idea. It helps you feel comfortable with the sources of anybody else on your team, and when comparing code you are not constantly confronted with differences that are merely caused by reformatting and that hide the significant changes.

If you have defined such coding standards, you need to check them. Checkstyle is the tool of choice. Here is what you should do:

  • Define a Checkstyle configuration to be used for your project. Discuss the rules with developers and stakeholders.
  • Run Checkstyle with your CI and/or nightly builds to create a report, including a list of violations for defined rules.
  • Establish Checkstyle within your IDE of choice to provide immediate feedback to the developers before they commit.
  • Define which exceptions to the rule are acceptable (should not be more than a dozen or so) and suppress them permanently, using Checkstyle suppression filters.
  • Get rid of all remaining violations, which might take a few days of effort. Still, this investment will pay off.
  • Once the number of Checkstyle violations is "small" (meaning less than 10, ideally zero), make sure it remains small.
  • Establish a team culture where committing code with Checkstyle violations is anything but cool.

That works quite well in my experience. For the mentioned project, we already had common Eclipse formatting settings, but Checkstyle helped to further improve the code and people adopted it right from the start.

The Debugger is Your Best Friend

When you have to fix bugs in code you have never seen before, use the debugger as much as possible. To find the hot spot, you usually don't have to read or understand the whole class or even hierarchies of classes. Thus, it'll save you a lot of time when you don't start with code reviews but use the debugger to find the piece of code to blame.

BTW, the same applies to the look and feel of web applications. Instead of consulting lots of layout code and stylesheets, use browser tools like Firebug to debug pages, styles and JavaScript code (including Ajax requests) right in the displayed page.

Of course, this approach is not appropriate when fixing larger design issues...

Don't Be Shy!

When using this toolset, you shouldn't be shy. If you think some code needs refactoring, do so – maybe not a week before going live, but you get the point. The CI build will give you immediate feedback on whether the change could be integrated, and the tests will tell you if everything still works. Take your chance to improve the code. If your change does cause an issue, fix it, add another test and don't be discouraged!

April 19, 2010

HDD / SSD Battle

The Problem

You know, the laptop I'm using for my daily work is not the fastest one. Quite the contrary: it's more than 5 years old and pretty slow. Yeah, I know, hardware can never be fast enough, but it's really slow considering the things I have to work on.

For instance, we are using xtext modeling and hence usually have a couple of Eclipse instances running at the same time (outer & inner workbench), in addition to using m2eclipse to build the projects with Maven in Eclipse. Moreover, we have some quite big workspaces with tens of thousands of class files.

All of this is probably not unusual, but unfortunately too much for my poor old laptop. It takes minutes to start or exit Eclipse, not to mention the time required for cleaning all projects. However, my company currently does not really like the idea of buying new laptops, so we have to find ways to speed things up without spending too much money. I have blogged before about some ways of speeding up your system.

The Solution?

It's pretty clear that the hard drive is currently the bottleneck. We have verified this with some inspection tools: the drive is working hard all the time when executing a build, for instance. Now we managed to get a solid state drive (SSD) to test the performance improvements it would offer. Well, fasten your seatbelt...

We have measured some typical tasks with real data and projects on a developer's laptop – first with the built-in hard disk, then after installing the SSD and copying the hard drive content over. Note that we have tried to make a fair comparison, keeping the setup identical in both scenarios. These are the results.

The Battle

Working With Eclipse:

  • Start Eclipse 3.5.1 with an empty workspace until Welcome screen is displayed: 52 s → 12 s (factor 4.3)
  • Start Eclipse with a medium-size workspace: 125 s → 30 s (factor 4.2)
  • Clean all projects in that workspace: 445 s → 115 s (factor 3.9)
  • Exit Eclipse and wait until workspace is saved: 28 s → 7 s (factor 4.0)

Working With Maven:

  • Maven "clean install" in a medium-size project: 668 s → 336 s (factor 2.0)

Booting Windows:

  • Turn on computer and wait for login screen: 62 s → 33 s (factor 1.9)
  • After login, until Windows is ready (autostart applications are loaded): 135 s → 44 s (factor 3.1)

The Bottom Line

As you can see, the SSD speeds up boot time by a factor of 2-3, which is already impressive. The Maven build usually runs about 2 times faster. The Eclipse speed-up is even bigger, around factor 4. That's pretty cool! You really feel the performance difference!

Additionally, after some more weeks of testing, what we like most is that the whole system feels much more responsive; that is, when executing some big job like rebuilding a huge workspace, you can switch context and work comfortably in another Eclipse instance – a single task no longer blocks the whole system.

All in all, that's an incredible speed-up considering the price of an SSD! Now, go and tell your boss ;-)

March 26, 2010

Having Fun with Encoding!

Compiler Plugin

Recently, I edited our main company root POM to upgrade some plugins to new versions. Of course, we are following the best practice of locking down the plugin versions, so when a new version is available we only need to adjust the parent POM. Nearly all version updates were on the last digit, the z in the x.y.z version string – so I didn't expect many difficulties.

However, for the compiler plugin it was a jump from version 2.0.2 to 2.1, and indeed it turned out that some of the test cases failed to compile with strange encoding issues when using the new compiler plugin version.

Specify Encoding

We are following the suggestion to specify a POM property for the source file encoding, so as not to be forced to configure the encoding for all relevant plugins individually. Moreover, we were using exactly what's shown in the example:

<project>
  ...
  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    ...
  </properties>
  ...
</project>

That is, we assumed our source files were UTF-8 encoded, which is the most widely used encoding for Unicode characters. But for some of the projects that's actually not the case, since we are using Eclipse with the default setting for text file encoding, which is Cp1252 (Western European) on our German Windows.

Why didn't we ever notice that? Well, it happens that both UTF-8 and Cp1252 are backwards compatible with ASCII. We are coding most of the stuff in English (package, class, method, attribute and parameter names, and even Javadoc comments), so the resulting byte stream is identical for both encodings. However, some of the files used German umlauts in line comments, and those are exactly the files that can no longer be compiled with the new compiler plugin version.
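
A quick throwaway snippet (class name and sample strings are made up) illustrates why the mismatch stayed invisible for pure ASCII sources and only bites once an umlaut shows up:

import java.nio.charset.Charset;
import java.util.Arrays;

public class EncodingDemo {
    public static void main(String[] args) {
        Charset utf8 = Charset.forName("UTF-8");
        Charset cp1252 = Charset.forName("windows-1252");

        String ascii = "releaseNotes";        // plain ASCII, as in most of our identifiers
        String umlaut = "Pr\u00fcfung";       // contains 'ü', as in some German line comments

        // identical byte streams for ASCII-only text...
        System.out.println(Arrays.equals(ascii.getBytes(utf8), ascii.getBytes(cp1252)));   // true
        // ...but different bytes as soon as an umlaut is involved
        System.out.println(Arrays.equals(umlaut.getBytes(utf8), umlaut.getBytes(cp1252))); // false
    }
}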

When looking at the debug output of the compiler plugin 2.0.2 mojo configuration, you can see that the encoding is not explicitly set, which probably means that the platform default encoding is used (again Cp1252 on all our build machines):

[DEBUG] Configuring mojo 'org.apache.maven.plugins:maven-compiler-plugin:2.0.2:compile' -->
[DEBUG] (f) basedir = ...
[DEBUG] (f) buildDirectory = ...
[DEBUG] (f) classpathElements = [...]
[DEBUG] (f) compileSourceRoots = [...]
[DEBUG] (f) compilerId = javac
[DEBUG] (f) debug = true
[DEBUG] (f) failOnError = true
[DEBUG] (f) fork = false
[DEBUG] (f) optimize = true
[DEBUG] (f) outputDirectory = ...
[DEBUG] (f) outputFileName = xxx-0.2.0-SNAPSHOT
[DEBUG] (f) projectArtifact = xxx:jar:0.2.0-SNAPSHOT
[DEBUG] (f) showDeprecation = false
[DEBUG] (f) showWarnings = false
[DEBUG] (f) source = 1.6
[DEBUG] (f) staleMillis = 0
[DEBUG] (f) target = 1.6
[DEBUG] (f) verbose = false
[DEBUG] -- end configuration --

The new version 2.1 of the compiler plugin now honors what has been configured in the project.build.sourceEncoding property, and hence tries to compile the Cp1252-encoded source files as UTF-8, which doesn't work when umlauts are used.

Specify Correct Encoding

Of course, the solution is to specify the correct encoding in the project.build.sourceEncoding property, matching the encoding that is actually used in the development environment when writing the source files.

Oh, yes, Cp1252 is quite similar to the ISO 8859-1 encoding (only some special characters at positions 0x80–0x9F differ, and we don't use those), so in fact we are using ISO 8859-1 now to allow builds on non-Windows platforms as well.
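
If you're curious where Cp1252 and ISO 8859-1 actually differ, a tiny sketch (character choice is mine) shows it: a plain umlaut encodes to the same byte in both, while the Euro sign, which sits at 0x80 in Cp1252, has no mapping in ISO 8859-1 at all:

import java.nio.charset.Charset;

public class Cp1252VsLatin1 {
    public static void main(String[] args) {
        Charset cp1252 = Charset.forName("windows-1252");
        Charset latin1 = Charset.forName("ISO-8859-1");

        String umlaut = "\u00e4";   // 'ä' - outside the 0x80-0x9F range, same byte in both charsets
        String euro = "\u20ac";     // '€' - 0x80 in Cp1252, unmappable in ISO 8859-1 (becomes '?')

        System.out.printf("ae-umlaut: cp1252=%02X latin1=%02X%n",
                umlaut.getBytes(cp1252)[0], umlaut.getBytes(latin1)[0]);
        System.out.printf("euro sign: cp1252=%02X latin1=%02X%n",
                euro.getBytes(cp1252)[0], euro.getBytes(latin1)[0]);
    }
}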

Certainly, it would be nice if the plugins had a change history on their site where you could find this type of change for new versions, without having to search Jira...

February 26, 2010

Eclipse: Update Manager Needs Update!

Eclipse Update Manager is really a special piece of software... I have blogged before about my battle, and here is another one.

We have a simple Eclipse plugin (created by xtext to provide an editor for our DSL, but actually this doesn't matter). I have a particular version (let's say 1.0.0) of that installed in my Eclipse 3.5.1. Now I want to upgrade to 1.1.0, but unfortunately the feature id has changed, so I need to uninstall my 1.0.0 version prior to installing the new one.

But... when I try to uninstall this plugin, Eclipse tells me that it is "Calculating requirements and dependencies". To do so, Eclipse downloads a lot of stuff, including Eclipse features, mylyn, and much more. Seems to be half the internet, which takes a while. And then, about 15 minutes later, Eclipse tells me that it could not find a download site for some weird Mozilla plugin.

Hello? What's that? I want to uninstall a plugin and Eclipse downloads tons of jars only to tell me that one is missing and it couldn't uninstall? Gosh!

After some googling, I found a trick to force Eclipse to just do what I want:

  1. In Eclipse Preferences, on Install/Update > Available Software Sites page, export all sites to your filesystem.
  2. Then remove all update sites and press OK.
  3. Now uninstall the plugin -- for me, it just worked like a charm.
  4. After restarting Eclipse, open Install/Update > Available Software Sites again and import the previously exported update sites.

That's it. Maybe just pulling the network cable would have worked, too... Oh boy.

February 20, 2010

Maven vs. Ant: Stop the Battle

Maven? Ant?

Oh boy, how this bothers me. The endless debate and religious battle about which build tool is the better build tool, no, is the one and only right build tool...

There are many people out there who love Ant, who defend Ant with their blood and honour. That's fine, but some of them shoot at Maven at the same time. There is so much ranting about Maven, so many unfair allegations and just plain wrong claims. This is just one example that has been discussed in the community lately.

Don't get me wrong. Maven has its flaws and issues, sure, and you don't have to like it. Use Ant, or Gradle, or Buildr, or Schmant, or batch files, or anything else if you like that more. But, Maven definitely can be used to build complex software projects, and lots of people are doing exactly that; and guess what -- some of them even like this tool... So, can everybody just please use what he or she likes the most for building their software, and stop throwing mud at each other? Let's get back to work. Let's put our effort in building good software.

We've Come a Long Way...

You may have guessed it: I think Maven is the best build tool, at least for the type of projects I am dealing with in my company. We started out with a complex system of mutually calling batch files a long time ago and switched to Ant in 2000. That was a huge step ahead, but it was still a complex system with lots of Ant scripts on different levels. So we moved to Maven 1.0.2 in 2004 for another project. That brought nice configuration and reporting features, but still did not feel right, especially for multi-module projects, which were not supported in the Maven core at that time.

When Maven 2 came out, we adopted it early and suffered from many teething troubles, but nevertheless we were sure to be on the right track. Today, Maven is a mature, stable, convenient build tool for all our projects, and for the first time we are quite happy with how it works and what it provides. Moreover, what the brave guys from Sonatype have in their pipeline sounds really great: Maven 3, Tycho, and all those nice tools like Nexus and m2eclipse...

Hence, I am happy and honestly don't care very much about what the blogosphere says about Maven. But the sad thing is, my colleagues (mostly used to Ant build systems) complain with the same weird theses about Maven. I'll give you one example.

The Inhouse Battle

In my current project, we create EJBs in some JARs and assemble an EAR file for the whole application. Now we have to create another RAR to be put into the EAR, so I set up a new project (following Maven's convention "one project, one artifact") for the RAR. This is what the "Ant guys" didn't like: "Why can't Maven create that RAR within the main project, you know Ant could do that, so maybe we should use Ant here again, why have so many small projects, this is polluting our Eclipse Project View, so much complexity, Maven sucks, I knew that before, blah blah..."

Well, I tried to explain that Maven of course can be configured to create multiple artifacts per project, but that's not the recommended way because it violates Maven convention. It's all about modularity and standardization. That is how Maven works, and it's great this way. A small project is not much overhead at all, it is going to have a clean and simple POM, and by the way we discovered a dependency cycle in the code that had to be fixed in order to move the RAR code into a separate module.

So, what's wrong with Maven? Is it just that you want to do it your way and not submit to the Maven way? A matter of honor and ego? Is that enough to kick out Maven and go back to your Ant and script based build system (which BTW is so complex that only a few guys really know how it works)? Come on.

The Bottom Line

IMHO, standardization of build systems is one of the main benefits that Maven brought to the world. If you know one Maven project, you can switch to any other project built with Maven and feel comfortable immediately. This increases productivity, both personally and for your company, which is one of the reasons more and more companies switch over from Ant to Maven.
We have clean conventions, a nice project structure, and a highly modular system. And, we have world class reporting with minimal effort.

You see, that's why we are using Maven. If you don't like it, go your own way but let us just do our job.

February 8, 2010

@Override Changes in Java 6

Today I have ported a Java 6 project back to Java 5. This led to compiler failures in Eclipse, but not in Maven which seemed quite strange at first glance. Interestingly, they are caused by the @Override annotation.

The Java 5 API for @Override says:

Indicates that a method declaration is intended to override a method declaration in a superclass. If a method is annotated with this annotation type but does not override a superclass method, compilers are required to generate an error message.

Note that it says "superclass", not "supertype". Hence, it's not allowed to add this annotation to methods that implement methods of an interface. Javac (which is called by Maven) does not report this as an error, but the Eclipse compiler does.

Well, if you take a look at Java 6, the API documentation didn't change at all, so I was surprised to see a different behavior: in javac, the @Override annotation is now allowed for methods implementing interface methods, too. In the end, I had to remove those annotations to make the code compile with Java 5 in Eclipse.
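
Here is a minimal example of the kind of code that caused the trouble (interface and class names invented for illustration): it compiles fine with the Java 6 javac that Maven invokes, while an Eclipse compiler set to Java 5 compliance flags the annotation as an error.

interface Greeter {
    String greet(String name);
}

class EnglishGreeter implements Greeter {
    @Override   // implements an interface method only: rejected under Java 5 rules, accepted by Java 6 javac
    public String greet(String name) {
        return "Hello, " + name + "!";
    }
}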

After some googling, I found out that this was simply forgotten by the Sun developers: the compiler's behavior was changed, but the documentation does not reflect that (see here). And indeed, the API documentation of @Override in the upcoming Java 7 reads:

Indicates that a method declaration is intended to override a method declaration in a supertype. If a method is annotated with this annotation type compilers are required to generate an error message unless at least one of the following conditions hold:
  • The method does override or implement a method declared in a supertype.
  • The method has a signature that is override-equivalent to that of any public method declared in Object.

There you have it: @Override may now be used for interface methods, too.

February 2, 2010

Optimization: Don't do it... The compiler will!

The Two Rules of Program Optimization

I've seen some bad code lately which was designed in an effort to improve performance. For instance, there was a long method (80 lines) that was not split into several methods for a single reason: to avoid the method call overhead (around 15 nanoseconds!). The result was code that was just hard to read.

This reminded me of the rules of program optimization (coined by Michael A. Jackson, a British computer scientist) we were taught back at university:
The First Rule of Program Optimization: Don't do it.
The Second Rule of Program Optimization (for experts only!): Don't do it yet.

Well, this is true for mainly two reasons:

  1. Optimization can reduce readability and add code that is used only to improve the performance. This may complicate programs or systems, making them harder to maintain and debug.
  2. Doing optimizations usually means we think we are smarter than the compiler, which is just plain wrong more often than not.

Cleaner Code

Donald Knuth said: "Premature optimization is the root of all evil." Here, "premature optimization" means that a programmer lets performance considerations drive the design of his code. This can result in a design that is not as clean as it could have been, because the code is complicated by the optimization and the programmer is distracted by optimizing.

Therefore, if performance tests reveal that optimization or performance tuning really has to be done, it should usually be done at the end of the development stage.

Wrong Intuitions

This is what Sun Microsystems' Technology Evangelist Brian Goetz thinks: "Most performance problems these days are consequences of architecture, not coding – making too many database calls or serializing everything to XML back and forth a million times. These processes are usually going on outside the code you wrote and look at every day, but they are really the source of performance problems. So if you just go by what you're familiar with, you're on the wrong track. This is a mistake that developers have always been subject to, and the more complex the application, the more it depends on code you didn't write. Hence, the more likely it is that the problem is outside of your code." Right he is!

Smarter Compiler

Often, the best way to write fast code in Java applications is to write dumb code – code that is straightforward, clean, and follows the most obvious object-oriented principles in order to get the best compiler optimization. Compilers are big pattern-matching engines, written by humans who have schedules and time budgets, so they focus their efforts on the most common code patterns, in order to get the most leverage. Usually hacked-up, bit-banging code that looks really clever will get poorer results because the compiler can't optimize effectively.

A good example is string concatenation in Java (see this conversation with Java Champion Heinz Kabutz where he gives some measurements)...

  1. Back in the early days, we all used the String addition (+ operator) to concatenate Strings:
    return s1 + s2 + s3;
    However, since Strings are immutable, the compiled code will create many temporary String objects, which can strain the garbage collector.
  2. That's why we were told to use StringBuffer instead:
    return new StringBuffer().append(s1).append(s2).append(s3).toString();
    That was around 3-5 times faster in those days, but the code became less readable. Was it worth it? Is your code doing enough String concatenation that you would really feel a difference after (for instance) making it execute three times faster?
  3. Is that still the recommended way? A main downside of StringBuffer is its thread safety, which is usually not required (string buffers are rarely shared between threads) but slows things down. Hence, the StringBuilder class was introduced in Java 5; it is almost the same as StringBuffer, except that it's not thread-safe. So, using StringBuilder is expected to be significantly faster, and you know what? When Strings are added using the + operator, the compiler in Java 5 and 6 will automatically use StringBuilder:
    return s1 + s2 + s3;
    Clean, easy to understand, and quick (see the sketch right after this list). Note that this optimization will not occur if StringBuffer is hard-coded!
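
For illustration, here is roughly what the compiler makes of the simple concatenation above (a sketch of the equivalent source, not the actual bytecode; class and method names are mine):

public class ConcatDemo {

    // What you write: clean and obvious.
    static String concat(String s1, String s2, String s3) {
        return s1 + s2 + s3;
    }

    // Roughly what the Java 5/6 compiler generates for the method above:
    // one StringBuilder and a chain of append() calls.
    static String concatCompiled(String s1, String s2, String s3) {
        return new StringBuilder().append(s1).append(s2).append(s3).toString();
    }
}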

That was just one example... All in all, it's quite simple: today's Java JIT compilers are highly optimized and clever at optimizing your code. Trust them. Don't try to be even more clever. You aren't!

January 13, 2010

Concurrent Builds with Hudson

Multiple Build Executors

We are using the Hudson Continuous Integration Server for our integration builds and are quite happy with it. It is fast, stable, feature-rich, extensible, well integrated with Maven and has an appealing user interface.

One of the nice features that we are using regularly is the Build Executor setting that allows you to specify the number of simultaneous builds. This is useful to increase throughput of Hudson on multi-core processor systems, where the number of executors should (at least) match the number of available cores.

However, Maven isn't really designed for running multiple instances simultaneously, since the local repository isn't multi-process safe. The chance of conflicts seems small (multiple processes must access the same dependency at the same time, at least one of them writing). However, in practice we now encounter this type of concurrency issue at least once a day, which is starting to hurt us! The build fails with a message like this:

[INFO] ------------------------------------------------------------------------
[ERROR] BUILD ERROR
[INFO] ------------------------------------------------------------------------
[INFO] Failed to resolve artifact.

GET request of: some/group/some-artifact-1.2.3-SNAPSHOT.jar from my-repo failed
some.group:some-artifact:jar:1.2.3-SNAPSHOT
...

Caused by I/O exception: ...some-artifact-1.2.3-SNAPSHOT.jar.tmp (The requested operation cannot be performed on a file with a user-mapped section open)

or this:

[INFO] ------------------------------------------------------------------------
[ERROR] BUILD ERROR
[INFO] ------------------------------------------------------------------------
[INFO] Failed to resolve artifact.

Error copying temporary file to the final destination: Failed to copy full contents from ...some-artifact-1.2.3-SNAPSHOT.jar.tmp to ...\some-artifact-1.2.3-SNAPSHOT.jar

The reason: the JAR file is locked by another process that is, for instance, executing some long-running test cases. At the same time, a second build tries to download a new version of this snapshot into the local repository, which is done with the help of the .tmp file mentioned in the error message.

Safe Maven Repository

The only way to avoid this type of issue is to use separate local Maven Repositories for each of the processes. You can tell Maven to use a custom local repository location by specifying the localRepository setting in your settings.xml file.

In Hudson, this is even more convenient. There is a checkbox Use private Maven repository in the advanced part of the Build section of Maven projects. Just check it to set up a private local Maven repository for that project. You should consider doing so if you run into the described issue now and then.

Obviously, using private repositories will increase the total amount of disk space used, because the same dependencies are cached in multiple places. Additionally, the first build will take significantly longer because everything has to be downloaded once. However, both consequences are quite acceptable given the better stability and isolation of projects.

Instead of clicking the Hudson checkbox for all your projects, you should consider setting up the local Maven repo in your settings.xml. This has a number of advantages:

  • You don't have to setup the option for each and every project, but have it in a central place.
  • You can use a common root for all local Maven repos, like d:/maven-repo. This allows you to easily purge all your local repositories from time to time, in order to reduce disk space as well as validate the content (i.e. make sure the build is still running in a clean environment and all required artifacts are in your corporate Maven repository).

For instance, here is what works fine for us:

<localRepository>d:/builds/.m2/${env.JOB_NAME}/repository</localRepository>

This is using a Hudson environment variable (JOB_NAME) to create subfolders for the actual projects aka jobs. See here for a list of available variables.

Oh yes, what I suggest is also encouraged by Brian Fox in his Maven Continuous Integration Best Practices blog post, so you should seriously consider adopting this best practice :o)

January 2, 2010

Cargo Maven Plugin: Not Made for JBoss

Again...

Well, actually this blog was supposed to be about Java in general and all the ups and downs I experience during my daily work. However, I've not been doing much other than Maven configuration and build management lately, so here is another Maven related post. Sorry folks.

As already shown in this post, I have been doing integration tests with JBoss, using the Cargo Maven plugin to start JBoss locally and deploy the application to it. This all works quite well as soon as you have figured out how to configure Cargo for JBoss.

But Remotely Now!

Now, the next step is to deploy our EAR file, which is generated during the nightly build, to a running JBoss instance on a separate machine. This is different because no JBoss configuration has to be created locally and no JBoss has to be started. Instead, the EAR file must be transferred to a remote server where JBoss is already running, and JBoss must be persuaded to deploy this file.

That sounds feasible, and I've done exactly this before for other servers like Tomcat, so I did not expect any issue here. However, I was wrong.

Itch #1

The first trouble was caused by my lack of knowledge regarding JBoss. With the standard installation, you are not able to connect to the server remotely, and all the services are bound to localhost only (see here or here). This is intentional, to prevent unprotected installations from appearing all over the net. You have to pass the option -b 0.0.0.0 when starting JBoss to allow remote connections to the services, but take care to secure your JBoss accordingly!

Itch #2

Okay, after this had been configured, I tried to use Cargo to deploy my EAR file to JBoss. This is the configuration I ended up with:

<!-- *** Cargo plugin: deploy the application to running JBoss *** -->
<plugin>
  <groupId>org.codehaus.cargo</groupId>
  <artifactId>cargo-maven2-plugin</artifactId>
  <version>1.0</version>
  <configuration>
    <wait>false</wait>
    <!-- Container configuration -->
    <container>
      <containerId>jboss5x</containerId>
      <type>remote</type>
    </container>
    <!-- Configuration to use with the Container -->
    <configuration>
      <type>runtime</type>
      <properties>
        <cargo.hostname>...</cargo.hostname>
        <cargo.servlet.port>8080</cargo.servlet.port>
      </properties>
    </configuration>
    <!-- Deployer configuration -->
    <deployer>
      <type>remote</type>
      <deployables>
        <deployable>
          <location>...</location>
        </deployable>
      </deployables>
    </deployer>
  </configuration>

  <executions>
    <execution>
      <id>deploy</id>
      <phase>deploy</phase>
      <goals>
        <goal>deployer-redeploy</goal>
      </goals>
    </execution>
  </executions>
</plugin>

However, I always got this error message:

[INFO] Failed to deploy to [http://...]
Server returned HTTP response code: 500 for URL: ...

The configuration seems to be correct, so what is the problem?

After asking Google, I realized that Cargo is not able to transfer a file to JBoss! Instead, it requires the deployable to already be present on the server's filesystem (see here). This is obviously caused by the JBoss JMX deployer used by Cargo, but actually you don't care who is to blame – you just want it to work. The name "Cargo" implies the parcel is transferred to its destination, right? Also note that this issue dates from September 2006, so there has been plenty of time to fix it one way or the other.

What Can We Do?

Well, there are probably not many options. Since the current version of Cargo is not able to transfer the file to the server, you have to do this on your own. The location given in the Cargo configuration above is actually the path on the JBoss server. So, once the file exists locally on the JBoss server, Cargo should be able to deploy it successfully.

For transferring the file to the JBoss server, we can use the maven-dependency-plugin, a quite useful plugin for all kinds of analyzing, copying and unpacking of artifacts. We configure it to run in the install phase (see the update at the end of this post) and to copy the EAR file (produced by this POM) to some temp directory on the JBoss server:

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-dependency-plugin</artifactId>
  <executions>
    <execution>
      <id>copy</id>
      <phase>install</phase>
      <goals>
        <goal>copy</goal>
      </goals>
      <configuration>
        <artifactItems>
          <artifactItem>
            <groupId>${project.groupId}</groupId>
            <artifactId>${project.artifactId}</artifactId>
            <version>${project.version}</version>
            <type>${project.packaging}</type>
            <destFileName>test.ear</destFileName>
          </artifactItem>
        </artifactItems>
        <outputDirectory>${publish.tempdir}</outputDirectory>
        <overWrite>true</overWrite>
      </configuration>
    </execution>
  </executions>
</plugin>

The property ${publish.tempdir} can point to any directory on the JBoss server (which must be reachable over the network!) and is exactly what has to be used as the value of the location element in the Cargo configuration.

Another option would be to use the hot-deploy directory of JBoss as the outputDirectory for the dependency plugin, and thus rely on JBoss hot deployment instead of Cargo and the JBoss JMX deployer. This way, we could get rid of the Cargo configuration and clean up the POM a bit, but in the end it seemed a bit less clean to me... your mileage may vary.

So, as always, in the end we got it to work, but not without unforeseen pain. When will Cargo be fixed to get the EAR file to JBoss server? Who knows.

Updates

2010/01/22: Note that the dependency plugin must not be bound to a phase earlier than install, so that the artifact has been copied at least to your local Maven repository. As a consequence, the Cargo plugin must be run in the deploy phase, which is actually a good choice anyway. I have changed this in the code above.