March 27, 2009

Maven Repositories: define in POM or settings?

If you are using Maven for more than just playing around, you certainly have a repository manager installed to both proxy artifacts downloaded from public repositories and host your own artifacts, making them available to other team members and teams. (If you really don't have one yet, consider using Nexus, an open source Maven repository manager created by Sonatype.)

Well, so you use a repository manager. Now you need to tell Maven to use it to download all missing artifacts. Moreover, as an organization, you usually want to control where artifacts are downloaded from. This means you need to make sure that all developers use the identical set of repositories for all projects.

There are two places where you can configure your repositories: in the project's POM, or in the settings.xml file. This post will discuss both ways and tell you which one you should use ;-)

The POM


The "innocent way" is to add a definition like this to your POM:

<repositories>
  <repository>
    <id>internal</id>
    <name>Company internal repo</name>
    <url>http://your.company.org/nexus/content/groups/public</url>
    <releases>
      <enabled>true</enabled>
      <updatePolicy>always</updatePolicy>
      <checksumPolicy>warn</checksumPolicy>
    </releases>
    <snapshots>
      <enabled>false</enabled>
    </snapshots>
  </repository>
</repositories>
Maven uses all declared repositories to find missing artifacts. If it can't find what it's looking for, Maven will also fall back to the central repository, which is defined in the built-in super POM. However, this is usually not what you want; instead, all artifacts should be proxied by your repository manager.

This can be prevented by "overriding" the central repository with your own repository manager, i.e. you just add a repository definition with <id>central</id>.
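
For illustration, a minimal sketch of such an override, reusing the example URL from above (adjust it to your own repository manager):

<repositories>
  <repository>
    <id>central</id>
    <name>Company repository manager, overriding Maven central</name>
    <url>http://your.company.org/nexus/content/groups/public</url>
  </repository>
</repositories>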

When using a central repository manager, the repository definitions should be the same for all your projects. This is usually done by putting them into a company-wide base POM. But... if someone starts in a clean environment, Maven has to know where the repository is in order to find the project's parent POM, which in turn tells it where the repository is... a classic chicken-and-egg problem.

Moreover, any POM pulled in via transitive dependencies may specify additional repositories, which are not redirected like the central repo. Maven will use these external repositories to look for all dependencies, even your internal ones that are certainly not hosted there; and you still do not really control where the artifacts are coming from.

Hence, using the POM to define your repositories does not really solve any problem. Just don't do that!

The Settings


So, we end up with the alternative and recommended way of defining your repositories: the settings.

To really make sure that all developers and all projects use the identical set of repositories, you should use mirrors to tell Maven to redirect all artifact requests to your internal repository manager.
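
A minimal sketch of such a mirror definition in settings.xml, again assuming the example URL from above:

<mirrors>
  <mirror>
    <id>internal-mirror</id>
    <name>Redirect all repository requests to the internal repository manager</name>
    <mirrorOf>*</mirrorOf>
    <url>http://your.company.org/nexus/content/groups/public</url>
  </mirror>
</mirrors>

The * wildcard catches requests for every repository, including those declared in the POMs of transitive dependencies.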

Both the mirror and repository settings can be defined in the settings.xml file. I think I will do another post to explain what a good setup would look like...

There are two locations for the settings.xml file:

  1. the Maven installation: $M2_HOME/conf/settings.xml
  2. the user's local settings: ${user.home}/.m2/settings.xml

Both can contain the same set of definitions. So, which one to use?

Actually, both locations require the Maven users (your team members) to manually do some configuration in their settings file. This is generally not preferred, for the well-known reasons: everybody needs to do it manually, there is no automatic update if the settings change later on, there is unexpected behaviour when someone forgets to adapt his/her file, etc.

The only way to avoid these drawbacks is to avoid all manual editing, i.e. to provide a central version of the settings file that is checked in to your source code management system (SCM). This can't be done for the second option (the user's settings), but it can be done for the first (the installation settings) – if you put the Maven installation under source control.

This sounds a bit strange at first (after all, it's an executable!) but is really clever IMO, for the following reasons:

  • You make sure everybody uses the same version of Maven; no more Maven version mismatches!
  • You can use a relative path from your projects to the Maven installation if they are part of the same source repository, for instance in batch files, Eclipse launch configurations etc.
  • The correct settings are applied automatically and may be updated in the repository without requiring any editing by the developers (apart from updating their working copy).
  • The Maven installation is only around 2 MB in size, which is not really an issue for any SCM.


The Bottom Line


To summarize: by putting your Maven installation into your SCM as part of your project environment, and by using central settings to configure Maven to use your repository manager, you reduce the build's dependency on the local environment. And stabilizing your builds is always a good thing!

March 25, 2009

How big is BigDecimal?

Lately, there was a debate in our company about rounding of numbers, more specifically about how, when and where to do it.

One of the questions was whether a calculation method should return a rounded value, or whether the result should be precise and rounded by the caller. Another question was how to represent the values and which functionality to use to actually do the rounding.

There was a suggestion to use BigDecimal objects everywhere instead of simple double types because this class provides convenient methods for doing rounding.

Of course, when you need the higher precision, this might be a great choice. However, when you don't need it and are using the class only to get easy access to its rounding capabilities, the solution is probably over-engineered. Well, I voted against it, mainly for two reasons: performance and object size.
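
To illustrate what the debate was about, here is a small sketch of my own (not from our codebase) rounding a value to two decimal places, once via BigDecimal and once with plain doubles:

import java.math.BigDecimal;
import java.math.RoundingMode;

public class RoundingSketch {
    public static void main(String[] args) {
        double value = 123.456789;

        // rounding via BigDecimal: convenient and explicit about the rounding mode
        double viaBigDecimal = BigDecimal.valueOf(value)
                .setScale(2, RoundingMode.HALF_UP)
                .doubleValue();

        // rounding with plain double arithmetic: no object allocation
        double viaMath = Math.round(value * 100.0) / 100.0;

        System.out.println(viaBigDecimal + " / " + viaMath); // 123.46 / 123.46
    }
}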

1) Performance


It's obvious that calculations with primitive data types are faster than with BigDecimal (or BigInteger) objects. But... by how much?

A small Java code snippet helps to estimate the performance penalty:

final long iterations = 1000000;

// measure one million multiply-and-add operations on primitive doubles
long t = System.currentTimeMillis();
double d = 123.456;
for (int i = 0; i < iterations; i++) {
    final double b = d * ((double) System.currentTimeMillis()
            + (double) System.currentTimeMillis());
}
System.out.println("double: " + (System.currentTimeMillis() - t));

// the same operations with java.math.BigDecimal
t = System.currentTimeMillis();
BigDecimal bd = new BigDecimal("123.456");
for (int i = 0; i < iterations; i++) {
    final BigDecimal b = bd.multiply(
            BigDecimal.valueOf(System.currentTimeMillis()).add(
                    BigDecimal.valueOf(System.currentTimeMillis())));
}
System.out.println("java.math.BigDecimal: " + (System.currentTimeMillis() - t));

We are not interested in absolute numbers here, but only in the comparison between doubles and BigDecimals. It turns out that one million operations (each one a multiplication plus an addition of a double value) take approximately 3-4 times longer with BigDecimal than with doubles (on my poor old laptop with Java 5).

Interestingly, when trying the same for BigInteger and long, the factor is approximately 5, i.e. the performance difference is even bigger.

With Java 6, the test runs faster for all types, but the calculations with primitives improve more, so the performance penalty for using Big* is even higher: a factor of 4-5 for BigDecimal, 6 for BigInteger.

2) Object Size


Everybody would expect a BigDecimal to need more memory than a primitive double, right? But how much is it? We are going to have big objects with up to hundreds of decimal values, so the bigger BigDecimals might add up to a critical amount when we think about transporting those objects between processes (web service calls) or holding them in the session (for web applications).

It so happens that I blogged about how to determine an object's size in my last post ;-) Hence, we can move straight on to the actual figures:


  • double: 8 bytes

  • Double: 16 bytes (8 bytes of object overhead, 8 bytes for the wrapped double)

  • BigDecimal: 32 bytes

  • long: 8 bytes

  • Long: 16 bytes (8 bytes of object overhead, 8 bytes for the wrapped long)

  • BigInteger: 56 bytes


Wow. It seems that BigDecimal is 4 times as big as double and twice the size of Double – which is not that bad. As before, BigInteger carries an even bigger penalty with respect to object size.

3) Conclusion


All in all, using BigDecimal instead of double means roughly a factor of 4 in both memory footprint and performance penalty. A good reason not to use BigDecimals just for their rounding functionality...!

March 23, 2009

Size of Java Objects

I'm sure you know that measuring the size of objects in Java is not that easy, since there is no C-style sizeof() functionality. Additionally, the actual amount of heap storage used for an object depends on several variables: the JVM implementation, the operating system (32/64 bit) etc. Hence, a measured object size can be compared to the size of another object, but not across different runtime environments.

So... how can the size of an object (i.e. its memory usage) be determined? There are actually two possibilities. Both are well-known and not invented by me, so I only provide some basic information and links.

1) Use Runtime.freeMemory()

The usual (old-fashioned) way to estimate the size of an object goes like this: call the garbage collector (GC) to ensure all unused memory is freed, record the current memory consumption (M1), construct the object, GC once again and record memory (M2). The difference M2-M1 indicates the amount of memory used for the created object. (A small sketch follows the notes below.)

There are a few things to note:

  • A single call to the GC is more or less only a suggestion to the Java Virtual Machine to reclaim space from all discarded objects – there is no guarantee that the GC has finished (the method does not block) and that all dead objects have been removed. To be a bit more aggressive, the GC should be invoked a couple of times.

  • To make sure that supplementary memory (for static data etc.) is already allocated before counting starts, you should construct one object and set the reference to null before starting the estimation cycle described above.

  • The precision might increase when creating not just a single object, but a large number of them.
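
Putting the notes together, a minimal sketch of this approach (measuring BigDecimal is my own choice of example here):

public class ObjectSizeSketch {
    public static void main(String[] args) throws InterruptedException {
        Runtime runtime = Runtime.getRuntime();

        // allocate supplementary (static) memory up front, then drop the reference
        Object warmUp = new java.math.BigDecimal("0");
        warmUp = null;

        // suggest GC several times, since a single call is not guaranteed to finish
        for (int i = 0; i < 4; i++) { runtime.gc(); Thread.sleep(100); }
        long m1 = runtime.totalMemory() - runtime.freeMemory();

        // create many objects to improve precision
        Object[] objects = new Object[100000];
        for (int i = 0; i < objects.length; i++) {
            objects[i] = new java.math.BigDecimal("123.456");
        }

        for (int i = 0; i < 4; i++) { runtime.gc(); Thread.sleep(100); }
        long m2 = runtime.totalMemory() - runtime.freeMemory();

        // referencing the array here also keeps it alive during the measurement
        System.out.println("estimated bytes per object: " + (m2 - m1) / objects.length);
    }
}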


JavaWorld's Java Tip 130 describes this approach, and Heinz Kabutz published two JavaSpecialists newsletters (issues 29 and 78) about determining memory usage in Java.

Additionally, there is an open source project java.sizeOf at SourceForge using this approach.

2) Use Instrumentation.getObjectSize()

Starting with Java 5, there is a new way to determine object size: the instrumentation interface. Its getObjectSize() method still provides an estimate, but seems to deliver more accurate results, albeit a bit more slowly than counting free memory.

In short, you have to implement an instrumentation agent with a premain(String, Instrumentation) method that is called by the JVM on startup. The Instrumentation instance passed in can be stored and used to call methods on it later. The agent has to be packaged into a JAR file whose manifest contains a Premain-Class entry. To use the instrumentation agent, start java with the -javaagent option. For more information, see here and this blog post.
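
A minimal sketch of such an agent (the class and JAR names are made up for this example):

import java.lang.instrument.Instrumentation;

public class SizeOfAgent {
    private static volatile Instrumentation instrumentation;

    // invoked by the JVM before main() when started with -javaagent
    public static void premain(String agentArgs, Instrumentation inst) {
        instrumentation = inst;
    }

    public static long sizeOf(Object object) {
        return instrumentation.getObjectSize(object);
    }
}

The JAR manifest needs the line Premain-Class: SizeOfAgent, and the application is started with java -javaagent:sizeof.jar ... – after that, SizeOfAgent.sizeOf() can be called from anywhere in the code.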

Guess what, Heinz Kabutz has published another JavaSpecialists newsletter 142 describing this approach (you see, it's really worth subscribing!). Refer to this java.net article for another example of how to use Java instrumentation.


That's it for today... One more remark: note that neither of the described approaches provides exact figures, only estimates of memory consumption. This is not really an issue, because the estimates are typically exact for small objects, and the size of complex data structures can be calculated from the known sizes of basic types and data structures.

March 20, 2009

Sonar is SO cool!

Do you know Maven? Then you know the project site that can be easily generated with Maven. For instance, look at the site for the Tomcat Maven plugin. It provides information on using the plugin, project related information and – probably most importantly for most "normal" (i.e. non-plugin) projects – the project reports.

You can easily configure Maven to run a number of useful reports like JavaDoc, Checkstyle, PMD (coding rule verification), CPD (duplicate code detection), and JUnit test coverage. Additionally, you can install custom reports to participate in the project site as well.

This is great, but still lacks some features:

  • What if you would like to see the overall code quality, without having to consult several detailed reports? Just a single, combined indicator?

  • With the reports, it's not always easy to drill down from a particular issue to the source code level.

  • It would be nice to be able to access historic versions and compare quality between them to recognize trends early, wouldn't it?


All this (and more) is provided by Sonar (http://sonar.codehaus.org/), an open source tool that "enables to collect, analyze and report metrics on source code. Sonar not only offers consolidated reporting on and across projects throughout time, but it becomes the central place to manage code quality."

Sonar collects data provided by well-known Maven reports, stores them into a database, and provides a modern, fast and convenient user interface to browse the projects and quality metrics, and to drill down from project to Java code level.

Just look at the screenshot of an internal test project... how cool is that?



Installation is as simple as it can be. You can run Sonar with the provided Jetty or install it in your existing container. For production, you can switch from the embedded Derby database to a "real" database (like MySQL, Oracle, SQL Server, ...). Of course, you can adjust the rules to be checked or import your existing configuration (for Checkstyle or PMD).

To send data to Sonar, you just execute a Maven command that calls the sonar-maven-plugin for your project:

mvn clean install org.codehaus.sonar:sonar-maven-plugin:1.6:sonar

That's it... now watch all the magic going on.

This Maven goal can be called manually, but it is best integrated into nightly builds. To simplify this further, there is a Sonar plugin for Hudson, my favorite continuous integration engine. Using this nice plugin, configuring a job to connect to Sonar is as simple as clicking a checkbox in the post-build section!

Sonar is so great that I really wonder why I didn't find it earlier – the current version is 1.6, so it must have been around for a while... You should definitely give it a try!

March 19, 2009

Maven Profiles: Activation... or not

I love Maven. Really, I do. I should say that, since this is my first post on my own blog (I know, I'm probably the last man on the planet... ;o) Having said this, here is another annoyance. There are lots of them, inside and outside of Maven, and I'm going to share some of my experiences here.

This time, it's about Profiles. I'm not going to explain the basics (see Introduction to Build Profiles or the chapter on profiles in Maven: The Definitive Guide, for instance).

I wanted to use profiles to specify some re-usable configuration (for code generation from oAW models, but that doesn't matter here) in a parent POM, without having to repeat all of it in the child POMs. The profile definition in the parent POM consists of a plugin configuration along with the required dependencies and looks like this:

<profile>
  <id>adsl</id>
  <build>
    <plugins>
      <plugin>
        <groupId>org.fornax.toolsupport</groupId>
        <artifactId>fornax-oaw-m2-plugin</artifactId>
        ... all the execution and configuration details ...
      </plugin>
    </plugins>
  </build>
  <dependencies>
    <dependency>
      <groupId>com.fja.ipl.adsl</groupId>
      <artifactId>adsl</artifactId>
      <version>${ipl.current.version}</version>
    </dependency>
    ... and some more ...
  </dependencies>
</profile>

Then I tried to activate the profile in those child POMs that should use it because they are based on an oAW model. Note that I wanted to activate the profile in the POM itself, not through settings or command line options. And this is where the pain begins...

First attempt was to just activate the profile in a child POM like this:

<profile>
  <id>adsl</id>
  <activation>
    <activeByDefault>true</activeByDefault>
  </activation>
</profile>

However, instead of activating the inherited profile with the given id, Maven just overrides the profile from the parent POM with a new (and empty) one. Okay, this might be the desired behaviour.

So, the next attempt was to use activation based on the presence of a file: whenever there is a workflow file src/main/resources/generateAll.oaw, the profile should be active. This is a natural trigger, since the oAW generation uses this workflow file anyway. The parent POM now looks like this:

<profile>
  <id>adsl</id>
  <activation>
    <file>
      <exists>src/main/resources/generateAll.oaw</exists>
    </file>
  </activation>
  ...
</profile>

This works great... almost. With multi-module projects, it depends on where you start the Maven build. For instance, let's assume project P aggregates modules M1 and M2. When you step down into M1 and start Maven there, the file is found in the context of M1 and the profile is activated, so everything is just fine. The same is true for M2. However, when the build is started from project P, the file is searched for in P's context, where it cannot be found – and hence the build of M1 and M2 fails.
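
Schematically, the layout in question (only the relevant files shown):

P/pom.xml                                  <- aggregates M1 and M2; no workflow file here
P/M1/pom.xml
P/M1/src/main/resources/generateAll.oaw    <- found when building from M1
P/M2/pom.xml
P/M2/src/main/resources/generateAll.oaw    <- found when building from M2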

Doesn't sound that complicated... let's just use the ${basedir} property:

<exists>${basedir}/src/main/resources/generateAll.oaw</exists>

Well, it turns out that this property is not expanded at all in this context (see MNG-1775). Thus, this doesn't work either.

So, we're stuck. I don't have any other idea (do you?)...

This might not be the typical use case for profiles, which are meant to modify the POM at build time to support environment/platform specifics. Agreed. However, as long as there is no other way to define complex configuration in a parent POM for reuse in some (not all) child POMs, this would have been a useful approach, IMHO.

So we end up defining the plugin configuration (with all the execution details) in the <pluginManagement> section of the parent POM, and enabling it – and adding the required dependencies – in the child POMs.
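
The parent's part might look roughly like this (a sketch, reusing the coordinates from above):

<build>
  <pluginManagement>
    <plugins>
      <plugin>
        <groupId>org.fornax.toolsupport</groupId>
        <artifactId>fornax-oaw-m2-plugin</artifactId>
        ... all the execution and configuration details ...
      </plugin>
    </plugins>
  </pluginManagement>
</build>

That is, we still have to repeat the following section in every single child project that wants to use oAW generation: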

<build>
  <plugins>
    <plugin>
      <groupId>org.fornax.toolsupport</groupId>
      <artifactId>fornax-oaw-m2-plugin</artifactId>
    </plugin>
  </plugins>
</build>
<dependencies>
  <dependency>
    <groupId>com.fja.ipl.adsl</groupId>
    <artifactId>adsl</artifactId>
    <version>${ipl.current.version}</version>
  </dependency>
  ... and more of them ...
</dependencies>

Cumbersome, redundant, hard to maintain. Maven should do better, really!