Java and other things: collections

Showing posts with label collections. Show all posts

Sunday, August 26, 2007

Contract of the interfaces

As I mentioned in my previous post titled Don't test everything, unit testing is very valuable in the software development process, but on some occasions the test expects more that it should. In some cases those expectation work fine, in other they fail.

What the hell am I talking about? If you follow the good practice of coding against interfaces, you know that the interface defines the contract all implementation classes must adhere to. If you write an implementation of an interface your unit tests should test possibly everything that this contract specifies and nothing more.

Take a Set from Java Collections Framework for example. This interface is defined as:

A collection that contains no duplicate elements. More formally, sets contain no pair of elements e1 and e2 such that e1.equals(e2), and at most one null element. As implied by its name, this interface models the mathematical set abstraction.

At some point I had a unit test that had some input values and my test was assuming the correct results in the set that was returned. I knew that my implementation wa returning a HashSet and I also knew what was in that set. The asserts were quite simple (the code is simplified):

String[] addresses = new String[] {"address1", "address2", "address3"};
Set set = new HashSet(Arrays.asList(addresses));
Iterator i = set.iterator();
assertEquals("address1", i.next());
assertEquals("address2", i.next());
assertEquals("address3", i.next());

It's a good test, you may think. Well, it isn't! This test runs or fails depending on which JDK you use. The problem with this test is that it assumes the order of the elements in the returning set. Set interface does not guarantee the order of its elements. We should not test that.

Let's have a look at the following unit test

import junit.framework.TestCase;

import java.util.Set;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;

public class TestJdkDiff extends TestCase
{
    public void testOrder()
    {
        String javaVersion = System.getProperty("java.version");
        String javaVendor = System.getProperty("java.vendor");
        System.out.println("Running " + javaVendor + " " + javaVersion);

        String[] addresses = new String[] {"address1", "address2", "address3"};
        final Set set = new HashSet(Arrays.asList(addresses));
        for (Iterator i = set.iterator(); i.hasNext();)
        {
            System.out.println(i.next());
        }
    }
}

Execution of the previous test on Java 6 produces the following output

Running Sun Microsystems Inc. 1.6.0_02
address1
address2
address3

The same test executed on Sun's JVM 1.4 prints out

Running Sun Microsystems Inc. 1.4.2_12
address2
address3
address1

Oops! Now I know why my test failed. Even, in JDK the implementation of some classes can change from time to time. As long as the contact is kept all should be fine.

So to fix my initial test one should write

String[] addresses = new String[] {"address1", "address2", "address3"};
Set set = new HashSet(Arrays.asList(addresses));
assertEquals(3, set.size());
assertTrue(set.contains("address1"));
assertTrue(set.contains("address2"));
assertTrue(set.contains("address3"));

Happy coding!

Wednesday, September 27, 2006

First element in the List

Imagine this scenario. You are working on an existing application. There is a framework used. One of the benefits that this framework gives you is retrieving and processing parameters coming from the client, let's say a web application. The framework gives you all parameters as a List. Now in this particular case you always get only one parameter, but is it enclosed in the List. How do you get it out efficiently?

The intended method would be described: Get the list of parameters and if not empty, return first element in the list otherwise return null. The code was looked similar to this:

1 public E getSingleParam(List params) {
2     E param = null;
3     if (params != null && params.size() >= 1) {
4         param = params.iterator().next();
5     }
6 }

This code works fine but it has several points for improvement. First is how the code at line 3 expresses the intention of check whether the list of parameters is not null and not empty, specifically the later. List interface defines boolean isEmpty() method that is in most cases more efficient to run than getting the size and comparing it to 0 (greater than) or 1 (grater than or equal). That is if the list implements an internal flag for its "emptiness" state or has an internal counter as opposed to re-counting its elements. In the worst case scenario the efficiency will be the same as getting the size and comparing it with 0. In that case isEmpty() method is still a nice convenience method to call and should be preferred before the size() > 0 alternative. Another reason is that it speaks for itself. All we need to know is whether there are any elements in the list or not. Getting the size should be used to for other purposes (e.g. calculating the width of the table column when displaying the results in a tabular form – 100% / size()).

1 public E getSingleParam(List params) {
2     E param = null;
3     if (params != null && !params.isEmpty()) {
4         param = params.iterator().next();
5     }
6 }

The second point of making the code to perform better and to make it easier to read is the line 4. On this line we are getting the first element of the list. As shown in the example above, this is an inefficient way as we construct an Iterator only for the purpose of one iteration. Constructing new objects and disposing of them is usually expensive and should be avoided. A better way is to access the element directly, such as:

1 public E getSingleParam(List params) {
2     E param = null;
3     if (params != null && !params.isEmpty()) {
4         param = params.get(0);
5     }
6 }

This way no new objects are constructed (no Iterator). Another difference between the two is the exception that could be possibly thrown. The exception would be thrown if we did not have the previous check or in the case of concurrent modification of the list which is very unlikely in this case. The exception thrown in first case would be a NoSuchElementException. In the second way, an IndexOutOfBoundsException would be thrown. Both of them are runtime exceptions and do not have to be declared. In this case getting one exception or the other should not make any difference.

Sunday, August 13, 2006

Improved Map Iteration

When some one new to Java comes to play, first thing they do, they re-invent the wheel. Such wheel can be a class that is already implemented in java.util package. Yes! Collections and Maps! Later, as you become a senior Java developer you alredy know the narrow places in the "Utils" valley. And yet we can make mistakes.

Such mistake can be the following implementation of the method that calculates the size of the collection.

private Collection myStuff = ...

public int calculateSize() {
   int count = 0;
   for (Iterator it = myStuff.iterator(); it.hasNext(); it.next()) {
       count++;
   }
   return count;
}

This can be easily fixed by

public int calculateSize() {
   return myStuff.size();
}

Another, more frequent example of bad performance is a condition where one wants to know if there are any objects in the collection, but does not really care how many there are.

if (myStuff.size() == 0) {
   // do something
}

Remember size() method may possibly calculate the size and therefore generally is slower than a call to isEmpty() method. Therefore the fix is obvious.

if (myStuff.isEmpty()) {
   // do something
}

Could you imagine that I once worked for a company where the senior developers did not know about Iterators? So instead of

for (Iterator it = myStuff.iterator(); it.hasNext();) {
   Object item = it.next();
   // do something with the item
}

they had

for (int i = 0; i < myStuff.size(); i++) {
   Object item = myStuff.get(i);
   // do something with the item
}

and on top of that myStuff was a Vector, which is synchronized. Very sloooow code!

Nevertheless my point is: Know your collections!

Now back to the topic of improving the Map iteration. Well, it really depends what we need to do at each step of our iteration.

If you need to iterate over the keys of the Map do this

for (Iterator it = myMap.keySet().iterator(); it.hasNext();) {
   Object key = it.next();
   // do something with the key
}

And if you need to iterate over the values of the Map do this

for (Iterator it = myMap.values().iterator(); it.hasNext();) {
   Object value = it.next();
   // do something with the value
}

But what if we need both the key and the value? The usual approach and very bad approach is to get the set of keys, iterate over them and get the value for each key.

for (Iterator it = myMap.keySet().iterator(); it.hasNext();) {
   Object key = it.next();
   Object value = myMap.get(key);
   // do something with the key and the value
}

What is wrong about this? Nothing! You can do it this way and it's perfectly fine. It is just very inefficient as at each iteration a map needs to look up the value for a given key. It is better to iterate over map entries. There is a special interface Map.Entry that can be used to retrieve the key and the value of each entry in the map. So the previous example can be transformed into

for (Iterator it = myMap.entrySet().iterator(); it.hasNext();) {
   Map.Entry entry = (Map.Entry) it.next();
   Object key = entry.getKey();
   Object value = entry.getValue();
   // do something with the key and the value
}

Don't forget to change myMap.get(key) to entry.getValue(), otherwise you are not gaining anything from this modification. I did this recently, I omitted to change it, even worse - I changed it to myMap.get(entry.getValue()). As a result the map was not finding anything... Luckily, we had tests around the class I modified and my stupid mistake was caught early.

Have the tests ready before making changes! Even experienced people make mistakes.