Constructing an unmarshaller using reflection

In this blog we will construct a simple unmarshaller at runtime using reflection and measure its performance.

This post is part 2 of 4. The series consists of the following posts:

Writing an unmarshaller by hand
Constructing an unmarshaller using reflection
Generating an unmarshaller using annotations
Creating an unmarshaller using bytecode

If you haven’t done so yet, read part 1 first. For a definition of some of the terminology used here, see Phases for generating code.

Writing an unmarshaller for a simple format and a simple structure is quite easy. However, for more complex formats and structures this rapidly gets harder. So it becomes beneficial to write a library which automates this.

When writing the program by hand at code-time, the structure of the bean is known. When constructing a library to be used at run-time, the structure of the bean to be decoded is not known at the time the library is written. So the library will need to inspect the provided bean and generate code on the fly.

The employee bean is the same as in the first part:

public class Employee {

    private int id;

    private boolean active;

    private String firstName;

    private String lastName;

    private int startYear;

    private String jobTitle;

    /* Getters and setters for all fields */
}

The library will need to extract the elements in order (id, active, firstName, …) as well as their types in order to be able to unmarshall the input.

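In Java core reflection the order in which the methods and fields of a class are returned is formally unspecified: the Javadoc of both getDeclaredMethods() and getDeclaredFields() states that the returned elements are not sorted and are not in any particular order. In practice, however, HotSpot returns fields in the order in which they were declared in the source, and the code below relies on that behaviour.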

So, given the class of the bean, we extract the declared fields and loop over them:

var fields = clazz.getDeclaredFields();

for (Field field : fields) {
    var name = field.getName();
    var type = field.getType();
}
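To make the loop above concrete, here is a minimal, self-contained sketch. The `FieldListing` class and its trimmed-down `Person` bean are hypothetical stand-ins invented for this illustration; they are not part of the series' code. The sketch collects one `name:type` entry per declared field, in whatever order the JVM reports them.

```java
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

class FieldListing {

    // Hypothetical, trimmed-down stand-in for the Employee bean.
    static class Person {
        private int id;
        private boolean active;
        private String firstName;
    }

    // Returns one "name:type" entry per declared field, in the order reported by the JVM.
    static List<String> describe(Class<?> clazz) {
        List<String> result = new ArrayList<>();
        for (Field field : clazz.getDeclaredFields()) {
            if (field.isSynthetic()) {
                continue; // skip compiler-generated fields, if any
            }
            result.add(field.getName() + ":" + field.getType().getSimpleName());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(describe(Person.class));
    }
}
```

Skipping synthetic fields is a precaution added here: compilers and instrumentation tools can add fields that have no setter.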

Now we have the name of each field and its type. Next we need to determine the name of the corresponding set method, or setter. Fortunately this is as easy as capitalizing the field name and putting ‘set’ in front of it:

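// Helper in ReflectionUtils: turns "firstName" into "setFirstName"
public static String determineSetter(String fieldName) {
    var sb = new StringBuilder(fieldName.length() + 3);
    sb.append("set");
    sb.append(Character.toUpperCase(fieldName.charAt(0)));
    sb.append(fieldName.substring(1));
    return sb.toString();
}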

Finally we need to invoke the correct parse method and use its result to invoke the setter:

Class<?> clazz = Employee.class;

var employee = (Employee) clazz.getDeclaredConstructor().newInstance();

var fields = clazz.getDeclaredFields();

for (Field field : fields) {
    var name = field.getName();
    var type = field.getType();
    var setterName = ReflectionUtils.determineSetter(name);
    var setter = clazz.getDeclaredMethod(setterName, type);
    if (boolean.class.equals(type)) {
        setter.invoke(employee, parser.readBoolean());
    } else if (int.class.equals(type)) {
        setter.invoke(employee, parser.readInteger());
    } else if (String.class.equals(type)) {
        setter.invoke(employee, parser.readString());
    } else {
        throw new IllegalArgumentException("Unknown type");
    }
}

return employee;

Benchmarking this code using JMH gives the following result:

Benchmark                    Mode  Cnt     Score    Error   Units
RunTimeBenchmark.benchmark  thrpt   25  1162,564 ± 20,441  ops/ms

For comparison, this was the result from the hand-written code:

Benchmark                     Mode  Cnt     Score    Error   Units
CodeTimeBenchmark.benchmark  thrpt   25  6249,235 ± 29,212  ops/ms

So when using reflection, we don’t need to write any bean-specific code by hand, but throughput is more than 5 times lower.

Although that might be acceptable for certain applications, for most this is unacceptably slow.

However there are some things we can try to improve the performance.

First of all the code extracts the fields and setters over and over again, while they do not change. So we can cheat a bit, extract this functionality and move it to deploy-time. Like so:

this.constructor = clazz.getDeclaredConstructor();

List<FieldInfo> list = new ArrayList<>();
for (Field field : clazz.getDeclaredFields()) {
    var name = field.getName();
    var type = field.getType();
    var setterName = ReflectionUtils.determineSetter(name);
    var setter = clazz.getDeclaredMethod(setterName, type);
    list.add(new FieldInfo(type, setter));
}
fields = list.toArray(FieldInfo[]::new);

This builds up an array of FieldInfo objects which represent the structure of the bean. FieldInfo is defined as:

static class FieldInfo {
    private Class<?> type;
    private Method setter;

    public FieldInfo(Class<?> type, Method setter) {
        this.type = type;
        this.setter = setter;
    }

    public Class<?> getType() {
        return type;
    }

    public Method getSetter() {
        return setter;
    }
}

With this precomputed information we can simplify the actual unmarshaller code:

var employee = (Employee) constructor.newInstance();

for (FieldInfo field : fields) {
    if (boolean.class.equals(field.getType())) {
        field.getSetter().invoke(employee, parser.readBoolean());
    } else if (int.class.equals(field.getType())) {
        field.getSetter().invoke(employee, parser.readInteger());
    } else if (String.class.equals(field.getType())) {
        field.getSetter().invoke(employee, parser.readString());
    } else {
        throw new IllegalArgumentException("Unknown type");
    }
}

return employee;

Running the benchmark again:

Benchmark                 Mode  Cnt     Score    Error   Units
RunTimeBenchmark.deploy  thrpt   25  4335,814 ± 35,485  ops/ms

This shows an almost 4 times improvement over the first attempt, but it is still about 30% slower than the hand-written code.

One of the bottlenecks in reflection is that access checks are performed on every invocation.

To improve on this we can use method handles, which were introduced in Java 7. Access checks for method handles are done when looking up (creating) the method handle, rather than every time it is invoked.

Lookup is done through a MethodHandles.Lookup object. Note that this object is associated with the permissions of the class that created it. So if you can get hold of a Lookup object from a certain class, you can access other objects with the permissions of that class.
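The effect of those permissions can be illustrated with a small sketch. The `LookupAccess` class and its private `hidden` method are hypothetical names made up for this example. A Lookup created inside the class can resolve the private method, while the public lookup, which only has access to public members, is refused with an IllegalAccessException.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

class LookupAccess {

    // Hypothetical private member used only to demonstrate the access check.
    private static String hidden() {
        return "secret";
    }

    // Reports whether the given lookup is allowed to resolve the private method.
    static boolean canFind(MethodHandles.Lookup lookup) {
        try {
            lookup.findStatic(LookupAccess.class, "hidden", MethodType.methodType(String.class));
            return true;
        } catch (IllegalAccessException e) {
            return false; // the lookup lacks permission for this member
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    // Created inside LookupAccess, so it carries this class's permissions.
    static boolean ownLookupCanFind() {
        return canFind(MethodHandles.lookup());
    }

    // The public lookup can only see public members of accessible classes.
    static boolean publicLookupCanFind() {
        return canFind(MethodHandles.publicLookup());
    }

    public static void main(String[] args) {
        System.out.println("own lookup:    " + ownLookupCanFind());
        System.out.println("public lookup: " + publicLookupCanFind());
    }
}
```

This is also why a Lookup object should be handed out with care: whoever holds it can resolve the same members its creating class can.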

Fortunately converting a Method object into a MethodHandle is trivial:

var mh = lookup.unreflect(setter);
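As a rough sketch of how such a handle is then used, the following self-contained example looks up two setters reflectively, converts them with unreflect, and invokes them. The `HandleSketch` class and its two-field `Person` bean are hypothetical stand-ins for illustration only. A real unmarshaller would create the handles once at deploy-time and reuse them, rather than on every call as this sketch does.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.reflect.Method;

class HandleSketch {

    // Hypothetical two-field bean standing in for Employee.
    public static class Person {
        private int id;
        private String firstName;

        public void setId(int id) { this.id = id; }
        public void setFirstName(String firstName) { this.firstName = firstName; }
        public int getId() { return id; }
        public String getFirstName() { return firstName; }
    }

    // Finds the setter via reflection and converts it; access is checked here, once.
    static MethodHandle setterHandle(String name, Class<?> type) {
        try {
            Method setter = Person.class.getDeclaredMethod(name, type);
            return MethodHandles.lookup().unreflect(setter);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Creates a Person and fills it through the method handles.
    static Person populate(int id, String firstName) {
        MethodHandle setId = setterHandle("setId", int.class);
        MethodHandle setFirstName = setterHandle("setFirstName", String.class);
        Person person = new Person();
        try {
            // invoke takes the receiver first, followed by the setter argument
            setId.invoke(person, id);
            setFirstName.invoke(person, firstName);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
        return person;
    }

    public static void main(String[] args) {
        Person p = populate(42, "Ada");
        System.out.println(p.getId() + " " + p.getFirstName()); // prints "42 Ada"
    }
}
```

Note that MethodHandle.invoke is declared to throw Throwable, so callers have to handle or propagate it, unlike Method.invoke, which wraps failures in an InvocationTargetException.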

And we need to change the definition of FieldInfo:

static class FieldInfo {
    private Class<?> type;
    private MethodHandle setter;

    public FieldInfo(Class<?> type, MethodHandle setter) {
        this.type = type;
        this.setter = setter;
    }

    public Class<?> getType() {
        return type;
    }

    public MethodHandle getSetter() {
        return setter;
    }
}

Running the benchmark gives:

Benchmark                 Mode  Cnt     Score    Error   Units
RunTimeBenchmark.handle  thrpt   25  4907,891 ± 40,642  ops/ms

This gives a 13% improvement; however, it is still 21% slower than the hand-written code. That may seem like a lot, and if your code is performance critical it definitely is. But considering what we are doing, a performance penalty of only about 20% is actually pretty amazing.

It really is a testament to the skill of everyone working on the JVM JIT compiler that this penalty is so low.

The full code can be found on GitHub. If you can improve on the code, feel free to create a pull request.

In part 3 we will combine the performance of the hand-written code with the ease of automatic code generation, using an annotation processor at compile-time.