Code is generated at many phases during the development and deployment cycle. Everyone is familiar with compile-time, at which the compiler generates (byte)code. However, code is generated at other phases as well. In this post I divide code generation into four phases: code-time, compile-time, deployment-time, and run-time. I will also discuss some of the benefits and drawbacks of choosing each of these phases.
For the purpose of this discussion I am explicitly ignoring the (machine) code generated by e.g. C1 or C2 in the HotSpot VM.
The most often overlooked phase at which (source) code is generated is code-time: the time at which the developer actually creates the code. This obviously includes code written manually, but may also include code generated by the IDE, e.g. getters and setters, Live Templates, Postfix Completion, etc.
Although often overlooked, most code is generated at code-time, manually, by developers. Over the past decades many initiatives have attempted to reduce the need for writing code by hand, starting with 4GL in the 1970s and continuing more recently under the name Low Code.
Generating code at code-time is, by definition, familiar to developers. It is extremely flexible and gives fast feedback, but it is also time-consuming, expensive, and error-prone. At code-time the exact run-time environment may not be known, so no environment-specific optimizations can be performed.
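As a small illustration of code-time generation, this is the kind of boilerplate an IDE "generate" action typically produces. The `Customer` class and its fields are hypothetical and only serve as an example; the exact output differs per IDE and per settings.

```java
import java.util.Objects;

// Hypothetical domain class. Only the two fields were typed by hand;
// everything below them is typical IDE-generated code.
class Customer {
    private String name;
    private int age;

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public int getAge() {
        return age;
    }

    public void setAge(int age) {
        this.age = age;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Customer)) return false;
        Customer other = (Customer) o;
        return age == other.age && Objects.equals(name, other.name);
    }

    @Override
    public int hashCode() {
        return Objects.hash(name, age);
    }
}
```

The generated code is checked in and maintained like any other source, which is where the maintenance cost mentioned above comes from: adding a field means regenerating or editing these methods by hand.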
The compiler converts source code into bytecode. However, it can do much more. Java 5 introduced the Annotation Processing Tool (apt), and since Java 6 annotation processing has been built into javac itself. An annotation processor allows custom code to hook into the compiler, read the structure of the source code, and generate either Java code or any other kind of file.
A well-known example of compile-time code generation is Lombok, which (ab)uses the annotation processor interface not only to generate code, but also to modify existing code. Other examples of tools that generate Java code include MapStruct, Immutables, and Dagger.
Besides Java code, annotation processors can also be used to generate XSDs (JAXB’s schemagen could have been implemented as an annotation processor, but is not) or Kubernetes manifests (using dekorate).
Writing your own annotation processor is relatively easy: the API is easy to use and well documented, and there are plenty of tutorials available. It is obviously less flexible than manually writing the code, but also cheaper (to use) and less error-prone. As with code-time, no assumptions about the run-time environment can be made. And the source code, obviously, needs to be available.
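To give an impression of what such a processor looks like, here is a minimal sketch. The `@Describe` annotation, the `DescribeProcessor` class, the `generateSource` helper, and the generated `...Description` classes are all hypothetical names invented for this example; only the `javax.annotation.processing` and `javax.lang.model` types are part of the JDK.

```java
import java.io.IOException;
import java.io.Writer;
import java.util.Set;
import javax.annotation.processing.AbstractProcessor;
import javax.annotation.processing.RoundEnvironment;
import javax.annotation.processing.SupportedAnnotationTypes;
import javax.lang.model.SourceVersion;
import javax.lang.model.element.Element;
import javax.lang.model.element.TypeElement;
import javax.tools.Diagnostic;

// Hypothetical marker annotation, invented for this sketch.
@interface Describe {}

// Hypothetical processor: for every class Foo annotated with @Describe it
// generates a class FooDescription. To use it with javac, declare it public
// in its own file and register it in
// META-INF/services/javax.annotation.processing.Processor.
@SupportedAnnotationTypes("Describe")
class DescribeProcessor extends AbstractProcessor {

    @Override
    public SourceVersion getSupportedSourceVersion() {
        return SourceVersion.latestSupported();
    }

    @Override
    public boolean process(Set<? extends TypeElement> annotations, RoundEnvironment roundEnv) {
        for (TypeElement annotation : annotations) {
            for (Element element : roundEnv.getElementsAnnotatedWith(annotation)) {
                if (!(element instanceof TypeElement)) {
                    continue; // this sketch only handles annotated types
                }
                TypeElement type = (TypeElement) element;
                String pkg = processingEnv.getElementUtils()
                        .getPackageOf(type).getQualifiedName().toString();
                String simpleName = type.getSimpleName().toString();
                String generatedName = (pkg.isEmpty() ? "" : pkg + ".") + simpleName + "Description";
                // The Filer hands the new source file back to the compiler.
                try (Writer writer = processingEnv.getFiler()
                        .createSourceFile(generatedName, type).openWriter()) {
                    writer.write(generateSource(pkg, simpleName));
                } catch (IOException e) {
                    processingEnv.getMessager()
                            .printMessage(Diagnostic.Kind.ERROR, e.toString(), type);
                }
            }
        }
        return true; // @Describe is handled by this processor only
    }

    // Pure helper that produces the Java source text, kept separate so it can be tested.
    static String generateSource(String pkg, String simpleName) {
        String packageLine = pkg.isEmpty() ? "" : "package " + pkg + ";\n\n";
        return packageLine
                + "public final class " + simpleName + "Description {\n"
                + "    public static final String NAME = \"" + simpleName + "\";\n"
                + "}\n";
    }
}
```

Keeping the text generation in a separate method is a design choice of this sketch, not a requirement of the API: it allows the generated source to be unit-tested without running the compiler.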
Traditionally, deployment-time refers to the time at which an Enterprise Application would be deployed into a Java EE application server. Now that we are moving away from traditional application servers towards more flexible approaches such as Spring Boot, Payara Micro, or Quarkus, the distinction between the deployment phase and application start-up has become blurred. For this discussion I consider anything between the Java VM starting and the application being ready to fully perform its function as deployment-time.
Quarkus is a special case in that it targets GraalVM native images, a run-time environment which does not allow generating code. Therefore, all code needs to be generated when building the (native) image. Quarkus uses Quarkus extensions to generate code for libraries which would normally do so at deployment-time (e.g. Hibernate, Jackson, and RESTEasy). Conveniently, Quarkus refers to this phase as deployment-time.
At deployment-time, code generation can make assumptions about the run-time environment, provided that no code is dynamically loaded afterwards. GraalVM refers to this as the closed-world assumption. No source code needs to be available to generate code at deployment-time, meaning that binaries for which the source code is not (or no longer) available can also be used as input for code generation. Often no Java compiler is present either, which means that bytecode needs to be generated directly, and that makes writing such generators more complex. Although libraries such as ASM and ByteBuddy make generating bytecode easier, it still requires knowledge about how bytecode works.
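To give a feel for the knowledge involved, the following sketch writes a class file by hand using only the JDK, without ASM or ByteBuddy. It produces the equivalent of a class with a single method `public static int answer()` that returns a constant, then loads and calls it. The class `HandWrittenBytecode` and its methods are hypothetical names for this example, and the sketch assumes a simple class name without a package.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class HandWrittenBytecode {

    // Builds the class file for:
    //   public class <name> { public static int answer() { return <value>; } }
    static byte[] classFile(String name, int value) {
        if (value < Byte.MIN_VALUE || value > Byte.MAX_VALUE) {
            throw new IllegalArgumentException("bipush only takes a signed byte");
        }
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(0xCAFEBABE);  // magic number
            out.writeShort(0);         // minor version
            out.writeShort(52);        // major version (Java 8)
            out.writeShort(8);         // constant pool count = number of entries + 1
            out.writeByte(1); out.writeUTF(name);                // #1 Utf8: class name
            out.writeByte(7); out.writeShort(1);                 // #2 Class -> #1
            out.writeByte(1); out.writeUTF("java/lang/Object");  // #3 Utf8: super class name
            out.writeByte(7); out.writeShort(3);                 // #4 Class -> #3
            out.writeByte(1); out.writeUTF("answer");            // #5 Utf8: method name
            out.writeByte(1); out.writeUTF("()I");               // #6 Utf8: descriptor, no args, returns int
            out.writeByte(1); out.writeUTF("Code");              // #7 Utf8: attribute name
            out.writeShort(0x0021);    // class flags: ACC_PUBLIC | ACC_SUPER
            out.writeShort(2);         // this_class
            out.writeShort(4);         // super_class
            out.writeShort(0);         // interfaces count
            out.writeShort(0);         // fields count
            out.writeShort(1);         // methods count
            out.writeShort(0x0009);    // method flags: ACC_PUBLIC | ACC_STATIC
            out.writeShort(5);         // method name index
            out.writeShort(6);         // method descriptor index
            out.writeShort(1);         // method attributes count
            out.writeShort(7);         // attribute name index: "Code"
            out.writeInt(15);          // attribute length in bytes
            out.writeShort(1);         // max_stack
            out.writeShort(0);         // max_locals
            out.writeInt(3);           // code length in bytes
            out.writeByte(0x10);       // bipush
            out.writeByte(value);      //   <value>
            out.writeByte(0xAC);       // ireturn
            out.writeShort(0);         // exception table length
            out.writeShort(0);         // attributes of the Code attribute
            out.writeShort(0);         // class attributes count
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Defines the class in a fresh class loader so it does not clash with existing classes.
    static Class<?> define(String name, byte[] classFile) {
        return new ClassLoader(HandWrittenBytecode.class.getClassLoader()) {
            Class<?> define() {
                return defineClass(name, classFile, 0, classFile.length);
            }
        }.define();
    }

    // Calls the generated static method through reflection.
    static int callAnswer(Class<?> generated) {
        try {
            return (Integer) generated.getMethod("answer").invoke(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Even this trivial method requires knowing about the constant pool, descriptors, stack sizes, and opcodes. Libraries such as ASM and ByteBuddy take care of the bookkeeping, but the concepts remain visible in their APIs.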
Code generation at run-time is usually referred to as reflection. Although it may not seem like it, using the Reflection API to read a value or invoke a method can be considered generating code.
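A minimal sketch of what this looks like in practice: the field and method names are plain strings that are only resolved while the application runs. The `ReflectionSketch` and `Person` classes are hypothetical and exist only for this example.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;

class ReflectionSketch {

    // Hypothetical class used as the target of the reflective calls.
    static class Person {
        private final String name;

        Person(String name) {
            this.name = name;
        }

        public String greet() {
            return "Hello, " + name;
        }
    }

    // Reads a (possibly private) field whose name is only known at run-time.
    static Object readField(Object target, String fieldName) {
        try {
            Field field = target.getClass().getDeclaredField(fieldName);
            field.setAccessible(true);
            return field.get(target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Invokes a public no-argument method whose name is only known at run-time.
    static Object call(Object target, String methodName) {
        try {
            Method method = target.getClass().getMethod(methodName);
            return method.invoke(target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Calling `ReflectionSketch.readField(person, "name")` does the work of `person.name`, and `ReflectionSketch.call(person, "greet")` does the work of `person.greet()`, except that the lookup happens on every call instead of once in the compiler.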
It is also possible to generate bytecode at run-time, so not during start-up (deployment-time) but as part of the normal application functionality. However I have personally never seen this done.
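One JDK facility that arguably falls into this category is `java.lang.reflect.Proxy`, which creates a class implementing the requested interfaces while the application is running. The sketch below wraps an object in such a proxy to record which methods are called. The `RuntimeProxySketch` class, the `Greeter` interface, and the `loggingGreeter` method are hypothetical names for this example.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

class RuntimeProxySketch {

    // Hypothetical interface, invented for this sketch.
    interface Greeter {
        String greet(String name);
    }

    // Returns a Greeter whose class is generated at run-time. Every call is
    // recorded in the log and then forwarded to the real target.
    static Greeter loggingGreeter(Greeter target, StringBuilder log) {
        InvocationHandler handler = (proxy, method, args) -> {
            log.append(method.getName()).append(';');
            return method.invoke(target, args);
        };
        return (Greeter) Proxy.newProxyInstance(
                Greeter.class.getClassLoader(),
                new Class<?>[] {Greeter.class},
                handler);
    }
}
```

The class of the returned object does not exist in any source or class file of the application, which can be checked with `Proxy.isProxyClass(greeter.getClass())`.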
The major advantage of code generation at run-time is that it also works when code is loaded dynamically, and that most developers are, at least somewhat, familiar with it. Also, no source code of the input code is required. The major disadvantage is that it is considerably slower than any of the other methods.
So which phase should you use? Code-time and run-time are familiar to most developers. However, both are expensive: code-time in developer hours, both to create and to maintain the code, and run-time in CPU cycles. Compile-time and deployment-time require more knowledge to build the processors, but require fewer developer hours to use and fewer CPU cycles to execute, and are therefore cheaper. Compile-time often allows the developer to inspect the generated source code and more easily understand what it does. Deployment-time does not require source code and can therefore be applied to legacy code or code that you do not own.
If you are familiar with neither annotation processing nor bytecode, I suggest going for code-time, unless you need to apply the code generation to code you don’t write yourself, in which case go for reflection (run-time). If your team has a sufficient knowledge level, I recommend starting with annotation processors for code generation at compile-time. Only when you must generate code based on other code that you do not own do I recommend deployment-time.
These are the four phases I divide code generation into. However, you may have a scenario which does not fit into my categorization, or you may define your phases differently. If you do, please let me know by e-mail or Twitter; I am curious to learn about your views.