Abstractions are tricky to get right. In order to be good, they have to be easy enough for programmers to understand, performant enough for the problem at hand, and their utility must ultimately outweigh the cost of implementing (and maintaining) them.
My definition of a “good abstraction” above is the reason why I don’t like getters. In my opinion, they are a useless abstraction. They introduce additional code that tries to solve a non-existent problem. Sure, it’s trivial code, and it can be auto-generated by your IDE. But it’s additional code. And additional code == additional work. Right?
After a recent debate with my teammate, I decided to measure the overhead introduced by getters in a modern HotSpot JVM running on an x86 CPU. Getters may look innocent, but any compiler person knows that they can translate to fairly complicated CPU instructions. I wanted to see if these complex inner workings would translate to degraded performance.
Getters from a compiler’s perspective
Accessing a field of some object is a very basic operation. It boils down to taking some reference to an object and accessing memory at address `reference + field offset`. This operation is natively supported on most CPU architectures.
Getters wrap field access in a method. Most obviously this introduces a method call. A method is basically a fancy function that receives a pointer to `this` object as its first implicit argument. This means that in order to invoke a method we need to perform all the same steps involved in calling a function: push the return address onto the stack, pass the arguments, branch to the function, execute the instructions, pass the return value, pop the return address from the stack, and branch back to the caller. Quite a ritual.
Functions are a cheap abstraction, but they’re not free. In fact, one of the easiest, yet most impactful optimizations performed by almost all compilers is function inlining.
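To make the idea concrete, here is a minimal sketch of what inlining a trivial getter amounts to. The `InliningSketch` and `Point` classes and both `sum` methods are hypothetical names made up for this illustration. A real compiler performs this transformation on its internal representation rather than on source code.

```java
// Hypothetical illustration: what inlining a trivial getter amounts to.
final class InliningSketch {
    static final class Point {
        private final int x;
        Point(int x) { this.x = x; }
        int getX() { return x; }
    }

    // As written by the programmer: one method call per element.
    static int sumWithCalls(Point[] points) {
        int sum = 0;
        for (Point p : points) {
            sum += p.getX();
        }
        return sum;
    }

    // Roughly what the loop looks like once the getter is inlined:
    // the call is replaced by the getter's body, a plain field read.
    static int sumInlined(Point[] points) {
        int sum = 0;
        for (Point p : points) {
            sum += p.x; // the enclosing class may read a nested class's private field
        }
        return sum;
    }

    public static void main(String[] args) {
        Point[] points = { new Point(1), new Point(2), new Point(3) };
        System.out.println(sumWithCalls(points) == sumInlined(points)); // prints "true"
    }
}
```

Both methods compute the same result; the second one simply has no call left to pay for.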
Finally, we’re working with Java, which means that we’re forced to do virtual method dispatch almost every time we call an instance method. Why? Well, think about it. Almost everything is an object, and every class other than `Object` itself extends another class. When we call `foo.bar()` we have to check what `foo`’s real type is. Maybe it’s not `Foo` which defines the method `bar()`, but a subclass of `Foo` that overrides that method.
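The following sketch shows the situation described above. The classes `Foo` and `SubFoo` and the helper `callBar` are hypothetical and exist only for this illustration.

```java
// Hypothetical classes illustrating virtual dispatch.
final class DispatchSketch {
    static class Foo {
        String bar() { return "Foo.bar"; }
    }

    static class SubFoo extends Foo {
        @Override
        String bar() { return "SubFoo.bar"; }
    }

    // The static type of the parameter is Foo, but the method that runs
    // is chosen by the runtime class of the object passed in.
    static String callBar(Foo foo) {
        return foo.bar();
    }

    public static void main(String[] args) {
        System.out.println(callBar(new Foo()));    // prints "Foo.bar"
        System.out.println(callBar(new SubFoo())); // prints "SubFoo.bar"
    }
}
```

The call site inside `callBar` is the same in both cases, so without further knowledge the target has to be looked up at run time.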
Designing the benchmark
I wanted to design the benchmark in such a way that field accesses happen in isolation. Also, I wanted to let the optimizer have its way with the field access, but I couldn’t allow it to optimize the access away entirely.
With these requirements in mind, I’ve devised the following procedure:
- Create a large array of wrapper objects that store random integers. These wrappers will expose their integer in different ways, allowing us to compare field access strategies.
- Loop over the array and compute the sum of all integers.
- Display the sum.
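The three steps above can be sketched roughly as follows. This is an outline, not the benchmark code itself: the `ProcedureSketch` and `Wrapper` names, the array size, and the fixed random seed are assumptions chosen for the illustration.

```java
import java.util.Random;

// Sketch of the benchmark procedure with a stand-in wrapper class.
final class ProcedureSketch {
    // Stand-in wrapper (hypothetical name) that stores one integer.
    static final class Wrapper {
        final int number;
        Wrapper(int number) { this.number = number; }
    }

    // Step 1: create a large array of wrappers holding random integers.
    // The seed parameter is an assumption that makes runs repeatable.
    static Wrapper[] generate(int size, long seed) {
        Random random = new Random(seed);
        Wrapper[] wrappers = new Wrapper[size];
        for (int i = 0; i < size; i++) {
            wrappers[i] = new Wrapper(random.nextInt());
        }
        return wrappers;
    }

    // Step 2: loop over the array and sum the integers.
    // int overflow wraps around, which is harmless for this purpose.
    static int sum(Wrapper[] wrappers) {
        int sum = 0;
        for (Wrapper wrapper : wrappers) {
            sum += wrapper.number;
        }
        return sum;
    }

    public static void main(String[] args) {
        Wrapper[] wrappers = generate(1_000_000, 42L);
        // Step 3: display the sum, so the result of the loop is actually used.
        System.out.println(sum(wrappers));
    }
}
```

Printing the sum matters: a result that is never used gives the optimizer an excuse to drop the loop altogether.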
These are the wrapper objects that I came up with:
RawWrapper
`RawWrapper` is just a simple object that exposes the number through a `public final` field.
public final class RawWrapper {
    public final int number;
    public RawWrapper(int number) {
        this.number = number;
    }
}
GetterWrapper
`GetterWrapper` exposes the number through a getter, `int getNumber()`.
public final class GetterWrapper {
    private final int number;
    public GetterWrapper(int number) {
        this.number = number;
    }
    public int getNumber() {
        return number;
    }
}
ParentGetterWrapper and ChildGetterWrapper
The last two wrappers were designed with a tricky goal in mind. I’ve read before that the HotSpot compiler can devirtualize method calls, which means that it could make getter accesses just as fast as raw field accesses. I wanted to create a scenario that would prevent the compiler from doing that.
Class `ParentGetterWrapper` is a getter wrapper object just like `GetterWrapper`. However, it’s not declared as `final`, meaning that it can be extended by other classes.
public class ParentGetterWrapper {
    private final int number;
    public ParentGetterWrapper(int number) {
        this.number = number;
    }
    public int getNumber() {
        return number;
    }
}
`ChildGetterWrapper` extends `ParentGetterWrapper`. It inherits the getter from its parent, but now the optimizer should be confused as to where `getNumber()` is supposed to be coming from.
public class ChildGetterWrapper extends ParentGetterWrapper {
    public ChildGetterWrapper(int number) {
        super(number);
    }
}
The wrappers array was populated with a random mix of both kinds of objects.
public static ParentGetterWrapper[] generateVirtualGetterWrappers(int benchmarkSize) {
    ParentGetterWrapper[] wrappers = new ParentGetterWrapper[benchmarkSize];
    Random randomGenerator = new Random();
    for (int i = 0; i < benchmarkSize; i++) {
        int number = randomGenerator.nextInt();
        // Pick the parent or the child class with equal probability.
        int coinToss = randomGenerator.nextInt(2);
        ParentGetterWrapper wrapper;
        if (coinToss == 0) {
            wrapper = new ParentGetterWrapper(number);
        } else {
            wrapper = new ChildGetterWrapper(number);
        }
        wrappers[i] = wrapper;
    }
    return wrappers;
}
Analyzing benchmark output
You can find all the code and all the output in this article’s GitHub repo. All output discussed below is located under `output`.
Output will depend on your JDK, OS, and host CPU architecture. I compiled and ran the benchmark with Temurin OpenJDK 20.0.2 running on a MacBook with an Intel Core i7-9750H.
Bytecode
First, let’s take a look at bytecode that the compiler generated for different benchmark methods.
RawWrapper
This access boiled down to a single `getfield` instruction at byte offset 25 in the `benchmarkRawWrappers` method.
public static int benchmarkRawWrappers(foo.rida.RawWrapper[]);
descriptor: ([Lfoo/rida/RawWrapper;)I
flags: (0x0009) ACC_PUBLIC, ACC_STATIC
Code:
stack=2, locals=6, args_size=1
0: iconst_0
1: istore_1
2: aload_0
3: astore_2
4: aload_2
5: arraylength
6: istore_3
7: iconst_0
8: istore 4
10: iload 4
12: iload_3
13: if_icmpge 36
16: aload_2
17: iload 4
19: aaload
20: astore 5
22: iload_1
23: aload 5
25: getfield #80 // Field foo/rida/RawWrapper.number:I
28: iadd
29: istore_1
30: iinc 4, 1
33: goto 10
36: iload_1
37: ireturn
GetterWrapper and ParentGetterWrapper
The bytecode generated for `benchmarkGetterWrappers` and `benchmarkVirtualGetterWrappers` was identical apart from the class names it references. Unsurprisingly, we got an `invokevirtual` at byte offset 25, which performs virtual method dispatch for `getNumber()`. The actual getter inside `GetterWrapper` and `ParentGetterWrapper` contained a `getfield` instruction.
public static int benchmarkGetterWrappers(foo.rida.GetterWrapper[]);
descriptor: ([Lfoo/rida/GetterWrapper;)I
flags: (0x0009) ACC_PUBLIC, ACC_STATIC
Code:
stack=2, locals=6, args_size=1
0: iconst_0
1: istore_1
2: aload_0
3: astore_2
4: aload_2
5: arraylength
6: istore_3
7: iconst_0
8: istore 4
10: iload 4
12: iload_3
13: if_icmpge 36
16: aload_2
17: iload 4
19: aaload
20: astore 5
22: iload_1
23: aload 5
25: invokevirtual #84 // Method foo/rida/GetterWrapper.getNumber:()I
28: iadd
29: istore_1
30: iinc 4, 1
33: goto 10
36: iload_1
37: ireturn
public int getNumber();
descriptor: ()I
flags: (0x0001) ACC_PUBLIC
Code:
stack=1, locals=1, args_size=1
0: aload_0
1: getfield #7 // Field number:I
4: ireturn
Machine code
Java bytecode may get compiled by two different compilers at runtime. At first, HotSpot will just interpret the bytecode, which is slow, but bearable for infrequently running code. When HotSpot detects that some method (or some hot loop) is running often, it will compile it to machine code using C1. C1 is a compiler that’s designed to compile fast at the cost of making worse optimizations. Then, if the compiled method continues to run often, HotSpot will compile it using C2. C2 is a slower compiler that optimizes code aggressively.
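One way to watch these stages at work is HotSpot's `-XX:+PrintCompilation` flag. The sketch below is only an illustration and is not part of the benchmark code; the class name `HotLoop`, the array size, and the repetition count are arbitrary assumptions. The log format differs between JDK versions, so treat the comments as a rough guide.

```java
// Compile and run with:
//   javac HotLoop.java
//   java -XX:+PrintCompilation HotLoop
// The log typically lists each compiled method together with its
// compilation tier; the highest tier is produced by C2.
final class HotLoop {
    static int sum(int[] values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] values = new int[10_000];
        for (int i = 0; i < values.length; i++) {
            values[i] = i;
        }
        long total = 0;
        // Calling sum() many times makes it "hot", so HotSpot should stop
        // interpreting it and compile it, first with C1 and later with C2.
        for (int i = 0; i < 20_000; i++) {
            total += sum(values);
        }
        // Use the result so the work cannot be discarded.
        System.out.println(total);
    }
}
```

If the log shows `HotLoop::sum` more than once with increasing tier numbers, that is the method being recompiled as it stays hot.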
For brevity I’ll only discuss C2-generated code. I’ll also omit large portions of assembly and focus only on the part where `number` is retrieved from the wrapper and added to the `sum`.
RawWrapper
Field access was predictably compiled down to a `mov` instruction with an offset. Additionally, HotSpot unrolled the loop so that a single iteration processes several wrappers. Optimized assembly is a bit jarring to read, but you can see the 3 `mov`s that load `number` into `eax`, `r11d`, and `r10d`. These registers are then added to `r13d`, which holds the sum. One more `number` is added to `r13d` straight from memory by the first `add`.
mov r10d,DWORD PTR [rbx+r14*4+0x10]
add r13d,DWORD PTR [r12+r10*8+0xc] ; implicit exception: dispatches to 0x000000011721afa8
movsxd rsi,r14d
mov r10d,DWORD PTR [rbx+rsi*4+0x14] ;*aaload {reexecute=0 rethrow=0 return_oop=0}
; - foo.rida.SimpleBenchmark::benchmarkRawWrappers@19 (line 76)
mov eax,DWORD PTR [r12+r10*8+0xc] ; implicit exception: dispatches to 0x000000011721afa8
;*getfield number {reexecute=0 rethrow=0 return_oop=0}
; - foo.rida.SimpleBenchmark::benchmarkRawWrappers@25 (line 77)
mov r10d,DWORD PTR [rbx+rsi*4+0x18] ;*aaload {reexecute=0 rethrow=0 return_oop=0}
; - foo.rida.SimpleBenchmark::benchmarkRawWrappers@19 (line 76)
mov r11d,DWORD PTR [r12+r10*8+0xc] ; implicit exception: dispatches to 0x000000011721afa8
;*getfield number {reexecute=0 rethrow=0 return_oop=0}
; - foo.rida.SimpleBenchmark::benchmarkRawWrappers@25 (line 77)
mov esi,DWORD PTR [rbx+rsi*4+0x1c] ;*aaload {reexecute=0 rethrow=0 return_oop=0}
; - foo.rida.SimpleBenchmark::benchmarkRawWrappers@19 (line 76)
mov r10d,DWORD PTR [r12+rsi*8+0xc] ; implicit exception: dispatches to 0x000000011721afa8
;*getfield number {reexecute=0 rethrow=0 return_oop=0}
; - foo.rida.SimpleBenchmark::benchmarkRawWrappers@25 (line 77)
add r13d,eax
add r13d,r11d
add r13d,r10d ;*iadd {reexecute=0 rethrow=0 return_oop=0}
; - foo.rida.SimpleBenchmark::benchmarkRawWrappers@28 (line 77)
GetterWrapper
The code is… completely identical to `RawWrapper`. HotSpot not only inlined and devirtualized the access, but also applied the same optimization of processing several wrappers in one loop iteration.
ParentGetterWrapper and ChildGetterWrapper
C2 saw right through my charade with phony inheritance and produced the same code as for the other wrappers. It even used the same registers as before, which made it feel surprisingly personal.
Confirming results with microbenchmarks
I’m not a big fan of microbenchmarks, but I wanted to double-check the results just to make sure that I wasn’t misunderstanding the assembly. JMH confirmed that the average times for all three field access approaches were within the margin of error of one another.
Benchmark | Score | Error |
---|---|---|
JmhBenchmark.benchmarkRawWrappers | 311.917ns | ±8.322ns |
JmhBenchmark.benchmarkGetterWrappers | 312.967ns | ±8.474ns |
JmhBenchmark.benchmarkVirtualGetterWrappers | 323.835ns | ±8.074ns |
So there is no overhead at all?
Nope. We always pay a cost for abstractions, whether it’s execution time, compile time, engineer time, or something else. Here are compilation statistics for the benchmark:
Method | Bytecode size | Native size | Compile time |
---|---|---|---|
JmhBenchmark.benchmarkRawWrappers | 38 | 496 | 2ms |
JmhBenchmark.benchmarkGetterWrappers | 38 | 496 | 2ms |
JmhBenchmark.benchmarkVirtualGetterWrappers | 38 | 512 | 3ms |
Notice that C2 generated more code for `benchmarkVirtualGetterWrappers` and spent more time doing it? What happened?
`benchmarkVirtualGetterWrappers` receives an array of `ParentGetterWrapper` objects, which can actually contain `ChildGetterWrapper` instances. Both of these classes use the same implementation of `getNumber()`, so the compiler figured it could cheat by inserting a hard reference to that method instead of performing virtual dispatch each time. It also later figured that it could inline that method, resulting in code identical to raw field access.
However, the JVM cannot guarantee that it will never see an object passed to `benchmarkVirtualGetterWrappers` whose class overrides `getNumber()`. What will it do then? It cannot execute incorrect code for the sake of speed.
HotSpot solves this by inserting “uncommon traps” into generated assembly. If code that violates the optimizer’s view of the world (uncommon code) enters an optimized method, it will trigger this “trap” and execution will continue from the unoptimized, but correct, bytecode. In our case, the uncommon trap generated by C2 transfers execution back to an address that contains virtual dispatch of `getNumber()`.
call 0x000000011c906180 ; ImmutableOopMap {}
;*invokevirtual getNumber {reexecute=0 rethrow=0 return_oop=0}
; - foo.rida.SimpleBenchmark::benchmarkVirtualGetterWrappers@25 (line 93)
; {runtime_call UncommonTrapBlob}
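The scenario the trap guards against can be sketched like this. All names here (`LateOverrideSketch`, `Parent`, `Child`, `Negating`) are hypothetical, and whether a particular JVM really optimizes and then deoptimizes this exact program is an assumption. What the sketch does show is the guarantee: the result stays correct once an overriding subclass appears.

```java
// Hypothetical scenario: an overriding subclass shows up after the loop got hot.
final class LateOverrideSketch {
    static class Parent {
        private final int number;
        Parent(int number) { this.number = number; }
        int getNumber() { return number; }
    }

    // Inherits getNumber() unchanged, like ChildGetterWrapper.
    static class Child extends Parent {
        Child(int number) { super(number); }
    }

    // Overrides getNumber(); not used until late in main().
    static class Negating extends Parent {
        Negating(int number) { super(number); }
        @Override
        int getNumber() { return -super.getNumber(); }
    }

    static int sum(Parent[] wrappers) {
        int sum = 0;
        for (Parent wrapper : wrappers) {
            sum += wrapper.getNumber();
        }
        return sum;
    }

    public static void main(String[] args) {
        Parent[] wrappers = new Parent[1_000];
        for (int i = 0; i < wrappers.length; i++) {
            wrappers[i] = (i % 2 == 0) ? new Parent(1) : new Child(1);
        }
        // Warm-up: so far only one implementation of getNumber() is in use.
        int checksum = 0;
        for (int i = 0; i < 50_000; i++) {
            checksum += sum(wrappers);
        }
        System.out.println(checksum);
        // An overriding class appears: the "single implementation"
        // assumption no longer holds, yet the answer must be right.
        wrappers[0] = new Negating(1);
        System.out.println(sum(wrappers)); // prints 998
    }
}
```

Running this with `-XX:+PrintCompilation` may show `sum` being marked as "made not entrant" around the time `Negating` first appears, which is the optimized code being thrown away.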
Why didn’t HotSpot generate uncommon traps for `benchmarkGetterWrappers`? Since `GetterWrapper` was declared as `final`, HotSpot reasonably concluded that no other implementation of `getNumber()` will ever exist.
Am I going to start using getters now?
I still think that the only good reason to use getters is to follow an existing project’s convention. But convincing myself to use getters wasn’t my goal.
My goal was to explore and share how getters are handled by a modern HotSpot JVM. Hopefully, you’ve now seen that even things that seem trivial to the programmer can be fairly complex under the hood. Oracle hired the best engineers to make HotSpot an excellent tool. But that’s all it is. A tool. And a tool can never reason about the program better than you, the programmer. It’s your responsibility to master your tool and make informed decisions on how to use it well.