Virtual methods in C++ - smalahov.com

Published: 7 July 2025 | By Sergey Malahov

Virtual methods in C++ are well-known and regular, but their machinery is hidden from a regular developer. Most of us write virtual methods having the documentation in mind without knowing the internals which is okay until you get curious and want some fun. So here it is.

The aim of the article is to explore internals of C++ dynamic dispatch mechanism. The purpose, use cases and syntax details of virtual methods are out of the scope. Code examples are artificial and use simplified syntax.

The tools used:

Ubuntu clang version 18.1.3 is used to compile C++ sources, compile options --std=c++17 -g -O0 -fomit-frame-pointer -fno-pic -static
GNU gdb (Ubuntu 15.0.50.20240403-0ubuntu1) is used to analyze assembler code and process memory

Some basics

A regular method call works simple: the compiler knows the object's type at compile time and can use direct static address of the method for the call. Assume we have the following code:

simple_call.cpp

struct Base {
    int Sum(int a, int b) {
        return a + b;
    }
};

int main() {
    Base b;
    return b.Sum(1, 2);
}

After compilation, we can find the call to the regular method Sum in assembler code of main:

asm

Dump of assembler code for function main():
0x0000000000401110 <+0>: push   %rax
0x0000000000401111 <+1>: movl   $0x0,0x4(%rsp)
0x0000000000401119 <+9>: lea    0x3(%rsp),%rdi
0x000000000040111e <+14>: mov    $0x1,%esi
0x0000000000401123 <+19>: mov    $0x2,%edx
0x0000000000401128 <+24>: call   0x401130 <_ZN4Base3SumEii>
0x000000000040112d <+29>: pop    %rcx
0x000000000040112e <+30>: ret
End of assembler dump.

The actual call instruction is call 0x401130 and its target address is static and is part of the instruction opcode, this instruction always calls the same method (same address) and may change only after recompilation of the binary. The arguments 0x01 and 0x02 are passed via %esi and %edx registers. There is one additional argument being passed to the call via %rdi - the address of the object, is actually passed to the method to be used as this to access object non-static data. This is hidden from us for convenience when we write C++ code but under the hood this is passed as a regular argument. We will need this later. Another thing to mention is the size of the objects of class Base:

gdb

(gdb) print sizeof(Base)
$1 = 1

This is regular for an empty object.

Virtual method calls work a bit differently. Let's gradually dive into this magic.

Let's modify our Base making Sum virtual, so the derived classes can override the logic of this method:

virtual_call.cpp

struct Base {
    virtual int Sum(int a, int b) {
        return a + b;
    }
};

struct Derived: Base {
    int Sum(int a, int b) override {
        return a * 2 + b;
    }
};

int main() {
    Derived d;
    return d.Sum(1, 2);
}

Disassembled main changes a bit but not much:

asm

Dump of assembler code for function main():
0x0000000000401110 <+0>: sub    $0x18,%rsp
0x0000000000401114 <+4>: movl   $0x0,0x14(%rsp)
0x000000000040111c <+12>: lea    0x8(%rsp),%rdi
0x0000000000401121 <+17>: call   0x401140 <_ZN7DerivedC2Ev>
0x0000000000401126 <+22>: lea    0x8(%rsp),%rdi
0x000000000040112b <+27>: mov    $0x1,%esi
0x0000000000401130 <+32>: mov    $0x2,%edx
0x0000000000401135 <+37>: call   0x401170 <_ZN7Derived3SumEii>
0x000000000040113a <+42>: add    $0x18,%rsp
0x000000000040113e <+46>: ret
End of assembler dump.

The call itself still uses an absolute address, so nothing has changed here. A bit unexpected but the explanation is simple: the compiler still can compute the call address during compile time, and if it can use it - why not? In fact, we create an instance of Derived d and use this object (of a known type Derived) in the call d.Sum(1, 2). There is no reason for the compiler to do anything more than a direct call. So it does.

But one change still can be named even in this trivial example: a new call call 0x401140 <_ZN7DerivedC2Ev>, and that is a call of default Derived constructor. For some reason the compiler decided to insert a constructor call for the class with no data fields! It must be initializing something? Let's check the size of the Derived:

gdb

(gdb) print sizeof(Derived)
$1 = 8

It has something in there 8 bytes length (the object is not empty anymore as empty objects have 1 byte size). And obviously, this is what gets initialized and that is why the constructor is called. Let's step into its assembler code:

asm

Dump of assembler code for function _ZN7DerivedC2Ev:
0x0000000000401140 <+0>: sub    $0x18,%rsp
0x0000000000401144 <+4>: mov    %rdi,0x10(%rsp)
0x0000000000401149 <+9>: mov    0x10(%rsp),%rdi
0x000000000040114e <+14>: mov    %rdi,0x8(%rsp)
0x0000000000401153 <+19>: call   0x401190 <_ZN4BaseC2Ev>
0x0000000000401158 <+24>: mov    0x8(%rsp),%rax
0x000000000040115d <+29>: lea    0x2c5c(%rip),%rcx        # 0x403dc0 <_ZTV7Derived>
0x0000000000401164 <+36>: add    $0x10,%rcx
0x0000000000401168 <+40>: mov    %rcx,(%rax)
0x000000000040116b <+43>: add    $0x18,%rsp
0x000000000040116f <+47>: ret
End of assembler dump.

We see base constructor call call 0x401190 <_ZN4BaseC2Ev> which is expected, but also the _ZTV7Derived symbol address stored somewhere with lea 0x2c5c(%rip),%rcx # 0x403dc0 <_ZTV7Derived>. Tracking down the path of this symbol we find that:

its address is stored in %rcx register via lea instruction
the value of %rcx (the address of _ZTV7Derived) is increased by 0x10 (16 bytes or two 8-byte pointers);
the value of %rcx is stored at the memory location pointed to by %rax using mov %rcx,(%rax)
in its turn %rax stores the value from memory at 0x8(%rsp) - from a variable of the current stack frame (%rsp is stack pointer register) with 0x8 offset

...and that is where the value from %rdi is stored at the beginning of the constructor call using instruction mov %rdi,0x8(%rsp). As shown above, %rdi is used to pass the actual object address this!

So, the strange 8-bytes hidden field of the object is initialized with some _ZTV7Derived address with a constant offset 0x10. We can get some info about the symbol using its address:

gdb

(gdb) info symbol 0x403dc0
vtable for Derived in section .data.rel.ro of a.out

Getting clearer now, the address loaded into the hidden field inside the Derived instance is the address of a record in VTable (with a small offset) created by the compiler for our class. Every object of class Derived has a pointer to this record.

bash

user@host:~$ objdump -t a.out | grep _ZTV7Derived
0000000000403dc0  w    O .data.rel.ro	0000000000000018              _ZTV7Derived

The table itself has the following data inside:

gdb

(gdb) x/16xg 0x403dc0
                                      .------ VTable start for Derived
                                      |
0x403dc0 <_ZTV7Derived>:        0x0000000000000000      0x0000000000403de8 ------- TI pointer -----.
                                                                                                   |
                                      .-------- This is where the hidden field points to           |
                                      |                                                            |
0x403dd0 <_ZTV7Derived+16>:     0x0000000000401170      0x00007ffff7e6dd28                         |
                                                                                                   |
0x403de0 <_ZTI4Base+8>:         0x000000000040200d      0x00007ffff7e6ebe8 <-----------------------'
0x403df0 <_ZTI7Derived+8>:      0x0000000000402004      0x0000000000403dd8
0x403e00 <_ZTV4Base>:           0x0000000000000000      0x0000000000403dd8
0x403e10 <_ZTV4Base+16>:        0x00000000004011b0      0x0000000000000001
0x403e20:                       0x0000000000000071      0x0000000000000001
0x403e30:                       0x0000000000000080      0x0000000000000001

The value actually stored in the hidden Derived instance field is 0x403dd0 (16 offset), easy to check it (after constructor exists):

gdb

(gdb) print/x *(unsigned long long*)&d
$2 = 0x403dd0

So the hidden field is a VMT (virtual methods table) pointer and points to 0x401170 - the very same address that was used as static address for direct call to Derived::Sum (call 0x401170 <_ZN7Derived3SumEii>). This means that the first record in VMT is the address of the first virtual method Sum.

Another pointer in VTable is worth mentioning: TI pointer (see the picture above), which stands for Type Information and actually points to RTTI (Run-Time Type Information) for the class

The call

VMT info seems redundant, why would the compiler save the address of the method (and, moreover, increase the sizes of all instances) if it can call it directly and actually does so? To find the answer, we need to modify the code a bit, changing the way we refer to the object and its methods:

virtual_ref_call.cpp

struct Base {
    virtual int Sum(int a, int b) {
        return a + b;
    }
};

struct Derived: Base {
    int Sum(int a, int b) override {
        return a * 2 + b;
    }
};

int main() {
    Derived d;
    Base& b = d;
    return b.Sum(1, 2);
}

The only change made is that a reference to Base& is used now to call Sum method trying to fool the compiler so it doesn't know the object type. And according to the assembler code the attempt succeeded:

asm

Dump of assembler code for function main():
0x0000000000401110 <+0>: sub    $0x18,%rsp
0x0000000000401114 <+4>: movl   $0x0,0x14(%rsp)
0x000000000040111c <+12>: lea    0x8(%rsp),%rdi
0x0000000000401121 <+17>: call   0x401150 <_ZN7DerivedC2Ev>
0x0000000000401126 <+22>: lea    0x8(%rsp),%rax
0x000000000040112b <+27>: mov    %rax,(%rsp)
0x000000000040112f <+31>: mov    (%rsp),%rdi
0x0000000000401133 <+35>: mov    (%rdi),%rax
0x0000000000401136 <+38>: mov    $0x1,%esi
0x000000000040113b <+43>: mov    $0x2,%edx
0x0000000000401140 <+48>: call   *(%rax)
0x0000000000401142 <+50>: add    $0x18,%rsp
0x0000000000401146 <+54>: ret
End of assembler dump.

There is no direct call to Sum anymore, it is replaced with indirect call call *(%rax). That is, it calls whatever address is stored in dereferenced %rax at the moment of instruction execution. And this is the key principle of virtual method dispatch, the actual code executed as a result of such call is defined dynamically. %rax content at the moment of call is defined by the next operations:

%rdi (the this register) is initialized at the beginning of main via lea 0x8(%rsp),%rdi and retains its value
%rax is loaded a few lines before call in mov (%rdi),%rax with dereferenced value of %rdi, which means the dereferenced value of this - first record in VMT for class Derived

gdb

(gdb) print/x $pc
$1 = 0x401140
(gdb) p/x $rax
$2 = 0x403dd0
(gdb) info symbol 0x403dd0
vtable for Derived + 16 in section .data.rel.ro of a.out

So in fact Derived::Sum is called via VMT record even if its called via reference to base class Base, that is what virtual stuff exists for. The effective behavior of the object is defined by its actual type and not the way it is referred to. Any reference or pointer to any base sub-object works. Even in the case of the so-called diamond inheritance the mechanism will find a way to get the proper method address and call it

To summarize:

each instance of a class with virtual methods has a hidden VMT pointer to corresponding record in VMT created by the compiler at compilation time
VMT has a record for each virtual method of the class with the addresses of actual method implementation. Overriding a method actually means storing its address in the class's VMT and VMTs of all descendants
calls of virtual methods are indirect via VMT records of the class
compiler still can use direct calls for virtual methods when possible for optimization

Construction and destruction

Previously in the assembler code for Derived constructor, Base constructor was called. What is important is that this call is done before Derived class sets up its VMT pointer. This is because Base wants to setup its own VMT pointer to its own VMT so Base class instances (created directly using Base constructor) use Base virtual methods:

asm

Dump of assembler code for function _ZN4BaseC2Ev:
0x0000000000401180 <+0>: mov    %rdi,-0x8(%rsp)
0x0000000000401185 <+5>: mov    -0x8(%rsp),%rax
0x000000000040118a <+10>: lea    0x2c6f(%rip),%rcx        # 0x403e00 <_ZTV4Base>
0x0000000000401191 <+17>: add    $0x10,%rcx
0x0000000000401195 <+21>: mov    %rcx,(%rax)
0x0000000000401198 <+24>: ret
End of assembler dump.

Base constructor uses the same mechanism Derived does (as shown before) to set up its VMT pointer. What about destruction? Same but in the opposite direction. To illustrate this, the code needs to be modified a bit to force the compiler to call the destructors. Additionally, call to virtual Sum is added into the destructors to force the compiler to add some virtual-related code there:

virtual_dtor_call.cpp

struct Base {
    virtual int Sum(int a, int b) {
        return a + b;
    }

    ~Base() {
        Sum(1,1);
    }
};

struct Derived: Base {
    int Sum(int a, int b) override {
        return a * 2 + b;
    }

    ~Derived() {
        Sum(1,1);
    }
};

int main() {
    Derived d;
    return d.Sum(1, 2);
}

Compacted asm code for Derived destructor:

asm

Dump of assembler code for function _ZN7DerivedD2Ev:
 . . .
0x0000000000401233 <+19>: lea    0x2b56(%rip),%rax        # 0x403d90 <_ZTV7Derived+16>
0x000000000040123a <+26>: mov    %rax,(%rdi)
 . . .
0x0000000000401243 <+35>: mov    $0x1,%edx
0x0000000000401248 <+40>: mov    %edx,%esi
0x000000000040124a <+42>: call   *%rax
 . . .
0x0000000000401256 <+54>: call   0x4012c0 <_ZN4BaseD2Ev>
 . . .
End of assembler dump.

For Base class:

asm

Dump of assembler code for function _ZN4BaseD2Ev:
 . . .
0x00000000004012c9 <+9>: lea    0x2b00(%rip),%rax        # 0x403dd0 <_ZTV4Base+16>
0x00000000004012d0 <+16>: mov    %rax,(%rdi)
 . . .
0x00000000004012d9 <+25>: mov    $0x1,%edx
0x00000000004012de <+30>: mov    %edx,%esi
0x00000000004012e0 <+32>: call   *%rax
 . . .
End of assembler dump.

A common pattern here, similar to what we have seen in constructors: each destructors reloads object's VMT pointer according to its class. This fact leads us to an important conclusion: construction and destruction code is always executed in the context of current class's VMT address (and not the most derived). A kind of a lifeline of the object's VMT pointer can be drawn as that:

code

Method     | <-- Creation --> | <-- Object usage --> | <-- Destruction --> |   VMT pointer
           |  Base  | Derived |                      |  Derived  |  Base   |
                              .                      .
Base        ────────┐         .                      .           ┌─────────   <_ZTV4Base+16>`
virtual             │         .                      .           │
                    │         .                      .           │
Derived             └────────────────────────────────────────────┘            <_ZTV7Derived+16>
override                      .                      .

This may seem a bit strange at first glance and still makes sense: imagine Base::~Base() using VMT pointer of Derived. Calls to virtual methods during destruction would call overridden implementations (Derived::Sum() for example). But overridden methods of Derived were written supposing that all the Derived resources (variables, dynamically allocated memory or whatever was created in Derived constructor) are available. But this is not true during Base destructor execution as it is executed after Derived destructor (that could have destroyed something required for normal execution of Derived methods)! So the only safe way for Base destructor to execute virtual methods is to execute its own implementations (or any of its parents, their destruction code is not executed yet). And that is what is done via VMT pointer manipulations. Same logic applies to the constructors but in the opposite direction: Base constructor just can't safely call Derived implementations as Derived data may not be ready by that time. More explanation here.

Btw, this doesn't completely disable the virtual mechanism during construction or destruction, in the code above the calls to Sum() are still indirect using value from VMT. If we had a virtual method in Base that was not overridden in Derived, Derived destructor would call Base implementation of the method via VMT. This guarantees that the most appropriate and at the same time safe overridden version is called. We can say that during these special periods of the object's lifetime we use limited virtual calls, not deeper than the current sub-object's class being constructed/destructed.

This VMT manipulation may cause abstract method (if any) calls during construction or destruction. Oups...

Not only about call address

Ok, now we understand a bit more about how the appropriate code is selected to be executed on a virtual call. But what's about this for that call? Does the pointer to the object need to be somehow adjusted? Let's take a look at the following code example:

layout.cpp

struct Base {
    long long A = 0x0B0;
    virtual int Sum(int a) {
        return A + a;
    }
};

struct Derived: Base {
    long long B = 0x0D0;
};

struct MoreDerived: Derived {
    long long C = 0x0D1;

    int Sum(int a) override {
        return A + B + C + a;
    }
};

int main() {
    return 0;
}

Here is how memory layout of MoreDerived will look like:

code

.----------------------------------------------------------.
| MoreDerived::VMT | Base::A | Derived::B | MoreDerived::C |
'----------------------------------------------------------'
^                  ^         ^            ^
this               this+8    this+16      this+24

After creation of an instance of MoreDerived the pointer this points at the beginning of the object memory layout. This pointer is suitable for all methods declared in MoreDerived because MoreDerived methods were generated by the compiler keeping in mind the offsets of A, B, C which never change for all objects of type MoreDerived. The same this value is suitable for all methods declared in Derived, because Derived layout is also known at compilation time and is part of MoreDerived layout (it is a sub-object of the most derived object of class MoreDerived). If we decided to create an instance of Derived, we would get the next layout:

code

.-----------------------------------------.
| Derived::VMT     | Base::A | Derived::B |
'-----------------------------------------'
^                  ^         ^
this               this+8    this+16

Obviously the offset of Base::A and Derived::B fields are the same for both layout mentioned above. It's a result of the strict layout rules of non-static data fields inside objects/sub-objects: for every given class (MoreDerived), all parent's classes (Base, Derived) fields are located strictly before the classes fields (&Base::A < &MoreDerived::C, &Derived::B < &MoreDerived::C, &Base::A < &Derived::B). So wherever sub-object of class MoreDerived is met in any object memory layout of any descendant of MoreDerived, it's field guaranteed to be prepended by the parents data fields with the same offsets in the same order. The same this in this case fits to call any method of the class and base classes. Trivial but must be mentioned.

The picture changes significantly when we have multiple inheritance. Assume we have the following code:

multiple.cpp

struct Base1 {
    long int B1 = 0xB1;
    virtual long int Sum1(int a) {
        return B1 + a;
    }
};

struct Base2 {
    long int B2 = 0xB2;
    virtual long int Sum2(int a) {
        return B2 + a;
    }
};

struct Derived: Base1, Base2 {
    long int D = 0x0D0;
    long int Sum2(int a) override {
        return D + B2 + a;
    }
};

int main() {
    Derived d;
    Base2& db = d;
    return d.Sum2(1) + db.Sum2(2);
}

Applying the described above to the layout of class Derived is not possible anymore; the this pointer is not common in this case. Let's review memory layouts of the classes:

code

Derived
.---------------------------------------------------.
| Derived::VMT | Base1::B1 | Base2::B2 | Derived::D |
'---------------------------------------------------'
^                ^           ^           ^
this             this+8      this+16     this+24

Base1
.------------------------.
| Base1::VMT | Base1::B1 |
'------------------------'
^              ^
this           this+8

Base2
.------------------------.
| Base2::VMT | Base2::B2 |
'------------------------'
^              ^
this           this+8

Here the field Base2::B2 in Derived layout has offset 16 while the same field in Base2 layout has offset 8, which means that casting Derived to Base2 will have to change this to be able to call Base2 methods properly. And it does change:

gdb

(gdb) p/x &d
$1 = 0x7fffffffdcb8
(gdb) p/x &db
$2 = 0x7fffffffdcc8

According to what we know so far about virtual calls, the code above cannot work properly. The line return d.Sum2(1) + db.Sum2(2) calls the same implementation of Sum2 (overridden in Derived) but uses different this.

Let's figure this out. d memory layout:

gdb

(gdb) x/8xg &d
0x7fffffffdcb8: 0x0000000000403d68      0x00000000000000b1
0x7fffffffdcc8: 0x0000000000403d88      0x00000000000000b2
0x7fffffffdcd8: 0x00000000000000d0      0x00000000ffffdd80
0x7fffffffdce8: 0x00007ffff782a1ca      0x0000000000000008
(gdb) p sizeof(d)
$2 = 40
(gdb)

Here the data fields with values 0xb1, 0xb2 and 0xd0 are in their expected places, VMT pointer 0x0000000000403d68 in the begin of the layout as expected, but also an unexpected value 0x0000000000403d88 that precedes Base2 field 0xb2:

gdb

(gdb) info symbol 0x0000000000403d88
vtable for Derived + 48 in section .data.rel.ro of a.out

This unexpected value is also a vtable pointer pointing somewhere in Derived vtable. And this is the beginning of the Base2 memory layout inside Derived memory layout (see address):

gdb

(gdb) p/x &db
$6 = 0x7fffffffdcc8

So value stored at 0x7fffffffdcc8 will be used as VMT pointer when calling virtual methods of Base2, redirecting the calls to the real method implementation in Derived. Very similar to what we have seen before but... the Base2 sub-object has its own VMT pointer!

Moving along, examining the pointer:

gdb

(gdb) x/xg 0x0000000000403d88
0x403d88 <_ZTV7Derived+48>:     0x0000000000401270
(gdb) x/xg 0x0000000000401270
0x401270 <_ZThn16_N7Derived4Sum2Ei>:    0x247489f8247c8948
(gdb) info symbol 0x0000000000401270
non-virtual thunk to Derived::Sum2(int) in section .text of a.out

This is the so-called thunk, a piece of code that adjusts this pointer to a proper value for the virtual call:

asm

(gdb) disassemble 0x0000000000401270
Dump of assembler code for function _ZThn16_N7Derived4Sum2Ei:
0x0000000000401270 <+0>:     mov    %rdi,-0x8(%rsp)
0x0000000000401275 <+5>:     mov    %esi,-0xc(%rsp)
0x0000000000401279 <+9>:     mov    -0x8(%rsp),%rdi
0x000000000040127e <+14>:    add    $0xfffffffffffffff0,%rdi
0x0000000000401282 <+18>:    mov    -0xc(%rsp),%esi
0x0000000000401286 <+22>:    jmp    0x4011d0 <_ZN7Derived4Sum2Ei>
End of assembler dump.

Since the address of this piece of code is stored in VMT, it is called instead of actual method. After the adjustment is done the overridden Derived method Sum2 is called using jmp 0x4011d0 <_ZN7Derived4Sum2Ei>.

The adjustment is done in add $0xfffffffffffffff0,%rdi, which means that %rdi (this) is shifted -16 bytes:

gdb

(gdb) p (long long)0xfffffffffffffff0
$10 = -16

And that is exactly the distance between the actual Derived object's this and Base2 sub-object (see memory layout above)! If we take a closer look at Derived VMT we can notice that the same method Sum2 has actually 2 records there:

gdb

(gdb) x/8xg 0x0000000000403d68
                                .---- VMT for Sum1      .---- VMT for Sum2
                                |                       |
0x403d68 <_ZTV7Derived+16>:     0x0000000000401250      0x00000000004011d0
0x403d78 <_ZTV7Derived+32>:     0xfffffffffffffff0      0x0000000000403db0
                                .---- thunk for Sum2
                                |
0x403d88 <_ZTV7Derived+48>:     0x0000000000401270      0x00007ffff7e6dd28
0x403d98 <_ZTI5Base1+8>:        0x000000000040200d      0x00007ffff7e6dd28

One of the records points directly to Derived::Sum2 implementation at 0x00000000004011d0, another one points to the thunk 0x0000000000401270 which after this adjustment calls the same Derived::Sum2 implementation at the same address (jmp 0x4011d0 <_ZN7Derived4Sum2Ei>). Same call of the same address, one is done directly while the other after some preparations.

This way C++ compiler prepares for a virtual call and guarantees that:

It doesn't matter where in classes hierarchy the implementation of a virtual method is located, and how the object is referred to, the call will always receive the correct this value.

Conclusion

Virtuals are fun... and more is to come!