Go interfaces deep dive

Jul 23, 2018, by Mark Tully

Empty boxes. He really should flatten these… so they can be recycled.

Diving into some Golang interface internals

I was recently doing some memory optimisations on a Go program which contained some large slices of interface references and I thought it would be good to understand exactly what was going on with them. In my case, there were 10,000 slices, each holding 5,120 interface references.

How much memory is that, and where is it being spent?

The interface struct

In Go, an interface reference is a pair of pointers. The first pointer points to the type information for whatever value has been stored (for a non-empty interface, an itab that pairs the type with the interface’s method table), and the second pointer points to the value itself.

Here’s the definition of iface from the Go runtime:

type iface struct {
        tab  *itab
        data unsafe.Pointer
}

On a 64-bit architecture a pointer is 8 bytes, so an interface reference is 16 bytes in size.

So an individual slice of 5,120 interface references costs 5,120 * 16 bytes = 80KiB. But that’s not the whole story: what happens when we actually store some values in it?

Storing simple values and dumping them out

When you put a simple value, like a number, into an interface, it has to be pointed to by the data member of the iface struct. That means the value needs to be allocated on the heap.

Using the unsafe package, it is possible to print out the contents of the iface structs as differently sized values are assigned, which is a good way of confirming our understanding of what’s going on.

First, let’s assume we have a simple slice of interface{} that we can use to store references to any value we like.

a := make([]interface{},5)

Like interfaces, slices also have a definition which we can access through cunning use of the unsafe package. The slice header is defined in the reflect package as reflect.SliceHeader. Its definition is:

type SliceHeader struct {
	Data uintptr
	Len  int
	Cap  int
}

Armed with this knowledge, we can get to the data pointer within the slice and dump the values out as hex to see what is going on, a bit like a poor man’s memory dumping tool.

The Data field is a uintptr, an integer type large enough to hold a pointer on the target architecture, i.e. 4 bytes on a 32-bit architecture or 8 bytes on a 64-bit one.

As our slice is a slice of 5 interface references, this array of uintptrs will actually be an array of 5 iface structures, each of which has a size equivalent to two uintptrs, meaning we have a total of 10 uintptrs in this Data array.

The following snippet will extract the 4th uintptr from the Data member of the slice, which will therefore be the data field of the 2nd iface struct which has been stored in the slice.

var _uv uintptr
i := 3
sliceHeader := (*reflect.SliceHeader)(unsafe.Pointer(&a))
dumpedVal := *(*uintptr)(unsafe.Pointer(sliceHeader.Data + unsafe.Sizeof(_uv) * uintptr(i)))
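
The same field can also be reached without manual pointer arithmetic, by reinterpreting one slice element as a two-word struct. This is only a sketch: ifaceWords and wordsOf are hypothetical names, and the layout they assume is a runtime internal rather than a public API.

```go
package main

import (
	"fmt"
	"unsafe"
)

// ifaceWords mirrors the two-word layout described above. It is an
// assumption about a runtime internal, not a public API.
type ifaceWords struct {
	typ  unsafe.Pointer
	data unsafe.Pointer
}

// wordsOf (hypothetical helper) reinterprets an interface{} as its two words.
func wordsOf(v *interface{}) ifaceWords {
	return *(*ifaceWords)(unsafe.Pointer(v))
}

func main() {
	a := make([]interface{}, 5)
	a[1] = uint16(0x1234)
	w := wordsOf(&a[1])
	// The data word points at the stored uint16.
	fmt.Printf("typ=%p data=%p value=0x%x\n", w.typ, w.data, *(*uint16)(w.data))
}
```

Here wordsOf(&a[1]).data plays the role of the 4th uintptr extracted by the snippet above.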

Ok, armed with this trick, we can assign some test values into the slice of interface{} then dump the contents of the slice to see what is going on.

Here’s the complete program, which is also on the Go playground:

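package main

import (
	"fmt"
	"reflect"
	"unsafe"
)

// _uv exists only so that we can take the size of a uintptr.
var _uv uintptr

func main() {
	fmt.Println("Hello, playground")
	a := make([]interface{}, 5)
	a[0] = bool(true)
	a[1] = bool(false)
	a[2] = uint16(0x1234)
	a[3] = float64(1.23)
	a[4] = string("test")
	// View the slice through its header to reach the backing array.
	sliceHeader := (*reflect.SliceHeader)(unsafe.Pointer(&a))

	fmt.Printf("size of uintptr = %d\n", unsafe.Sizeof(_uv))
	// 5 iface structs of 2 words each give 10 uintptrs to dump.
	for i := 0; i < 10; i++ {
		dumpedVal := *(*uintptr)(unsafe.Pointer(sliceHeader.Data + unsafe.Sizeof(_uv)*uintptr(i)))
		fmt.Printf("offset 0x%x : 0x%016x", i, dumpedVal)
		// Odd offsets are data pointers: print the 8 bytes they point to.
		if i&1 != 0 && dumpedVal != 0 {
			deref := *(*uint64)(unsafe.Pointer(dumpedVal))
			fmt.Printf(" (*%p = 0x%016x)", unsafe.Pointer(dumpedVal), deref)
			// The last value is a string header. Its low 32 bits are the
			// buffer pointer (this assumes 32-bit pointers, as on the
			// playground), so follow that pointer as well.
			if i == 9 {
				dumpedVal = (uintptr)(deref & 0xffffffff)
				deref = *(*uint64)(unsafe.Pointer(dumpedVal))
				fmt.Printf(" (*%p = 0x%016x)", unsafe.Pointer(dumpedVal), deref)
			}
		}
		fmt.Printf("\n")
	}
}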

This outputs the following:

Hello, playground
size of uintptr = 4
offset 0x0 : 0x00000000000edde0
offset 0x1 : 0x00000000001738e1 (*0x1738e1 = 0x0807060504030201)
offset 0x2 : 0x00000000000edde0
offset 0x3 : 0x00000000001738e0 (*0x1738e0 = 0x0706050403020100)
offset 0x4 : 0x00000000000ee720
offset 0x5 : 0x0000000000127ee8 (*0x127ee8 = 0xfffffffe00001234)
offset 0x6 : 0x00000000000edee0
offset 0x7 : 0x00000000001280a8 (*0x1280a8 = 0x3ff3ae147ae147ae)
offset 0x8 : 0x00000000000ee620
offset 0x9 : 0x00000000001280b0 (*0x1280b0 = 0x00000004000fe380) (*0xfe380 = 0x6575727474736574)

Let’s break that down. First, I print the size of a uintptr so that I know what architecture the Go playground is running on, which in this case is 32-bit.

I then looped through the 10 uintptrs in the slice (remember, 5 ifaces, each of which has 2 pointers, so 10 pointers in total) and printed each value. In the output, each even numbered “offset” corresponds to the tab member of an iface, and each odd numbered “offset” corresponds to the data member of the same iface.
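
The even/odd rule can be written down as a small function. This is just an illustration of the mapping described above; fieldAt is a hypothetical helper, not part of the original program.

```go
package main

import "fmt"

// fieldAt (hypothetical helper) maps a word offset in the dump to the slice
// element it belongs to and the iface field it represents.
func fieldAt(offset int) (elem int, field string) {
	if offset%2 == 0 {
		return offset / 2, "tab"
	}
	return offset / 2, "data"
}

func main() {
	for off := 0; off < 10; off++ {
		elem, field := fieldAt(off)
		fmt.Printf("offset 0x%x -> a[%d].%s\n", off, elem, field)
	}
}
```

For example, offset 0x3 maps to a[1].data and offset 0x8 maps to a[4].tab.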

So the first value in the slice was assigned with a[0] = bool(true), so we expect an iface with a pointer to the bool type, and a data value pointing to a piece of memory containing a boolean true value (i.e. 0x1, as bool is 1 byte in Go).

Looking at the output, “offset 0x0” has the value 0x00000000000edde0, which therefore must be a type pointer to the bool type. This is followed by a pointer 0x00000000001738e1, which is our pointer to a boolean value that we expect to be true (0x1). For these data values, I also dereference the memory address and print the 8 bytes I find there, so we can see what the data field is actually pointing to. In this case it prints: (*0x1738e1 = 0x0807060504030201). The 8 bytes are printed as a single little-endian number, so the rightmost byte, 0x01, is the one stored at address 0x1738e1 itself, i.e. a value of true. No surprises so far.

Following the rest of the output, we see that a[1] = bool(false) results in an iface.tab value of 0x00000000000edde0 (the same type pointer as element 0, as again it’s a bool). The dereferenced data is (*0x1738e0 = 0x0706050403020100), so this time it’s pointing to a byte value of 0x00, i.e. false.

a[2] = uint16(0x1234) results in (*0x127ee8 = 0xfffffffe00001234). The low 16 bits are 0x1234 as we expect; the remaining bytes are simply whatever follows the 2-byte value in memory.

a[3] = float64(1.23) results in (*0x1280a8 = 0x3ff3ae147ae147ae). If you take the value 0x3ff3ae147ae147ae and decode it as an IEEE 754 double precision floating point value, for example with an online converter, you will see that it holds our float value of 1.23 as expected.
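
Both decodings can also be checked in Go itself rather than by hand. The sketch below feeds the dumped words from the output above into two small helpers; lowUint16 and asFloat64 are hypothetical names.

```go
package main

import (
	"fmt"
	"math"
)

// lowUint16 keeps only the two least significant bytes of a dumped word,
// which is where a uint16 lives when the word is read as little-endian.
func lowUint16(word uint64) uint16 {
	return uint16(word)
}

// asFloat64 reinterprets a dumped 64-bit word as an IEEE 754 double.
func asFloat64(word uint64) float64 {
	return math.Float64frombits(word)
}

func main() {
	fmt.Printf("0x%04x\n", lowUint16(0xfffffffe00001234))
	fmt.Println(asFloat64(0x3ff3ae147ae147ae))
}
```

This prints 0x1234 and 1.23, matching the values that were assigned to a[2] and a[3].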

Finally, we come to the string a[4] = string("test"). In this case, its value points to (*0x1280b0 = 0x00000004000fe380). That is a Go string struct: read as one 64-bit little-endian value, it shows a 32-bit length of 0x00000004 in the upper half and a 32-bit pointer to a char buffer, 0x000fe380, in the lower half. I added a bit of code to dereference this char buffer pointer and that printed out (*0xfe380 = 0x6575727474736574), which if you decode it using an ASCII table is the character sequence “eurttset”, or, more correctly if we account for the little endian byte ordering, “testtrue”. Our string is only 4 bytes long, so it has the value “test” as expected. The “true” part is actually a neighbouring string which just happened to be next to our string in memory.
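
The byte-order reasoning can be reproduced with encoding/binary. This is a small sketch using the dumped word from the output above; bytesInMemoryOrder is a hypothetical helper name.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// bytesInMemoryOrder (hypothetical helper) returns the 8 bytes of a dumped
// word in the order they sit in memory on a little-endian machine.
func bytesInMemoryOrder(word uint64) string {
	var b [8]byte
	binary.LittleEndian.PutUint64(b[:], word)
	return string(b[:])
}

func main() {
	mem := bytesInMemoryOrder(0x6575727474736574)
	fmt.Println(mem)     // all 8 bytes that were dumped
	fmt.Println(mem[:4]) // the 4 bytes covered by the string's length
}
```

The first line printed is testtrue and the second is test, the four bytes that the string header's length of 4 actually covers.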

A cunning optimisation

You may have noticed that the two data pointers for the boolean true and boolean false values were adjacent in memory. That is no coincidence: it is an optimisation that Go uses to avoid allocating memory on the heap for single-byte values stored in interfaces. The runtime holds a block of 256 bytes containing all the values 0-255 in order. When a single-byte value, such as a bool, is stored into an interface{}, then instead of allocating 1 byte on the heap, storing the value there and pointing iface.data at it, Go skips the alloc and points iface.data into this 256-byte table. That’s why the addresses of the boolean true and boolean false values are adjacent in memory.

Further, if you look at the dump of a boolean value from the output above, (*0x1738e0 = 0x0706050403020100), you will clearly see that not only do 0x00 and 0x01 lie next to each other in memory, but all the numbers up to 0x07 are there too. If we’d kept dumping, we would have uncovered the whole 256 byte static byte table.

What does this mean for the 5,120 element slice?

If you were to store a uint16 at every element in this 5,120 element slice, it would take at the very least 5,120 * sizeof(iface) + 5,120 * sizeof(uint16) bytes, i.e. 80KiB + 10KiB = 90KiB on a 64-bit architecture.

I say “at the very least” because a separate alloc for each uint16 will almost certainly cost more than the 2 bytes you get out of it, as memory allocators have overhead. They have to track metadata on the alloc, such as how big it is, and often have to “pad” the allocation up to some multiple of 32/64 bits depending on the architecture. I haven’t looked into the Go runtime memory allocator overhead yet (maybe I would be pleasantly surprised), but I would expect that a 2 byte alloc would cost at least 16 bytes on a 64-bit architecture. If this were the case, we could expect to be spending 5,120 * (16 + 16) bytes = 160KiB on this slice, which is a far cry from the 10KiB you may have naively assumed for a 5,120 element uint16 array.

Also, each of these 5,120 tiny allocs is one more object that the Go garbage collector has to think about when it’s time to collect.

And don’t forget, I had 10,000 of these 5,120 element slices in the program I was analysing, so this was starting to add up.
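
To put numbers on the estimates in this section, here is the arithmetic as a short program. It is a back-of-the-envelope sketch: the 16 byte per-alloc cost is the guess made above, not a measured figure, and sliceCost and kib are hypothetical helper names.

```go
package main

import "fmt"

const (
	elems      = 5120  // elements per slice, from the article
	slices     = 10000 // number of slices, from the article
	ifaceSize  = 16    // two 8-byte pointers on a 64-bit architecture
	uint16Size = 2     // payload size of one uint16
	allocGuess = 16    // guessed real cost of a 2-byte alloc (an assumption)
)

// sliceCost (hypothetical helper) returns the bytes used by one slice of
// interfaces when every stored value costs perValue bytes on the heap.
func sliceCost(perValue int) int {
	return elems * (ifaceSize + perValue)
}

// kib converts a byte count to whole KiB.
func kib(bytes int) int { return bytes / 1024 }

func main() {
	fmt.Println("[]uint16:", kib(elems*uint16Size), "KiB")
	fmt.Println("[]interface{}, at least:", kib(sliceCost(uint16Size)), "KiB")
	fmt.Println("[]interface{}, 16-byte allocs:", kib(sliceCost(allocGuess)), "KiB")
	fmt.Println("all slices, 16-byte allocs:", slices*sliceCost(allocGuess)/(1024*1024), "MiB")
}
```

The per-slice figures come out at 10, 90 and 160 KiB, and under the 16 byte guess the 10,000 slices together would need roughly 1.5 GiB.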

Deep diving into the runtime

So, to recap - when a simple value is assigned to an interface{} variable, a heap alloc is done and the value is copied into the allocated memory. Then the iface struct is filled out with a type pointer and a pointer to the allocated memory holding the value. The only exception is for single byte values, where the memory alloc is skipped, and the data pointer is set to point to the correct value in a shared static byte array instead.
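
This recap can be tested without reading any memory, by counting allocations with testing.AllocsPerRun. The sketch below is one way to do it, with hypothetical helper names; exact results depend on the compiler and runtime version. The value 0x1234 is used rather than a small number because some runtime versions also avoid allocating for small integer values.

```go
package main

import (
	"fmt"
	"testing"
)

// sink is a package-level variable so the stored value escapes and the
// compiler cannot keep it on the stack.
var sink interface{}

// allocsUint16 reports the average number of heap allocations made each
// time v is stored in an interface{}.
func allocsUint16(v uint16) float64 {
	return testing.AllocsPerRun(1000, func() { sink = v })
}

// allocsUint8 does the same for a single-byte value.
func allocsUint8(v uint8) float64 {
	return testing.AllocsPerRun(1000, func() { sink = v })
}

func main() {
	fmt.Println("uint16(0x1234):", allocsUint16(0x1234))
	fmt.Println("uint8(5):", allocsUint8(5))
}
```

If the theory holds, the uint16 store should cost one allocation each time and the uint8 store none.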

So that’s the theory, and it’s backed up by our memory dumping, so now let’s look at the code.

If you create a new Go file called “interface.go” using the following code:

package main

import (
	"fmt"
)


func main() {
	t := 5
	var myval interface{} = uint16(t)
	fmt.Println(myval)
}

Then compile and disassemble it using:

go build -gcflags="-S" interface.go

You’ll get a lot of Go intermediate assembly. If you pick through it, you will find the following section (generated by go version go1.10.3 darwin/amd64 in my case):

	0x0000 00000 (interface.go:8)	TEXT	"".main(SB), $80-0
	0x0000 00000 (interface.go:8)	MOVQ	(TLS), CX
	0x0009 00009 (interface.go:8)	CMPQ	SP, 16(CX)
	0x000d 00013 (interface.go:8)	JLS	132
	0x000f 00015 (interface.go:8)	SUBQ	$80, SP
	0x0013 00019 (interface.go:8)	MOVQ	BP, 72(SP)
	0x0018 00024 (interface.go:8)	LEAQ	72(SP), BP
	0x001d 00029 (interface.go:8)	FUNCDATA	$0, gclocals·69c1753bd5f81501d95132d08af04464(SB)
	0x001d 00029 (interface.go:8)	FUNCDATA	$1, gclocals·e226d4ae4a7cad8835311c6a4683c14f(SB)
	0x001d 00029 (interface.go:10)	MOVW	$5, ""..autotmp_2+54(SP)
	0x0024 00036 (interface.go:10)	LEAQ	type.uint16(SB), AX
	0x002b 00043 (interface.go:10)	MOVQ	AX, (SP)
	0x002f 00047 (interface.go:10)	LEAQ	""..autotmp_2+54(SP), AX
	0x0034 00052 (interface.go:10)	MOVQ	AX, 8(SP)
	0x0039 00057 (interface.go:10)	PCDATA	$0, $0
	0x0039 00057 (interface.go:10)	CALL	runtime.convT2E16(SB)
	0x003e 00062 (interface.go:10)	MOVQ	24(SP), AX
	0x0043 00067 (interface.go:10)	MOVQ	16(SP), CX
	0x0048 00072 (interface.go:11)	XORPS	X0, X0
	0x004b 00075 (interface.go:11)	MOVUPS	X0, ""..autotmp_3+56(SP)
	0x0050 00080 (interface.go:11)	MOVQ	CX, ""..autotmp_3+56(SP)
	0x0055 00085 (interface.go:11)	MOVQ	AX, ""..autotmp_3+64(SP)
	0x005a 00090 (interface.go:11)	LEAQ	""..autotmp_3+56(SP), AX
	0x005f 00095 (interface.go:11)	MOVQ	AX, (SP)
	0x0063 00099 (interface.go:11)	MOVQ	$1, 8(SP)
	0x006c 00108 (interface.go:11)	MOVQ	$1, 16(SP)
	0x0075 00117 (interface.go:11)	PCDATA	$0, $1
	0x0075 00117 (interface.go:11)	CALL	fmt.Println(SB)
	0x007a 00122 (interface.go:12)	MOVQ	72(SP), BP
	0x007f 00127 (interface.go:12)	ADDQ	$80, SP
	0x0083 00131 (interface.go:12)	RET
	0x0084 00132 (interface.go:12)	NOP
	0x0084 00132 (interface.go:8)	PCDATA	$0, $-1
	0x0084 00132 (interface.go:8)	CALL	runtime.morestack_noctxt(SB)
	0x0089 00137 (interface.go:8)	JMP	0

Now we don’t need to go through all this disassembly, but the thing to note is the call to runtime.convT2E16. Its source in the Go runtime looks like this:

func convT2E16(t *_type, val uint16) (e eface) {
	var x unsafe.Pointer
	if val == 0 {
		x = unsafe.Pointer(&zeroVal[0])
	} else {
		x = mallocgc(2, t, false)
		*(*uint16)(x) = val
	}
	e._type = t
	e.data = x
	return
}

So this is the function in the Go runtime which converts a uint16 value into an interface structure. Actually, as you might have noticed, it converts it into an eface struct, not the iface struct we were looking at earlier.

type eface struct {
	_type *_type
	data  unsafe.Pointer
}

The eface struct is the “empty interface”, and is used for converting values into interface{} references. It’s pretty much the same as iface except that it has just a type pointer instead of a function table, because the empty interface interface{} has no methods and so doesn’t need the full iface struct to back it. An interface{} value can be converted to a given non-empty interface by using a type assertion, but we won’t go into that in this article.

So we can see here that storing this uint16 into an interface{} allocates a 2 byte copy of the value on the heap and points a 2 pointer wide empty interface struct at it. You can also see that there is a little optimisation in the convT2E16 function, which returns a pointer to a common zeroVal if the value being stored is 0. As 0 is a default value in Go, this little optimisation can potentially save a lot of memory allocs for this very common case.
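
One way to observe the zero-value shortcut from outside the runtime is to compare the data pointers of two boxed zeros. This is a sketch that depends on runtime internals: the local eface type copies the layout shown above, box and dataPtr are hypothetical helpers, and the behaviour may differ between Go versions.

```go
package main

import (
	"fmt"
	"unsafe"
)

// eface mirrors the runtime struct shown above; relying on this layout is
// an assumption about runtime internals.
type eface struct {
	typ  unsafe.Pointer
	data unsafe.Pointer
}

// box stores v in an interface{} at run time. The noinline directive keeps
// the compiler from turning constant arguments into static data.
//
//go:noinline
func box(v uint16) interface{} { return v }

// dataPtr returns the data word of an interface{}.
func dataPtr(i *interface{}) unsafe.Pointer {
	return (*eface)(unsafe.Pointer(i)).data
}

func main() {
	z1, z2 := box(0), box(0)
	n1, n2 := box(0x1234), box(0x1234)
	fmt.Println("zero values share storage:", dataPtr(&z1) == dataPtr(&z2))
	fmt.Println("non-zero values share storage:", dataPtr(&n1) == dataPtr(&n2))
}
```

If the shortcut is in effect, the two zeros point at the same shared location while the two non-zero values each get their own allocation.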

Back to the byte case

So, if we disassemble the uint8 case, do we see the use of the shared static table?

Disassembling:

package main

import (
	"fmt"
)


func main() {
	t := 5
	var myval interface{} = uint8(t)
	fmt.Println(myval)
}

We get:

	0x0000 00000 (interface.go:8)	TEXT	"".main(SB), $72-0
	0x0000 00000 (interface.go:8)	MOVQ	(TLS), CX
	0x0009 00009 (interface.go:8)	CMPQ	SP, 16(CX)
	0x000d 00013 (interface.go:8)	JLS	103
	0x000f 00015 (interface.go:8)	SUBQ	$72, SP
	0x0013 00019 (interface.go:8)	MOVQ	BP, 64(SP)
	0x0018 00024 (interface.go:8)	LEAQ	64(SP), BP
	0x001d 00029 (interface.go:8)	FUNCDATA	$0, gclocals·69c1753bd5f81501d95132d08af04464(SB)
	0x001d 00029 (interface.go:8)	FUNCDATA	$1, gclocals·e226d4ae4a7cad8835311c6a4683c14f(SB)
	0x001d 00029 (interface.go:11)	XORPS	X0, X0
	0x0020 00032 (interface.go:11)	MOVUPS	X0, ""..autotmp_3+48(SP)
	0x0025 00037 (interface.go:11)	LEAQ	type.uint8(SB), AX
	0x002c 00044 (interface.go:11)	MOVQ	AX, ""..autotmp_3+48(SP)
	0x0031 00049 (interface.go:11)	LEAQ	runtime.staticbytes+5(SB), AX
	0x0038 00056 (interface.go:11)	MOVQ	AX, ""..autotmp_3+56(SP)
	0x003d 00061 (interface.go:11)	LEAQ	""..autotmp_3+48(SP), AX
	0x0042 00066 (interface.go:11)	MOVQ	AX, (SP)
	0x0046 00070 (interface.go:11)	MOVQ	$1, 8(SP)
	0x004f 00079 (interface.go:11)	MOVQ	$1, 16(SP)
	0x0058 00088 (interface.go:11)	PCDATA	$0, $1
	0x0058 00088 (interface.go:11)	CALL	fmt.Println(SB)
	0x005d 00093 (interface.go:12)	MOVQ	64(SP), BP
	0x0062 00098 (interface.go:12)	ADDQ	$72, SP
	0x0066 00102 (interface.go:12)	RET
	0x0067 00103 (interface.go:12)	NOP
	0x0067 00103 (interface.go:8)	PCDATA	$0, $-1
	0x0067 00103 (interface.go:8)	CALL	runtime.morestack_noctxt(SB)
	0x006c 00108 (interface.go:8)	JMP	0

If you compare it to the disassembly from earlier, you will see that instead of calling some sort of convT2E8 as you might expect, it loads the effective address (LEA) of runtime.staticbytes+5. Checking the runtime source code again, we see:

// staticbytes is used to avoid convT2E for byte-sized values.
var staticbytes = [...]byte{
	0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
	0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f,
	0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17,
	0x18, 0x19, 0x1a, 0x1b, 0x1c, 0x1d, 0x1e, 0x1f,
	0x20, 0x21, 0x22, 0x23, 0x24, 0x25, 0x26, 0x27,
	0x28, 0x29, 0x2a, 0x2b, 0x2c, 0x2d, 0x2e, 0x2f,
	0x30, 0x31, 0x32, 0x33, 0x34, 0x35, 0x36, 0x37,
	0x38, 0x39, 0x3a, 0x3b, 0x3c, 0x3d, 0x3e, 0x3f,
	0x40, 0x41, 0x42, 0x43, 0x44, 0x45, 0x46, 0x47,
	0x48, 0x49, 0x4a, 0x4b, 0x4c, 0x4d, 0x4e, 0x4f,
	0x50, 0x51, 0x52, 0x53, 0x54, 0x55, 0x56, 0x57,
	0x58, 0x59, 0x5a, 0x5b, 0x5c, 0x5d, 0x5e, 0x5f,
	0x60, 0x61, 0x62, 0x63, 0x64, 0x65, 0x66, 0x67,
	0x68, 0x69, 0x6a, 0x6b, 0x6c, 0x6d, 0x6e, 0x6f,
	0x70, 0x71, 0x72, 0x73, 0x74, 0x75, 0x76, 0x77,
	0x78, 0x79, 0x7a, 0x7b, 0x7c, 0x7d, 0x7e, 0x7f,
	0x80, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
	0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
	0x90, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97,
	0x98, 0x99, 0x9a, 0x9b, 0x9c, 0x9d, 0x9e, 0x9f,
	0xa0, 0xa1, 0xa2, 0xa3, 0xa4, 0xa5, 0xa6, 0xa7,
	0xa8, 0xa9, 0xaa, 0xab, 0xac, 0xad, 0xae, 0xaf,
	0xb0, 0xb1, 0xb2, 0xb3, 0xb4, 0xb5, 0xb6, 0xb7,
	0xb8, 0xb9, 0xba, 0xbb, 0xbc, 0xbd, 0xbe, 0xbf,
	0xc0, 0xc1, 0xc2, 0xc3, 0xc4, 0xc5, 0xc6, 0xc7,
	0xc8, 0xc9, 0xca, 0xcb, 0xcc, 0xcd, 0xce, 0xcf,
	0xd0, 0xd1, 0xd2, 0xd3, 0xd4, 0xd5, 0xd6, 0xd7,
	0xd8, 0xd9, 0xda, 0xdb, 0xdc, 0xdd, 0xde, 0xdf,
	0xe0, 0xe1, 0xe2, 0xe3, 0xe4, 0xe5, 0xe6, 0xe7,
	0xe8, 0xe9, 0xea, 0xeb, 0xec, 0xed, 0xee, 0xef,
	0xf0, 0xf1, 0xf2, 0xf3, 0xf4, 0xf5, 0xf6, 0xf7,
	0xf8, 0xf9, 0xfa, 0xfb, 0xfc, 0xfd, 0xfe, 0xff,
}

So indeed, loading the address runtime.staticbytes+5 results in a pointer to the element at index 5 of this array, which holds the value 0x05.
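
The idea behind the table is simple enough to imitate in ordinary Go. The sketch below is a user-level stand-in, not the runtime's code: byteTable and pointerFor are hypothetical names.

```go
package main

import "fmt"

// byteTable is a user-level stand-in for the runtime's staticbytes array;
// entry i holds the value i.
var byteTable = func() (t [256]byte) {
	for i := range t {
		t[i] = byte(i)
	}
	return t
}()

// pointerFor (hypothetical helper) returns a pointer to the shared table
// entry holding b, so no allocation is needed to obtain a *byte for it.
func pointerFor(b byte) *byte {
	return &byteTable[b]
}

func main() {
	p := pointerFor(5)
	fmt.Println(*p)                 // value stored at index 5
	fmt.Println(p == &byteTable[5]) // same address as the table entry
}
```

Because every possible byte value already has a home in the table, handing out a pointer to it costs nothing. The caller must treat the pointed-to byte as read-only, since it is shared.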

Saving memory

What these little tests have shown is that storing plain values in interface{} references has a hidden memory cost. Indeed, storing any value into an interface reference of any type has a memory cost, as at the very least you will be paying an extra pointer for every reference, compared to what you would have if you just held a pointer to the concrete type instead.

If you commonly work with a lot of interface{} references, perhaps because you have built some sort of general purpose data structure and you have used interface{} to hold the data in it, you can definitely make a memory saving by using the concrete type in place of the interface{}. As Go lacks generics at the time of writing, the only way of doing this is literally to duplicate the code and change the interface{} references to your concrete type, or to use one of the Go code generators which essentially do the same thing for you. This also gives you back compile time type safety, something that is lost when you start throwing around interface{} instead of the true underlying types.
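
The difference is easy to see by counting allocations for the two representations. The following is a rough sketch with hypothetical function names; exact counts depend on the Go version, but the gap should be clear.

```go
package main

import (
	"fmt"
	"testing"
)

const n = 1024

// Package-level sinks make the slices escape to the heap.
var (
	sinkAny    []interface{}
	sinkUint16 []uint16
)

// fillAny stores n uint16 values in a []interface{}.
func fillAny() {
	s := make([]interface{}, n)
	for i := range s {
		// The offset keeps values out of any small-value fast path.
		s[i] = uint16(i + 256)
	}
	sinkAny = s
}

// fillUint16 stores the same values in a []uint16.
func fillUint16() {
	s := make([]uint16, n)
	for i := range s {
		s[i] = uint16(i + 256)
	}
	sinkUint16 = s
}

func main() {
	fmt.Println("[]interface{} allocs:", testing.AllocsPerRun(100, fillAny))
	fmt.Println("[]uint16 allocs:", testing.AllocsPerRun(100, fillUint16))
}
```

The concrete slice should need a single allocation for its backing array, while the interface slice needs that plus roughly one allocation per element.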

But once you start copying and pasting code, you introduce code maintenance problems. If you use Go code generators, things are a bit better, but not everyone is familiar with these tools, so you potentially make life harder for future code maintainers and debuggers. So, be sure that it is worthwhile before making changes.

Another option is to venture into unsafe territory, by using the unsafe package to construct iface structures on the fly from your generic data type structures, thus effectively factoring out the common itab pointer from being stored repeatedly for every value. Needless to say, any future maintainer is going to curse you for doing this (even if it’s you in six months’ time), so be very sure that the memory savings are necessary before even considering this. For “educational entertainment” purposes, I might cover this in a future article.

Conclusion

Interfaces in Go are a great feature, but references to them have a hidden memory cost. If you are wondering where all your memory is going, and you have large slices/arrays/data structures of interface references, hopefully this article will help you understand a little better why the memory is disappearing, and potentially what you can do about it.