Go Type Parsing

Ivan Mladenov



"I'm tired of all these unk's!"



Go binaries are already difficult to analyze: file sizes are huge, the runtime is confusing, strings aren't null-terminated, and I could go on. It gets even harder when a malware developer compiles with an obfuscator such as garble or gobfuscate. Tools that undo some of this obfuscation are therefore indispensable to reverse engineers, providing clarity and accelerating analysis. This article explains how to recover runtime types from a Go binary, and it also serves as a reference for maintaining the tools that do so as new Go versions are released. I also recently contributed this type parsing feature to Volexity's GoResolver, so you can now use their tool to recover runtime types.

Though Go is a statically typed language, it implements runtime reflection, meaning that a program can inspect its own types. I will not dive into how reflection works in Go, but it is the main reason why type information needs to be available at runtime. However, when a binary is obfuscated or stripped, finding these types can be challenging, and many software reverse engineering (SRE) tools, such as IDA Pro or Ghidra, will not attempt to recover them.


Finding the types

The first step to recovering types in a Go binary is locating a key structure called moduledata. Locating this structure in an obfuscated binary is a journey in and of itself and is not the focus of this article; if you'd like to see ways to do so, check out GoResolver or GoReSym. This structure contains many important addresses and offsets inside the binary. Here's a snippet of it:

// go/src/runtime/symtab.go
type moduledata struct {
    /* snip */
    types, etypes         uintptr
    rodata                uintptr
    gofunc                uintptr // go.func.*

    textsectmap []textsect
    typelinks   []int32 // offsets from types
    itablinks   []*itab
    /* snip */
}

This structure is crucial because once we have the types address and the typelinks offset array, we can recover every runtime type in the binary, even if it's obfuscated. With that in mind, let's take a look at the Go runtime (all the information from here on out will be in the Go 1.24 release branch, unless specified otherwise).
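To make the arithmetic concrete, here is a minimal sketch of how a parser might turn typelinks entries into descriptor addresses. The function name resolveTypelinks, the bounds check, and the sample values are all hypothetical; the only fact taken from the runtime is that each entry is an int32 offset from types.

```go
package main

import "fmt"

// resolveTypelinks converts each typelinks entry (an int32 offset from the
// moduledata types address) into the address of a type descriptor.
// As a sanity check (an assumption of this sketch, not runtime behavior),
// negative offsets and addresses at or beyond etypes are skipped.
func resolveTypelinks(types, etypes uint64, typelinks []int32) []uint64 {
	addrs := make([]uint64, 0, len(typelinks))
	for _, off := range typelinks {
		if off < 0 {
			continue
		}
		addr := types + uint64(off)
		if addr >= etypes {
			continue
		}
		addrs = append(addrs, addr)
	}
	return addrs
}

func main() {
	// Hypothetical values, for illustration only.
	addrs := resolveTypelinks(0x4a0000, 0x4b0000, []int32{0x40, 0x1c0, 0x20000})
	for _, a := range addrs {
		fmt.Printf("type descriptor at %#x\n", a)
	}
}
```

In a real tool, types, etypes, and the typelinks slice would come from the recovered moduledata, and each resulting address would still need to be mapped to a file offset before reading.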


Recovering the type information

The biggest hurdle here was figuring out where the types are actually written to the binary; looking at the entry point for the Go compiler was a good start:

// go/src/cmd/compile/internal/gc/main.go
func Main(archInit func(*ssagen.ArchInfo)) {
    /* snip */
    for nextFunc, nextExtern := 0, 0; ; {
        reflectdata.WriteRuntimeTypes()
        /* snip */
    }
    /* snip */
}

Perfect! This is a really good start. If we follow the reflectdata package to this function:

// go/src/cmd/compile/internal/reflectdata/reflect.go
func WriteRuntimeTypes() {
    // Process signatslice. Use a loop, as writeType adds
    // entries to signatslice while it is being processed.
    for len(signatslice) > 0 {
        signats := signatslice
        // Sort for reproducible builds.
        slices.SortFunc(signats, typesStrCmp)
        for _, ts := range signats {
            t := ts.t
            writeType(t)
            if t.Sym() != nil {
                writeType(types.NewPtr(t))
            }
        }
        signatslice = signatslice[len(signats):]
    }
}

To quickly break this function down, signatslice simply maintains a global queue of runtime types that the compiler needs to write to the object file. As it loops, the calls to writeType modify the global queue and populate it with any other types that are associated with the type being written. Also, if the type is named (some composite types—such as arrays or structs—can be constructed as unnamed type literals), then a pointer to that type is also included in the binary.


The layout of runtime types

Now we can inspect the core logic of writeType to really understand what's happening. This function is huge (roughly 300 lines) so I'll only break down the important parts. Towards the beginning, there's a very helpful comment which breaks down the runtime type layout:

// go/src/cmd/compile/internal/reflectdata/reflect.go
func writeType(t *types.Type) *obj.LSym {
    /* snip */
    // Type layout                          Written by               Marker
    // +--------------------------------+                            - 0
    // | abi/internal.Type              |   dcommontype
    // +--------------------------------+                            - A
    // | additional type-dependent      |   code in the switch below
    // | fields, e.g.                   |
    // | abi/internal.ArrayType.Len     |
    // +--------------------------------+                            - B
    // | internal/abi.UncommonType      |   dextratype
    // | This section is optional,      |
    // | if type has a name or methods  |
    // +--------------------------------+                            - C
    // | variable-length data           |   code in the switch below
    // | referenced by                  |
    // | type-dependent fields, e.g.    |
    // | abi/internal.StructType.Fields |
    // | dataAdd = size of this section |
    // +--------------------------------+                            - D
    // | method list, if any            |   dextratype
    // +--------------------------------+                            - E
    /* snip */
}

Marker 0

Every runtime type has a set of attributes associated with it, including its size, name, and so on. Thus, even primitive types have a structure defined somewhere in the binary that describes the type and can be referenced by other types. This structure, formerly known as rtype (the name changed in Go 1.21), is defined as part of the ABI:

// go/src/internal/abi/type.go
type Type struct {
    Size_       uintptr
    PtrBytes    uintptr
    Hash        uint32
    TFlag       TFlag
    Align_      uint8
    FieldAlign_ uint8
    Kind_       Kind
    Equal       func(unsafe.Pointer, unsafe.Pointer) bool
    GCData      *byte
    Str         NameOff
    PtrToThis   TypeOff
}

This structure is the first step to recovering the runtime types. The fields of this structure will determine what other type information will be associated with this runtime type. We'll make use of them in the next markers.
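As a rough illustration, the header above can be decoded from raw bytes as follows. This sketch assumes a 64-bit little-endian target (where the structure occupies 48 bytes) and keeps pointers as plain addresses; typeHeader and parseTypeHeader are hypothetical names, and the sample bytes are made up.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// typeHeaderSize is the size of abi.Type on a 64-bit target.
const typeHeaderSize = 48

// typeHeader mirrors abi.Type for a 64-bit little-endian binary.
// Pointer fields are kept as raw addresses rather than Go pointers.
type typeHeader struct {
	Size, PtrBytes uint64
	Hash           uint32
	TFlag          uint8
	Align          uint8
	FieldAlign     uint8
	Kind           uint8
	Equal, GCData  uint64
	Str            int32 // NameOff
	PtrToThis      int32 // TypeOff
}

// parseTypeHeader decodes the fixed-size header found at Marker 0.
func parseTypeHeader(b []byte) (typeHeader, error) {
	if len(b) < typeHeaderSize {
		return typeHeader{}, errors.New("buffer too short for abi.Type")
	}
	le := binary.LittleEndian
	return typeHeader{
		Size:       le.Uint64(b[0:]),
		PtrBytes:   le.Uint64(b[8:]),
		Hash:       le.Uint32(b[16:]),
		TFlag:      b[20],
		Align:      b[21],
		FieldAlign: b[22],
		Kind:       b[23],
		Equal:      le.Uint64(b[24:]),
		GCData:     le.Uint64(b[32:]),
		Str:        int32(le.Uint32(b[40:])),
		PtrToThis:  int32(le.Uint32(b[44:])),
	}, nil
}

func main() {
	// Hypothetical descriptor bytes: Size=16, Kind byte=25, Str=0x1234.
	raw := make([]byte, typeHeaderSize)
	raw[0], raw[23], raw[40], raw[41] = 16, 25, 0x34, 0x12
	h, err := parseTypeHeader(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("size=%d kind=%d str=%#x\n", h.Size, h.Kind, h.Str)
}
```

A 32-bit or big-endian binary would need different field widths and byte order, so a real parser would take both from the file header instead of hard-coding them.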

Marker A

Aside from the primitive types and defined types (like strings), Go has 8 literal types: Array, Struct, Pointer, Function, Interface, Slice, Map, and Channel. Marker A is dedicated to these 8 types. The Kind_ field from above is an enum value that identifies the type's kind. So, once we've recovered the type structure from Marker 0, we can check whether that value corresponds to one of these 8 literal types. If it does, the compiler will have written some extra data. I'll use Struct to illustrate what this data might look like:
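The kind check can be sketched like this. The numeric values follow the Kind enumeration in internal/abi as I understand it for Go 1.24, and the 5-bit mask reflects that the upper bits of the byte have carried flags in that release; both are assumptions to re-verify against each Go version. The names kindMask, literalKinds, and literalKind are hypothetical.

```go
package main

import "fmt"

// kindMask isolates the kind from any flag bits stored in the same byte
// (assumed layout for Go 1.24; check internal/abi for other releases).
const kindMask = (1 << 5) - 1

// literalKinds maps the Kind values of the 8 literal types to their names.
var literalKinds = map[uint8]string{
	17: "Array",
	18: "Chan",
	19: "Func",
	20: "Interface",
	21: "Map",
	22: "Pointer",
	23: "Slice",
	25: "Struct",
}

// literalKind reports whether the raw Kind_ byte denotes one of the 8
// literal types, and if so, which one.
func literalKind(kindByte uint8) (string, bool) {
	name, ok := literalKinds[kindByte&kindMask]
	return name, ok
}

func main() {
	for _, k := range []uint8{25, 25 | 1<<5, 24, 2} {
		if name, ok := literalKind(k); ok {
			fmt.Printf("kind byte %#x: %s, extra data follows the header\n", k, name)
		} else {
			fmt.Printf("kind byte %#x: no type-dependent fields\n", k)
		}
	}
}
```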

// go/src/internal/abi/type.go
type StructType struct {
    Type
    PkgPath Name
    Fields  []StructField
}

You may notice the first field doesn't have a name; this is known as an embedded field in Go, meaning that all the fields of the referenced type (in this case Type) are embedded in the structure. It is exactly the same data that was written at Marker 0! So, in writeType, the code doesn't have to write this information again, as that would duplicate what was written just above. What does get written are the second and third fields. The second field specifies the package path in which the struct was defined; it is a Name, which points to an encoded sequence of bytes. I will come back to the third field once we hit Marker C.
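For reference, here is a minimal sketch of decoding such a name once its bytes have been read. It assumes the encoding used by internal/abi in recent Go releases, as I understand it: one flag byte (bit 0 set for exported names), a varint length, then the name data. Older releases encoded the length differently, so treat this as an assumption to verify per version. decodeName is a hypothetical helper.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// decodeName reads an encoded name: one flag byte, a varint length, and
// then that many bytes of name data. Bit 0 of the flag byte marks an
// exported name. Tags and package-path suffixes are ignored in this sketch.
func decodeName(b []byte) (name string, exported bool, err error) {
	if len(b) < 2 {
		return "", false, errors.New("buffer too short for a name")
	}
	length, n := binary.Uvarint(b[1:])
	if n <= 0 {
		return "", false, errors.New("invalid varint length")
	}
	start := 1 + n
	if length > uint64(len(b)-start) {
		return "", false, errors.New("name length exceeds buffer")
	}
	end := start + int(length)
	return string(b[start:end]), b[0]&1 != 0, nil
}

func main() {
	// Hypothetical encoded name: flags=0x01, length=4, data="main".
	raw := []byte{0x01, 0x04, 'm', 'a', 'i', 'n'}
	name, exported, err := decodeName(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("name=%q exported=%v\n", name, exported)
}
```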

Marker B

User-defined types and types that have methods associated with them carry an extra structure called UncommonType. Recall the TFlag field we collected above; it is a set of bit flags that records, among other things, whether the type has uncommon data associated with it. Here's the definition of this structure:

// go/src/internal/abi/type.go
type UncommonType struct {
    PkgPath NameOff
    Mcount  uint16
    Xcount  uint16
    Moff    uint32
    _       uint32  // unused
}

Recovering this structure is super useful because we get to see the package path and also how many methods are associated with this particular type. Also, Moff gives the offset from the start of this UncommonType structure to where the methods are stored, so we can use it to recover method information when parsing the type.
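The following sketch shows one way to decode this structure and locate the method table. It assumes a little-endian binary, that the uncommon flag is bit 0 of TFlag (TFlagUncommon in internal/abi), and that Moff is measured from the start of the UncommonType itself, as the field's comment in internal/abi states. The names uncommonType, parseUncommon, and methodTableAddr are hypothetical.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

const (
	tflagUncommon = 1 << 0 // assumed to match abi.TFlagUncommon
	uncommonSize  = 16     // 4 + 2 + 2 + 4 + 4 bytes
)

// uncommonType mirrors abi.UncommonType without the unused trailing field.
type uncommonType struct {
	PkgPath int32 // NameOff
	Mcount  uint16
	Xcount  uint16
	Moff    uint32
}

// hasUncommon reports whether a TFlag byte marks the type as carrying
// an UncommonType at Marker B.
func hasUncommon(tflag uint8) bool { return tflag&tflagUncommon != 0 }

// parseUncommon decodes the 16-byte structure found at Marker B.
func parseUncommon(b []byte) (uncommonType, error) {
	if len(b) < uncommonSize {
		return uncommonType{}, errors.New("buffer too short for UncommonType")
	}
	le := binary.LittleEndian
	return uncommonType{
		PkgPath: int32(le.Uint32(b[0:])),
		Mcount:  le.Uint16(b[4:]),
		Xcount:  le.Uint16(b[6:]),
		Moff:    le.Uint32(b[8:]),
	}, nil
}

// methodTableAddr returns where the method list starts, given the address
// of the UncommonType structure itself.
func methodTableAddr(uncommonAddr uint64, u uncommonType) uint64 {
	return uncommonAddr + uint64(u.Moff)
}

func main() {
	// Hypothetical bytes: PkgPath=0x10, Mcount=2, Xcount=1, Moff=0x28.
	raw := []byte{0x10, 0, 0, 0, 2, 0, 1, 0, 0x28, 0, 0, 0, 0, 0, 0, 0}
	u, err := parseUncommon(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("methods=%d exported=%d table at %#x\n",
		u.Mcount, u.Xcount, methodTableAddr(0x5000, u))
}
```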

Marker C

This marker is particularly special because only 3 of our 8 literal types use it to store a variable-length array of type-specific data. Those types are Struct for field information, Function for parameter and result information, and Interface for method information. In the example from Marker A, we had an array of fields, each of which is described by this structure:

// go/src/internal/abi/type.go
type StructField struct {
    Name    Name    // name is always non-empty
    Typ     *Type   // type of field
    Offset  uintptr // byte offset of field
}

Having the field name, or even the type associated with the field, can give us extra insight into how the malware behaves. The Typ field is also one of the most important pieces for recovering types, as we'll see at Marker E.
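As a sketch, the field array can be decoded like this on a 64-bit little-endian target, where each entry is three 8-byte words (24 bytes in total). The names structField and parseStructFields are hypothetical, and the addresses in the sample data are made up.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// structFieldSize is the size of abi.StructField on a 64-bit target.
const structFieldSize = 24

// structField mirrors abi.StructField with pointers kept as raw addresses.
type structField struct {
	NameAddr uint64 // address of the encoded field name
	TypAddr  uint64 // address of the field's type descriptor
	Offset   uint64 // byte offset of the field within the struct
}

// parseStructFields decodes count consecutive StructField entries.
func parseStructFields(b []byte, count int) ([]structField, error) {
	if count < 0 || len(b)/structFieldSize < count {
		return nil, errors.New("buffer too short for field array")
	}
	le := binary.LittleEndian
	fields := make([]structField, count)
	for i := range fields {
		e := b[i*structFieldSize:]
		fields[i] = structField{
			NameAddr: le.Uint64(e[0:]),
			TypAddr:  le.Uint64(e[8:]),
			Offset:   le.Uint64(e[16:]),
		}
	}
	return fields, nil
}

func main() {
	// Two hypothetical fields at offsets 0 and 8.
	raw := make([]byte, 2*structFieldSize)
	binary.LittleEndian.PutUint64(raw[0:], 0x4a1000)
	binary.LittleEndian.PutUint64(raw[8:], 0x4a2000)
	binary.LittleEndian.PutUint64(raw[24:], 0x4a1010)
	binary.LittleEndian.PutUint64(raw[32:], 0x4a2040)
	binary.LittleEndian.PutUint64(raw[40:], 8)
	fields, err := parseStructFields(raw, 2)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, f := range fields {
		fmt.Printf("name@%#x type@%#x offset=%d\n", f.NameAddr, f.TypAddr, f.Offset)
	}
}
```

Each TypAddr collected here is a candidate for further parsing, since it points at another type descriptor.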

Marker D

If you look at dextratype, you'll see that after writing the struct at Marker B, it skips to Marker D and then starts a loop. Inside each iteration of the loop, it writes a structure that describes the methods associated with the uncommon data:

// go/src/internal/abi/type.go
type Method struct {
    Name NameOff // name of method
    Mtyp TypeOff // method type (without receiver)
    Ifn  TextOff // fn used in interface call (one-word receiver)
    Tfn  TextOff // fn used for normal method call
}

Here, the method name can also provide some insight into how a type is being used or manipulated.
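A minimal sketch of reading the method list: each entry is four 32-bit offsets, so 16 bytes per method, and the number of entries comes from Mcount. A little-endian binary is assumed, and method and parseMethods are hypothetical names.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// methodSize is the size of abi.Method: four 32-bit offsets.
const methodSize = 16

// method mirrors abi.Method; every field is an offset to resolve later.
type method struct {
	Name int32 // NameOff
	Mtyp int32 // TypeOff
	Ifn  int32 // TextOff
	Tfn  int32 // TextOff
}

// parseMethods decodes mcount consecutive Method entries.
func parseMethods(b []byte, mcount int) ([]method, error) {
	if mcount < 0 || len(b)/methodSize < mcount {
		return nil, errors.New("buffer too short for method list")
	}
	le := binary.LittleEndian
	methods := make([]method, mcount)
	for i := range methods {
		e := b[i*methodSize:]
		methods[i] = method{
			Name: int32(le.Uint32(e[0:])),
			Mtyp: int32(le.Uint32(e[4:])),
			Ifn:  int32(le.Uint32(e[8:])),
			Tfn:  int32(le.Uint32(e[12:])),
		}
	}
	return methods, nil
}

func main() {
	// One hypothetical method entry.
	raw := []byte{
		0x20, 0, 0, 0, // Name
		0x80, 0x01, 0, 0, // Mtyp
		0x00, 0x10, 0, 0, // Ifn
		0x40, 0x10, 0, 0, // Tfn
	}
	methods, err := parseMethods(raw, 1)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, m := range methods {
		fmt.Printf("name+%#x mtyp+%#x ifn+%#x tfn+%#x\n", m.Name, m.Mtyp, m.Ifn, m.Tfn)
	}
}
```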

Marker E

Although nothing is written at this marker, one step after collecting the type information is key: recursing on the types we discovered. At Marker C and Marker D, we encountered a few types referenced from inside the runtime type structures. These are important because writeType doesn't record every type in the typelinks offsets. The only types stored in typelinks are the literal types we discussed above, excluding the interface.
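The recursion can be sketched as a worklist with a visited set, so that cyclic references (for example, a struct with a pointer to itself) terminate. This is a generic traversal under stated assumptions, not the code of any particular tool: walkTypes is a hypothetical name, and the refs callback stands in for whatever parser returns the type addresses referenced by a descriptor.

```go
package main

import "fmt"

// walkTypes visits every type address reachable from roots exactly once.
// refs is a caller-supplied (hypothetical) function that parses the type
// at addr and returns the addresses of the types it references, such as
// field types or method types.
func walkTypes(roots []uint64, refs func(addr uint64) []uint64) []uint64 {
	seen := make(map[uint64]bool)
	var order []uint64
	queue := append([]uint64(nil), roots...)
	for len(queue) > 0 {
		addr := queue[0]
		queue = queue[1:]
		if addr == 0 || seen[addr] {
			continue // skip nil pointers and already-parsed types
		}
		seen[addr] = true
		order = append(order, addr)
		queue = append(queue, refs(addr)...)
	}
	return order
}

func main() {
	// A made-up reference graph standing in for parsed descriptors.
	graph := map[uint64][]uint64{
		0x1000: {0x2000, 0x3000},
		0x2000: {0x3000, 0x1000}, // cycle back to the root
		0x3000: nil,
	}
	order := walkTypes([]uint64{0x1000}, func(addr uint64) []uint64 {
		return graph[addr]
	})
	for _, addr := range order {
		fmt.Printf("parsed type at %#x\n", addr)
	}
}
```

Seeding roots with the addresses resolved from typelinks and letting refs return every type address found while parsing is one way to reach types that typelinks itself does not list.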