Jan 14, 2020

WebAssembly Modules - Sections

Understand the bytes in the WASM

WebAssembly Module

The simplest WebAssembly module is:

00 61 73 6d 01 00 00 00

The first four bytes, 00 61 73 6d, are the magic number, which translates to \0asm. It identifies the file as a WebAssembly binary. The next four bytes, 01 00 00 00, are the version, stored as a little-endian 32-bit integer. WebAssembly is currently at version 1. Every WebAssembly module must start with this header. The header is followed by these sections:

  1. Function
  2. Code
  3. Start
  4. Table
  5. Memory
  6. Global
  7. Import
  8. Export
  9. Data

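All of the sections listed above are optional; only the magic number and version are mandatory. A module can also contain Type (covered below), Element, and Custom sections, which are optional as well. Note that the list is not in binary order and its numbers are not the section IDs.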
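The header check can be sketched in a few lines of Python. This is only an illustration, not how any particular engine does it; the names read_header and simplest_module are made up for this example.

```python
import struct

WASM_MAGIC = b"\x00asm"  # the bytes 00 61 73 6d

def read_header(module: bytes) -> int:
    """Return the version of a WebAssembly binary, or raise ValueError."""
    if len(module) < 8:
        raise ValueError("a module needs at least the 8 header bytes")
    if module[:4] != WASM_MAGIC:
        raise ValueError("missing \\0asm magic number")
    # The version is a 32-bit little-endian unsigned integer.
    (version,) = struct.unpack("<I", module[4:8])
    return version

# The simplest module: only the magic number and the version.
simplest_module = bytes.fromhex("00 61 73 6d 01 00 00 00")
print(read_header(simplest_module))  # prints 1
```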

Upon receiving a WebAssembly module, the JavaScript engine decodes and validates it. The validated module is then compiled and instantiated. During the instantiation phase, the JavaScript engine produces an instance. The instance is a record that holds all the accessible state of the module. It is a tuple of the sections and their contents.


How a WebAssembly module is constructed

The WebAssembly module is split into sections. Each section contains a sequence of instructions or statements.

Header Information (MagicHeader Version)
    - function [function definitions]
    - import [import functions]
    - export [export functions]

Each section has a unique ID. The WebAssembly module uses this ID to refer to the respective section.

Header Information (MagicHeader Version)
    - (function section id)  [function definitions]
    - (import section id)  [import functions]
    - (export section id)  [export functions]

For example, the function section consists of a list of function definitions.

Header Information
    - function [add, subtract, multiply, divide]

Inside the module, a function is called by its list index. To call the add function, the module refers to the function at index 0 of the function section.

Section format

A WebAssembly module contains a set of sections. In the binary format, each section has the following structure:

<section id> <u32 section size> <Actual content of the section>

The first byte of every section is its unique section id.

It is followed by an unsigned 32-bit integer (u32) that gives the size of the section's content in bytes. Since it is a u32, the maximum size of any section is 2^32 - 1 bytes, which is roughly 4.29 gigabytes. In the binary, this u32 is stored in the variable-length LEB128 encoding, so small sizes take a single byte.

The remaining bytes are the content of the section. For most of the sections, the <Actual content of the section> is a vector.
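The section layout can be walked with a short Python sketch. The helper names read_u32 and iter_sections are invented for this example, and the sample module (one type, one function, and one code section) is hand-assembled for illustration only.

```python
def read_u32(data: bytes, pos: int):
    """Decode an unsigned LEB128 integer; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift  # low 7 bits carry the payload
        if (byte & 0x80) == 0:            # high bit clear: last byte
            return result, pos
        shift += 7

def iter_sections(module: bytes):
    """Yield (section id, content bytes) for every section in the module."""
    pos = 8  # skip the magic number and the version
    while pos < len(module):
        section_id = module[pos]
        size, pos = read_u32(module, pos + 1)
        yield section_id, module[pos:pos + size]
        pos += size

module = bytes.fromhex(
    "00 61 73 6d 01 00 00 00"           # header
    "01 07 01 60 02 7f 7f 01 7f"        # type section (id 1, size 7)
    "03 02 01 00"                       # function section (id 3, size 2)
    "0a 09 01 07 00 20 00 20 01 6a 0b"  # code section (id 10, size 9)
)
for section_id, content in iter_sections(module):
    print(section_id, len(content), content.hex())
```

Because every section carries its own size, a reader can skip sections it does not understand without decoding their content.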


Function

The function section has a list of functions. It has the following format:

0x03 <section size> vector<function>[ ]

The unique section id of the function section is 0x03. It is followed by a u32 integer that denotes the size of the function section. Vector<function>[ ] holds the list of functions.

Instead of using function names, the WebAssembly module uses the index of a function to call it. This optimises the binary size.

Every function in the Vector<function> is defined as follows:

<type signature> <locals> <body>

The <type signature> specifies the function signature, i.e., the types of the parameters and of the return value.

WebAssembly is size optimised. All the type signatures used in the module are defined in the type section (described below). The function only stores an index into the type section here.

The <locals> is a vector of values that are scoped inside the function. The locals are appended to the parameters that we pass to the function.

The <body> is a list of expressions. When evaluated, the expressions should result in the function's return type.

Note that the expressions here are not always pure. The globals of the WebAssembly module are mutable and the shared memory is mutable too.

To call a function, use call <function index> (the call instruction, opcode 0x10). The arguments are type validated based on the type signature. Then the local types are inferred. The arguments of the function are then concatenated with the locals.

The expression of the function is then set to the result type defined in the type definition. The expression type is then validated with the signature defined in the type section.

The spec specifies that the locals and body fields are encoded separately, in the code section. In the code section, the expressions are then identified by index.

The order of the type and function sections matters. While hacking on the raw bytecode, take care to preserve this order. See the code section below.
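Since the locals and body live in the code section, what remains in the binary function section is just one type index per function. A minimal Python sketch of decoding it follows; the helper names are made up, and the sample bytes assume a hypothetical module whose four functions (add, subtract, multiply, divide) all share type index 0.

```python
def read_u32(data: bytes, pos: int):
    """Decode an unsigned LEB128 integer; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if (byte & 0x80) == 0:
            return result, pos
        shift += 7

def decode_function_section(content: bytes) -> list:
    """Return the type index of every function, in function-index order."""
    count, pos = read_u32(content, 0)
    type_indices = []
    for _ in range(count):
        type_index, pos = read_u32(content, pos)
        type_indices.append(type_index)
    return type_indices

# id 0x03, size 5, then a vector of four type indices.
section = bytes.fromhex("03 05 04 00 00 00 00")
assert section[0] == 0x03
size, start = read_u32(section, 1)
print(decode_function_section(section[start:start + size]))  # prints [0, 0, 0, 0]
```

The position of an entry in this vector pairs it with the code entry at the same position in the code section.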


Type

A WebAssembly module with one or more functions starts with a type section.

Everything is strictly typed in WebAssembly. Every function must have a type signature attached to it.

To make it size efficient, the WebAssembly module creates a vector of type signatures and uses indices into that vector in the function section. The type section has the following format:

0x01 <section size> vector<type>[ ]

The unique section id of the type section is 0x01. It is followed by the section size and then Vector<type>[ ], which holds the list of types. Every type in the Vector<type> is defined as follows:

0x60 [vec-for-parameter-type] [vec-for-return-type]

The 0x60 byte indicates that the entry is a function type. It is followed by the vector of parameter types and the vector of return types. The binary format also defines encodings for value, result, memory, table, and global types, but in version 1 the type section itself holds only function types. A value type is one of f64, f32, i64, i32, that is, a number type. Inside the binary they are represented by 0x7C, 0x7D, 0x7E, 0x7F respectively.
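As a rough illustration, the following Python sketch decodes the content of a type section (the bytes after the section id and size). The function names are invented for this example, and the sample bytes describe a single signature that takes two i32 values and returns one i32.

```python
VALUE_TYPES = {0x7F: "i32", 0x7E: "i64", 0x7D: "f32", 0x7C: "f64"}

def read_u32(data: bytes, pos: int):
    """Decode an unsigned LEB128 integer; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if (byte & 0x80) == 0:
            return result, pos
        shift += 7

def read_value_types(data: bytes, pos: int):
    """Read a vector of value types; return (type names, next position)."""
    count, pos = read_u32(data, pos)
    names = [VALUE_TYPES[b] for b in data[pos:pos + count]]
    return names, pos + count

def decode_type_section(content: bytes) -> list:
    """Return a list of (parameter types, result types) pairs."""
    count, pos = read_u32(content, 0)
    signatures = []
    for _ in range(count):
        if content[pos] != 0x60:
            raise ValueError("expected the function type marker 0x60")
        params, pos = read_value_types(content, pos + 1)
        results, pos = read_value_types(content, pos)
        signatures.append((params, results))
    return signatures

content = bytes.fromhex("01 60 02 7f 7f 01 7f")
print(decode_type_section(content))  # prints [(['i32', 'i32'], ['i32'])]
```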

Note: The type information might change in the future, when WebAssembly starts to support other types.


Code

The code section holds a list of code entries. Each code entry is a pair of value types and Vector<expressions>[ ].

The code section has the following format:

0x0A <section size> Vector<code>[ ]

Every code in the Vector<code> is defined as follows:

<code size> <actual code>

The <actual code> has the following format:

vector<locals>[ ] <expressions>

The vector<locals>[ ] here declares the locals scoped inside the function. Together with the parameters from the type signature, they form the concatenated list of values the function body can access. The <expressions> evaluate to the return type.
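A code entry can be decoded with a sketch like the one below. It assumes the version 1 binary encoding, where locals are stored as groups of (count, value type) rather than one byte per local; the helper names are made up, and the sample body is the hand-assembled sequence local.get 0, local.get 1, i32.add, end.

```python
VALUE_TYPES = {0x7F: "i32", 0x7E: "i64", 0x7D: "f32", 0x7C: "f64"}

def read_u32(data: bytes, pos: int):
    """Decode an unsigned LEB128 integer; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if (byte & 0x80) == 0:
            return result, pos
        shift += 7

def decode_code_section(content: bytes) -> list:
    """Return a list of (local types, raw body bytes), one per function."""
    entry_count, pos = read_u32(content, 0)
    entries = []
    for _entry in range(entry_count):
        size, pos = read_u32(content, pos)
        end = pos + size
        group_count, pos = read_u32(content, pos)
        local_types = []
        for _group in range(group_count):
            # Each group declares n locals of the same value type.
            n, pos = read_u32(content, pos)
            local_types.extend([VALUE_TYPES[content[pos]]] * n)
            pos += 1
        entries.append((local_types, content[pos:end]))
        pos = end
    return entries

# One entry, 7 bytes long, no locals: local.get 0, local.get 1, i32.add, end
content = bytes.fromhex("01 07 00 20 00 20 01 6a 0b")
for local_types, body in decode_code_section(content):
    print(local_types, body.hex())  # prints [] 200020016a0b
```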


Start

The start section names a function that is called as soon as the WebAssembly module is instantiated. The start function is similar to other functions, except that it takes no parameters and returns no value. The types may or may not be initialized at the time of its execution. The start section of a WebAssembly module points to a function index (the index of the location of the function inside the function section). The section id of the start section is 8. When decoded, the start function represents the start component of the module.

At the moment, Webpack does not support the start section. The start section is rewritten into a normal function call, which the bundler itself calls when the JavaScript is initialised.
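Because the start section holds nothing but a function index, its bytes are easy to build by hand. The Python sketch below is illustrative only; encode_u32 and encode_start_section are names invented for this example.

```python
def encode_u32(value: int) -> bytes:
    """Encode an unsigned integer as LEB128."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def encode_start_section(function_index: int) -> bytes:
    """Build the start section: id 8, content size, function index."""
    content = encode_u32(function_index)
    return bytes([0x08]) + encode_u32(len(content)) + content

print(encode_start_section(0).hex())  # prints 080100
```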


Import section - contains the vector of imported functions.

Export section - contains the vector of exported functions.




யாதும் ஊரே யாவரும் கேளிர்! தீதும் நன்றும் பிறர்தர வாரா!!

@sendilkumarn