Jan 24, 2020

Understanding WebAssembly Text Format - From WTF to WAT

Textualise the bytes in the WASM

WebAssembly enables compiling various languages into binary that runs on JavaScript engine. The compiled WebAssembly binary is size and load time efficient. The binary is optimised at different levels. If you are wondering how to reduce your binary size refer here

The WebAssembly binary module is filled with opcode in hexadecimal format. This makes it nearly impossible for us to read and contextually understand what is happening in a WebAssembly Module.

For most of us minified and transpiled JavaScript is very difficult to read. With the help of source maps browsers show the original source that makes the JavaScript readable and easily debuggable.

Similarly, for WebAssembly, it is nearly impossible for us to understand/ read and infer / debug the compiled, optimised binary code. We have WebAssembly Text Format for representing WebAssembly binary code into somewhat human-readable way.

WebAssembly Text Format

WebAssembly Text Format (or) WAST provides a way to represent the binaries into a S-Expression. This makes it (slightly) easy for us to understand, debug, and evaluate the WebAssembly module.

Some text editors use the WebAssembly Text Format to show the WebAssembly module contents. Browsers or the Node environment use the WebAssembly Text Format to debug (along with source map).

The basic WebAssembly Module is

00 61 73 6d 01 00 00 00  ; Magic Header ; Version

The WebAssembly magic header (that translates into \0asm ) followed by the version that it currently support 1.

The textual representation of the above binary module is

(module )

The module is the basic building block of WebAssembly Module. All the contents of a WebAssembly Module lives inside the module segment.

Does this expression look familiar? You ever worked with LISP (or languages that are inspired by LISP) then it will look very familiar. The expression is called the "s-expression". The expressions are a simplified form of XML. They are concise and don't have much of boilerplate things like XML. They are commonly used in defining a nested list or structured tree. Many research papers on tree-based data structure uses this notation to show it.

In the previous post, we saw the structure of WebAssembly module in binary format. For example, each section starts with a specified section id. In WebAssembly Text Format, this section id is represented with a name. The name provides better readability. The syntax for function expression in WebAssembly Text Format is as follows:

(func <name>? <func_type> <local>* <inst>* )

That is, a simple add function is defined in WebAssembly Text Format:

(func $add (param $lhs i32) (param $rhs i32) (result i32)
    get_local $lhs
    get_local $rhs
    i32.add)

The above code specifies a function. The entire block is wrapped inside the parenthesis.

The function block starts with a func keyword. Then an optional identifier. The identifier can be name or a number. The name is used as an reference and a better readability.

Note that all the name, both the function name and the parameter's name, have been named with a $ prefix. In the binary format, this will be the index number of the segment inside its segment group.

Followed by the function name, we have the type signature. For add function, we have two numbers as input parameters namely $lhs and $rhs. Both of them are of the type i32. The param keyword denotes the parameter.

Then we define the result of the function with its own block. The result have a result keyword followed by the type of the result i32.

In the binary format, the parameters and results are defined via the type section. But in the text format for brevity and easy to understand we will define it in every function.

Then we have a set of instructions.

The first instruction get_local gets the local value of $lhs. Then we fetch the local value of $rhs. Then we add them both with i32.add instruction.

Alt Text

So how does it work?

We have mentioned that the WebAssembly is executed as a stack machine. That is the instructions either push or pop the data (in our case only numbers) in the stack.

When a function is called, it creates an empty value stack, control-flow stack, locals vector.

  • The value stack is where the values are stored and loaded
  • The control-flow stack that holds the information about the label of instructions (including branching instructions), the limit size, return type signature.
  • The locals vector hold all the local variable definitions.

When the get_local $lhs instruction is executed, it gets the value from the locals vector and returns the value. Similarly for get_local $rhs.

Then when the i32.add instruction is called. It performs an add operation and returns the result.

If we want to export this function to the outside world then we can add an export block.

(export <name in which we have to export> (func <function reference>))

Inorder to export the add function.

(export "add" (func $add))

The keyword for the exporting a function is export. Then we define the name of the function exported. In our example, we are exporting the add function with the name "add".

Then we add a block to refer to the function. With a func keyword and followed by the identifer for the function.

Note the name is an optional value in the function section. If the name is not specified, the index of the function is used.

Both function and export section is wrapped inside the module section.

(module
    (func $add (param $lhs i32) (param $rhs i32) (result i32)
        get_local $lhs
        get_local $rhs
        i32.add)
    (export "add" (func $add))
)

The above is a valid WebAssembly module. Congratulations you created your first WebAssembly module. Imagine it as a tree structure. With the module as its root and both function and export are its children.

Well Add is boring let us try to write something more complex than add.


Fibonacci Series generator

This is a simple recursive fibonacci series generator in C.

# Sample code in C
int fib(n) {
    if (n <= 1)
        return 1;
    else
        return fib(n-1)+ fib(n-2);
}

Let us convert this into WebAssembly Text Format.

The function is defined using func block along with its type signature.

(func $fib (param $n i32) (result i32)
    ...
)

So here the fib function takes in a number n as a parameter and returns a number. The function definition follows the same signature as we have in C.

Similar to the add example, we define the parameter with a param keyword followed by an optional name ($n) and the type of the variable.

The return type is specified with result block. The result block contains result keyword followed by the return type.

The WebAssembly do not have in-memory. To handle temporary variables, it has to assign the temporary variables to value and push it into the stack and then retrieve it.

So for checking n<=1 we have to first create a local variable $tmp. To create a local variable use the local block (local $tmp i32).

(local $tmp i32)
i32.const 1
set_local $tmp

(local $tmp i32) is not an instruction. It is part of the function declaration.

Then we create a constant 1 using i32.const 1 instruction.

i32.const 1 returns the value provided.

We then assign the value into the $tmp variable using set_local $tmp. The set_local instruction modifies the value in the locals vector. At the end of execution, the $tmp is 1.

To create a local variable, first, we have to define the variable and then value, finally push the value.

(func $fib (param $n i32) (result i32)
   (local $tmp i32)
    i32.const 1
    set_local $tmp
    .... ; the block of code goes in here
    get_local $tmp
)

We return the $tmp as the output. The get_local instruction gets the value and returns it as result.


What is in the block?

The WebAssembly stack machine is restricted to structured control flow and structure use of stack. The strucutred control flow provides simple and size efficient binary encoding and compilation.

Block is a part of WebAssembly Module that creates a new entry into the control-flow stack. Imagine, block creates a new boundary and operates within the boundary and returns a value.

In the above fib function, we are missing the actual fibonacci implementation. We create a block and define the fibonacci calculation inside it. A block is defined with a keyword block followed by a name to identify the block. i.e.,

block $block
...
end

The end specifies the end of the block. All the block entry should have an end instruction.

In a stack machine, the following instructions are made to check the if condition:

get_local $n
i32.const 2
i32.lt_s
br_if $block

The first instruction returns the value of $n from local vector. The next instruction returms the value 2 to the stack. The instruction i32.lt_s checks for less than condition and returns the output.

why lt_s, because it is less than signed. For unsigned, we can use lt_u.

The br_if instruction operates based on i32.lt_s result. When it is evaluated false, the loop continues. When it is evaluated true, it pops the block from the control flow and returns value of the operands. The br_if block closes over the block $block segment.

Note lt_s returns i32.


Loop

The loop is a special branching instruction. It creates a new label and pushes an entry to the control-flow stack.

WebAssembly Text Format has the loop instruction to do the looping:

i32.const 1
set_local $tmp
loop $loop
.....
end

Assign a value to the $tmp to 1 and start the loop. The loop segment starts with a loop keyword followed by the name of the loop ($loop). The loop is terminated with an end instruction.


Function Calls

Inside the loop, we have to call the fib function for n-1 and n-2 values. To call a function use call <function name or index> instruction.

We will have to first pop the value of $n and then reduce 1 from it. Then call the Fibonacci function.

; inside the loop
get_local $n
i32.const -1
i32.add
call $fib

The call $fib returns an i32. We will add this result to the $tmp and then set the value of $tmp to the new value.

get_local $tmp
i32.add
set_local $tmp

Execute similarly for n-2.

get_local $n
i32.const -2
i32.add

Then we run the loop until the value of $n is greater than 1. If the condition is okay, the loop continues.

tee_local $n
i32.const 1
i32.gt_s
br_if $loop

The tee_local is similar to set_local, but additionally, it returns the value.

Once all the looping is done return the value $tmp.

get_local $tmp

The final fibonacci series using WebAssembly Text Format is:

(func $fib (export "fib") (param $n i32) (result i32)
    (local $tmp i32)
    i32.const 1
    set_local $tmp
    block $B0
      get_local $n
      i32.const 2
      i32.lt_s
      br_if $B0
      i32.const 1
      set_local $tmp
      loop $L1
        get_local $n
        i32.const -1
        i32.add
        call $fib
        get_local $tmp
        i32.add
        set_local $tmp
        get_local $n
        i32.const -2
        i32.add
        tee_local $n
        i32.const 1
        i32.gt_s
        br_if $L1
      end
    end
    get_local $tmp)

Explore further

Raw WebAssembly - Das Surma

WebAssembly Text Reference

Relooper Algorithm


Discussions // 🔸 Hacker Rank

Up Next


யாதும் ஊரே யாவரும் கேளிர்! தீதும் நன்றும் பிறர்தர வாரா!!

@sendilkumarn