Understanding WebAssembly Text Format - From WTF to WAT
WebAssembly enables compiling various languages into binary that runs on JavaScript engine. The compiled WebAssembly binary is size and load time efficient. The binary is optimised at different levels. If you are wondering how to reduce your binary size refer here
The WebAssembly binary module is filled with opcode in hexadecimal format. This makes it nearly impossible for us to read and contextually understand what is happening in a WebAssembly Module.
For most of us minified and transpiled JavaScript is very difficult to read. With the help of source maps browsers show the original source that makes the JavaScript readable and easily debuggable.
Similarly, for WebAssembly, it is nearly impossible for us to understand/ read and infer / debug the compiled, optimised binary code. We have WebAssembly Text Format for representing WebAssembly binary code into somewhat human-readable way.
WebAssembly Text Format
WebAssembly Text Format (or) WAST provides a way to represent the binaries into a S-Expression. This makes it (slightly) easy for us to understand, debug, and evaluate the WebAssembly module.
That's definitely something I wish we'd done better. https://t.co/BC2uf9bKCj
— Alon Zakai (@kripken) January 11, 2020
Some text editors use the WebAssembly Text Format to show the WebAssembly module contents. Browsers or the Node environment use the WebAssembly Text Format to debug (along with source map).
The basic WebAssembly Module is
00 61 73 6d 01 00 00 00 ; Magic Header ; Version
The WebAssembly magic header
(that translates into \0asm ) followed by the version that it currently support 1
.
The textual representation of the above binary module is
(module )
The module
is the basic building block of WebAssembly Module. All the contents of a WebAssembly Module lives inside the module segment.
Does this expression look familiar? You ever worked with LISP (or languages that are inspired by LISP) then it will look very familiar. The expression is called the "s-expression". The expressions are a simplified form of
XML
. They are concise and don't have much of boilerplate things likeXML
. They are commonly used in defining a nested list or structured tree. Many research papers on tree-based data structure uses this notation to show it.
In the previous post, we saw the structure of WebAssembly module in binary format. For example, each section starts with a specified section id
. In WebAssembly Text Format, this section id
is represented with a name
. The name provides better readability. The syntax for function
expression in WebAssembly Text Format is as follows:
(func <name>? <func_type> <local>* <inst>* )
That is, a simple add function is defined in WebAssembly Text Format:
(func $add (param $lhs i32) (param $rhs i32) (result i32)
get_local $lhs
get_local $rhs
i32.add)
The above code specifies a function. The entire block is wrapped inside the parenthesis.
The function block starts with a func
keyword. Then an optional identifier. The identifier can be name or a number. The name is used as an reference and a better readability.
Note that all the name, both the function name and the parameter's name, have been named with a
$
prefix. In the binary format, this will be the index number of the segment inside its segment group.
Followed by the function name, we have the type signature. For add function, we have two numbers as input parameters namely $lhs
and $rhs
. Both of them are of the type i32
. The param
keyword denotes the parameter
.
Then we define the result of the function with its own block. The result have a result
keyword followed by the type of the result i32
.
In the binary format, the parameters and results are defined via the type section. But in the text format for brevity and easy to understand we will define it in every function.
Then we have a set of instructions.
The first instruction get_local
gets the local value of $lhs
. Then we fetch the local value of $rhs
. Then we add them both with i32.add
instruction.
So how does it work?
We have mentioned that the WebAssembly is executed as a stack machine
. That is the instructions either push or pop the data (in our case only numbers) in the stack.
When a function is called, it creates an empty value stack, control-flow stack, locals vector.
- The value stack is where the values are stored and loaded
- The control-flow stack that holds the information about the label of instructions (including branching instructions), the limit size, return type signature.
- The locals vector hold all the local variable definitions.
When the get_local $lhs
instruction is executed, it gets the value from the locals vector and returns the value. Similarly for get_local $rhs
.
Then when the i32.add
instruction is called. It performs an add operation and returns the result.
If we want to export this function to the outside world then we can add an export block.
(export <name in which we have to export> (func <function reference>))
Inorder to export the add function.
(export "add" (func $add))
The keyword for the exporting a function is export
. Then we define the name
of the function exported. In our example, we are exporting the add function with the name "add".
Then we add a block to refer to the function. With a func
keyword and followed by the identifer
for the function.
Note the name is an optional value in the function section. If the name is not specified, the index of the function is used.
Both function
and export
section is wrapped inside the module
section.
(module
(func $add (param $lhs i32) (param $rhs i32) (result i32)
get_local $lhs
get_local $rhs
i32.add)
(export "add" (func $add))
)
The above is a valid WebAssembly module. Congratulations you created your first WebAssembly module. Imagine it as a tree structure. With the module as its root and both function and export are its children.
Well Add
is boring let us try to write something more complex than add
.
Fibonacci Series generator
This is a simple recursive fibonacci series generator in C
.
# Sample code in C
int fib(n) {
if (n <= 1)
return 1;
else
return fib(n-1)+ fib(n-2);
}
Let us convert this into WebAssembly Text Format.
The function is defined using func
block along with its type signature.
(func $fib (param $n i32) (result i32)
...
)
So here the fib
function takes in a number n
as a parameter and returns a number. The function definition follows the same signature as we have in C
.
Similar to the add example, we define the parameter with a param
keyword followed by an optional name ($n
) and the type of the variable.
The return type is specified with result
block. The result
block contains result
keyword followed by the return
type.
The WebAssembly do not have in-memory. To handle temporary variables, it has to assign the temporary variables to value and push it into the stack and then retrieve it.
So for checking n<=1
we have to first create a local variable $tmp
. To create a local variable use the local
block (local $tmp i32)
.
(local $tmp i32)
i32.const 1
set_local $tmp
(local $tmp i32)
is not an instruction. It is part of the function declaration.
Then we create a constant 1 using i32.const 1
instruction.
i32.const 1
returns the value provided.
We then assign the value into the $tmp
variable using set_local $tmp
.
The set_local
instruction modifies the value in the locals vector. At the end of execution, the $tmp
is 1
.
To create a local variable, first, we have to define the variable and then value, finally push the value.
(func $fib (param $n i32) (result i32)
(local $tmp i32)
i32.const 1
set_local $tmp
.... ; the block of code goes in here
get_local $tmp
)
We return the $tmp
as the output. The get_local
instruction gets the value and returns it as result.
What is in the block?
The WebAssembly stack machine is restricted to structured control flow and structure use of stack. The strucutred control flow provides simple and size efficient binary encoding and compilation.
Block is a part of WebAssembly Module that creates a new entry into the control-flow stack. Imagine, block creates a new boundary and operates within the boundary and returns a value.
In the above fib
function, we are missing the actual fibonacci implementation. We create a block and define the fibonacci calculation inside it. A block is defined with a keyword block
followed by a name to identify the block. i.e.,
block $block
...
end
The end
specifies the end of the block. All the block
entry should have an end
instruction.
In a stack machine, the following instructions are made to check the if condition:
get_local $n
i32.const 2
i32.lt_s
br_if $block
The first instruction returns the value of $n
from local vector. The next instruction returms the value 2
to the stack. The instruction i32.lt_s
checks for less than condition and returns the output.
why lt_s, because it is less than signed. For unsigned, we can use lt_u.
The br_if
instruction operates based on i32.lt_s
result. When it is evaluated false
, the loop continues. When it is evaluated true, it pops the block from the control flow and returns value of the operands. The br_if block
closes over the block $block
segment.
Note lt_s returns i32.
Loop
The loop is a special branching instruction. It creates a new label and pushes an entry to the control-flow stack.
WebAssembly Text Format has the loop
instruction to do the looping:
i32.const 1
set_local $tmp
loop $loop
.....
end
Assign a value to the $tmp
to 1 and start the loop. The loop segment starts with a loop keyword followed by the name of the loop ($loop
). The loop is terminated with an end
instruction.
Function Calls
Inside the loop, we have to call the fib function for n-1
and n-2
values. To call a function use call <function name or index>
instruction.
We will have to first pop the value of $n and then reduce 1 from it. Then call the Fibonacci function.
; inside the loop
get_local $n
i32.const -1
i32.add
call $fib
The call $fib
returns an i32
. We will add this result to the $tmp
and then set the value of $tmp
to the new value.
get_local $tmp
i32.add
set_local $tmp
Execute similarly for n-2
.
get_local $n
i32.const -2
i32.add
Then we run the loop until the value of $n
is greater than 1
. If the condition is okay, the loop continues.
tee_local $n
i32.const 1
i32.gt_s
br_if $loop
The tee_local is similar to set_local, but additionally, it returns the value.
Once all the looping is done return the value $tmp
.
get_local $tmp
The final fibonacci series using WebAssembly Text Format is:
(func $fib (export "fib") (param $n i32) (result i32)
(local $tmp i32)
i32.const 1
set_local $tmp
block $B0
get_local $n
i32.const 2
i32.lt_s
br_if $B0
i32.const 1
set_local $tmp
loop $L1
get_local $n
i32.const -1
i32.add
call $fib
get_local $tmp
i32.add
set_local $tmp
get_local $n
i32.const -2
i32.add
tee_local $n
i32.const 1
i32.gt_s
br_if $L1
end
end
get_local $tmp)
Explore further
Ever heard that debugging WebAssembly is hard?
— Lin Clark (@linclark) September 4, 2019
Now you can debug it just the same as native code: with VSCode (or with LLDB directly)
🤯https://t.co/cYUTwMAy3t
Discussions // 🔸 Hacker Rank