Analyzing Performance: WebAssembly vs. Native Code

08.03.2023 — Backend, WebAssembly — 4 min read

Find the full presentation here

JavaScript has been the primary programming language for web applications. However, with the increasing complexity of web applications, JavaScript’s interpreted nature can lead to performance overhead and slow startup times. As a result, solutions that provide both efficiency and security for web applications have been in demand.

While there are various technologies available for running performance-critical code on the web platform, WebAssembly has emerged as a widely adopted solution. It provides a portable, safe, and fast runtime environment for multiple languages and has broad support across major web browsers. In this blog, we will analyze the performance of WebAssembly and its features in detail.

WebAssembly was released in 2017, and all major web browsers now support it. It is a low-level bytecode intended to serve as a compilation target for code written in languages like C, C++, Rust, and Go. It offers portability along with performance and security.
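To make "low-level bytecode" a little more concrete: every WebAssembly binary module begins with a four-byte magic number (`\0asm`) followed by a four-byte little-endian version field, which is currently 1. The sketch below checks only that header. The helper name `is_wasm_module` is made up for this illustration and is not part of any WebAssembly toolchain.

```python
import struct

WASM_MAGIC = b"\x00asm"  # first four bytes of every binary module
WASM_VERSION = 1         # version field of the current binary format

def is_wasm_module(data: bytes) -> bool:
    """Return True if data starts with a valid WebAssembly header."""
    if len(data) < 8:
        return False
    magic = data[:4]
    # The version is stored as an unsigned 32-bit little-endian integer.
    (version,) = struct.unpack("<I", data[4:8])
    return magic == WASM_MAGIC and version == WASM_VERSION

# The smallest valid module is just the 8-byte header with no sections.
empty_module = b"\x00asm\x01\x00\x00\x00"
```

This validates nothing beyond the header; a real engine goes on to decode and validate every section before running any code.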

WebAssembly's features include safety, performance, portability, and compact code. WebAssembly provides a sandboxed execution environment for running code in a web browser, which helps protect user data and system integrity by isolating the code from the rest of the system. The low-level code emitted by a C/C++ compiler is optimized ahead-of-time for full machine performance. Portability is essential for code targeting the web to run across all hardware and platform types. Compact code is crucial for reducing load times, saving bandwidth, and improving responsiveness on the web.

According to the paper that introduced WebAssembly, an evaluation on the PolyBenchC benchmarks found that WebAssembly is only 26% slower than native code. WebAssembly implementations have improved continuously since then, and 15 benchmarks are now within 10% of native performance.

However, the PolyBenchC benchmarks are not representative of real-world applications. The authors instead used the SPEC CPU suite of benchmarks and found that applications compiled to WebAssembly run slower than native code by an average of 45-55%, a significant gap.
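It helps to be explicit about what "X% slower" means here: a program that is 45% slower takes 1.45 times the native running time, and one that is 55% slower takes 1.55 times. A minimal sketch of that arithmetic follows; the timings are hypothetical values chosen only for illustration, not measurements from the study.

```python
def slowdown_percent(native_seconds: float, wasm_seconds: float) -> float:
    """How much slower WebAssembly is than native, as a percentage."""
    return (wasm_seconds / native_seconds - 1.0) * 100.0

def slowdown_ratio(native_seconds: float, wasm_seconds: float) -> float:
    """The same comparison expressed as a multiple of native time."""
    return wasm_seconds / native_seconds

# Hypothetical timings in seconds (not taken from the study).
native_time = 10.0
for wasm_time in (14.5, 15.5):
    pct = slowdown_percent(native_time, wasm_time)
    ratio = slowdown_ratio(native_time, wasm_time)
    print(f"{wasm_time}s vs {native_time}s: {pct:.0f}% slower ({ratio:.2f}x)")
```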

The WebAssembly documentation lists a number of targeted use cases, including simulations, programming language interpreters, virtual machines, POSIX programs, image editing, video editing, and image recognition. The high performance of WebAssembly on the scientific kernels in PolyBenchC does not imply that it will perform equally well on these other kinds of applications.

To address this problem, BROWSIX-WASM was introduced. It is an extension to Browsix that lets JavaScript web apps run unmodified WebAssembly-compiled Unix applications directly inside the browser by providing them with system calls. BROWSIX-SPEC is a harness that extends BROWSIX-WASM to automatically collect detailed timing and hardware on-chip performance counter information, which allows fine-grained measurements of application performance.

Browsix bridges the gap between conventional operating systems and the browser, enabling programs that expect a Unix-like environment to run directly in the browser. It does this by mapping low-level Unix primitives, such as processes and system calls, onto existing browser APIs, such as Web Workers and postMessage. The original Browsix supports only JS, not WASM, and uses SharedArrayBuffer for process-kernel communication, which WASM does not support.
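The idea of running processes on Web Workers and carrying system calls over postMessage can be sketched as message passing between a "process" and a "kernel". The Python analogy below uses a thread and queues in place of Web Workers and postMessage. It is not Browsix code: the message layout, the `syscall` helper, and the tiny in-memory file table are all assumptions made up for this illustration.

```python
import queue
import threading

def kernel(inbox):
    """Toy 'kernel': serves system-call messages until it receives 'exit'."""
    files = {}  # hypothetical in-memory file table: path -> bytes written
    while True:
        msg = inbox.get()
        if msg["name"] == "exit":
            msg["reply"].put(0)
            break
        if msg["name"] == "write":
            path, data = msg["args"]
            files[path] = files.get(path, b"") + data
            msg["reply"].put(len(data))  # number of bytes written
        else:
            msg["reply"].put(-1)  # unknown system call

def syscall(inbox, name, *args):
    """Process side: post a message to the kernel and block for the reply."""
    reply = queue.Queue(maxsize=1)
    inbox.put({"name": name, "args": args, "reply": reply})
    return reply.get()

def run_demo():
    inbox = queue.Queue()
    kernel_thread = threading.Thread(target=kernel, args=(inbox,))
    kernel_thread.start()
    results = [
        syscall(inbox, "write", "/tmp/out", b"hello"),
        syscall(inbox, "getpid"),  # not implemented by this toy kernel
        syscall(inbox, "exit"),
    ]
    kernel_thread.join()
    return results
```

Every system call in this model is a round trip through a message channel, so calls that are cheap on a native operating system become comparatively expensive, and a benchmark harness needs to account for that cost.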

In conclusion, WebAssembly has emerged as a widely adopted solution for running performance-critical code on the web platform. Its features include safety, performance, portability, and compact code. While its performance is not yet comparable to native code for all applications, the continuous improvements in WebAssembly implementations make it a promising technology for the future. With BROWSIX-WASM and BROWSIX-SPEC, it is possible to run unmodified WebAssembly-compiled Unix applications directly inside the browser and to measure their performance in detail.

SPEC CPU → 45-55% slower than native code on average.

Reasons:

Poor register allocation
Poor instruction selection
Extra branches
Reserved registers
Stack overflow checks
Indirect function call checks
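The last item is easy to picture. In WebAssembly, an indirect call goes through a function table, and the engine must confirm at run time that the index is in bounds and that the callee has the signature the call site expects; otherwise execution traps. The toy model below shows those two checks in Python. The table contents, the `Trap` class, and the tuple encoding of signatures are assumptions for illustration, not how any real engine represents them.

```python
class Trap(Exception):
    """Stands in for a WebAssembly trap, which aborts execution."""

def call_indirect(table, index, expected_sig, *args):
    # Check 1: the index must refer to an entry in the function table.
    if not 0 <= index < len(table):
        raise Trap("table index out of bounds")
    sig, func = table[index]
    # Check 2: the callee's signature must match what the call site expects.
    if sig != expected_sig:
        raise Trap("indirect call signature mismatch")
    return func(*args)

# Hypothetical table: each entry pairs a signature with a function.
TABLE = [
    (("i32", "i32"), lambda a, b: a + b),
    (("f64",), lambda x: x * 2.0),
]
```

A native indirect call through a function pointer performs neither check, so each of these comparisons is extra work on the WebAssembly side. Stack overflow checks add a similar compare-and-branch before the real work begins.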

To summarize, the slowdown is caused by:

Some missing optimizations
Design issues inherent to WASM
Restrictions imposed by the browser environment

Evaluation

We use BROWSIX-WASM and BROWSIX-SPEC to evaluate the performance of WebAssembly using three benchmark suites:

SPEC CPU2006
SPEC CPU2017
PolyBenchC → included only for comparison; it does not represent typical workloads

SPEC benchmarks require BROWSIX-WASM to run successfully.

CPU	6-Core Intel Xeon E5-1650 v3 with hyperthreading
RAM	64 GB
Operating System	Ubuntu 16.04 with Linux kernel v4.4.0
Browsers Used	Google Chrome 74.0 and Mozilla Firefox 66.0
Compilation Method	Native code using Clang 4.03 and WebAssembly using BROWSIX-WASM
Number of Executions per Benchmark	5
Reported Metrics	Average running time and standard error
Execution Time Measured	Difference between wall clock time at the start and end of program
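The measurement procedure in the table (five executions per benchmark, elapsed time taken as the difference between a start and end timestamp, results reported as mean and standard error) can be sketched as follows. This is a minimal illustration and not the BROWSIX-SPEC harness: `sample_workload` is a hypothetical stand-in for launching one benchmark, and `time.perf_counter` is used here as the elapsed-time clock.

```python
import math
import statistics
import time

def summarize(times):
    """Return (mean, standard error of the mean) for a list of run times."""
    mean = statistics.mean(times)
    # Standard error = sample standard deviation / sqrt(number of runs).
    stderr = statistics.stdev(times) / math.sqrt(len(times))
    return mean, stderr

def time_runs(workload, runs=5):
    """Run workload `runs` times and summarize the elapsed time per run."""
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        workload()
        times.append(time.perf_counter() - start)
    return summarize(times)

def sample_workload():
    # Hypothetical stand-in for one benchmark execution.
    sum(i * i for i in range(100_000))

mean_seconds, stderr_seconds = time_runs(sample_workload)
print(f"mean {mean_seconds:.6f}s, standard error {stderr_seconds:.6f}s")
```

Reporting the standard error alongside the mean shows how much the average would be expected to vary if the five runs were repeated, which matters when the native and WebAssembly timings are close.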