Analyzing Performance: WebAssembly vs Native Code
— Backend, WebAssembly — 4 min read
Find the full presentation here
JavaScript has been the primary programming language for web applications. However, with the increasing complexity of web applications, JavaScript’s interpreted nature can lead to performance overhead and slow startup times. As a result, solutions that provide both efficiency and security for web applications have been in demand.
While various technologies are available for running performance-critical code on the web platform, WebAssembly has emerged as a widely adopted solution. It provides a portable, safe, and fast runtime environment for multiple languages and has broad support across major web browsers. In this post, we will look at WebAssembly's features and analyze its performance in detail.
WebAssembly was released in 2017, and all major web browsers now support it. It is a low-level bytecode intended to serve as a compilation target for code written in languages like C, C++, Rust, and Go. It offers portability along with performance and security.
WebAssembly's features include safety, performance, portability, and compact code. For safety, WebAssembly provides a sandboxed execution environment for running code in a web browser, which helps protect user data and system integrity by isolating the code from the rest of the system. For performance, the low-level code emitted by a C/C++ compiler is optimized ahead of time with the aim of full machine performance. Portability is essential because code targeting the web must run across all hardware and platform types. Compact code is crucial for reducing load times, saving bandwidth, and improving responsiveness on the web.
According to the paper that introduced WebAssembly, its evaluation on the PolyBenchC benchmarks found WebAssembly to be only 26% slower than native code. WebAssembly implementations have continued to improve since then, and 15 benchmarks are now within 10% of native performance.
However, the PolyBenchC benchmarks are small scientific kernels and are not representative of real-world applications. When the SPEC CPU suite of benchmarks is used instead, applications compiled to WebAssembly run slower than native code by an average of 45-55%, which is a significant gap.
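To make a figure like "55% slower" concrete, here is a minimal sketch of the arithmetic, assuming slowdown is defined as the ratio of WebAssembly running time to native running time, minus one. The function name `percent_slower` and the timings are hypothetical and chosen only for illustration; they are not measurements from the study.

```python
def percent_slower(wasm_seconds: float, native_seconds: float) -> float:
    """Return how much slower the WebAssembly run is, as a percentage of native time."""
    if native_seconds <= 0:
        raise ValueError("native_seconds must be positive")
    return (wasm_seconds / native_seconds - 1.0) * 100.0


# Hypothetical timings, used only to illustrate the calculation.
print(round(percent_slower(15.5, 10.0), 1))  # 55.0
print(round(percent_slower(14.5, 10.0), 1))  # 45.0
```

Under this definition, a benchmark that is "within 10% of native" is one where the function returns a value of at most 10.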
The WebAssembly documentation lists a number of targeted use cases, including simulations, programming language interpreters, virtual machines, POSIX programs, image editing, video editing, and image recognition. The high performance of WebAssembly on the scientific kernels in PolyBenchC does not imply that it will perform equally well on these other kinds of applications.
To make it possible to benchmark such applications, BROWSIX-WASM was introduced. It is an extension to Browsix that provides the system calls needed to run unmodified Unix applications compiled to WebAssembly directly inside the browser. BROWSIX-SPEC is a harness that extends BROWSIX-WASM to automate the collection of detailed timing data and hardware on-chip performance counter information, so that application performance can be measured in detail.
Browsix bridges the gap between conventional operating systems and the browser, enabling programs that expect a Unix-like environment to run directly in the browser. It does this by mapping low-level Unix primitives, such as processes and system calls, onto existing browser APIs, such as Web Workers and postMessage. The original Browsix supports only JavaScript, not WebAssembly, and its process-kernel communication relies on SharedArrayBuffer in a way that WebAssembly does not support.
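The idea of system calls carried over message passing can be sketched outside the browser. The following is a conceptual analogue in Python, not Browsix code: a thread stands in for the kernel, queues stand in for postMessage channels, and the names `kernel`, `syscall`, and `read_file` are all hypothetical.

```python
import queue
import threading


def kernel(requests: queue.Queue) -> None:
    """Service syscall messages until a shutdown message (None) arrives."""
    files = {"/greeting": b"hello from the kernel"}  # toy in-memory file system
    while True:
        msg = requests.get()
        if msg is None:
            break
        name, args, reply = msg
        if name == "read_file":
            data = files.get(args[0])
            reply.put(("ok", data) if data is not None else ("error", "ENOENT"))
        else:
            reply.put(("error", "ENOSYS"))  # unknown system call


def syscall(requests: queue.Queue, name: str, *args):
    """Send one request to the kernel and block until its reply arrives."""
    reply = queue.Queue(maxsize=1)
    requests.put((name, args, reply))
    return reply.get()


requests = queue.Queue()
kernel_thread = threading.Thread(target=kernel, args=(requests,))
kernel_thread.start()

result = syscall(requests, "read_file", "/greeting")
missing = syscall(requests, "read_file", "/nope")

requests.put(None)  # ask the kernel loop to stop
kernel_thread.join()
print(result)
print(missing)
```

The point of the sketch is the shape of the interaction: the "process" never touches the file table directly, it only sends a request and waits for a reply, which is roughly how a program in a Web Worker would ask a kernel running elsewhere for a service.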
In conclusion, WebAssembly has emerged as a widely adopted solution for running performance-critical code on the web platform. Its features include safety, performance, portability, and compact code. While its performance is not yet comparable to native code for all applications, the continuous improvements in WebAssembly implementations make it a promising technology for the future. With BROWSIX-WASM and BROWSIX-SPEC, unmodified Unix applications compiled to WebAssembly can run directly inside the browser, which makes detailed measurements of application performance possible.
SPEC CPU → on average up to 55% slower than native code.
Reasons:
- Poor register allocation
- Poor instruction selection
- Extra branches
- Reserved registers
- Stack overflow checks
- Indirect function call checks
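To illustrate why the last item costs time, here is a conceptual model in Python of the checks that guard a WebAssembly indirect call: the table index must be in bounds, the slot must hold a function, and the function's type must match what the call site expects. The names `Trap`, `FuncEntry`, and `call_indirect` below are hypothetical; a real engine emits these checks as machine instructions rather than running anything like this code.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


class Trap(Exception):
    """Stands in for a WebAssembly trap (an abort of the running program)."""


@dataclass
class FuncEntry:
    signature: Tuple[str, ...]  # e.g. ("i32", "i32", "->", "i32")
    body: Callable[..., int]


def call_indirect(table: List[Optional[FuncEntry]], index: int,
                  expected_signature: Tuple[str, ...], *args: int) -> int:
    # Check 1: the index must fall inside the function table.
    if index < 0 or index >= len(table):
        raise Trap("undefined element")
    entry = table[index]
    # Check 2: the slot must actually hold a function.
    if entry is None:
        raise Trap("uninitialized element")
    # Check 3: the callee's type must match what the call site expects.
    if entry.signature != expected_signature:
        raise Trap("indirect call type mismatch")
    return entry.body(*args)


ADD_SIG = ("i32", "i32", "->", "i32")
table = [FuncEntry(ADD_SIG, lambda a, b: a + b), None]
print(call_indirect(table, 0, ADD_SIG, 2, 3))  # 5
```

Native code compiled from C typically performs an indirect call as a single jump through a pointer, so each of these checks is extra work that the WebAssembly version pays on every such call.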
To summarize, the slowdown was caused by:
- Some missing optimizations
- Design issues inherent to WebAssembly
- Restrictions applied by the browser environment
Evaluation
We use BROWSIX-WASM and BROWSIX-SPEC to evaluate the performance of WebAssembly using three benchmark suites:
- SPEC CPU2006
- SPEC CPU2017
- PolyBenchC → included only for comparison, since it does not represent typical workloads
The SPEC benchmarks require BROWSIX-WASM to run successfully.
Parameter | Value |
---|---|
CPU | 6-Core Intel Xeon E5-1650 v3 with hyperthreading |
RAM | 64 GB |
Operating System | Ubuntu 16.04 with Linux kernel v4.4.0 |
Browsers Used | Google Chrome 74.0 and Mozilla Firefox 66.0 |
Compilation Method | Native code using Clang 4.03 and WebAssembly using BROWSIX-WASM |
Number of Executions per Benchmark | 5 |
Reported Metrics | Average running time and standard error |
Execution Time Measured | Difference between wall clock time at the start and end of program |
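
The reporting method in the table (five executions, average running time, standard error, elapsed time from start to end) can be sketched as follows. This is a minimal illustration of the statistics, not the BROWSIX-SPEC harness: the helper names `time_runs` and `mean_and_standard_error` and the toy workload are assumptions, and `time.perf_counter` is used here as the elapsed-time clock.

```python
import math
import statistics
import time
from typing import Callable, List, Tuple


def time_runs(workload: Callable[[], object], runs: int = 5) -> List[float]:
    """Run the workload `runs` times and return each elapsed time in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        workload()
        durations.append(time.perf_counter() - start)
    return durations


def mean_and_standard_error(samples: List[float]) -> Tuple[float, float]:
    """Return the sample mean and the standard error of the mean."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    mean = statistics.mean(samples)
    # Standard error = sample standard deviation / sqrt(number of samples).
    return mean, statistics.stdev(samples) / math.sqrt(len(samples))


# Toy workload standing in for a benchmark program.
durations = time_runs(lambda: sum(i * i for i in range(100_000)))
mean, sem = mean_and_standard_error(durations)
print(f"mean = {mean:.6f} s, standard error = {sem:.6f} s over {len(durations)} runs")
```

Reporting the standard error alongside the mean shows how much the average itself would be expected to vary if the five runs were repeated, which helps when comparing a native result against a WebAssembly one.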