What Does WebAssembly Mean for the Server and GenAI?

Solomon Hykes, one of the co-founders of Docker, was quoted as saying, “If WASM+WASI existed in 2008, we wouldn’t have needed to create Docker. That’s how important it is. WebAssembly on the server is the future of computing.”
In a previous blog post, “WASM your way to the cloud,” I explored what Wasm means for cloud native architectures, along with some background on its history and when it entered the chat for cloud. Now, I plan to explore why Wasm matters to the server.
This, of course, can mean many things, including why the footprint and performance of Wasm applications are important, why that matters to Generative AI and how the security and multiarchitecture aspects are key to a happier server. Let’s dive in.
The Fine Qualities of WASM
Wasm offers conceptual benefits similar to those of containers: Wasm binaries are multilanguage, portable, performant and secure. These are the characteristics that excited the tech industry when containers and microkernels came into the picture in the last decade. Let’s explore why these characteristics of Wasm applications represent a new wave of excitement.
Portability
Portability allows an application to be deployed, moved and transferred across environments, which is critical to today’s multicloud and multicluster architectures. Wasm achieves portability by offering a binary format that allows code to run on a variety of architectures. This is one area where it’s worth distinguishing between Wasm’s portability and container compatibility. With container images, the CPU architecture, distribution and version of the operating system all matter. For example, if you wanted to build a container for a specific system architecture, you would have to invoke the --platform flag in the build process and tag the image appropriately for each architecture on which you would like to run the container.
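For instance, building the same application for two CPU architectures means one build and tag per platform (image name here is illustrative):

```shell
# One build and tag per target CPU architecture (image name is illustrative)
docker build --platform linux/amd64 -t myapp:amd64 .
docker build --platform linux/arm64 -t myapp:arm64 .
```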
Operating systems are also a consideration. For instance, there is no way to run Windows containers on Linux without virtualization, because Windows containers need the specific Windows kernel, a Hyper-V host or support for the Host Compute Service (HCS). This is one of the areas where Wasm tends to shine. Wasm applications are made up of “modules” (see the WebAssembly Component Model), and these modules are compiled down to bytecode (a binary format). That bytecode is far more portable than a natively compiled binary from a language such as Rust, or source code that requires a language runtime, as with JavaScript or Python.
Example WASM Application
We will use a Wasm application framework from Fermyon called Spin to help demonstrate this. Fermyon recently released SpinKube during KubeCon EU Paris 2024 as a Cloud Native Computing Foundation (CNCF) sandbox project; it enables much the same use case we will show here, so check it out separately. Spin takes advantage of the component model along with the Wasmtime runtime. Wasmtime pairs the component model with the WebAssembly System Interface (WASI), which allows Wasm applications to run on the server instead of in the browser. WASI provides access to operating system features like filesystems, networks and more in a POSIX-like way.
This example will show how to build a Rust-based Wasm application, producing the .wasm binary with Rust’s Wasm+WASI target wasm32-wasi, and then run it on both Ubuntu Linux 23.10 and Windows 11 without modification.
> Note: We won’t go into the full Spin and Rust environment here; for more information, check out the Spin quickstart or try it yourself in this lab.
First, the Rust application is defined as a simple HTTP handler and response.
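A minimal handler, sketched with the Spin Rust SDK (the spin_sdk API surface varies across versions, so treat this as illustrative rather than exact):

```rust
// src/lib.rs -- a minimal Spin HTTP component (sketch; spin_sdk API varies by version)
use spin_sdk::http::{IntoResponse, Request, Response};
use spin_sdk::http_component;

/// Handle an incoming HTTP request with a static plain-text reply.
#[http_component]
fn handle_hello(_req: Request) -> anyhow::Result<impl IntoResponse> {
    Ok(Response::builder()
        .status(200)
        .header("content-type", "text/plain")
        .body("Hello from Wasm on the server!")
        .build())
}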
Then we add the Rust Wasm+WASI target to the development environment.
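With rustup, adding the target is one command:

```shell
# Add the Rust Wasm+WASI compilation target
rustup target add wasm32-wasi
```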
Next, build the .wasm binary.
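With a Spin project, the build step typically looks like this (Spin drives cargo under the hood, so the plain cargo invocation is equivalent):

```shell
# Build the component; Spin invokes cargo with the wasm32-wasi target
spin build

# Or, directly with cargo:
cargo build --target wasm32-wasi --release
```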
This produces the .wasm binary in the target build folder.
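The artifact lands under Cargo's target directory (the module name here is hypothetical; it depends on your crate name):

```shell
ls target/wasm32-wasi/release/
# hello-wasm.wasm  -- name depends on the crate
```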
Now we can run the Wasm module on the Ubuntu server.
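Running it locally is a single command; by default, Spin listens on 127.0.0.1:3000:

```shell
spin up
```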
Then, access the application, which serves a simple HTTP response.
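For example, with curl against Spin's default listen address:

```shell
# Returns the handler's plain-text response
curl http://127.0.0.1:3000
```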
Next, upload our Wasm module to the GitHub Packages repo (ghcr).
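Spin can push the packaged application to an OCI registry such as ghcr (the repository path and tag here are illustrative):

```shell
spin registry push ghcr.io/<your-user>/hello-wasm:v1
```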
Now on our Windows 11 Desktop, we can run the Wasm module from PowerShell with one command without any modification across platforms.
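The exact flags depend on your Spin version, but recent releases can run straight from an OCI reference, along these lines:

```powershell
# PowerShell on Windows 11 (registry reference is illustrative)
spin up -f ghcr.io/<your-user>/hello-wasm:v1
```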
To be clear: this simplistic example is not entirely dissimilar from running native code on Linux and Windows. However, even in this simple example, there was no need to install any application- or library-specific dependencies, and nothing had to be rebuilt; the only requirement was Spin, which knows how to run Wasm binaries. The example really highlights the portability and usefulness of the binary format across Linux and Windows; applications can, of course, become more complex.
Security
Wasm improves the overall security posture and shrinks the attack surface by running each Wasm module in its own sandboxed environment, isolated from the host runtime. A running Wasm application has no visibility into the host operating system beyond what the runtime exposes, and system resources can be accessed only through WASI.
Wasm also provides a linear memory: a contiguous array of memory with a maximum size, reserved as an isolated region for the Wasm module to use. Misuse of this memory causes traps (exceptions) that are reported up the stack in the runtime. This brings memory safety properties to the isolated region available to the application, as distinct from the memory available to the runtime itself, which helps mitigate, though not eliminate, certain memory safety bugs such as buffer overflows and unsafe pointers. Most of the details are beyond the scope of this article.
Wasm is a fast-moving project, and as the spec and implementations grow over time, potential bugs or security concerns in runtimes are not eliminated. However, Wasm’s design around isolation and sandboxing sets up a strong overall security posture.
Performance
Wasm also offers impressive performance, which matters because even a hyper-portable artifact isn’t very useful if it isn’t an efficient computing mechanism. Wasm is inherently more compact than browser alternatives such as JavaScript, because Wasm is compiled into a binary containing bytecode, whereas JavaScript is interpreted. The result, for example, is that a native Rust application built into a container can be much larger, and slower on initial startup, than a Rust-based Wasm binary module.
Check out the size difference between this Rust HTTP server Wasm module and a container that does the same thing, built from the rust:1.77 image.
Rust Wasm module
Using the rust:1.77 container image:
Using the rust:1.77-slim container image:
There are ways to use multistage builds or scratch images to get the image size down to tens of MB; however, this is not likely to be how many developers start. It should be noted that even with these techniques, the much smaller resulting image is still five times larger than the Wasm binary alone.
Using the multistage build with rust:alpine container image:
Linear memory also makes memory access generally more efficient, and features such as parallel threads help make better use of available CPU. Performance is, and will remain, an important characteristic of applications running in public and private clouds, including use cases such as serverless and AI.
What This Means for the Server
Portability, security and performance explain why Linux container technology has thrived in today’s data center architectures, and Wasm can extend these benefits for applications beyond the browser running on the server. Let’s look at a few examples.
Cloud Applications
Wasm applications using frameworks for the server such as WASI, Wasmtime and WasmEdge can benefit from many of the same developer toolchains that containers use as an onramp, as we have shown above. That means there is a natural progression for development teams to start experimenting with Wasm on the server. Using existing cloud native tooling is not the only way Wasm applications can run, but does make it easier for teams developing cloud-based applications with containers and Kubernetes to start implementing Wasm modules. The portability and performance of these Wasm modules can also improve density and start times for applications.
Edge, IoT and Serverless
Because Wasm applications are portable and small, their form factor is ideal for decentralized computational workloads at the edge. Wasm applications take up less space, which is perfect for many edge and Internet of Things (IoT) use cases dealing with limited compute capacity. The performance and fast startup times of Wasm binaries also make them an ideal candidate for serverless computing, where cold starts for serverless functions can be minimized.
Generative AI
GenAI and its use cases are quickly becoming a top focus for most companies, and rightfully so. GenAI has many compelling use cases, and there has been a lot of recent work within the cloud native and Kubernetes communities, as well as in the enterprise, to enable AI workloads for consumers.
If you would like to dig in, check out this design guide as an example.
GenAI architectures and components vary by use case; however, many of these stacks share common components.
Two main components of the GenAI stack are the models, such as Llama 2, Mistral, GPT-4 and others, which run on GPUs, and the inferencing components, which handle talking to a model and serving inferencing requests. Inferencing is the operational task of running data through a model to complete a task.
For instance, if you were to ask a question such as “Tell me a joke about dogs” to OpenAI’s ChatGPT, it takes the question and performs an inferencing step to provide you with a response.
Inferencing steps are one example where Wasm can fit quite nicely. Take this example from Fermyon using its Spin framework, where Wasm modules are used in a serverless architecture. The example Wasm application can be defined as:
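A sketch of what such an inferencing component can look like with Spin's LLM interface (the spin_sdk::llm API and available models vary by SDK version and host, so treat this as illustrative):

```rust
use spin_sdk::http::{IntoResponse, Request, Response};
use spin_sdk::http_component;
use spin_sdk::llm;

/// Forward the request body as a prompt to a host-provided Llama 2 chat model
/// and return the inference result. (Sketch; spin_sdk API varies by version.)
#[http_component]
fn handle_inference(req: Request) -> anyhow::Result<impl IntoResponse> {
    let prompt = String::from_utf8_lossy(req.body()).to_string();
    let result = llm::infer(llm::InferencingModel::Llama2Chat, &prompt)?;
    Ok(Response::builder()
        .status(200)
        .body(result.text)
        .build())
}
```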
This shows how Wasm can be used within the Spin framework and cloud. You can read more about it here: taking advantage of the small footprint, performance and security of Wasm means improvements in density, potential cost optimizations and a wider set of edge locations where Wasm modules and AI can run. Wasm can also be thought of as a universal runtime for deploying AI components, making it easy for researchers and developers to run models via highly portable Wasm runtimes.
Conclusion
Wasm on the server, in my opinion, is here to stay and will grow for years to come. The portability, performance and security the industry loves about containers extend to the architecture and deployment of WebAssembly applications on the server. With growing support for Wasm across the cloud native ecosystem toolchain and the obvious connections to use cases like edge, IoT, serverless and GenAI, there is no doubt that Wasm will be important to the server.