
How to Write Solana Programs with sBPF Assembly


There is currently a parallel race to the bottom for program optimization playing out in the Solana ecosystem. 

At a high level, libraries like Pinocchio are revolutionizing Rust development, achieving orders-of-magnitude improvements in compute efficiency. Meanwhile, at the absolute lowest level, a group of dedicated developers, united by their mutual disrespect of the compiler, are taking it one step further. Instead of writing Solana programs in compiled languages like Rust or C, they direct their focus towards meticulously hand-rolling bytecode to squeeze the maximum performance out of every last instruction.

These low-level gains are only possible when we directly instruct the VM in its native language: sBPF Assembly, Solana's own variant of the extended Berkeley Packet Filter (eBPF), the bytecode used and executed within every single on-chain program.

Writing sBPF assembly gives developers direct access to the lowest-level interface of the Solana Virtual Machine. The Rust compiler and LLVM do attempt optimizations, but the source language cannot always express the programmer's intent, and the compiler often lacks the context to make better choices. The result is frequently bytecode that is suboptimal compared with what a skilled developer can produce with the complete instruction-level control that assembly enables.

While this added level of control comes at the cost of ergonomics, the savings in compute unit usage and binary size (and thus rent) are significant. These savings become especially important in highly contended, competitive, performance-critical operations.

Simultaneously, there is also an argument to be made that not all programs should be written in assembly.

Tooling has improved drastically, but it has historically been limited. More importantly, performance gains often come with a significant tradeoff: correctness must be verified manually, and audits cost more. This is due to a lack of automated tooling and to syntax that is more cumbersome to read, write, and understand.

Conversely, it may also be argued that compiled languages are a black box, obscuring the choices made by the compiler. Thus, the added transparency and control of assembly can often reveal things that are not easily visible when working with compiled languages. In fact, the vast majority of recent performance breakthroughs in our Rust SDKs were actually discovered and informed by realising we could hand-roll more efficient bytecode than the compiler.

In this article, you'll learn:

  • What sBPF Assembly is and how it provides direct control over the virtual machine
  • The evolution from Berkeley Packet Filter to eBPF and why Solana adopted it
  • sBPF's virtual machine architecture, instruction set, and memory model
  • How to set up your development environment and build sBPF programs
  • Step-by-step assembly programming through a practical memo example
  • Essential security considerations when writing low-level code

What is Assembly?

Assembly is a human-readable variant of machine code: the lowest-level programming language that directly corresponds to the instruction set of a CPU or VM.

Instead of variables and functions, assembly operates on registers (fast, temporary storage locations in the CPU), memory addresses (numbered locations in memory), and fundamental operations such as load (read from memory), store (write to memory), arithmetic, and jumps (control flow).

Each instruction in assembly maps one-to-one to an equivalent instruction in machine code. 

This one-to-one mapping means programmers control precisely what the processor executes, including which registers hold data, how memory is accessed, and the exact sequence of operations. 

Unlike high-level languages, where a single function might generate dozens of instructions, assembly offers complete transparency and control over the machine's behavior without opaque abstractions.

What is Berkeley Packet Filter (BPF) and eBPF?

Berkeley Packet Filter (BPF) originated in 1992 as a virtual machine for efficiently filtering network packets in Unix kernels. The original BPF used a simple instruction set and register-based architecture that could run sandboxed code safely within the kernel.

Extended Berkeley Packet Filter (eBPF) modernized this concept, expanding from a packet filter into a general-purpose virtual machine. eBPF introduced a 64-bit architecture, more registers, and richer instruction sets, enabling complex programs to run securely in kernel space for networking, security, and system monitoring.

Solana adopted eBPF because it provided a proven, secure execution environment with built-in sandboxing. The sandboxing prevents programs from accessing system resources, crashing nodes, or interfering with other programs, while deterministic execution ensures all validators produce identical results. 

Additionally, the register-based architecture and mature toolchain made it ideal for high-performance on-chain execution, while the existing LLVM backend allowed developers to compile from high-level languages like Rust.

sBPF Virtual Machine Architecture

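When a Solana program executes, the runtime loads the sBPF bytecode into memory, statically verifies it (checking that every instruction is well-formed, that jumps land on valid targets, and that registers are used legally), and then executes it within the virtual machine. Unlike the Linux kernel's eBPF verifier, this step does not reject loops; runaway execution is instead cut off at runtime by the compute unit budget, and memory accesses are checked as they happen.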

The VM provides a controlled 64-bit execution environment where programs run in complete isolation from the host system and other programs, with all resource access mediated through the runtime.

sBPF Instruction Set Architecture

sBPF operates with eleven 64-bit registers (r0-r10), with r10 serving as a read-only frame pointer, and r0 serving as a return register. 

Instructions follow a consistent format with opcodes specifying operations (arithmetic, logic, memory access, jumps) and operands indicating source/destination registers, offsets, and/or immediate values. 

Key instruction categories include ALU operations (add, subtract, bitwise), memory operations (load/store), and control flow (conditional/unconditional jumps).
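
To make the format concrete, here is a minimal Rust sketch of how one instruction packs into 8 bytes. It assumes the classic eBPF layout that sBPF inherits: an opcode byte, one byte holding both register numbers, a 16-bit offset, and a 32-bit immediate, all little-endian. The Insn type and the constant names are illustrative rather than part of any SDK, and newer sBPF versions may renumber some opcodes.

```rust
/// One 8-byte instruction slot in the classic eBPF layout (illustrative type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Insn {
    pub opcode: u8,
    pub dst: u8,  // destination register number, 0..=10
    pub src: u8,  // source register number, 0..=10
    pub off: i16, // signed offset for memory access and jumps
    pub imm: i32, // signed immediate operand
}

impl Insn {
    /// Pack the fields into their little-endian byte form.
    pub fn encode(&self) -> [u8; 8] {
        let mut out = [0u8; 8];
        out[0] = self.opcode;
        // Source register in the high nibble, destination in the low nibble.
        out[1] = (self.src << 4) | (self.dst & 0x0f);
        out[2..4].copy_from_slice(&self.off.to_le_bytes());
        out[4..8].copy_from_slice(&self.imm.to_le_bytes());
        out
    }
}

// Opcode values taken from the classic eBPF instruction set.
pub const LDXDW: u8 = 0x79; // load 64 bits from [src + off] into dst
pub const ADD64_IMM: u8 = 0x07; // dst += imm
pub const EXIT: u8 = 0x95; // return, with the result in r0

fn main() {
    let program = [
        Insn { opcode: LDXDW, dst: 2, src: 1, off: 0x08, imm: 0 }, // ldxdw r2, [r1+0x08]
        Insn { opcode: ADD64_IMM, dst: 1, src: 0, off: 0, imm: 0x10 }, // add64 r1, 0x10
        Insn { opcode: EXIT, dst: 0, src: 0, off: 0, imm: 0 }, // exit
    ];
    for insn in program {
        println!("{:02x?}", insn.encode());
    }
}
```

Every instruction occupying a fixed-width slot is part of what makes hand-written bytecode easy to count, measure, and reason about.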

sBPF Memory Model

sBPF programs operate within a structured memory layout: a stack that gives each call frame 4KB for local variables, a heap for dynamic allocations, read-only program data containing the bytecode and constants, and account data regions that map to Solana accounts, which the program can access during execution.

All memory access is bounds-checked, and programs cannot access memory outside their designated regions.
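
As an illustration, the Solana documentation describes a fixed virtual address map in which each region starts at its own 4 GB boundary. The Rust sketch below classifies an address by the window it falls in. The Region enum, the constant names, and the region_of helper are hypothetical, and the real VM additionally checks each access against the actual length of the region.

```rust
/// The four regions of the program's virtual address space (illustrative names).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Region {
    Program, // bytecode and read-only data
    Stack,   // call frames
    Heap,    // dynamic allocations
    Input,   // serialized accounts and instruction data
}

// Start addresses as listed in Solana's documented memory map.
pub const PROGRAM_START: u64 = 0x1_0000_0000;
pub const STACK_START: u64 = 0x2_0000_0000;
pub const HEAP_START: u64 = 0x3_0000_0000;
pub const INPUT_START: u64 = 0x4_0000_0000;

/// Size of each address window (a simplification: real regions are much smaller).
pub const WINDOW_SIZE: u64 = 0x1_0000_0000;

/// Return the window an address falls in, or None if it is outside all four.
pub fn region_of(addr: u64) -> Option<Region> {
    if addr >= INPUT_START + WINDOW_SIZE || addr < PROGRAM_START {
        None
    } else if addr >= INPUT_START {
        Some(Region::Input)
    } else if addr >= HEAP_START {
        Some(Region::Heap)
    } else if addr >= STACK_START {
        Some(Region::Stack)
    } else {
        Some(Region::Program)
    }
}

fn main() {
    // The pointer handed to the entrypoint in r1 points into the input region.
    println!("{:?}", region_of(INPUT_START + 0x10));
    // Address zero is not mapped at all.
    println!("{:?}", region_of(0));
}
```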

Solana Syscalls in sBPF

sBPF programs cannot directly access system resources or perform I/O operations. Instead, they request services through syscalls, which are special instructions that transfer control to the Solana runtime. 

In sBPF assembly, syscalls are invoked using the call instruction and a call symbol that is modified to a call target by the compiler at the time of assembly. Currently, syscalls are invoked via text-based dynamic relocations: a complicated string lookup table system that maps symbols to a 32-bit Murmur3 hash at JIT compilation. However, there is an active proposal to replace this with static syscalls, drastically simplifying calling conventions. When a syscall is invoked, arguments are passed through registers r1 through r5, with r5 sometimes acting as a stack spill, and return values are written back to r0.

Common syscalls include memory operations (sol_memcpy, sol_memcmp), cryptographic functions (hashing, signature verification), logging, and cross-program invocations.
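
To show what "mapping a symbol to a 32-bit Murmur3 hash" involves, here is a self-contained Rust sketch of the standard 32-bit MurmurHash3 function applied to a syscall name. The use of seed 0 is an assumption about how the loader derives its keys, and murmur3_32 is an illustrative helper rather than an SDK function.

```rust
/// Standard 32-bit MurmurHash3 over a byte slice.
pub fn murmur3_32(data: &[u8], seed: u32) -> u32 {
    const C1: u32 = 0xcc9e_2d51;
    const C2: u32 = 0x1b87_3593;

    let mut h = seed;

    // Mix the input four bytes at a time, read as little-endian words.
    let mut chunks = data.chunks_exact(4);
    for chunk in &mut chunks {
        let mut k = u32::from_le_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]);
        k = k.wrapping_mul(C1).rotate_left(15).wrapping_mul(C2);
        h ^= k;
        h = h.rotate_left(13).wrapping_mul(5).wrapping_add(0xe654_6b64);
    }

    // Mix the remaining one to three bytes, if any.
    let tail = chunks.remainder();
    if !tail.is_empty() {
        let mut k = 0u32;
        for (i, &byte) in tail.iter().enumerate() {
            k |= (byte as u32) << (8 * i);
        }
        k = k.wrapping_mul(C1).rotate_left(15).wrapping_mul(C2);
        h ^= k;
    }

    // Finalization: fold in the length and avalanche the bits.
    h ^= data.len() as u32;
    h ^= h >> 16;
    h = h.wrapping_mul(0x85eb_ca6b);
    h ^= h >> 13;
    h = h.wrapping_mul(0xc2b2_ae35);
    h ^= h >> 16;
    h
}

fn main() {
    // Assumption: the syscall key is the hash of the symbol name with seed 0.
    println!("sol_log_ -> {:#010x}", murmur3_32(b"sol_log_", 0));
}
```

A static syscall scheme would remove this lookup from the calling path, which is why the proposal mentioned above simplifies things so much.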

sBPF Assembly Tutorial

Set Up Your Environment 

Writing sBPF assembly traditionally required the full Solana toolchain: a bloated, complex, platform-dependent process.

For this reason, Dean Little created the sBPF SDK, providing a complete end-to-end solution for bootstrapping, building, compiling, testing, and deploying sBPF programs.

You can install the SDK on any operating system using Cargo:

Code
cargo install --git https://github.com/blueshift-gg/sbpf.git

Before diving into the code, it is recommended to also install the VS Code sBPF Assembly extension for syntax highlighting, auto-completion, and error detection.

Set Up Your Project 

Create a new project scaffold with:

Code
sbpf init <name_of_the_project>

This creates a project with Mollusk Rust tests.

If you want TypeScript tests instead, pass the --ts-tests flag:

Code
sbpf init <name_of_the_project> --ts-tests 

Memo sBPF Assembly Example

It's hard to imagine a simpler program than a memo; this is exactly why it makes the perfect introduction to sBPF assembly. 

The program does one thing: takes in whatever instruction data you send it and logs it to the blockchain. No accounts, no complex logic, just pure instruction-level control over the Solana Virtual Machine.

Let's start by examining the complete program:

Code
.equ NUM_ACCOUNTS, 0x00
.equ DATA_LEN, 0x08
.equ DATA, 0x10
.globl entrypoint
entrypoint:
  ldxdw r0, [r1+NUM_ACCOUNTS]
  ldxdw r2, [r1+DATA_LEN]
  add64 r1, DATA
  call sol_log_
  exit

Define your Constants

The program starts by defining three constants that map the structure of the serialized input region of the Solana runtime:

Code
.equ NUM_ACCOUNTS, 0x00 // Account count offset
.equ DATA_LEN, 0x08     // Data length offset  
.equ DATA, 0x10          // Data start offset

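These offsets correspond to where the Solana runtime places each field in the serialized input buffer: a 64-bit account count at the start, then the serialized accounts, then a 64-bit instruction data length followed by the instruction data itself. With zero accounts there is nothing between the count and the length, so the length sits at 0x08 and the data starts at 0x10.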

When the VM calls our program, it hands us a structured buffer in register r1, and these constants let us navigate that structure. 

Tools like sbpf.xyz can automatically calculate these offsets based upon your account and instruction data layout.
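
The layout can be checked off-chain with a short Rust sketch that builds the same buffer for the zero-account case and reads it back through the three offsets. It assumes the standard input serialization (account count, instruction data length, instruction data, then the 32-byte program ID); serialize_input and read_u64 are hypothetical helpers written for this example.

```rust
// The same offsets the assembly program defines with .equ.
const NUM_ACCOUNTS: usize = 0x00;
const DATA_LEN: usize = 0x08;
const DATA: usize = 0x10;

/// Build the input buffer for a call that passes zero accounts.
fn serialize_input(instruction_data: &[u8], program_id: &[u8; 32]) -> Vec<u8> {
    let mut buf = Vec::new();
    buf.extend_from_slice(&0u64.to_le_bytes()); // account count
    buf.extend_from_slice(&(instruction_data.len() as u64).to_le_bytes()); // data length
    buf.extend_from_slice(instruction_data); // the memo bytes
    buf.extend_from_slice(program_id); // program ID trails the data
    buf
}

/// Read a little-endian u64 at a byte offset, like ldxdw does.
fn read_u64(buf: &[u8], offset: usize) -> u64 {
    let mut raw = [0u8; 8];
    raw.copy_from_slice(&buf[offset..offset + 8]);
    u64::from_le_bytes(raw)
}

fn main() {
    let input = serialize_input(b"gm", &[7u8; 32]);

    let accounts = read_u64(&input, NUM_ACCOUNTS);
    let len = read_u64(&input, DATA_LEN) as usize;
    let memo = &input[DATA..DATA + len];

    println!("accounts = {}, len = {}, memo = {:?}", accounts, len, memo);
}
```

As soon as an account is passed, its serialized bytes sit between the count and the length, and these fixed offsets no longer point at the instruction data.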

Create the Entrypoint

We then proceed to create the entrypoint and the validation for our program.

Code
.globl entrypoint
entrypoint:
  ldxdw r0, [r1+NUM_ACCOUNTS]

The .globl entrypoint directive tells the linker to make the entrypoint symbol globally visible. The Solana runtime looks for this symbol to know where to start executing your program. Fun fact: While we typically call it entrypoint, it can actually be named anything!

The first instruction uses a clever assembly-specific validation technique: since r0 is our return register, and any return value other than 0 serves as an error code, loading the number of accounts directly into r0 forces this program to exit with a non-zero error code if any accounts are passed in.

This lets us skip walking past serialized accounts to locate our instruction data: the fixed offsets above are only valid when the account count is zero, and that is exactly the case in which r0 holds 0.

Invoke the sol_log Syscall

We can then finally perform the sol_log_ syscall:

Code
ldxdw r2, [r1+DATA_LEN]   ; Load memo length
add64  r1, DATA           ; Point r1 to memo data
call sol_log_             ; Log the memo
exit                      ; Exit with r0 value

The sol_log_ syscall expects the length of the message in r2 and a pointer to the message to log in r1. Luckily, in the serialized input region, the runtime automatically prepends instruction data with a 64-bit length counter. As such, we can invoke the sol_log_ syscall by simply:

  • Loading the value at the DATA_LEN offset into r2
  • Making r1 point to the offset of the start of our instruction data

Once our two registers are pointing to the correct values, we simply invoke the syscall, then exit with whatever value is in r0.

Build and Deploy your Program

You can build your program using sbpf's built-in assembler, written by Claire Fan: a 5 MB Rust binary that replaces the >2 GB LLVM toolchain you would otherwise need from the Solana platform tools.

To run a build, simply execute:

Code
sbpf build

After building your program you can deploy it with:

Code
sbpf deploy

Interact with your Program

To test your program using the scaffolded tests, you can run sbpf test or, if you prefer, you can run the full pipeline with sbpf e2e to build, deploy, and test in one command.

Security Considerations in sBPF Assembly

Writing assembly code means taking full responsibility for security: there's no compiler to catch your mistakes. Every instruction directly affects program safety, making these core security principles essential.

Input Validation

Assembly programs must manually validate all inputs. Always verify account counts, data lengths, and buffer sizes before use.

For example, in our memo program, loading the account count into r0 created automatic validation: if accounts were incorrectly passed, the program would fail. For data processing, check lengths against expected ranges before accessing memory.
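
The same checks can be prototyped off-chain before they are translated into compare-and-jump instructions. The Rust sketch below validates a zero-account input buffer in the order an assembly program would. MAX_MEMO_LEN, MemoError, parse_memo, and input_with are all hypothetical names, and the 512-byte limit is an arbitrary value chosen for illustration.

```rust
/// Reasons the input is rejected (illustrative error set).
#[derive(Debug, PartialEq, Eq)]
pub enum MemoError {
    Truncated,          // buffer is shorter than its own header claims
    UnexpectedAccounts, // the fixed offsets only hold with zero accounts
    TooLong,            // memo exceeds the limit we chose
}

/// Arbitrary limit for this example.
pub const MAX_MEMO_LEN: u64 = 512;

/// Read a little-endian u64, returning None instead of reading out of bounds.
fn read_u64(buf: &[u8], offset: usize) -> Option<u64> {
    let bytes = buf.get(offset..offset.checked_add(8)?)?;
    let mut raw = [0u8; 8];
    raw.copy_from_slice(bytes);
    Some(u64::from_le_bytes(raw))
}

/// Validate the header, then return the memo bytes.
pub fn parse_memo(input: &[u8]) -> Result<&[u8], MemoError> {
    let num_accounts = read_u64(input, 0x00).ok_or(MemoError::Truncated)?;
    if num_accounts != 0 {
        return Err(MemoError::UnexpectedAccounts);
    }
    let data_len = read_u64(input, 0x08).ok_or(MemoError::Truncated)?;
    if data_len > MAX_MEMO_LEN {
        return Err(MemoError::TooLong);
    }
    // The length is at most 512 here, so the cast and the addition cannot overflow.
    let end = 0x10 + data_len as usize;
    input.get(0x10..end).ok_or(MemoError::Truncated)
}

/// Test helper: build a header plus data with a chosen account count.
pub fn input_with(num_accounts: u64, data: &[u8]) -> Vec<u8> {
    let mut buf = Vec::new();
    buf.extend_from_slice(&num_accounts.to_le_bytes());
    buf.extend_from_slice(&(data.len() as u64).to_le_bytes());
    buf.extend_from_slice(data);
    buf
}

fn main() {
    println!("{:?}", parse_memo(&input_with(0, b"gm")));
    println!("{:?}", parse_memo(&input_with(1, b"gm")));
}
```

Each early return corresponds to one conditional jump to an error label in assembly, so the cost of every check is easy to count.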

Memory Bounds Checking

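The VM's bounds checks only guarantee that an access lands inside a mapped memory region; they know nothing about where your own arrays and buffers begin and end inside that region. Before accessing arrays or buffers, manually verify that your read/write operations stay within their intended boundaries. A simple bounds check before memory access can prevent aborted transactions and silent reads or writes of neighboring data:

Code
jge r2, MAX_LENGTH, error    # Reject any index at or past the limit
mov64 r3, r1                 # Loads take a register plus a constant offset,
add64 r3, r2                 # so compute the address r1 + r2 first
ldxb r3, [r3+0]              # Safe to load once the check passes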

Register Management

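Registers hold critical program state. Following the eBPF calling convention, r1 through r5 carry arguments and should be treated as clobbered after any call, while r6 through r9 are preserved across calls. When calling functions or syscalls, keep anything you still need in a preserved register or save it to the stack.

The frame pointer (r10) and return values in r0 require special attention: r10 is read-only, so every stack slot you touch is an offset from it that you must track yourself, and r0 is overwritten by each call's return value. Mistakes with either can crash your program or create security vulnerabilities.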

Arithmetic Safety

Manual overflow detection is crucial for arithmetic operations. Before adding large values, check if the result could exceed 64-bit limits. 

Division operations need explicit zero-checks to prevent runtime errors.
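
Both checks are small enough to prototype in Rust first. The sketch below assumes add64 wraps around on overflow, in which case an unsigned sum that is smaller than one of its operands signals the wrap; that is the comparison you would follow with a conditional jump. add_overflows and safe_div are illustrative helpers, not SDK functions.

```rust
/// Detect unsigned 64-bit overflow the way assembly would: add, then compare.
pub fn add_overflows(a: u64, b: u64) -> bool {
    // wrapping_add mirrors a wrapping 64-bit add instruction.
    let sum = a.wrapping_add(b);
    // If the sum wrapped around, it is smaller than either operand.
    sum < a
}

/// Divide only after an explicit zero check on the divisor.
pub fn safe_div(numerator: u64, divisor: u64) -> Option<u64> {
    if divisor == 0 {
        None
    } else {
        Some(numerator / divisor)
    }
}

fn main() {
    println!("{}", add_overflows(u64::MAX, 1));
    println!("{}", add_overflows(40, 2));
    println!("{:?}", safe_div(10, 0));
    println!("{:?}", safe_div(10, 2));
}
```

In Rust, checked_add and checked_div perform these checks for you; in assembly each one is an extra compare-and-jump that you have to remember to write.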

Syscall Parameter Validation

Syscalls expect valid parameters and will fail with invalid inputs. Before calling syscalls, ensure registers contain proper pointers, lengths, and values. Invalid parameters not only cause failures but also consume compute units unnecessarily.

Conclusion

sBPF assembly isn't for everyone, and that's exactly the point. 

Most developers should stick with Rust and let the compiler handle optimizations. But for those optimizing for every last compute unit or building performance-critical infrastructure, assembly offers something no high-level language can: complete control.

We've covered the basics here, from understanding how sBPF fits into Solana's architecture to building your first memo program. 

The memo example might seem trivial, but it demonstrates the core principles you'll use in more complex programs:

  • Direct register manipulation
  • Manual memory management
  • Explicit syscall handling

These gains come at a cost: you're trading safety nets for speed, abstractions for control. You should choose assembly when you've already optimized your Rust code and still need more performance, when you're building infrastructure where every microsecond matters, or when you need to do something the compiler simply can't optimize well. 

For everything else, save yourself the headache and stick with Rust.

The tools are getting better, the community is growing, and the performance benefits speak for themselves. But remember: “with great power comes great responsibility for not breaking things”.

To go deeper into sBPF assembly, read the Introduction to Assembly course on Blueshift and test your skills with the challenges there!
