
Engineering Velocity: Inside Anza’s Performance Team with Alessandro Decina
Introduction
Techno-optimism is, undoubtedly, Solana’s cultural zeitgeist. The unfettered belief in unrestrained speed, progress, and innovation underlies every pull request that gets shipped to mainnet. The mantra is IBRL: Increase Bandwidth, Reduce Latency. It is equal parts engineering diktat, cultural shibboleth, and secular prayer. If Bitcoin is a cathedral to permanence, and Ethereum is an agora for neutrality, Solana is a racetrack: a realm of measurable, mechanical speed.
Yet this speed never descends from the heavens in a neatly commented diff. It is quarried, sanded, and worked into existence by those who dare to stare at code all day. Few stare harder than Alessandro Decina, originally a GStreamer savant who took stuttering video buffers as a personal affront. Today, he leads Anza’s four-person performance team, a team whose idea of “self-care” is deleting yellow bars from a validator trace at 3:00 a.m. Their day-to-day consists of staring at flame graphs, ripping out entire workflows, and rewriting battle-tested production code, because anything that can be optimized eventually will be.
I wanted to understand what it means to live at this velocity. So, I sat down with Alessandro Decina himself to see how he continues to make the fastest blockchain in existence even faster. The following interview is an autopsy of speed. We discussed everything from his days working with multimedia pipelines to the cultural guardrails that enable four engineers to out-ship entire organizations.
The conversation has been edited and condensed for clarity and brevity.
Interview
Origins and Worldview
Ichigo: Taking a trip down memory lane, your first love was GStreamer and multimedia pipelines. What were the biggest latency lessons from that era? What did chasing real-time audio and video teach you about shaving milliseconds off of Agave?
Decina: First of all, great job stalking me, haha. I’ve done work on GStreamer for like 15 years or so—maybe a little longer. It’s actually where I learned literally everything I know. I started contributing to the project when it was open source and just getting started. And I was so lucky, I just got close to the founder at the time. The guy was like, I don’t know, 15 years older than me, and he was really good. Like this guy is on Wikipedia—that guy is probably smart. Not like me—I pretend to be smart. The guy just decided, yeah, whatever, I will just teach you everything I know for free. And, so, yeah, I started working on that.
It’s funny now that we talk about low latency while working on Solana, because this is not low latency at all. Like when you’re talking about audio processing, DSPs, and multimedia hardware, the things we do on Solana are super high latency. If anything in audio or video had a 400-millisecond response time, it would essentially be broken. That does not work.
At some point, I started working with Linux drivers and hardware that does video and audio encoding and decoding. Many of the things I do today are essentially the same as the things I was doing at that time. Latency when you’re working with hardware means there’s a queue somewhere. You find that queue. You try to minimize it, making it as small as possible, and make sure it never underruns.
For example, the XDP work I’m doing right now is very similar to how audio ring buffers work. Even the call we’re having right now, you have a bunch of packets arriving out of order. There’s a ring buffer somewhere reordering all the packets, and you definitely want to make sure that doesn’t underrun.
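For readers unfamiliar with the pattern he is describing, a reorder (or jitter) buffer is just a fixed-size ring indexed by sequence number: packets go in as they arrive, possibly out of order, and come out strictly in order. The sketch below is illustrative only; the names and the 64-slot capacity are invented, and it is not code from GStreamer or Agave.

```rust
// Minimal reorder ("jitter") buffer: a fixed-size ring indexed by sequence
// number. The producer parks packets as they arrive, possibly out of order;
// the consumer drains them strictly in order. Returning None from `pop` is
// the underrun you size the buffer to avoid.
const CAPACITY: usize = 64;

struct ReorderBuffer {
    slots: [Option<Vec<u8>>; CAPACITY],
    next_seq: u64, // sequence number the consumer expects next
}

impl ReorderBuffer {
    fn new() -> Self {
        Self {
            slots: std::array::from_fn(|_| None),
            next_seq: 0,
        }
    }

    /// Park a packet in the slot derived from its sequence number.
    fn push(&mut self, seq: u64, payload: Vec<u8>) {
        self.slots[(seq % CAPACITY as u64) as usize] = Some(payload);
    }

    /// Pop only when the next in-order packet has actually arrived.
    fn pop(&mut self) -> Option<Vec<u8>> {
        let idx = (self.next_seq % CAPACITY as u64) as usize;
        let packet = self.slots[idx].take()?;
        self.next_seq += 1;
        Some(packet)
    }
}
```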
It feels like I’ve been doing the same work for the last 20 years.
Same stuff, different day, so to speak. I also noticed you worked at Spotify, and there was a bit of Firefox—
Ahh, no, the Firefox stuff was just integrations work for a hackathon. Haha, I’m so old that I wrote the original video tag for Firefox that used GStreamer.
Damn, haha. So all roads lead back to GStreamer?
GStreamer really taught me multi-threaded programming. It’s the reason why I’m in this ecosystem. There are people who write assembly and do all sorts of low-level hacks. I’m like, yeah, I’ve been doing that for a long time. And I’ve learned that unless you have a language with a strong type system and a good compiler, you end up shooting yourself in the foot. I’m sure you can make something marginally faster with assembly, but I really want Rust.
I want the Rust compiler to tell me: you’re an idiot—this is not going to work because you have a bug here. Before Rust, I felt like I was 10% a software engineer, and 90% a meat debugger. I was just debugging things. And so, I think GStreamer and my love for Rust are the reasons why I started to work on Solana. Rust was getting popular, there weren’t too many jobs that allowed you to go full-time, and I had decided I was only going to work with Rust.
I’m too old to be writing C. I don’t want a memory-unsafe language. I don't want to waste my time. So, yeah, here I am.
Very nice. So, when you made the switch, did you have any misconceptions about decentralized systems before working on validator code? What convinced you that Solana’s architecture could actually scale?
I actually ended up joining Solana almost two years too late because I was busy working on the Rust compiler. Someone at Solana who was just starting work on the virtual machine sent an email saying, “Oh, I’m doing the same thing, it looks like you’re a little further ahead, come work with us.” I didn’t reply to that email because I looked into Bitcoin and then Ethereum, and I realized you could execute things, but you have 10 TPS. It’s not like a serious project, right? You can’t do any real-world stuff with 10 TPS.
And so when I got that email, I didn’t reply. I was just like, okay, these crypto people are not serious yet.
Toly reached out two years later, and I checked the price of SOL. I was like, okay, I should’ve opened that email, haha. This time I did end up chatting with him. Before the chat, he pointed me to some code. I looked at it, and the code was horrible, to be honest. This was really bad Rust code.
But then I spoke to Toly without Googling him, so I had no clue who he was. He was smart and said all the right things. He said we’re building this thing. Currently, we do this. It’s obviously not state-of-the-art, but the ambition is that we scale with the hardware. We build the most performant blockchain, and then the bottleneck becomes the hardware. The idea is that the more hardware you throw at this thing, the more it will scale.
And that was convincing for me. That felt like this wasn’t just a bunch of blockchain maniacs wanking over World War III…I’m interested in the technology. I am one of the few crypto people who are actually in it for the tech.
Solana has this engineering culture where the tech is heavily influenced by techno-optimism—that whole culture of Increase Bandwidth, Reduce Latency. How do you think about it personally? How does that influence the day-to-day at Anza?
To me, Ethereum fundamentally has a scarcity mindset. They’re like, okay, we’ve hit some walls, and we’re going to find workarounds to the walls we’ve hit. We’re going to invent all of this infrastructure, for what are essentially knowledge gaps that we have and that we feel like we cannot possibly fix, right?
And instead, I feel like we are the opposite. It’s like, okay, there is a problem. There are no unsolvable problems, other than things that violate the laws of physics. This is just a huge cultural difference.
If something is broken, we just say, okay, let’s sit down. Let’s profile a little and see what the issues are. Let’s talk to the traders, the market makers. Let’s see what their issues are. And recently, we’ve found some really concrete issues. And we’ve fixed most of them. And, genuinely, in two to three months, we can fix all of them.
There are many things that don’t work today in Solana. We know about them, and we’ve never sat down and gone, “We have the perfect solution! And now, to go further, we need to invent something new, or we have to research and do something else.” No—these are concrete issues. Most of them are really dumb issues.
And, we haven’t had any outages recently. Personally, I think that’s bearish, right? Because I think some people have started to become a little conservative. We know we can go a lot faster. We know we can do 100 million CU blocks tomorrow, right? We just need to speed-run some things. And I am for speed-running everything.
We fundamentally know—we don’t need a roadmap to know that we can 10x the current performance. We see how to do it. We know how to do it. We have either written the code, and it’s not fully complete, or we have written the code and can’t deploy it because there are still some edge cases that need to be fixed. But we know exactly what to do.
We know how to scale this thing.
Performance Engineering
Speaking of the performance work, I don’t think a lot of people really know that Anza has a dedicated performance team. It kind of goes under the radar. Tell me a bit about the team’s structure. How does it differ from other engineering teams at Anza?
Right, so we do have different teams at Anza. We have the consensus people right now who are focusing on Alpenglow. We have the networking people focusing on Gossip, mostly. There are the AccountsDB people who essentially only do accounts database work. There’s the block production team that works on the scheduler. And of course, all the other teams that I’m forgetting.
The difference with the performance team is that we work on everything. We profile, find a bottleneck, and try to ask the relevant teams whether they have the time and expertise because, sometimes, we find issues that not everyone can fix. For example, if you’re a consensus person, you’re not necessarily the best at low-level programming because your expertise is elsewhere. And so, in those cases, we usually just go in and fix the code for them.
So, we don’t just work on one thing. We just find the next bottleneck. We sync roughly every two weeks and decide where we are, what we need to do next, and how to make the next release faster.
Another big difference is that Anza tends to hire smart people. Like, if you don’t know Rust, that’s fine. If you don’t know low-level programming, that’s fine. We assume that if you’re smart, then we can teach you most of these skills on the job. For the performance team, I usually hire people who actually have experience with the kernel or other low-level stuff because of the kind of bottlenecks that we’re hitting right now.
For example, with the accounts database, there are some algorithmic issues that we have to fix. But the reason why, in the 2.3 release, the accounts database is like ten times faster than it was two months ago is that we just fixed the way it does I/O. And you need to know how that works in order to make it faster. Like, if you just have a higher-level understanding of databases, you don’t really know how disks work or how the kernel schedules I/O requests.
And so, personally, for the performance team, I tend to hire more low-level people. Again, I don’t even care if they know Rust, but I do want them to have worked with C or C++ on other lower-level things.
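For a concrete picture of what “fixing the way it does I/O” can look like at the kernel boundary, here is a minimal read submitted through io_uring (which comes up again later in the interview), using the community io-uring crate. This is a hedged sketch of the general technique, not the actual AccountsDB change; the file path and the user_data tag are arbitrary.

```rust
// A minimal io_uring read with the `io-uring` crate: build a request, push it
// onto the submission queue, and reap the completion. One submit_and_wait
// call can batch many requests, which is what makes this cheaper than issuing
// one blocking read syscall per request.
use io_uring::{opcode, types, IoUring};
use std::os::unix::io::AsRawFd;
use std::{fs, io};

fn main() -> io::Result<()> {
    let mut ring = IoUring::new(8)?; // submission/completion queues with 8 entries
    let file = fs::File::open("/etc/hostname")?; // arbitrary file for the example
    let mut buf = vec![0u8; 1024];

    // Describe the read; `user_data` is an opaque tag returned with the completion.
    let read_e = opcode::Read::new(types::Fd(file.as_raw_fd()), buf.as_mut_ptr(), buf.len() as _)
        .build()
        .user_data(0x42);

    // Safety: the buffer must stay alive until the completion is reaped.
    unsafe {
        ring.submission().push(&read_e).expect("submission queue is full");
    }

    ring.submit_and_wait(1)?; // one syscall submits the batch and waits
    let cqe = ring.completion().next().expect("completion queue is empty");
    println!("read {} bytes", cqe.result());
    Ok(())
}
```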
The reason why it’s not really known that there’s a performance team is that we essentially started in December. I was originally hired to work on the compiler, but switched to perf around the March 2024 outage and started to optimize things. People were not very happy because I had just told them one day that I was going to work on whatever I wanted. So, yeah, haha, they were not happy, but then we had really good results. And so people came up to me and were like, “Okay, actually, do you want to hire more people to do this?” Then in December, we made the perf effort official and started the team.
To be honest, I’m biased, but the performance team is the best team at Anza, hands down, for sure.
I don’t doubt it, haha. How many people are on the team?
It’s four of us full-time, but I’ve been making jokes on Twitter about Brooks and some other people joining because I’ve started to share my profiler more broadly with people. Until maybe two months ago, only the performance people had access to the profiler. Now everyone has it. And, for example, since I gave Brooks the profiler, he does more performance work than me, haha. Like he’s completely hooked. He’s just making everything faster now.
So now there’s unofficially Brooks and a few other guys who also do a lot of performance stuff. But there are four of us full-time on performance.
So, when working on performance, what’s a “smell” that you look for when profiling that most engineers would miss? How do you decide what needs to be optimized?
There are some really obvious things. Like when I first started profiling Agave, we spent way more time inside the kernel than executing user space code, which is ridiculous. We are not a low-level kind of application. If we were a multimedia framework, it would make sense that you do most of the work in the kernel because, ultimately, you need to send the samples to the hardware. But the only really low-level kind of work we do is Turbine.
So, usually, the biggest smell when I start profiling is if I start to see a lot of yellow in my flame graphs because it means we’re spending too much time in the kernel. It probably means that someone is using some high-level API that looks innocent, but under the hood, it’s just atrocious performance-wise.
In the last year, we have reduced the amount of memory that Agave uses by about 10x because we generally run into the same problem. When you do too many memory allocations, at some point, you need to start interacting with the kernel. You can see the interaction with the kernel in the profiler. You see, okay, where is this coming from? You see that this chain is just thrashing memory all the time. You track it all the way back to where you’re churning and allocating too much memory, and you go and fix it.
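To make the allocation point concrete, here is a toy before-and-after of that pattern. The function names are hypothetical and this is not Agave code; the point is only that per-packet allocation churn can eventually show up as kernel time (page faults, mmap/munmap) in a profile, while a reused buffer keeps the hot loop in user space.

```rust
// Hypothetical before/after of the allocation-churn pattern. The first
// version asks the allocator for a fresh Vec on every packet; under sustained
// load that churn can surface as kernel time in a flame graph. The second
// version allocates once and clears the buffer, which keeps its capacity.
fn process_allocating(packets: &[&[u8]]) -> usize {
    let mut total = 0;
    for p in packets {
        let mut scratch = Vec::with_capacity(p.len()); // new allocation per packet
        scratch.extend_from_slice(p);
        total += scratch.len();
    }
    total
}

fn process_reusing(packets: &[&[u8]]) -> usize {
    let mut scratch: Vec<u8> = Vec::with_capacity(2048); // allocated once
    let mut total = 0;
    for p in packets {
        scratch.clear(); // keeps capacity; no new allocation
        scratch.extend_from_slice(p);
        total += scratch.len();
    }
    total
}
```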
There are some harder things. Like, for example, we have this fundamental design issue that we found and that I’m fixing. So, famously, Solana is pipelined: there are different stages, and it’s all supposed to be parallelized so that the stages run concurrently.
In practice, because of the way things are architected, we do have a pipeline design, but we have so many stalls in this pipeline. So, we do have different stages, but we’re not maximizing the throughput of all the stages because of stupid design bugs that introduce latency at various parts. This latency is what people typically complain about when they cannot send transactions or when they say there’s jitter in the system. This jitter is not introduced by anything fundamental, or by hardware—it’s just us doing things suboptimally.
But, to be completely frank with you, the kinds of things we work on are stupid. Like, there are some super obvious bugs, and we’re just fixing the obvious bugs.
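For readers who have not worked with pipelined designs, the stall problem Decina describes can be shown with a toy two-stage pipeline (illustrative only; this is nothing like Agave’s actual stages). Once the bounded queue between stages fills up, the upstream stage blocks, and end-to-end throughput collapses to that of the slowest stage.

```rust
// Toy two-stage pipeline connected by a small bounded channel. If stage two
// is slow, the channel fills, stage one stalls on send(), and the whole
// pipeline runs at the speed of the slowest stage.
use std::sync::mpsc::sync_channel;
use std::thread;
use std::time::{Duration, Instant};

fn main() {
    let (tx, rx) = sync_channel::<u64>(8); // bounded queue between the stages

    let stage_one = thread::spawn(move || {
        for item in 0..500u64 {
            // Cheap work, but send() blocks once the queue is full: a stall.
            tx.send(item).unwrap();
        }
    });

    let stage_two = thread::spawn(move || {
        let start = Instant::now();
        for _item in rx {
            // Pretend this stage is the bottleneck.
            thread::sleep(Duration::from_micros(500));
        }
        println!("pipeline drained in {:?}", start.elapsed());
    });

    stage_one.join().unwrap();
    stage_two.join().unwrap();
}
```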
How do you decide whether these bugs warrant micro-benchmarks versus completely replaying mainnet traffic?
I think most of the performance issues we have come from the fact that people wrote micro-benchmarks. They made the micro-benchmarks faster. They tested them in isolation. And then, when you put everything together in Agave, nothing works like the micro-benchmarks.
So, I personally tell people: do not use micro-benchmarks for anything at all. Even replaying transactions—it’s something I’ve done maybe three times in the last year. Because even when you replay mainnet traffic, you don’t replay at the exact same speed that you would get when you’re actually executing mainnet traffic, so many things change.
And this is part of why we’re making startup so much faster, because otherwise, if you have to wait half an hour every time you want to see whether your fix works, it’s annoying.
When you get to the stage where we are with Agave, you can’t go deep down rabbit holes for just one component. It’s perhaps intellectually interesting work, but it’s completely useless without taking the whole system into consideration. It doesn’t actually advance anything.
So, when looking at the entire system, why is rewriting Turbine to use XDP such a big deal?
I was working on networking right before Solana. I was at a startup that does deep packet inspection. So, they essentially intercept all the traffic that comes into a NIC, analyze it in real-time to stop malicious flows, and then reinject it into the kernel. We essentially had written a whole TCP and UDP stack in user space using Rust, Tokio, and XDP, of course.
When I joined Anza, it was obvious that at some point we would need to use XDP. When Firedancer started, they were like, “We’re going to start with an XDP implementation of Turbine,” and I told them it was stupid. It made no sense. It takes a lot more time to do that because XDP is objectively a terrible API. So, you want to avoid using it for as long as possible until the wheels literally come off.
Then you’re like, oh [redacted], now I have to use XDP, which is what happened to us. We had been working on removing all other bottlenecks in the pipeline until the day we started load tests, and we saw Turbine completely stop working.
So we were like, okay, this is clearly not viable anymore. And I actually tried really, really hard not to use XDP because I had used it in the past and knew how horrible it is. I tried to build an io_uring-based implementation of Turbine. And then I found some bugs in io_uring. I started fixing those bugs. I still have some kernel patches that I want to send over, but at some point, I realized, okay, I cannot tell all of our validator operators to use my custom kernel to run Solana.
I’ll have to use XDP, and we did. And now it works.
The answer is: you find the next bottleneck, and you fix it. And you keep fixing all the bottlenecks you find. You can think about tomorrow’s problems tomorrow. That’s my motto. You could worry only about tomorrow, but then today would suck. Today sucks on Solana. Blocks are too small, Turbine adds too much latency, and the scheduler still has issues. We have to fix things today. Otherwise, there’s no tomorrow where we go that fast.
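As a sketch of what “using XDP” means in practice: an XDP program is a small piece of eBPF code the kernel runs for every packet arriving on the NIC, before the regular network stack sees it, so a Turbine receiver can filter or redirect shreds with far less kernel overhead. The skeleton below follows the shape of a typical Rust eBPF program; the crate, macro, and function names reflect my understanding of the aya toolchain and are assumptions, not the actual Agave implementation.

```rust
// Skeleton of an XDP program in Rust (assumed aya-style toolchain). The kernel
// invokes this for each incoming packet; a real Turbine filter would parse the
// headers here and redirect shreds to an AF_XDP socket in user space, while
// this sketch simply passes every packet on to the regular stack.
#![no_std]
#![no_main]

use aya_ebpf::{bindings::xdp_action, macros::xdp, programs::XdpContext};

#[xdp]
pub fn turbine_rx(ctx: XdpContext) -> u32 {
    let _ = ctx; // headers would be parsed from the packet data in ctx
    xdp_action::XDP_PASS
}

// eBPF targets have no std, so a panic handler must be provided.
#[panic_handler]
fn panic(_info: &core::panic::PanicInfo) -> ! {
    loop {}
}
```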
How do you guard against performance regressions? How do you coordinate with Firedancer?
Perf regressions are a struggle. I personally profile something in Agave every day, like at least a few times every day, and we often have regressions because writing performant code is a job. You need to know how to write performant code. If you write Rust code, it’s going to be, on average, more performant than if you write some Node.js code or Python code or whatever. But if you, for example, work on the AccountsDB, where you work with collections with millions of items, you can’t just write code. It’s hard to build algorithms and work on large datasets in an efficient way.
So, we do have occasional regressions. Until like a month ago, I was just yelling at everyone, pretty much, haha. Like Brooks with AccountsDB—I think at some point he hated me. We have a great relationship, but until a month ago, essentially half of our interactions were me yelling at him because something was slower in the AccountsDB.
For the protocol, working with Firedancer has improved this because I feel like many parts of the protocol ended up growing organically in response to different challenges. Protocol development started with an idea; they put it into production, and, like most ideas, it didn’t work the first time. Then they started adding things on top. Many of the things added on top were really bad ideas performance-wise.
For example, in Gossip, there was this thing called epoch slots, where the cluster would essentially broadcast to everyone which validators have seen which slots. And just casually, six months ago, while profiling something else, I noticed that this epoch slots thing was taking more CPU time than actually executing transactions. And bandwidth-wise, it was taking four times the amount of bandwidth taken by Turbine. It’s just a random patch that was built on top of the protocol at some point in time to mitigate an issue.
So, this is not happening anymore. And it’s in part thanks to Firedancer. Now, when someone makes a proposal, we have to work with Firedancer. They’re obviously making another client, haha, so they have to budget the work. They have to decide how long implementing a proposal would take and what its priority is. And so they push back, for better or worse, on a lot, or almost all, of the changes we propose. And they’re very good at pushing back on really bad changes.
Once Turbine ships with XDP, if you had a month with no meetings and no conflicts, just complete freedom to work on whatever, what would be the first part of Agave that you would optimize or re-architect?
I really want—I literally have dreams about this—I’ve been wanting to rewrite the AccountsDB for like two years. I just know that if I start doing that, it will literally take a month or two of my life. And at this time, it’s not the best way for me to spend my time. But I will do it. I keep trying to bully Brooks into doing it, but if he doesn’t do it, I will do it at some point.
Future Developments
Looking towards the future with planned features like Async Execution or Multiple Concurrent Leaders, what would be the biggest headache for the performance team?
At a gut level, I hate async. Like, the model right now is very simple. You get some transactions, you replay them really fast, and then you vote. It’s super easy conceptually. Async makes the design harder, but it also makes the actual experience of using the chain so much better.
I also hate the design for Multiple Concurrent Leaders, haha. I understand that, especially if you want to do high-speed trading, you need multiple leaders. Like, there’s no alternative. But personally, that’s something that will not happen for at least 12 months. And so I don’t want to get too distracted.
It’s important that in one year we have Alpenglow, but it’s also important that in the next month we do a hundred million CUs. Like, we need to focus on making what we have now fast because Alpenglow is new code, and Multiple Concurrent Leaders is new code. There are unknown unknowns. Say that, for whatever reason, Alpenglow ends up behind schedule like Firedancer. Then what? Do we keep having the shitty, slow chain we have now? No, we have to focus on going fast today.
Advice on Performance Engineering
What resources would you recommend to somebody with Rust experience who wants to get into performance coding and profiling?
First of all, I recommend using a good profiler, which doesn’t exist today, haha. But I’m going to release mine soon, hopefully. And then I actually think the best way to learn anything is to do it on something that you actually care about.
So my advice to people wanting to learn how to do performance work is to find some of the software that you use every day and that you love, profile it, and make it faster, because a lot of software is very slow. Like a lot of fast software can go so much faster. Computers are really fast. And because they’re really fast, it’s very easy to do something slow and not even know about it.
The way I see a lot of people get hooked is to find something that you use and profile it—make it faster, send pull requests, and I guarantee they will get accepted. You will get so hooked.
The kernel is just like any other dependency. Like, when you’re working on something and you’re using a library, chances are that you have to look into the library at some point if something doesn’t work or something is slow or whatever. The kernel is just another library. So, read kernel code. Kernel code is some of the simplest code I’ve ever seen. If you look at the scheduler in the kernel, conceptually, it’s simpler than the scheduler we have on Solana.
But just go and read the Linux code. It’s C, it’s not ideal, and anything that touches hardware is usually cursed, haha, but most of Solana does not work directly with the hardware. If you find generic stuff that you use every day, like a syscall, Tokio, or file system code, that’s very easy. Just go read it. And if you spend a week reading it, you will learn it like any other code. And you will feel like a fucking genius. You’ll be like, ahh, now I can do kernel work, you know?
What’s the best way to start contributing today?
My preference is to get on Discord and get into the development channel for the Solana Tech Discord. For example, there is a guy who has been working on some networking stuff, who, the other day, just started a conversation about the TPU code. And to be honest, he has a better understanding of how that code works than most people at Anza. Like, you can contribute. And if you’re as good as that guy and you send me patches, I will merge the patches.
We don’t do a lot of internal, non-open development. So, if you don’t see a pull request for a bit of code that you think is slow, that you’re knowledgeable about, and that you want to fix, tell me on Discord. We’ll create an issue, assign it to you, and you fix it.
I want people to send me patches. I do want to grow our community of people sending patches.
Rapid Fire Questions
What’s the hardest bug that you’ve squashed this year?
A miscompilation with some floating point code that was blocking the 2.2 release just a couple of months ago. It was not the hardest, but it was the most tedious thing because I had to spend days just reading assembly code.
What music do you have on repeat when you’re staring at flame graphs?
Usually house or minimal techno. Stephan Bodzin is usually on a loop.
What’s your favorite Linux distro?
Debian, for sure. It’s the only one that’s not super annoying, haha.
What’s the best optimization that’s coming with Alpenglow?
That we’re not going to execute votes. Vote transactions are [redacted]. And it’s so good that the whole voting thing is not transactions anymore.
What are your thoughts on ZK?
It’s a great technology, but I feel like it’s still very much in the research phase for scaling blockchains. And so I’m not super interested.
July 2026, a year from now, what’s your slot time prediction?
Personally, hopefully sooner than that, I want to have 200 millisecond slots. I keep telling Toly that he needs to meme it into existence. So, hopefully, by then, it will have happened. I think we can do it even today. I think that would be the minimum in the sense that anything more than that would be a failure, but we can go lower than that.
When is Agave going to hit that fated 1 million TPS?
Haha, I will not be doing a rapid-fire answer for that one. I keep asking people, “Where are the one million transactions going to come from?”
We’re going to do a million TPS when people have a million TPS to send, but I don’t think it’s going to be anytime soon, sadly. I did say that if Agave doesn’t do a million TPS by October, I’ll quit my job. So I might have to bullshit a demo, haha.
Conclusion
Alessandro Decina represents the beating heart of Solana’s performance culture—a relentless pursuit of speed grounded in pragmatic engineering rather than theoretical perfection. His performance team punches well above its weight class, finding and fixing the “stupid bugs” that collectively compound into systemic slowdowns.
In a world where many teams get lost in grand architectural visions, Anza’s performance team stays laser-focused on the bottlenecks directly in front of them. Profile, identify, fix, repeat. It’s unglamorous work that yields glamorous results: a blockchain that actually scales with hardware, rather than around it.
The conversation above reveals a fundamental truth about building high-performance systems: speed isn’t just about clever algorithms or cutting-edge hardware. It’s about a cultural commitment to never accepting “good enough” when “great” is technically achievable. For Solana, that means 200 millisecond slots, Async Execution, Multiple Concurrent Leaders, and sustaining over a million TPS aren’t just technical milestones—they’re inevitabilities.