Looking Back at Stageless

Bevy's 3rd birthday happened recently, so I'll take this as an opportunity to reflect. In the past year, I've had some significant contributions to Bevy including finishing pipelined rendering, but for this post I'd like to focus on the year-and-a-half journey of the Stageless scheduling rewrite. Much of the heavy lifting for it was done by @alice-i-cecile(RFC) and @maniwani(Implementation), but I'd like to think I contributed significantly by providing ideas, reviewing things, and fixing bugs in the final implementation. From gathering ideas to the implementation PR and numerous follow-up PRs, it was quite the project. This post is mostly about thinking about what went right and what went wrong in the process.

Intro

Officially known as "Scheduling V3", the stageless work was a rewrite of system scheduling that aimed to solve a bunch of problems that had come to be obvious with numerous issues and help threads on Discord. The main issues being:

  1. RunCriteria were hard to compose due to them having 4 states. This prevented using simple boolean logic on them when trying to combine them and potentially forcing users to deal with all 16 combinations of state.
  2. State used looping criteria which could lead to deadlocks if the state was repeatedly changed and could have other surprising behavior related to the looping.
  3. Applying the command buffers to the world required creating a new stage, but stages were hard barriers between systems, so would require moving the systems to the new stage.

Stages were the container for systems and these three problems were intertwined with how stages behaved. Thus "removing" stages and making Bevy "stageless" was how the work was referred to. Rather than completely removing stages, what happened was a reframing and extending of the existing APIs. But the term stuck around. Through many discussions on Discord and GitHub, the major fixes for these problems that we landed on were to:

Timeline

What follows is a rough timeline of the year-and-a-half journey to bring Stageless to fruition.

Pre-RFC Period

RFC Period

Post RFC Merged

Note that this was not a continuous process. In some of the major gaps between dates no work was being done on Stageless. This is just how open source work goes sometimes.

Post Mortem on the RFC process

What Went Right

What Went Wrong

How do RFCs fit into current Bevy?

I think that having seen the work that went into the Stageless RFC, people are now reluctant to open their own RFCs. They might think it'll be easier to just open a PR and hash things out there. In a lot of cases they're probably right, but I suspect that there are a handfull of current PRs that would have been better served going through the RFC process as they get hung up on their more controversial aspects until the author gets too busy to continue or loses interest. I personally avoid reviewing some PRs because I don't understand the problem space well enough to weigh-in on the controversial aspects of those PRs.

Stageless itself should probably have had a more formal Pre-RFC period where what features needed to be included and which features should be cut was decided. A lot of the early reviews of Stageless focused too much on the details and we ended up with a decent amount of churn on the prose for features that were eventually cut.

Cart has been doing something like Pre-RFC's with his recent rewrites like with assets and scenes. However, these never get promoted to full RFCs. Cart gets away with this because of his BDFL status and his PRs always get priority. This isn't a problem, but it ends up devalueing RFCs more than it should for the average contributor. If the more controversial details have already been hashed out in an RFC. Reviewers can then focus on implementation details and code quality, which more reviewers feel comfortable with. This is all assuming that our experts who review RFCs are able to review RFCs faster than PRs. I'm assuming this is true, but it might be a bad assumption.

Overall, we need more features to go through the RFC process and see if we can iron out the flaws that currently exist and give a better path for controversial changes to be implemented. Especially now that we have SME's that can merge RFCs. Then we can better understand how RFCs fit into the Bevy contribution picture.

Post Mortem on the Implementation

What Went Right

What Went Wrong

New Rule: Always encourage breaking up large PRs

My new personal rule for my own bevy work and reviewing other PRs is to "always encourage breaking up large PRs". The most limiting factor with Bevy velocity is reviewer time. The larger the PR the larger contiguous block of time a reviewer has to set aside. Large PRs do not respect the limited time that our reviewers have.

Sometimes breaking things up might not be possible, but I suspect the amount of times that happens is smaller than we think. Ideally we're able to break large work into multiple refactors that keeps the code working. This can be more work for the author, but I think it'll lead to better code and faster velocity. If this isn't an option, we can have the original code and the rewrite to living in the repository at the same time. This is not ideal as fixes may need to be applied to both sets of code. But this sometimes will be the least worst solution if we need multiple PRs to reach feature parity with an existing solution.

Future Scheduling Stuff I may work on

Final Thoughts

I'm really proud to have worked on Stageless. It really feels like it brought a lot of value to Bevy users. For myself, I learned a lot about Bevy and got much better at writing and understanding Rust. I look forward to continuing to contribute to Bevy and participate in this community in the future.