
A few weeks ago, there was a huge Cloudflare outage that knocked out half the internet for a while. As someone who has written a fair bit of Rust in my spare time (23KLOC according to cloc over the last few years), I couldn’t resist the urge to add some constructive thoughts to the discussion around the Rust code that was identified for the outage.
And I’m not going full Rust-Evangelism-Strike-Force here, as my pro-Swift conclusion will attest. Basically I’d just like to take this outage as an opportunity to recommend a couple tricks for writing safer Rust code.
So, here’s the culprit according to Cloudflare’s postmortem:
pub fn fetch_features(
&mut self,
input: &dyn BotsInput,
features: &mut Features,
) -> Result<(), (ErrorFlags, i32)> {
features.checksum &= 0xffff_ffff_0000_0000;
features.checksum |= u64::from(self.config.checksum);
let (feature_values, _) = features
.append_with_names(&self.config.feature_names)
.unwrap();
...
}
Apparently it processes new configuration, and crashed at the unwrap because configuration with too many features was passed in.
Keep in mind that I’m not seeing the greater context of this function, so the following may be affected by that, but here are my thoughts re the above code:
&dyn for input indicates this uses dynamic dispatch, which means this isn’t intended as high-performance code. Which makes sense if this is just for loading configuration. Given that, they could have simply used anyhow’s all-purpose Result to make their lives simpler instead of this complex tuple for the error generic.unwrap() is called. This is the big red flag, and something that should only generally be done in code that you are happy to have panic eg command line utilities, but less so for services. Swift’s equivalent is the force-unwrap operator !. When Swift was new, it was explained that the ! was chosen because it signifies danger, and stands out like a sore thumb in code reviews to encourage thorough examination. Rust’s unwrap isn’t as obvious at review time, and thus can sneak through unnoticed.? after the call to append_with_names, so that this function would hot-potato the error to the caller, instead of panicing.append_with_names returns an Option not a Result, ok_or(..)? would be a tidy option.Here I’ve changed the fetch_features function to be safer, with a couple options for how to gracefully handle this if append_with_names returns either a Result or an Option (it isn’t clear which it is from Cloudflare’s snippet, so I’ve done both). Note that I’ve also added some boilerplate around all this to keep the fetch_features code as similar as possible, but also commented out some stuff that’s less relevant.
fn main() {
let mut fetcher = Fetcher::new();
let mut features = Features::new();
if let Err(e) = fetcher.fetch_features(&mut features) {
// ... Gracefully handle the error here without panicing ...
eprintln!("Error gracefully handled: {:#?}", e);
return
}
}
enum FeatureName {
Foo,
Bar,
}
struct Fetcher {
feature_names: Vec<FeatureName>,
}
impl Fetcher {
fn new() -> Self {
Fetcher { feature_names: vec![] }
}
// This is the function Cloudflare said caused the outage:
fn fetch_features(
&mut self,
// input: &dyn BotsInput,
features: &mut Features,
) -> Result<(), (ErrorFlags, i32)> {
// features.checksum &= 0xffff_ffff_0000_0000;
// features.checksum |= u64::from(self.config.checksum);
// If append_with_names returns a Result,
// the question mark operator is safer than unwrap:
let (feature_values, _) = features
.append_with_names_result(&self.feature_names)?;
// If append_with_names returns Option,
// ok_or converts to a result, which forces you to be
// explicit about what error is relevant,
// which is then safely unwrapped using the question mark operator.
let (feature_values, _) = features
.append_with_names_option(&self.feature_names)
.ok_or((ErrorFlags::AppendWithNamesFailed, -1))?;
Ok(())
}
}
#[derive(Debug)]
enum ErrorFlags {
AppendWithNamesFailed,
TooManyFeatures,
}
struct Features {
}
impl Features {
fn new() -> Self {
Features {}
}
// This is for if it returns a Result:
fn append_with_names_result(
&mut self,
names: &[FeatureName],
) -> Result<(i32, i32), (ErrorFlags, i32)> {
if names.len() > 200 { // Config is too big!
Err((ErrorFlags::TooManyFeatures, -1))
} else {
Ok((42, 42))
}
}
// This is for if it returns an Option:
fn append_with_names_option(
&mut self,
names: &[FeatureName],
) -> Option<(i32, i32)> {
if names.len() > 200 { // Config is too big!
None
} else {
Some((42, 42))
}
}
}
Feel free to paste this into the Rust Playground and see if you have better suggestions :)
? operator is a great option, particularly if you are already in a function that returns a Result, so please take advantage of such a situation.ok_or is a great way to safely unwrap Options inside a Result function. If forces you to think about ‘what error should I return if there’s no value here?’.If anyone from Cloudflare is reading this, I hope this critique does not come across as unkind, much of my code is not amazingly bulletproof either! And kudos to Cloudflare for allowing us to see some of their code in the postmortem :)
Thanks for reading, I pinky promise this was written by a human, not AI, hope you found this useful, at least a tiny bit, God bless!
Thanks for reading! And if you want to get in touch, I'd love to hear from you: chris.hulbert at gmail.

(Comp Sci, Hons - UTS)
Software Developer (Freelancer / Contractor) in Australia.
I have worked at places such as Google, Cochlear, Assembly Payments, News Corp, Fox Sports, NineMSN, FetchTV, Coles, Woolworths, Trust Bank, and Westpac, among others. If you're looking for help developing an iOS app, drop me a line!
Get in touch:
[email protected]
github.com/chrishulbert
linkedin