Hi, I'm Chris 👋

I am a software developer / freelancer / contractor based in Australia.
I specialise in building well-structured app platforms for companies to build their long-term app strategies upon.
I have a broad array of experience from my work at Google, News Corp, Fox Sports, NineMSN, FetchTV, Coles, Woolworths, Trust Bank, and Westpac.
Please see my portfolio, then get in touch if I can be of service!

See my Portfolio »

Angry Cloud from Commander Keen 4

A few weeks ago, there was a huge Cloudflare outage that knocked out half the internet for a while. As someone who has written a fair bit of Rust in my spare time (23KLOC according to cloc over the last few years), I couldn’t resist the urge to add some constructive thoughts to the discussion around the Rust code that was identified for the outage.

And I’m not going full Rust-Evangelism-Strike-Force here, as my pro-Swift conclusion will attest. Basically I’d just like to take this outage as an opportunity to recommend a couple tricks for writing safer Rust code.

The culprit

So, here’s the culprit according to Cloudflare’s postmortem:

pub fn fetch_features(
        &mut self,
        input: &dyn BotsInput,
        features: &mut Features,
) -> Result<(), (ErrorFlags, i32)> {
    features.checksum &= 0xffff_ffff_0000_0000;
    features.checksum |= u64::from(self.config.checksum);
    let (feature_values, _) = features
        .append_with_names(&self.config.feature_names)
        .unwrap();
    ...
}

Apparently it processes new configuration, and crashed at the unwrap because configuration with too many features was passed in.

Code Review

Keep in mind that I’m not seeing the greater context of this function, so the following may be affected by that, but here are my thoughts re the above code:

  • It returns a Result, with nothing for success case, and a combo of ErrorFlags and i32 for the failure case.
  • The presence of the &dyn for input indicates this uses dynamic dispatch, which means this isn’t intended as high-performance code. Which makes sense if this is just for loading configuration. Given that, they could have simply used anyhow’s all-purpose Result to make their lives simpler instead of this complex tuple for the error generic.
  • unwrap() is called. This is the big red flag, and something that should only generally be done in code that you are happy to have panic eg command line utilities, but less so for services. Swift’s equivalent is the force-unwrap operator !. When Swift was new, it was explained that the ! was chosen because it signifies danger, and stands out like a sore thumb in code reviews to encourage thorough examination. Rust’s unwrap isn’t as obvious at review time, and thus can sneak through unnoticed.
  • Since we’re already in a function that returns Result, it would be more idiomatic to use ? after the call to append_with_names, so that this function would hot-potato the error to the caller, instead of panicing.
  • If append_with_names returns an Option not a Result, ok_or(..)? would be a tidy option.

Alternative

Here I’ve changed the fetch_features function to be safer, with a couple options for how to gracefully handle this if append_with_names returns either a Result or an Option (it isn’t clear which it is from Cloudflare’s snippet, so I’ve done both). Note that I’ve also added some boilerplate around all this to keep the fetch_features code as similar as possible, but also commented out some stuff that’s less relevant.

fn main() {
    let mut fetcher = Fetcher::new();
    let mut features = Features::new();
    if let Err(e) = fetcher.fetch_features(&mut features) {
        // ... Gracefully handle the error here without panicing ...
        eprintln!("Error gracefully handled: {:#?}", e);
        return
    }
}

enum FeatureName {
    Foo,
    Bar,
}

struct Fetcher {
    feature_names: Vec<FeatureName>,
}

impl Fetcher {
    fn new() -> Self {
        Fetcher { feature_names: vec![] }
    }
    
    // This is the function Cloudflare said caused the outage:
    fn fetch_features(
        &mut self,
        // input: &dyn BotsInput,
        features: &mut Features,
    ) -> Result<(), (ErrorFlags, i32)> {
        // features.checksum &= 0xffff_ffff_0000_0000;
        // features.checksum |= u64::from(self.config.checksum);
        
        // If append_with_names returns a Result,
        // the question mark operator is safer than unwrap:
        let (feature_values, _) = features
            .append_with_names_result(&self.feature_names)?;
        
        // If append_with_names returns Option,
        // ok_or converts to a result, which forces you to be
        // explicit about what error is relevant,
        // which is then safely unwrapped using the question mark operator.
        let (feature_values, _) = features
            .append_with_names_option(&self.feature_names)
            .ok_or((ErrorFlags::AppendWithNamesFailed, -1))?;
        
        Ok(())
    }
}

#[derive(Debug)]
enum ErrorFlags {
    AppendWithNamesFailed,
    TooManyFeatures,
}

struct Features {
}

impl Features {
    fn new() -> Self {
        Features {}
    }
    
    // This is for if it returns a Result:
    fn append_with_names_result(
        &mut self,
        names: &[FeatureName],
    ) -> Result<(i32, i32), (ErrorFlags, i32)> {
        if names.len() > 200 { // Config is too big!
            Err((ErrorFlags::TooManyFeatures, -1))
        } else {
            Ok((42, 42))
        }
    }

    // This is for if it returns an Option:
    fn append_with_names_option(
        &mut self,
        names: &[FeatureName],
    ) -> Option<(i32, i32)> {
        if names.len() > 200 { // Config is too big!
            None
        } else {
            Some((42, 42))
        }
    }
}

Feel free to paste this into the Rust Playground and see if you have better suggestions :)

Suggestions

  • Instead of unwrap, the ? operator is a great option, particularly if you are already in a function that returns a Result, so please take advantage of such a situation.
  • ok_or is a great way to safely unwrap Options inside a Result function. If forces you to think about ‘what error should I return if there’s no value here?’.
  • Consider Swift! The exclamation point operator is a great way of drawing attention to danger in a code review, which is a fantastic piece of language ergonomics.

Summary

If anyone from Cloudflare is reading this, I hope this critique does not come across as unkind, much of my code is not amazingly bulletproof either! And kudos to Cloudflare for allowing us to see some of their code in the postmortem :)

Thanks for reading, I pinky promise this was written by a human, not AI, hope you found this useful, at least a tiny bit, God bless!


You can read more of my blog here in my blog archive.

Chris Hulbert

(Comp Sci, Hons - UTS)

Software Developer (Freelancer / Contractor) in Australia.

I have worked at places such as Google, Cochlear, Assembly Payments, News Corp, Fox Sports, NineMSN, FetchTV, Coles, Woolworths, Trust Bank, and Westpac, among others. If you're looking for help developing an iOS app, drop me a line!

Get in touch:
[email protected]
github.com/chrishulbert
linkedin



 Subscribe via RSS