I’m sure by now you’ve already heard about the CrowdStrike Windows Outage, said to be the largest IT outage in history. Perhaps you know about it because you’re surrounded by the blue glow of computers displaying the infamous “blue screen of death,” or maybe you’re stuck at an airport because your airline relies on Windows systems that use CrowdStrike cybersecurity software, or maybe you’re contemplating shoplifting a bottle of wine and box of Chex Mix because the self-service checkout systems have blue-screened at your local grocery store.
A shocking number of very diverse systems around the world have ground to a halt because of this Windows/CrowdStrike combined failure, causing a lot of trouble for a lot of people just trying to get through their daily lives. But the real tragedy here is that it doesn’t seem like CrowdStrike CEO George Kurtz can go racing today.
Yes, racing! Look, we think it’s great that a rich CEO decides to spend his time and money racing, and we mean that genuinely! In this particular case, the racing is the SRO GT World Challenge, currently running at Virginia International Raceway (VIR). Of course, all of this global disaster business meant that George didn’t really get a chance to go out and race, but we do know he was at VIR, hoping to race.
In fact, we have sources that have told us that this interview was actually conducted while Kurtz was at VIR:
EXCLUSIVE: CrowdStrike founder and CEO @George_Kurtz speaks on TODAY about the major computer outages worldwide that started earlier today: “We’re deeply sorry for the impact that we’ve caused to customers, to travelers, to anyone affected by this.” pic.twitter.com/fWz6KhgrcZ
— TODAY (@TODAYshow) July 19, 2024
Man, that dude must be having a deeply, profoundly shitty day. The global outage is bad enough, but being excited for a weekend of racing and then having to work instead? The worst.
We even have some evidence that Kurtz was not able to run in the practice session, either:
See that? Kurtz’s car didn’t get out onto the track. He did manage to get out on the track yesterday, when the world was still pure and good and free of catastrophic computer failures. You can see the CrowdStrike cars out on the track yesterday in their Instagram post:
…and we can tell by the windshield display that Kurtz himself got some track time in this Mercedes-Benz AMG GT:
On top of all this, we have a source from the paddock who goes by “Buzz” who tells us that CrowdStrike is very likely withdrawing all of their cars. So, it appears that there will be no racing for anyone at CrowdStrike tomorrow. And you thought you had it bad just because you’re stuck in Dayton or trapped in a walk-in freezer or something like that! They can’t race!
The scale of this issue is truly incredible; I’m just thankful the Autopian servers all run on a cluster of 6502-based Commodore PETs running a proprietary Lisp-based operating system that is effectively immune from security threats, thanks to the use of an 11-bit byte and all documentation being written in Aramaic.
Supposedly a fix has been developed and is in the process of being deployed, and this whole nightmare will be behind us soon enough. But that will not bring these lost days of racing back.
Nothing will.
DDoS attacks didn’t stop the head of iRacing from going on vacation. JK, to be honest, he was in the middle of PA with his family and he had good guys in the office in Massachusetts taking care of the problem.
Hopefully this will be the death knell of C++ (null pointer) in critical infrastructure.
Then again, my long-retired Dad still gets inquiries about COBOL jobs. Some habits die hard, I guess.
As someone still working this weekend to fix this issue within my company, I’m just saddened to hear of his inability to race. Here I am in the middle of covid (still have fever and everything) fixing machines over and over and over again. I have nothing but sympathy for his loss.
Did I lay it on thick enough? Anyone believe me?
I, for one, fully believe in your heartfelt sympathy for this man’s unprecedented plight.
WHEW! I was worried…
An 11 bit byte is crazy talk. But a 10 bit byte? That’s just a C/10 Arpanet IMP.
Apparently the SATA protocol also uses 10-bit bytes (encoded from the original 8 bits). DAT and DCC audio recording also apply this 8b/10b encoding to recorded data. I’ve known DAT and DCC for decades, and even used DAT at some point, and I had no idea. I love the stuff that I learn from silly jokes on this website.
I’ve met George a few times, and talked cars and racing with him. He’s a nice-enough person and a genuine car lover. However, I do recall when he was at McAfee they were prone to this type of failure. I don’t blame him personally, but perhaps the valuation has inflated his wealth to the point of a bit of detachment?
A Lisp OS is on brand for this site: many, many cars involved.
I’m impressed Jason knows what Lisp is!
What is it with these tech CEOs and their hairdos? I know that is so superficial of me, but it’s hysterical.
I need to stop judging people….
Edit: Woah! That said, he is almost 60. Looks good for a 60 year old, but that makes his haircut even more hysterical to me.
This was my first thought! Never trust a CEO with a do like that.
Apparently the outage has taken out a couple of Formula One teams on their race weekend. Mercedes AMG was knocked offline.
https://www.si.com/fannation/racing/f1briefings/news/f1-news-mercedes-speaks-out-on-impact-of-chaotic-crowdstrike-global-outage-01j35n7ndpya
https://www.mercedesamgf1.com/news/mercedes-announces-global-partnership-with-crowdstrike
“Taken out” and “knocked offline” is rather overstating it. There were certainly teams affected (including Mercedes, rather obviously), but by the time the Free Practice sessions today started they were up and running: https://www.autosport.com/f1/news/mercedes-back-where-we-need-to-be-after-crowdstrike-tech-glitch-drama/10636225/
On the internet, all knobs go to 11.
Out of all the names to put on the side of a car you intend to go racing with, CrowdStrike is…certainly one of them. It would be absolutely perfect if they were running Mustangs, but they’re a bunch of rich assholes in German cars. Alas.
Maybe it’s a Le Mans 1955 reference.
Too soon.
Would be fitting for WRC.
Reminds me of Knight Capital Group’s boo-boo in 2012. They made an update that worked in test mode but screwed up the NYSE. They don’t exist anymore.
Yeah, but at least KCG ate the whole financial cost of their boo-boo (which technically wasn’t a boo-boo, just very bad order execution. While it did cause quite a bit of market volatility, the $460 million they lost ended up in the pockets of people who hadn’t screwed the pooch and reacted fast enough to fleece the poor schmucks at KCG).
Somehow I doubt Crowdstrike will fully compensate the victims of this cluster-f; short of completely liquidating their entire company, they simply don’t have the money to do it.
Crowdstrike is gonna have to pay, one of 4 ways:
1) SLA credits. Most of the major SaaS players will “guarantee” “uptime” up to 99.99% or 99.95% of the time (though “uptime” is kinda loose – like, Crowdstrike’s platform was still functioning during this outage, even though they bricked their customers’ endpoints). So a bunch of credits (basically free service) will get issued.
2) Contractual limits of liability. This is based on ability to resolve P0 or P1 cases in a timely fashion. IOW, if you’re GE and you call Crowdstrike with a “Your shit’s broken my machines,” Crowdstrike is likely obligated to provide a fix within 8 or 12 or 24 hours. That’s going to be impossible for some of these customers. Again, open to interpretation (“provide a fix” in this case is “physically walk to the server, with BitLocker key, and delete a file in safe mode”), but Crowdstrike will pay some money here. For big customers, it’s not uncommon to see limits of liability in the 6 or 7 figures. For most smaller customers, the limits of liability are “what you’ve paid us already” (which is still bad, albeit not as bad as having to pay a customer more than they’ve already paid you).
3) Cyberinsurance subrogation. If you’ve got business continuity insurance that you’re going to invoke, the cyberinsurance entity will likely go after (ironically) Crowdstrike for loss of business.
4) Lawsuits. If you’re, say, Gap/Athleta, leggings and crop tops don’t ship and, realistically, no end-user contracts are breached. But if you’re, say, Delta, planes don’t fly, and that puts you possibly outside the bounds of force majeure where you’re eating compensation for your passengers under contracts of carriage. Delta isn’t going to want to pay, so there will undoubtedly be lawsuits filed by Delta seeking damages equivalent to what they’re paying to passengers in the event Crowdstrike doesn’t pony up.
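For the curious, the “delete a file in safe mode” workaround mentioned in item 2 above boils down to removing the bad channel file from the CrowdStrike driver directory. Here’s a rough Python sketch of that cleanup step — the directory path and the C-00000291*.sys filename pattern follow CrowdStrike’s public remediation guidance, but treat this as purely illustrative, since the real procedure has to be done by hand in safe mode on a machine that won’t boot:

```python
import glob
import os

def remove_bad_channel_files(driver_dir):
    """Delete CrowdStrike channel files matching the bad-update pattern.

    Illustrative only: the pattern below follows CrowdStrike's public
    remediation guidance, but in practice this step is performed by hand
    in safe mode (with the BitLocker recovery key handy).
    """
    removed = []
    for path in glob.glob(os.path.join(driver_dir, "C-00000291*.sys")):
        os.remove(path)
        removed.append(path)
    return removed

# On an actual affected host (requires safe mode + BitLocker key):
# remove_bad_channel_files(r"C:\Windows\System32\drivers\CrowdStrike")
```

Which is exactly why “provide a fix” was so painful here: the fix itself is trivial, but someone has to physically reach each bricked endpoint to apply it.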
I’ll wager Crowdstrike ends up in bankruptcy on account of this. This is an unfathomable QA lapse that speaks to deeply broken internal processes.
Exactly correct
It’s obvious that they don’t test updates before rolling them out. I don’t feel sorry for them. Making the end user a QA tester doesn’t usually backfire this badly but it needs to stop.
Usually, for a big entity like this, the “best practice” (and yes, I’m aware of “Agile” and the cult around it, but bear with me) is going to be a staged rollout: automated testing, an insider ring, then telemetry review before going wide.
None of this is very hard, mind you – I was sole QA for a product installed on roughly 2.5MM endpoints circa 2016, and we aimed for a weekly update cadence. Get the release candidate on Monday, run it through automation Tuesday/Wednesday, release to insiders Thursday while we do our in-house testing, then look at the internal and MS telemetry before going wide on Friday. Look at the behavior of stuff on Friday throughout the day, then call it good. We did the same thing as Crowdstrike (intercept driver), and we never bricked machines like this. Any BSOD with our software required human review, and while there were often dumb-shit issues (no free space, customer had too little memory, Microsoft released an update that broke one of our dependencies, etc.) they always in turn generated KB articles so customers could see what was wrong.
That’s pretty much how my team does Agile (smartly, I think)…it’s a constant, positive tension between the Agile purists and me. Sprints of requirements/development/dry run testing/customer demos followed by a sprint of hardening and then formal, QA-witnessed test. It’s dramatically reduced the number of issues we find in formal test.
More likely some schlub was able to just put the wrong file in an update queue. I worked for a company that this happened to (the schlub was not me, and the consequences were nothing like as widespread, but it caused chaos in our userbase for a day or two). Still a SERIOUS breakdown in procedures, but I find it highly unlikely that they “skipped QA testing to save a buck,” as seems to be the conspiracy theory du jour. There is a BIG difference between this sort of software and games and other consumer crap, where making the end user the beta tester really is a thing.
In our case it was a file that configured credit-card payments. And ALL the stores had to be re-configured at once because the credit card bank changed something on their side. So the bad file went out overnight, and the next morning nobody could take credit cards in about 500 hardware stores. Oops. And it was a subtle enough problem that it took a while to figure out what was up. We could not just roll back as the bank had made their change so the old file wouldn’t work either. Fun.
The very CEO in this article was formerly CTO of McAfee, and presided over a similar, though smaller, fuckup in 2010 that was linked to cost-cutting.
Well if this update caused Windows computers to blue-screen on restart as has been reported, then they absolutely did not test it as that is a fault that would have been immediately apparent.
I am wondering what a lisp-based OS would be like
It’s easy to imagine:
Windowsss
OSss Xss
iOSss
It’s called emacs!
I see what you did there.
Well-spoken CEO under trying circumstances, but wow. That. Hair!
Clearly he wasn’t trying hard enough – so now he has circumstances.
On the plus side, he does have hair like Astro Boy!
This should teach corporations to not just click auto-update and forget about it. Any large company should manage updates on its own schedule and not trust anyone to push out flawless code.
I mean, I don’t think this is that easy. This is about security. The bad guys won’t wait to attack until you’ve put the walls up, in a manner of speaking.
It is that easy, though. Updates don’t get pushed to our machines at work until one of our teams has a chance to test them and make sure they don’t brick our computers. Someone would have to have pretty intimate knowledge of our system to do more financial damage than us not being able to work for even a day. The financial industry uses an awful lot of proprietary software that is frequently obtuse and a nightmare to use, but that’s by design. I would hazard a guess that many, if not most, can do without immediate security updates. I can’t/won’t provide specifics about where I work, but this issue took down an institution that processes $30 million+ for us a day. And we’re not exactly a huge institution ourselves.
We are also far from the only company who uses them. We’re probably on the smaller end.
My company is FAR smaller, but our IT Director absolutely, positively refuses to use any software for mission-critical systems that can push any sort of update without his explicit approval after in-house testing. Paranoia is a good thing sometimes.
But welcome to Software-as-a-Service! We screw up your systems without you having to lift a finger!
Usually updates are run on a testing environment first. This is the accepted standard for finding potential errors in code so you don’t crash the planet.
Pushing it out into the live environment without doing this is pathologically idiotic.
The lady in the photo on the AP site is wearing a souvenir shirt from Iowa. Who the hell buys a souvenir shirt from Iowa?
RAGBRAI riders. I have several. There are also one or two Caitlin Clark fans out there.
Big thumbs up for both Caitlin Clark and RAGBRAI [the Register’s Annual Great Bicycle Ride Across Iowa, for folks who wonder what the heck that acronym is for].
That picture of the CEO reminds me of the Coneheads.
Might want to CTRL-H CyberStrike to CrowdStrike.
My friend just bought a house and was planning to move in today pending paperwork getting finished, but the outage screwed all that up and now he has to reschedule movers, utilities, appliance delivery, etc. Sucks.
Reading this, on a mac, made my day
Same but Linux
Same but Dell
Same but eMachines
Same but homebrew workstation running… Windows 10.
Gosh, when I retired it was z/os on IBM Mainframe
You’d be surprised how well this site works on ReactOS.
Could just as easily have been Macs affected; CrowdStrike runs on there too.
To be fair, it wouldn’t really have affected nearly as many people, because nobody uses Macs as servers.
Even Apple doesn’t use Macs as servers (iCloud was built on VMware + Supermicro + Linux, for the curious, though I’m not sure who the hardware vendor is these days).
Not as easily. On Windows, their software runs as a kernel-level driver. On Macs, that’s no longer allowed (since 2020). To be fair to Microsoft, they are forced to give that level of access due to a consent agreement with the EU, which thinks it “unfair” that MS should be the only one with kernel-level access.