COVID-19!!!
the ops incident with no postmortem, no resolution, and no agreed-upon end time
Haha 😂 pic.twitter.com/oWf0yB1mze
— You Have One Job, Stay Indoors (@_youhadonejob1) March 20, 2020
Most Canadian solution ever. News outlet in Canada is taping their microphones to hockey sticks to maintain social distance...
— You Have One Job, Stay Indoors (@_youhadonejob1) March 19, 2020
The Incident With No Postmortem
COVID-19 was the largest unplanned infrastructure event in modern history, and nobody wrote a proper postmortem. There were reports. There were commissions. There were after-action reviews with the word "unprecedented" in the executive summary. None of them had the five-whys structure, the contributing factors section, the action items with owners and due dates, the part where you write down what you'd do differently.
The incident also never formally closed. There was no "all-clear, returning to normal operations." Different organizations called it at different times based on different criteria. Some people went back to the office in 2021. Some in 2022. Some never did, because the office they'd been going to was no longer there. The incident status is, depending on who you ask, somewhere between "resolved" and "ongoing degraded state."
Zoom Became Critical Infrastructure Without Anyone Approving It
In March 2020 the global workforce discovered, more or less simultaneously, that its ability to function depended on a video calling application that IT had not selected, evaluated, or sanctioned. Zoom was just there. Someone on a sales team had started using it in 2018 because the screensharing worked. Then their team. Then the teams they met with. By the time offices closed, Zoom was load-bearing.
The SSO integration happened while employees were already connecting with personal accounts. The security hardening guidance was distributed to users who had been on the platform for six months and had existing habits. The enterprise contract was negotiated after the usage was already in production. This is shadow IT at civilizational scale.
The broader Zoom stack — and by "Zoom stack" I mean the full constellation of Slack, Google Meet, Teams, Webex, and whatever the product team was using that IT didn't know about — became the connective tissue of organizations overnight. It worked. The surprising thing was not that it mostly worked. The surprising thing was that anyone was surprised it mostly worked, given that the people who had been building distributed systems for a decade had been saying for years that remote collaboration infrastructure was viable.
We're All WFH Now (Some of Us)
The phrase "we're all WFH now" had a shelf life of about eighteen months before it fractured into "some of us are WFH now" and then "there's a policy about this." The permanent-remote and return-to-office conversation consumed enormous organizational energy from 2021 onward. The ops and infrastructure people, who had generally been able to work from anywhere and knew it, watched with mild curiosity as the rest of the company worked through what they'd already worked through in 2015.
For ops specifically, the calculus was simple: response time goes down when you're already at your desk. The question of whether to be in an office was answered by the pager. The pager doesn't care where you are. It cares how fast you answer.
The hockey stick microphone is a reasonable solution to a novel problem. A lot of what happened in March 2020 was that.— the news anchor, improvising, on camera, in real time