
Ten reflections on the Microsoft CrowdStrike incident, July 2024

Guest blog by Mark Brett, Socitm Honorary Life Member / Pro bono Cyber security and resilience advisor

The Microsoft CrowdStrike incident (MSCSI) is an opportunity to think through ten key points, summarised below.

Whilst this wasn’t a “Cyber Attack”, or even a “Cyber Incident”, the outcomes and effects were of the same magnitude.

Conflation

In recent times, local government authorities have needed to respond to a conflation of cyber and information security, assurance and governance. Now, the lines between ICT and cyber security have further converged with the emergence of Digital, Data and Technology (DDaT).

The continued transformation of local authorities requires an understanding of the emergence of new terms and their meanings. The question here is: are all the supporting processes and procedures being updated in line with these new approaches?

Information and security classifications

Ten years ago, everything was moving to the cloud, going agile and transforming the way we consumed systems and services. That led to a change in the Government Protective Marking Scheme (Information Asset Classification) with the introduction of the streamlined “Official” classification.

The levels of classification were reduced from six (Unclassified, Protect, Restricted, Confidential, Secret and Top Secret) to four (Unclassified, Official, Secret and Top Secret).

The threat has increased, while the granularity of protections and threat profiles has reduced. Losing the Restricted and Confidential markings has caused a lot of issues, especially in policing.

Cloud computing is now pervasive, as is working remotely and more recently the emergence of Artificial Intelligence (AI).

One of the key business objectives within DDaT remains the protection of information. The Data Protection Act hasn’t gone away. Criminals, whether acting on their own or as a proxy for a malign foreign state, are causing harm through their continuous efforts to disrupt systems and services through ransomware and other attacks. This in turn has shortened attack times from months to hours, necessitating a rapid and automated response.

Rapid and automated responses

We’ve put technologies in place to act as our advocate, taking rules-based decisions to respond automatically, mitigate much of the harm caused and stop these incidents spreading through our networks.
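
To make that concrete, here is a minimal, purely illustrative sketch of the kind of rules-based decision such a tool might take. The alert fields and the isolate_host() action are hypothetical placeholders, not any particular vendor’s product or API.

```python
# Illustrative sketch only: a simplified rules-based automated response.
# The Alert structure and isolate_host() are hypothetical stand-ins.
from dataclasses import dataclass

@dataclass
class Alert:
    host: str
    category: str   # e.g. "ransomware", "phishing", "scanning"
    severity: int   # 1 (low) to 5 (critical)

def isolate_host(host: str) -> None:
    """Stand-in for whatever network isolation mechanism is in place."""
    print(f"Isolating {host} from the network")

def respond(alert: Alert) -> str:
    # Rule: contain high-severity ransomware activity immediately,
    # without waiting for a human decision.
    if alert.category == "ransomware" and alert.severity >= 4:
        isolate_host(alert.host)
        return "isolated"
    # Everything else is queued for an analyst to review.
    return "queued for review"

print(respond(Alert(host="FIN-LAPTOP-042", category="ransomware", severity=5)))
```

The point is not the code itself but the delegation: once a rule like this is in place, the decision to act is taken on our behalf, at machine speed.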

This has meant we’ve learned to trust the technologies that defend us.

As we move to the next stage of software defined ‘Zero Trust’ networks, the trust in our networks decreases to zero while our trust in the protective systems we put in place rises exponentially.

This brings us back to the MSCSI in July 2024.

What went wrong?

The MSCSI was a failed automated patch deployment.

Yet, we have a regime in place with the PSN Code of Connection (PSN CoCo), Cyber Essentials (CE) and the emergent Cyber Assessment Framework (CAF) telling us to patch immediately or as quickly as possible.

  • Thirty years ago, we would have thoroughly tested any update patch and set a “Checkpoint” to enable regression. We would have tested the failure path (regression testing).
  • Twenty years ago, we would have extensively tested patches in a test environment.
  • Ten years ago, we would have shortened the process to testing in a staging environment.
  • Today, in the automated world of Security Orchestration, Automation and Response (SOAR) supporting “Continuous Integration”, we have moved to deploying patches automatically and without a “Human in the loop” (see the sketch below).
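
For contrast, the sketch below is a purely illustrative staged rollout: patch a small canary group first, check its health, and keep a human in the loop before the wider estate is touched. The deploy_patch() and healthy() helpers are hypothetical placeholders, not CrowdStrike or Microsoft tooling.

```python
# Illustrative sketch: a staged patch rollout with a human approval gate,
# in contrast to fully automated deployment. deploy_patch() and healthy()
# are hypothetical placeholders, not any real vendor's tooling.
import random

def deploy_patch(host: str) -> None:
    print(f"Patch deployed to {host}")

def healthy(host: str) -> bool:
    # Stand-in health check; in reality this would verify boot, services, etc.
    return random.random() > 0.01

def staged_rollout(hosts: list[str], canary_fraction: float = 0.02) -> None:
    canary_count = max(1, int(len(hosts) * canary_fraction))
    canaries, remainder = hosts[:canary_count], hosts[canary_count:]

    for host in canaries:
        deploy_patch(host)

    if not all(healthy(host) for host in canaries):
        print("Canary group unhealthy: rollout halted, roll back to checkpoint.")
        return

    # Human in the loop: nothing reaches the wider estate without sign-off.
    answer = input(f"Canaries healthy. Deploy to remaining {len(remainder)} hosts? [y/N] ")
    if answer.lower() != "y":
        print("Rollout paused pending review.")
        return

    for host in remainder:
        deploy_patch(host)

hosts = [f"host-{i:04d}" for i in range(500)]
staged_rollout(hosts)
```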

Imagine the outcry if some AI tool had done this for us. It will come.

The point here is that we need to think about what we are doing and the ever-increasing speed at which we are travelling.

This growing conflation of forces requires our attention and vigilance.

My ten key points for consideration

  1. Assurance
    Where is the assurance process that considers the impact of a failed patch being deployed at scale?
  2. Checkpoint
    As part of resilience planning, a checkpoint image should be planned for before automated rollout, to enable regression.
  3. Red Teaming
    Red teaming isn’t routinely carried out before new systems and services are deployed. It would help us understand the weaknesses and attack vectors (red team), the defensive errors, such as failed patching with the same outcome of service loss (blue team defence), and the combination of the two (purple teaming).
  4. Risk management and risk analysis
    Do we scrutinise our suppliers enough? What happens if something goes wrong? What remediation will you have in place? How will you protect users from service failures? (This is why there is such a price difference between 99% and 99.999% availability; see the short calculation after this list.)
  5. Resilience planning
    Do we even plan and exercise for a failure of service availability? The MSCSI was just a patch deployment gone wrong.
  6. Reliability
    Do we understand the scope of our systems and services? I buy MS365 or G-Suite and so on. These services are themselves made up of many interconnected components.
  7. Choice
    Do we even have a choice anymore with Software as a Service (SaaS)? Is it just take it or leave it?
  8. Automation
    This is before automation and AI take over even more decisions through the introduction of SOAR. We will need a holistic approach to Zero Trust, cloud integration and SOAR.
  9. LACES
    We need to understand the Local Authority Cyber Eco-System (LACES©) framework, considering governance, assurance, processes, resilience, data sharing and knowledge transfer.
  10. Break glass
    We need to plan, quantify trigger points, risk assess and ensure that we have ‘break glass’ policies in place. The assurance process should identify mitigations through playbooks, which in turn should be exercised, even if only through a Teams call that is brainstormed and documented.
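
On point 4, the gap between availability tiers is easier to appreciate once the arithmetic is written out. A short illustrative calculation converting an availability percentage into permitted downtime per year:

```python
# Illustrative: convert an availability percentage into allowed downtime per year.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes

def downtime_minutes_per_year(availability_percent: float) -> float:
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)

for availability in (99.0, 99.9, 99.99, 99.999):
    minutes = downtime_minutes_per_year(availability)
    print(f"{availability}% availability allows about {minutes:,.0f} minutes "
          f"({minutes / 60:.1f} hours) of downtime per year")
```

Roughly 3.7 days of downtime a year is permissible at 99%, against about five minutes a year at 99.999%; that is what the price difference buys.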

Conclusion

The MSCSI was a failed patch situation that highlights the interconnectivity of cloud services and the speed at which things can fail with global consequences. The greater risk is that adversaries will be analysing and learning from what happened, and we must do the same.

Some suggested further reading and resources

Join your local WARP and subscribe to updates and events from the Cyber Technical Advisory Group

National Cyber Security Centre
Statement on major IT outage
“If you have knowledge, let others light their candles in it.” Why sharing lessons learned from cyber security incidents and ‘near misses’ will help everyone to improve

Socitm content in the Resource Hub
Blog: Not the Yule log we were expecting
Briefing: Cyber security skills in the public sector
Connected Places: Community Resilience [members, please log in]
Public Sector Digital Trends 2024: Technology trends