CrowdStrike, a leading cybersecurity firm, experienced a significant outage in July 2024 that affected a wide range of organisations around the world that rely on its services. The event provided critical lessons for organisations, emphasising the importance of robust contingency planning.

Key Lessons:

1. Incident Response Planning

The incident highlighted the importance of well-defined incident response plans, including clearly documented and tested protocols across people, technology and processes.

2. Regular Scenario-Based Testing

The incident highlighted the need to identify scenarios specific to the organisation that could cause widespread business impact and to test against them regularly, supported by a well-defined and well-resourced resiliency function.

3. Redundancy, Failover and Scalability Mechanisms

The incident highlighted the need for effective redundancy and failover mechanisms for critical systems and for manual workarounds across every layer, as well as the importance of identifying and regularly reviewing interdependencies.

4. Third-Party Risk Management

The incident was a clear reminder of the importance of thorough third-party risk management.

5. Service Level Agreements (SLAs)

The incident emphasised the need for clearly defined SLAs that set out performance metrics.

6. Focussed Communication

The incident highlighted the importance of effective communication planning and execution for organisations in every sector.

7. Continuous Monitoring and Alerts

The incident highlighted the importance of implementing processes for continuous monitoring of systems supporting critical business operations.

8. Complexity of Environments

The incident highlighted how critical it is for continuity planning to focus on the operational integrity of complex environments.

9. Investments in Resourcing

The outage underscored the need for robust local resourcing, as well as customer support capable of handling the increase in queries expected during incidents.

10. Mental Health Support

Employees supporting the recovery from such incidents should be given appropriate mental health support, considering the long and stressful nature of recovery work.

Key Insights:

1. Holistic Enterprise Risk Management

It is important to regularly update risk assessments to reflect relevant emerging threats and to develop mitigation strategies for a wide range of scenarios specific to the organisation.

2. Incident Response Planning

Maintaining up-to-date business continuity plans (BCP), disaster recovery plans (DRP) and cyber incident response plans that address various scenarios helps ensure the organisation can continue operating during disruptions.

3. Resilience Strategy

Designing infrastructure and systems with resilience in mind should be a key part of the resiliency strategy.

The strategy should also require that security updates and patches are tested safely in test environments before rollout to production systems.

4. Resource Management

Allocating sufficient resources, including budget and personnel, to initiatives focussed on maintaining operational resiliency helps ensure they are effectively implemented and maintained. This includes initiatives that support employees' mental health.

5. Culture

To support proactive resilience, it is important to promote a security-first, resiliency-by-design culture that provides incentives for proactive engagement in safeguarding information and systems.

Conclusion

Critical incidents such as the CrowdStrike outage are an unavoidable reality for organisations across every sector and every geography today.

By learning from this incident, organisations can enhance their preparedness, strengthen their ability to withstand disruptions and ensure the continuity of their critical operations.

The full report is available at: https://www.rsm.global/australia/report/key-lessons-and-insights-crowdstrike-outage

7/10/2024