
In the world of software development and IT operations, ensuring the stability and reliability of updates before they are deployed across all systems is critical. This principle is especially important for cybersecurity solutions like CrowdStrike Falcon, where updates must be meticulously tested to prevent issues such as the recent Blue Screen of Death (BSOD) incident. This article explores why testing production updates is crucial and outlines best practices for mitigating risks associated with widespread deployment.

On July 19, 2024, an update to CrowdStrike’s Falcon endpoint detection and response (EDR) platform caused a Blue Screen of Death (BSOD) on many Windows computers worldwide. For many users, the day began with an error screen and no access to their machines, and the outage caused significant disruption across various sectors. Although the root cause was identified the same day, many affected machines could not recover without manual intervention: IT professionals had to physically boot each one into Safe Mode and remove a faulty channel file so that the system would boot normally again. This took anywhere from a few hours for smaller organizations to a couple of days for large ones.

The Risks of Unverified Updates

When updates are pushed to all machines without adequate testing, several risks arise:

  1. System Instability: Unverified updates can cause crashes, slowdowns, or other instability issues. This is particularly problematic for systems that rely on continuous uptime, such as those in a business or production environment.
  2. Data Loss: Unexpected crashes or system failures due to faulty updates can lead to data loss, especially if there are no recent backups.
  3. Security Vulnerabilities: Inadequate testing might introduce new vulnerabilities or fail to address existing ones, potentially exposing systems to cyber threats.
  4. Operational Disruption: For organizations, widespread issues resulting from faulty updates can lead to significant operational disruptions, impacting productivity and service delivery.
  5. User Frustration: Frequent issues or downtime due to problematic updates can lead to frustration among users, affecting morale and trust in the software.

Best Practices for Testing Updates

To minimize the risks associated with deploying updates, organizations should adhere to the following best practices:

  1. Staging Environment Testing: Before rolling out an update to the entire production environment, test it in a staging environment that mimics the production setup. This helps identify potential issues without affecting live systems.
  2. Phased Rollout: Implement updates in phases or using a gradual rollout strategy. Start with a small group of users or systems and monitor the performance before expanding the deployment to the broader user base.
  3. Automated Testing: Utilize automated testing tools to run regression tests and verify that new updates do not break existing functionalities or introduce new issues.
  4. Beta Testing: Engage a select group of users to test the update in real-world conditions. Collect feedback and monitor for any issues that might not have been caught in earlier testing phases.
  5. Monitoring and Rollback Mechanisms: Implement robust monitoring systems to quickly identify any problems arising from new updates. Ensure that rollback mechanisms are in place to revert to a previous stable version if necessary.
  6. Communication Plan: Clearly communicate with users about upcoming updates, including potential impacts and any required actions on their part. Provide timely updates on any issues and resolutions.
  7. Documentation and Feedback: Maintain thorough documentation of testing procedures and results. Gather feedback from users who experience issues and use this information to refine future updates.
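The phased rollout, monitoring, and rollback practices above can be combined into a single deployment loop. The following Python is a minimal sketch under stated assumptions: the ring sizes, the 0.5% failure threshold, and the `deploy`, `rollback`, and `is_healthy` callbacks are hypothetical stand-ins for whatever deployment tooling an organization actually uses. Hosts are assigned to rings by hashing their IDs, so the same machines always form the early rings.

```python
import hashlib
from typing import Callable

# Illustrative assumptions, not values from any real product:
# cumulative ring cutoffs in basis points (1%, 5%, 25%, then 100% of the fleet).
RING_CUTOFFS_BP = [100, 500, 2_500, 10_000]
# Health gate: halt if more than 0.5% of hosts in a ring report unhealthy.
MAX_FAILURE_RATE = 0.005


def bucket(host_id: str) -> int:
    """Map a host ID to a stable bucket in [0, 10000), so ring membership never changes."""
    digest = hashlib.sha256(host_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 10_000


def phased_rollout(
    hosts: list[str],
    deploy: Callable[[str], None],
    rollback: Callable[[str], None],
    is_healthy: Callable[[str], bool],
) -> bool:
    """Deploy ring by ring. If a ring fails the health gate, roll back every host
    updated so far and return False. Return True once every ring has passed."""
    updated: list[str] = []
    lower = 0
    for upper in RING_CUTOFFS_BP:
        ring = [h for h in hosts if lower <= bucket(h) < upper]
        lower = upper
        if not ring:
            continue
        for host in ring:
            deploy(host)
            updated.append(host)
        failures = sum(1 for host in ring if not is_healthy(host))
        if failures / len(ring) > MAX_FAILURE_RATE:
            # Halt: later rings never receive the update.
            for host in reversed(updated):
                rollback(host)
            return False
    return True


def simulate(update_is_bad: bool) -> tuple[bool, dict[str, str]]:
    """Run the rollout against a simulated fleet of 10,000 hosts.
    `state` records which version each touched host ends up on."""
    fleet = [f"host-{i}" for i in range(10_000)]
    state: dict[str, str] = {}

    def deploy(host: str) -> None:
        state[host] = "new"

    def rollback(host: str) -> None:
        state[host] = "old"

    def is_healthy(host: str) -> bool:
        # A bad update makes every host that received it unhealthy.
        return not (update_is_bad and state.get(host) == "new")

    return phased_rollout(fleet, deploy, rollback, is_healthy), state


ok, state = simulate(update_is_bad=True)
print(ok, len(state))  # False, and only the first ring (about 1% of hosts) was touched
```

A ring gate limits the blast radius even when rollback cannot reach a machine. As the July 2024 incident showed, a host that crashes during boot may be unable to receive an automated rollback at all, so the main protection here is that later rings never receive a faulty update.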

Case Study: CrowdStrike Falcon Update Incident

The recent CrowdStrike Falcon update incident highlights the critical need for comprehensive testing protocols:

  • Issue Identification: The problem was identified when users began reporting system crashes following the update.
  • Resolution Efforts: CrowdStrike had to quickly develop and release a fix while guiding affected users through temporary measures and rollback procedures.
  • Lessons Learned: The incident underscores the importance of pre-deployment testing and phased rollouts. It also highlights the need for effective communication and support channels to manage and resolve issues promptly.
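One way to act on the pre-deployment testing lesson is to gate every content update behind an automated check that the file matches what the consuming code expects, so a malformed update fails in the pipeline rather than on endpoints. This is a minimal sketch assuming a hypothetical JSON rule format with fields `id`, `pattern`, and `severity`; it is not CrowdStrike’s channel file format, and `validate_update` is an illustrative name.

```python
import json

# Hypothetical contract: the component that consumes this update reads exactly
# these fields from every rule, with these types. This is an illustrative format,
# not CrowdStrike's actual channel file format.
EXPECTED_FIELDS = {"id": str, "pattern": str, "severity": int}


def validate_update(raw: str) -> list[str]:
    """Return a list of problems found in an update file; empty means it passed."""
    try:
        rules = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(rules, list) or not rules:
        return ["update must be a non-empty list of rules"]

    problems: list[str] = []
    for i, rule in enumerate(rules):
        if not isinstance(rule, dict):
            problems.append(f"rule {i}: expected an object")
            continue
        for name, expected_type in EXPECTED_FIELDS.items():
            if name not in rule:
                problems.append(f"rule {i}: missing field {name!r}")
            # Exact type check, so that e.g. true is not accepted as an int.
            elif type(rule[name]) is not expected_type:
                problems.append(f"rule {i}: field {name!r} must be {expected_type.__name__}")
        extra = sorted(set(rule) - set(EXPECTED_FIELDS))
        if extra:
            problems.append(f"rule {i}: unexpected fields {extra}")
    return problems


good = '[{"id": "r1", "pattern": "evil.exe", "severity": 3}]'
bad = '[{"id": "r2", "pattern": "evil.exe"}]'
print(validate_update(good))  # []
print(validate_update(bad))   # ["rule 0: missing field 'severity'"]
```

In a release pipeline, a non-empty result would block publication. A structural check like this complements, rather than replaces, running the update on real test machines in a staging environment.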

Testing production updates before they are deployed to all machines is not just a best practice but a necessity for maintaining system stability, data integrity, and operational efficiency. By adopting a rigorous testing approach, including staging environments, phased rollouts, and automated testing, organizations can mitigate risks and ensure a smooth deployment process. Learning from past incidents, such as the CrowdStrike Falcon BSOD issue, can help refine these practices and enhance overall software reliability and user satisfaction.

The involvement of senior leadership varies by organization and industry. In general, however, best practice recommends that, because the cyber threat landscape keeps evolving, leaders be educated continually so that they understand the importance of cybersecurity awareness and training and can drive that training across the entire organization, enabling everyone to act as the first line of defense.

A recent trend in cybersecurity is that senior leaders are taking the initiative to hire cybersecurity consulting firms and vCISO (virtual Chief Information Security Officer) services to close the policy gaps created by the fast-paced evolution of IT, particularly the spread of AI-based services across all segments of IT services and tools.

It is recommended that every organization onboarding new technologies, IT solutions, applications, and tools to serve its business needs have a mandatory cybersecurity awareness program driven from the top down. Cybersecurity should be discussed in every department meeting to reinforce that it is not only the responsibility of IT and security teams but everyone’s responsibility.

The University of California, Riverside (UCR) has published a paper recommending that leaders use various leadership styles to their advantage when combating cybersecurity challenges in their organizations. Some of the leadership styles it recommends are:

  • Collaborative leaders promote cross-functional communication and cooperation, breaking down silos that may impede the sharing of crucial information. This open communication facilitates a more comprehensive understanding of potential threats and vulnerabilities, enabling a more robust cybersecurity strategy.
  • Transformational leaders, in the context of cybersecurity, encourage a proactive approach to identifying and addressing potential threats. They foster an environment that instills a sense of responsibility and accountability among team members, promoting a collective effort to safeguard sensitive information.
  • Transactional leaders, in the cybersecurity context, prioritize adherence to established protocols and compliance measures. They ensure that team members follow standardized security practices, reducing the likelihood of human error and exploitation of vulnerabilities.
  • Situational leaders adapt their approach to the specific challenge at hand. Whether it is a sudden breach or a sophisticated attack, these leaders guide their teams through effective crisis management and response strategies.
  • People-first leaders can contribute to a strong cybersecurity posture by prioritizing the well-being and development of team members. In the context of cybersecurity, this can translate to a workforce that is more vigilant and committed to upholding security best practices.

Beyond these leadership practices for developing a healthy and effective cybersecurity culture, organizations should implement an effective cybersecurity awareness program, with supporting tools, to educate every employee, contractor, and consultant who has access to the organization’s assets in any capacity.

Author

Bhavani Damodaran | Senior Technical Manager, Information Security at GS Lab | GAVS

Bhavani is a Senior Technical Manager, Information Security at GS Lab | GAVS. She has held numerous positions of responsibility in areas of information security such as risk management, IT controls, audits, and compliance. Her expertise includes handling IT risks, designing security control frameworks, and assessing digital tools. She is an avid traveler and is passionate about driving.