CrowdStrike Outage: Lessons Learned for SAP Solutions
The fallout of the recent worldwide systems outage has far-reaching consequences for cybersecurity. The outage is estimated to impact 8.5 million devices powered by Microsoft Windows operating systems. The cause of the outage is a corrupted update for an agent used for the Falcon security platform from CrowdStrike. Falcon uses a cloud architecture with servers, workstations, containers, virtual machines, and other devices connected directly to CrowdStrike services through an agent installed in each host. The agent operates at the kernel level. The kernel is responsible for managing work processes in operating systems and mediating access to hardware resources.
Operating systems enable applications to run in two modes: user and kernel. Most applications operate in user mode without direct access to the underlying hardware or system resources. Kernel mode is far more privileged and provides applications with unrestricted access to the system including hardware control, memory management, and device drivers. Errors in applications running in user mode are isolated and do not impact the stability of the operating system. However, errors in applications running in kernel mode can crash the operating system. This is exactly what happened with the recent CrowdStrike/ Microsoft outage.
The Falcon agent operates in kernel mode as a device driver. This is most likely because the agent requires privileged access to system data structures to deliver the protection provided by CrowdStrike. Microsoft is well aware of the risk posed by applications running in kernel mode. The Windows Hardware Quality Labs (WHQL) program is intended to test and certify third party device drivers to manage the risk. The driver used by the Falcon agent was WHQL tested and certified. However, security products such as Falcon require continuous updates to counter the latest cyber threats. Since it’s not feasible to recertify the driver for each update, updates are applied through dynamic definition files that can include code executed by the driver. This code is not tested and signed as part of the WHQL program. A software bug in unsigned code packaged in a recent update for the Falcon driver running in kernel mode is the root cause of the large-scale system outage.
There are two obvious questions that arise from the events. The first is why was the software bug not discovered and removed before the update was released by CrowdStrike? This points to concerns around development and release management procedures on the part of the software vendor. Understandably, its not feasible to test software updates against for every possible scenario. For example, past CrowdStrike updates have been known to trigger crashes in the Central Management Console and Central Management Server of SAP BusinessObjects. However, given the widespread impact of the current bug, it’s likely that more comprehensive testing would have revealed the error. It also raises questions around inadequate parameter validation by the Falcon agent that may have detected and blocked errors in arguments passed to kernel functions to prevent system crashes. This points to concerns around software design.
The second question is why didn’t organisations analyze the impact of the updates in test machines or perform a staged rollout? Testing would have most likely revealed the issue and a staged rollout of the update would have lessened the impact even if the update wasn’t tested.
The answer to both questions is that both software vendors and customers are responding to a threat landscape that demands rapid response to new and emerging threats. Therefore, organizations are prioritizing speed of response for information security over preserving the availability of their systems. The outage provides a stark reminder of the dangers of this approach.
Systems outages can be especially severe if they impact business-critical SAP solutions. SAP customers should identify third party agents and programs that operate in kernel mode in SAP hosts. The continued use of such software should be reviewed in light of recent events, especially if the software is automatically updated by the vendor without any input from the customer.
The Cybersecurity Extension for SAP protects SAP solutions from advanced persistent threats without the use of kernel-level agents or programs. The solution operates in user mode to monitor and secure the application, database and operating system layers in SAP hosts.