Windows Server Domain Controllers Hit by LSASS Reboot Loops
- [01] Immediate impact: Domain Controllers face LSASS crashes and reboot loops, leading to authentication failures and significant enterprise service disruption.
- [02] Affected systems: Critical infrastructure running Windows Server 2022, 2019, 2016, and 2012 R2 is impacted by April 2024 security updates.
- [03] Remediation: Administrators should pause update deployment or uninstall the relevant KB packages if domain stability is compromised during the rollout.
Overview
Microsoft has issued a warning regarding a significant stability issue affecting Windows Server environments where Domain Controllers (DCs) enter persistent reboot loops. This behavior follows the application of the April 2024 cumulative security updates across multiple versions of the operating system. According to BleepingComputer, the instability is caused by the Local Security Authority Subsystem Service (lsass.exe) crashing unexpectedly. Because LSASS is a critical system process responsible for security policy enforcement and user authentication, its termination forces Windows to restart immediately.
Technical Analysis of LSASS Instability
The failures are primarily characterized by a memory leak within the LSASS process. When the memory consumption reaches a critical threshold or the service encounters an unhandled exception, it terminates, resulting in a system crash. Security administrators monitoring their environments have reported seeing Event ID 1000 and Event ID 1001 in the Windows Event Viewer, often referencing an access violation (exception code 0xc0000005) within the lsass.exe executable.
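As a rough illustration of the triage described above, the following Python sketch filters exported event records for the crash signature. The record shape (a dict with `event_id` and `message` keys) and the sample messages are assumptions made for the example; they are not the format of any particular export tool.

```python
from typing import Iterable

CRASH_EVENT_IDS = {1000, 1001}      # event IDs named in the report
LSASS_IMAGE = "lsass.exe"
ACCESS_VIOLATION = "0xc0000005"     # exception code named in the report


def find_lsass_crashes(events: Iterable[dict]) -> list[dict]:
    """Return crash events that mention lsass.exe.

    Each match gains an 'access_violation' flag, since the report says the
    exception code is often, but not always, present.
    """
    matches = []
    for event in events:
        message = str(event.get("message", "")).lower()
        if event.get("event_id") in CRASH_EVENT_IDS and LSASS_IMAGE in message:
            matches.append({**event, "access_violation": ACCESS_VIOLATION in message})
    return matches


# Hypothetical records, shaped as assumed above.
sample_events = [
    {"event_id": 1000, "message": "Faulting application name: lsass.exe, exception code: 0xc0000005"},
    {"event_id": 1000, "message": "Faulting application name: notepad.exe, exception code: 0xc0000005"},
    {"event_id": 4624, "message": "An account was successfully logged on."},
]

crashes = find_lsass_crashes(sample_events)
```

Matching on the event ID and image name first, and treating the exception code as a secondary flag, avoids missing crashes that are logged without the access violation code.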
Windows Server Domain Controller LSASS memory leak
The memory leak is particularly problematic for high-traffic environments. As the service handles authentication requests, its memory footprint grows without being released, eventually exhausting available system resources. This issue is not limited to a single version of the OS but spans several generations of Windows Server. The specific updates identified as the source of this instability include:
- Windows Server 2022: KB5036909
- Windows Server 2019: KB5036896
- Windows Server 2016: KB5036899
- Windows Server 2012 R2: KB5036960
While these updates were intended to address several CVE entries and enhance the security posture of the domain, the unintended side effect on availability poses a significant risk to business continuity. If a DC fails, legitimate authentication attempts may be blocked, and phishing alerts may go unactioned because SOC and EDR tooling can lose telemetry from the affected identity provider.
Operational Impact and Risks
When a Domain Controller enters a reboot loop, the impact is immediate and widespread. Authentication services, including Kerberos and NTLM, become unavailable, preventing users from logging into their workstations or accessing network resources. The disruption also affects automated service accounts, potentially breaking backup routines and supply chain attack defenses that rely on domain credentials.
Furthermore, the instability makes the environment more susceptible to lateral movement. If security monitoring tools are unable to communicate with the identity provider, suspicious authentication patterns may go undetected during the window of instability. This is why the April 2024 reboot loop is considered a high-severity operational issue even though it is not a security vulnerability being directly exploited by an APT.
Remediation and Recovery
For organizations already experiencing these symptoms, the primary recovery path is to remove the problematic update. Microsoft has acknowledged the issue and is working on a permanent resolution, but in the interim, administrators should consider the following troubleshooting and mitigation steps:
- Identify the Failure: Review Event Logs for LSASS-related crashes (Event ID 1000/1001) to confirm the source of the reboot.
- Rollback Updates: If the environment is unstable, use the wusa /uninstall /kb:[PackageNumber] command to remove the offending update from impacted Domain Controllers.
- Isolation: If one DC remains stable while others fail, consider shifting primary authentication roles to the stable node while updates are paused on the remainder of the fleet.
- Monitoring: Increase the frequency of memory usage monitoring for the lsass.exe process via Performance Monitor to catch leaks before they result in a system crash.
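The monitoring step can be approximated by checking whether periodic memory samples for lsass.exe trend steadily upward. The sketch below fits a least-squares slope to a series of samples; the sample values, the megabyte unit, and both thresholds are hypothetical and would need tuning against a known-good baseline.

```python
def growth_per_sample(samples: list[float]) -> float:
    """Least-squares slope of the samples plotted against their index."""
    n = len(samples)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    numerator = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(samples))
    denominator = sum((i - mean_x) ** 2 for i in range(n))
    return numerator / denominator


def looks_like_leak(samples: list[float],
                    min_slope_mb: float = 5.0,   # assumed threshold, MB per sample
                    min_samples: int = 6) -> bool:  # assumed minimum history
    """Flag a series whose average growth per sample meets the threshold."""
    if len(samples) < min_samples:
        return False
    return growth_per_sample(samples) >= min_slope_mb


# Hypothetical lsass.exe memory readings in MB, taken at a fixed interval.
growing = [500.0, 520.0, 540.0, 560.0, 580.0, 600.0]
steady = [500.0, 502.0, 499.0, 501.0, 500.0, 498.0]
```

A slope over a window is less noisy than comparing two readings, because a single spike during a burst of authentication traffic does not by itself trigger the check.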
Organizations should validate these updates in a non-production staging environment before proceeding with a broad deployment to avoid widespread infrastructure downtime.
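For teams that script the rollback step listed above, the following sketch assembles the wusa argument list from a KB identifier. The function name is hypothetical, and the optional /quiet and /norestart switches are additions not mentioned in the article; whether wusa can remove a given cumulative update on a given server build should be confirmed in staging first.

```python
import re


def build_uninstall_command(kb: str, quiet: bool = False,
                            no_restart: bool = False) -> list[str]:
    """Build the wusa argument list for removing one update.

    Accepts 'KB5036909', 'kb5036909' or '5036909'; wusa expects digits only.
    """
    match = re.fullmatch(r"\s*(?:kb)?(\d+)\s*", kb, flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"not a KB identifier: {kb!r}")
    command = ["wusa", "/uninstall", f"/kb:{match.group(1)}"]
    if quiet:
        command.append("/quiet")       # suppress prompts (optional addition)
    if no_restart:
        command.append("/norestart")   # defer the reboot (optional addition)
    return command


# Only prints the command; running it is left to the administrator.
print(" ".join(build_uninstall_command("KB5036909")))
```

Returning a list rather than a single string keeps the arguments safe to hand to a process launcher without shell quoting.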