Entropy Deficiencies in LLM-Generated Passwords
Overview of LLM Pseudorandomness Failures
As Large Language Models (LLMs) are increasingly integrated into daily workflows, users frequently rely on them for tasks that require high-quality randomness, such as generating secure passwords or cryptographic keys. However, recent analysis suggests that LLMs are fundamentally incapable of producing high-entropy, unpredictable strings. According to Schneier on Security, research conducted by Irregular demonstrates that LLMs like Claude generate passwords that follow highly predictable patterns, which significantly simplifies brute-force attempts for a motivated attacker.
Technical Analysis of Generated Patterns
The research identified three primary areas where LLM-generated passwords fail to meet security standards: starting character bias, uneven character distribution, and the absence of internal repetition. These traits are not the result of software bugs but are inherent to how LLMs predict tokens based on human-centric training data and reinforcement learning.
Character Positioning and Biased Prefixes
In a sample of 50 passwords generated by the model, every single entry began with a letter. More specifically, the vast majority started with the uppercase letter ‘G’, followed immediately by the digit ‘7’. This consistency suggests a limited internal ‘state space’ for what the model considers a ‘strong’ password. For an attacker, knowing that a password likely starts with ‘G7’ removes two positions from the search, which reduces the search space of an 8-to-12-character password by several orders of magnitude.
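The size of that reduction can be estimated with a few lines of arithmetic. The sketch below assumes a 94-symbol alphabet (printable ASCII without the space); the alphabet size and the function name are illustrative assumptions, not figures from the research.

```python
import math

ALPHABET_SIZE = 94  # assumption: printable ASCII without the space character

def search_space(length: int, known_prefix_len: int = 0) -> int:
    """Number of candidate strings when the first known_prefix_len characters are fixed."""
    return ALPHABET_SIZE ** (length - known_prefix_len)

for length in (8, 12):
    full = search_space(length)
    reduced = search_space(length, known_prefix_len=2)
    factor = full // reduced  # how many times smaller the search becomes
    print(f"length {length}: search shrinks by a factor of {factor} "
          f"({math.log2(factor):.2f} bits)")
```

Under this assumption the factor is the same for every password length, because a known two-character prefix always removes exactly two positions from the search.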
Non-Uniform Distribution of Entropy
A truly random password generator should draw from the entire available character set (lowercase, uppercase, digits, and symbols) with roughly equal probability. The Irregular research found that Claude’s choices were extremely uneven. Specific characters such as ‘L’, ‘9’, ‘m’, ‘2’, ‘$’, and ‘#’ appeared in every single password generated during the test, while common characters like ‘5’ and ‘@’ were almost entirely absent. Most characters in the alphabet were never selected at all. This skewed distribution means that while the password may look complex to a human, it lacks the mathematical entropy required to resist automated guessing attacks.
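One rough way to quantify such skew is to pool the characters from a batch of generated passwords and compute the Shannon entropy of their frequencies. The sketch below is an illustration only: the function name is hypothetical, the sample strings are made up rather than taken from the research, and pooled frequencies ignore positional patterns, so the result is an optimistic estimate.

```python
import math
from collections import Counter

def char_entropy_bits(passwords):
    """Shannon entropy, in bits per character, of the pooled character frequencies."""
    counts = Counter("".join(passwords))
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

# Made-up strings for illustration; these are NOT samples from the research.
skewed_sample = ["aaab", "abaa"]

print("observed bits per character:", char_entropy_bits(skewed_sample))
# Upper bound for a uniform draw from 94 printable ASCII symbols (an assumption).
print("uniform upper bound:", math.log2(94))
```

The further the observed value falls below the uniform upper bound, the more the generator favors a small subset of characters.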
The ‘Appearance’ of Randomness vs. Mathematical Reality
One of the most concerning findings is that the LLM consistently avoids repeating characters. In a truly random 12-character string, at least one repeated character is common: for an alphabet of about 94 printable symbols, the odds are roughly even. The model nevertheless avoided repetition throughout, likely because its training data or alignment (RLHF) suggests that non-repeating strings ‘look’ more random to humans. By avoiding repetition, the model further restricts the set of possible passwords, making the resulting string easier to guess through targeted algorithmic modeling.
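The likelihood of a repeat in a uniformly random string follows the same reasoning as the birthday problem. The sketch below assumes a 94-symbol alphabet; both that default and the function name are illustrative assumptions.

```python
from math import prod

def prob_any_repeat(length: int, alphabet_size: int = 94) -> float:
    """Probability that a uniformly random string contains at least one repeated character."""
    if length > alphabet_size:
        return 1.0  # pigeonhole: more positions than symbols forces a repeat
    # Chance that every position differs from all earlier positions.
    p_no_repeat = prod((alphabet_size - i) / alphabet_size for i in range(length))
    return 1.0 - p_no_repeat

print(f"12 characters, 94 symbols: {prob_any_repeat(12):.3f}")
```

Under these assumptions a repeat shows up in roughly half of all random 12-character strings, so a batch of passwords with no repeats at all is itself a recognizable signature. Taken alone, excluding repeats removes about half of the candidates, or roughly one bit.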
Implications for Threat Modeling
From a threat intelligence perspective, this represents a shift in how password-spraying and brute-force tools may be developed. If an attacker knows a target uses a specific AI assistant for password generation, they can optimize their toolsets to prioritize:
- Passwords starting with ‘G7’ or similar common AI prefixes.
- Character sets limited to the ‘preferred’ tokens identified in this research.
- Logic that excludes any strings with repeating characters.
This creates a ‘vibe-based’ security failure: the user feels secure because the password is long and looks complex, while the actual entropy is low enough to be trivial for modern hardware to crack.
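The combined effect of the three constraints can be estimated by counting the candidates that remain. This is a minimal sketch under stated assumptions: a 94-symbol baseline alphabet, a two-character known prefix, and a hypothetical ‘preferred’ alphabet of 20 symbols. The research summary does not give an exact count, so 20 is purely illustrative.

```python
import math

FULL_ALPHABET = 94  # assumption: printable ASCII without the space character

def unconstrained_bits(length: int) -> float:
    """Entropy in bits of a uniformly random string over the full alphabet."""
    return length * math.log2(FULL_ALPHABET)

def constrained_bits(length: int, preferred_alphabet: int, prefix_len: int = 2) -> float:
    """Bits remaining with a fixed prefix, a restricted alphabet, and no repeats.

    Assumes the prefix characters are distinct members of the preferred alphabet.
    """
    free_positions = length - prefix_len
    free_symbols = preferred_alphabet - prefix_len
    # Ordered selections without repetition from the remaining symbols.
    candidates = math.perm(free_symbols, free_positions)
    if candidates == 0:
        raise ValueError("alphabet too small for a string with no repeats")
    return math.log2(candidates)

length = 12
print(f"unconstrained: {unconstrained_bits(length):.1f} bits")
# 20 is a hypothetical preferred-alphabet size chosen for illustration.
print(f"constrained:   {constrained_bits(length, preferred_alphabet=20):.1f} bits")
```

The gap between the two figures is the guessing advantage an attacker gains from modeling the generator instead of treating the password as random.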
Actionable Recommendations
For Organizations
- Policy Enforcement: Update internal security policies to explicitly prohibit the use of AI chatbots or LLMs for generating passwords, API keys, or secrets.
- Technical Controls: Mandate the use of centralized, audited password managers whose built-in generators rely on a Cryptographically Secure Pseudo-Random Number Generator (CSPRNG).
- Monitoring: Monitor for ‘G7’ and other identified LLM-specific prefixes during password creation and resets, when the plaintext is briefly available, to identify users relying on AI generation. Properly hashed credential databases cannot be searched this way.
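For contrast with LLM output, CSPRNG-backed generation of the kind recommended under Technical Controls can be sketched with Python’s standard `secrets` module. The function name, default length, and alphabet are illustrative choices; a vetted password manager remains the recommended tool.

```python
import secrets
import string

# 94 symbols: letters, digits, and ASCII punctuation (an illustrative choice).
ALPHABET = string.ascii_letters + string.digits + string.punctuation

def generate_password(length: int = 16) -> str:
    """Draw each character independently and uniformly using the OS CSPRNG."""
    if length < 1:
        raise ValueError("length must be at least 1")
    return "".join(secrets.choice(ALPHABET) for _ in range(length))

print(generate_password())
```

Because each character is drawn independently, repeats and any starting character occur with their natural probabilities, which is exactly what the LLM-generated samples lack.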
For Security Professionals
- Verification: When auditing credentials, utilize research-backed wordlists that include known AI-generated patterns.
- Education: Inform stakeholders that LLMs are predictive engines, not random number generators. Their goal is to produce the most ‘likely’ next token, which is the antithesis of the randomness required for secure authentication.