Simon Says | Guessing the LLM's Password

Guessing the LLM's Password

First published: 24th February 2026

Many people nowadays turn to LLMs for advice on all aspects of life. What happens when you ask an LLM to help you choose a strong password?

Here are my first two questions to Claude:

Here is ChatGPT:

And here is Gemini:

Oh no.

This should not be surprising

Transformers are terrible randomness generators.

To a first approximation, they pick the most probable, and therefore least random, password. A transformer model sampled at zero temperature would suggest the same password every time. Chat models don't run at zero temperature, which is why we see slight variations above.
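To make the temperature point concrete, here is a minimal sketch of temperature sampling over a handful of candidate characters. The logits and candidates are made up for illustration, and a real model samples tokens rather than single characters.

```python
import math
import random


def sample_index(logits, temperature, rng=random):
    """Pick an index from logits; temperature 0 means greedy (argmax)."""
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(x - peak) for x in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


# Hypothetical logits for four candidate first characters.
candidates = ["K", "T", "x", "9"]
logits = [3.0, 2.0, 0.5, 0.1]

rng = random.Random(0)
greedy = {candidates[sample_index(logits, 0)] for _ in range(1000)}
warm = {candidates[sample_index(logits, 1.0, rng)] for _ in range(1000)}

print(greedy)  # only the top-scoring candidate ever appears
print(warm)    # several candidates appear, but the top one dominates
```

At temperature 0 the output is deterministic. At temperature 1 there is variation, but the distribution is still heavily skewed towards the model's favourite.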

Unless otherwise specified, I'm using the chat interface, not the API, with the default free models. That means Claude is Sonnet 4.6, GPT is 5.2, and Gemini is 3 in Fast mode. I did try a few other models, but the sort of person who would ask an LLM for a password and then use it is probably going to be using the default chat settings.

Let's ask each model for 5 passwords at once

Here's Claude:

The first password looks a lot like the ones we got when asking for one password.

Here is GPT:

Again, the first password looks a lot like the ones we got when asking for one password.

Here is Gemini:

Gemini gives us a lecture about password styles. Points to Gemini!

Password Reuse

I asked all three of them 100 times for a single password, and collected the results. Claude was the only one to suggest the same password twice. Both of these passwords were suggested twice in 100 conversations:

Kx9#mP2$vL7@nQ4!
Kx9#mP2$vLqR7@nJ

After taking more samples, Claude picked `Kx9#mP2$vL7@nQ4!` 124 out of 1891 times: over 6%! I searched for this exact string and it turned up several corporate blog posts giving password advice, and several pieces of documentation related to password generators. The good news: no hits on haveibeenpwned.
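Tallying repeats like this takes only a few lines. The sketch below uses a tiny made-up list (built from passwords quoted in this post, not the real 100-conversation data) and then checks the 124-out-of-1891 figure.

```python
from collections import Counter


def repeated(passwords):
    """Return the passwords that occur more than once, with their counts."""
    counts = Counter(passwords)
    return {pw: n for pw, n in counts.items() if n > 1}


# Illustrative sample only, not the collected results.
sample = [
    "Kx9#mP2$vL7@nQ4!",
    "Tr!v3x#Qm9@Lw2$",
    "Kx9#mP2$vL7@nQ4!",
    "Kx9#mP2$vLqR7@nJ",
]
print(repeated(sample))

# Share of the larger sample taken by the most common password.
share = 124 / 1891
print(f"{share:.2%}")  # 6.56%
```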

Password Advice

Gemini is a clear winner here. Almost every time, it gave a lecture about best practice for creating passwords. Maybe it has received RLHF for this? Even when it gives an example random password, it usually adds a warning like the following:

Gemini was also the only one to ever completely refuse to make a password. For example:

"I can't create a password *for you to use* because if I generate it, then I know it, which defeats the purpose of a secure password that only *you* know. (then goes on to talk about the merits of various methods of creating passwords)"

In contrast, Claude seems quite happy for us to actually use the password provided. An example:

"Here's a strong password: **`Tr!v3x#Qm9@Lw2$`** Some tips to keep it secure: - **Don't reuse it** across multiple sites... (the list of tips does not suggest not using it at all)"

Entropy

GPT likes to make passwords that follow the pattern of uppercase, digit, symbol, lowercase. Following this pattern rigidly is strictly worse than just using lowercase letters all the time, because the digit and symbol sets each contain fewer than 26 characters. I didn't notice this pattern until GPT pointed it out to me itself:

For GPT and Claude, the most common symbols were !@#$ and the most common set of numbers was 7492. GPT would often use this exact order.
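The claim that a rigid uppercase, digit, symbol, lowercase pattern is worse than all-lowercase can be checked with a little arithmetic. This sketch assumes each position is drawn uniformly from its class, and the symbol pool size of 8 is an assumption for illustration (any pool smaller than 26 gives the same conclusion).

```python
import math


def bits_per_group(class_sizes):
    """Bits of entropy for one group of positions, each uniform over its class."""
    return sum(math.log2(n) for n in class_sizes)


UPPER = 26
LOWER = 26
DIGITS = 10
SYMBOLS = 8  # assumption: a small pool such as !@#$%^&*

patterned = bits_per_group([UPPER, DIGITS, SYMBOLS, LOWER])
lowercase_only = bits_per_group([LOWER] * 4)

print(f"patterned group of 4:  {patterned:.2f} bits")       # about 15.7
print(f"lowercase group of 4:  {lowercase_only:.2f} bits")  # about 18.8
```

This is an upper bound for the patterned case. If the model also favours particular symbols and digits, as described above, the real figure is lower still.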

I built a simple Markov model with this pattern as a prior: I estimated the transition probabilities between the character groups separately, then tweaked the transition probabilities for each individual character. Even this doesn't capture everything we know about the actual distribution. If the first four characters are upper, lower, digit, symbol, the next four are likely to be upper, lower, digit, symbol too; but if the first four are upper, digit, lower, symbol, the next four are more likely to repeat that order instead. A Markov process can't capture this, because it is memoryless: the next state depends only on the current one. If we dive too deeply into general next-character prediction models, we might end up... uhm, implementing an LLM. So this is where I stopped.
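For readers who want to try something similar, here is a rough sketch of the class-level part of such a model: a first-order Markov chain over character classes with add-one smoothing. It is not the exact model used for the numbers in this post (there is no per-character tweaking), the function names are my own, and the three training passwords are just the ones quoted here, far too few for a real estimate.

```python
import math
from collections import Counter, defaultdict

CLASSES = "ULDS"  # upper, lower, digit, symbol


def char_class(ch):
    if ch.isupper():
        return "U"
    if ch.islower():
        return "L"
    if ch.isdigit():
        return "D"
    return "S"


def fit_class_chain(passwords, smoothing=1.0):
    """Estimate start and transition probabilities between character classes."""
    start = Counter()
    trans = defaultdict(Counter)
    for pw in passwords:  # assumes every password is non-empty
        seq = [char_class(c) for c in pw]
        start[seq[0]] += 1
        for a, b in zip(seq, seq[1:]):
            trans[a][b] += 1

    def normalise(counter):
        total = sum(counter[c] for c in CLASSES) + smoothing * len(CLASSES)
        return {c: (counter[c] + smoothing) / total for c in CLASSES}

    return normalise(start), {a: normalise(trans[a]) for a in CLASSES}


def class_surprisal_bits(password, start_p, trans_p):
    """Bits needed to encode the password's class sequence under the chain."""
    seq = [char_class(c) for c in password]
    bits = -math.log2(start_p[seq[0]])
    for a, b in zip(seq, seq[1:]):
        bits -= math.log2(trans_p[a][b])
    return bits


training = ["Kx9#mP2$vL7@nQ4!", "Kx9#mP2$vLqR7@nJ", "Tr!v3x#Qm9@Lw2$"]
start_p, trans_p = fit_class_chain(training)

typical = class_surprisal_bits("Kx9#mP2$vL7@nQ4!", start_p, trans_p)
unusual = class_surprisal_bits("a" * 16, start_p, trans_p)
print(f"typical pattern: {typical:.1f} bits, all lowercase: {unusual:.1f} bits")
```

A password that follows the usual class pattern costs few bits under the chain, which is another way of saying the pattern itself contributes little entropy. Averaging this surprisal over many sampled passwords gives an entropy estimate for the class sequence.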

Model	Entropy assuming independent characters	Entropy assuming Markov process
GPT	103.54	69.31
Gemini	78.27	55.53
Claude	74.21	47.24
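One plausible way to get an independent-characters figure is to sum the Shannon entropy of each character position across all the samples. The sketch below shows that approach; it is an illustration of the idea, not necessarily the exact calculation behind the first column, and the function names are my own.

```python
import math
from collections import Counter


def shannon_bits(counts):
    """Shannon entropy in bits of an empirical distribution given as counts."""
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def independent_entropy_bits(passwords):
    """Sum per-position entropies, treating every position as independent."""
    length = min(len(pw) for pw in passwords)  # ignore any trailing characters
    return sum(
        shannon_bits(Counter(pw[i] for pw in passwords)) for i in range(length)
    )


# Toy check: the first position never varies, the second is a fair coin flip.
print(independent_entropy_bits(["aa", "ab"]))  # 1.0
```

With a small sample this underestimates the true entropy, because the estimate can never exceed log2 of the number of samples per position.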

Despite having the worst entropy, Claude was the most likely to claim that its passwords are "fully random", "truly random", or "secure randomly-generated". Claude also likes to point out how many characters are in the password.

Other models

As I said in the introduction, I was using the default settings of the free chat interface for most of this work. But I was also curious about other models.

Claude Opus 4.6 actually runs a bash command to get cryptographically secure randomness. So does Haiku 4.5:

I wonder which models have received training specifically for this.
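For comparison, getting cryptographically secure randomness yourself takes only a few lines. This is a minimal Python sketch using the standard `secrets` module, not the bash command the models ran; the alphabet and length are arbitrary choices of mine.

```python
import secrets
import string

# Assumption: letters, digits, and a small symbol pool (70 characters in total).
ALPHABET = string.ascii_letters + string.digits + "!@#$%^&*"


def random_password(length=16, alphabet=ALPHABET):
    """Draw each character independently and uniformly from the OS CSPRNG."""
    return "".join(secrets.choice(alphabet) for _ in range(length))


print(random_password())
```

With 70 equally likely characters per position, a 16-character password carries about 98 bits of entropy, and unlike the LLM output, no position depends on any other.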
