Passwords are ubiquitous in computer security. All too often, they are also ineffective. A good password has to be both easy to remember and hard to guess, but in practice, people seem to go for the former over the latter. Names of wives, husbands and children are popular. Some take simplicity to extremes: One former deputy editor of the Economist used "z" for many years. And when hackers stole 32 million passwords from a social-gaming website called RockYou, it emerged that 1.1 per cent of the site's users -- 365,000 people -- had opted either for "123456" or for "12345."
That predictability lets security researchers (and hackers) create dictionaries that list common passwords, a boon to those seeking to break in. But although researchers know passwords are insecure, working out just how insecure has been difficult. Many studies have only small samples to work on -- a few thousand passwords at most. Hacked websites such as RockYou have provided longer lists, but there are ethical problems with using hacked information, and its availability is unpredictable.
However, a paper to be presented in May at a security conference held under the auspices of the Institute of Electrical and Electronics Engineers, a New York-based professional body, sheds some light. With the co-operation of Yahoo, a large Internet company, Joseph Bonneau of Cambridge University obtained the biggest sample to date: 70 million passwords that, though anonymized, came with useful demographic data about their owners.
Bonneau found some intriguing variations. Older users chose stronger passwords than young ones. (So much for the tech-savviness of youth.) People whose preferred language was Korean or German chose the most secure passwords; those who spoke Indonesian chose the least secure. Passwords guarding sensitive information, such as credit-card numbers, were only slightly more secure than those protecting less important things, such as access to games. "Nag screens" that told users they had chosen a weak password made virtually no difference. And users whose accounts had been hacked in the past did not make dramatically more secure choices than those who had never been hacked.
But it is the broader analysis of the sample that is of most interest to security researchers. For, despite their differences, the 70 million users were still predictable enough that a generic password dictionary was effective against both the entire sample and any demographically organized slice of it. Bonneau is blunt: "An attacker who can manage 10 guesses per account... will compromise around one per cent of accounts." And that, from the hacker's point of view, is a worthwhile outcome.
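The arithmetic behind Bonneau's warning can be illustrated with a toy simulation. The sketch below assumes an attacker who builds a popularity-ranked dictionary from one (hypothetical) leaked list and then tries its top entries against every account in a different list; the function names and sample data are invented for illustration and are not Bonneau's method or Yahoo's data.

```python
from collections import Counter

def build_dictionary(training_passwords, size):
    """Rank passwords from a (hypothetical) leaked list by popularity,
    most common first, and keep the top `size` entries."""
    return [pw for pw, _ in Counter(training_passwords).most_common(size)]

def fraction_cracked(target_passwords, dictionary, guesses_per_account):
    """Fraction of target accounts whose password appears among the first
    `guesses_per_account` dictionary entries, i.e. accounts an attacker
    limited to that many guesses per account would break into."""
    guesses = set(dictionary[:guesses_per_account])
    hits = sum(1 for pw in target_passwords if pw in guesses)
    return hits / len(target_passwords)

# Invented sample data, purely illustrative.
leaked = ["123456"] * 5 + ["12345"] * 3 + ["password"] * 2 + ["iloveyou", "monkey"]
target = ["123456", "12345", "qwerty", "password", "s3cr3t!",
          "dragon", "123456", "Tr0ub4dor", "letmein", "z"]

dictionary = build_dictionary(leaked, 3)       # ['123456', '12345', 'password']
print(fraction_cracked(target, dictionary, 1))  # 0.2
print(fraction_cracked(target, dictionary, 3))  # 0.4
```

The point the toy numbers make is the same as the real study's: because popular choices are shared across populations, a dictionary compiled from one group of users works against another, and even a handful of guesses per account pays off.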
One obvious answer would be for sites to limit the number of guesses that can be made before access is blocked, as cash machines do. Yet whereas the biggest sites, such as Google and Microsoft, do take such measures (and more), many do not. A sample of 150 big websites examined in 2010 by Bonneau and his colleague Soren Preibusch found that 126 of them made no attempt to limit guessing.
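A cash-machine-style limit is not hard to build. The sketch below is a minimal, hypothetical login checker (the class and method names are invented) that stores salted PBKDF2 hashes using Python's standard library and refuses to check any further guesses once an account has accumulated three consecutive failures. The iteration count and the permanent lock are illustrative assumptions; many real systems prefer growing delays or temporary lockouts, since a permanent lock lets an attacker shut legitimate users out.

```python
import hashlib
import hmac
import os

class LoginGuard:
    """Toy login checker that locks an account after too many wrong guesses,
    in the spirit of a cash machine retaining a card after repeated bad PINs."""

    def __init__(self, max_failures=3):
        self.max_failures = max_failures
        self._records = {}   # username -> (salt, password hash)
        self._failures = {}  # username -> consecutive failed attempts

    @staticmethod
    def _hash(password, salt):
        # Salted, deliberately slow hash; 100,000 iterations is illustrative.
        return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)

    def register(self, username, password):
        salt = os.urandom(16)
        self._records[username] = (salt, self._hash(password, salt))
        self._failures[username] = 0

    def is_locked(self, username):
        return self._failures.get(username, 0) >= self.max_failures

    def login(self, username, password):
        """Return True on success; False for unknown users, wrong passwords,
        or locked accounts (whose guesses are never even checked)."""
        if username not in self._records or self.is_locked(username):
            return False
        salt, stored = self._records[username]
        if hmac.compare_digest(self._hash(password, salt), stored):
            self._failures[username] = 0  # success resets the counter
            return True
        self._failures[username] += 1
        return False
```

With `max_failures=3`, an attacker gets three guesses per account; against the distribution Bonneau measured, that alone would cut the expected haul well below what unlimited guessing yields.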
How this state of affairs arose is obscure. For some sites, laxity may be rational, since their passwords guard nothing as valuable as, say, credit-card details. But password laxity imposes costs even on sites with good security, because people often reuse the same password in several different places.
One suggestion is that lax password security is a cultural remnant of the Internet's innocent youth, when it was an academic research network with few reasons to worry about hackers. Another possibility is that many sites begin as cash-strapped startups, for which extra password security would eat up valuable programming time; they skimp on it at the beginning and then never bother to change. But whatever the reason, it behooves those unwilling to wait for websites to get their acts together to consider the alternatives to traditional passwords.
One such alternative is the multi-word password, or pass-phrase. Using several words instead of one gives an attacker more to guess, which adds security, but only if the phrase chosen is not one likely to turn up, through familiar usage, in a dictionary of phrases. Which, of course, it often is.
Bonneau and his colleague Ekaterina Shutova have analyzed a real-world pass-phrase system run by Amazon, an online retailer that let its American customers use pass-phrases between October 2009 and February 2012. They found that, although pass-phrases do offer better security than passwords, they are not as good as had been hoped. A phrase of four or five randomly chosen words is fairly secure. But remembering several such phrases is no easier than remembering several randomly chosen passwords. Once again, the need for memorability is a boon to attackers. By scraping the Internet for lists of things such as film titles, sporting phrases and slang, Bonneau and Shutova were able to construct a 20,656-word dictionary that unlocked 1.13 per cent of the accounts in Amazon's database.
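Why "randomly chosen" matters can be made concrete. If each word is drawn uniformly and independently from a list of N words, a k-word phrase has k × log2(N) bits of entropy. The sketch below assumes such a list (the tiny word list in the usage line is invented; the widely used Diceware list has 7,776 words) and uses Python's `secrets` module for cryptographically strong choices. Five Diceware words give about 64.6 bits, whereas a phrase that sits somewhere in a 20,656-entry list can be found in at most 20,656 guesses, roughly 14.3 bits.

```python
import math
import secrets

def random_passphrase(wordlist, num_words=5, sep=" "):
    """Pick `num_words` words uniformly and independently from `wordlist`
    using a cryptographically secure random source."""
    return sep.join(secrets.choice(wordlist) for _ in range(num_words))

def passphrase_bits(list_size, num_words):
    """Entropy in bits of a phrase of `num_words` words drawn uniformly
    at random from a list of `list_size` words."""
    return num_words * math.log2(list_size)

# Hypothetical toy word list; a real one would have thousands of entries.
print(random_passphrase(["apple", "river", "stone", "cloud"], 4))
print(round(passphrase_bits(7776, 5), 1))  # 64.6
```

The formula holds only when the computer, not the user, picks the words. A human-chosen phrase such as a film title draws from a far smaller, heavily skewed set, which is exactly what the Amazon dictionary exploited.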
The researchers also suspected that even people who avoided famous phrases would still prefer patterns found in natural language to true randomness. So they compared their collection of pass-phrases with two-word phrases extracted at random from the British National Corpus (a 100-million-word sample of English maintained by Oxford University Press) and from the Google NGram Corpus (harvested from the Internet by that firm's web-crawlers). Sure enough, they found considerable overlap between structures common in ordinary English and the phrases chosen by Amazon's users. Some 13 per cent of the adjective-noun constructions ("beautiful woman") that the researchers tried were on the money, as were five per cent of adverb-verb mixes ("probably keep").
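The basic measurement, what fraction of candidate phrases taken from ordinary text turn out to be someone's pass-phrase, can be sketched in a few lines. This is a simplified illustration under stated assumptions: the corpus and pass-phrases below are invented, and the sketch uses every adjacent word pair rather than filtering by grammatical structure (picking out adjective-noun or adverb-verb pairs would require a part-of-speech tagger, which is omitted here).

```python
import re

def corpus_bigrams(text):
    """All distinct adjacent word pairs in a text, lower-cased,
    with punctuation dropped."""
    words = re.findall(r"[a-z']+", text.lower())
    return {f"{a} {b}" for a, b in zip(words, words[1:])}

def normalise(phrase):
    """Lower-case and collapse whitespace so 'Red  Car' matches 'red car'."""
    return " ".join(phrase.lower().split())

def hit_rate(candidates, passphrases):
    """Fraction of candidate phrases that equal at least one user's pass-phrase."""
    chosen = {normalise(p) for p in passphrases}
    cands = {normalise(c) for c in candidates}
    if not cands:
        return 0.0
    return len(cands & chosen) / len(cands)

# Invented toy data: 8 distinct word pairs, 2 of which users picked.
candidates = corpus_bigrams("The beautiful woman will probably keep the red car.")
print(hit_rate(candidates, ["beautiful woman", "Red  Car", "purple monkey"]))  # 0.25
```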
One way around that is to combine the ideas of a password and a pass-phrase into a so-called mnemonic password. This is a string of apparent gibberish that is not actually too hard to remember. It can be formed, for example, by using the first letter of each word in a phrase, varying upper and lower case, and substituting some symbols for others -- "8" for "B," for instance. Even mnemonic passwords, however, are not invulnerable. A study published in 2006 cracked four per cent of the mnemonics in a sample using a dictionary based on song lyrics, film titles and the like.
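The recipe described above can be written out directly. The sketch below takes the first letter of each word, alternates upper and lower case, and then swaps some letters for look-alike symbols. Only the "8"-for-"B" swap comes from the text; the other entries in the substitution table, and the alternating-case rule itself, are illustrative assumptions.

```python
# Hypothetical substitution table: "8" for "B" is from the text;
# the other look-alike swaps are assumptions chosen for illustration.
SUBSTITUTIONS = {"B": "8", "I": "1", "S": "5", "O": "0"}

def mnemonic_password(phrase):
    """Build a mnemonic password from a phrase: first letter of each word,
    alternating upper/lower case, then look-alike symbol substitutions."""
    letters = [word[0] for word in phrase.split() if word[0].isalpha()]
    out = []
    for i, ch in enumerate(letters):
        ch = ch.upper() if i % 2 == 0 else ch.lower()
        out.append(SUBSTITUTIONS.get(ch, ch))
    return "".join(out)

print(mnemonic_password("Better late than never, but better never late"))  # 8lTn8bNl
```

The output looks like gibberish, but it is a deterministic function of a well-known saying. An attacker who runs the same kind of transformation over a list of lyrics, titles and proverbs can reproduce it, which is why the 2006 study still cracked some mnemonics.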
The upshot is that there is probably no right answer. All security is irritating (ask anyone who flies regularly), and there is a constant tension between people's desire to be safe and their desire for things to be simple. While that tension persists, the hacker will always get through.