Back in 2000, computer scientists at Carnegie Mellon created a security tool we’ve all come to know: CAPTCHA, short for Completely Automated Public Turing test to tell Computers and Humans Apart. Named after Alan Turing’s famous test of machine intelligence, CAPTCHA quickly became the web’s first line of defense against bots and spam.
Over its 18-year lifespan, CAPTCHA has taken on many forms, but its main concept has stayed the same. Before a user can do a specific task, like post a comment or make a purchase, they need to verify they’re human by passing a CAPTCHA test.
Classic CAPTCHA tests show users a visual clue, like warped letters, numbers, or street names, which the user then needs to type into a box. Newer versions are less disruptive; Google’s popular reCAPTCHA simply requires users to click on a box.
In all cases, if the user follows the CAPTCHA test directions correctly, then they can move ahead. If not, they’re barred from continuing further until they can prove they’re not a bot.
In theory, only humans can beat CAPTCHA, because only humans should be able to interpret what the images and test directions actually say. But of course, that’s no longer the case. Thanks to advancements in artificial intelligence, computers can now tackle CAPTCHA tests with near-human precision.
Copying the Brain
In October 2017, Vicarious, an artificial intelligence software startup, revealed a new system that can solve reCAPTCHAs with 66.6% accuracy using only a handful of examples as a guide. Called the Recursive Cortical Network (RCN), the A.I. algorithm is inspired by the human brain’s ability to process visual cues.
Previous algorithms learned only by analyzing millions of sample CAPTCHA tests. Real people identified each distorted CAPTCHA character for the algorithm, which then, over lots of tests, could recognize letters and numbers as they appeared. Vicarious refers to this method as tabula rasa, the blank slate approach.
But RCN is different. Instead of referencing a catalog of previous examples, RCN solves CAPTCHA tests on a case-by-case basis. It looks at each individual letter as a shape, taking into account its contours and surface. It then uses probability to solve the CAPTCHA based on what it “sees.”
Vicarious found that this learning model could accurately identify distorted letters at a much higher rate than older algorithms, no matter how much the letters were warped. The graph below shows how well the RCN recognized letters as the spacing between them changed, compared with other deep learning systems.
For visually impaired Internet users, Google’s reCAPTCHA offers an audio option. Users listen to a prerecorded voice that recites a list of numbers, which they then type into a box.
That should make it harder for bots to crack a CAPTCHA, right? Well, not really. Researchers at the University of Maryland created unCaptcha, an A.I. system that exploits free speech-to-text services to solve audio reCAPTCHAs.
The process is pretty straightforward. First, the A.I. downloads the audio reCAPTCHA and breaks it into individual segments, one per spoken digit. Then, it uploads each segment to several online speech-to-text services, such as IBM Bluemix, Bing Speech API, and, ironically, Google’s own Cloud Speech API.
Each speech-to-text service might transcribe the audio differently. For instance, a spoken “one” might come back as “one,” “won,” or “un,” as seen in the flowchart above. UnCaptcha reviews each response and uses a phonetic mapping to convert near-homophones like these into the digit they most likely represent.
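To make the phonetic mapping idea concrete, here is a minimal Python sketch. It is not unCaptcha's actual code: the `PHONETIC_MAP` table, its word lists, and the `to_digit` helper are all made up for illustration, and a real system would need a much larger table.

```python
from typing import Optional

# Hypothetical table of near-homophones a speech-to-text service might
# return for each spoken digit. The word lists are illustrative only.
PHONETIC_MAP = {
    "0": {"0", "zero", "hero"},
    "1": {"1", "one", "won", "un"},
    "2": {"2", "two", "to", "too"},
    "3": {"3", "three", "tree", "free"},
    "4": {"4", "four", "for", "fore"},
    "5": {"5", "five", "hive"},
    "6": {"6", "six", "sicks"},
    "7": {"7", "seven"},
    "8": {"8", "eight", "ate"},
    "9": {"9", "nine", "nein"},
}

# Invert the table once so each lookup is a single dictionary access.
WORD_TO_DIGIT = {
    word: digit for digit, words in PHONETIC_MAP.items() for word in words
}


def to_digit(transcript: str) -> Optional[str]:
    """Map one transcribed word to a digit, or None if nothing matches."""
    return WORD_TO_DIGIT.get(transcript.strip().lower())
```

With this table, `to_digit("Won")` and `to_digit("un")` both come back as `"1"`, while a word that is not in the table, like `"banana"`, comes back as `None`.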
The A.I. system then ensembles the responses and makes a weighted guess. Basically, if it determines that three out of four of the speech-to-text results identified the number as “one,” then it concludes that 1 is the correct digit.
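The voting step can be sketched in a few lines of Python. This is an illustration under assumptions, not the researchers' implementation: the service names, the per-service weights, and the `weighted_vote` function are hypothetical, and the sketch simply adds up the weight behind each candidate digit and picks the highest total.

```python
from collections import defaultdict
from typing import Dict, Optional


def weighted_vote(
    guesses: Dict[str, Optional[str]],
    weights: Dict[str, float],
) -> Optional[str]:
    """Pick the digit with the highest total weight across services.

    guesses maps a (hypothetical) service name to the digit it produced,
    or None if its transcript could not be mapped to a digit.
    weights maps a service name to how much its vote counts; services
    missing from weights default to a weight of 1.0.
    """
    scores: Dict[str, float] = defaultdict(float)
    for service, digit in guesses.items():
        if digit is not None:
            scores[digit] += weights.get(service, 1.0)
    if not scores:
        return None  # no service produced a usable digit
    return max(scores, key=scores.get)


# Three of four services agree, so the vote settles on "1".
guesses = {"service_a": "1", "service_b": "1", "service_c": "1", "service_d": "9"}
print(weighted_vote(guesses, weights={}))
```

Giving each service its own weight lets a more reliable transcriber outvote a less reliable one, which is one plausible reading of a "weighted guess."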
This process repeats for each spoken digit. Over the span of 450 tests using real audio CAPTCHAs, the research team discovered that unCaptcha could solve the puzzles with a staggering 85.15% accuracy.
The End of CAPTCHA?
What does this all mean for the future of CAPTCHA? Things definitely don’t look good for the security tool, and if you rely on the service to safeguard your sites, you might be freaking out.
But until CAPTCHA does come to an end, you should still consider keeping the tool, as it does help block less sophisticated attacks. Better to be safe than sorry, at least until the next best thing comes along.