A Brief History of reCAPTCHA
reCAPTCHA is an excellent example of dual-purpose technology, a term I made up just now to describe software with two major functions. In the first version of reCAPTCHA, users were asked to identify a distorted bit of text to prove they were not spam bots; this was reCAPTCHA's main purpose. Simultaneously, users were helping to transcribe unknown words from scanned documents. According to Tom Scott, once roughly a dozen people agreed on an unknown word, it became the accepted transcription, which helped to digitize old books and newspapers.
The original design is described pretty well here, in an episode of "How I Built This." Eventually, Google acquired the reCAPTCHA technology, and many of the original creators moved on to Duolingo.
An Overview
reCAPTCHA came out a few years after CAPTCHA, or Completely Automated Public Turing Test to tell Computers and Humans Apart. Why they chose this contrived acronym is beyond me…why not call it Completely Automated Public Turing Assignment Identifying Non-humans? That would have spelled CAPTAIN. Oh well…
The first version of reCAPTCHA was text-based and doubled as a transcription tool. Even before modern AI and machine learning, spammers could build bots that succeeded some of the time, and they also employed people to solve reCAPTCHA challenges. reCAPTCHA version 2 was something of a black box, but Google's developers designed it to assign trust based on cookies: if users seemed suspect, or if there was not sufficient information about them, they were asked to identify pictures. For example, they might be asked to determine which pictures contained fire hydrants.
AI and machine learning proved able to solve these picture challenges, so Google released reCAPTCHA version 3 in late 2018. Instead of presenting a challenge, version 3 runs invisibly and assigns each request a score between 0.0 (very likely a bot) and 1.0 (very likely a human), leaving it to the site owner to decide what to do with borderline traffic.
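A minimal sketch of how a site owner might act on that score. The 0.0–1.0 score range is how reCAPTCHA v3 actually works; the function name and the three-tier policy below are my own illustration, not Google's API:

```python
def classify_request(score: float, threshold: float = 0.5) -> str:
    """Decide how to treat a request based on a reCAPTCHA v3 score.

    v3 returns a score between 0.0 (very likely a bot) and
    1.0 (very likely a human); choosing the threshold is left
    to the site owner.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("reCAPTCHA v3 scores fall between 0.0 and 1.0")
    if score >= threshold:
        return "allow"       # probably human: let the request through
    elif score >= threshold / 2:
        return "challenge"   # borderline: fall back to an explicit challenge
    else:
        return "block"       # very likely a bot
```

In practice a site might route "challenge" traffic to a v2 picture puzzle, which is one reason the older versions have not entirely disappeared.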
The Cybersecurity Connection
This is a scientific paper presented at Black Hat; the authors were able to solve 70% of reCAPTCHA challenges (version 2, presumably). They outline their tools and methods in detail, including the use of neural networks, deep learning, and, poetically, Google Reverse Image Search.
We are able to create over 63,000 cookies in a single day without triggering any mechanisms or getting blocked, and are only limited by the physical capabilities of the machine. This indicates that there is no mechanism to prohibit the creation of cookies from a single IP address. The only restriction we detected was triggered by a massive number of concurrent requests (i.e., for detecting DoS attacks). The lack of a safeguard can be justified by the fact that creating cookies at a large scale has not been required by attacks before. Indeed, we present a novel misuse of tracking cookies, which makes them a valuable commodity for fraudsters.
— Suphannee Sivakorn, Jason Polakis, and Angelos D. Keromytis
Privacy Concerns
This is an article by FastCompany outlining potential privacy concerns. From the article:
…But there’s the trade-off. “It makes sense and makes it more user-friendly, but it also gives Google more data,” he says. Google would not clarify what it does with the data it captures about user behavior via reCaptcha, only that it is used for improving reCaptcha and general security purposes.
The FastCompany article is skeptical in tone: it rejects Google's claims that the data will be used responsibly, and it seems to share the broader concern that the company simply has too much power over user data.
Closing Thoughts
I added a honeypot to our Kiwanis website, and anyone who knows cybersecurity would probably tell me that this is not a complete solution to the problem. Still, it mitigated our spam problem quite a bit, although occasionally we will get something like this:
Yes, I literally added a question asking them whether or not they are a bot. No one else seems to do this, and I cannot imagine a detail like this going very well in a job interview…but why not? I have yet to meet a bot intelligent enough to fill in “no” for this field…I think.
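For the curious, the idea can be sketched in a few lines of server-side code. The field names here are hypothetical, and our site's actual implementation may differ, but this captures both tricks: a hidden honeypot field that humans never see, and the literal "are you a bot?" question:

```python
def looks_like_spam(form: dict) -> bool:
    """Flag a form submission as spam using two cheap checks.

    - A honeypot field hidden from humans (e.g. via CSS): bots
      tend to auto-fill every field, so any value here is suspect.
    - A visible "are you a bot?" question: humans answer "no".
    """
    # The invisible field should always arrive empty from a human.
    if form.get("website", "").strip():
        return True
    # The surprisingly effective literal question.
    if form.get("are_you_a_bot", "no").strip().lower() != "no":
        return True
    return False
```

The honeypot field is typically hidden with CSS rather than `type="hidden"`, since some bots are smart enough to skip explicitly hidden inputs but still fill in anything styled out of view.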
Terms such as "honeypot" and "reCAPTCHA" may be outside of the common vernacular, but unfortunately everyone in the world is probably familiar with spam. At its core is the "arms race" Tom Scott described: distorted text gave way to pictures, and then to an invisible program running in the background that may be superior, but that may also carry problematic implications.
Criminals will continue to enhance their technology, Google will respond in kind, and occasionally cybersecurity researchers will publish impressive papers on how they bypassed security systems such as this.