Thursday, July 09, 2009

reCAPTCHA's Business Model

If you use the web, chances are you've been asked to solve a "captcha." A captcha differentiates humans from machines by asking users to transcribe distorted text that a computer cannot read. Whether they are preventing spam on blogs or verifying website sign-ups, captchas keep malicious programs from sending spam and consuming resources. Captchas have been around for almost a decade and are fairly commonplace, but a project called reCAPTCHA is pushing the envelope in how the resulting data is used, and it has strong potential to become a lucrative business.

reCAPTCHA is a project from Carnegie Mellon that offers a standard captcha service for free to any web service. What is innovative about reCAPTCHA is that it asks users to transcribe two words before letting them proceed. The first word has a known value and serves as the test, while the second is a word that reCAPTCHA does not yet know how to read. If enough users agree on the transcription of the second word to a point of statistical significance, chances are that the garbled word has been read correctly.
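To make the two-word idea concrete, here is a minimal Python sketch of how such a scheme could work. It is not reCAPTCHA's actual implementation: the names `UnknownWord` and `submit` are hypothetical, and the simple vote-share threshold (`min_votes`, `min_share`) is an assumed stand-in for whatever statistical test the real service uses.

```python
from collections import Counter


def normalize(word):
    """Compare answers case-insensitively and ignore surrounding spaces."""
    return word.strip().lower()


class UnknownWord:
    """Collects human guesses for one word the OCR could not read.

    min_votes and min_share are illustrative thresholds (assumptions),
    not values published by reCAPTCHA.
    """

    def __init__(self, min_votes=3, min_share=0.75):
        self.votes = Counter()
        self.min_votes = min_votes
        self.min_share = min_share

    def add_vote(self, guess):
        self.votes[normalize(guess)] += 1

    def consensus(self):
        """Return the agreed transcription, or None if there is no agreement yet."""
        total = sum(self.votes.values())
        if total < self.min_votes:
            return None
        word, count = self.votes.most_common(1)[0]
        return word if count / total >= self.min_share else None


def submit(control_answer, control_guess, unknown_guess, unknown):
    """Check the known word; count the unknown-word guess only if the user passed."""
    if normalize(control_guess) != normalize(control_answer):
        return False  # failed the test, so the second guess is not trusted
    unknown.add_vote(unknown_guess)
    return True


if __name__ == "__main__":
    word = UnknownWord()
    submit("morning", "Morning", "upon", word)
    submit("morning", "mourning", "open", word)  # fails the test, vote ignored
    submit("morning", "morning", "upon", word)
    submit("morning", "morning", "upon", word)
    print(word.consensus())  # prints "upon"
```

The key design choice in this sketch is that a guess for the unknown word only counts when the same user got the known word right, which is what makes the crowd's answers trustworthy enough to tally.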

Here are two examples from reCAPTCHA's website.

Original scanned image:

The computer's translation of the image into text, with unreadable parts highlighted:

reCAPTCHA is currently working with the Internet Archive and the New York Times to convert books and old newspapers to text so that they can be preserved, searched, and kept accessible for generations to come. Beyond these altruistic applications of its technology and data, reCAPTCHA could have a very lucrative business model. Several companies are digitizing books, including Google and Amazon, and reCAPTCHA could license its technology to help them transcribe books more quickly and accurately. Another potential business would be licensing the technology to law firms that have to sift through thousands of pages of written documents to gather evidence and build their cases. Using this technology would save these firms time and reduce their labor costs. reCAPTCHA is a great example of a free service that is generating huge amounts of data and using it in a valuable way.

Update 9/16/2009:
Google has acquired reCAPTCHA and will be applying its technology to digitize more content. Google is essentially buying time so that it doesn't have to wait to build out its OCR library. From the Google Blog:
This technology powers large scale text scanning projects like Google Books and Google News Archive Search. Having the text version of documents is important because plain text can be searched, easily rendered on mobile devices and displayed to visually impaired users. So we'll be applying the technology within Google not only to increase fraud and spam protection for Google products but also to improve our books and newspaper scanning process.


david said...

NOVA just had a segment about the inventor:

Will Hambly said...

Thanks David. That was interesting. Good stuff.

amuthanjrv said...

This article is showing your knowledge in reCAPTCHA's business model.
