Yes, you too are training AI algorithms!
by Georgia Paraskaki
Remember those CAPTCHA tests you have to pass to access some websites? Aren’t they suspiciously difficult? It is not just your impression: they are indeed difficult!
Can you recall the last time you wanted to access a website and, before getting in, you were asked to identify which pictures show a bike or to read a rather distorted piece of text? Did the pictures seem blurry and the text difficult to understand? Perhaps you were given five letters or numbers and only two of them were easy to identify. The rest you most likely had to guess. But don’t despair! You are not bad at this; the truth is that it really is a hard task to complete!
CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) tests exist to help websites identify whether a bot or a human is trying to access them and, consequently, to prevent fraud and automated attacks by bots*. So why are these tasks so hard? Well, the answer is a bit more complicated. While you are expected to identify some parts of the text to show that you are human, at the same time you are actually… annotating data for artificial intelligence* (AI) algorithms!
What does this mean? Imagine that the distorted text was once part of a book that has been damaged so badly that nobody can easily read it or make sense of it. In this case, an AI algorithm could decode the distorted text, but it first has to be trained on many labelled examples, far more than a single person could annotate. So where do we find the annotators*? Guess what: while solving a CAPTCHA, you are annotating data to feed an AI algorithm! CAPTCHA tests are based on unsolved AI tasks, which ensures that bots cannot solve them. At the same time, as users annotate the data, we are working towards solving the AI task until… it’s solved! This leads to a win-win situation: either these AI problems are solved once sufficient data has been gathered, or they remain unsolved and keep serving as a method to differentiate humans from computers.
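How the answers of many users get combined into one label is not described here, so the following is only a minimal sketch of one common approach, majority voting. The function name `aggregate_label` and the thresholds `min_votes` and `min_agreement` are hypothetical choices made for illustration, not the method any particular CAPTCHA service uses.

```python
from collections import Counter


def aggregate_label(answers, min_votes=3, min_agreement=0.6):
    """Return the majority answer, or None if agreement is still too weak.

    `min_votes` and `min_agreement` are illustrative assumptions.
    """
    # Too few users have answered: do not trust the label yet.
    if len(answers) < min_votes:
        return None
    # Find the most frequent answer and how often it was given.
    label, count = Counter(answers).most_common(1)[0]
    # Accept the label only if a clear majority of users agree on it.
    if count / len(answers) >= min_agreement:
        return label
    return None


# Three out of four users typed "bike", so the label is accepted.
print(aggregate_label(["bike", "bike", "car", "bike"]))
# Only two answers so far, so no label is produced yet.
print(aggregate_label(["bike", "car"]))
```

The idea is that one person’s guess can be wrong, but when many independent users give the same answer, that answer becomes reliable enough to serve as a training label.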
Data is crucial to AI. Many algorithms exist, but if you don’t have sufficient data, you can’t train them properly. Think of a baby who sees cats and dogs for the first time! You show the baby a Dobermann (Picture 1) and say: “Hey, that’s a dog”. The next day you bring in a Persian cat (Picture 2), and this time you tell the baby that this is a cat. The day after, you bring in a Papillon dog (Picture 3). Would the baby know at this point that this is a dog? The size and fur of a Papillon are closer to those of the Persian cat shown earlier, so probably not! The baby needs more samples to understand the features of a dog (nose, eyes, tail etc.) and to build a good model for classifying what they see as a dog. In a similar way, AI needs a lot of annotated data to be able to crack complicated tasks!
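The baby example can be imitated with a tiny classifier. The sketch below uses a 1-nearest-neighbour rule, which simply copies the label of the most similar animal seen so far. Everything in it is an assumption made for illustration: the three features (weight in kg, fur length in cm, snout length in cm), all the numbers, and the extra animals beyond the three mentioned above.

```python
import math


def nearest_label(samples, query):
    """Return the label of the sample closest to `query` (1-nearest neighbour)."""
    closest = min(samples, key=lambda s: math.dist(s[0], query))
    return closest[1]


# Made-up features: (weight in kg, fur length in cm, snout length in cm).
few_samples = [
    ((40, 1, 10), "dog"),  # Dobermann
    ((4, 10, 1), "cat"),   # Persian cat
]

# Hypothetical extra annotated animals, not taken from the article.
more_samples = few_samples + [
    ((3, 8, 4), "dog"),    # Pomeranian
    ((2, 2, 3), "dog"),    # Chihuahua
    ((7, 7, 2), "cat"),    # Maine Coon
    ((4, 1, 2), "cat"),    # Siamese
]

papillon = (4, 8, 5)

print(nearest_label(few_samples, papillon))   # cat: only the Persian cat looks similar
print(nearest_label(more_samples, papillon))  # dog: a small, long-haired dog is now known
```

With only two examples, the Papillon is mistaken for a cat because it is small and fluffy. Once a few more labelled animals are available, the same rule gets it right, which is why annotated data matters so much.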
So, let’s get this story straight! Websites agree to use CAPTCHA because it protects them from bots and, at the same time, the data collected is used to train artificial intelligence algorithms and solve long-standing problems. The question is, would you have agreed if you had known about it? Are you proud of contributing to the progress of AI and to solving complex problems, or are you uneasy that you have been helping without knowing? It is tough to decide, isn’t it? With technology advancing rapidly, the answer remains unclear, and most likely we as users feel a bit of both.
* Lexicon: bots: software applications that run automated tasks over the Internet; annotators: people who annotate (label) data; artificial intelligence: intelligence demonstrated by machines that try to mimic human intelligence when performing tasks