Workers outsource AI training tasks to AI, study finds

MASSACHUSETTS, UNITED STATES — A new study suggests that a substantial portion of workers paid to train artificial intelligence (AI) models may be delegating the work to AI systems themselves, potentially introducing inaccuracies into those models.
Gig workers typically perform tasks that are difficult to automate, such as labeling data, annotating text, and solving CAPTCHAs. The data they process is used to train AI systems.
Often underpaid, these workers are believed to be turning to tools like OpenAI’s ChatGPT to finish their tasks faster and earn more in the limited time they have.
Researchers from the Swiss Federal Institute of Technology in Lausanne (EPFL) estimated that between 33% and 46% of the 44 gig workers they hired to summarize medical research papers used AI tools like ChatGPT. They fear this figure will rise as AI systems become more capable and accessible.
Robert West, an assistant professor at EPFL and co-author of the study, commented, “I don’t think it’s the end of crowdsourcing platforms. It just changes the dynamics.”
However, this trend could introduce additional errors into AI models, given that language models like ChatGPT occasionally present false information as fact.
Ilia Shumailov, a junior research fellow at Oxford University, warned that if incorrect output is used to train other AI models, the errors can be amplified over time, making their origins harder to trace.
“The problem is, when you’re using artificial data, you acquire the errors from the misunderstandings of the models and statistical errors,” Shumailov explained. “You need to make sure that your errors are not biasing the output of other models, and there’s no simple way to do that.”
The study emphasized the need for better methods of verifying whether the data used to train AI models was produced by humans or by AI. It also highlighted the risks of tech companies depending heavily on gig workers for data preparation tasks.