AI surpasses humans, costs soar amid ethical concerns: Stanford report

CALIFORNIA, UNITED STATES — Artificial intelligence systems have become so advanced that they match or exceed human performance on many tasks, including reading comprehension, image classification, and some forms of mathematical problem-solving.
According to the new AI Index Report 2024 from Stanford University, this meteoric progress means common AI benchmarks are quickly becoming outdated.
Assessing complex reasoning abilities crucial
The report highlights the need for new ways to evaluate AI systems on complex tasks involving abstraction and reasoning.
One recent test, the Graduate-Level Google-Proof Q&A (GPQA) benchmark, is extremely challenging even for humans: PhD scholars scored just 65% on questions in their own field.
GPQA is a challenging dataset of 448 multiple-choice questions written by domain experts in biology, physics, and chemistry.
As of last year, AI models scored in the 30-40% range on GPQA, but Anthropic's latest model, Claude 3, now reaches around 60%.
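For readers curious how such benchmark scores are produced, here is a minimal sketch in Python of multiple-choice accuracy scoring. The question format and the ask_model function are hypothetical stand-ins, not the evaluation harness the report or the GPQA authors actually used; the random-guess placeholder simply illustrates the mechanics.

```python
import random

# Hypothetical GPQA-style item: a question, four answer options,
# and the index of the correct option.
QUESTIONS = [
    {
        "question": "Which quantum number determines an electron's orbital shape?",
        "options": ["n", "l", "m_l", "m_s"],
        "answer": 1,
    },
    # ... the real dataset contains 448 expert-written items
]

def ask_model(question: str, options: list[str]) -> int:
    """Hypothetical stand-in for querying an AI model.

    A real harness would call a model API and parse its chosen
    option; here we guess at random, which averages ~25% on
    four-option questions.
    """
    return random.randrange(len(options))

def accuracy(items) -> float:
    """Fraction of items where the model picks the correct option."""
    correct = sum(
        ask_model(q["question"], q["options"]) == q["answer"]
        for q in items
    )
    return correct / len(items)

print(f"Accuracy: {accuracy(QUESTIONS):.1%}")
```

Against that 25% random-guessing baseline on four options, the 65% expert score and Claude 3's roughly 60% put the model close to, but still below, human domain experts.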
Additionally, the report highlights the increasing use of AI in science, such as Google DeepMind’s projects for materials discovery and weather forecasting.
However, as of 2023, there remain some task categories in which AI fails to exceed human ability.
“These tend to be more complex cognitive tasks, such as visual commonsense reasoning and advanced-level mathematical problem-solving (competition-level math problems),” the report said.
Soaring costs and ethical concerns
As AI capabilities skyrocket, so do training costs: the report estimates that OpenAI’s GPT-4 used $78 million worth of compute to train, while Google’s Gemini Ultra cost $191 million. Moreover, the lack of standardized assessments for responsible AI makes it difficult to measure and compare the risks posed by different systems.
“Leading developers, including OpenAI, Google, and Anthropic, primarily test their models against different responsible AI benchmarks. This practice complicates efforts to systematically compare the risks and limitations of top AI models,” the report explained.
The report also notes a sharp rise in AI regulation in the United States, from just one AI-related regulation in 2016 to 25 in 2023, amid growing public nervousness.