GPT-4 beats human analysts in financial forecasting: study

ILLINOIS, UNITED STATES — New research from the University of Chicago has revealed that artificial intelligence can outperform human experts in analyzing corporate financial statements and predicting future earnings.
The findings are published in the working paper, “Financial Statement Analysis with Large Language Models,” which focused on the capabilities of GPT-4, a powerful language model developed by OpenAI.
Emulating analyst reasoning with AI
The study highlights a novel approach in which GPT-4 uses “chain-of-thought” prompts to mimic the reasoning process of a financial analyst. This method enables the AI to dissect balance sheets and income statements, stripped of any descriptive text, to forecast company performance.
“We find that the prediction accuracy of the LLM is on par with the performance of a narrowly trained state-of-the-art ML model,” the paper states.
“LLM prediction does not stem from its training memory. Instead, we find that the LLM generates useful narrative insights about a company’s future performance.”
This technique has proven effective, with GPT-4 achieving a 60% accuracy rate in predicting earnings growth direction, surpassing the 53-57% accuracy typical among human analysts.
The researchers gave GPT-4 only standardized, anonymized balance sheets and income statements, devoid of any textual context. Despite this lack of supplemental information, GPT-4 was able to analyze the raw financial data and predict the direction of future earnings more accurately than professional financial analysts.
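For readers curious how a figure like "60% accuracy in predicting earnings growth direction" is typically computed, here is a minimal Python sketch. It is an illustration only, not the researchers' code: the function name `directional_accuracy` and the sample labels are hypothetical, and the sketch assumes each forecast is simply "up" or "down" and is scored against what actually happened.

```python
from typing import Sequence


def directional_accuracy(predicted: Sequence[str], actual: Sequence[str]) -> float:
    """Share of forecasts whose direction ("up" or "down") matched the outcome."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not predicted:
        raise ValueError("at least one forecast is required")
    # Count the forecasts that called the direction correctly.
    hits = sum(p == a for p, a in zip(predicted, actual))
    return hits / len(predicted)


# Made-up labels for five hypothetical companies (not data from the study).
predicted = ["up", "down", "up", "up", "down"]
actual = ["up", "up", "up", "down", "down"]

score = directional_accuracy(predicted, actual)
print(f"Directional accuracy: {score:.0%}")
```

With these made-up labels, three of the five calls are correct. The same calculation, applied to a model's forecasts and to analysts' forecasts over the same companies, is one way the two could be put on a comparable footing.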
Paving the way for AI in financial decision-making
This study not only demonstrates the potential of large language models in financial analysis but also hints at future applications in decision-making processes.
Alex Kim, a co-author of the study, noted, “One of the most challenging domains for a language model is the numerical domain, where the model needs to carry out computations, perform human-like interpretations, and make complex judgments.”
“While LLMs are effective at textual tasks, their understanding of numbers typically comes from the narrative context and they lack deep numerical reasoning or the flexibility of a human mind.”
To further showcase these capabilities, the researchers have developed an interactive web application that allows users to explore GPT-4’s analytical prowess, though they advise that its accuracy should be independently verified.
Skepticism and regulatory concerns
Despite its impressive capabilities, some experts remain skeptical. A financial practitioner critiqued the study on Hacker News, arguing that the artificial neural network (ANN) used as a benchmark might be outdated. “That ANN benchmark is nowhere near state of the art,” they commented, suggesting that more current, proprietary models might show different results.
Moreover, American financial regulators have previously warned that the rapid adoption of AI in the financial sector poses risks to stability that require oversight.
On the other hand, a Gartner survey revealed that 39% of finance leaders are currently using AI and machine learning (ML) in their operations, with another 29% planning to implement them.
The Gartner research identified three key factors shared by leading AI users in finance: patience to see transformative results, embedding data scientists within finance teams, and securing executive buy-in to ensure employee adoption.