In light of an ancient parable, we explore a new paper showing that ensembles of large language models can match the prediction accuracy of human crowds. It also finds that combining machine predictions with human insights leads to the most robust forecasting results.
The study "Wisdom of the Silicon Crowd" examines how ensembles of large language models (LLMs) compare to human crowds in forecasting accuracy. It introduces a novel ensemble approach where predictions from multiple LLMs are aggregated, demonstrating that this method achieves forecasting results on par with those from a large group of human forecasters. This finding underscores the potential of LLMs to replicate the "wisdom of the crowd" effect traditionally observed in human groups. In other words, aggregated predictions outperform individual forecasts.
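To make the aggregation idea concrete, here is a minimal sketch of a "silicon crowd" on a single binary question. It rests on assumptions that are not spelled out in the text above: the ensemble is aggregated with the median, accuracy is scored with the Brier score (squared error between the forecast probability and the 0/1 outcome, lower is better), and the five probabilities are made up for illustration. The function names are hypothetical.

```python
from statistics import median


def aggregate_forecasts(forecasts):
    """Combine probability forecasts into one crowd forecast.

    Assumption: the median is used as the aggregator.
    """
    return median(forecasts)


def brier_score(probability, outcome):
    """Squared error between a forecast probability and a 0/1 outcome."""
    return (probability - outcome) ** 2


# Hypothetical probabilities from five different models for one question.
model_forecasts = [0.30, 0.55, 0.60, 0.65, 0.90]
outcome = 1  # the event happened

crowd = aggregate_forecasts(model_forecasts)
crowd_score = brier_score(crowd, outcome)

# Average score a single, randomly picked model would have achieved.
individual_scores = [brier_score(p, outcome) for p in model_forecasts]
mean_individual_score = sum(individual_scores) / len(individual_scores)

print(f"crowd forecast: {crowd:.2f}")
print(f"crowd Brier score: {crowd_score:.3f}")
print(f"mean individual Brier score: {mean_individual_score:.3f}")
```

In this toy case the crowd forecast scores better than the average individual model, although the single most confident model still beats it. That is the usual shape of the crowd effect: the aggregate is rarely the best forecaster on any one question, but it is more reliable than picking one forecaster at random.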
The question then posed is whether LLM forecasts can be enhanced by integrating human-generated predictions. The results indicate that while LLM predictions improve when exposed to human forecasts, the most effective strategy is a hybrid approach, combining both human and machine predictions.
The study showed that the LLMs’ predictions could be improved by incorporating median human predictions, which resulted in a notable increase in accuracy.
John Horton had mentioned this before the study came out.
This suggests a promising direction for leveraging both human intuition and machine efficiency in decision-making processes.
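The hybrid idea can be sketched in a few lines. This is an illustration under stated assumptions, not the study's exact procedure: the machine forecast and the median human forecast are combined with a simple equal-weight average, and all numbers and names (such as `hybrid_forecast`) are hypothetical.

```python
from statistics import median


def hybrid_forecast(machine_forecast, human_forecasts):
    """Average a machine forecast with the median human forecast.

    Assumption: equal weights for the human and machine components.
    """
    human_median = median(human_forecasts)
    return (machine_forecast + human_median) / 2


# Hypothetical inputs for one binary question.
machine = 0.40
humans = [0.60, 0.70, 0.75, 0.80, 0.65]

combined = hybrid_forecast(machine, humans)
print(f"median human forecast: {median(humans):.2f}")
print(f"hybrid forecast: {combined:.2f}")
```

The combined forecast lands between the machine forecast and the human median, so an error on either side is partly offset by the other. Other weightings are possible, and the equal split here is only the simplest choice.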
At Synthetic Users, we advocate this approach. Organic users are not to be underestimated, and a hybrid approach is by far the best way forward.
The introduction and methodology sections of the study provided a detailed background on the increasing capabilities of LLMs in various complex and economically valuable tasks. The models used were diverse, ranging from GPT-4 to smaller models, each with unique training data and parameters, which could potentially enhance prediction accuracy by reducing individual biases and errors. The study also highlighted the importance of using real-world forecasting questions to ensure external validity and practical applicability of the findings. Hypotheticals have little value in this space.
LLM ensembles perform on par with human crowds in forecasting. The "wisdom of the silicon crowd" marks a significant step forward in the use of artificial intelligence for complex cognitive tasks like forecasting, opening up new avenues for future research and application.
There’s an elephant in the room, surrounded by blind men. In this ancient story, several blind men each touch a different part of an elephant and then attempt to describe the entire animal based on their limited experience. The man who feels the elephant's tusk insists it must be like a spear, while the one who feels its side is certain it is like a wall. Each perspective is both valid and limited.
The parable of the blind men and an elephant is a great metaphor when considering the limitations of both human and machine forecasting: go hybrid (for now). Let’s be honest, it’s just a matter of time before the steering wheel in most cars becomes obsolete, but for the time being, we still need it.
We tend to claim absolute truth based on our limited subjective experience, disregarding other people's subjective experiences. Both human intuition and machine learning can fall into this trap. Humans often rely on personal experiences or biases, while machines can overfit to their training data, failing to generalize to real-world situations.
In the context of forecasting, the parable underscores the value of the hybrid approach. Just as a more accurate understanding of the elephant comes from integrating all the blind men's perspectives, a more accurate forecast comes from integrating both human and machine predictions.
We can mitigate the limitations of each if we combine both, much like the blind men would have a more complete understanding of the elephant if they shared their experiences. This hybrid approach, which leverages the strengths of both human intuition and machine efficiency, holds the promise of providing a more comprehensive and accurate picture in complex forecasting tasks.