Synthetic Users are evolving to address criticism about their generalist nature by incorporating representative data sets and personal narratives.
An article mentioning Synthetic Users made us reflect on the main criticism they get: they are too generalist. It’s a fair criticism, and one we are working hard to overcome with each new iteration of Synthetic Users.
The first and biggest concern is participant bias. LLMs pre-trained on the whole internet are not representative of real people: their training data has a different geographic distribution, a different gender distribution (the internet skews male) and a different socio-economic profile from the population at large.
The first thing we had to do was use a widely available dataset representative of human behaviours in different countries. For the US we use the General Social Survey (GSS), the most rigorous long-running study of American behaviours, attitudes and opinions. It gave us high-quality, nationally representative US data to fine-tune our model with, so we could start biasing it toward creating Synthetic Users that replicate the American consumer. By fine-tuning with other national surveys, we are able to do the same for other countries.
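To make the fine-tuning step more concrete, here is a minimal sketch of how survey responses could be turned into persona-conditioned training examples. It assumes a generic prompt/completion JSONL format and uses hypothetical field names (`age`, `sex`, `region`, `party`, `question`, `answer`) and made-up rows; these are not actual GSS variable names or data, and the real format depends on the model provider.

```python
import json

# Hypothetical respondent records: field names and values are illustrative,
# NOT real GSS variables or responses.
respondents = [
    {"age": 34, "sex": "female", "region": "Midwest", "party": "Independent",
     "question": "How often do you shop online?", "answer": "A few times a month"},
    {"age": 61, "sex": "male", "region": "South", "party": "Republican",
     "question": "How often do you shop online?", "answer": "Rarely"},
]

def to_training_example(row: dict) -> dict:
    """Turn one respondent into a prompt/completion pair conditioned on demographics."""
    persona = (f"You are a {row['age']}-year-old {row['sex']} living in the "
               f"{row['region']} region of the US who identifies as {row['party']}.")
    return {"prompt": f"{persona}\nQuestion: {row['question']}\nAnswer:",
            "completion": " " + row["answer"]}

def write_jsonl(rows: list[dict], path: str) -> int:
    """Write one JSON object per line (JSONL) and return how many were written."""
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(to_training_example(row)) + "\n")
    return len(rows)
```

Usage: `write_jsonl(respondents, "survey_finetune.jsonl")` writes one example per respondent. The idea is that conditioning each answer on the respondent’s demographics lets the tuned model associate attitudes with the population mix the survey represents.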
The benefit of this was easy to validate. When we create a sample of Synthetic Users, we ask our model questions such as ‘What is your gender distribution?’ A pure GPT trained on the internet would answer 70% male; with the GSS fine-tune we got 48% male, which is accurate. We validated this across multiple variables, such as age, ethnicity and political inclination.
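One way to make a check like “48% vs 70% male” systematic is a Pearson chi-square goodness-of-fit test of each generated attribute against the survey benchmark. This is a hypothetical sketch (the post does not name the statistic it used); the benchmark shares in the example are illustrative, and the table of critical values only covers one to five degrees of freedom at the 5% level.

```python
from collections import Counter

# Chi-square critical values at the 5% significance level, keyed by degrees of freedom.
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}

def chi_square_gof(sample: list[str], benchmark: dict[str, float]) -> tuple[float, int]:
    """Pearson chi-square statistic of a categorical sample against benchmark shares.

    `benchmark` maps each category to its expected proportion (summing to 1).
    Assumes every value in `sample` is one of the benchmark's categories.
    """
    n = len(sample)
    counts = Counter(sample)
    stat = sum((counts[c] - n * p) ** 2 / (n * p) for c, p in benchmark.items())
    return stat, len(benchmark) - 1

def matches_benchmark(sample: list[str], benchmark: dict[str, float]) -> bool:
    """True if the sample is not significantly different from the benchmark at 5%."""
    stat, df = chi_square_gof(sample, benchmark)
    return stat < CRITICAL_05[df]  # only df 1 to 5 are tabulated in this sketch
```

For example, with an illustrative benchmark of 48% male, a sample of 100 that is 70% male gives a statistic of about 19.4, far above 3.841, while a 48/52 split passes. With large synthetic samples even tiny gaps become significant, so the size of the gap is worth reporting alongside the test.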
We ingested all of the following 23 surveys to ensure behavioural accuracy:
We now have the correct distributions (provided by the various national surveys). Our team is currently engineering new strategies to surface ‘personal accounts’ which lend more depth to our interviews.
Narrative Synthesis lies at the core of our strategy. We are developing algorithms capable of synthesizing personal narratives in a way that feels authentic and relevant to each persona’s background. This involves combining elements of different stories in a manner that maintains internal consistency and reflects the complexity of human experiences.
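The post does not describe how narratives are combined, so the following is only a toy illustration of the internal-consistency requirement. It assumes each story fragment is tagged with the facts it implies about its narrator (a hypothetical `Fragment` type), ranks fragments by how many persona attributes they confirm, and accepts a fragment only if it contradicts neither the persona nor any fragment already chosen.

```python
from dataclasses import dataclass, field

@dataclass
class Fragment:
    """A piece of a personal narrative (hypothetical structure for illustration)."""
    text: str
    # Facts the fragment implies about its narrator, e.g. {"city": "Chicago"}.
    attributes: dict[str, str] = field(default_factory=dict)

def consistent(fragment: Fragment, known: dict[str, str]) -> bool:
    """True if the fragment never contradicts an attribute that is already known."""
    return all(known.get(k, v) == v for k, v in fragment.attributes.items())

def relevance(fragment: Fragment, persona: dict[str, str]) -> int:
    """Number of persona attributes the fragment explicitly confirms."""
    return sum(persona.get(k) == v for k, v in fragment.attributes.items())

def synthesize_narrative(fragments: list[Fragment], persona: dict[str, str],
                         max_fragments: int = 3) -> str:
    """Greedily combine relevant, mutually consistent fragments into one story."""
    known = dict(persona)  # facts fixed so far: the persona plus chosen fragments
    chosen: list[Fragment] = []
    # Most relevant first; Python's sort is stable, so ties keep input order.
    for fragment in sorted(fragments, key=lambda f: relevance(f, persona), reverse=True):
        if len(chosen) == max_fragments:
            break
        if consistent(fragment, known):
            chosen.append(fragment)
            known.update(fragment.attributes)  # later fragments must agree with this one
    return " ".join(f.text for f in chosen)
```

For a persona living in Chicago with children, a fragment about moving to Austin is rejected, and once a “bike commute” fragment is chosen, a later “I drive to work” fragment is rejected too.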
Sentiment Analysis allows us to surface the narratives that deserve the most attention. Researchers and product people want to focus on pain points, so we use sentiment analysis to gauge the emotional tone of narratives. This helps in understanding the context and emotional layers within personal stories, which is crucial for replicating human-like empathy and understanding in Synthetic Users.
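As a minimal sketch of using sentiment to surface pain points, the snippet below scores each narrative with a tiny hand-made lexicon and returns the most negative ones first. The lexicon, the per-word averaging and the function names are assumptions for illustration; a production system would more likely use a trained sentiment model.

```python
import re

# Tiny illustrative lexicon (an assumption, not a real sentiment resource):
# positive words score above zero, negative words below.
LEXICON = {
    "love": 2, "great": 2, "easy": 1, "helpful": 1,
    "hate": -2, "frustrating": -2, "confusing": -1, "slow": -1, "expensive": -1,
}

def sentiment(text: str) -> float:
    """Average lexicon score per word; 0.0 for text with no words."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    return sum(LEXICON.get(w, 0) for w in words) / len(words)

def most_painful(narratives: list[str], k: int = 2) -> list[str]:
    """The k narratives with the most negative tone, most negative first."""
    return sorted(narratives, key=sentiment)[:k]
```

Averaging per word keeps long narratives from dominating simply because they contain more emotional words.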
We draw personal narratives from a wide range of sources. Public blogs, social media and forums are the richest environments we are tapping to enrich our dataset.
All this means that our interviews are gaining both depth and statistical accuracy, which leads us to the next iteration: better interviews, but also Surveys. Yes, you’ve asked for them, and they are coming!
Our upcoming Surveys will focus on two things:
Preference mapping: we will be able to plot how appetite for a given product evolves over time among Synthetic Users from specific geographies, social backgrounds or age groups.
This helps with product development, for example in deciding which features to roll out first.
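Preference mapping of this kind comes down to aggregating responses by segment and time period. The sketch below assumes each survey answer arrives as a hypothetical `(segment, period, score)` tuple with a 1 to 5 purchase-intent score and averages the scores per cell; `trend` then extracts one segment’s series, which is what you would plot as a single line.

```python
from collections import defaultdict
from statistics import mean

def preference_map(responses):
    """Average score per (segment, period), ordered by segment then period.

    `responses` is an iterable of hypothetical (segment, period, score) tuples,
    e.g. ("18-34", "2024-Q1", 4) with a 1-5 purchase-intent score.
    """
    cells = defaultdict(list)
    for segment, period, score in responses:
        cells[(segment, period)].append(score)
    return {cell: mean(scores) for cell, scores in sorted(cells.items())}

def trend(pref_map, segment):
    """The (period, average score) series for one segment."""
    return [(period, score) for (seg, period), score in pref_map.items() if seg == segment]
```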
As a subset of preference mapping, we are looking at Targeted Advertising. The research question you will be able to pose is: given a certain piece of content, what percentage of Synthetic Users is likely to be convinced by it?
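A percentage of convinced Synthetic Users is more useful with an uncertainty band. This sketch assumes each Synthetic User gives a yes/no answer to the content, and computes the share together with a Wilson score interval, a standard interval for binomial proportions; the function names are hypothetical.

```python
import math

def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives about 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half, centre + half

def share_convinced(answers: list[bool]) -> tuple[float, tuple[float, float]]:
    """Share of respondents who said the content would convince them, with a 95% interval."""
    if not answers:
        raise ValueError("no answers")
    k = sum(answers)  # True counts as 1
    return k / len(answers), wilson_interval(k, len(answers))
```

If 62 of 100 Synthetic Users say yes, the interval is roughly 52% to 71%. It only reflects sampling noise among the Synthetic Users asked; it says nothing about how closely they track real consumers.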
Pricing: what is the maximum price your consumers will pay for a given product? You’ll be able to ask Synthetic Users specific questions about their price sensitivity, allowing you to understand the optimal pricing strategy for your target audience. By incorporating the Gabor-Granger method, you can estimate the price point that maximizes revenue, balancing a higher price against the share of consumers still willing to buy.
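In a Gabor-Granger study, respondents are asked whether they would buy at a series of prices, which yields the highest price each of them would accept. Demand at a price is the share of respondents whose maximum is at or above it, and the suggested price is the tested price with the highest revenue index (price times share). The sketch below assumes those per-respondent maximums are already collected; the optional `unit_cost` parameter is an added assumption, not part of the classic method, that turns the revenue index into a simple profit index.

```python
def gabor_granger(max_prices: list[float], price_points: list[float],
                  unit_cost: float = 0.0) -> tuple[dict[float, float], float]:
    """Demand curve and best tested price from per-respondent maximum prices.

    max_prices[i] is the highest price respondent i would still pay.
    With unit_cost = 0 the best price maximizes the revenue index p * demand(p);
    a positive unit_cost (an assumption) maximizes (p - unit_cost) * demand(p).
    """
    if not max_prices:
        raise ValueError("no respondents")
    n = len(max_prices)
    # Share of respondents willing to buy at each tested price.
    demand = {p: sum(m >= p for m in max_prices) / n for p in price_points}
    best_price = max(price_points, key=lambda p: (p - unit_cost) * demand[p])
    return demand, best_price
```

With maximums of 5, 8, 8, 10, 12, 15, 15 and 20 and tested prices 5, 10, 15 and 20, demand is 100%, 62.5%, 37.5% and 12.5%, so the revenue-maximizing price is 10; with a unit cost of 4 the profit-maximizing price moves to 15.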
If you can get the obvious out of the way first, you save time and ensure your teams align on a baseline, so you can then focus on what is less obvious. If, for argument’s sake, Synthetic Users deviate from organic users by 15%, then you are 85% of the way there. You have a good grasp of which interviews work and which don’t. You can adjust your script. You can home in on the harder questions, around personal accounts, where organic users have an edge (for the time being).
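The post does not define “deviation”. One simple reading is the total variation distance between the synthetic and organic answer distributions for the same question, with parity as one minus that distance. A minimal sketch under that assumption (the function names are hypothetical):

```python
from collections import Counter

def distribution(answers: list[str]) -> dict[str, float]:
    """Share of each answer option in a list of responses."""
    n = len(answers)
    return {option: count / n for option, count in Counter(answers).items()}

def total_variation(a: list[str], b: list[str]) -> float:
    """Total variation distance: 0 for identical distributions, 1 for disjoint ones."""
    pa, pb = distribution(a), distribution(b)
    options = set(pa) | set(pb)
    return 0.5 * sum(abs(pa.get(o, 0.0) - pb.get(o, 0.0)) for o in options)

def parity(synthetic: list[str], organic: list[str]) -> float:
    """Hypothetical parity score: 1 minus the total variation distance."""
    return 1.0 - total_variation(synthetic, organic)
```

For example, if 60% of Synthetic Users and 75% of organic users answer “yes”, the distance is 0.15 and parity is 0.85, matching the 15% and 85% in the text.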