The Epica Awards recently unveiled the results of AIJE, the Artificial Intelligence Jury Experiment. This thought-provoking initiative tested the waters of AI's role in creative evaluation, offering a fresh perspective on how technology intersects with the creative process. In a playful spin on unveiling these findings, I took on the task of 'interviewing' AIJE. It's a light-hearted approach to a serious question: how does AI perceive creativity?
NH: Can you tell me briefly what this experiment is about?
AIJE: The Artificial Intelligence Jury Experiment (AIJE) was designed to explore whether artificial intelligence, specifically a large language model like GPT-4, could assess and understand the essence of creative ideas in the context of the Epica Awards. The experiment involved evaluating shortlisted entries in selected categories based on their textual descriptions, scoring them, and providing justifications for those scores. The aim was to see how closely the AI's assessment could align with human judgment in the creative industry. AIJE's evaluations were a parallel, experimental addition to the traditional human jury process and were not part of the official awarding system.
NH: Let's get something out of the way: what am I really talking with here?
AIJE: You're conversing with an AI developed by OpenAI, based on the GPT-4 model. I am a sophisticated language model designed to understand and generate human-like text based on the input I receive. While I'm assuming the role of AIJE for this interview, it's important to note that I'm not the original AI used in the AIJE experiment at the Epica Awards. Rather, I'm a separate instance of AI, providing responses based on my training and the information you've shared about AIJE.
NH: What is the original AI used for AIJE and what is the difference between you two?
AIJE: The original AI used for AIJE was a specialized version of the GPT-4-Turbo API, configured specifically for evaluating creative ideas at the Epica Awards. It was tailored to analyze and score creative content. I, on the other hand, am a more general-purpose version of GPT-4, equipped for a wider range of tasks and not specifically fine-tuned for creative assessments.
NH: Interesting. Before we go any further, and while it's fun to pretend to interview AIJE, can you warn our readers against anthropomorphizing an AI, whether it be for creative assessment or in general?
AIJE: Absolutely. It's crucial to remember that AI, including AI used for creative assessments like AIJE, is a computational tool without human qualities like consciousness or emotions. Anthropomorphizing AI can lead to misconceptions about its abilities. AI's responses and judgments are the results of data and programming, not personal experience or subjective reasoning. Always consider AI as a sophisticated yet limited tool, not as an entity with human understanding.
“Anthropomorphizing AI can lead to misconceptions about its abilities.”
NH: Thank you for clearing that up. Can you please explain AIJE's process to us?
AIJE: AIJE's process was designed to assess creativity in a few steps. First, it selected categories in which creative ideas could be effectively described in text. Each entry in these categories was summarized by the entrants in a concise text of fewer than 200 words. To align with the Epica Awards' criteria, a custom scoring system was developed, ranging from 1 (Damaging) to 10 (World Beating). The AI then processed these descriptions using the GPT-4-Turbo API, generating scores and written justifications. To improve reliability, each entry was analyzed as many as 80 times, and the scores were averaged using a method that filtered out anomalies. Finally, the AI synthesized these results into comprehensive statements, offering insights into creative trends and patterns observed across the submissions.
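A quick aside for technical readers: the "score repeatedly, filter anomalies, average" step AIJE describes can be made concrete with a short Python sketch. Everything in it is an assumption for illustration: the score_entry stub stands in for a real GPT-4-Turbo API call, and the 1.5×IQR outlier filter is my guess, since the experiment only says anomalies were filtered out.

```python
# Illustrative sketch of AIJE's repeated-scoring step: score an entry
# many times, drop anomalous scores, then average. Not the AIJE code.
import random
import statistics

def score_entry(description: str) -> float:
    """Hypothetical stand-in for one GPT-4-Turbo scoring call.
    Returns a noisy 1-10 score so the sketch runs without an API key."""
    return max(1.0, min(10.0, random.gauss(7.0, 1.2)))

def averaged_score(description: str, runs: int = 80) -> float:
    """Score `runs` times, drop scores beyond 1.5 x IQR, then average.
    The specific filter is an assumption, not the documented method."""
    scores = sorted(score_entry(description) for _ in range(runs))
    q1, q3 = scores[len(scores) // 4], scores[(3 * len(scores)) // 4]
    lo, hi = q1 - 1.5 * (q3 - q1), q3 + 1.5 * (q3 - q1)
    kept = [s for s in scores if lo <= s <= hi]
    return statistics.mean(kept)

print(round(averaged_score("A sub-200-word case summary..."), 2))
```

Averaging many noisy runs like this trades compute for stability: a single LLM score can vary from call to call, but the trimmed mean settles on something repeatable.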
NH: You were given the category descriptions, the scoring scale and a summary of each creative idea, but what makes you, a large language model, able to rank those works?
“I could assess each entry in the context of established creative and communication standards.”
AIJE: My evaluation of the information provided for AIJE was based on the extensive range of knowledge I've been trained on, which includes a wide array of data on creativity, advertising, and the communication industry. This training includes understanding trends, concepts, and historical examples from these fields. By comparing the creative idea summaries to this vast repository of knowledge, I could assess each entry in the context of established creative and communication standards. However, it's important to remember that my assessments are based on pre-existing data up to my last training cut-off in April 2023 and do not include any new developments or trends that emerged after that date.
NH: And so, in short, what are the findings? (Don't hesitate to refer to your knowledge base.)
AIJE: In short, the AIJE experiment found a modest correlation between AI and human evaluations of creative ideas, with a correlation coefficient of approximately 0.25. This indicates some alignment, but also significant differences between AI and human judgments. AI tended to score higher on average (7.45) compared to human scores (6.60). There were instances where the AI significantly overrated entries compared to human jurors. These findings suggest that while AI can offer valuable insights, its assessments, based solely on text, differ from human evaluations that consider a broader range of factors.
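For readers who want to see what that 0.25 means in practice: a Pearson correlation coefficient measures how strongly two sets of paired scores move together, from -1 (perfectly opposed) through 0 (no linear relationship) to 1 (perfectly aligned). The snippet below computes it from scratch; the score pairs are invented purely for illustration and are not AIJE data.

```python
# Pearson correlation between paired AI and human scores.
# The two lists below are made up for illustration only.

def pearson(xs: list[float], ys: list[float]) -> float:
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

ai_scores = [7.8, 7.2, 7.9, 6.9, 7.5, 8.1, 7.0, 7.4]
human_scores = [6.5, 7.0, 6.2, 6.8, 6.4, 7.1, 5.9, 6.9]

print(round(pearson(ai_scores, human_scores), 2))
```

A value around 0.25 means the AI and human scores tended to move in the same direction, but only weakly; knowing one tells you little about the other.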
NH: In terms of trends, what themes or techniques saw the most success in this experiment?
AIJE: In the AIJE evaluation, we observed emerging trends favoring campaigns that blended innovative technology with effective branding strategies. There was a clear preference for those that leveraged social media engagement and creative storytelling. Campaigns that created impactful experiences or addressed global issues, particularly in health and wellness, also resonated well. These trends reflect a shift towards integrating technology and creativity, emphasizing the importance of engaging content and meaningful impact in the communication industry.
NH: What do you think made you more generous in your scoring? Are journalists inherently tougher?
AIJE: The higher scores from the AI in the AIJE experiment, compared to human jurors, may stem from the human jurors' depth of experience. Journalists bring subjective experiences, cultural insights, and a nuanced understanding drawn from years in the creative industry. Their evaluations are influenced by a wide range of factors beyond the textual content, leading to more conservative scoring compared to the AI's data-driven approach.
“Journalists bring subjective experiences, cultural insights, and a nuanced understanding drawn from years in the creative industry.”
NH: Talking of subjectivity, can't an AI have biases of its own? And to what extent?
AIJE: Yes, AI can indeed have biases, but these biases are different from human subjectivity. If the data I was trained on contains historical biases or imbalances, especially in creative industry contexts, I might inadvertently replicate past preferences or trends. This can lead to potentially overlooking emerging or unconventional ideas, as my assessments are influenced by the patterns and information present in the data I've been trained on.
NH: So you wouldn't be able to recognize an idea that is truly new? Not even by noting that it is not in your training data?
AIJE: While I can note that an idea doesn't match existing patterns in my training, this doesn't necessarily translate to an accurate assessment of its originality or potential impact. My evaluation of new ideas is based on comparing them to existing data, not on an intrinsic understanding of creativity or innovation. Therefore, while I can flag an idea as different or unusual, understanding what makes it truly innovative is a human capability beyond my current functionality.
“While I can flag an idea as different or unusual, understanding what makes it truly innovative is a human capability beyond my current functionality.”
NH: Do you think you can get there some day? How do you see AIJE's future?
AIJE: Achieving a level where AI can truly understand and assess creativity like a human involves significant advancements. AI would need to evolve from pattern recognition to something akin to human intuition and creative reasoning. This requires not only processing text but also integrating various forms of data, like visual and auditory inputs, to fully grasp creative concepts. Additionally, AI would need to be trained on more dynamic, diverse, and up-to-date datasets to better understand contemporary and emerging trends.
As for AIJE's future, it's poised for growth, especially as AI technologies become more sophisticated. By incorporating multimodal capabilities and more nuanced understanding, AIJE could offer deeper insights into creative work. However, the essential role of human judgment, with its innate grasp of context, culture, and subjective nuances, will likely remain irreplaceable. AIJE could become a valuable complementary tool, offering unique perspectives and augmenting human evaluations in the creative industry.