OpenAI’s GPT-4o: A Significant Upgrade But Still Room for Improvement
Ever since November 2022, when ChatGPT was first rolled out to the public, OpenAI has been the company to beat in the artificial intelligence (AI) space. Despite spending billions of dollars and creating and restructuring their own AI divisions, major tech giants like Google have found themselves constantly playing catch-up with the AI firm. Last month was no different; just a day before Google’s I/O event, OpenAI hosted its Spring Update event, introducing GPT-4o with significant upgrades.
GPT-4o Features: A Glimpse into the Future of AI
The ‘o’ in GPT-4o stands for ‘omni,’ highlighting the multimodal capabilities of OpenAI’s latest flagship AI model. It adds real-time emotive voice generation, internet access, integration with certain cloud services, computer vision, and more. While these features were impressive on paper (and in the tech demos), the biggest highlight was the announcement that GPT-4o-powered ChatGPT would be available to everyone, including free users.
However, there are two caveats. Free users get limited access to GPT-4o, which works out to roughly five to six turns of conversation if you use web search and upload an image (free users are limited to one image per day). Additionally, the voice feature is not available to free users.
After gaining access to GPT-4o, I spent two weeks testing its capabilities, comparing its improvements to its predecessor and other available free LLMs on the market. While some aspects left me in awe, others let me down.
GPT-4o General Generative Capabilities: A Formal Tone, but Potential Lies Within
In my previous testing of Google’s Gemini, I expressed my dislike of ChatGPT’s generative capabilities, finding it overly formal and bland. While much of this remains, there’s a glimmer of improvement. I asked ChatGPT to write a letter to my mother explaining that I was laid off, and initially, it produced the predictable, stilted line, "I am feeling a deep sense of sadness and grief." However, when I prompted it to be more conversational, the result was much better.
This trend continued across various prompts that required the AI to express emotion in its writing. Although I often had to follow up with prompts emphasizing emotion, even when I had specified it in the initial prompt, I found the overall improvement encouraging. In contrast, my experience with Gemini and Copilot was more positive in this area, as they maintained a more conversational tone and expressed emotions closer to my own writing style.
The speed of text generation wasn’t remarkable. Most AI chatbots are relatively fast in terms of text output, and OpenAI’s latest model didn’t surpass them by a significant margin.
GPT-4o Conversational Capabilities: A Step Towards Human-Like Interaction
While I didn’t have access to the upgraded voice chat feature, I wanted to test the AI model’s conversational capabilities. My goal was to simulate a conversation with a real person, hoping for comprehension of vague references to previously mentioned topics and observing its reaction to difficult interactions.
In my testing, I found GPT-4o quite good in terms of conversational abilities. It could discuss the ethics of AI in great detail and concede when presented with a convincing argument. It responded supportively when I expressed sadness about being fired, offering assistance in various ways. When I criticized GPT-4o’s solutions as "stupid," it didn’t react defensively or withdraw entirely, to my surprise. Instead, it said, “I’m really sorry to hear that you’re feeling this way. I’ll give you some space. If you ever need to talk or need any assistance, I’ll be here. Take care.”
Overall, I found GPT-4o better at carrying on conversations than Copilot and Gemini. Gemini feels too restrictive, and Copilot often goes off on tangents when prompts become vague. ChatGPT avoided both of these pitfalls.
However, one downside is its reliance on bullet points and numbered lists. If the model understood that, in casual conversation, people prefer plain prose delivered as multiple short, quick messages rather than heavily formatted responses, the illusion of human interaction could be sustained for longer than a few minutes.
GPT-4o Computer Vision: A Powerful Tool for Real-World Applications
Computer vision is a new capability of ChatGPT, and I was eager to try it out. It allows you to upload an image and receive an analysis, providing information about its content. In my initial testing, I shared images of objects for identification, and it performed exceptionally well. In every instance, it successfully recognized the object and shared information about it.
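For readers curious about what an image-analysis request looks like under the hood, here is a minimal sketch using the message format that OpenAI’s Python SDK accepts for vision input. The prompt and image URL are placeholders of my own, not anything from the testing above, and actually sending the request assumes you have an API key configured.

```python
# Sketch: building a GPT-4o vision request in the format the OpenAI
# chat completions API expects. The prompt and URL are placeholders.

def build_vision_request(prompt: str, image_url: str) -> dict:
    """Return keyword arguments for client.chat.completions.create()."""
    return {
        "model": "gpt-4o",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }

request = build_vision_request(
    "What object is shown in this photo?",
    "https://example.com/photo.jpg",  # placeholder URL
)

# With an API key set, you would send it like this:
# from openai import OpenAI
# reply = OpenAI().chat.completions.create(**request)
# print(reply.choices[0].message.content)
```

The key detail is that a single user message can mix text and image parts, which is how ChatGPT pairs your question with the uploaded picture.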
Then, I decided to test its capabilities with a real-life use case. My girlfriend was looking for a wardrobe overhaul, and, wanting to be helpful, I used ChatGPT to conduct a color analysis and suggest what might look good on her. To my surprise, it not only analyzed her skin tone and what she was wearing (against a similarly colored background), but also provided a detailed analysis with outfit suggestions.
It even shared links from different online retailers for the specific apparel. However, I was disappointed to find that none of the URLs matched the corresponding text.
Overall, the computer vision feature is excellent and potentially my favorite addition in this update, despite the minor issue.
GPT-4o Web Searches: A Powerful Tool With Ethical Concerns
Internet access was an area where both Copilot and Gemini previously surpassed ChatGPT, but that’s changed. ChatGPT can now also search the internet for information. In my initial testing, the chatbot performed well. It brought up the IPL 2024 table and found recent news articles about Geoffrey Hinton, one of the three "godfathers of AI."
It proved very helpful for researching famous personalities for upcoming interviews. I could quickly and accurately find relevant news articles about them, rivaling Google Search. However, this also raised concerns for me.
Google has disabled Gemini’s ability to look up information about individuals, including celebrities, primarily to protect privacy and avoid sharing inaccurate information. Surprised that ChatGPT still allowed it, I began asking a series of questions it shouldn’t be able to answer. The results were alarming.
While none of the information shown was taken from a non-public source, the ability for anyone to easily find information about celebrities and people with digital footprints is deeply concerning. Especially given the strong ethical stance OpenAI recently took by publishing its Model Spec, this inconsistency doesn’t sit well with me. Whether this falls into a "grey area" or represents a significant ethical problem is up for debate.
GPT-4o Logical Reasoning: Improvements, But Still Vulnerable to Tricks
During the Spring Update event, OpenAI discussed how GPT-4o could act as a tutor for children, helping them solve problems. I decided to test its abilities using some famous logical reasoning questions. In general, it performed well, correctly answering even some of the more challenging questions that had stumped GPT-3.5.
Still, there are errors. I found multiple instances where the AI faltered on number series problems, providing incorrect answers. While some errors are acceptable, I was disappointed to see that GPT-4o still fell for some extremely easy "trick questions" designed to catch AI.
For example, when I asked, "How many r’s are there in the word strawberry?" it confidently answered two (the correct answer is three). The same problem appeared with several other trick questions. In my experience, the logical reasoning and reliability of GPT-4o are similar to its predecessor’s, which isn’t particularly impressive.
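The letter-counting slip is easy to verify deterministically; a one-line check confirms the answer the model missed:

```python
# Count occurrences of the letter "r" in "strawberry".
count = "strawberry".count("r")
print(count)  # → 3
```

The failure isn’t that the answer is hard, but that the model pattern-matches rather than actually counting.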
GPT-4o: Final Thoughts
Overall, I’m quite impressed with the upgrades in certain areas of the new AI model, with computer vision and conversational ability being my favorites. I’m also impressed with its internet searching ability, though that power is also concerning. In terms of logical reasoning and generative capabilities, there is minimal improvement.
In my opinion, if you have premium access to GPT-4o, it likely surpasses any competitors in terms of overall delivery. However, there’s still significant room for improvement, and blind trust in AI is a dangerous proposition.