The LLM Showdown: ChatGPT3.5 vs ChatGPT4 vs Claude

Cherry Yadvendu
October 4, 2023
6 mins

Let's dive right in!


Since ChatGPT's public debut in November 2022, I have spent hundreds of hours chatting with it, feeding it bits and pieces of contextual gibberish, and asking it weird questions.

“Create a Kafkaesque story about my city”

“Write a Taylor Swift song from a William Wordsworth poem”

“Write a poem in Pablo Neruda’s style on Argentina’s Soccer World Cup win”

And more.

I am not the only one. ChatGPT has dominated the global app market, quickly becoming the fastest-growing consumer application in the world.

With its latest GPT-4 iteration, ChatGPT can now process images, summarize a screenshotted text, generate texts that are more ‘creative’, and provide ‘advanced reasoning’.

ChatGPT’s insane rise to stardom has bolstered research and development in AI.

Naturally, this has given rise to a plethora of competition.

Claude, a chatbot from San Francisco-based AI startup Anthropic, has recently created a lot of buzz.

With major investors like Google, and a laundry list of clientele including Quora, DuckDuckGo, and Notion, Claude aims to provide an AI tool that is “easier to converse with,” and is “less likely to produce harmful results”.

But, you may ask - how does all this work? How is a chatbot able to write a novel from a few text prompts? To understand their power, we need to take a peek under the hood.

What is a Large Language Model, or LLM?


Chatbots like ChatGPT and Claude are built on specialized artificial intelligence models.

These are commonly known as Large Language Models or LLMs.

They get fed vast amounts of publicly-accessible textual data from the internet.

Training on large data sets enables these tools to generate human-like responses to natural language inputs.

They digest, analyze, and utilize this vast corpus of data with the help of deep learning, a machine-learning technique that mimics how the human brain processes information.

Deep learning uses multi-layered neural networks that continuously refine their internal parameters as they chug through petabytes of data.

These trained networks, in turn, produce coherent, high-quality responses.
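To make the idea of "learning from text" concrete, here is a deliberately tiny sketch in Python. It is not how production LLMs are built (they use deep neural networks with billions of parameters, not simple word counts), but the basic loop is the same: learn statistics from a body of text, then predict the most likely next word.

```python
# A toy "next word" predictor. Real LLMs replace these simple counts with
# deep neural networks trained on enormous text corpora, but the goal is
# the same: given the words so far, predict what comes next.
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat chased the mouse".split()

# "Training": count which word tends to follow which.
next_word_counts = defaultdict(Counter)
for current, following in zip(corpus, corpus[1:]):
    next_word_counts[current][following] += 1

def predict_next(word: str) -> str:
    """Return the word most often seen after `word` during training."""
    candidates = next_word_counts.get(word)
    return candidates.most_common(1)[0][0] if candidates else "<unknown>"

print(predict_next("the"))  # -> "cat", the most frequent follower of "the"
print(predict_next("cat"))  # -> "sat" (ties broken by first occurrence)
```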

Why should you care about LLMs?


“Hey Google, who was the fifth president of the USA?”

“James Monroe,” your smart speaker responds within a millisecond.

But what if it said Andrew Jones, or James Quincy Adams? Wrong answers may not matter much when conversational AI tools are used for recreational purposes. But when they are integrated into our learning and work ecosystems, you should pause and take note.

As ChatGPT, Bard, and Claude seep into our daily lives, we, the end users, are the ones most directly affected by the results they generate.

More schools, colleges, and universities are now exploring the idea of adopting AI chatbots as learning companions for their students.

Microsoft is integrating GPT-4 powered AI tools into its entire product stack.

GPT-4 has also made its way to products like Duolingo, a language-learning app with more than 50 million monthly users.

Claude is driving Quora’s AI Chat app, Poe, to cater to its 300+ million users.

Merely acknowledging the fact that these LLMs aren’t human (some may suggest otherwise) isn’t enough.

We must also keep up to date with how these LLMs work to make the most of them.

Comparing LLMs - GPT-3.5, GPT-4, and Claude


GPT-3.5 and GPT-4 (Generative Pre-trained Transformer) are built by OpenAI on the same underlying large language model architecture, with GPT-4 adding multimodal capabilities.

GPT-4 is widely described as roughly ten times more advanced than GPT-3.5.

It supports a much larger context window: up to 32,768 tokens in the GPT-4-32K variant, compared to roughly 4,000 for the standard GPT-3.5 model.

According to OpenAI, it is “82% less likely to respond to requests for disallowed content and 40% more likely to produce factual responses than GPT-3.5 on our internal evaluations.”

While OpenAI CEO Sam Altman admits ChatGPT still has some rough edges around bias and factual inaccuracies, GPT-4 seems to be a step in the right direction.

GPT-4 adds several new features while refining those that existed in GPT-3.5.

By far the biggest new addition to this LLM is the ability to analyze images as prompts and read or summarize texts that appear in photos.

GPT-4 can solve complex problems with greater accuracy, accept longer text inputs as prompts, reason through harder tasks, and stay better aligned with AI safety standards.
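What a bigger context window means in practice is easier to see with a quick token count. Below is a minimal sketch using tiktoken, OpenAI's open-source tokenizer library; the context-window figures in the dictionary are the approximate limits discussed above, not values reported by the library itself.

```python
# Rough check of whether a prompt fits a model's context window.
# Assumes `pip install tiktoken`; the limits below are approximate figures
# from this article, not values exposed by the library.
import tiktoken

CONTEXT_LIMITS = {"gpt-3.5-turbo": 4_096, "gpt-4-32k": 32_768}

def fits_in_context(prompt: str, model: str) -> bool:
    """Count the prompt's tokens and compare against the model's limit."""
    encoding = tiktoken.encoding_for_model("gpt-4")  # cl100k_base, shared by GPT-3.5 and GPT-4
    n_tokens = len(encoding.encode(prompt))
    print(f"{n_tokens} tokens in a {len(prompt)}-character prompt")
    return n_tokens <= CONTEXT_LIMITS[model]

long_report = "Quarterly revenue grew while costs fell. " * 1_500
print(fits_in_context(long_report, "gpt-3.5-turbo"))  # most likely False
print(fits_in_context(long_report, "gpt-4-32k"))      # most likely True
```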

In March 2023, AI startup Anthropic - founded by former OpenAI employees to train “helpful, honest, and harmless AI systems” - unveiled its rival to ChatGPT: Claude.

While the feature set may be akin to that of ChatGPT and other GPT-4 derivatives, Claude aims to provide a “high degree of reliability and predictability” while producing results.

Insights from Anthropic’s AI safety research feed directly into Claude’s operating principles.

Several comparisons between ChatGPT (or GPT-4) and Claude have given us a better picture of how these LLMs differ while producing output.

In short: GPT-4 stands out for its larger context window, image inputs, and improved factual accuracy, while Claude emphasizes reliability, predictability, and conversations that are easier and less likely to turn harmful.

Implications of LLMs


Large Language Models rely solely on the corpus of text they are fed to generate their responses. This makes LLMs susceptible to a host of problems.

These include inaccuracies, racism, sexism, and bias.

The tendency of AI tools to ‘hallucinate’ facts is another common cause for concern.

Pundits are also wary that the large data sets used to train LLMs contain a lot of problematic information, including copyrighted and inappropriate material.

Issues like these have prompted AI makers and backers like OpenAI, Microsoft, and Google to release their versions of ‘responsible AI’ practices.

By placing limits on model behavior and applying greater scrutiny to how these tools are trained, these practices aim to build a safe and accessible AI ecosystem.

‘Prompt’ to the future


ChatGPT, Claude, Bard - the number of hyper-powerful AI tools is increasing every day.

These tools are no longer just cool conversation companions.

They can write sophisticated emails, describe complex topics, generate song lyrics, essays, or even novels.

All this by using just a few words as ‘prompts’.
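For readers curious what "a few words as prompts" looks like programmatically, here is a minimal sketch using OpenAI's official Python client (the openai package, v1-style interface). The model name and prompt are only examples, and an OPENAI_API_KEY environment variable is assumed.

```python
# A minimal prompting example, assuming `pip install openai` (v1+) and an
# OPENAI_API_KEY environment variable. Model name and prompt are illustrative.
from openai import OpenAI

client = OpenAI()  # picks up OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are a helpful creative-writing assistant."},
        {"role": "user", "content": "Write a poem in Pablo Neruda's style on Argentina's World Cup win."},
    ],
)

print(response.choices[0].message.content)
```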

As we usher in a new age dominated by AI, understanding the underlying large language models becomes pivotal.

In the current age of deepfakes and plagiarized content, responsible handling of text prompts and of the data sets used to train LLMs can be the difference-maker for a safe, accessible AI chatbot.

Not only do these measures increase the accuracy of outputs generated by said tools, but they also help remove bias and filter out objectionable content.

From an end user's perspective, getting to harness the powers of these special tools is nothing short of awesome.

It also falls on us not to abuse these chatbots.

Remember the basic premise: ask a dumb question, get a dumb answer.

ChatGPT or Claude shouldn’t be writing your research paper, but that doesn’t mean they can’t write your next novel where a cat fights with the aliens to save the world.
