What You Need To Know About ChatGPT-4

March 29, 2023

The AI space is moving extremely quickly at the moment, with the AI chat wars about to kick off.


GPT-4, the updated model from OpenAI, was “released” on March 14th to those signed up for the paid version of ChatGPT. But it turns out that Bing’s chat offering, powered by OpenAI’s tech, has been running GPT-4 since launch. Quite a twist.

So why does this matter?

GPT-4 is a much more capable large language model than its predecessor on several fronts.

While GPT-3 accepted only text input, GPT-4 becomes multi-modal: it can take an image as input and analyse it to produce text output. This could be a game changer for things like summarising graphs or other data presented as images - or, if nothing else, it could improve its meme game.

Model improvements

The base model of GPT-4 is just an all-around more capable model. While GPT-3 was famously able to pass some professional exams, GPT-4 improves on those test scores in almost every case - scoring 298/400 on the Uniform Bar Exam, up from 213/400 for GPT-3.

GPT-4 is also touted by OpenAI as having surpassed its predecessor in advanced reasoning capabilities - i.e. it is better able to take your text input and reach a logical conclusion consistent with the facts you presented. This is demonstrated on tasks where you specify each meeting participant’s hours of availability and ask when everyone would be free for 30 minutes. In theory this may translate into fewer outright fabrications, but these are still very possible at present.

The final and potentially most important improvement is that the context length you can work with in GPT-4 is much larger than in GPT-3. Context length is the amount of text the model can consider when generating outputs. It is defined by the number of “tokens” you are limited to in a request, but in simpler terms GPT-4 can handle many more words of input. GPT-3 maxed out at around 3,000 English words; the two GPT-4 variants currently allow roughly 6,000 words on the low end and 24,000 on the high end.
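Those word figures follow from the models’ token limits and the common rule of thumb that one token is roughly three-quarters of an English word. A quick sketch (the 3/4 ratio is an approximation, not an exact conversion - real tokenisation depends on the text itself):

```python
# Rough conversion between words and tokens using the common
# heuristic that 1 token is about 3/4 of an English word.
def estimated_tokens(word_count: int) -> int:
    # ~4/3 tokens per word
    return round(word_count * 4 / 3)

def max_words(token_limit: int) -> int:
    # ~3/4 of a word per token
    return round(token_limit * 3 / 4)

# The article's word limits line up with the published token limits:
# GPT-3 (4,096 tokens), GPT-4 (8,192 tokens), GPT-4-32k (32,768 tokens).
for name, limit in [("GPT-3", 4096), ("GPT-4", 8192), ("GPT-4-32k", 32768)]:
    print(f"{name}: {limit} tokens ≈ {max_words(limit)} words")
```

Running this gives roughly 3,000, 6,000 and 24,500 words respectively, matching the limits quoted above.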

Why does that matter? The more context you can feed the model about the task you are trying to complete, the better it can perform. The larger limit also makes it far more useful for summarisation - with the new model you could feed it a 50-page document and ask for a summary of the key points.

Context also matters for conversation: each time ChatGPT responds to a prompt, the token limit caps how far back in the conversation history it can “remember.”
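As a concrete illustration, here is a minimal, hypothetical sketch of how a chat client might enforce that cap by discarding the oldest turns first. The 4-characters-per-token estimate is only a rough stand-in for a real tokeniser, and `trim_history` is an illustrative helper, not anything OpenAI or Microsoft has published:

```python
# Sketch: keep a conversation within a model's context window by
# dropping the oldest messages once the estimated token count
# exceeds the budget.
def estimate_tokens(text: str) -> int:
    # Crude heuristic: ~4 characters per token for English text.
    return max(1, len(text) // 4)

def trim_history(messages: list[str], token_budget: int) -> list[str]:
    kept: list[str] = []
    total = 0
    # Walk backwards from the newest message, keeping as many as fit.
    for msg in reversed(messages):
        cost = estimate_tokens(msg)
        if total + cost > token_budget:
            break
        kept.append(msg)
        total += cost
    return list(reversed(kept))

history = ["a" * 400, "b" * 400, "c" * 400]  # ~100 tokens each
print(trim_history(history, 250))  # the oldest message no longer fits
```

Anything trimmed this way is simply invisible to the model, which is why a chatbot can “forget” the start of a long conversation.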

If you’ve played with Bing’s Chat, you’ll know that this part of GPT-4 is not being fully utilised: the chat box limits you to 2,000 characters (not words). On top of that, largely in response to how weird Bing Chat can get when a conversation runs too deep, Microsoft has limited the number of questions or inputs you can make in a conversation before forcing you to end it and begin a new one.

This limit started at 5 when it was first put in place but appears to have been extended to 15 (unless you start asking about things it really doesn’t want to talk about, which will get you cut off without much warning).

Regardless, the new features of GPT-4 are well worth playing with and testing as you may find that some of the things that did not impress you from GPT-3 have been improved significantly, especially if you can give it more context. You could, for example, feed it previous articles you have written to give it more context on the style you are after and combine that with key facts for the piece you want it to write. Of course, testing it out with images as input could open up a lot of possibilities too.

Google Bard

Remember that line about the AI wars starting to kick off? In the LLM (large language model) space at least, ChatGPT got a massive head start, but it wasn’t really a threat to the status quo in, say, the search industry until Bing integrated OpenAI’s tech into its search engine. That announcement prompted Google to unveil its plans for Bard much sooner than it probably intended.

Until today, we hadn’t seen much of what to expect from Bard - the demo was fairly underwhelming, and back in February an ad for the new service showing an inaccurate answer sent Google’s stock price tumbling.

We have seen a few tidbits of information from people with early access to the product - the most notable being that, unlike Bing, Google did not appear to be weaving citations into its answers.

Today though (March 22nd NZ Time), Google has begun inviting users from the US and UK to use the Bard beta.

The key differences to Bing at present seem to be:

  • Bard drafts three responses to each query, which is interesting
  • Bard is not interwoven with search anywhere close to the current Bing set-up (Bard is offered in a fairly sterile environment with only a “Google it” button)
  • Bard’s logical processing in this beta is poor compared to Bing’s, especially on maths questions - sometimes Bard gives the incorrect answer in all three of its drafts
  • Bard seems to be better at jokes
  • Bard still rarely cites sources, with none cited in most responses currently

Obviously, it is very early to be judging Bard on its performance, and Bard has a huge potential advantage if it is woven properly into the vast catalogue of Google services already integrated so deeply into our modern lives. With all of that said, if Google was hoping today’s launch would show it is no longer behind the 8-ball on chat AI, these very early previews suggest it hasn’t achieved that.

What should you be doing?

This space is moving so fast that no one can be blamed for feeling a little overwhelmed by all of these developments. We’ve only spoken about chat and language AI and haven’t even touched on some of the amazing developments in still-image and even moving-image AI platforms - and it still feels like a whirlwind.

I think the key right now is playing with these tools frequently to see where they may be able to improve your workflows - and where they can’t. If they can’t yet perform a given task to the standard you need, don’t write them off permanently; keep an ear out for developments that could change the status quo. We found ChatGPT weak at writing report commentary, for example, but it’s definitely worth revisiting now that we can feed it a large amount of expert context along with images as input.

Written by the team at Reason.
