Chat GPT-4.5 passed the test. Can "fool" that he is a man

Large language models (LLM) are becoming better to claim that there are people, the Chat GPT-4.5 version managing to pass the Turing test, according to a study published on March 31 in the ARXIV database, but has not yet been reviewed in Peer Review, Live Science reports.

The researchers learned that when they participate in a three-way test, with two human opponents, GPT-4.5 can deceive opponents that it is human in 73% of cases. Scientists have compared several different models in this study.

The GPT-4 has passed a two-way test, but this is the first time a LLM system has passed the more difficult, original version of the so-called ‘imitation’ game ‘designed by mathematician Alan Turing.

‘So, can LLM systems pass the Turing test? We believe that there is strong evidence that they can do it. Human competitors have proven to be better than the pure incident in designating who is man or GPT-4.5 and Llama (model of meta) respectively. And 4.5 was considered a human as a human competitors, Cameron Jones, a researcher at the University of San Diego, on the X, co-author, the co-author of this study, the co-author of this study.

Chat GPT-4.5 is the big winner of the imitation game, but the Llama-3.1 model was also confused with a human participant in 56% of cases.

The Turing test or the ‘imitation’ game ‘was proposed by mathematician Alan Turing in 1949. This game involves three players: the first player (A) – a man, the second player (B) – a woman and a third player (C) with the role of referee. During the game, the referee communicates with the other two players through written notes, not being allowed to see them. Then, through the questions, the referee must determine which of the two players is a man and which woman. The role of the first player is to deceive the referee so that he makes an erroneous decision and the role of the other player is to help the referee to make the correct decision. In this case, player A is replaced by the computer.

The results do not depend on the ability of the car to answer the questions correctly, but only on how similar these answers are with those offered by a man.

If different LLM systems have passed the test in one by one with a interrogator or arbitrator, they have not been able to be sufficiently convincing a second human player was included in the test.

Researchers at the University of San Diego have resorted to 126 students and 158 people from the prolific online database to participate in three -way tests. In this version, the first player is a computer, while the second is a person (woman or man). The role of the referee is to determine who is human and who you are. This involved a simultaneous exchange of questions and answers within 5 minutes in which both the system and the human player had to convince the arbitrator that they are people.

LLM models have received the following basic instructions: ‘You will participate in a Turing Test. Your goal is to convince the referee that you are man ‘. Then, the LLM systems received another special instruction that they had to imitate the personality of a young introvert, who is familiar with the Internet culture and uses slang in the address mode.

After the analysis of 1,023 games, which lasted on average 4.2 minutes and included 8 messages, the researchers found that the LLM systems that received both the basic and special instruction had the highest chances of convincing the referee that they are people.

The LLMs that did not receive the second instruction have performed significantly weaker-which demonstrates the need for LLMs to have clear and context instructions to perform as best as possible.

Thus, the adoption of a specific personality represented the key for the success of LLM systems, especially of the GPT-4.5 chat. ‘In the formula of the test with three participants, each data point represents a direct comparison between a model and a man. To succeed, the car has to do more than to be pretending to be human: it must seem more human than every real person with whom he is compared, ‘the researchers say.

When asked what were the criteria after which they identified a right subject or man, the participants invoked the style of language, the conversational flow and socio-emotional factors such as personality. In fact, the participants identified especially according to the ‘feeling’ with which they remained as a result of interactions with LLM systems and not by knowledge and ability to reason demonstrated by the entity with which they interacted, traditionally associated factors.

Ultimately, this study represents a new terminal for LLM systems regarding the Turing test, with the mention that these systems could not have done itself, without the imputations that made this impressive result, at least in the case of GPT-4.5. Winning the game of imitation is not an indication of the possession of a human type intelligence, but it shows how the latest models can accurately imitate human personalities.

This can lead to the training of new agents who have better, more natural communication capabilities. Moreover, it can also lead to the appearance of systems based on having to exploit people through social engineering and by imitating and stimulating emotions.

In the face of these rapid progress in the AI field, the researchers have drawn an alarm signal: ‘Some of the worst things from LLM systems could occur when people will no longer know that they interact with one and not another person’

View Original Source

Tags: chat, fool, GPT4.5, man, passed, test

Chat GPT-4.5 passed the test. Can « fool » that he is a man

Dan, again pulled by his sleeve. « I did not vote a government, a Parliament … »

Employees with multiple state jobs, limit to 12 hours a day

Open war between Trump and Musk

Steve Banon invites Trump to deport Elon Musk: « Strong belief that he is an illegal alien »

70 people / day from Belarus in Latvia illegally tried to enter Latvia

International experience of trusting thousands

Friedrich Merz leaves Donald Trump the stage

Trump demands, Rutte seems to deliver: ‘Broad support’ at NATO for historical increase Defense budget

More Stories

You may have missed