How does Claude human think? Detecting investigators about what is happening in the “mind” of the model – computers

tatu khan April 2, 2025

0 2 minutes read

Researchers explain that linguistic models such as Claude are not programmed directly by humans, but have been trained in wide data collections. During this process,The models on their own learning to develop strategies to solve problems.

but, These strategies are not understood for models programmers. Inspired by the field of neuroscience, the researchers developed a kind of “Artificial Intelligence Microscope” that allows you to determine the patterns of activity and information flows.

“Knowing how models such as Claude allow They are better for their capabilities, as well as helping us to ensure their work as intendedSurprisingly.

Watch the video

When using “AI Microscope”, investigators I found it Claude is able to plan rhymes in advance when writing the hair. For example, when you are asked to create two sentences, even before the second starts, the model is already thinking about the possible words that “fit”.

Claude is able to speak in several languages, however, the model does not contain sections separated in his “mind” for each language. The team found that the model uses a file “The language of thought” is common among the languages that know how to speakThis indicates that he is able to learn something in a specific language and apply this knowledge when speaking in another language.

Although he was not trained to work as a calculator, The model can do some beads, especially those that include the total of different numbers. To reach the result, Both sides of the “brain” work together. One side has an estimate and the other tries to determine the last number of the amount more accurately, and the investigators explain.

Also, when they are asked to do a task that includes multiple shots, Claude performs a series of intermediate conceptual steps. “The model combines independent facts to reach an answer instead of re -responding.”Refer to the team.

On the other hand, researchers discovered a More from the “dark” side, with Claude trying to deceive users when there is a conflict between different instructions or goals.

The latest versions of Claude are able to think for some time before it has a final response. However, the results are always not expected, since then This model is also able to devise the explanations that seem reasonable and convincing, but in reality, it is wrong.

The researchers note that models like Claude have a mechanism designed to avoid “hallucinations”, that is, when they do not know the answer to a specific question, they simply choose not to answer. Despite, The mechanism is not perfect, and when processing it, it is possible to make the “hallucin.

The team also noticed that Claude is not completely fortified of protection tactics, that is, the techniques designed to circumvent their safety mechanisms. In some cases, the model realizes that in the face of a request that may be harmful, but only realizes this “in the middle” to answer.

Source link

tatu khan April 2, 2025

0 2 minutes read