A recent article reports on work by researchers at Anthropic, the AI lab that developed a ‘reasoning’ AI model, and their ability to look inside the digital brains of large language models (LLMs). Investigating what happens in a neural network as an AI model ‘thinks’, they uncovered unexpected complexity, suggesting that, on some level, an LLM may grasp broad concepts rather than simply engaging in pattern matching. However, there is also evidence that when a reasoning AI explains how it reached a conclusion, its account does not necessarily match what the ‘digital microscope’ suggests actually went on. Moreover, an AI will sometimes simply produce random numbers in response to a mathematical problem it cannot solve and then move on. On occasion, it will respond to a leading question with reasoning that arrives at the suggested conclusion, even when that conclusion is false. Thus, it seems, the AI can appear to convince itself (or its human interlocutor) that it has reasoned its way to a conclusion when in fact it has not.
Read more >> (How) Does Artificial Intelligence Think? (What) Does it Know?
