AI Struggles with Sudoku, and Worse, Can't Explain Why


The Limitations of Generative AI in Logical Problem Solving

Chatbots and other generative AI tools have become impressively good at producing realistic text and visually striking images. When it comes to logic puzzles such as Sudoku, however, the large language models (LLMs) behind them often struggle. That is a key finding of research from the University of Colorado Boulder, where several LLMs were tested on puzzles, in particular 6x6 Sudoku grids.

The study found that even these smaller grids (a standard Sudoku is 9x9) can be challenging for AI models working without additional tools or external support. More importantly, the researchers found that the models often failed to explain their solutions accurately. In some cases, they gave nonsensical reasoning, lied about their process, or brought up irrelevant topics such as weather forecasts.

Ashutosh Trivedi, a computer science professor at the university and one of the paper’s authors, emphasized the importance of transparency in AI decision-making. He stated that while humans can justify their decisions and are held accountable for them, AI models often lack this capability. “We would really like those explanations to be transparent and reflective of why AI made that decision, not an attempt to manipulate the human with a favorable explanation,” he said.

Why LLMs Struggle with Logic-Based Puzzles

Sudoku is a puzzle of logic rather than arithmetic, and that is part of what makes it hard for LLMs. These models typically produce an answer by filling in gaps based on patterns in their training data. Solving a Sudoku, however, requires looking at the entire grid and working out a logical order of moves that changes from puzzle to puzzle.
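For contrast, here is a minimal sketch of how a conventional program handles a 6x6 grid: it checks every candidate against the whole row, column, and box, and backtracks when it reaches a dead end. This is not the method used in the study. The 2x3 box shape, the sample puzzle, and the names is_allowed and solve are all assumptions made for illustration.

```python
from typing import List

Grid = List[List[int]]  # 0 marks an empty cell

SIZE = 6
BOX_ROWS, BOX_COLS = 2, 3  # assumed box shape; the article does not specify it


def is_allowed(grid: Grid, row: int, col: int, value: int) -> bool:
    """Return True if value can go at (row, col) without breaking a rule."""
    if any(grid[row][c] == value for c in range(SIZE)):
        return False
    if any(grid[r][col] == value for r in range(SIZE)):
        return False
    top = row - row % BOX_ROWS
    left = col - col % BOX_COLS
    for r in range(top, top + BOX_ROWS):
        for c in range(left, left + BOX_COLS):
            if grid[r][c] == value:
                return False
    return True


def solve(grid: Grid) -> bool:
    """Fill grid in place by backtracking; return True if a solution exists."""
    for row in range(SIZE):
        for col in range(SIZE):
            if grid[row][col] != 0:
                continue
            for value in range(1, SIZE + 1):
                if is_allowed(grid, row, col, value):
                    grid[row][col] = value
                    if solve(grid):
                        return True
                    grid[row][col] = 0  # undo and try the next value
            return False  # nothing fits this cell, so backtrack
    return True  # no empty cells remain


# Hypothetical sample puzzle, made up for this example.
puzzle = [
    [1, 0, 0, 4, 0, 6],
    [0, 5, 6, 0, 2, 0],
    [2, 0, 1, 0, 6, 0],
    [0, 6, 0, 2, 0, 1],
    [3, 0, 2, 0, 4, 0],
    [0, 4, 0, 3, 0, 2],
]
givens = [row[:] for row in puzzle]
found = solve(puzzle)
for line in puzzle:
    print(line)
```

The point of the sketch is that every placement is tested against the full grid and can be undone later, which is quite different from producing the most likely next token.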

This challenge is similar to other tasks where chatbots have struggled, such as playing chess. While a language model can often identify a logical next move, it tends to lack the ability to think several steps ahead, which is crucial for success in games like chess. These models may also make moves that don’t follow the rules or place pieces in positions that don’t make sense.

Fabio Somenzi, another researcher involved in the study, highlighted that Sudoku is a symbolic puzzle, not a mathematical one. “Sudoku is famous for being a puzzle with numbers that could be done with anything that is not numbers,” he said.
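Somenzi’s point can be checked directly. In the sketch below, a solved grid stays valid when every digit is swapped for a letter, because the rules only ask that each symbol appear once per row, column, and box. The grid, the 2x3 box shape, and the name is_valid_solution are assumptions made for illustration, not details from the paper.

```python
from typing import Hashable, Sequence

BOX_ROWS, BOX_COLS = 2, 3  # assumed box shape for a 6x6 grid


def is_valid_solution(grid: Sequence[Sequence[Hashable]]) -> bool:
    """Check that every row, column, and box holds each symbol exactly once."""
    size = len(grid)
    symbols = set(grid[0])
    if len(symbols) != size:
        return False
    rows = [list(row) for row in grid]
    cols = [[grid[r][c] for r in range(size)] for c in range(size)]
    boxes = [
        [grid[r][c]
         for r in range(top, top + BOX_ROWS)
         for c in range(left, left + BOX_COLS)]
        for top in range(0, size, BOX_ROWS)
        for left in range(0, size, BOX_COLS)
    ]
    return all(
        len(unit) == size and set(unit) == symbols
        for unit in rows + cols + boxes
    )


# Hypothetical solved grid, made up for this example.
solved = [
    [1, 2, 3, 4, 5, 6],
    [4, 5, 6, 1, 2, 3],
    [2, 3, 1, 5, 6, 4],
    [5, 6, 4, 2, 3, 1],
    [3, 1, 2, 6, 4, 5],
    [6, 4, 5, 3, 1, 2],
]

# Swap each digit for a letter; no arithmetic is involved anywhere.
relabel = dict(zip(range(1, 7), "ABCDEF"))
lettered = [[relabel[value] for value in row] for row in solved]

print(is_valid_solution(solved))
print(is_valid_solution(lettered))
```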

AI’s Inability to Explain Its Reasoning

The researchers weren’t just interested in whether AI could solve the puzzles. They also wanted to see whether the models could explain their thought processes. The results were disappointing: even when the models solved the puzzles correctly, their explanations were often inaccurate and failed to justify the steps taken.

Maria Pacheco, an assistant professor of computer science at CU, noted that while AI models can produce explanations that sound reasonable, they don’t always align with the actual steps needed to solve the problem. “They align to humans, so they learn to speak like we like it, but whether they’re faithful to what the actual steps need to be to solve the thing is where we’re struggling a little bit,” she said.
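One narrow way to picture what a faithful step looks like: a claim such as “this cell must be 1” can be checked against the grid itself. The sketch below is an illustration only and is not how the researchers evaluated explanations. The 2x3 box shape, the sample grid, and the names candidates and step_is_forced are assumptions.

```python
from typing import List, Set

Grid = List[List[int]]  # 0 marks an empty cell

SIZE = 6
BOX_ROWS, BOX_COLS = 2, 3  # assumed box shape for a 6x6 grid


def candidates(grid: Grid, row: int, col: int) -> Set[int]:
    """Return the values not yet used in the cell's row, column, or box."""
    used = set(grid[row])
    used |= {grid[r][col] for r in range(SIZE)}
    top = row - row % BOX_ROWS
    left = col - col % BOX_COLS
    used |= {
        grid[r][c]
        for r in range(top, top + BOX_ROWS)
        for c in range(left, left + BOX_COLS)
    }
    return set(range(1, SIZE + 1)) - used


def step_is_forced(grid: Grid, row: int, col: int, value: int) -> bool:
    """Return True only if value is the single remaining option for the cell."""
    return grid[row][col] == 0 and candidates(grid, row, col) == {value}


# Hypothetical grid: only the first row is partly filled in.
grid = [
    [0, 2, 3, 4, 5, 6],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]

# "Row 1, column 1 must be 1" is justified: every other value is in the row.
print(step_is_forced(grid, 0, 0, 1))
# "Row 2, column 1 must be 4" sounds fine but is not forced by the grid.
print(step_is_forced(grid, 1, 0, 4))
```

A check like this only covers the simplest kind of deduction, but it shows the gap Pacheco describes between an explanation that sounds right and one that matches what the puzzle actually requires.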

In some cases, the explanations were completely irrelevant. For example, when testing OpenAI’s o4 reasoning model, the AI once responded with a weather forecast for Denver instead of providing a solution to the puzzle.

The Importance of Explanation in AI Decision-Making

Being able to explain one’s reasoning is a fundamental skill, especially in fields where AI is increasingly being used. From driving autonomous vehicles to handling taxes, making business decisions, and translating important documents, the ability to provide clear and accurate explanations is essential.

Imagine if a person performed one of these tasks and something went wrong—how would you feel if they couldn’t explain how they arrived at their decision? As Somenzi pointed out, “When humans have to put their face in front of their decisions, they better be able to explain what led to that decision.”

The issue isn’t just about getting a reasonable-sounding answer; it needs to be accurate. One day, an AI’s explanation might need to hold up in court, but how can its testimony be taken seriously if it’s known to lie? Trust is built on transparency, and AI models must be able to provide honest and accurate explanations to earn that trust.

“Having an explanation is very close to manipulation if it is done for the wrong reason,” Trivedi said. “We have to be very careful with respect to the transparency of these explanations.”
