
Artificial intelligence (AI) may have made significant advances in writing, image creation, and even passing exams. But when it comes to something as routine as reading an analog clock or working out the day of the week for a given date, these systems still fall short.
A recent study, presented at the 2025 International Conference on Learning Representations (ICLR) and posted to arXiv on March 18, has revealed that popular AI systems, despite their powerful capabilities, often fail at reading clocks and answering calendar-related questions.
The findings point to major limitations in how these systems process basic visual and logical information that most people learn in early childhood.
Researchers tested several leading AI models, including Meta’s Llama 3.2-Vision, Anthropic’s Claude-3.5 Sonnet, Google’s Gemini 2.0, and OpenAI’s GPT-4o, by showing them images of clocks and posing date-related questions.
The results were underwhelming. On average, the models correctly interpreted the time from clock images only 38.7% of the time. Their accuracy with calendar dates was even lower, at just 26.3%.
Basic Tasks Still a Challenge for AI
Lead author Rohit Saxena from the University of Edinburgh explained that humans generally develop these skills early and use them with ease throughout life. But AI models, even the most advanced ones, often get them wrong.
Saxena noted that reading clocks requires understanding visual layouts and spatial relationships — not just recognizing what a clock looks like, but also measuring the angle between the hands, identifying overlapping parts, and accounting for different designs like Roman numerals or non-standard dials.
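To make the spatial reasoning concrete, here is an illustrative sketch (not code from the study) of the arithmetic a reader of an analog clock implicitly performs: mapping a time to hand angles and back. The function names are hypothetical.

```python
# Illustrative sketch: the geometry behind reading an analog clock.
# The minute hand sweeps 6 degrees per minute; the hour hand sweeps
# 30 degrees per hour plus a 0.5 degree-per-minute drift.

def hand_angles(hour: int, minute: int) -> tuple[float, float]:
    """Return (hour-hand, minute-hand) angles in degrees from 12 o'clock."""
    minute_angle = minute * 6.0                       # 360 deg / 60 min
    hour_angle = (hour % 12) * 30.0 + minute * 0.5    # hour hand drifts as minutes pass
    return hour_angle, minute_angle

def time_from_angles(hour_angle: float, minute_angle: float) -> tuple[int, int]:
    """Invert the mapping: recover (hour, minute) from the two hand angles."""
    minute = round(minute_angle / 6.0) % 60
    hour = round((hour_angle - minute * 0.5) / 30.0) % 12
    return hour, minute

print(hand_angles(3, 38))               # (109.0, 228.0)
print(time_from_angles(109.0, 228.0))   # (3, 38)
```

Note that reading 3:38 requires accounting for the hour hand's drift: it sits at 109 degrees, well past the "3" mark, which is exactly the kind of subtlety the models miss.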
I asked Grok and ChatGPT to generate a clock with the time 3:38. Grok was off by 2 hours and ChatGPT was in a whole other dimension pic.twitter.com/MQNxBl9jVy
— greg (@greg16676935420) December 13, 2024
He explained that while earlier AI systems were trained using labeled examples, recognizing the exact time from a clock involves more than just labels — it requires reasoning about physical space. That skill remains a challenge for current AI tools.
When it comes to calendars, the struggle is just as noticeable. A question such as “What day is the 153rd day of the year?” stumped most models. Saxena said this is because large language models do not solve problems by running step-by-step calculations like traditional computers.
Instead, they predict answers based on patterns in their training data. That approach leads to inconsistent reasoning, especially in areas like calendar math, where precision is essential.
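The contrast is stark because the "153rd day of the year" question is a trivial deterministic computation for conventional code. A minimal Python sketch (the year 2025 is an assumption chosen for illustration, not taken from the study):

```python
from datetime import date, timedelta

# Deterministic calendar math: find the 153rd day of 2025 and its weekday.
# Day 1 is January 1, so we add 152 days.
d = date(2025, 1, 1) + timedelta(days=153 - 1)
print(d.strftime("%B %d, %Y is a %A"))  # June 02, 2025 is a Monday
```

Because the arithmetic is carried out step by step, the answer is exact and repeatable, unlike a pattern-based guess.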
Human-Like Tasks Still Require Human Oversight
The study adds to a growing body of research that highlights the gaps between human understanding and AI prediction. AI systems often excel when they are exposed to familiar patterns or well-represented examples in their training data. But they struggle to generalize or apply reasoning in unfamiliar or rare situations, such as leap years or abstract date calculations.
Saxena also emphasized that just because an AI has seen explanations of a leap year, for example, it doesn’t mean it can apply that information accurately in a real-world context. These findings underline a deeper issue in current AI development — the need to combine perception, logic, and spatial awareness more effectively.
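The leap-year rule Saxena alludes to is itself a fixed, mechanical check that conventional code applies without fail. A short illustrative sketch (not code from the study):

```python
# Gregorian leap-year rule: divisible by 4, except century years,
# unless the century year is divisible by 400.
def is_leap(year: int) -> bool:
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

print([y for y in (1900, 2000, 2024, 2025) if is_leap(y)])  # [2000, 2024]
```

An LLM may reproduce this rule in prose yet still misapply it to a specific year, which is the gap between having seen an explanation and executing it.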
This research serves as a caution against relying too heavily on AI for tasks where precision and reasoning are critical. Saxena said that while AI is incredibly powerful, when tasks involve both seeing and thinking through details, thorough testing is essential, and sometimes a human should stay in the loop.