# Problems with the current LLMs

> Problems I encountered while programming an LLM-based coding TUI app

*Published on 2025-12-11 · [View as HTML](https://codefionn.eu/problems-with-the-current-llms/) · [Read in German](https://codefionn.eu/probleme-mit-jetzigen-llms/)*

---


In my free time, I recently developed
[scriptschnell](https://github.com/codefionn/scriptschnell), a TUI for
generating code with LLMs. This gave me some insight into what current LLMs
still lack.

## Speed

After using Cerebras and, to a lesser extent, Groq (Groq with a **q**), the
speed of OpenAI's gpt-5.1-codex and gpt-5.1-codex-mini models feels lacking.

Code generation requires a lot of looking up current implementation details,
creating or changing files and then validating the result. All this requires a
lot of tokens (in my experience just searching through the codebase takes at
least 20k tokens).

Some code generation applications like
[Windsurf use a significantly faster model](https://www.cerebras.ai/blog/case-study-cognition-x-cerebras)
to speed up some tasks like exploring the codebase.

## Tool calls

You can tell that many models are trained to use specific tool calls. For
example, when prompted with just "ls", Kimi K2 Instruct tries to use a tool
call named `shell`, even though no such tool exists.

In my own case, it took a lot of trial and error to get models to write more
complex programs for my Go sandbox tool call (e.g. building an application and
extracting key errors with a summarize method).
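
One way to cope with calls to tools that do not exist, like the `shell` call
mentioned above, is to answer with an error that lists the tools that are
available, so the model can correct itself on the next turn. The following is
only a minimal sketch of that idea; the registry, the tool names and the result
format are assumptions for illustration and are not taken from scriptschnell.

```python
import json

# Hypothetical tool registry: names and handlers are illustrative only.
TOOLS = {
    "read_file": lambda args: f"(contents of {args['path']})",
    "go_sandbox": lambda args: "(sandbox output)",
}


def dispatch_tool_call(name: str, raw_arguments: str) -> dict:
    """Run a tool call, or return an error message the model can react to."""
    if name not in TOOLS:
        # Tell the model which tools really exist instead of failing silently.
        available = ", ".join(sorted(TOOLS))
        return {
            "ok": False,
            "content": f"Unknown tool '{name}'. Available tools: {available}.",
        }
    try:
        args = json.loads(raw_arguments or "{}")
    except json.JSONDecodeError as err:
        return {"ok": False, "content": f"Invalid JSON arguments: {err}"}
    return {"ok": True, "content": TOOLS[name](args)}


result = dispatch_tool_call("shell", '{"command": "ls"}')
print(result["content"])
```

Feeding the error text back as the tool result gives the model a chance to
retry with a tool that is actually defined.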

## Built-in assumptions

Models currently seem to perform best when they can rely on assumptions baked
in during the training process. This is especially problematic when using
external libraries that are newer than the model's knowledge cut-off date.

## Context window woes

The current context windows of state-of-the-art models are at least 128k tokens.

But once part of it is taken up by the system prompt, the tool call
descriptions and then the investigation of the codebase, the remaining context
window is too small.

The biggest problem here is that model performance seems to drop off a cliff when
using large parts of the given context window.

For example, Claude Opus 4.5, the latest, greatest and rather pricey model at
the time of writing, seems to love ignoring the codebase style once 3/4 of its
context window is used up.
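
The arithmetic behind this can be sketched in a few lines. The 128k window, the
3/4 mark and the 20k tokens for searching the codebase come from this post; the
sizes of the system prompt and tool descriptions are assumed numbers for
illustration only, so measure your own setup.

```python
CONTEXT_WINDOW = 128_000  # lower bound mentioned in this section
SOFT_LIMIT_RATIO = 0.75   # the 3/4 mark where style drift showed up

usage = {
    "system_prompt": 4_000,      # assumption, not a measured value
    "tool_descriptions": 6_000,  # assumption, not a measured value
    "codebase_search": 20_000,   # "at least 20k" from the Speed section
}


def remaining_budget(window: int, ratio: float, used: dict[str, int]) -> int:
    """Tokens left for actual work before the soft limit is reached."""
    return int(window * ratio) - sum(used.values())


left = remaining_budget(CONTEXT_WINDOW, SOFT_LIMIT_RATIO, usage)
print(left)  # 66000
```

Under these assumptions, only about half of the nominal window is left for
reading, writing and validating code before quality starts to suffer.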

Compacting the context window to allow seamless, effectively endless sessions
is also hard to get right. Carrying information about code style and already
solved problems across the summarization boundary feels like a roll of the
dice. Also, telling the model that the summary is only half the story
backfires, because the model then starts going through the codebase again,
using up precious tokens.
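
One possible mitigation is to keep a few notes outside the summarizer and copy
them verbatim across the boundary. This is a rough sketch under stated
assumptions, not a description of how scriptschnell does it: the function
names, the four-characters-per-token estimate and the `summarize` callback
(which would normally call a model) are all hypothetical.

```python
from typing import Callable


def estimate_tokens(text: str) -> int:
    # Crude assumption: roughly four characters per token.
    return max(1, len(text) // 4)


def compact(
    messages: list[str],
    pinned_notes: list[str],
    summarize: Callable[[list[str]], str],
    limit: int,
    keep_recent: int = 2,
) -> list[str]:
    """Summarize older messages once the estimated size exceeds the limit.

    Pinned notes (code style, solved problems) bypass the summarizer and
    are copied verbatim, so they cannot get lost in the summary.
    """
    total = sum(estimate_tokens(m) for m in messages)
    if total <= limit or len(messages) <= keep_recent:
        return messages
    cut = len(messages) - keep_recent
    old, recent = messages[:cut], messages[cut:]
    header = "Pinned notes:\n" + "\n".join(f"- {n}" for n in pinned_notes)
    return [header, "Summary of earlier work: " + summarize(old)] + recent


history = ["x" * 400, "y" * 400, "latest question", "latest answer"]
notes = ["Use tabs, not spaces", "Flaky test fixed in parser"]
# Stand-in summarizer; a real one would ask a model for the summary.
compacted = compact(history, notes, lambda old: f"{len(old)} messages condensed", limit=100)
print("\n".join(compacted))
```

The pinned notes still cost tokens after every compaction, so this only works
if the list stays short.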

## Vision

Vision still doesn't really work. Even simple tasks like creating an HTML
file or SVG file from a screenshot make the model seem incompetent (e.g.
creating an SVG from a company logo).

## Closing words

I think we still have a long road ahead of us with the current state of
transformer models.

When you create agents for specific tasks, you have to optimize your program
for the specific model family.

We've come really far with the current LLM technology and I think a lot of "hacks"
on top of the current implementations can still improve the current state
significantly.

## Sources

- [Case Study - Cognition x Cerebras](https://www.cerebras.ai/blog/case-study-cognition-x-cerebras)
- [Context Length Alone Hurts LLM Performance Despite Perfect Retrieval](https://arxiv.org/abs/2510.05381)
- [Insights into LLM Long-Context Failures: When Transformers Know but Don't Tell](https://arxiv.org/abs/2406.14673)
- [Vision language models are blind: Failing to translate detailed visual features into words](https://arxiv.org/abs/2407.06581)
- [Evaluating LLMs at Detecting Errors in LLM Responses](https://arxiv.org/abs/2404.03602)

---

© Copyright 2022-2026 Fionn Langhans