April 28, 2026

Orchestrating Long-Running AI Workflows

ai, systems, orchestration, streaming, backend, snaptrude

How we went from a monolithic AI agent that took up to 30 minutes and couldn't be interrupted, to a modular workflow engine with real-time streaming, mid-generation changes, and session resilience.

It started as one big prompt

Snaptrude lets architects describe buildings in natural language and have an AI generate them inside a live collaborative canvas. Type "design a 300-bed hospital with a central atrium" and the system builds out departments, floor layouts, spatial arrangements, and structural grids.

When we first shipped this, it was a single monolithic LLM call. Everything packed into one prompt, one response, one long wait. For small inputs it worked. For real projects it fell apart. Context windows overflowed. The model lost coherence halfway through. And users sat staring at a spinner for 20 to 30 minutes with no sense of progress.

We deployed it, watched users use it, and the feedback was immediate: this feels like a black box. We knew we had to fix it.

Problem one: users couldn't see anything happening

The first thing users complained about wasn't the quality — it was the silence. No indication of what the AI was doing, no partial results, nothing to look at until the whole thing was done. On longer generations they'd refresh the page thinking it had crashed.

We broke the monolithic call into a sequence of focused steps — site analysis, department generation, space layout, floor assignment, structural grid, and more. Each step had a single responsibility and streamed its output as it ran. The moment a step completed, results appeared on the canvas. Users could watch the building take shape in real time.
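To make the streaming idea concrete, here is a minimal sketch of how one step's progress could be serialized for the browser. It assumes the Server-Sent Events wire format (the transport this post uses for progress events); the event names (`step_started`, `step_output`, `step_completed`) and the helper functions are hypothetical, not our production schema.

```python
import json


def sse_frame(event_id: int, event: str, data: dict) -> str:
    """Serialize one event in the Server-Sent Events wire format."""
    # json.dumps escapes newlines, so the payload always fits on one data line.
    payload = json.dumps(data, separators=(",", ":"))
    # Each field is a "name: value" line; a blank line terminates the event.
    return f"id: {event_id}\nevent: {event}\ndata: {payload}\n\n"


def stream_step(step_name, chunks, start_id=1):
    """Yield frames for one step: started, one per partial result, completed."""
    event_id = start_id
    yield sse_frame(event_id, "step_started", {"step": step_name})
    for chunk in chunks:
        event_id += 1
        yield sse_frame(event_id, "step_output", {"step": step_name, "chunk": chunk})
    event_id += 1
    yield sse_frame(event_id, "step_completed", {"step": step_name})


if __name__ == "__main__":
    for frame in stream_step("site_analysis", ["zoning ok", "setbacks ok"]):
        print(frame, end="")
```

Giving every frame an increasing `id` costs almost nothing here and is what makes resuming a dropped stream possible later.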

I built the workflow engine that orchestrates this: it loads step definitions, resolves dependencies, executes steps in the right order, and emits progress events to the frontend via Server-Sent Events throughout. The system spans four phases (pre-processing, core design, storey assignment, and final layout), with several steps running in parallel where dependencies allow. End-to-end generation time dropped by a factor of 3 to 4.
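The dependency resolution can be sketched with Python's standard `graphlib` module. This is an illustration, not the engine itself: the step names come from this post, but the dependency edges between them are assumptions made up for the example, and `execution_batches` is a hypothetical helper.

```python
from graphlib import TopologicalSorter

# Hypothetical step graph: each step maps to the steps it depends on.
STEPS = {
    "site_analysis": set(),
    "department_generation": {"site_analysis"},
    "space_layout": {"department_generation"},
    "structural_grid": {"site_analysis"},
    "floor_assignment": {"space_layout", "structural_grid"},
}


def execution_batches(steps):
    """Group steps into batches; steps within a batch can run in parallel."""
    sorter = TopologicalSorter(steps)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    batches = []
    while sorter.is_active():
        # Everything returned here has all of its dependencies completed.
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches


if __name__ == "__main__":
    for number, batch in enumerate(execution_batches(STEPS), start=1):
        print(number, batch)
```

With these assumed edges, `department_generation` and `structural_grid` land in the same batch because both need only `site_analysis`. Running in strict batches is a simplification: an engine can also start each step the moment its own dependencies finish, instead of waiting for the whole batch.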

Users stopped thinking the system had crashed.

Problem two: users always wanted to change something mid-way

Once users could see the building being built, a new problem surfaced. They'd watch the floor plan appear, notice something was off (too many rooms on one floor, wrong department sizes), and want to fix it immediately. In the monolithic system, any change meant cancelling everything and starting over. Another 20-minute wait.

The modular design made this solvable. Each step in the system has declared dependencies. When a user provides feedback mid-generation or after seeing an initial result, a classifier, itself an LLM step, reads the feedback alongside the current generation state and maps it to the specific steps that need to re-run. A space sizing change re-runs only the space generation steps and everything downstream. Everything upstream stays cached.
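Once the classifier has named the affected steps, working out what else must re-run is a graph walk. The sketch below assumes the classifier's output is simply a list of step names; the graph edges and the function names (`steps_to_rerun`, `split_cache`) are hypothetical and only illustrate the idea.

```python
from collections import deque

# Hypothetical step graph: each step maps to the steps it depends on.
DEPENDS_ON = {
    "site_analysis": set(),
    "department_generation": {"site_analysis"},
    "space_layout": {"department_generation"},
    "structural_grid": {"site_analysis"},
    "floor_assignment": {"space_layout", "structural_grid"},
}


def steps_to_rerun(depends_on, changed):
    """Return the changed steps plus everything downstream of them."""
    # Invert the graph: step -> steps that consume its output.
    dependents = {step: set() for step in depends_on}
    for step, deps in depends_on.items():
        for dep in deps:
            dependents[dep].add(step)

    stale = set(changed)
    queue = deque(changed)
    while queue:  # breadth-first walk over downstream steps
        current = queue.popleft()
        for child in dependents[current]:
            if child not in stale:
                stale.add(child)
                queue.append(child)
    return stale


def split_cache(depends_on, changed):
    """Split all steps into (stale, cached) for a given set of changed steps."""
    stale = steps_to_rerun(depends_on, changed)
    return stale, set(depends_on) - stale


if __name__ == "__main__":
    stale, cached = split_cache(DEPENDS_ON, ["space_layout"])
    print("re-run:", sorted(stale))
    print("cached:", sorted(cached))
```

Under these assumed edges, a change to `space_layout` invalidates only `space_layout` and `floor_assignment`; the other three steps keep their cached outputs.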

What previously meant a full restart now takes 2 to 3 minutes. Users started iterating on buildings the way you'd iterate on a document — make a change, see the result, make another change.

Problem three: connections dropped and everything was lost

The third problem was resilience. These generation workflows run for a long time. Long enough that network conditions change. A user on a mobile connection, a brief VPN dropout, a laptop going to sleep — any of these would kill the session and force a full restart.

We made the entire execution state persistent. Every step's output, every event emitted, every position in the workflow is stored in PostgreSQL as it happens. When a client reconnects, it sends its last known event position. The engine replays everything the client missed and resumes the live stream from where the client left off.
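The replay mechanism boils down to an append-only event log with a monotonically increasing sequence number. The sketch below uses an in-memory SQLite database purely as a runnable stand-in for PostgreSQL; the table layout, column names, and function names are assumptions for illustration, not our actual schema.

```python
import json
import sqlite3


def open_log():
    """Create an in-memory event log (stand-in for the PostgreSQL table)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events ("
        " seq INTEGER PRIMARY KEY AUTOINCREMENT,"
        " session_id TEXT NOT NULL,"
        " kind TEXT NOT NULL,"
        " payload TEXT NOT NULL)"
    )
    return conn


def append_event(conn, session_id, kind, payload):
    """Persist an event before it is streamed; return its sequence number."""
    cur = conn.execute(
        "INSERT INTO events (session_id, kind, payload) VALUES (?, ?, ?)",
        (session_id, kind, json.dumps(payload)),
    )
    conn.commit()
    return cur.lastrowid


def replay_after(conn, session_id, last_seen):
    """Return, in order, every event the client has not seen yet."""
    rows = conn.execute(
        "SELECT seq, kind, payload FROM events"
        " WHERE session_id = ? AND seq > ? ORDER BY seq",
        (session_id, last_seen),
    )
    return [(seq, kind, json.loads(payload)) for seq, kind, payload in rows]


if __name__ == "__main__":
    conn = open_log()
    append_event(conn, "s1", "step_started", {"step": "site_analysis"})
    append_event(conn, "s1", "step_completed", {"step": "site_analysis"})
    # The client saw event 1, dropped off, and reconnects asking for the rest.
    print(replay_after(conn, "s1", last_seen=1))
```

One way a client can report its position: when SSE frames carry an `id` field, the browser's `EventSource` sends the last one it received in a `Last-Event-ID` header when it reconnects automatically. Whether to rely on that header or pass the position explicitly is a design choice this sketch leaves open.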

A user dropping off for 30 seconds during a 20-minute generation picks up exactly where they left off. The canvas state, the in-progress step, the streaming output — all restored transparently. Most users never even knew the connection had dropped.

Takeaways

The AI generation itself was not the hard part. The hard part was making it feel like a product — responsive, interruptible, and resilient. The three problems we solved were all user experience problems that had engineering solutions underneath.

Watching users go from frustrated to confident with the tool was the best signal that we'd gotten it right.


