Real-Time AI Streaming in Blazor: SignalR Patterns for .NET 10
The difference between an AI feature users tolerate and one they love is usually 3 seconds. That's the rough threshold where waiting for a full LLM response feels broken. Show the first token in 400 ms and stream the rest at reading speed, and the same response feels instant — even though total wall-clock time is identical. Streaming isn't a nice-to-have. It's the UX.
This guide is the deep technical companion to a chat or assistant UI: how Blazor Server and .NET 10 stream AI output token by token over SignalR, how to handle backpressure when the model out-produces the UI, what to do when a connection drops mid-response, and how to keep two browser tabs in sync without doubling token costs.
SignalR: persistent transport
.NET 10: LTS runtime
5 streaming patterns
The streaming stack on Windows + IIS
IAsyncEnumerable
Microsoft.Extensions.AI returns IAsyncEnumerable<ChatResponseUpdate>. .NET 10's async stream primitive is the perfect producer for chunked AI output.
✅ Transport
SignalR over WebSocket
Blazor Server keeps a SignalR connection open already. Reuse the existing pipe — no separate channel, no auth handshake per chunk.
✅ UI
Blazor Server components
Reactive re-rendering scoped to the component's state field. No JavaScript bridge, no manual DOM diffing.
🟡 Alternative
Server-Sent Events
Use SSE for non-Blazor clients (mobile app, vanilla JS, third-party API consumers). Built-in Results.Stream + text/event-stream works directly.
🟡 Required
IIS WebSocket support
IIS has supported WebSockets since IIS 8 (Windows Server 2012), but the WebSocket Protocol feature must be installed and enabled. The Plesk control panel toggles it per site if needed.
🟡 Optional
Multi-instance backplane
Azure SignalR Service or Redis backplane if you scale horizontally. Not needed on a single dedicated app pool.
Quick reference: five streaming patterns
- The basic streaming component
Append each chunk to a string field, call StateHasChanged() after each chunk, and Blazor's diff algorithm patches only the text node that changed. The browser sees text appear as if typed. No JS interop, no manual DOM updates.
@page "/assistant"
@using Microsoft.Extensions.AI
@inject IChatClient ChatClient

<div class="response">@_response</div>
<input @bind="_input" />
<button @onclick="StreamResponseAsync" disabled="@_busy">Ask</button>

@code {
    string _input = "";
    string _response = "";
    bool _busy;

    async Task StreamResponseAsync()
    {
        _busy = true;
        _response = "";
        StateHasChanged();
        try
        {
            var messages = new List<ChatMessage> { new(ChatRole.User, _input) };
            await foreach (var update in ChatClient.GetStreamingResponseAsync(messages))
            {
                _response += update.Text;
                StateHasChanged(); // push each chunk to the browser as it arrives
            }
        }
        finally
        {
            _busy = false; // re-enable the button even if the provider call throws
        }
    }
}
This short component is a complete streaming AI UI. A model that produces 50 tokens per second triggers roughly 50 StateHasChanged calls per second, which Blazor's diff algorithm handles comfortably on the server side. Each diff is small (a few characters appended to one text node), and SignalR ships it down the existing connection.
- Throttled re-render for high-throughput models
Some models (especially smaller local ones) produce tokens faster than the network round-trip. Calling StateHasChanged on every token adds overhead for diminishing UX gain — humans don't read at 200 chars/second anyway. The fix: batch chunks and re-render every ~50 ms.
// Requires @using System.Diagnostics and @using System.Text in the component.
async Task StreamThrottledAsync()
{
    // Same message list as in the basic component.
    var messages = new List<ChatMessage> { new(ChatRole.User, _input) };
    var buffer = new StringBuilder();
    var lastRender = Stopwatch.GetTimestamp();
    const int minIntervalMs = 50;

    await foreach (var update in ChatClient.GetStreamingResponseAsync(messages))
    {
        buffer.Append(update.Text);
        var elapsedMs = Stopwatch.GetElapsedTime(lastRender).TotalMilliseconds;
        if (elapsedMs >= minIntervalMs)
        {
            _response = buffer.ToString();
            StateHasChanged();
            lastRender = Stopwatch.GetTimestamp();
        }
    }

    // Final flush in case the last chunk didn't trigger a render
    _response = buffer.ToString();
    StateHasChanged();
}
The visual effect is identical (text still appears smoothly), but the server does at most ~20 renders per second instead of one per token, and SignalR sends fewer, larger diffs instead of many tiny ones. On a busy app pool this materially reduces CPU.
- Cancellation: stop billing when the user leaves
Without explicit cancellation, the streaming loop continues even after the user navigates away — tokens keep flowing from the provider and you keep paying for them. On a chat-heavy app, this can be 10–20% of monthly token spend wasted on responses no human will ever read.
Tie the stream's cancellation token to the component's lifetime via IDisposable and a CancellationTokenSource:
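@implements IDisposable

@code {
    readonly CancellationTokenSource _cts = new();

    async Task StreamWithCancellationAsync()
    {
        var ct = _cts.Token;
        // Same message list as in the basic component.
        var messages = new List<ChatMessage> { new(ChatRole.User, _input) };
        try
        {
            await foreach (var update in ChatClient.GetStreamingResponseAsync(messages, cancellationToken: ct))
            {
                if (ct.IsCancellationRequested) break;
                _response += update.Text;
                StateHasChanged();
            }
        }
        catch (OperationCanceledException)
        {
            // Expected when the component is disposed mid-stream; nothing left to render.
        }
    }

    public void Dispose()
    {
        _cts.Cancel();
        _cts.Dispose();
    }
}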
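When the user navigates away, Blazor disposes the component, which cancels the token, which in turn aborts the HTTP stream to the provider. Token billing stops mid-response. If the user closes the tab instead, disposal waits until the server drops the disconnected circuit (three minutes by default).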
- Resumable streams: recover from disconnects
SignalR auto-reconnects within a few seconds of a network blip. But if the disconnect happened during a streaming response, the partial output is on the server and the client has no way to ask for "the rest." For mission-critical responses (long-form reports, agent outputs), capture the full text server-side and let the client request a resume.
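// Server: persist partial output keyed by message ID.
// StreamingMessage is an assumed minimal shape; adapt it to your own model.
public class StreamingMessage
{
    private readonly StringBuilder _buffer = new();
    public Guid Id { get; init; }

    // StringBuilder is not thread-safe, so appends and reads share a lock.
    public void Append(string chunk) { lock (_buffer) _buffer.Append(chunk); }

    public string ReadFrom(int offset)
    {
        lock (_buffer)
        {
            // Clamp so a stale or oversized offset cannot throw.
            offset = Math.Clamp(offset, 0, _buffer.Length);
            return _buffer.ToString(offset, _buffer.Length - offset);
        }
    }
}

public class StreamingMessageStore
{
    private readonly ConcurrentDictionary<Guid, StreamingMessage> _active = new();

    public StreamingMessage Start(Guid messageId)
    {
        var msg = new StreamingMessage { Id = messageId };
        _active[messageId] = msg;
        return msg;
    }

    public void Append(Guid messageId, string chunk)
    {
        if (_active.TryGetValue(messageId, out var msg))
            msg.Append(chunk);
    }

    // Returns the text past the client's last-known offset, or null if the message is unknown.
    public string? GetPartial(Guid messageId, int offset) =>
        _active.TryGetValue(messageId, out var msg) ? msg.ReadFrom(offset) : null;

    public void Complete(Guid messageId) => _active.TryRemove(messageId, out _);
}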
On reconnect, the Blazor component checks if a streaming message was in progress, asks the store for the portion past its last-known offset, and re-attaches to the stream. For most chat UIs this is overkill — a refresh works fine. For long agent jobs, it's the difference between "the user wasted 5 minutes" and "the user picks up where they left off."
- Multi-tab sync without double-billing
A user opens the chat in two tabs. Both subscribe to the same conversation. When one tab sends a message, both should see the streaming response — but you must not call the LLM twice. The pattern: route the streaming response through a server-side broadcast, with the LLM call running once and fan-out happening at the SignalR layer.
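// One LLM call per message. Each subscriber gets its own Channel, because a single
// Channel delivers each item to exactly one reader rather than to all of them.
public class BroadcastStreamer
{
    private readonly ConcurrentDictionary<Guid, ConcurrentBag<Channel<string>>> _subscribers = new();

    public ChannelReader<string> Subscribe(Guid messageId)
    {
        var channel = Channel.CreateUnbounded<string>();
        _subscribers.GetOrAdd(messageId, _ => new ConcurrentBag<Channel<string>>()).Add(channel);
        return channel.Reader;
    }

    public async Task StreamAsync(Guid messageId, IAsyncEnumerable<string> source)
    {
        var subscribers = _subscribers.GetOrAdd(messageId, _ => new ConcurrentBag<Channel<string>>());
        try
        {
            await foreach (var chunk in source)
                foreach (var channel in subscribers)
                    channel.Writer.TryWrite(chunk); // unbounded channels accept every write
        }
        finally
        {
            _subscribers.TryRemove(messageId, out _);
            foreach (var channel in subscribers)
                channel.Writer.TryComplete(); // ends each subscriber's read loop
        }
    }
}
Each Blazor circuit calls Subscribe before the stream starts and reads from its own ChannelReader. The single LLM call writes every chunk to all subscribers, so both tabs see the full response. A tab that subscribes mid-stream only receives chunks written after it joined. Cost: one LLM round trip. UX: synchronized streaming across N tabs.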
IIS configuration for production
Streaming AI over a persistent WebSocket is the antithesis of serverless. Each connected user holds a circuit; cold starts kill UX; idle timeouts kill long agent runs. Adaptive Web Hosting's dedicated Windows + IIS app pools were built for exactly this: persistent connections, configurable idle timeouts, and predictable per-request resources.
Three IIS settings worth knowing for streaming workloads:
WebSocket module enabled. Windows Server includes the module; Adaptive Web Hosting plans enable it by default. Confirm via Plesk → Apache & nginx Settings → WebSockets is allowed.
App pool idle timeout. Default is 20 minutes. For streaming-heavy apps where some users may sit idle on a long agent task, set it to 0 (no timeout) in the IIS app pool's Advanced Settings.
Application Initialization. Pre-warm the worker process so the first request after a deploy doesn't take 5 seconds to spin up. Plesk has a one-click toggle for this.
SignalR message size limits
SignalR's default MaximumReceiveMessageSize is 32 KB. The limit applies to messages the server receives from the browser, so streamed render batches going out to the client do not count against it, but a long pasted prompt or a large JS interop payload does. Configure it explicitly:
builder.Services.AddServerSideBlazor(o =>
    {
        o.MaxBufferedUnacknowledgedRenderBatches = 50; // back-pressure threshold
    })
    .AddHubOptions(o =>
    {
        o.MaximumReceiveMessageSize = 256 * 1024; // 256 KB ceiling
    });
Hosting recommendations
ASP.NET Business — $17.49/mo
Production AI assistants with moderate traffic. 2 GB per pool gives headroom for many open circuits. Most-common tier.
ASP.NET Professional — $27.49/mo
High-traffic public AI products, multi-tenant streaming platforms. 4 GB per pool, highest priority scheduling.
FAQs
Is SignalR slower than raw WebSockets?
Marginally — SignalR adds a small framing overhead and a fallback negotiation. In exchange you get auto-reconnect, transport fallback, type-safe hubs, and integration with Blazor Server. For 99% of cases, SignalR is the right choice.
Can I stream to a Blazor WebAssembly app instead of Blazor Server?
Yes, but the model API key must stay off the client. Use an API project that holds the key, exposes a SignalR hub or SSE endpoint, and proxies the streaming response. The WASM app subscribes to the hub. More moving parts than Blazor Server, but works.
What about Server-Sent Events?
SSE is one-way (server → client only) and works perfectly for stream-out AI. Minimal API: app.MapGet("/stream", () => Results.Stream(...)) with Content-Type: text/event-stream. Choose SSE for non-Blazor clients or simple HTTP-style integration. Choose SignalR for Blazor + bidirectional needs.
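As a rough illustration of what goes inside that Results.Stream callback, the sketch below formats text chunks as SSE events and writes them to a response stream. SseWriter and its method names are hypothetical helpers, not part of ASP.NET Core or Microsoft.Extensions.AI, and the code only assumes the standard SSE wire format (a "data: " prefix per payload line, a blank line to end each event).

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: turns text chunks into Server-Sent Events.
public static class SseWriter
{
    // Formats one chunk as an SSE event. Each line of the payload gets its own
    // "data: " prefix, and an empty line terminates the event.
    public static string FormatEvent(string text)
    {
        var sb = new StringBuilder();
        var normalized = text.Replace("\r\n", "\n").Replace('\r', '\n');
        foreach (var line in normalized.Split('\n'))
            sb.Append("data: ").Append(line).Append('\n');
        return sb.Append('\n').ToString();
    }

    // Writes each non-empty chunk as its own event and flushes immediately,
    // so the client sees text as soon as it is produced.
    public static async Task WriteAsync(
        Stream body, IAsyncEnumerable<string> chunks, CancellationToken ct = default)
    {
        await foreach (var chunk in chunks.WithCancellation(ct))
        {
            if (chunk.Length == 0) continue;
            await body.WriteAsync(Encoding.UTF8.GetBytes(FormatEvent(chunk)), ct);
            await body.FlushAsync(ct);
        }
    }
}
```

In a minimal API endpoint, this could be wired up as Results.Stream(body => SseWriter.WriteAsync(body, chunks, ct), "text/event-stream"), where chunks is assumed to be the update.Text values projected from GetStreamingResponseAsync and ct is the request's cancellation token.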
How do I avoid duplicate messages on reconnect?
Tag every message with a client-generated GUID. The server stores recently-processed IDs in a ~5-minute LRU cache. On reconnect, the client replays from its last-known position; the server rejects duplicates by ID. Idempotent by construction.
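A minimal sketch of the server-side duplicate check, under some assumptions: RecentMessageIds is a hypothetical class, and it uses a simple time-window set rather than a true LRU cache, which is enough to show the idea. The injectable clock exists only to make the behavior easy to exercise.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical helper: remembers message IDs for a fixed window and reports
// whether an ID has already been processed within that window.
public sealed class RecentMessageIds
{
    private readonly ConcurrentDictionary<Guid, DateTimeOffset> _seen = new();
    private readonly TimeSpan _window;
    private readonly Func<DateTimeOffset> _now;

    public RecentMessageIds(TimeSpan window, Func<DateTimeOffset>? now = null)
    {
        _window = window;
        _now = now ?? (() => DateTimeOffset.UtcNow);
    }

    // Returns true the first time an ID is offered inside the window,
    // false when the same ID is replayed.
    public bool TryAccept(Guid messageId)
    {
        var now = _now();
        Evict(now);
        return _seen.TryAdd(messageId, now);
    }

    // Drops entries older than the window. A linear sweep keeps the sketch
    // short; a busy server would want a cheaper eviction strategy.
    private void Evict(DateTimeOffset now)
    {
        foreach (var entry in _seen)
            if (now - entry.Value > _window)
                _seen.TryRemove(entry); // removes only if the value is unchanged
    }
}
```

A hub method or component handler would call TryAccept(messageId) before starting the LLM call and skip the call when it returns false, with the window set to roughly five minutes as described above.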
Do I need Azure SignalR Service?
Only for multi-instance deployments. On a single dedicated app pool (the standard Adaptive Web Hosting setup), in-process SignalR is the simplest and most efficient option.
What's the practical limit on concurrent streams?
Each Blazor Server circuit uses roughly 250-500 KB depending on component complexity. At 500 KB each, 1,000 circuits need about 500 MB, so a 2 GB pool comfortably holds 1,000+ open circuits with active streams. The real ceiling is usually outbound model API rate limits, not server resources.
Ship it
Streaming is the single biggest UX upgrade you can give an AI feature. .NET 10 + Blazor Server + Microsoft.Extensions.AI ships it natively — no JS, no separate API, no manual transport plumbing. Adaptive Web Hosting's ASP.NET hosting plans are built for exactly this kind of persistent-connection workload on real Windows + IIS, with WebSocket support on by default and SQL Server 2022 included for conversation persistence.