Exploring .NET for Machine Learning Applications

"Use Python for ML, .NET for production" was reasonable advice in 2020. It's outdated in 2026. ML.NET, ONNX runtime, and the new Microsoft.Extensions.AI abstractions have matured to the point where a substantial chunk of production ML workloads — tabular inference, computer vision, embeddings, and even small language models — runs perfectly inside .NET services with no Python in the path.

This is the practical .NET ML use-case map for 2026. Five patterns we see in production, ranked by deployability, with honest lines on where the Python ecosystem still has the edge.


The .NET ML ecosystem in 2026

ML.NET

Tabular ML, time-series, recommendations, anomaly detection. Trains AND infers natively in .NET.

✅ Cross-framework

ONNX Runtime

Import PyTorch / TensorFlow / scikit-learn models. Inference in .NET with full hardware acceleration.

✅ AI abstractions

Microsoft.Extensions.AI

Unified IChatClient + IEmbeddingGenerator across OpenAI, Anthropic, Ollama, ONNX-local. Swap providers without rewriting.

🟡 Bring your own

Large-model training

Training transformers / large vision models still belongs in Python on GPU clusters. .NET does inference; Python does training.

🟡 Bring your own

Vector database

Embeddings live in pgvector / Qdrant / Pinecone / Milvus. .NET clients exist for all of them.

🟡 Bring your own

Frontier LLM hosting

GPT-4-class and frontier-tier models live behind provider APIs. Microsoft.Extensions.AI abstracts the client.

Quick reference: five .NET ML patterns

  • Embedded Tabular ML

Fraud scoring, churn prediction, lead scoring, recommendation systems, dynamic pricing. The bulk of production ML in B2B SaaS (the proverbial 80%) is structured-data classification or regression, which is exactly what ML.NET was designed for. Train once, embed the model, infer in milliseconds inside your existing .NET service.

// Training pipeline: runs offline, output is a single .zip model artifact
// (needs the Microsoft.ML and Microsoft.ML.FastTree packages)
using Microsoft.ML;

var ml = new MLContext(seed: 42);
var data = ml.Data.LoadFromTextFile<Transaction>("transactions.csv", separatorChar: ',', hasHeader: true);

// One-hot encode the categorical column, then pack every input into one "Features" vector
var pipeline = ml.Transforms.Categorical.OneHotEncoding("Merchant")
    .Append(ml.Transforms.Concatenate("Features",
        nameof(Transaction.Amount), nameof(Transaction.HourOfDay),
        nameof(Transaction.Merchant), nameof(Transaction.CardTenure)))
    .Append(ml.BinaryClassification.Trainers.FastTree(
        labelColumnName: nameof(Transaction.IsFraud),
        featureColumnName: "Features",
        numberOfTrees: 200));

var model = pipeline.Fit(data);
ml.Model.Save(model, data.Schema, "fraud-model.zip");

// Runtime: load once, reuse for the life of the process (singleton in DI)
public class FraudScorer
{
    private readonly PredictionEngine<Transaction, FraudPrediction> _engine;
    private readonly object _gate = new();

    public FraudScorer(MLContext ml, IConfiguration cfg)
    {
        var model = ml.Model.Load(cfg["Models:Fraud"]!, out _);
        _engine = ml.Model.CreatePredictionEngine<Transaction, FraudPrediction>(model);
    }

    // PredictionEngine is not thread-safe, so a shared instance needs a lock;
    // PredictionEnginePool from Microsoft.Extensions.ML is the usual alternative under load.
    public float ScoreTransaction(Transaction t)
    {
        lock (_gate) { return _engine.Predict(t).Probability; }
    }
}

A FastTree model with 200 trees scores a row in under 100 microseconds on a modern CPU, which is typically faster than the SQL query that fetches the features you're scoring. No GPU required; it runs on any ASP.NET Core hosting tier, and even the $9.49 Developer plan is plenty for thousands of scores per second.
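The training and scoring snippets above reference Transaction and FraudPrediction without showing them. One plausible shape is sketched below; the column order in transactions.csv is an assumption, and only the property names used by the pipeline come from the code above.

```csharp
using Microsoft.ML.Data;

// Input row. The LoadColumn indices assume a hypothetical CSV layout:
// Amount, HourOfDay, Merchant, CardTenure, IsFraud.
public class Transaction
{
    [LoadColumn(0)] public float Amount { get; set; }
    [LoadColumn(1)] public float HourOfDay { get; set; }
    [LoadColumn(2)] public string Merchant { get; set; } = "";
    [LoadColumn(3)] public float CardTenure { get; set; }
    [LoadColumn(4)] public bool IsFraud { get; set; }   // label, only meaningful at training time
}

// Output row. Binary classifiers in ML.NET emit PredictedLabel, Score, and
// (for calibrated trainers such as FastTree) Probability columns.
public class FraudPrediction
{
    [ColumnName("PredictedLabel")] public bool IsFraud { get; set; }
    public float Probability { get; set; }
    public float Score { get; set; }
}
```

The numeric inputs are declared as float because Concatenate requires every source column to share one type.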

  • ONNX Cross-Framework Inference

Your data team trains models in PyTorch, TensorFlow, or scikit-learn and exports them to ONNX. You load and infer in .NET with numerically equivalent results (matching to within floating-point tolerance), with no Python in the production path and no separate inference cluster.

# Python side (training pipeline): export the trained model to ONNX
import torch
import torch.onnx

# Assume model + dummy input tensor exist
torch.onnx.export(
    model,                  # PyTorch model
    dummy_input,            # tensor with the right shape
    "churn-predictor.onnx",
    input_names=["features"],
    output_names=["probability"],
    dynamic_axes={"features": {0: "batch_size"}},
    opset_version=18,
)

// .NET side: load the .onnx file, infer with full SIMD / hardware acceleration
using Microsoft.ML.OnnxRuntime;

public class ChurnPredictor : IDisposable
{
    private readonly InferenceSession _session;

    public ChurnPredictor(IConfiguration cfg)
    {
        var opts = new SessionOptions { EnableMemoryPattern = true };
        opts.EnableCpuMemArena = true;
        _session = new InferenceSession(cfg["Models:ChurnPath"]!, opts);
    }

    public float Predict(float[] features)
    {
        // Shape [1, n]: a batch containing a single row
        using var input = OrtValue.CreateTensorValueFromMemory(
            features, new long[] { 1, features.Length });
        var inputs = new Dictionary<string, OrtValue> { ["features"] = input };

        // RunOptions wraps a native handle, so dispose it with the call
        using var runOptions = new RunOptions();
        using var results = _session.Run(runOptions, inputs, _session.OutputNames);
        return results[0].GetTensorDataAsSpan<float>()[0];
    }

    public void Dispose() => _session.Dispose();
}

ONNX covers tree ensembles, neural nets, transformers up to a few billion parameters, vision models, embedding models, and audio models. The ecosystem is broad enough that ~95% of production-trained models export cleanly. The edge cases are very recent research architectures, custom CUDA kernels, and model-specific quantization formats.
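A useful habit after any export is a parity check: run a handful of inputs through both the Python model and the .NET session and compare the outputs with a tolerance rather than exact equality. The sketch below assumes the reference outputs have already been saved from the Python side; AllClose is a hypothetical helper name and the numbers are made up.

```csharp
using System;

// True when every element of `actual` is within absTol + relTol * |expected|
// of the matching reference value. NaN anywhere counts as a mismatch.
bool AllClose(ReadOnlySpan<float> expected, ReadOnlySpan<float> actual,
              float relTol = 1e-4f, float absTol = 1e-5f)
{
    if (expected.Length != actual.Length) return false;
    for (int i = 0; i < expected.Length; i++)
    {
        float diff = MathF.Abs(expected[i] - actual[i]);
        if (float.IsNaN(diff) || diff > absTol + relTol * MathF.Abs(expected[i]))
            return false;
    }
    return true;
}

// Reference outputs from Python vs. outputs from the .NET session (illustrative values).
float[] pythonOutputs = { 0.1234567f, 0.9876543f, 0.5000000f };
float[] dotnetOutputs = { 0.1234570f, 0.9876540f, 0.5000002f };

Console.WriteLine(AllClose(pythonOutputs, dotnetOutputs)
    ? "outputs match within tolerance"
    : "outputs diverge, check the export");
```

The tolerances are a starting point only; quantized or reduced-precision models usually need looser bounds.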

  • AutoML for Non-Experts

Most B2B teams don't have a data scientist on staff. Microsoft.ML.AutoML automates the model-selection step so a backend engineer can ship a working prediction model in an afternoon:

using Microsoft.ML;
using Microsoft.ML.AutoML;

var ml = new MLContext(seed: 42);
var data = ml.Data.LoadFromTextFile<Lead>("leads.csv", hasHeader: true, separatorChar: ',');

// AutoML tries multiple algorithms, picks the best
var settings = new BinaryExperimentSettings
{
    MaxExperimentTimeInSeconds = 600,
    OptimizingMetric = BinaryClassificationMetric.AreaUnderRocCurve,
    CacheDirectoryName = "automl-cache"
};

var experiment = ml.Auto().CreateBinaryClassificationExperiment(settings);
var result = experiment.Execute(data, labelColumnName: nameof(Lead.Converted));

Console.WriteLine($"Best trainer: {result.BestRun.TrainerName}");
Console.WriteLine($"AUC: {result.BestRun.ValidationMetrics.AreaUnderRocCurve:F3}");

// Save and load the best model just like a hand-tuned one
ml.Model.Save(result.BestRun.Model, data.Schema, "lead-scorer.zip");

AutoML doesn't beat a hand-tuned model from an expert team — but it routinely lands within 2-3% of expert performance, in 1% of the time. For most non-mission-critical predictions (lead scoring, content recommendation, anomaly thresholds), that's a strong tradeoff.
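The experiment above optimizes area under the ROC curve, so it helps to know what that number measures. The sketch below computes it directly from its pairwise definition on made-up data; it is an O(n²) illustration of the metric, not how ML.NET computes it internally, and RocAuc is a hypothetical helper name.

```csharp
using System;

// ROC AUC as a probability: the chance that a randomly chosen positive example
// receives a higher score than a randomly chosen negative one (ties count as half).
double RocAuc(bool[] labels, double[] scores)
{
    if (labels.Length != scores.Length) throw new ArgumentException("Length mismatch.");
    long pairCount = 0;
    double wins = 0;
    for (int i = 0; i < labels.Length; i++)
    {
        if (!labels[i]) continue;                 // i ranges over positives
        for (int j = 0; j < labels.Length; j++)
        {
            if (labels[j]) continue;              // j ranges over negatives
            pairCount++;
            if (scores[i] > scores[j]) wins += 1.0;
            else if (scores[i] == scores[j]) wins += 0.5;
        }
    }
    if (pairCount == 0)
        throw new ArgumentException("Need at least one positive and one negative example.");
    return wins / pairCount;
}

// Four leads: did they convert, and what did the model predict?
bool[] converted = { true, false, true, false };
double[] predicted = { 0.9, 0.8, 0.4, 0.1 };

// Three of the four positive/negative pairs are ranked correctly.
Console.WriteLine(RocAuc(converted, predicted));
```

An AUC of 0.5 means the ranking is no better than chance and 1.0 means every converter outranks every non-converter, which is why it suits lead scoring, where the ordering matters more than the raw probability.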

  • Computer Vision Inference

YOLO for object detection, ResNet for classification, ViT for fine-grained visual tasks, CLIP for image-text embeddings. All have ONNX exports; all run in .NET with first-class hardware acceleration.

// Object detection: YOLOv8-nano on incoming images
// (ImageSharp for decoding, Microsoft.ML.OnnxRuntime for inference)
public class ObjectDetector(IConfiguration cfg) : IDisposable
{
    private readonly InferenceSession _session =
        new(cfg["Models:Yolov8Nano"]!,
            new SessionOptions { EnableCpuMemArena = true });

    public List<Detection> Detect(byte[] imageBytes)
    {
        using var img = Image.Load<Rgb24>(imageBytes);
        img.Mutate(c => c.Resize(640, 640));

        // Convert image to NCHW float tensor
        var tensor = new DenseTensor<float>(new[] { 1, 3, 640, 640 });
        // ... (pixel-to-tensor copy omitted for brevity)

        // OrtValue wraps managed memory through Memory<T>, not Span<T>
        using var input = OrtValue.CreateTensorValueFromMemory(
            OrtMemoryInfo.DefaultInstance, tensor.Buffer, new long[] { 1, 3, 640, 640 });
        using var runOptions = new RunOptions();
        using var results = _session.Run(runOptions,
            new Dictionary<string, OrtValue> { ["images"] = input },
            _session.OutputNames);

        // Detection and PostProcessYoloOutput (box decoding + NMS) are app-specific and not shown
        return PostProcessYoloOutput(results[0].GetTensorDataAsSpan<float>());
    }

    public void Dispose() => _session.Dispose();
}

YOLOv8-nano on CPU: ~30-50 ms per inference. YOLOv8-small: ~80-120 ms. Production use cases include content moderation (reject inappropriate uploads at the API layer), receipt OCR pre-processing, manufacturing defect detection, automated tagging. The ~4 GB RAM ceiling on ASP.NET Professional ($27.49) comfortably holds the model + headroom for typical concurrent inference.
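The pixel-to-tensor copy that the detector above leaves out is mostly a layout change: image libraries hand back interleaved RGB bytes (HWC), while the model expects one plane per channel (CHW) as floats. A minimal sketch, assuming the resized image has already been copied into a tightly packed RGB byte buffer and that the model wants values scaled to [0, 1], as standard YOLOv8 ONNX exports do; ToNchw is a hypothetical helper name.

```csharp
using System;

// Converts interleaved RGB bytes (R,G,B,R,G,B,...) into a planar float array:
// all red values first, then all green, then all blue, each scaled to [0, 1].
float[] ToNchw(ReadOnlySpan<byte> rgb, int width, int height)
{
    if (rgb.Length != width * height * 3)
        throw new ArgumentException("Buffer size does not match width * height * 3.");

    int plane = width * height;
    var tensor = new float[3 * plane];
    for (int y = 0; y < height; y++)
    {
        for (int x = 0; x < width; x++)
        {
            int px = y * width + x;     // pixel index within one channel plane
            int src = px * 3;           // offset of this pixel's R byte
            tensor[px] = rgb[src] / 255f;                   // R plane
            tensor[plane + px] = rgb[src + 1] / 255f;       // G plane
            tensor[2 * plane + px] = rgb[src + 2] / 255f;   // B plane
        }
    }
    return tensor;
}

// A 2x1 image: one pure-red pixel followed by one pure-green pixel.
byte[] pixels = { 255, 0, 0, 0, 255, 0 };
float[] chw = ToNchw(pixels, width: 2, height: 1);
Console.WriteLine(string.Join(" ", chw));   // R plane, then G plane, then B plane
```

The returned array has the same element order as a [1, 3, H, W] tensor, so it can be passed to OrtValue.CreateTensorValueFromMemory together with that shape.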

  • Provider-Agnostic AI Clients (Microsoft.Extensions.AI)

For workloads where you need a hosted LLM (RAG over your docs, structured extraction, classification at scale), Microsoft.Extensions.AI gives you one abstraction: IChatClient. Swap between OpenAI, Anthropic, AWS Bedrock, local Ollama, or a small ONNX model — same interface, configuration-driven.

// Register a provider; reading the choice from config switches it without code changes
builder.Services.AddChatClient(sp =>
    new OpenAIClient(builder.Configuration["AI:ApiKey"]!)
        .GetChatClient("gpt-4-class-model")
        .AsIChatClient());

// Or use Ollama for a local model
// builder.Services.AddChatClient(new OllamaChatClient(new Uri("http://localhost:11434"), "phi-3-mini"));

// Inject and call: no provider-specific code
public class SupportTicketClassifier(IChatClient chat)
{
    public async Task<string> ClassifyAsync(string subject, string body)
    {
        // The generic overload asks for JSON matching TicketClassification and deserializes it
        var response = await chat.GetResponseAsync<TicketClassification>(
            new[]
            {
                new ChatMessage(ChatRole.System,
                    "Classify support tickets. Output JSON with category and urgency."),
                new ChatMessage(ChatRole.User, $"Subject: {subject}\n\nBody: {body}")
            });

        return response.Result.Category;
    }
}

public record TicketClassification(string Category, string Urgency);

IEmbeddingGenerator<string, Embedding<float>> abstracts every embedding provider. Pair with pgvector on your SQL Server-adjacent Postgres, or a managed vector DB like Pinecone / Qdrant. RAG over a B2B knowledge base is a 200-line .NET service.

When to reach for Python instead

🟡 Cutting-edge research models

Papers ship with Python reference implementations. Custom CUDA kernels, novel architectures, exotic loss functions — Python first, ONNX export later (sometimes years later, sometimes never).

🟡 Notebook-driven exploration

Data scientists genuinely live in Jupyter. .NET Interactive exists but the muscle memory + library ecosystem (pandas, polars, seaborn, statsmodels) is firmly Python.

🟡 Distributed training

Ray, Horovod, FSDP — the multi-GPU / multi-node training stack is Python. Even Microsoft's frontier-model training pipelines are PyTorch + DeepSpeed.

The healthy split for most B2B teams: train in Python, ship to .NET. Your data team's iteration speed stays high; your production team's deployment surface stays single-stack.

Reference RAG architecture in .NET
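A minimal sketch of the query-time half of this architecture, following the three steps described in the FAQ further down: embed the question, retrieve the top-K chunks, and send chunks plus question to a chat client. Everything here is a stand-in: the in-memory list plays the role of the vector database, the three-dimensional vectors are made up in place of real embeddings, and Cosine, TopK, and BuildPrompt are hypothetical helper names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Cosine similarity between two vectors of equal length.
double Cosine(float[] a, float[] b)
{
    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
}

// Brute-force retrieval: rank every chunk by similarity to the query vector.
List<string> TopK(float[] query, List<(string Text, float[] Vector)> chunks, int k) =>
    chunks.OrderByDescending(c => Cosine(query, c.Vector))
          .Take(k)
          .Select(c => c.Text)
          .ToList();

// Assemble the grounded prompt that would be sent to the chat client.
string BuildPrompt(string question, IEnumerable<string> context) =>
    "Answer using only the context below.\n\nContext:\n- "
    + string.Join("\n- ", context)
    + "\n\nQuestion: " + question;

var knowledgeBase = new List<(string Text, float[] Vector)>
{
    ("Refunds are processed within 5 business days.", new[] { 1.0f, 0.0f, 0.0f }),
    ("Invoices are emailed on the first of each month.", new[] { 0.0f, 1.0f, 0.0f }),
    ("Refund requests need an order ID.", new[] { 0.9f, 0.1f, 0.0f }),
};

float[] questionVector = { 1.0f, 0.0f, 0.0f };   // pretend embedding of the question
List<string> hits = TopK(questionVector, knowledgeBase, k: 2);
Console.WriteLine(BuildPrompt("How long do refunds take?", hits));
```

In a real service the vectors would come from an IEmbeddingGenerator, the ranking would be pushed down to the vector database, and the assembled prompt would go to an IChatClient.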

Production readiness checklist

✅ Inference Performance

PredictionEnginePool (a bare PredictionEngine is not thread-safe) or InferenceSession as singleton

Batch inference where supported (10-100× throughput gain)

Profile P95 latency, not just mean

Circuit breaker around external LLM API calls
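To illustrate why the checklist asks for P95 rather than the mean, the sketch below computes a nearest-rank percentile over made-up latency samples. Percentile is a hypothetical helper name; in production these numbers would normally come from your metrics system rather than hand-rolled code.

```csharp
using System;
using System.Linq;

// Nearest-rank percentile: the smallest sample such that at least p percent
// of all samples are less than or equal to it.
double Percentile(double[] samples, double p)
{
    if (samples.Length == 0) throw new ArgumentException("No samples.", nameof(samples));
    double[] sorted = samples.OrderBy(s => s).ToArray();
    int rank = (int)Math.Ceiling(p / 100.0 * sorted.Length);   // 1-based rank
    return sorted[Math.Clamp(rank, 1, sorted.Length) - 1];
}

// Made-up latencies in milliseconds: 18 fast requests and 2 slow ones.
double[] latenciesMs = Enumerable.Repeat(2.0, 18)
    .Concat(Enumerable.Repeat(120.0, 2))
    .ToArray();

Console.WriteLine($"mean = {latenciesMs.Average():F1} ms");
Console.WriteLine($"p95  = {Percentile(latenciesMs, 95):F1} ms");
```

With these samples the mean is 13.8 ms while P95 is 120 ms: one request in ten is slow, and only the percentile shows it.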

✅ Monitoring & Drift

Log every prediction + label (when known) for drift detection

Alert on accuracy drop or input distribution shift

Sample inputs to S3 for offline retraining

Track model version in OpenTelemetry traces

✅ Security

Model URLs signed with short TTLs

Verify model checksum before loading

Input validation and constrained outputs to limit prompt injection (validation alone does not prevent it)

PII redacted before sending to external LLM
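The checksum item in the security list can be as small as the sketch below, which hashes the model file with SHA-256 and compares it against an expected digest before anything is loaded. ChecksumMatches is a hypothetical helper name, the temp file is a stand-in for a real model, and in practice the expected digest would be published by the training pipeline.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;
using System.Text;

// True only when the file's SHA-256 digest equals the expected hex digest.
bool ChecksumMatches(string modelPath, string expectedSha256Hex)
{
    using FileStream stream = File.OpenRead(modelPath);
    byte[] actual = SHA256.HashData(stream);                      // Stream overload needs .NET 7+
    byte[] expected = Convert.FromHexString(expectedSha256Hex);
    return CryptographicOperations.FixedTimeEquals(actual, expected);
}

// Demo with a stand-in "model" file containing the bytes of "abc".
string demoPath = Path.Combine(Path.GetTempPath(), "demo-model.bin");
File.WriteAllBytes(demoPath, Encoding.ASCII.GetBytes("abc"));

// Well-known SHA-256 digest of "abc".
const string knownDigest = "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad";

Console.WriteLine(ChecksumMatches(demoPath, knownDigest)
    ? "checksum ok, safe to load"
    : "checksum mismatch, refusing to load");
```

Running the check once at startup, before constructing the InferenceSession or loading the ML.NET model, keeps it off the request path.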

Hosting-Layer Capabilities Adaptive Provides

Need | What's Included
Modern .NET runtimes | .NET 8 LTS and .NET 10 LTS; both support ML.NET, ONNX Runtime, and Microsoft.Extensions.AI
Database | Real SQL Server 2022, with Always Encrypted on training-data PII and ledger tables for audit
Memory headroom | 1 GB (Developer) / 2 GB (Business) / 4 GB (Professional), sized for typical ML.NET + ONNX inference
Network | TLS 1.3 termination, FREE SSL on every site, post-quantum-ready on .NET 10 LTS
Isolation | Dedicated IIS Application Pools per site, so one model spike can't starve other apps
What you bring | Training cluster, vector database, frontier-LLM API keys, GPU inference (none of these run on AWH)
Infrastructure | AWS US-East data center, 99.99% uptime SLA, 30-day money-back guarantee

Choose a plan

ASP.NET Developer

$9.49/mo

Lightweight tabular ML, lead scoring, recommendation engines. Up to ~100 MB model fits comfortably.

View Developer plan →

Popular

ASP.NET Business

$17.49/mo

Vision inference, RAG orchestration, multiple embedded models. Up to ~500 MB total model footprint.

View Business plan →

ASP.NET Professional

$27.49/mo

Multiple ML services or larger models (small LLMs via ONNX). Up to ~2 GB working set.

View Professional plan →

Frequently Asked Questions

Can I train models on Adaptive's hosting?

Light ML.NET training on small-to-medium datasets works fine. For real training workloads — gradient boosting on millions of rows, neural network training, anything GPU-bound — train on a dedicated platform (Azure ML, AWS SageMaker, Databricks, or your laptop with a GPU) and deploy the trained model to AWH. The training-vs-serving split is the standard production pattern.

Do I need a GPU for inference?

For most tabular ML and small computer-vision models, no. On CPU, tabular models score in well under a millisecond and small vision models such as YOLOv8-nano in tens of milliseconds. For larger transformer models (LLMs, large vision models with 1B+ parameters), a GPU helps a lot. AWH doesn't offer GPU compute; for GPU inference, point your AWH-hosted orchestrator at a managed LLM API (OpenAI, Anthropic, Bedrock) or a dedicated GPU inference cluster.

What's Microsoft.Extensions.AI and why should I care?

It's a set of standard .NET abstractions for chat clients, embedding generators, and AI services, introduced in the .NET 9 timeframe and usable from .NET 8 as well. The same code works against OpenAI, Anthropic, Azure OpenAI, AWS Bedrock, local Ollama, or a small ONNX model, so provider choice becomes a configuration decision, not a code rewrite. Especially valuable for B2B SaaS that needs to support customer-bring-your-own-key scenarios.

How do I do RAG (retrieval-augmented generation) in .NET?

Three components: (1) an embedding generator (via Microsoft.Extensions.AI) turns documents into vectors, (2) a vector DB (pgvector on Postgres, Qdrant, Pinecone) stores them, (3) at query time, embed the question, retrieve top-K chunks, send chunks + question to a chat client. The orchestration is ~200 lines of .NET — see the reference architecture above. Adaptive hosts the orchestration; the vector DB lives elsewhere.

What about Semantic Kernel?

Microsoft's higher-level orchestration framework, which plugs into the Microsoft.Extensions.AI abstractions. Useful if you're building agent-style applications with multiple tool calls and complex workflows. For straightforward RAG or single-call inference, plain Microsoft.Extensions.AI is simpler. Both run on the same .NET runtime on the same AWH hosting.

How do I monitor model drift in production?

Log every prediction (input, output, model version) to a separate analytics table or stream. Periodically compare the input distribution to the training distribution (data drift), and compare prediction accuracy on labeled feedback to the baseline (concept drift). Alert when either drifts past its threshold. Most teams retrain quarterly; some monthly. Tools like Evidently AI are Python-first, but the most common drift metrics, such as the population stability index, are simple enough to compute directly in .NET.
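As one concrete data-drift check, the sketch below computes the population stability index (PSI) between a training sample and a live sample of a single numeric feature. The equal-width binning, the 1e-6 floor for empty bins, and the helper name Psi are all assumptions of this sketch; a commonly cited rule of thumb treats PSI above roughly 0.25 as a significant shift, but thresholds should be tuned per feature.

```csharp
using System;
using System.Linq;

// PSI = sum over bins of (actual - expected) * ln(actual / expected),
// where expected/actual are the fractions of samples falling in each bin.
double Psi(double[] baseline, double[] current, int bins = 10)
{
    double min = baseline.Min(), max = baseline.Max();
    double width = (max - min) / bins;

    // Fraction of values per equal-width bin; out-of-range values go to the edge bins.
    double[] Fractions(double[] values)
    {
        var counts = new double[bins];
        foreach (double v in values)
        {
            int bin = width > 0 ? (int)((v - min) / width) : 0;
            counts[Math.Clamp(bin, 0, bins - 1)]++;
        }
        // Floor empty bins so the logarithm stays finite.
        return counts.Select(c => Math.Max(c / values.Length, 1e-6)).ToArray();
    }

    double[] expected = Fractions(baseline);
    double[] actual = Fractions(current);
    return expected.Zip(actual, (e, a) => (a - e) * Math.Log(a / e)).Sum();
}

// Made-up feature values: training data spans 0..99, live data has shifted up by 50.
double[] trainingAmounts = Enumerable.Range(0, 100).Select(n => (double)n).ToArray();
double[] liveAmounts = trainingAmounts.Select(n => n + 50).ToArray();

Console.WriteLine($"PSI, same data:    {Psi(trainingAmounts, trainingAmounts):F3}");
Console.WriteLine($"PSI, shifted data: {Psi(trainingAmounts, liveAmounts):F3}");
```

Identical samples give a PSI of zero, and the shifted sample lands far above the 0.25 rule of thumb, which is the signal an alert would key on.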

Can ML.NET handle real-time anomaly detection on streaming data?

Yes — ML.NET ships built-in spike and change-point detection transformers. Combined with a BackgroundService reading from a queue or MQTT broker, you can stream-process tens of thousands of events per second per service instance. We use this pattern in the time-series IoT scenario covered in our .NET IoT use cases article (strategy #4).

What if I need bigger models than fit in 4 GB?

Several options: (1) quantize the model. INT8 quantization shrinks FP32 weights to roughly a quarter of their size (half, if the baseline is FP16) and INT4 to roughly an eighth, usually with modest accuracy loss for INT8 and somewhat more for INT4, so the model fits in less RAM; (2) move the model behind a separate inference service (Azure ML endpoint, dedicated VM with GPU) and have your AWH-hosted service call it; (3) use a hosted LLM API instead of self-hosting. For most B2B use cases, options 1 + 3 cover the field without needing dedicated GPU infra.
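The effect of quantization on footprint is simple arithmetic: parameters times bits per weight. The sketch below works it through for a hypothetical 3-billion-parameter model; it counts weights only and ignores activations, runtime overhead, and the small per-block scale factors that quantized formats add. WeightSizeGb is a hypothetical helper name.

```csharp
using System;

// Weights-only estimate: parameter count * bits per weight, in gigabytes (10^9 bytes).
double WeightSizeGb(double parameterCount, int bitsPerWeight) =>
    parameterCount * bitsPerWeight / 8 / 1e9;

const double parameterTotal = 3e9;   // hypothetical 3-billion-parameter model

foreach (var (name, bits) in new[] { ("FP32", 32), ("FP16", 16), ("INT8", 8), ("INT4", 4) })
{
    Console.WriteLine($"{name}: {WeightSizeGb(parameterTotal, bits):F1} GB");
}
```

That works out to 12 GB at FP32, 6 GB at FP16, 3 GB at INT8, and 1.5 GB at INT4, so only the quantized variants of a model this size come in under a 4 GB memory budget.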

Bottom line

.NET's ML story in 2026 is much stronger than the "use Python" reputation suggests. ML.NET handles the tabular 80%, ONNX Runtime imports models trained in PyTorch, TensorFlow, or scikit-learn, Microsoft.Extensions.AI abstracts the major hosted LLM providers, and embedded inference runs faster than the SQL query you're enriching. The only places Python is still clearly ahead: training large models, distributed training, notebook-driven exploration, cutting-edge research. For deployment, .NET is now a first-class ML runtime.

On Adaptive Web Hosting, the .NET ML primitives — modern runtimes, real SQL Server 2022 for training data, dedicated IIS Application Pools with predictable RAM budgets, FREE SSL, and post-quantum-ready TLS on .NET 10 LTS — are included on every tier. ASP.NET Developer ($9.49/mo) for lightweight tabular models, ASP.NET Business ($17.49/mo) for vision + RAG orchestration, ASP.NET Professional ($27.49/mo) for multi-model deployments. Every plan ships with a 30-day money-back guarantee.

If you're integrating ML into a Blazor UI, our enterprise Blazor patterns covers the dashboard side. For the cloud-native architecture surrounding the ML service, see our .NET cloud-native use cases (which covers ONNX inference at pattern #5). For securing the API exposing your model, ASP.NET Core API security strategies applies directly. View all plans or talk to an ML engineer about a specific scenario.
