Self-Hosted Coding AI Case Study

Building a Coding AI Portal Without Pretending the Model Is the Product

A local-first AI assistant for potential clients with questions about software architecture, debugging, Firebase cost, MVP scope, and codebase risk.

The original work is not simply running an open model. It is the product layer around it: BVT-specific retrieval, client-intake workflow, evaluation, rate limits, safety rules, Mac-hosted backend infrastructure, and human handoff.


Prototype shell

Client Coding Question Intake

This page currently demonstrates the workflow. The production version connects to a Mac-hosted backend running a local model behind a custom API.

BVT Coding AI

"My Firebase app works, but screens are slow and reads are climbing. Is this an architecture problem?"

A useful assistant should not guess from vibes. It should ask about listener usage, query shape, high-traffic screens, data duplication, and production risk before recommending a rewrite.

Architecture · Firebase Cost · Human Handoff

Generate an Intake Brief

The prototype form takes three inputs: question type, tech stack, and client question. Choose a question type and generate a brief.

Restarted direction

What Is Actually Original Here?

Running Ollama with someone else's model is useful infrastructure. The original project is the consulting-specific AI product built around that model.

BVT Knowledge Layer

Approved context from site pages, blog posts, tools, case studies, service pages, and engineering philosophy.

Client Triage Workflow

Questions are routed into architecture, debugging, Firebase cost, MVP scope, code review, and launch-risk patterns.
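The routing step can be sketched with a crude keyword scorer. This is a minimal illustration, not the portal's actual router: the category keys, the keyword lists, and the `human_handoff` fallback are all assumptions made for the example, and substring matching will misfire on real traffic.

```python
# Hypothetical keyword lists; a real router would be tuned on real client questions.
CATEGORY_KEYWORDS = {
    "firebase_cost": ["firebase", "firestore", "reads", "billing", "listener"],
    "debugging": ["crash", "error", "bug", "exception", "stack trace"],
    "architecture": ["architecture", "rewrite", "structure", "slow"],
    "mvp_scope": ["mvp", "scope", "feature", "budget", "timeline"],
    "code_review": ["review", "codebase", "audit", "refactor"],
    "launch_risk": ["launch", "app store", "release", "production"],
}


def triage(question: str) -> str:
    """Return the category whose keywords appear most often in the question."""
    text = question.lower()
    scores = {
        category: sum(keyword in text for keyword in keywords)
        for category, keywords in CATEGORY_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    # No keyword matched at all: do not guess, send the question to a person.
    return best if scores[best] > 0 else "human_handoff"
```

A question that matches nothing falls through to a person instead of being forced into a category, which matches the handoff emphasis above.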

Evaluation And Handoff

Realistic client prompts are used to score usefulness, caution, and refusal behavior, and to check when the assistant should hand someone off to Bill.

Production Shape

Local-first V0

GitHub Pages hosts the portal. A FastAPI backend runs on a dedicated MacBook, Mac mini, or Mac Studio. Ollama runs the local model. Cloudflare Tunnel exposes the API.
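As a rough sketch of the backend-to-model hop, the function below calls Ollama's local `/api/generate` endpoint using only the standard library. The model name is a placeholder and the timeout is an assumption; in the setup described above, a FastAPI route would wrap a function like `ask_local_model`.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local port
MODEL_NAME = "example-coding-model"  # placeholder, not a real model tag


def build_generate_request(prompt: str, model: str = MODEL_NAME) -> urllib.request.Request:
    """Build a non-streaming generate request for a local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_local_model(prompt: str, timeout: float = 60.0) -> str:
    """Send the prompt and return the generated text (requires Ollama to be running)."""
    request = build_generate_request(prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body["response"]
```

Keeping request construction separate from the network call makes the payload easy to check without a running model.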

Cloud-ready later

The API can later route heavy traffic or larger models to GPU cloud compute without changing the public website experience.
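One way this could look is a small selection function inside the API, so the website keeps calling the same endpoint. The backend names, the cloud URL, and the queue threshold below are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backend:
    name: str
    base_url: str


LOCAL = Backend("local-mac", "http://localhost:11434")
CLOUD = Backend("gpu-cloud", "https://gpu.example.invalid")  # placeholder URL


def choose_backend(queue_depth: int, needs_large_model: bool,
                   max_local_queue: int = 4) -> Backend:
    """Prefer the local Mac; spill to cloud for large models or a long queue."""
    if needs_large_model or queue_depth > max_local_queue:
        return CLOUD
    return LOCAL
```

Because the decision lives behind the API, the portal never needs to know which machine answered.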

Build Order

1. Self-host an existing coding model

Start with a local open-weight model through Ollama. No training is required for the first useful version.

2. Add BVT retrieval and prompt rules

The assistant becomes useful by retrieving relevant BVT context and answering in a practical client-intake style.
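A minimal sketch of this step, assuming a tiny in-memory document set and word-overlap ranking in place of a real retriever. The document ids, the sample texts, and the wording of `RULES` are invented for the example and stand in for approved BVT content.

```python
import re

# Hypothetical stand-ins for approved BVT content.
DOCS = {
    "firebase-cost": "Firestore reads climb when listeners stay attached on high-traffic screens.",
    "mvp-scope": "An MVP should ship the smallest feature set that tests the core assumption.",
    "code-review": "A codebase review looks for risky dependencies and untested paths.",
}

# Assumed prompt rules, written in the client-intake spirit described above.
RULES = (
    "Answer as a practical client-intake assistant. "
    "Ask clarifying questions before recommending a rewrite. "
    "If the question needs a human, say so."
)


def tokens(text: str) -> set:
    """Lowercase word set used for overlap scoring."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, docs: dict = DOCS, k: int = 2) -> list:
    """Return up to k document ids ranked by shared words with the question."""
    q = tokens(question)
    ranked = sorted(docs.items(), key=lambda item: len(q & tokens(item[1])), reverse=True)
    return [doc_id for doc_id, text in ranked[:k] if q & tokens(text)]


def build_prompt(question: str, docs: dict = DOCS) -> str:
    """Combine the rules, retrieved context, and the client question."""
    context = "\n".join(f"- {docs[doc_id]}" for doc_id in retrieve(question, docs))
    return f"{RULES}\n\nContext:\n{context or '- (none found)'}\n\nClient question: {question}"
```

Word overlap is deliberately naive (common words can cause false matches); the point is the shape of the pipeline, where retrieval feeds a rule-bearing prompt.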

3. Add evals before fine-tuning

Collect real client-style questions and score the answers before deciding whether fine-tuning is worth the added complexity.
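Scoring could start as simply as checking each answer for phrases it should and should not contain. The `EvalCase` fields and the scoring rule here are assumptions for illustration; a fuller harness would add human review.

```python
from dataclasses import dataclass, field


@dataclass
class EvalCase:
    question: str
    must_mention: list  # phrases a useful answer should contain
    must_not_mention: list = field(default_factory=list)  # phrases that fail the answer


def score_answer(case: EvalCase, answer: str) -> float:
    """Return the fraction of required phrases found, or 0.0 on any violation."""
    text = answer.lower()
    if any(phrase.lower() in text for phrase in case.must_not_mention):
        return 0.0
    if not case.must_mention:
        return 1.0
    hits = sum(phrase.lower() in text for phrase in case.must_mention)
    return hits / len(case.must_mention)


case = EvalCase(
    question="My Firebase app is slow and reads are climbing. Architecture problem?",
    must_mention=["listener", "query"],
    must_not_mention=["rewrite everything"],
)
```

Tracking these scores over a growing set of cases gives a baseline to compare against before any fine-tuning decision.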

4. Keep from-scratch training as a lab track

A tiny transformer trained from scratch can be documented as a learning artifact. It should not be confused with the production assistant.
