How I Built a Local LLM Infrastructure and Exposed It Securely Across Networks

AI

Running local LLMs has become significantly more practical with modern consumer GPUs. What started as experimentation with local inference quickly evolved into a lightweight AI infrastructure setup capable of powering development tools, coding assistants, automation workflows, and remote AI agents from anywhere.

In this article, I’ll walk through how I built a local LLM environment on my Windows machine using LM Studio and exposed it securely over the internet so it could be used from another network, including work laptops, VSCode extensions, and automation platforms like n8n.


The Goal

The primary objective was simple:

  • Run a local LLM on my home PC using my GPU
  • Access the model remotely from another network
  • Use it with:
    • VSCode Continue.dev
    • OpenAI-compatible applications
    • AI agents
    • n8n workflows
    • future Laravel integrations
  • Avoid direct router port forwarding
  • Keep the setup secure and maintainable

This eventually became a lightweight self-hosted AI infrastructure stack.


The Hardware

The setup was powered by a consumer gaming PC equipped with:

  • RTX 5060 Ti 16GB
  • Ryzen CPU
  • 32GB RAM
  • Windows 11

The GPU memory was sufficient for running smaller and mid-sized coding models comfortably through LM Studio.


The Core Components

The architecture consisted of four major components:

LM Studio
→ Caddy Reverse Proxy
→ Cloudflare Tunnel
→ Remote Clients

Each layer had a specific responsibility.


Step 1 — Running the Local LLM

I used LM Studio because it provides:

  • local model management
  • GPU acceleration
  • OpenAI-compatible APIs
  • simple server controls
  • support for GGUF models

Once the model was downloaded, LM Studio exposed an OpenAI-compatible API locally:

http://localhost:1234/v1

This immediately made the local model compatible with:

  • Continue.dev
  • OpenAI SDKs
  • n8n AI Agents
  • Open WebUI
  • custom applications

Testing the endpoint:

curl http://localhost:1234/v1/models

At this stage, the API was only available on my local machine.


Why I Did NOT Expose LM Studio Directly

Exposing LM Studio directly to the internet would have been risky because:

  • no proper public authentication
  • no rate limiting
  • no HTTPS termination
  • no routing controls
  • no centralized gateway

Instead, I introduced a reverse proxy.


Step 2 — Introducing a Reverse Proxy

I used Caddy as the reverse proxy.

The purpose of the reverse proxy was to:

  • act as the public-facing gateway
  • forward traffic internally
  • allow future authentication
  • centralize networking
  • support multiple services later

The architecture became:

Internet
→ Caddy
→ LM Studio

My Caddy configuration:

:8080 {
reverse_proxy localhost:1234
}

Now instead of exposing LM Studio directly:

localhost:1234

I exposed:

localhost:8080

Caddy handled all incoming requests and forwarded them internally to LM Studio.


Understanding Reverse Proxy Architecture

A reverse proxy is essentially a traffic manager.

Instead of clients connecting directly to internal applications, they connect to the reverse proxy first.

This provides several advantages:

  • centralized security
  • HTTPS management
  • routing
  • authentication
  • abstraction of internal services

The reverse proxy becomes the single entry point for all applications.

Future expansion becomes straightforward:

llm.domain.com {
reverse_proxy localhost:1234
}

n8n.domain.com {
reverse_proxy localhost:5678
}

app.domain.com {
reverse_proxy localhost:8000
}

Step 3 — Exposing the API Securely with Cloudflare Tunnel

Rather than opening firewall ports or configuring router forwarding, I used Cloudflare Tunnel.

This was one of the most important architectural decisions.

Cloudflare Tunnel works by:

  • creating an outbound encrypted tunnel from the PC
  • securely routing traffic through Cloudflare’s network
  • avoiding direct inbound exposure

The flow became:

Remote Client
→ Cloudflare
→ Tunnel
→ Caddy
→ LM Studio

Starting the tunnel was surprisingly simple:

cloudflared tunnel --url http://127.0.0.1:8080

Cloudflare generated a public HTTPS endpoint:

https://random-name.trycloudflare.com

At this point, the local LLM was accessible securely from anywhere.


Why I Used 127.0.0.1 Instead of localhost

One issue I encountered was:

Unable to reach the origin service

The problem was that localhost resolved to IPv6 (::1) while Caddy was listening on IPv4.

Switching to:

127.0.0.1

resolved the issue immediately.

This is a subtle but common networking issue on Windows.


Using the Local LLM from Another Network

Once the tunnel was active, the endpoint became:

https://random-name.trycloudflare.com/v1

This allowed remote applications to use the local model exactly like OpenAI APIs.


Using Continue.dev with the Remote LLM

One of the most practical integrations was VSCode Continue.dev.

The configuration looked like this:

{
"models": [
{
"title": "Home LM Studio",
"provider": "openai",
"model": "local-model",
"apiBase": "https://random-name.trycloudflare.com/v1",
"apiKey": "dummy"
}
]
}

This effectively transformed my home PC into a remote AI coding server.

Now my work laptop could:

  • use my home GPU
  • run coding assistants
  • generate code
  • explain repositories
  • create components
  • review Laravel/Vue code

all remotely.


Integrating with n8n and AI Agents

Since LM Studio exposed an OpenAI-compatible API, it also integrated easily with:

  • n8n AI Agent nodes
  • OpenAI SDKs
  • custom Laravel applications
  • OpenClaw-style agents
  • automation systems

This meant my local infrastructure could power:

  • research agents
  • developer agents
  • workflow automation
  • document analysis
  • content generation

without paying per-token cloud costs.


Security Considerations

This setup works extremely well, but security matters.

Key lessons:

  • never expose raw local services directly
  • avoid router port forwarding
  • use reverse proxies
  • prefer tunnels over open ports
  • eventually add authentication

For production-grade usage, the next steps would include:

  • Cloudflare Access authentication
  • API key validation
  • rate limiting
  • request logging
  • named Cloudflare tunnels
  • custom domains

Final Architecture

The final infrastructure looked like this:

Work Laptop / Remote Apps
→ Cloudflare Tunnel
→ Caddy Reverse Proxy
→ LM Studio
→ Local GPU

This architecture is surprisingly scalable for:

  • local AI labs
  • self-hosted development assistants
  • agent-based workflows
  • remote GPU inference
  • private coding copilots

Final Thoughts

What started as a local AI experiment became a lightweight distributed AI infrastructure.

The most important realization was this:

Modern local LLM infrastructure is no longer just about running models locally. It is about exposing those models safely, integrating them into development workflows, and treating them like internal platform services.

With a consumer GPU, reverse proxy, and secure tunnel, it is now possible to build highly capable AI systems from home infrastructure that integrate seamlessly with professional workflows across networks and devices.