Running local LLMs has become significantly more practical with modern consumer GPUs. What started as experimentation with local inference quickly evolved into a lightweight AI infrastructure setup capable of powering development tools, coding assistants, automation workflows, and remote AI agents from anywhere.

In this article, I’ll walk through how I built a local LLM environment on my Windows machine using LM Studio and exposed it securely over the internet so it could be used from another network, including work laptops, VSCode extensions, and automation platforms like n8n.

The Goal

The primary objective was simple:

Run a local LLM on my home PC using my GPU
Access the model remotely from another network
Use it with:
- VSCode Continue.dev
- OpenAI-compatible applications
- AI agents
- n8n workflows
- future Laravel integrations
Avoid direct router port forwarding
Keep the setup secure and maintainable

This eventually became a lightweight self-hosted AI infrastructure stack.

The Hardware

The setup was powered by a consumer gaming PC equipped with:

RTX 5060 Ti 16GB
Ryzen CPU
32GB RAM
Windows 11

The GPU memory was sufficient for running smaller and mid-sized coding models comfortably through LM Studio.

The Core Components

The architecture consisted of four major components:

LM Studio
→ Caddy Reverse Proxy
→ Cloudflare Tunnel
→ Remote Clients

Each layer had a specific responsibility.

Step 1 — Running the Local LLM

I used LM Studio because it provides:

local model management
GPU acceleration
OpenAI-compatible APIs
simple server controls
support for GGUF models

Once the model was downloaded, LM Studio exposed an OpenAI-compatible API locally:

http://localhost:1234/v1

This immediately made the local model compatible with:

Continue.dev
OpenAI SDKs
n8n AI Agents
Open WebUI
custom applications

Testing the endpoint:

curl http://localhost:1234/v1/models

At this stage, the API was only available on my local machine.

Why I Did NOT Expose LM Studio Directly

Exposing LM Studio directly to the internet would have been risky because:

no proper public authentication
no rate limiting
no HTTPS termination
no routing controls
no centralized gateway

Instead, I introduced a reverse proxy.

Step 2 — Introducing a Reverse Proxy

I used Caddy as the reverse proxy.

The purpose of the reverse proxy was to:

act as the public-facing gateway
forward traffic internally
allow future authentication
centralize networking
support multiple services later

The architecture became:

Internet
→ Caddy
→ LM Studio

My Caddy configuration:

:8080 {
    reverse_proxy localhost:1234
}

Now instead of exposing LM Studio directly:

localhost:1234

I exposed:

localhost:8080

Caddy handled all incoming requests and forwarded them internally to LM Studio.

Understanding Reverse Proxy Architecture

A reverse proxy is essentially a traffic manager.

Instead of clients connecting directly to internal applications, they connect to the reverse proxy first.

This provides several advantages:

centralized security
HTTPS management
routing
authentication
abstraction of internal services

The reverse proxy becomes the single entry point for all applications.

Future expansion becomes straightforward:

llm.domain.com {
    reverse_proxy localhost:1234
}

n8n.domain.com {
    reverse_proxy localhost:5678
}

app.domain.com {
    reverse_proxy localhost:8000
}

Step 3 — Exposing the API Securely with Cloudflare Tunnel

Rather than opening firewall ports or configuring router forwarding, I used Cloudflare Tunnel.

This was one of the most important architectural decisions.

Cloudflare Tunnel works by:

creating an outbound encrypted tunnel from the PC
securely routing traffic through Cloudflare’s network
avoiding direct inbound exposure

The flow became:

Remote Client
→ Cloudflare
→ Tunnel
→ Caddy
→ LM Studio

Starting the tunnel was surprisingly simple:

cloudflared tunnel --url http://127.0.0.1:8080

Cloudflare generated a public HTTPS endpoint:

https://random-name.trycloudflare.com

At this point, the local LLM was accessible securely from anywhere.

Why I Used 127.0.0.1 Instead of localhost

One issue I encountered was:

Unable to reach the origin service

The problem was that localhost resolved to IPv6 (::1) while Caddy was listening on IPv4.

Switching to:

127.0.0.1

resolved the issue immediately.

This is a subtle but common networking issue on Windows.

Using the Local LLM from Another Network

Once the tunnel was active, the endpoint became:

https://random-name.trycloudflare.com/v1

This allowed remote applications to use the local model exactly like OpenAI APIs.

Using Continue.dev with the Remote LLM

One of the most practical integrations was VSCode Continue.dev.

The configuration looked like this:

{
  "models": [
    {
      "title": "Home LM Studio",
      "provider": "openai",
      "model": "local-model",
      "apiBase": "https://random-name.trycloudflare.com/v1",
      "apiKey": "dummy"
    }
  ]
}

This effectively transformed my home PC into a remote AI coding server.

Now my work laptop could:

use my home GPU
run coding assistants
generate code
explain repositories
create components
review Laravel/Vue code

all remotely.

Integrating with n8n and AI Agents

Since LM Studio exposed an OpenAI-compatible API, it also integrated easily with:

n8n AI Agent nodes
OpenAI SDKs
custom Laravel applications
OpenClaw-style agents
automation systems

This meant my local infrastructure could power:

research agents
developer agents
workflow automation
document analysis
content generation

without paying per-token cloud costs.

Security Considerations

This setup works extremely well, but security matters.

Key lessons:

never expose raw local services directly
avoid router port forwarding
use reverse proxies
prefer tunnels over open ports
eventually add authentication

For production-grade usage, the next steps would include:

Cloudflare Access authentication
API key validation
rate limiting
request logging
named Cloudflare tunnels
custom domains

Final Architecture

The final infrastructure looked like this:

Work Laptop / Remote Apps
→ Cloudflare Tunnel
→ Caddy Reverse Proxy
→ LM Studio
→ Local GPU

This architecture is surprisingly scalable for:

local AI labs
self-hosted development assistants
agent-based workflows
remote GPU inference
private coding copilots

Final Thoughts

What started as a local AI experiment became a lightweight distributed AI infrastructure.

The most important realization was this:

Modern local LLM infrastructure is no longer just about running models locally. It is about exposing those models safely, integrating them into development workflows, and treating them like internal platform services.

With a consumer GPU, reverse proxy, and secure tunnel, it is now possible to build highly capable AI systems from home infrastructure that integrate seamlessly with professional workflows across networks and devices.

The Goal

The Hardware

The Core Components

Step 1 — Running the Local LLM

Why I Did NOT Expose LM Studio Directly

Step 2 — Introducing a Reverse Proxy

Understanding Reverse Proxy Architecture

Step 3 — Exposing the API Securely with Cloudflare Tunnel

Why I Used 127.0.0.1 Instead of localhost

Using the Local LLM from Another Network

Using Continue.dev with the Remote LLM

Integrating with n8n and AI Agents

Security Considerations

Final Architecture

Final Thoughts

Navigate

Contact

© 2024 Digital Danfe

How I Built a Local LLM Infrastructure and Exposed It Securely Across Networks

The Goal

The Hardware

The Core Components

Step 1 — Running the Local LLM

Why I Did NOT Expose LM Studio Directly

Step 2 — Introducing a Reverse Proxy

Understanding Reverse Proxy Architecture

Step 3 — Exposing the API Securely with Cloudflare Tunnel

Why I Used 127.0.0.1 Instead of localhost

Using the Local LLM from Another Network

Using Continue.dev with the Remote LLM

Integrating with n8n and AI Agents

Security Considerations

Final Architecture

Final Thoughts

Navigate

Contact

© 2024 Digital Danfe