A programming language where the GPU is the primary execution target. AI-assisted. Zero dependencies. Any GPU vendor.
C (1972), C++ (1979), Python (1991) — designed when the CPU was the only processor. They still work, but accessing a GPU requires CUDA (NVIDIA-only, 4 GB SDK), OpenCL (effectively abandoned), or shader languages designed for graphics. Every computer sold today has a GPU, and most of that silicon sits idle.
OctoFlow is built from scratch for this reality. GPU is primary. Single binary, any vendor, zero dependencies.
On building this. OctoFlow is AI-assisted — designed and developed with LLMs generating the bulk of the code. But every architectural decision is human: Rust at the OS boundary, Vulkan for cross-vendor GPU, the Loom Engine’s main/support split, JIT kernel emission via IR builder, the self-hosted compiler direction.
The entire stdlib and everything in this repo is MIT-licensed. The compiler source is private for now, but the developer is willing to open-source all of it once a team is established to develop and sustain it long-term. If that sounds interesting, get in touch.
Not a wrapper around CUDA. Not a shader language. A complete programming language that happens to run on GPUs.
5 memory regions. Autonomous dispatch chains. Single vkQueueSubmit. The GPU runs your program — the CPU just watches.
Data born on the GPU stays in VRAM. No round-trips until you need results on the CPU. Chain operations without memory transfers.
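A minimal sketch of the idea, using only the gpu_* builtins shown in the examples further down this page — the intermediate buffers b and c stay resident in VRAM, and data crosses back to the CPU exactly once, at the final reduction:

```
let a = gpu_fill(1.0, 1000000)  // buffer allocated directly in VRAM
let b = gpu_add(a, a)           // GPU-to-GPU: no transfer
let c = gpu_scale(b, 0.25)      // still resident in VRAM
let total = gpu_sum(c)          // single readback at the end
print("Total: {total}")         // 500000
```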
The entire compiler, runtime, GPU dispatch, and standard library in one binary. No SDK, no package manager, no runtime to install.
NVIDIA, AMD, Intel — any GPU with Vulkan support. No vendor lock-in, no proprietary toolchains, no compatibility headaches.
No file access, no network, no subprocesses unless explicitly granted with flags. Run untrusted code safely.
The compiler is written in OctoFlow itself. Lexer, parser, preflight, evaluator, and SPIR-V codegen are all .flow files.
Language guide designed as LLM context. Feed it to Claude, ChatGPT, or Copilot and it writes OctoFlow code — no training needed.
Weights, DB columns, embeddings — stored compressed in GPU memory. Delta encoding, dictionary lookup, Q4_K quantization. Decompression is just another kernel in the chain.
Watch OctoFlow run. Every recording is a real terminal session.
Five lines to crunch a million elements on the GPU. No boilerplate, no shaders, no compilation step.
```
// 1 million elements — GPU parallel — one command
let a = gpu_fill(1.0, 1000000)
let b = gpu_fill(2.0, 1000000)
let c = gpu_add(a, b)
let d = gpu_scale(c, 0.5)
let total = gpu_sum(d)
print("Total: {total}") // 1500000

// Matrix multiply
let result = gpu_matmul(mat_a, mat_b, rows, cols_a, cols_b)
```
```
// First-class functions and closures
let double = fn(x) x * 2 end
print(double(21)) // 42

let nums = [1, 2, 3, 4, 5, 6, 7, 8]
let evens = filter(nums, fn(x) x % 2 == 0 end)
let squared = map_each(evens, fn(x) x * x end)
let total = reduce(squared, 0, fn(acc, x) acc + x end)
print("Sum of even squares: {total}") // 120

let sorted = sort_by(nums, fn(a, b) b - a end) // descending
```
```
use csv
use descriptive

let data = read_csv("sales.csv")
let revenue = csv_column(data, "revenue")
print("Mean: {mean(revenue)}")
print("Median: {median(revenue)}")
print("Std Dev: {stddev(revenue)}")
print("P95: {quantile(revenue, 0.95)}")
let r = correlation(revenue, costs) // Pearson r
```
```
// Stream pipelines — data flows through operations
stream photo = tap("input.jpg")
stream enhanced = photo |> brightness(20) |> contrast(1.2) |> saturate(1.1)
emit(enhanced, "output.png")

// Reusable pipeline functions
fn warm_filter: brightness(15) |> contrast(1.1) |> saturate(1.3)
stream result = tap("photo.jpg") |> warm_filter
emit(result, "warm.png")
```
```
use regression
use preprocess
use metrics

// Train/test split with scaling
let split = train_test_split(features, labels, 0.8)
let scaled = standard_scale(split.train_x)

// Linear regression
let model = linear_fit(scaled, split.train_y)
let predictions = linear_predict(model, split.test_x)

// Evaluate
let acc = r_squared(split.test_y, predictions)
print("R-squared: {acc}")
```
AI, media, GPU, ML, stats, science, data, and more. All written in OctoFlow itself.
Transformer inference, GGUF model loading, BPE tokenization, streaming generation, GPU-accelerated layers
221 kernels, Loom Engine, SPIR-V codegen, dispatch chains, resident buffers, deferred execution
Audio DSP (oscillator, FFT, ADSR, effects, mixer), image transforms, video timeline, WAV/BMP/GIF/H.264/MP4/AVI/TTF codecs. No ffmpeg.
Linear & ridge regression, KNN, k-means, neural nets, decision trees, bagging ensembles, metrics
Mean, median, stddev, skewness, kurtosis, Pearson/Spearman, t-test, chi-squared, ANOVA, Sharpe, VaR
Calculus, physics (RK4, spring-damper), signal processing (FIR, convolution), interpolation, matrices
Parallel-array ECS, sprite rendering, physics, spatial hash collision, AI behaviors, scene management, debug draw
16 widgets, canvas drawing, 5 chart types, themes, layout engine, event system (Windows)
Lexer, parser, evaluator, preflight, SPIR-V IR builder, codegen — the compiler written in .flow
CSV, JSON, GGUF, columnar DB with SQL-like queries, ETL pipelines, validation
Stack, queue, min/max heap, weighted directed graph with Dijkstra shortest path
SHA-256, base64, CSPRNG, HTTP client, JSON, URL encoding, TCP/UDP sockets
Process execution, filesystem, logging, config templates, env vars, platform detection, timing
Run octoflow chat and describe what you want. The entire language fits in one LLM prompt. Grammar-constrained decoding and 69 structured error codes drive the auto-repair loop.
4.5 MB binary. Unzip. Run. That's it. No installer, no SDK, no account.
Requires a GPU with a Vulkan driver (NVIDIA, AMD, Intel) · macOS via MoltenVK
Or install from the terminal:

```
# Windows (PowerShell)
irm https://octoflow-lang.github.io/octoflow/install.ps1 | iex

# Linux / macOS
curl -fsSL https://octoflow-lang.github.io/octoflow/install.sh | sh
```