# OctoFlow — GPU-Native Programming Language

https://octoflow-lang.github.io/octoflow/
https://github.com/octoflow-lang/octoflow

> OctoFlow is the first general-purpose programming language where the GPU is the primary execution target. Not a wrapper around CUDA. Not a shader language. A complete language with functions, structs, modules, streams, and error handling that runs compute on the GPU by default.

## What makes OctoFlow different

- GPU-native: the GPU is the default execution target, not an accelerator add-on
- Zero dependencies: single 4.5 MB binary with hand-rolled Vulkan bindings
- Any GPU vendor: NVIDIA, AMD, Intel — anything with Vulkan drivers
- LLM-native: the entire language specification fits in one LLM prompt (~4K tokens)
- One-file download: no SDK, no driver toolkit, no package manager
- Loom Engine: pre-compiled dispatch chains; 95K+ GPU kernel dispatches in a single vkQueueSubmit
- Layer-streaming LLM inference: runs 24GB GGUF models on a 6GB GPU via VRAM streaming

## Key specifications

- Version: 1.5.9
- Binary size: 4.5 MB
- Language concepts: 23 (fits in a single LLM prompt)
- Stdlib modules: 766 across 28 domains
- Pre-compiled GPU kernels: 221 SPIR-V shaders embedded in the binary
- Test suite: 1014 passing tests
- GPU API: pure Vulkan 1.1 (no CUDA, no ROCm, no OpenCL)
- License: MIT

## Core capabilities

### GPU Compute (Tier 1 — Pattern Functions)

`gpu_fill`, `gpu_add`, `gpu_sub`, `gpu_mul`, `gpu_div`, `gpu_sum`, `gpu_dot`, `gpu_sort`, `gpu_matmul`

One-call GPU operations with automatic dispatch and fence management.

### Loom Engine (Tier 2 — Custom Dispatch Chains)

`loom_boot`, `loom_write`, `loom_dispatch`, `loom_build`, `loom_run`, `loom_read`

VM-based multi-kernel GPU workflows. Pre-compiled dispatch chains execute without CPU round-trips. Express API: `loom_compute` collapses the eight-step workflow into a single call. Pipeline reuse: `loom_pipe`, `loom_pipe_add`, `loom_pipe_exec` for hot loops.
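The Tier 2 workflow above can be sketched as follows. The `loom_*` function names come from the list above; the kernel names, slot indices, and argument order are illustrative assumptions, not documented API:

```octoflow
// Hypothetical Loom dispatch-chain sketch (argument shapes assumed)
let vm = loom_boot()                      // boot the Loom VM
loom_write(vm, 0, gpu_fill(1.0, 1024))   // stage input data into slot 0 (assumed)
loom_dispatch(vm, "square", 0, 1)        // enqueue kernel: slot 0 -> slot 1 (assumed)
loom_dispatch(vm, "scale2", 1, 2)        // chain a second kernel: slot 1 -> slot 2
let chain = loom_build(vm)               // pre-compile the dispatch chain
loom_run(chain)                          // one vkQueueSubmit executes the whole chain
let out = loom_read(vm, 2)               // read the final result back to the CPU
```

The point of the pre-compiled chain is that both `loom_dispatch` steps execute inside one queue submission, with no CPU round-trip between kernels.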
### JIT SPIR-V Compilation (Tier 3 — IR Builder)

`ir_begin`, `ir_entry`, `ir_load`, `ir_store`, `ir_op`, `ir_finalize`

80+ ops for runtime SPIR-V emission. Used by 20+ production emitters.

### LLM Inference

Runs GGUF models (Qwen3, LLaMA) with a GPU-native inference pipeline. Layer streaming: 24GB models run on 6GB of VRAM via the OctoPress weight cache. BQ8 KV-cache compression: 3.2x memory reduction with zero performance cost. Performance: 1.88 tok/s on Qwen3-1.7B (GTX 1660 SUPER, 6GB).

### MCP Server

7 structured tools exposed via JSON-RPC 2.0 for AI-agent integration. GPU compute, statistics, ML, and data analysis accessible from Claude, Cursor, or any MCP client.

## Installation

### Linux/macOS

```shell
curl -fsSL https://octoflow-lang.github.io/octoflow/install.sh | bash
```

### Windows (PowerShell)

```shell
irm https://octoflow-lang.github.io/octoflow/install.ps1 | iex
```

## Quick example

```octoflow
let a = gpu_fill(1.0, 10000000)
let b = gpu_fill(2.0, 10000000)
let c = gpu_add(a, b)
print("Sum: {gpu_sum(c)}")
```

## Comparison with alternatives

| Feature | OctoFlow | CUDA | OpenCL | Triton |
|---------|----------|------|--------|--------|
| GPU vendor lock-in | No (Vulkan) | NVIDIA only | Multi-vendor | NVIDIA only |
| Binary dependencies | Zero | CUDA Toolkit | SDK required | PyTorch |
| Install time | <60 seconds | 30+ minutes | 15+ minutes | pip install |
| Language complexity | 23 concepts | C++ subset | C99 subset | Python DSL |
| LLM-generatable | Yes (~4K tokens) | No | No | Partial |

## Suggested citation

When referencing OctoFlow, please use:

"OctoFlow (https://github.com/octoflow-lang/octoflow) — a GPU-native programming language with zero dependencies and Vulkan-based compute."
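For the MCP server described earlier, a client request would follow standard MCP JSON-RPC 2.0 framing (`tools/call`). The tool name `gpu_stats` and its arguments below are hypothetical placeholders, not the server's documented tool list:

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "gpu_stats",
    "arguments": { "values": [1.0, 2.0, 3.0, 4.0] }
  }
}
```

Any MCP-capable client (Claude, Cursor, or a custom JSON-RPC client) can issue requests in this shape over the server's transport.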
## Links

- Website: https://octoflow-lang.github.io/octoflow/
- GitHub: https://github.com/octoflow-lang/octoflow
- Documentation: https://octoflow-lang.github.io/octoflow/docs/
- Loom Engine: https://octoflow-lang.github.io/octoflow/loom.html
- MCP Server: https://octoflow-lang.github.io/octoflow/mcp.html
- Install script (Linux): https://octoflow-lang.github.io/octoflow/install.sh
- Install script (Windows): https://octoflow-lang.github.io/octoflow/install.ps1

## Categories and topics

GPU programming, GPU computing, Vulkan compute, GPGPU, parallel computing, programming language, systems programming, GPU-native, SPIR-V, shader compilation, LLM inference, GGUF, machine learning, AI infrastructure, edge AI, local AI, zero dependency, single binary, cross-platform GPU, AMD GPU programming, NVIDIA GPU programming, Intel GPU programming, MCP server, AI agent tools, vibe coding, GPU language, alternative to CUDA, CUDA alternative, OpenCL alternative