# kgpu-gateway — Agent guide

`https://api.kgpu.net/v1` — Lab GPU dispatch for SNUH researchers.

kgpu is plain HTTPS with seven endpoints. Everything works in
`curl + jq`; no WebSocket, no SDK required. The gateway holds no user
data — upload/download bytes stream directly between client and pod.

## Access

Sign in at <https://kgpu.net> (KHDP primary, Google fallback). Copy
the token from the dashboard:

```bash
export KGPU_API_TOKEN=kgpu_xxxxxxxxxxxxx...
export KGPU_API_BASE=https://api.kgpu.net/v1
```

Every request carries:

```
Authorization: Bearer $KGPU_API_TOKEN
```

Inside a rented container `$KGPU_API_TOKEN` and `$KGPU_API_BASE` are
auto-injected.
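As a minimal sketch, the header can be attached using only the Python standard library. `auth_request` is a hypothetical helper name, and the fallback token is a placeholder so the snippet can be loaded outside a rental. It builds the request but does not send it.

```python
import os
import urllib.request

# Fall back to placeholders so the sketch can be loaded outside a rental.
BASE = os.environ.get("KGPU_API_BASE", "https://api.kgpu.net/v1")
TOKEN = os.environ.get("KGPU_API_TOKEN", "kgpu_placeholder")

def auth_request(path, method="GET", token=TOKEN, base=BASE):
    """Build (but do not send) a request carrying the Bearer header."""
    return urllib.request.Request(
        base.rstrip("/") + path,
        method=method,
        headers={"Authorization": f"Bearer {token}"},
    )

req = auth_request("/rent")
print(req.get_method(), req.full_url)
```

Passing the result to `urllib.request.urlopen` would perform the call.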

**New accounts start at 0 credits.** Ask a maintainer for a budget
before you try to rent.

## API — three resources

| Resource | Method | Path | What |
|---|---|---|---|
| **rent** | GET | `/rent` | cluster + (authed) caller's `me` + `my_rentals` |
| | POST | `/rent` | **rent** — body schema below |
| | GET | `/rent/{rent_id}` | rental state (phase, ready, idle clock) |
| | DELETE | `/rent/{rent_id}` | **release** the rental |
| **run** | POST | `/run/{rent_id}` | execute a command, NDJSON stream of stdout/stderr/done |
| **file** | POST | `/file/{rent_id}?path=...` | write bytes to a pod file (Content-Range for resume) |
| | GET | `/file/{rent_id}?path=...` | read bytes (Range for resume) |

7 method-paths · 4 URL patterns. All paths shown above are relative to
`$KGPU_API_BASE` (which includes the `/v1` prefix).

### `POST /rent` body

All optional; defaults shown.

| Field | Default | Notes |
|---|---|---|
| `name` | `"rent"` | human label |
| `image` | `ghcr.io/vitaldb/kgpu-pytorch:latest` | container image |
| `gpu_model` | `"gb10"` | `gb10` / `a5000` / `a6000` / `a6000-ada` / `b200` — see `GET /rent` for live menu |
| `gpu_count` | `1` | 0–4 (0 = CPU-only pod, still bills at the class rate) |
| `cpu` | `"2"` | K8s CPU qty (≤16 cores/rental) |
| `memory` | `"16Gi"` | K8s memory (≤128 GiB/rental) |
| `ephemeral_storage_gib` | `100` | container scratch disk, 1–100 GiB |
| `duration_hours` | `12` | hard wall-clock ceiling, 1–168 |
| `env` | `null` | extra pod env vars |

→ `{rent_id, gpu_model, price_per_hour_credits, per_gpu, namespace, balance_after_precheck}`
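The ranges in the table can be checked client-side before the request is sent. The sketch below is one way to do that; `build_rent_body` is a hypothetical helper, and the model set is copied from the table rather than from the live menu at `GET /rent`, so it may be stale.

```python
# Values copied from the table above; the live menu at GET /rent may differ.
GPU_MODELS = {"gb10", "a5000", "a6000", "a6000-ada", "b200"}

def build_rent_body(gpu_model="gb10", gpu_count=1, duration_hours=12,
                    ephemeral_storage_gib=100, **extra):
    """Return a POST /rent body, rejecting values outside the documented ranges."""
    if gpu_model not in GPU_MODELS:
        raise ValueError(f"unknown gpu_model: {gpu_model!r}")
    if not 0 <= gpu_count <= 4:
        raise ValueError("gpu_count must be 0-4")
    if not 1 <= duration_hours <= 168:
        raise ValueError("duration_hours must be 1-168")
    if not 1 <= ephemeral_storage_gib <= 100:
        raise ValueError("ephemeral_storage_gib must be 1-100")
    body = {
        "gpu_model": gpu_model,
        "gpu_count": gpu_count,
        "duration_hours": duration_hours,
        "ephemeral_storage_gib": ephemeral_storage_gib,
    }
    body.update(extra)  # e.g. name, image, cpu, memory, env
    return body

print(build_rent_body("a6000-ada", duration_hours=2))
```

The server remains the authority (`422` on validation failure); the local check only catches obvious mistakes earlier.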

### `POST /run/{rent_id}` — execute a command

```jsonc
// request body
{"command": ["python", "train.py"],
 "env"?: {"FOO": "bar"}}
```

`command` is an argv list. The server runs it under `bash` with any
provided env vars exported.

Response is **NDJSON** (one JSON object per line) streamed live:

```jsonc
{"type":"stdout","chunk":"epoch 1: loss=0.36 ...\n"}
{"type":"stderr","chunk":"warn: ...\n"}
...
{"type":"done","exit_code":0}
```
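A small sketch of folding such a stream into its parts, using only the event shapes shown above. `collect_run_output` is a hypothetical name, and skipping blank lines is a defensive assumption rather than documented behavior.

```python
import json

def collect_run_output(lines):
    """Fold NDJSON events into (stdout, stderr, exit_code)."""
    out, err, code = [], [], None
    for raw in lines:
        if isinstance(raw, bytes):
            raw = raw.decode("utf-8", errors="replace")
        if not raw.strip():
            continue  # tolerate blank lines between events
        event = json.loads(raw)
        kind = event.get("type")
        if kind == "stdout":
            out.append(event["chunk"])
        elif kind == "stderr":
            err.append(event["chunk"])
        elif kind == "done":
            code = event.get("exit_code")
            break  # nothing follows the done event
    return "".join(out), "".join(err), code

sample = [
    '{"type":"stdout","chunk":"epoch 1: loss=0.36\\n"}',
    '{"type":"stderr","chunk":"warn: slow disk\\n"}',
    '{"type":"done","exit_code":0}',
]
print(collect_run_output(sample))
```

Any iterable of lines works as input, for example the line iterator of a streaming HTTP response. An `exit_code` of `None` means the stream ended without a `done` event.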

**Client disconnect = command dies.** To keep a job running after you
disconnect, wrap it in `nohup` yourself and append `& echo $!` so the
PID is printed to your own stdout:

```jsonc
{"command":["bash","-c",
            "nohup python train.py > /tmp/log 2>&1 & echo $!"]}
```

Later, check progress with another `POST /run`:

```jsonc
{"command":["bash","-c","tail -f /tmp/log"]}   // live tail until ^C
{"command":["bash","-c","cat /tmp/log"]}        // snapshot
{"command":["ps","-p","<PID>"]}                  // is it alive? <PID> = number echoed by the nohup call
```
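The request bodies for this detach-and-check pattern can be generated rather than typed by hand. The sketch below assumes hypothetical helper names and uses `shlex.quote` so arguments with spaces survive the `bash -c` string.

```python
import json
import shlex

def detached_run_body(argv, log_path="/tmp/log"):
    """Wrap argv in `nohup ... &` so it survives a client disconnect."""
    inner = " ".join(shlex.quote(a) for a in argv)
    script = f"nohup {inner} > {shlex.quote(log_path)} 2>&1 & echo $!"
    return {"command": ["bash", "-c", script]}

def is_alive_body(pid):
    """Body for a follow-up POST /run that checks whether the PID still exists."""
    return {"command": ["ps", "-p", str(int(pid))]}

print(json.dumps(detached_run_body(["python", "train.py"])))
print(json.dumps(is_alive_body("4242")))
```

The PID passed to `is_alive_body` is whatever the first call printed on stdout; `4242` here is only an illustrative value.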

### `POST /file/{rent_id}?path=...` — write a pod file

```
POST /file/{rent_id}?path=/tmp/data.zip
Headers:
  Authorization: Bearer ...
  Content-Length: 24601520             (required; body size, here end - start + 1)
  Content-Range: bytes 52428800-77030319/77030320   (optional, for resume)
Body:
  raw bytes
```

Without `Content-Range`, the file at `path` is truncated and the body
becomes the entire file. With `Content-Range`, the file is first
resized to exactly `start` bytes (truncated if longer, padded if
shorter) and the body is then written to bytes `start..end`.

Returns `{"path": "/tmp/data.zip", "size": N}` on success.
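A sketch of computing the resume headers for the tail of a file. `resume_upload_headers` is a hypothetical helper; how the client learns the number of bytes already written is not specified here and is assumed to be tracked locally.

```python
def resume_upload_headers(total_size, already_written):
    """Headers for uploading bytes [already_written, total_size) of a file."""
    if not 0 <= already_written < total_size:
        raise ValueError("offset must be inside the file")
    start, end = already_written, total_size - 1  # end is inclusive
    return {
        "Content-Range": f"bytes {start}-{end}/{total_size}",
        "Content-Length": str(end - start + 1),
    }

print(resume_upload_headers(77030320, 52428800))
```

The request body is then the file content starting at offset `start`, for example after `f.seek(start)`.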

### `GET /file/{rent_id}?path=...` — read a pod file

```
GET /file/{rent_id}?path=/tmp/model.pt
Optional header:
  Range: bytes=K-                       resume from byte K (curl -C - sets it)
```

Returns 200 (full) or 206 (partial) with raw bytes in the body.
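A sketch of the client side of a resumed download: derive `Range` from the bytes already on disk, then pick the file mode from the status code. Both helper names are hypothetical.

```python
import os

def resume_download_headers(local_path):
    """Return a Range header resuming from the bytes already on disk."""
    have = os.path.getsize(local_path) if os.path.exists(local_path) else 0
    return {"Range": f"bytes={have}-"} if have else {}

def file_mode_for(status):
    """206 means the server honored Range, so append; 200 means start over."""
    if status == 206:
        return "ab"
    if status == 200:
        return "wb"
    raise ValueError(f"unexpected status {status}")

print(resume_download_headers("model.pt"), file_mode_for(206))
```

Checking the status before writing matters: appending a full `200` body to a partial file would corrupt it.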

### Status codes

`200` ok · `206` partial content · `400` bad request · `401` invalid
token · `402` insufficient_credits · `404` not found · `409` rental
not ready / quota exceeded · `411` length_required (upload without
Content-Length) · `416` range not satisfiable · `422` validation
failed · `500` server error / size mismatch.
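For client code, the list above can be kept as a lookup table. This is only a sketch: the labels are copied from the list, and `explain` and `is_success` are hypothetical helper names.

```python
# Labels copied from the status list above.
STATUS_MEANING = {
    200: "ok",
    206: "partial content",
    400: "bad request",
    401: "invalid token",
    402: "insufficient_credits",
    404: "not found",
    409: "rental not ready / quota exceeded",
    411: "length_required (upload without Content-Length)",
    416: "range not satisfiable",
    422: "validation failed",
    500: "server / size mismatch",
}

def explain(status):
    """Return a short label for a status code, or a generic fallback."""
    return STATUS_MEANING.get(status, f"undocumented status {status}")

def is_success(status):
    """Only 200 and 206 are listed as success codes."""
    return status in (200, 206)

print(explain(402))
```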

## End-to-end demo — curl

```bash
T=$KGPU_API_TOKEN
B=$KGPU_API_BASE
J() { jq -r "$1"; }

# 1. Rent
RENT=$(curl -sS -X POST $B/rent \
  -H "Authorization: Bearer $T" -H "Content-Type: application/json" \
  -d '{"gpu_model":"a6000-ada","duration_hours":2}' | J .rent_id)

# 2. Wait Ready
until [ "$(curl -sS $B/rent/$RENT -H "Authorization: Bearer $T" | J .ready)" = "true" ]; do
  sleep 5
done

# 3. Upload (curl sets Content-Length automatically with -T)
curl -sS -X POST -T mit-bih.zip \
  -H "Authorization: Bearer $T" \
  "$B/file/$RENT?path=/tmp/mitbih.zip"

# 4. Run — NDJSON live stream
curl -sS -N -X POST $B/run/$RENT \
  -H "Authorization: Bearer $T" -H "Content-Type: application/json" \
  -d '{"command":["bash","-c","unzip -q /tmp/mitbih.zip -d /tmp/mitdb && python train.py"]}' \
  | jq --unbuffered -j 'select(.type=="stdout") | .chunk'

# 5. Download (curl -C - handles resume automatically across retries)
curl -sS -C - -o model.pt \
  -H "Authorization: Bearer $T" \
  "$B/file/$RENT?path=/tmp/model.pt"

# 6. Release
curl -sS -X DELETE $B/rent/$RENT -H "Authorization: Bearer $T"
```

## End-to-end demo — Python `requests`

```python
import json, os, time
import requests

T = os.environ["KGPU_API_TOKEN"]; H = {"Authorization": f"Bearer {T}"}
B = os.environ.get("KGPU_API_BASE", "https://api.kgpu.net/v1")

# 1. Rent
resp = requests.post(f"{B}/rent", headers=H,
    json={"gpu_model":"a6000-ada","duration_hours":2})
resp.raise_for_status()  # e.g. 402 when the account has no credits
rent_id = resp.json()["rent_id"]

# 2. Wait Ready
while not requests.get(f"{B}/rent/{rent_id}", headers=H).json().get("ready"):
    time.sleep(5)

# 3. Upload
with open("mit-bih.zip","rb") as f:
    requests.post(f"{B}/file/{rent_id}", params={"path":"/tmp/mitbih.zip"},
                  headers={**H,"Content-Type":"application/octet-stream"},
                  data=f).raise_for_status()

# 4. Run — stream NDJSON
with requests.post(f"{B}/run/{rent_id}", headers=H,
        json={"command":["bash","-c",
              "unzip -q /tmp/mitbih.zip -d /tmp/mitdb && python train.py"]},
        stream=True) as r:
    for line in r.iter_lines(decode_unicode=True):
        if not line: continue
        m = json.loads(line)
        if m.get("type") == "stdout": print(m["chunk"], end="")
        elif m.get("type") == "done": print(f"\nexit={m.get('exit_code')}"); break

# 5. Download
with requests.get(f"{B}/file/{rent_id}", params={"path":"/tmp/model.pt"},
                  headers=H, stream=True) as r:
    with open("model.pt","wb") as out:
        for chunk in r.iter_content(1<<20): out.write(chunk)

# 6. Release
requests.delete(f"{B}/rent/{rent_id}", headers=H)
```
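Both demos poll `ready` without an upper bound. A sketch of a bounded wait follows; `wait_until_ready` is a hypothetical helper, the 600 s timeout is an arbitrary assumption, and `fetch_state` stands for any callable that returns the parsed `GET /rent/{rent_id}` body.

```python
import time

def wait_until_ready(fetch_state, timeout_s=600, interval_s=5,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_state() until it reports ready, or raise TimeoutError."""
    deadline = clock() + timeout_s
    while True:
        state = fetch_state()
        if state.get("ready"):
            return state
        if clock() >= deadline:
            raise TimeoutError(f"rental not ready after {timeout_s}s")
        sleep(interval_s)

# Demonstration with canned states instead of real HTTP calls.
fake = iter([{"ready": False}, {"ready": True}])
print(wait_until_ready(lambda: next(fake), sleep=lambda s: None))
```

With `requests`, `fetch_state` could be `lambda: requests.get(f"{B}/rent/{rent_id}", headers=H).json()`. `sleep` and `clock` are injectable so the loop can be tested without waiting.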

## Inside a rented container

`ghcr.io/vitaldb/kgpu-pytorch:latest` ships PyTorch (CUDA 12.6,
nv24.10), numpy<2 pinned via `/etc/pip.conf` constraint, plus `wfdb`,
`vitaldb`, `scipy`, `scikit-learn`, `pandas`, `matplotlib`, `seaborn`,
`duckdb`, `pyarrow`, `tmux`, `uv`, `zstd`.

Auto-injected env:

| Var | Value |
|---|---|
| `KGPU_API_TOKEN` | Bearer token |
| `KGPU_API_BASE` | `https://api.kgpu.net/v1` |
| `KGPU_GPU_ID` / `KGPU_RENT_ID` | this rental's id |
| `NVIDIA_VISIBLE_DEVICES` | `void` — CDI handles GPU allocation, don't override |
| `HTTPS_PROXY` / `HTTP_PROXY` / `NO_PROXY` | set on KHDP nodes that need an egress proxy |

The container disk is **ephemeral**. When the rental ends (DELETE,
idle, expiry, out-of-credits) the writable layer is destroyed. Pull
artifacts via `/file` before releasing.

## Phase 1 limits

- **Per-user GPU cap**: 4 concurrent across all your rentals.
- **Per-rental caps**: 16 CPU cores, 128 GiB memory, 100 GiB ephemeral disk.
- **Idle**: 1 h with no `/run`/`/file` activity → `idle_warning`. 2 h → auto-release (`end_reason: auto_idle`).
- **Wall clock**: `duration_hours` (default 12, max 168). Hard ceiling.
- **Credits**: per-minute deduction; zero → auto-release (`end_reason: out_of_credits`).
- **GPU isolation**: each rental gets its requested `gpu_count` GPUs via
  CDI. Inside the container `nvidia-smi -L` and `/dev/nvidia*` only
  expose the rental's GPUs.
- **Transfer**: bytes go client → nginx (TLS, direct on EC2 — no
  Cloudflare proxy) → uvicorn → pod. The single-POST body cap is the
  pod's ephemeral disk (100 GiB by default), not a proxy limit. For
  files larger than that, split and resume via Content-Range.
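
The idle and GPU-cap rules above can be mirrored client-side, for example to warn before a rental is reclaimed. This is a sketch under assumptions: the helper names and the `"active"` label are hypothetical, and treating the 1 h and 2 h thresholds as inclusive is a guess.

```python
IDLE_WARNING_S = 1 * 3600   # 1 h without /run or /file activity
AUTO_RELEASE_S = 2 * 3600   # 2 h: rental is auto-released
MAX_CONCURRENT_GPUS = 4     # per-user cap across all rentals

def idle_status(idle_seconds):
    """Classify an idle duration using the documented thresholds."""
    if idle_seconds >= AUTO_RELEASE_S:
        return "auto_idle"
    if idle_seconds >= IDLE_WARNING_S:
        return "idle_warning"
    return "active"

def fits_gpu_cap(gpus_in_use, gpus_requested):
    """True if a new rental would stay within the per-user GPU cap."""
    return gpus_in_use + gpus_requested <= MAX_CONCURRENT_GPUS

print(idle_status(90 * 60), fits_gpu_cap(3, 2))
```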

## Pricing

$1 = 1 500 credits. New accounts start at 0; an admin grants the budget.

| `gpu_model` | hardware | credits/hr | ≈ $/hr |
|---|---|---|---|
| `gb10` | NVIDIA GB10 (≈3090) | 500 | 0.33 |
| `a5000` | RTX A5000 | 450 | 0.30 |
| `a6000` | RTX A6000 | 750 | 0.50 |
| `a6000-ada` | RTX 6000 Ada | 1 200 | 0.80 |
| `b200` | Blackwell B200 | 5 000 | 3.33 |

N-GPU rentals bill at N × the per-GPU rate.
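
A worked example of the arithmetic, using the rates from the table. `estimate_cost` is a hypothetical helper; billing a `gpu_count` of 0 as one GPU is an assumption based on the "still bills at the class rate" note.

```python
CREDITS_PER_USD = 1500
RATE_PER_GPU_HOUR = {
    "gb10": 500, "a5000": 450, "a6000": 750, "a6000-ada": 1200, "b200": 5000,
}

def estimate_cost(gpu_model, gpu_count, hours):
    """Return (credits, dollars) for a rental of the given size and length."""
    billed_gpus = max(gpu_count, 1)  # assumption: CPU-only pods bill as 1 GPU
    credits = RATE_PER_GPU_HOUR[gpu_model] * billed_gpus * hours
    return credits, round(credits / CREDITS_PER_USD, 2)

print(estimate_cost("a6000-ada", 2, 3))  # (7200, 4.8)
```

Two `a6000-ada` GPUs for three hours come to 2 × 1 200 × 3 = 7 200 credits, or $4.80. Actual deductions happen per minute, so a rental released early costs less than this estimate.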

## Errors & request IDs

Every response carries `X-Request-ID` (8-byte hex). 5xx responses embed
the same id as `trace_id` in the body. Quote it when filing a bug.

```json
{"error_code":"internal_server_error","message":"...","trace_id":"5e0e4b2dabe7a1f1"}
```

A client-supplied `X-Request-ID: <your-tag>` request header is honored
and echoed back in the response.
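
A sketch of tagging requests and recovering the id to quote in a bug report. Both helper names are hypothetical. Generating the tag as 8 random bytes mirrors the server's format, but whether client tags must follow that format is not stated.

```python
import secrets

def tagged_headers(token, tag=None):
    """Request headers with a client-chosen X-Request-ID."""
    tag = tag or secrets.token_hex(8)  # 8 random bytes -> 16 hex characters
    return {"Authorization": f"Bearer {token}", "X-Request-ID": tag}

def bug_report_id(response_headers, body):
    """Prefer trace_id from a 5xx body, else the X-Request-ID header."""
    return body.get("trace_id") or response_headers.get("X-Request-ID")

headers = tagged_headers("kgpu_placeholder")
print(len(headers["X-Request-ID"]))  # 16
```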