Pipeline Runbook
Step-by-step guide to build the full IONIS pipeline from a clean slate. This is the tested, dependency-ordered sequence — proven during the v3.0 migration rebuild (2026-02-13). Each phase links to the relevant Methodology page for detail; this page is the consolidated checklist with commands, expected counts, and wall times.
Total wall time: ~2 hours on reference hardware (9975WX ingest + population), plus ~4 hours training on M3 Ultra.
Phase 1: Install Packages (9975WX)
Step 1.1: Enable COPR and install
Step 1.2: Verify installed versions
Step 1.3: Verify DDL and scripts installed
ls /usr/share/ionis-core/ddl/*.sql | wc -l
# Expected: 29
ls /usr/share/ionis-core/scripts/*.sh | wc -l
# Expected: 12
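The two counts above can also be checked mechanically so that a mismatch stops the rebuild early. The following is a minimal sketch; `count_files_matching` is a hypothetical helper name, not something shipped in ionis-core.

```shell
# Sketch only: count_files_matching is a hypothetical helper, not part of ionis-core.
count_files_matching() {
    # $1 = directory, $2 = extension without the dot, $3 = expected count
    local dir="$1" ext="$2" expected="$3" n=0 f
    for f in "$dir"/*."$ext"; do
        # An unmatched glob stays literal, so only count paths that exist.
        if [ -e "$f" ]; then n=$((n + 1)); fi
    done
    if [ "$n" -eq "$expected" ]; then
        echo "OK: $n .$ext files in $dir"
    else
        echo "MISMATCH: $n .$ext files in $dir, expected $expected" >&2
        return 1
    fi
}
```

With the counts from this step, the calls would be `count_files_matching /usr/share/ionis-core/ddl sql 29` and `count_files_matching /usr/share/ionis-core/scripts sh 12`.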
Phase 2: Create Database Schema (< 1 min)
All DDL files live in /usr/share/ionis-core/ddl/. Each is idempotent
(CREATE TABLE/VIEW IF NOT EXISTS). Apply in numerical order — the
sequence encodes dependencies.
Step 2.1: Apply all DDL
Run ionis-db-init, or apply the files manually:
for f in /usr/share/ionis-core/ddl/*.sql; do
    echo "Applying: $(basename "$f")"
    clickhouse-client --multiquery < "$f"
done
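Because the file order encodes dependencies, it can be safer to stop at the first file that fails rather than continue with later DDL. A minimal sketch of that variant follows. `apply_ddl` is a hypothetical helper, and the sketch assumes the file names carry a zero-padded numeric prefix so that glob order matches numerical order.

```shell
# Sketch only: apply_ddl is a hypothetical helper, not part of ionis-core.
# Assumes zero-padded numeric prefixes, so glob order equals numerical order.
apply_ddl() {
    # $1 = directory containing the .sql files
    local dir="$1" f
    for f in "$dir"/*.sql; do
        if [ ! -e "$f" ]; then
            echo "No .sql files found in $dir" >&2
            return 1
        fi
        echo "Applying: $(basename "$f")"
        if ! clickhouse-client --multiquery < "$f"; then
            echo "FAILED: $(basename "$f"); later files were not applied" >&2
            return 1
        fi
    done
}
```

Since each DDL file is idempotent, `apply_ddl /usr/share/ionis-core/ddl` can be re-run from the top after the failing file is fixed.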
Step 2.2: DDL Inventory (29 files)
# File Database Creates
-- -------------------------------- ----------- ----------------------------------------
01 wspr_schema_v2.sql wspr bronze, mv_spots_daily
02 solar_indices.sql solar bronze
03 solar_silver.sql solar v_daily_indices
04 data_mgmt.sql data_mgmt config
05 geo_functions.sql geo v_grid_validation_example
06 lab_versions.sql data_mgmt lab_versions
07 callsign_grid.sql wspr callsign_grid
08 model_features.sql wspr silver
09 quality_distribution_mv.sql wspr v_quality_distribution (MV → silver)
10 rbn_schema_v1.sql rbn bronze
11 contest_schema_v1.sql contest bronze
12 signatures_v1.sql wspr signatures_v1
13 training_stratified.sql wspr gold_stratified
14 training_continuous.sql wspr gold_continuous
15 training_v6_clean.sql wspr gold_v6
16 validation_step_i.sql validation step_i_paths, step_i_voacap
17 balloon_callsigns.sql wspr balloon_callsigns
18 validation_quality_test.sql validation quality_test_paths, quality_test_voacap
19 dxpedition_synthesis.sql dxpedition catalog; rbn.dxpedition_paths
20 signatures_v2_terrestrial.sql wspr signatures_v2_terrestrial
21 balloon_callsigns_v2.sql wspr balloon_callsigns_v2
22 pskr_schema_v1.sql pskr bronze
23 contest_signatures.sql contest signatures
24 rbn_signatures.sql rbn signatures
25 live_conditions.sql wspr live_conditions
26 validation_model_results.sql validation model_results
27 mode_thresholds.sql validation mode_thresholds
28 pskr_ingest_log.sql pskr ingest_log
29 rbn_dxpedition_signatures.sql rbn dxpedition_signatures
DDL 09 depends on DDL 08
The v_quality_distribution materialized view reads from wspr.silver.
DDL 08 must be applied first. Sequential numbering handles this
automatically.
Step 2.3: Verify table count
clickhouse-client --query "
SELECT count()
FROM system.tables
WHERE database NOT IN ('system', 'information_schema', 'INFORMATION_SCHEMA', 'default')
AND engine <> ''
AND name NOT LIKE '.inner_id%'
"
# Expected: 35
See Bronze Stack — Step 1 for additional detail.
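If the total differs from 35, a per-database breakdown usually shows where a table is missing. The sketch below reuses the filter from the Step 2.3 query and adds a `GROUP BY`. The helper names `tables_per_database` and `check_table_total` are hypothetical.

```shell
# Sketch only: both helpers are hypothetical, not part of ionis-core.
# The WHERE clause mirrors the Step 2.3 verification query.
tables_per_database() {
    clickhouse-client --query "
        SELECT database, count() AS n_tables
        FROM system.tables
        WHERE database NOT IN ('system', 'information_schema', 'INFORMATION_SCHEMA', 'default')
          AND engine <> ''
          AND name NOT LIKE '.inner_id%'
        GROUP BY database
        ORDER BY database
        FORMAT TSV
    "
}

check_table_total() {
    # $1 = expected total; prints the breakdown plus a total line,
    # and returns non-zero when the total differs.
    tables_per_database | awk -F '\t' -v want="$1" '
        { print; total += $2 }
        END {
            printf "total\t%d\n", total
            exit (total == want ? 0 : 1)
        }'
}
```

For this runbook the call would be `check_table_total 35`.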
Phase 3: Bronze Ingest (~35 min)
Order matters. Solar must run first — downstream JOINs depend on it.
Run wspr-turbo solo (it consumes ~80 GB RAM at 16 workers).
Step 3.1: Solar backfill
Additional solar sources
solar-backfill loads GFZ Potsdam historical data only. Additional
solar data comes from solar-history-load (NOAA, 6-hour cron) and
solar-live-update (SWPC, 15-min cron). Run those manually or wait
for their cron cycles to fully populate solar.bronze and
wspr.live_conditions.
Step 3.2: WSPR ingest
Run wspr-turbo solo
With 16 workers, wspr-turbo consumes ~80 GB RAM. Do not run other ingesters concurrently.
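A pre-flight check can catch the case where another process is already holding memory. The sketch below reads `MemAvailable` from `/proc/meminfo` on Linux. `has_available_ram_gb` is a hypothetical helper, and treating the ~80 GB figure as GiB is an assumption.

```shell
# Sketch only: has_available_ram_gb is a hypothetical helper.
# Linux-specific: relies on the MemAvailable line of /proc/meminfo (values in kB).
has_available_ram_gb() {
    # $1 = required GiB, $2 = meminfo path (optional, mainly for testing)
    local need_gb="$1" meminfo="${2:-/proc/meminfo}" avail_kb
    avail_kb=$(awk '$1 == "MemAvailable:" { print $2 }' "$meminfo")
    if [ -z "$avail_kb" ]; then
        echo "MemAvailable not found in $meminfo" >&2
        return 2
    fi
    # Succeeds only when available memory covers the requirement.
    [ "$avail_kb" -ge $((need_gb * 1024 * 1024)) ]
}
```

One way to use it is `has_available_ram_gb 80 || echo "not enough free RAM for wspr-turbo" >&2` before starting the ingest.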
Step 3.3: RBN ingest
Step 3.4: Contest ingest
Step 3.5: PSK Reporter ingest
Step 3.6: Verify all bronze tables
clickhouse-client --query "
SELECT 'solar.bronze' AS tbl, count() AS rows FROM solar.bronze
UNION ALL SELECT 'wspr.bronze', count() FROM wspr.bronze
UNION ALL SELECT 'rbn.bronze', count() FROM rbn.bronze
UNION ALL SELECT 'contest.bronze', count() FROM contest.bronze
UNION ALL SELECT 'pskr.bronze', count() FROM pskr.bronze
ORDER BY 1
FORMAT PrettyCompact
"
QA Actuals
Clean-slate rebuild on 9975WX (2026-02-07):
| Table | Expected Rows | Time | Throughput |
|---|---|---|---|
| solar.bronze | ~76,000 | < 1 s | n/a |
| wspr.bronze | ~10,800,000,000 | ~8 min | 21.91 Mrps |
| rbn.bronze | ~2,184,000,000 | ~3 min 30 s | 10.30 Mrps |
| contest.bronze | ~234,000,000 | ~22 min | n/a |
| pskr.bronze | ~81,000,000+ | varies | n/a |
See Bronze Stack for per-step verification queries.
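The expected row counts above are approximate, so a comparison against them needs a tolerance rather than an exact match. The sketch below shows one way to express that. `within_pct` is a hypothetical helper, and the tolerance is an assumption to tune per table (pskr.bronze in particular keeps growing).

```shell
# Sketch only: within_pct is a hypothetical helper.
within_pct() {
    # $1 = actual count, $2 = expected count, $3 = allowed deviation in percent
    awk -v actual="$1" -v expected="$2" -v pct="$3" 'BEGIN {
        diff = actual - expected
        if (diff < 0) diff = -diff
        # Compare diff/expected against pct/100 without dividing.
        exit (diff * 100 <= expected * pct ? 0 : 1)
    }'
}
```

For example, after `actual=$(clickhouse-client --query "SELECT count() FROM rbn.bronze")`, the call `within_pct "$actual" 2184000000 5` succeeds when the count is within 5% of the QA figure.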
Phase 4: Populate Derived Tables (~20 min)
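All 12 population scripts live in /usr/share/ionis-core/scripts/. Each
reads the ClickHouse host from the CH_HOST environment variable (default:
192.168.1.90; set it to 10.60.1.1 to use the DAC link). Run them in
dependency order as listed below: Tier 1, Tier 2, the DXpedition chain,
then Tier 3.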
Tier 1: No dependencies beyond bronze
# 4.1 — Callsign→Grid Rosetta Stone (depends on: wspr.bronze)
bash /usr/share/ionis-core/scripts/populate_callsign_grid.sh
# Expected: ~3.6M rows in wspr.callsign_grid
# 4.2 — WSPR Signatures V1 (depends on: wspr.bronze, solar.bronze)
bash /usr/share/ionis-core/scripts/populate_signatures.sh
# Expected: ~93.8M rows in wspr.signatures_v1 (~3 min)
# 4.3 — Continuous training set (depends on: wspr.bronze, solar.bronze)
bash /usr/share/ionis-core/scripts/populate_continuous.sh
# Expected: 10M rows in wspr.gold_continuous (~4 min)
# 4.4 — Stratified training set (depends on: wspr.bronze, solar.bronze)
bash /usr/share/ionis-core/scripts/populate_stratified.sh
# Expected: 10M rows in wspr.gold_stratified (~7 min)
Tier 2: Depends on Tier 1 outputs
# 4.5 — Balloon callsigns V2 (depends on: wspr.bronze, wspr.callsign_grid)
bash /usr/share/ionis-core/scripts/populate_balloon_callsigns.sh
# Expected: ~1,443 rows in wspr.balloon_callsigns_v2
# 4.6 — RBN Signatures (depends on: rbn.bronze, wspr.callsign_grid, solar.bronze)
bash /usr/share/ionis-core/scripts/populate_rbn_signatures.sh
# Expected: ~56.7M rows in rbn.signatures
# 4.7 — Contest Signatures (depends on: contest.bronze, wspr.callsign_grid, solar.bronze)
bash /usr/share/ionis-core/scripts/populate_contest_signatures.sh
# Expected: ~6.3M rows in contest.signatures
# 4.8 — Quality test paths (depends on: wspr.signatures_v1)
bash /usr/share/ionis-core/scripts/populate_quality_test_paths.sh
# Expected: 100K rows in validation.quality_test_paths
# 4.9 — V6 clean training set (depends on: wspr.gold_continuous)
bash /usr/share/ionis-core/scripts/populate_v6_clean.sh
# Expected: 10M rows in wspr.gold_v6
DXpedition chain (depends on Tier 1 + rbn.bronze)
# 4.10 — DXpedition catalog (depends on: static TSV data file)
bash /usr/share/ionis-core/scripts/populate_dxpedition_catalog.sh
# Expected: 332 rows in dxpedition.catalog
# 4.11 — DXpedition paths + signatures (depends on: catalog, rbn.bronze, callsign_grid, solar.bronze)
bash /usr/share/ionis-core/scripts/populate_dxpedition_paths.sh
# Expected: ~3.9M rows in rbn.dxpedition_paths, ~260K rows in rbn.dxpedition_signatures
Tier 3: Balloon-filtered signatures
# 4.12 — Signatures V2 Terrestrial (depends on: wspr.bronze, solar.bronze, balloon_callsigns_v2)
bash /usr/share/ionis-core/scripts/populate_signatures_v2_terrestrial.sh
# Expected: ~93.3M rows in wspr.signatures_v2_terrestrial
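Since each tier depends on the previous one, running the scripts unattended is only safe if the run stops at the first failure. A minimal sketch of such a wrapper follows; `run_in_order` is a hypothetical helper, not part of ionis-core.

```shell
# Sketch only: run_in_order is a hypothetical helper.
run_in_order() {
    # $1 = script directory, remaining arguments = script names in run order
    local dir="$1" name
    shift
    for name in "$@"; do
        echo "Running: $name"
        if ! bash "$dir/$name"; then
            echo "FAILED: $name; remaining scripts were not run" >&2
            return 1
        fi
    done
}
```

To target the DAC link, run `export CH_HOST=10.60.1.1` first, then call `run_in_order /usr/share/ionis-core/scripts` followed by the twelve script names in the order listed above.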
Step 4.13: Verify all derived tables
clickhouse-client --query "
SELECT 'wspr.callsign_grid' AS tbl, count() AS rows FROM wspr.callsign_grid
UNION ALL SELECT 'wspr.signatures_v1', count() FROM wspr.signatures_v1
UNION ALL SELECT 'wspr.signatures_v2_terrestrial', count() FROM wspr.signatures_v2_terrestrial
UNION ALL SELECT 'wspr.balloon_callsigns_v2', count() FROM wspr.balloon_callsigns_v2
UNION ALL SELECT 'wspr.gold_continuous', count() FROM wspr.gold_continuous
UNION ALL SELECT 'wspr.gold_stratified', count() FROM wspr.gold_stratified
UNION ALL SELECT 'wspr.gold_v6', count() FROM wspr.gold_v6
UNION ALL SELECT 'rbn.signatures', count() FROM rbn.signatures
UNION ALL SELECT 'contest.signatures', count() FROM contest.signatures
UNION ALL SELECT 'dxpedition.catalog', count() FROM dxpedition.catalog
UNION ALL SELECT 'rbn.dxpedition_paths', count() FROM rbn.dxpedition_paths
UNION ALL SELECT 'rbn.dxpedition_signatures', count() FROM rbn.dxpedition_signatures
UNION ALL SELECT 'validation.quality_test_paths', count() FROM validation.quality_test_paths
ORDER BY 1
FORMAT PrettyCompact
"
Phase 5: Silver Layer (~50 min, optional)
Generate CUDA float4 embeddings from WSPR spots joined with solar indices.
GPU required
bulk-processor requires an NVIDIA GPU. It is not yet packaged in the
ionis-cuda RPM — build locally:
cd ionis-cuda && mkdir build && cd build && cmake .. && make
| Target Table | Expected Rows | Time |
|---|---|---|
| wspr.silver | ~4.43B | ~45 min |
| wspr.v_quality_distribution | ~6.1M | auto (MV) |
Verification:
See Silver Layer for details.
Phase 6: Export Training Data
Export the gold_v6 training set as CSV and transfer to the training host.
clickhouse-client --query "SELECT * FROM wspr.gold_v6 FORMAT CSV" \
> /mnt/ai-stack/ionis-ai/ionis-training/data/gold_v6.csv
# Transfer to M3 Ultra via DAC link
scp /mnt/ai-stack/ionis-ai/ionis-training/data/gold_v6.csv \
    gbeam@10.60.1.2:workspace/ionis-ai/ionis-training/data/
Verification:
See Gold Layer for training table lineage.
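One way to confirm the CSV arrived intact is to compare SHA-256 digests on both hosts. This is a sketch under assumptions: `same_sha256` is a hypothetical helper, `sha256sum` is assumed on the Linux side, and `shasum -a 256` over SSH is assumed on the macOS side.

```shell
# Sketch only: same_sha256 is a hypothetical helper.
# Each argument is one output line from sha256sum or "shasum -a 256",
# i.e. "<hex digest>  <file name>". Only the digest part is compared.
same_sha256() {
    [ -n "${1%% *}" ] && [ "${1%% *}" = "${2%% *}" ]
}

# Possible usage with the paths from this phase (not executed here):
#   local_sum=$(sha256sum /mnt/ai-stack/ionis-ai/ionis-training/data/gold_v6.csv)
#   remote_sum=$(ssh gbeam@10.60.1.2 'shasum -a 256 workspace/ionis-ai/ionis-training/data/gold_v6.csv')
#   same_sha256 "$local_sum" "$remote_sum" || echo "checksum mismatch" >&2
```

The digest comparison ignores the file-name column, so it does not matter that the paths differ between the two hosts.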
Phase 7: Verify Services (9975WX)
Step 7.1: Solar live update
solar-live-update
clickhouse-client --query "SELECT * FROM wspr.live_conditions ORDER BY updated_at DESC LIMIT 1"
Step 7.2: Cron jobs
See Operations & Maintenance for the full cron schedule.
Step 7.3: pskr-collector service
Phase 8: M3 Ultra Setup (Sage Node)
Step 8.1: Clone all repos
mkdir -p /Users/gbeam/workspace/ionis-ai && cd /Users/gbeam/workspace/ionis-ai
for repo in ionis-core ionis-apps ionis-cuda ionis-docs ionis-training ionis-devel ionis-tools; do
    git clone "git@github.com:IONIS-AI/${repo}.git"
done
Step 8.2: Verify ClickHouse connectivity over DAC
clickhouse-client --host 10.60.1.1 --query "SELECT version()"
clickhouse-client --host 10.60.1.1 --query "SELECT count() FROM wspr.bronze"
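If the DAC link or the server is still coming up, a short retry loop around the same connectivity probe avoids a false failure. The sketch below is one way to do that; `wait_for_clickhouse` is a hypothetical helper, and the attempt count and delay are arbitrary choices.

```shell
# Sketch only: wait_for_clickhouse is a hypothetical helper.
wait_for_clickhouse() {
    # $1 = host, $2 = maximum attempts, $3 = seconds to sleep between attempts
    local host="$1" tries="$2" delay="$3" i=1
    while [ "$i" -le "$tries" ]; do
        if clickhouse-client --host "$host" --query "SELECT 1" >/dev/null 2>&1; then
            echo "ClickHouse reachable at $host (attempt $i)"
            return 0
        fi
        i=$((i + 1))
        # Sleep only when another attempt will follow.
        if [ "$i" -le "$tries" ]; then sleep "$delay"; fi
    done
    echo "ClickHouse not reachable at $host after $tries attempts" >&2
    return 1
}
```

For the DAC address in this runbook, a call such as `wait_for_clickhouse 10.60.1.1 5 2` would try five times, two seconds apart.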
Step 8.3: Set up Python venv
cd /Users/gbeam/workspace/ionis-ai/ionis-training
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
Phase 9: Train and Validate V20 (M3 Ultra)
Step 9.1: Train V20
cd /Users/gbeam/workspace/ionis-ai/ionis-training
source .venv/bin/activate
python train.py --config versions/v20/config_v20.json
# Expected: ~4h 16m on M3 Ultra (MPS backend)
Step 9.2: Run validation suite
python versions/v20/verify_v20.py
python versions/v20/test_v20.py
python versions/v20/validate_v20.py
Step 9.3: PSK Reporter live validation
Step 9.4: Acceptance criteria
| Metric | Target | V20 Reference |
|---|---|---|
| Pearson | > +0.48 | +0.4879 |
| Kp sidecar | > +3.0 sigma | +3.487 sigma |
| SFI sidecar | > +0.4 sigma | +0.482 sigma |
| RMSE | < 0.87 sigma | 0.862 sigma |
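The four thresholds in the table can be applied as a single pass/fail gate. The sketch below hard-codes the targets from the table. `check_acceptance` is a hypothetical helper, and how the measured values are extracted from the validation scripts is left open.

```shell
# Sketch only: check_acceptance is a hypothetical helper.
# Thresholds are copied from the acceptance criteria table.
check_acceptance() {
    # $1 = Pearson, $2 = Kp sidecar, $3 = SFI sidecar, $4 = RMSE (last three in sigma)
    awk -v pearson="$1" -v kp="$2" -v sfi="$3" -v rmse="$4" 'BEGIN {
        failed = 0
        # Adding 0 forces a numeric comparison even for inputs such as "+0.4879".
        if (!(pearson + 0 > 0.48)) { print "FAIL: Pearson " pearson ", target > +0.48"; failed = 1 }
        if (!(kp + 0 > 3.0))       { print "FAIL: Kp sidecar " kp ", target > +3.0 sigma"; failed = 1 }
        if (!(sfi + 0 > 0.4))      { print "FAIL: SFI sidecar " sfi ", target > +0.4 sigma"; failed = 1 }
        if (!(rmse + 0 < 0.87))    { print "FAIL: RMSE " rmse ", target < 0.87 sigma"; failed = 1 }
        if (!failed) print "PASS"
        exit failed
    }'
}
```

With the V20 reference values, `check_acceptance +0.4879 +3.487 +0.482 0.862` prints `PASS` and returns 0; any metric that misses its target is listed and the function returns 1.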
Summary: Full Stack Build Order
Phase 1: Install Packages
1.1 dnf install ionis-core ionis-apps
1.2 Verify versions, DDL count (29), script count (12)
Phase 2: Schema
2.1 ionis-db-init (<1 min)
2.2 Verify table count (35)
Phase 3: Bronze Ingest
3.1 solar-backfill solar.bronze (<1s)
3.2 wspr-turbo wspr.bronze (~8m) RUN SOLO
3.3 rbn-ingest rbn.bronze (~3m30s)
3.4 contest-ingest contest.bronze (~22m)
3.5 pskr-ingest pskr.bronze (varies)
Phase 4: Population Scripts (Tier 1 → 2 → 3)
4.1 populate_callsign_grid.sh wspr.callsign_grid (~3.6M)
4.2 populate_signatures.sh wspr.signatures_v1 (~93.8M)
4.3 populate_continuous.sh wspr.gold_continuous (10M)
4.4 populate_stratified.sh wspr.gold_stratified (10M)
── Tier 2 ──
4.5 populate_balloon_callsigns.sh wspr.balloon_callsigns_v2 (~1.4K)
4.6 populate_rbn_signatures.sh rbn.signatures (~56.7M)
4.7 populate_contest_signatures.sh contest.signatures (~6.3M)
4.8 populate_quality_test_paths.sh validation.quality_test_paths (100K)
4.9 populate_v6_clean.sh wspr.gold_v6 (10M)
── DXpedition ──
4.10 populate_dxpedition_catalog.sh dxpedition.catalog (332)
4.11 populate_dxpedition_paths.sh rbn.dxpedition_paths (~3.9M)
rbn.dxpedition_signatures (~260K)
── Tier 3 ──
4.12 populate_signatures_v2_terrestrial.sh wspr.signatures_v2_terrestrial (~93.3M)
Phase 5: Silver Layer (optional, requires GPU)
5.1 bulk-processor (CUDA) wspr.silver (~4.43B, ~45m)
Phase 6: Export Training Data
6.1 gold_v6.csv export + SCP to M3
Phase 7: Verify Services
7.1 solar-live-update, cron jobs, pskr-collector
Phase 8: M3 Ultra Setup
8.1 Clone repos, venv, DAC connectivity
Phase 9: Train and Validate V20
9.1 train.py --config versions/v20/config_v20.json (~4h16m)
9.2 verify_v20.py, test_v20.py, validate_v20.py
9.3 validate_v20_pskr.py (>84% recall)