IONIS Lab

Performance benchmarks for the IONIS sovereign infrastructure — ingestion, query, and cross-system comparisons on 10.8 billion WSPR rows.

System Overview

┌─────────────────────────────────────────────────────────────────┐
│              IONIS SOVEREIGN AI DATA CENTER                     │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  INGEST ────────▶   STORE ────────▶   THINK                     │
│                                                                 │
│  Threadripper       ClickHouse        DeepSeek-R1-70B           │
│  9975WX             10.8B rows        RTX PRO 6000              │
│  24.67 Mrps         ~200 GB           96 GB VRAM                │
│  3.5 GB/s writes    60 GB/s scan      ~15 tok/s                 │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
Layer   Component                      Capacity                  Verified
Ingest  wspr-turbo @ 24 workers        24.67 Mrps (3.5 GB/s)     2026-02-01
Store   ClickHouse (wspr.bronze)       10.8B rows, 60 GB/s scan  2026-02-01
Think   DeepSeek-R1-Distill-70B (FP8)  96 GB VRAM, ~15 tok/s     2026-02-02

Hardware

Control Node — Threadripper 9975WX

Primary ingestion and query engine. This server forms the core of the lab.

Spec             Value
CPU              AMD Threadripper PRO 9975WX (32C/64T, 5.3 GHz boost, Zen 5)
RAM              128 GB DDR5 (8-channel, ~384 GB/s bandwidth)
GPU              NVIDIA RTX PRO 6000 Blackwell (96 GB VRAM)
Boot Device      Micron 7450 NVMe 960 GB (22110)
Primary Storage  2x Samsung 9100 Pro NVMe 4 TB (3.6 TB ClickHouse, 3.6 TB working data)
Bulk Storage     HighPoint NVMe carrier with 4x Samsung 990 Pro 4 TB in a 2x2 ZFS mirror
Primary NIC      Intel X710 4-port 10GbE DAC
Spare NIC        Mellanox ConnectX-5 2-port 25GbE DAC
OS               Rocky Linux 9.7

I/O is decoupled: source files on one NVMe, ClickHouse writes to a separate NVMe. No contention during ingestion.

Sage Node — Mac Studio M3 Ultra

Model training and evaluation.

Spec            Value
SoC             Apple M3 Ultra
RAM             96 GB unified memory
Backend         PyTorch MPS
Management NIC  1x 10GbE
Lab NIC         Thunderbolt 5 (40 Gbps)
Role            V20 training (100 epochs in 4h 16m)

Forge Node — EPYC 7302P

Backup, replica, and VM host (Proxmox).

Spec          Value
CPU           AMD EPYC 7302P (16C/32T, 3.0 GHz base, Zen 2)
RAM           128 GB DDR4-3200 ECC (8-channel, ~204 GB/s bandwidth)
GPU           NVIDIA RTX 5080 (16 GB VRAM)
Boot Device   Micron 7450 NVMe 960 GB (22110)
Primary NIC   Intel X710 4-port 10GbE DAC
Spare NIC     Mellanox ConnectX-5 2-port 25GbE DAC
NVMe Storage  4x Samsung 970 EVO Plus 1 TB (NVMe mirror)
SSD Storage   LSI HBA 9300-8i, 8x Samsung 870 EVO 2 TB (RAIDZ2)

Storage Node — TrueNAS

NAS for bulk archive storage.

Spec         Value
CPU          AMD Ryzen 9 5950X (16C/32T, Zen 3)
RAM          128 GB DDR4 ECC
Boot Device  2x Samsung 870 EVO 1 TB (mirrored)
Primary NIC  Intel X710 4-port 10GbE DAC
Storage      8x Seagate IronWolf Pro 14 TB (3-way + 4-way mirror, ~28 TB usable)

DAC Network

All inter-node links are direct point-to-point connections with jumbo frames (MTU 9000): 10 GbE SFP+ on the X710 ports, plus Thunderbolt 5 networking to the M3 Ultra. No switches.

Link           Subnet        Endpoints              Latency
Thunderbolt 5  10.60.1.0/24  9975WX — M3 Ultra      ~0.19 ms
X710 SFP+ AOC  10.60.2.0/24  9975WX — Proxmox/EPYC  ~0.12 ms
X710 SFP+ AOC  10.60.3.0/24  9975WX — TrueNAS       ~0.03 ms

ZFS Archive Pool

HighPoint NVMe carrier with 4x Samsung 990 Pro 4 TB in a 2x2 ZFS mirror (~7.12 TB usable) on the 9975WX.

Dataset       Compression  Purpose
wspr-data     lz4          WSPR raw CSV archives (.csv.gz)
contest-logs  zstd-9       CQ contest Cabrillo logs
rbn-data      lz4          RBN daily ZIP archives
pskr-data     lz4          PSK Reporter MQTT collection (live)

Ingestion Benchmarks

All benchmarks run against the same 10.8 billion WSPR rows, using the ch-go native protocol with LZ4 wire compression.
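
The ingesters themselves are not reproduced here, but the block below is a minimal sketch of the ch-go insert path they share: native-protocol columnar blocks sent over a connection with LZ4 wire compression. The address and the three-column subset of wspr.bronze are illustrative assumptions, not the tools' actual schema.

// Minimal single-block insert over the ClickHouse native protocol using ch-go.
// Column names and types are an assumed subset of wspr.bronze for illustration.
package main

import (
    "context"
    "time"

    "github.com/ClickHouse/ch-go"
    "github.com/ClickHouse/ch-go/proto"
)

func main() {
    ctx := context.Background()

    conn, err := ch.Dial(ctx, ch.Options{
        Address:     "127.0.0.1:9000",  // assumed local native-protocol port
        Compression: ch.CompressionLZ4, // LZ4 on the wire, as used in the benchmarks
    })
    if err != nil {
        panic(err)
    }
    defer conn.Close()

    // Columnar buffers: values are appended per column, never as row structs.
    var (
        ts       proto.ColDateTime
        callsign proto.ColStr
        snr      proto.ColInt16
    )
    ts.Append(time.Now())
    callsign.Append("WW0WWV")
    snr.Append(-12)

    input := proto.Input{
        {Name: "time", Data: &ts},
        {Name: "callsign", Data: &callsign},
        {Name: "snr", Data: &snr},
    }

    // input.Into generates "INSERT INTO wspr.bronze (time, callsign, snr) VALUES";
    // the block travels in ClickHouse's binary native format.
    if err := conn.Do(ctx, ch.Query{
        Body:  input.Into("wspr.bronze"),
        Input: input,
    }); err != nil {
        panic(err)
    }
}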

Threadripper 9975WX — Champion Results

wspr-turbo (streaming from .gz archives):

Workers  Time    Throughput  I/O Rate  Effective Write  Memory
16       7m 54s  22.77 Mrps  399 MB/s  1.74 GB/s        45.3 GB
24       7m 18s  24.67 Mrps  433 MB/s  1.88 GB/s        71.5 GB

wspr-shredder (pre-decompressed CSV, 878 GB):

Workers  Time    Throughput  I/O Rate
24       8m 15s  21.81 Mrps  ~1.81 GB/s

wspr-parquet-native (Parquet files, 109 GB):

Workers  Time    Throughput  I/O Rate
24       9m 39s  17.02 Mrps  193 MB/s

Ingestion Leaderboard

Method               Throughput  Time    Bottleneck        Verdict
wspr-turbo (gzip)    24.67 Mrps  7m 18s  Database merge    Champion
wspr-shredder (CSV)  21.81 Mrps  8m 15s  Database merge    Runner-up
wspr-parquet-native  17.02 Mrps  9m 39s  Library overhead  Slowest

Key Findings

  • Compression is free. wspr-turbo (streaming gzip) beats wspr-shredder (pre-decompressed CSV) even though it does extra decompression work: both tools are capped by the same ClickHouse merge ceiling of ~3.5 GB/s sustained writes, so the decompression overhead never lands on the critical path.
  • The 3.5 GB/s wall. All three tools hit ClickHouse's background merge limit. The bottleneck is the database, not the ingester.
  • The Parquet paradox. Parquet is faster for queries but 30% slower for ingestion due to parquet-go serialization overhead.
  • 24 workers optimal. 8.3% faster than 16 workers, uses ~72 GB RAM. 32 workers exceeded 128 GB and OOM'd.
  • Thermals. 74 °C at full load (5.0 GHz sustained), no throttling.
  • LLM coexistence. Running DeepSeek-R1-70B (SGLang) concurrently had less than 1% impact on ingestion throughput.

Query Benchmarks

Threadripper 9975WX — 10.8B Rows

SELECT callsign, avg(snr) AS avg_snr, count(*) AS count
FROM wspr.bronze
GROUP BY callsign
ORDER BY count DESC
LIMIT 10;
Metric          Result
Time            3.050 sec
Rows processed  10.80 billion
Throughput      3.54 billion rows/s
Data scanned    183.57 GB
Scan rate       60.19 GB/s
Peak memory     828.79 MiB

Top 5 transmitters by spot count:

Callsign   Avg SNR    Spots
WW0WWV     -12.79 dB  105.96M
WB6CXC     -12.47 dB  70.83M
NI5F       -11.68 dB  58.74M
DL6NL      -17.00 dB  53.28M
TA4/G8SCU  -12.46 dB  49.21M

Cross-System Comparison

Ingestion (wspr-turbo)

System               Workers  Time     Throughput  vs 9950X3D
Threadripper 9975WX  24       7m 18s   24.67 Mrps  2.79x
Threadripper 9975WX  16       7m 54s   22.77 Mrps  2.58x
Ryzen 9 9950X3D      16       20m 13s  8.83 Mrps   baseline
EPYC 7302P           16       23m 27s  7.63 Mrps   0.86x

Query (GROUP BY over 10.8B rows)

System               Time   Scan Rate   Winner
Threadripper 9975WX  3.05s  60.19 GB/s  Yes
Ryzen 9 9950X3D      3.88s  ~47 GB/s
EPYC 7302P           4.54s  ~40 GB/s

System Specifications

Spec              Threadripper 9975WX  9950X3D             EPYC 7302P
Cores / Threads   32C/64T              16C/32T             16C/32T
Architecture      Zen 5                Zen 5 (3D V-Cache)  Zen 2
Boost clock       5.3 GHz              5.4 GHz             3.0 GHz
RAM               128 GB DDR5          64 GB DDR5-6000     128 GB DDR4-3200 ECC
Memory channels   8                    2                   8
Memory bandwidth  ~384 GB/s            ~96 GB/s            ~204 GB/s

The Threadripper 9975WX dominates all workloads. Eight-channel DDR5 combined with 32 Zen 5 cores finally unlocks the memory bandwidth needed for analytical queries while maintaining the single-threaded performance needed for gzip decompression.

wspr-turbo Architecture

┌─────────────────┐     ┌──────────────────┐     ┌─────────────────┐
│  .csv.gz file   │────▶│  Stream Decomp   │────▶│  Vectorized     │
│  (on disk)      │     │  (klauspost/gzip)│     │  CSV Parser     │
└─────────────────┘     └──────────────────┘     └────────┬────────┘
┌─────────────────┐     ┌──────────────────┐     ┌─────────────────┐
│  ClickHouse     │◀────│  ch-go Native    │◀────│  Double Buffer  │
│                 │     │  Blocks + LZ4    │     │  Fill A/Send B  │
└─────────────────┘     └──────────────────┘     └─────────────────┘
  • Stream decompress: klauspost/gzip (ASM-optimized, no disk extraction)
  • Vectorized parse: custom scanner fills columnar slices directly (no row structs)
  • Double buffer: Block A fills while Block B sends over the wire
  • Zero-alloc hot path: sync.Pool for 1M-row column blocks (~490 GC cycles for 10B rows)
  • Native protocol: proto.ColUInt64, proto.ColDateTime — binary wire format with LZ4 compression (see the sketch after this list)
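
The double-buffer and pooling details above are internal to wspr-turbo; as a simplified, single-buffer sketch of the same ch-go streaming-insert pattern, the OnInput callback below refills one reused column block per call, while the real tool overlaps fill and send with two pooled 1M-row blocks across 24 workers. The spot_id column and the rows channel are illustrative assumptions.

// Streaming INSERT: ch-go calls OnInput whenever it needs the next block.
// Resetting and reusing the same column keeps the hot path allocation-free.
package ingest

import (
    "context"
    "io"

    "github.com/ClickHouse/ch-go"
    "github.com/ClickHouse/ch-go/proto"
)

func streamInsert(ctx context.Context, conn *ch.Client, rows <-chan uint64) error {
    var spotID proto.ColUInt64
    input := proto.Input{{Name: "spot_id", Data: &spotID}}

    return conn.Do(ctx, ch.Query{
        Body:  input.Into("wspr.bronze"),
        Input: input,
        OnInput: func(ctx context.Context) error {
            spotID.Reset() // reuse the backing slice instead of allocating a new block
            for v := range rows {
                spotID.Append(v)
                if spotID.Rows() >= 1_000_000 { // ~1M-row blocks, as in the tool
                    return nil // block full: send it, then OnInput is called again
                }
            }
            if spotID.Rows() == 0 {
                return io.EOF // source drained: end the INSERT stream
            }
            return nil
        },
    })
}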

End-to-End Pipeline Comparison

Threadripper 9975WX:

Pipeline               Steps                          Total Time  Temp Disk  Throughput
wspr-turbo streaming   Stream from .gz                7m 18s      0 GB       24.67 Mrps
pigz + wspr-shredder   Decompress + ingest            9m 31s      878 GB     21.81 Mrps
pigz + parquet-native  Decompress + convert + ingest  ~12m        987 GB     17.02 Mrps

wspr-turbo requires zero intermediate storage. The "compression is free" finding means there is no benefit to pre-decompressing archives — the ClickHouse merge wall is the bottleneck, not gzip decompression.
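
As a sketch of that zero-temp-disk read path, the snippet below streams a .gz archive through the klauspost gzip reader and scans it line by line, so nothing is ever extracted to disk. The package path and file name are assumptions; wspr-turbo feeds each line into its vectorized columnar parser rather than just counting rows.

// Stream-decompress a WSPR archive and scan CSV lines without writing to disk.
package main

import (
    "bufio"
    "fmt"
    "log"
    "os"

    "github.com/klauspost/compress/gzip" // assumed package path for klauspost's gzip
)

func main() {
    f, err := os.Open("wsprspots-2024-01.csv.gz") // assumed archive name
    if err != nil {
        log.Fatal(err)
    }
    defer f.Close()

    zr, err := gzip.NewReader(f)
    if err != nil {
        log.Fatal(err)
    }
    defer zr.Close()

    sc := bufio.NewScanner(zr)
    sc.Buffer(make([]byte, 1<<20), 1<<20) // headroom for long CSV lines

    var rows int
    for sc.Scan() {
        // wspr-turbo hands sc.Bytes() to its columnar CSV parser here.
        rows++
    }
    if err := sc.Err(); err != nil {
        log.Fatal(err)
    }
    fmt.Println("rows scanned:", rows)
}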