Interactive Results

Explore the Embedding Spaces

Hover over tokens. Zoom into regions. Compare boundary saturation against radial hierarchy in real time.

Results at a Glance

Emb-layer Hyperbolic PPL: 173.6
Emb-layer Euclidean PPL: 113.8
Out-layer Hyperbolic PPL: 117.4
Centroid norm-freq ρ: +0.82
Gromov δ (emb hyp): 0.07
Gromov δ (co-occur graph): 0.43

The Poincaré Disk: Saturation vs Hierarchy

Left: embedding-layer Poincaré embeddings are all crammed into a thin shell at the boundary (mean ||x|| = 0.92). Right: output-layer HyperbolicMLR centroids spread throughout the ball, with high-frequency tokens near the origin (mean ||p|| = 0.33).

Hover over any point to see the token, its frequency rank, and its norm. Zoom with scroll wheel.
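Why the thin shell matters: on the Poincaré ball (curvature −1), distance from the origin grows like 2 artanh(r), so it blows up near the boundary. A minimal sketch comparing a typical centroid norm (0.33) with a typical saturated embedding norm (0.92):

```python
import numpy as np

def poincare_dist(u, v):
    """Geodesic distance on the Poincaré ball with curvature -1."""
    num = 2.0 * np.dot(u - v, u - v)
    den = (1.0 - np.dot(u, u)) * (1.0 - np.dot(v, v))
    return float(np.arccosh(1.0 + num / den))

origin = np.zeros(2)
centroid = np.array([0.33, 0.0])   # typical output-layer centroid norm
saturated = np.array([0.92, 0.0])  # typical saturated embedding norm
print(round(poincare_dist(origin, centroid), 3))   # 0.686
print(round(poincare_dist(origin, saturated), 3))  # 3.178
```

At r = 0.92 a point already sits ~3.2 geodesic units from the origin, and small coordinate differences inside the shell translate into large hyperbolic distances.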

GPT-2 (WikiText-103): Euclidean Embeddings vs Hyperbolic Centroids

Left: GPT-2 Euclidean embeddings (PCA). The transformer naturally organizes tokens with a strong frequency-norm correlation (ρ = +0.924). Right: HyperbolicMLR centroids on the Poincaré ball. Frequent tokens cluster near the center, rare tokens spread outward (ρ = +0.70).
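The ρ values here are rank correlations between a token's frequency rank and its embedding norm. A minimal Spearman sketch on synthetic data — the Zipf-like setup and noise scale are illustrative assumptions, not the actual pipeline:

```python
import numpy as np

def spearman_rho(a, b):
    """Spearman rank correlation (simple version; assumes no ties)."""
    ra = np.argsort(np.argsort(a)).astype(float)
    rb = np.argsort(np.argsort(b)).astype(float)
    ra -= ra.mean()
    rb -= rb.mean()
    return float((ra @ rb) / np.sqrt((ra @ ra) * (rb @ rb)))

rng = np.random.default_rng(0)
rank = np.arange(1, 1001)                            # 1 = most frequent token
norms = np.log(rank) + 0.1 * rng.normal(size=1000)   # norms grow with rarity
print(spearman_rho(rank, norms))  # strongly positive, as in the figures
```

A positive ρ means rarer tokens carry larger norms, i.e. the radial direction encodes a frequency hierarchy.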

Embedding Spaces (PCA Projection)

Both embedding sets projected to 2D with PCA, normalized to the same scale. The Euclidean embeddings (left) spread freely. The hyperbolic embeddings (right) collapse into a ring.
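A sketch of the projection behind these panels — plain PCA via SVD, then rescaling so each point set has max norm 1 (the shared-scale normalization is an assumption; the exact preprocessing may differ):

```python
import numpy as np

def pca_2d(X):
    """Project rows of X onto their top-2 principal components."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:2].T

def shared_scale(P):
    """Rescale so the farthest point has norm 1 (common scale across panels)."""
    return P / np.linalg.norm(P, axis=1).max()

rng = np.random.default_rng(0)
emb = rng.normal(size=(500, 64))   # stand-in for a 64-d embedding table
proj = shared_scale(pca_2d(emb))
print(proj.shape)  # (500, 2)
```

With both sets mapped through the same recipe, a boundary-saturated table shows up as a ring while an unconstrained one fills the plane.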

Boundary Saturation in Real Time

The Poincaré embeddings flatline at the ball boundary (r = 1.0) from the very first checkpoint, while Euclidean norms grow freely past 3.5. The output-layer models (dashed curves) all use Euclidean input embeddings.

Validation Perplexity During Training

The embedding-layer hyperbolic model never recovers from its poor start, while the output-layer hyperbolic model converges faster than the Euclidean baseline.
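For reference, validation perplexity is the exponential of the mean per-token cross-entropy (in nats), which is how the PPL figures above relate to the loss curves; e.g. a mean loss of about 4.766 nats corresponds to the out-layer hyperbolic figure of 117.4:

```python
import math

def perplexity(total_nll, n_tokens):
    """exp(mean negative log-likelihood per token), with NLL in nats."""
    return math.exp(total_nll / n_tokens)

# A mean token loss of 4.766 nats over a 1000-token validation set:
print(round(perplexity(4.766 * 1000, 1000), 1))  # 117.4
```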