Hover over tokens. Zoom into regions. Compare boundary saturation against radial hierarchy in real time.
Overview
Left: embedding-layer Poincaré embeddings are crammed into a thin shell at the boundary (mean ||x|| = 0.92). Right: output-layer HyperbolicMLR centroids spread throughout the ball, with high-frequency tokens near the origin (mean ||p|| = 0.33).
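To reproduce the norm statistics, here is a minimal PyTorch sketch, assuming the two embedding matrices are available as tensors; the access paths in the comments are hypothetical placeholders, not the actual model attributes:

```python
import torch

def mean_norm(vectors: torch.Tensor) -> float:
    """Mean Euclidean norm over rows; in the Poincaré ball this is the mean radius."""
    return vectors.norm(dim=-1).mean().item()

# Hypothetical access paths -- adapt to your model:
# poincare_emb  = model.transformer.wte.weight   # embedding-layer Poincaré vectors
# mlr_centroids = model.lm_head.centroids        # output-layer HyperbolicMLR centroids
# mean_norm(poincare_emb)   # ≈ 0.92 per the figure
# mean_norm(mlr_centroids)  # ≈ 0.33
```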
Hover over any point to see the token, its frequency rank, and its norm. Zoom with the scroll wheel.
Left: GPT-2 Euclidean embeddings (PCA). The transformer naturally organizes tokens with a strong frequency-norm correlation (ρ = +0.924). Right: HyperbolicMLR centroids on the Poincaré ball. Frequent tokens cluster near the center, rare tokens spread outward (ρ = +0.70).
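The correlation itself is a one-liner, assuming ρ is Spearman's rank correlation (the plot label does not specify) and using the row index as a proxy for frequency rank; GPT-2's BPE vocabulary is only roughly frequency-ordered, so substitute real corpus counts where available:

```python
import numpy as np
from scipy.stats import spearmanr

def freq_norm_rho(embeddings: np.ndarray) -> float:
    """Spearman rho between frequency rank and embedding norm."""
    norms = np.linalg.norm(embeddings, axis=1)
    ranks = np.arange(len(norms))  # proxy: row 0 = most frequent token (assumption)
    rho, _ = spearmanr(ranks, norms)
    return rho
```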
Both embedding sets projected to 2D with PCA, normalized to the same scale. The Euclidean embeddings (left) spread freely. The hyperbolic embeddings (right) collapse into a ring.
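A sketch of that projection, assuming NumPy arrays and scikit-learn; the shared rescaling to a common radius is what makes the hyperbolic ring visible:

```python
import numpy as np
from sklearn.decomposition import PCA

def project_to_disk(embeddings: np.ndarray) -> np.ndarray:
    """PCA to 2D, then rescale so the farthest point sits at radius 1.

    Applying the same normalization to both embedding sets puts the
    panels on one scale, exposing the boundary ring on the right.
    """
    xy = PCA(n_components=2).fit_transform(embeddings)
    return xy / np.linalg.norm(xy, axis=1).max()
```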
The Poincaré embeddings flatline at the ball boundary (r ≈ 1.0) from the very first checkpoint, while Euclidean norms grow freely past 3.5. The output-layer models (dashed) all use Euclidean embeddings.
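The curve can be recomputed from saved checkpoints; a sketch assuming plain PyTorch state dicts, with a hypothetical key for the embedding weight:

```python
import torch

def norm_curve(checkpoint_paths, key="transformer.wte.weight"):
    """Mean embedding norm at each checkpoint (the y-axis of the plot).

    `key` is an assumed state_dict entry; adjust it to wherever your
    checkpoints actually store the embedding matrix.
    """
    curve = []
    for path in checkpoint_paths:
        state = torch.load(path, map_location="cpu")
        curve.append(state[key].norm(dim=-1).mean().item())
    return curve
```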
The embedding-layer hyperbolic model never recovers from its poor start, while the output-layer hyperbolic model converges faster than its Euclidean counterpart.