Hover over tokens. Zoom into regions. Compare boundary saturation against radial hierarchy in real time.
Overview
Left: embedding-layer Poincaré embeddings are crammed into a thin shell at the boundary (mean ||x|| = 0.92). Right: output-layer HyperbolicMLR centroids spread throughout the ball, with high-frequency tokens near the origin (mean ||p|| = 0.33).
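To reproduce the norm statistics, here is a minimal PyTorch sketch, assuming the two embedding matrices are available as tensors; the access paths in the comments are hypothetical placeholders, not the actual model attributes:

```python
import torch

def mean_norm(vectors: torch.Tensor) -> float:
    """Mean Euclidean norm over rows; in the Poincaré ball this is the mean radius."""
    return vectors.norm(dim=-1).mean().item()

# Hypothetical access paths -- adapt to your model:
# poincare_emb  = model.transformer.wte.weight   # embedding-layer Poincaré vectors
# mlr_centroids = model.lm_head.centroids        # output-layer HyperbolicMLR centroids
# mean_norm(poincare_emb)   # ≈ 0.92 per the figure
# mean_norm(mlr_centroids)  # ≈ 0.33
```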
Hover over any point to see the token, its frequency rank, and its norm. Zoom with the scroll wheel.
Left: GPT-2 Euclidean embeddings (PCA). The transformer naturally organizes tokens with a strong frequency-norm correlation (ρ = +0.924). Right: HyperbolicMLR centroids on the Poincaré ball. Frequent tokens cluster near the center, rare tokens spread outward (ρ = +0.70).
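The correlation itself is a one-liner, assuming ρ is Spearman's rank correlation (the plot label does not specify) and using the row index as a proxy for frequency rank; GPT-2's BPE vocabulary is only roughly frequency-ordered, so substitute real corpus counts where available:

```python
import numpy as np
from scipy.stats import spearmanr

def freq_norm_rho(embeddings: np.ndarray) -> float:
    """Spearman rho between frequency rank and embedding norm."""
    norms = np.linalg.norm(embeddings, axis=1)
    ranks = np.arange(len(norms))  # proxy: row 0 = most frequent token (assumption)
    rho, _ = spearmanr(ranks, norms)
    return rho
```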
Both embedding sets projected to 2D with PCA, normalized to the same scale. The Euclidean embeddings (left) spread freely. The hyperbolic embeddings (right) collapse into a ring.
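A sketch of that projection, assuming NumPy arrays and scikit-learn; the shared rescaling to a common radius is what makes the hyperbolic ring visible:

```python
import numpy as np
from sklearn.decomposition import PCA

def project_to_disk(embeddings: np.ndarray) -> np.ndarray:
    """PCA to 2D, then rescale so the farthest point sits at radius 1.

    Applying the same normalization to both embedding sets puts the
    panels on one scale, exposing the boundary ring on the right.
    """
    xy = PCA(n_components=2).fit_transform(embeddings)
    return xy / np.linalg.norm(xy, axis=1).max()
```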
The Poincaré embeddings flatline at the ball boundary (r ≈ 1.0) from the very first checkpoint, while Euclidean norms grow freely past 3.5. The output-layer models (dashed) all use Euclidean embeddings.
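The curve can be recomputed from saved checkpoints; a sketch assuming plain PyTorch state dicts, with a hypothetical key for the embedding weight:

```python
import torch

def norm_curve(checkpoint_paths, key="transformer.wte.weight"):
    """Mean embedding norm at each checkpoint (the y-axis of the plot).

    `key` is an assumed state_dict entry; adjust it to wherever your
    checkpoints actually store the embedding matrix.
    """
    curve = []
    for path in checkpoint_paths:
        state = torch.load(path, map_location="cpu")
        curve.append(state[key].norm(dim=-1).mean().item())
    return curve
```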
The embedding-layer hyperbolic model never recovers from its poor start, while the output-layer hyperbolic model converges faster than its Euclidean counterpart.