CLIP - Luca — AI, Coffee & Structural Thinking

Text Guides Image

Feb 11, 2026

—

by

Noise has no direction. Without text, it stays noise. A sentence like “A cat flying through space” guides the generation. The image asks: what should I become? Text answers through cross-attention. How Stable Diffusion uses Query, Key, and Value to turn prompts into pixels.

Contrast Creates Meaning

Feb 1, 2026

—

by

Luca

in AI Works

Hand-made labels aren’t necessary. ImageNet needed more than 25,000 workers to label 14 million images. But the internet already has the answers: 400 million image-text pairs. CLIP learned from them without hand-annotated labels and classifies categories it was never trained on. How contrastive learning aligned images and text into one space.

Into a Shared Space

Jan 27, 2026

—

by

Luca

in AI Works

2012. CNNs conquered images. A few years later, Transformers conquered text. But each lived in a separate world, with vectors that couldn’t be compared. What if a cat photo and the word “cat” existed at the same location? A shared embedding space makes this possible. How CLIP and ImageBind unified different senses into one language.

Tag: CLIP
