Tag: cross-modal

  • Into a Shared Space

    2012: CNNs conquered images. A few years later, Transformers conquered text. But each lived in a separate world, producing vectors that couldn’t be compared with one another. What if a cat photo and the word “cat” sat at the same location? A shared embedding space makes this possible. Here is how CLIP and ImageBind unified different senses into one language.