tanelpoder 6 hours ago
Great article, including the interactive parts, which were just simple enough to illustrate the point at a high level.
I never got that far, but I once built a little page that just visualized embedding vectors generated from a few hundred cat, dog and plane photos as heatmaps. I used this for demonstrating to database & storage folks what embedding vectors physically are, at a low level. The point of the heatmap (of different vectors) was to show that there are visually observable "vertical bands" standing out when plotting many embeddings of the same types of objects (like different cats with different backgrounds) in a single heatmap.
I then also took a single cat photo, rotated it one degree at a time through a full 360 degrees, and created a heatmap of the resulting vectors to illustrate what the embedding models really detect (you have to uncheck the "Normalized" checkbox on the "same cat rotated by 360 degrees" page to see the vertical bands show up).
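To make the "vertical bands" idea concrete: if each row of the heatmap is one embedding, a band is a column (an embedding dimension) that is consistently large in every row. The sketch below uses made-up synthetic data (16 dimensions, Gaussian noise, two hand-picked "cat" dimensions, and an arbitrary threshold); none of these values come from the web-app, and `band_columns` is a hypothetical helper for illustration only.

```python
import random

def column_means(vectors):
    """Average each embedding dimension (one heatmap column) across all rows."""
    n = len(vectors)
    dim = len(vectors[0])
    return [sum(v[j] for v in vectors) / n for j in range(dim)]

def band_columns(vectors, threshold):
    """Return indices of dimensions whose mean value exceeds threshold.

    In a heatmap with one row per embedding, such dimensions appear as
    vertical bands because they are consistently high in every row.
    """
    means = column_means(vectors)
    return [j for j, m in enumerate(means) if m > threshold]

# Hypothetical toy data: 50 "cat" embeddings of dimension 16 with small random
# noise, plus two dimensions (3 and 11) that are high for every cat.
random.seed(0)
cats = []
for _ in range(50):
    v = [random.gauss(0.0, 0.1) for _ in range(16)]
    v[3] += 1.0
    v[11] += 1.0
    cats.append(v)

print(band_columns(cats, threshold=0.5))  # -> [3, 11]
```

With real embeddings the bands are fuzzier than this, since many dimensions carry partial signal; averaging per column is just one simple way to surface them.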
Beautiful illustrations.
I find that 'playing' is just the free and motivated version of 'exploration'.
One thought on your nicely illustrated "key observation [is] that neural networks tend to place features along directions": my guess is that the neural net was TOLD to behave that way by choosing, e.g., a cosine loss?
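Whether any particular model was trained with a cosine loss is the commenter's guess, not something the article states. What the sketch below shows is why such an objective would favor directions: cosine similarity ignores vector length entirely, so scaling an embedding changes nothing and only its direction carries information. The `1 - cosine` form is one common variant of cosine loss, assumed here for illustration.

```python
import math

def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their L2 norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def cosine_loss(a, b):
    """One common form of cosine loss: 1 - cosine similarity (0 when aligned)."""
    return 1.0 - cosine_similarity(a, b)

a = [1.0, 2.0, 3.0]
b = [2.0, 4.0, 6.0]    # same direction as a, twice the length
c = [-3.0, 0.0, 1.0]   # orthogonal to a: 1*-3 + 2*0 + 3*1 = 0

print(cosine_loss(a, b))   # ~0.0 (up to floating-point rounding): magnitude is ignored
print(cosine_loss(a, c))   # 1.0: orthogonal directions
```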
jacomyma 7 hours ago
Fantastic article.
For readers interested in this, let me point to the somewhat similar "Activation Atlas" interactive paper, published in the sadly now-defunct scientific journal Distill.
Nice article! The generated images make me so nostalgic for the early days of AI image generation. DeepDream and others had such uncanny, interesting generations.
vintermann 8 hours ago
Yeah, generative AI used to be wild, alien creativity and not something that made art kids furious.
I wonder if models can be trained for "high-temperature" purposes. I'd rather have a model that can surprise me than one that predictably produces generic, mediocre results. I mean, you can run them at high temperature of course, but it doesn't seem like they're optimized for that.
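For context on the temperature knob: sampling temperature divides the model's output logits before the softmax, so values above 1 flatten the next-token distribution and values below 1 sharpen it. It is applied only at sampling time and does not change the trained weights. A minimal sketch with hypothetical logits, using entropy as a rough measure of how "surprising" the distribution is:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Divide logits by temperature, then apply a numerically stable softmax."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in nats; higher means a flatter, less predictable distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

logits = [4.0, 2.0, 1.0, 0.5]   # hypothetical next-token scores
for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs], round(entropy(probs), 3))
```

Note that the ranking of tokens never changes: the most likely token stays the most likely at every temperature, and only the gap between it and the alternatives shrinks or grows.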
RealityVoid 12 hours ago
For some reason, the uncanniness of the feature pictures is deeply unsettling to me. It just stirs intense unease. A bit amusing, to be honest.
piyh 4 hours ago
I was asking nano banana to modify pictures of me to show how I'd look if I dressed or wore my hair differently.
It ended up inserting smooth, pale alien hands touching my hair and holding weird wormy things.
Every once in a while you get a peek of the Eldritch horror lurking below our helpful assistants.
joaquincabezas 9 hours ago
This article is very well structured and provides just the right amount of details for non-practitioners to enjoy it.
Mechanistic interpretability is a fun topic to "play with" (good title there). I recommend watching videos featuring Neel Nanda or Chris Olah.
jcattle 15 hours ago
Very nice visualizations, thanks for that!
One thing I still struggle with in my head is how these vision embeddings can then be used to give LLMs eyes.
Because you somehow need a giant training set which describes images in natural language, no? Is that actually how it works, or is there some smart trick so you don't need to pay labellers a bunch of money to look at pictures and describe them.
dilyevsky 14 hours ago
> Because you somehow need a giant training set which describes images in natural language, no?
That's definitely one way - they train a text encoder together with an image encoder on a labelled set of images. WL & 3b1b made a nice video on it: https://www.youtube.com/watch?v=iv-5mZ_9CPY
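The approach described here is contrastive image-text pretraining, of which CLIP is the best-known example (CLIP was trained on image/caption pairs gathered from the internet rather than on purpose-made labels). Matching image and caption embeddings are pulled together in a shared space, while every mismatched pair in the same batch is pushed apart. A minimal pure-Python sketch of the symmetric loss, assuming already L2-normalized embeddings and a fixed temperature of 0.07 (CLIP learns this scale during training; the constant here is only for illustration, and the function names are hypothetical):

```python
import math

def log_softmax(row):
    """Numerically stable log-softmax of a list of scores."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of matching (image, caption) pairs.

    Pair i is the positive for row i; every other caption (or image) in the
    batch acts as a negative. Embeddings are assumed to be L2-normalized.
    """
    n = len(image_embs)
    logits = [[dot(image_embs[i], text_embs[j]) / temperature for j in range(n)]
              for i in range(n)]
    # Image -> text: each row should put its probability mass on the diagonal.
    loss_i2t = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Text -> image: the same objective applied to the columns.
    cols = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_t2i = -sum(log_softmax(cols[j])[j] for j in range(n)) / n
    return (loss_i2t + loss_t2i) / 2

images = [[1.0, 0.0], [0.0, 1.0]]
captions_aligned = [[1.0, 0.0], [0.0, 1.0]]
captions_swapped = [[0.0, 1.0], [1.0, 0.0]]
print(contrastive_loss(images, captions_aligned))  # close to 0
print(contrastive_loss(images, captions_swapped))  # about 14.3
```

Because the negatives come for free from the other pairs in the batch, no one has to label what an image is *not*; the training signal only requires knowing which caption belongs to which image.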
jcattle 13 hours ago
Thanks I'll check out that video
agentbraker 9 hours ago
Awesome project! Preserving and sharing knowledge like this is incredibly valuable. Thanks for making these resources accessible to everyone.
The cat/dog/plane embedding heatmap web-app from tanelpoder's comment is here:
https://tanelpoder.com/catvector/
The "Activation Atlas" paper on Distill: https://distill.pub/2019/activation-atlas/