Vision–language model

A vision–language model (VLM) is a type of artificial intelligence system that can jointly interpret and generate information from both images and text, extending the capabilities of large language models (LLMs), which are limited to text. It is an example of multimodal learning.

Source: Wikipedia — Vision–language model (CC BY-SA 4.0)