ViLMA: A Zero-Shot Benchmark for Linguistic and Temporal ... - arXiv
: The show intentionally deconstructs the "meddling kids" archetype, making the characters more flawed and cynical.
: ViLMA is a task-agnostic benchmark designed to evaluate how well Video-Language Models (VidLMs) understand moving images.
: Analyze why current models struggle with temporal grounding compared to human-level understanding.