
Vid_20220422_110945_466.mp4

The file is part of a large-scale collection (40,000 videos) designed to cover a wide range of real-world scenarios, from daily activities to cinematic clips.

It serves as a test case for how well a Multimodal Large Language Model (MLLM) can describe complex temporal actions.

The project and its associated code are maintained in the ShareGPT4Video GitHub repository, which provides tools for reproducing the paper's results and accessing the full dataset.