Correct for different camera viewpoints without manual calibration.

A natural-language query such as "Find the new building" or "Highlight the moved chair."

The scene with a change (e.g., a car moved, a building added).

A visual "heatmap" or mask overlaying the video, showing that the AI successfully located the change requested in the text.

Technical Significance