Clotho is an audio dataset used for intermodal translation (audio-to-text) tasks. It is widely used in the DCASE (Detection and Classification of Acoustic Scenes and Events) challenges.

📂 Key Data Components

Reference the original paper: Drossos, K., Lipping, S., & Virtanen, T. (2020). "Clotho: an Audio Captioning Dataset." Proc. IEEE ICASSP, pp. 736-740.

The full development set is approximately 6.5 GB.

Visit the DCASE Automated Audio Captioning task page for the most recent version (v2.1).