AVCaps: An Audio-Visual Dataset With Modality-Specific Captions | Publication