
Language Modeling / LLM
- Babu, A., Wang, C., Tjandra, A., Lakhotia, K., Xu, Q., Goyal, N., Singh, K., von Platen, P., Saraf, Y., Pino, J., Baevski, A., Conneau, A., & Auli, M. (2022). XLS-R: Self-supervised cross-lingual speech representation learning at scale. In Proceedings of Interspeech 2022 (pp. 2278–2282). ISCA. https://doi.org/10.21437/Interspeech.2022-143
- Baevski, A., Zhou, H., Mohamed, A., & Auli, M. (2020). wav2vec 2.0: A framework for self-supervised learning of speech representations. In Advances in Neural Information Processing Systems 33 (NeurIPS 2020) (pp. 12449–12460). Curran Associates.
- Conneau, A., Khandelwal, K., Goyal, N., Chaudhary, V., Wenzek, G., Guzmán, F., Grave, E., Ott, M., Zettlemoyer, L., & Stoyanov, V. (2020). Unsupervised cross-lingual representation learning at scale. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL 2020) (pp. 8440–8451). Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.acl-main.747
- Gebru, T., Morgenstern, J., Vecchione, B., Vaughan, J. W., Wallach, H., Daumé III, H., & Crawford, K. (2021). Datasheets for datasets. Communications of the ACM, 64(12), 86–92. https://doi.org/10.1145/3458723
- Hu, E. J., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., Wang, L., & Chen, W. (2022). LoRA: Low-rank adaptation of large language models. In Proceedings of the International Conference on Learning Representations (ICLR 2022).
- Jiang, A. Q., Sablayrolles, A., Mensch, A., Bamford, C., Chaplot, D. S., de las Casas, D., Bressand, F., Lengyel, G., Lample, G., Saulnier, L., Lavaud, L. R., Lachaux, M.-A., Stock, P., Le Scao, T., Lavril, T., Wang, T., Lacroix, T., & El Sayed, W. (2023). Mistral 7B [Preprint]. arXiv:2310.06825.
- Kreutzer, J., Caswell, I., Wang, L., Wahab, A., van Esch, D., Ulzii-Orshikh, N., Tapo, A., Subramani, N., Sokolov, A., Sikasote, C., Setyawan, M., Sarin, S., Samb, S., Sagot, B., Rivera, C., Rios, A., Papadimitriou, I., Osei, S., Ortiz Suarez, P., Orife, I., … Adeyemi, M. (2022). Quality at a glance: An audit of web-crawled multilingual datasets. Transactions of the Association for Computational Linguistics, 10, 50–72. https://doi.org/10.1162/tacl_a_00447
- Mitchell, M., Wu, S., Zaldivar, A., Barnes, P., Vasserman, L., Hutchinson, B., Spitzer, E., Raji, I. D., & Gebru, T. (2019). Model cards for model reporting. In Proceedings of the 2019 Conference on Fairness, Accountability, and Transparency (FAccT 2019) (pp. 220–229). ACM. https://doi.org/10.1145/3287560.3287596
- Rafailov, R., Sharma, A., Mitchell, E., Manning, C. D., Ermon, S., & Finn, C. (2023). Direct preference optimization: Your language model is secretly a reward model. In Advances in Neural Information Processing Systems 36 (NeurIPS 2023).
ASR
- Bérard, A., Besacier, L., Pellegrini, T., & Schwab, D. (2020). Cross-lingual transfer for ASR in low-resource languages [Preprint]. arXiv:2005.04290.
- IARPA Babel Program. (2016). Low resource speech recognition (research program overview). https://www.iarpa.gov/research-programs/babel
- Kong, J., Kim, J., & Bae, J. (2020). HiFi-GAN: Generative adversarial networks for efficient and high-fidelity speech synthesis. In Advances in Neural Information Processing Systems 33 (NeurIPS 2020).
- Mariani, J., Negri, M., Turchi, M., & Rotolo, M. (2022). Italian dialect ASR using wav2vec 2.0 [Preprint]. arXiv:2205.02732.
- Panayotov, V., Chen, G., Povey, D., & Khudanpur, S. (2015). LibriSpeech: An ASR corpus based on public domain audio books. In Proceedings of the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2015) (pp. 5206–5210). IEEE. https://doi.org/10.1109/ICASSP.2015.7178964
- Park, D. S., Chan, W., Zhang, Y., Chiu, C.-C., Zoph, B., Cubuk, E. D., & Le, Q. V. (2019). SpecAugment: A simple data augmentation method for automatic speech recognition. In Proceedings of Interspeech 2019 (pp. 2613–2617). ISCA. https://doi.org/10.21437/Interspeech.2019-2680
- Pratap, V., Tjandra, A., Shi, B., Tomasello, P., Babu, A., Kundu, S., Elkahky, A., Ni, Z., Vyas, A., Fazel-Zarandi, M., Baevski, A., Adi, Y., Zhang, X., Hsu, W.-N., Conneau, A., & Auli, M. (2024). Scaling speech technology to 1,000+ languages. Journal of Machine Learning Research, 25(97), 1–52.
- Radford, A., Kim, J. W., Xu, T., Brockman, G., McLeavey, C., & Sutskever, I. (2023). Robust speech recognition via large-scale weak supervision. In Proceedings of the 40th International Conference on Machine Learning (ICML 2023) (Vol. 202, pp. 28492–28518). PMLR.
- Slam, W., Li, Y., & Urouvas, N. (2023). Frontier research on low-resource speech recognition technology. Sensors, 23(22), 9096. https://doi.org/10.3390/s23229096
- Wang, Y., & Cao, Y. (2022). Phonetic lexicon design for under-resourced languages: A case study on Tu. In Proceedings of the 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2022) (pp. 8562–8566). IEEE. https://doi.org/10.1109/ICASSP43922.2022.9746580
- Yüksel, H., Krüger, A., & Kirchhoff, K. (2023). NoRefER: A referenceless quality metric for automatic speech recognition [Preprint]. arXiv:2304.00612.
TTS
- Casanova, E., Weber, J., Shulby, C. D., Candido, A. V., Gölge, E., & Ponti, M. A. (2022). YourTTS: Towards zero-shot multi-speaker TTS and voice conversion for everyone. In Proceedings of the 39th International Conference on Machine Learning (ICML 2022) (Vol. 162, pp. 2709–2720). PMLR. https://proceedings.mlr.press/v162/casanova22a.html
- Łańcucki, A. (2021). FastPitch: Parallel text-to-speech with pitch prediction. In Proceedings of the 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2021) (pp. 6588–6592). IEEE. https://doi.org/10.1109/ICASSP39728.2021.9413381
Evaluation
- Papineni, K., Roukos, S., Ward, T., & Zhu, W.-J. (2002). BLEU: A method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL 2002) (pp. 311–318). Association for Computational Linguistics. https://doi.org/10.3115/1073083.1073135
- Popović, M. (2015). chrF: character n-gram F-score for automatic MT evaluation. In Proceedings of the 10th Workshop on Statistical Machine Translation (WMT 2015) (pp. 392–395). Association for Computational Linguistics. https://doi.org/10.18653/v1/W15-3049
- Rei, R., Farinha, A. C., Lavie, A., & Specia, L. (2020). COMET: A neural framework for MT evaluation. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP 2020) (pp. 2685–2702). Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.emnlp-main.213
- Sadat, F., Kazemi, F., & Farzindar, A. (2014). Automatic identification of Arabic dialects in social media. In Proceedings of the First Workshop on Arabic Natural Language Processing (ANLP 2014) (pp. 43–53). Association for Computational Linguistics.
- Scherrer, Y., & Ljubešić, N. (2020). Discriminating between similar languages in Swiss German texts. In Proceedings of the 7th Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2020) (pp. 155–163). ACL. https://doi.org/10.18653/v1/2020.vardial-1.17
- Yoshimura, T., Stølsmark, H., Saino, K., Wang, X., Kubin, G., & Yamagishi, J. (2023). Rethinking mean opinion scores in speech quality assessment. In Proceedings of Interspeech 2023 (pp. 2068–2072). ISCA.
HITL / Active Learning
- Nguyen, A., Wallace, E., Iyyer, M., & Neubig, G. (2022). Active learning for low-resource neural machine translation. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (ACL 2022) (pp. 6413–6428). Association for Computational Linguistics. https://doi.org/10.18653/v1/2022.acl-long.444
