I'm a Research Scholar in Natural Language Processing at the International Institute of Information Technology, Hyderabad (IIIT-H), currently working on text classification, generation, and evaluation metrics. My research primarily focuses on developing resources and methodologies for low-resource Indian languages, with a particular emphasis on Telugu language processing.
My work spans several dataset and evaluation projects. I led the development of TeClass, the first human-annotated Telugu news headline classification dataset, which has helped improve headline generation models. I also worked on Mukhyansh, a multilingual dataset for headline generation across eight Indian languages, and contributed to SemRel2024, a collection of semantic textual relatedness datasets covering 14 languages. In addition, I have developed unsupervised approaches to evaluating text fluency, particularly for low-resource Indian languages, with the goal of building robust evaluation metrics that do not rely on reference texts.
I'm passionate about closing the resource gap that Indian languages face in NLP. Through my research, I aim to advance natural language processing for underrepresented languages and to make these technologies more accessible and effective.
N. Ousidhoum, Shamsuddeen Hassan Muhammad, Mohamed Abdalla, Idris Abdulmumin, I. Ahmad, Sanchit Ahuja, Alham Fikri Aji, Vladimir Araujo, A. Ayele, Pavan Baswani, Meriem Beloucif, Christian Biemann, Sofia Bourhim, Christine de Kock, Genet Shanko Dekebo, Oumaima Hourrane, Gopichand Kanumolu, Lokesh Madasu, Samuel Rutunda, Manish Shrivastava, T. Solorio, Nirmal Surange, Hailegnaw Getaneh Tilaye, Krishnapriya Vishnubhotla, Genta Indra Winata, Seid Muhie Yimam, Saif Mohammad
Annual Meeting of the Association for Computational Linguistics 2024