<aside> ✉️ E-Mail
</aside>
<aside> 🐦 Twitter
</aside>
<aside> 🐙 Github
</aside>
<aside> 🎓 Google Scholar
</aside>
<aside> 💫 I am a PhD student at the Language Technologies Institute at Carnegie Mellon University, where I work on making post-training more effective for long-horizon tasks.
I find it compelling that contemporary language models are effective value encoders: given a set of responses, whether sampled from the model itself or obtained elsewhere, models can a) judge responses in ways that correlate with some notion of preference and b) score responses against defined checklists or descriptions. Reward shaping has long been key to making RL pipelines work, and today we can (if perhaps naively) simply tell the model what we want to reward. This provides richer learning signal than RLVR alone, enabling more effective training on long-horizon tasks such as software engineering, tasks where reward is delayed, and even tasks where the reward itself is not well defined.
Before moving to Pittsburgh, I was a contributor at EleutherAI, where I was part of the team maintaining the LM Evaluation Harness, a popular library for evaluating language models. Aside from building better models, I believe equitable and accessible language technologies hinge on well-governed open-source artifacts, so I strive to advocate for open-source initiatives such as EleutherAI and AI2.
Some time ago, I co-founded an OCR and information-extraction startup that was acquired by Datasaur, Inc. in 2022.
</aside>
The Common Pile v0.1: An 8TB Dataset of Public Domain and Openly Licensed Text
Nikhil Kandpal*, Brian Lester*, Colin Raffel*, and 24 others including Lintang Sutawika
The Thirty-Ninth Annual Conference on Neural Information Processing Systems (NeurIPS), 2025.
Not-Just-Scaling Laws: Towards a Better Understanding of the Downstream Impact of Language Model Design Decisions
Emmy Liu, Amanda Bertsch, Lintang Sutawika, and 9 others
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2025.
Pangea: A Fully Open Multilingual Multimodal LLM for 39 Languages
Xiang Yue*, Yueqi Song*, and 8 others including Lintang Sutawika
The Thirteenth International Conference on Learning Representations (ICLR), 2025.
Lessons from the Trenches on Reproducible Evaluation of Language Models
Stella Biderman*, Hailey Schoelkopf*, Lintang Sutawika*, and 27 others
arXiv preprint arXiv:2405.14782, 2024.
Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling
Stella Biderman*, Hailey Schoelkopf*, and 11 others including Lintang Sutawika
Fortieth International Conference on Machine Learning (ICML), 2023.
Emergent and Predictable Memorization in Large Language Models
Stella Biderman, USVSN Sai Prashanth, Lintang Sutawika, Hailey Schoelkopf, Quentin Anthony, Shivanshu Purohit, Edward Raff
arXiv preprint arXiv:2304.11158, 2023.
BLOOM+1: Adding Language Support to BLOOM for Zero-Shot Prompting
Zheng-Xin Yong, Hailey Schoelkopf, and 12 others including Lintang Sutawika
arXiv preprint arXiv:2212.09535, 2022.
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
Teven Le Scao, Angela Fan, Christopher Akiki, Ellie Pavlick, and 386 others including Lintang Sutawika
arXiv preprint arXiv:2211.05100, 2022.
Crosslingual Generalization through Multitask Finetuning
Niklas Muennighoff, Thomas Wang, Lintang Sutawika, Adam Roberts, and 15 others
arXiv preprint arXiv:2211.01786, 2022.
What Language Model to Train if You Have One Million GPU Hours?
Teven Le Scao*, Thomas Wang*, Daniel Hesslow*, Lucile Saulnier*, Stas Bekman*, and 13 others including Lintang Sutawika
Findings of the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022.
Multitask Prompted Training Enables Zero-Shot Task Generalization
Victor Sanh*, Albert Webson*, Colin Raffel*, Stephen H. Bach*, and 37 others including Lintang Sutawika
10th International Conference on Learning Representations (ICLR), 2022.