Wikipedia Monthly
Fresh Wikipedia dumps for 340+ languages
Wikipedia Monthly provides automated, monthly Wikipedia dumps for 340+ languages in researcher-friendly formats. It powers NLP research for low-resource languages by making Wikipedia data accessible and up-to-date.
Related posts
Why I stopped trusting the official Wikipedia dataset, and what I did about it
It all started with a DM from a friend, a member of and contributor to the Moroccan Wikipedia community: "Are you using the current version of Wikipedia? The official dataset is severely outdated. We added so many cool articles that are nowhere on Hugging Face." He was right. I was running a 2023 snapshot in 2025.
Introducing Wikipedia Monthly: Fresh, Clean Wikipedia Dumps for NLP & AI Research
Announcing Wikipedia Monthly, an always fresh dataset to support research for low-resource languages