quanteda R library

“quanteda” stands for the quantitative analysis of textual data, and is implemented in the R language as a “package” of the same name.

Entirely open-source, quanteda benefits from all of the scrutiny and transparency that demanding users have applied over six years of development. This R package has been downloaded and installed half a million times by users across the globe, working in academic research, government, and business. Because it is based on open-source software, unlike other commercial text mining solutions, Quanteda Guru contains no black boxes or proprietary algorithms, only scientifically tested and fully documented open-source functions. Every quanteda function is fully documented and usually references a scientific journal article or textbook. Furthermore, some of what it implements are innovations created by the developers of quanteda themselves and referenced in the scientific literature. What is implemented in quanteda and Quanteda Guru can be verified and, because of that, trusted.

quanteda is the key member of a family of related packages for text analysis, built on R, C++, and Python. These packages include tools for reading and converting documents, providing stopword lists and dictionaries, and providing access to more fundamental language models for analyzing parts of speech, recognizing entities, and analyzing linguistic structure. This suite of open-source software is maintained by the Quanteda Initiative, a community interest company founded in 2017 by Kenneth Benoit for the benefit of the open-source text analysis community. Part of the revenues from Quanteda Guru go to support this initiative.

Testimonials for quanteda:

We are a young company based in Zurich and specialize in policy evaluation. Wherever possible, we try to leverage the potential of data science methods in our projects. quanteda is our first choice when dealing with large amounts of text, because it is fast and stable, it offers many possibilities to quickly find patterns and it provides excellent interfaces for further analysis. Our customers are always impressed how much insight can be gained from large amounts of unstructured text data in short time. Without quanteda our job would be less easy and definitely less fun.
— gfzb

This package is amazing! I use it on a daily basis to perform a large number of text analysis tasks. The responsiveness of Ken and the rest of the team is simply amazing. They really pay attention to and help their users. Others at my federal agency now use it frequently for text analysis in R too (once others saw how clean my code was and how fast it ran it didn’t take much convincing). I personally use it to understand complaints submitted by hundreds of thousands of consumers. The whole agency benefits from being able to understand more text from our stakeholders faster. Kudos to the whole team.
— lmkirvan

Quanteda is THE swiss army knife for NLP in R. It combines everything one needs to perform state of the art text analysis: Speed, Clean API, Out-of-the box visualizations, Other NLP API wrappers. Quanteda has played a central role in a major effort to parse large text blocks in order to create word tags, find keywords and text collocations. The pipeline has helped solve a major bottleneck, how to create matches without structured data. This process is being [used] to assist both government and commercial customers to put their text data to work.
— abresle
