datatrove
PyPI · v0.9.0
HuggingFace library to process and filter large amounts of webdata
License: Apache-2.0 (permissive) · 10 versions · 72 deps
huggingface/datatrove
Health: 51 / 100 (safe to use)
datatrove@0.9.0 is safe to use (health: 51/100)
Health breakdown (0 – 100)
maintenance: 20/25
popularity: 0/20
security: 25/25
maturity: 6/15
community: 0/15
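The five category scores above are consistent with the headline figure of 51/100. A minimal sketch of that arithmetic, assuming the overall score is the unweighted sum of the categories (the page does not state the formula):

```python
# Category scores as (earned, maximum), copied from the breakdown above.
breakdown = {
    "maintenance": (20, 25),
    "popularity": (0, 20),
    "security": (25, 25),
    "maturity": (6, 15),
    "community": (0, 15),
}

# Assumption: the headline score is the plain sum of the category scores.
total = sum(earned for earned, _ in breakdown.values())
maximum = sum(cap for _, cap in breakdown.values())

print(f"{total}/{maximum}")  # prints 51/100
```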
Vulnerabilities: 0 (none known)
Dependencies (72)
dill, fsspec, huggingface-hub, humanize, loguru, multiprocess, numpy, tqdm, rich, faust-cchardet,
pyarrow, python-magic, warcio, datasets, orjson, zstandard, s3fs, fasttext-numpy2-wheel, nltk,
inscriptis, tldextract, trafilatura, tokenizers, ftfy, fasteners, regex, xxhash, pyahocorasick,
lighteval, spacy[ja], stanza, pyvi, pythainlp, jieba, indic-nlp-library, kiwipiepy, urduhack,
tensorflow, khmer-nltk, laonlp, botok, pyidaungsu-numpy2, datatrove[io], aiofiles, httpx, aiosqlite,
vllm, sglang, bitsandbytes, numpy, typer, pyyaml, pandas, transformers, ray[default], ruff,
datatrove[cli], datatrove[io], datatrove[processing], datatrove[multilingual]
API access
Get this data programmatically — free, no authentication.
curl https://depscope.dev/api/check/pypi/datatrove
Last updated · 2026-03-04T13:44:33.968520Z
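The curl request above can also be issued from Python using only the standard library. This is a sketch under stated assumptions: the helper names `check_url` and `fetch_check` are hypothetical, the path layout (`/api/check/<ecosystem>/<package>`) is inferred from the single URL shown, and the response is assumed to be JSON, which the page does not confirm.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl command shown on the page.
BASE_URL = "https://depscope.dev/api/check"


def check_url(ecosystem: str, package: str) -> str:
    """Build the check URL (hypothetical helper; path layout is inferred)."""
    return f"{BASE_URL}/{quote(ecosystem, safe='')}/{quote(package, safe='')}"


def fetch_check(ecosystem: str, package: str, timeout: float = 10.0):
    """GET the endpoint and decode the body, assuming it is JSON."""
    request = urllib.request.Request(
        check_url(ecosystem, package),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_check("pypi", "datatrove")` would request the same URL as the curl command. The structure of the returned data is not documented here, so inspect it before relying on any particular field.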