sentencepiece

cran · v0.2.5

Text Tokenization using Byte Pair Encoding and Unigram Modelling. An unsupervised text tokenizer that performs byte pair encoding and unigram modelling. Wraps the 'sentencepiece' library <https://github.com/google/sentencepiece>, which provides a language-independent tokenizer that splits text into words and smaller subword units. The techniques are explained in the

License: MPL-2.0 (weak copyleft) · 0 versions · 1 maintainers · 2 deps · 84 weekly dl
bnosac/sentencepiece
Health: 47 / 100 (safe to use)

sentencepiece@0.2.5 is safe to use (health: 47/100)

Health breakdown (0 – 100)
maintenance: 20/25
popularity: 0/20
security: 25/25
maturity: 0/15
community: 2/15
Vulnerabilities: 0 (none known)

Dependencies (2)
API access

Get this data programmatically — free, no authentication.

curl https://depscope.dev/api/check/cran/sentencepiece
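
The same request can be made from a script. The following is a minimal sketch using only the Python standard library. It makes two assumptions that the page does not confirm: that the endpoint returns JSON, and that the URL follows an `/api/check/{ecosystem}/{package}` pattern generalised from the single `cran/sentencepiece` example above. The function names `check_url` and `fetch_check` are hypothetical helpers, not part of any published client.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example above.
BASE_URL = "https://depscope.dev/api/check"


def check_url(ecosystem: str, package: str) -> str:
    """Build the check URL for a package.

    Assumption: the {ecosystem}/{package} layout is inferred from the one
    documented example (cran/sentencepiece), not from an API specification.
    """
    return f"{BASE_URL}/{quote(ecosystem, safe='')}/{quote(package, safe='')}"


def fetch_check(ecosystem: str, package: str, timeout: float = 10.0):
    """Fetch the health report for a package.

    Assumption: the response body is JSON. No authentication header is sent,
    matching the "no authentication" note above.
    """
    request = urllib.request.Request(
        check_url(ecosystem, package),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_check("cran", "sentencepiece")` requests the same URL as the curl command. The structure of the returned data is not documented on this page, so inspect it before relying on specific fields.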

First published · 2026-02-16 01:49:10

Last updated · 2026-02-09T13:40:02+00:00