A Byte Pair Encoding (BPE) tokenizer, which algorithmically follows along the GPT tokenizer(tiktoken), allows you to train your own tokenizer. The tokenizer is capable of handling special tokens and uses a customizable regex pattern for tokenization(includes the gpt4 regex pattern). supports `save` and `load` tokenizers in the `json` and `file` format. The `bpetokenizer` also supports [pretrained](bpetokenizer/pretrained/) tokenizers.
[email protected] low health (38/100) — consider alternatives
Get this data programmatically — free, no authentication.
curl https://depscope.dev/api/check/pypi/bpetokenizerLast updated · 2024-06-06T18:34:54.795691Z