AirLLM allows a single 4GB GPU card to run 70B large language models without quantization, distillation, or pruning; 8GB of VRAM is enough to run the 405B Llama 3.1.
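AirLLM manages this by loading model layers from disk one at a time during inference rather than keeping the whole model in GPU memory. A minimal usage sketch, following the pattern shown in the AirLLM README; the model repository name is only an example, and class and argument names may differ between airllm versions:

from airllm import AutoModel

MAX_LENGTH = 128

# Weights are fetched from Hugging Face and streamed layer by layer at inference time.
model = AutoModel.from_pretrained("garage-bAInd/Platypus2-70B-instruct")

input_text = ["What is the capital of the United States?"]
input_tokens = model.tokenizer(
    input_text,
    return_tensors="pt",
    return_attention_mask=False,
    truncation=True,
    max_length=MAX_LENGTH,
    padding=False,
)

# Generation runs on the GPU even though only one layer is resident at a time.
generation_output = model.generate(
    input_tokens["input_ids"].cuda(),
    max_new_tokens=20,
    use_cache=True,
    return_dict_in_generate=True,
)

print(model.tokenizer.decode(generation_output.sequences[0]))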
airllm is safe to use (health: 50/100)
Get this data programmatically — free, no authentication.
curl https://depscope.dev/api/check/pypi/airllm

Last updated · 2024-09-21T02:52:22.091498Z
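The same endpoint can be queried from a script. A minimal sketch using only the Python standard library, assuming the endpoint returns a JSON body (the response schema is not documented here, so the example just pretty-prints whatever comes back):

import json
import urllib.request

# Query the depscope.dev check endpoint for the PyPI package "airllm".
# No authentication is required, per the note above.
URL = "https://depscope.dev/api/check/pypi/airllm"

with urllib.request.urlopen(URL, timeout=30) as resp:
    data = json.load(resp)  # assumes a JSON response

# Schema isn't documented here; print the raw result for inspection.
print(json.dumps(data, indent=2))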