RepoCritics — Review. Share. Archive. Every open-source repo.

Wiki: huggingface/transformers

Source: https://github.com/huggingface/transformers

Last synced 2026-06-02 · 493 words

huggingface/transformers

The mainstream model-definition framework for state-of-the-art transformer models across text, vision, audio, and multimodal tasks, and the canonical entry point for "load and run model X".

What it is

A Python library that provides unified model architectures and tokenizers for nearly every widely used transformer model released since 2018 (BERT, GPT, T5, ViT, Whisper, Llama, Qwen, Gemma, DeepSeek, GLM, and hundreds more). The same API lets you instantiate, fine-tune, and run inference with any supported model. The library's from_pretrained() pattern, combined with the Hugging Face Hub, became the default model-distribution mechanism for open-weights ML. Apache 2.0 licensed.
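
As a rough illustration of the from_pretrained() pattern described above, here is a minimal sketch. It assumes transformers and PyTorch are installed and that the machine can reach the Hub; the helper names (load_causal_lm, complete) and the default "gpt2" checkpoint are illustrative choices, not something this page prescribes.

```python
def load_causal_lm(model_id: str):
    """Load a tokenizer and a causal language model by Hub id or local path."""
    # Imported lazily so this file can be imported even when transformers
    # is not installed; the download happens inside from_pretrained().
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    return tokenizer, model


def complete(prompt: str, model_id: str = "gpt2", max_new_tokens: int = 20) -> str:
    """Generate a short continuation of `prompt` (hypothetical helper)."""
    tokenizer, model = load_causal_lm(model_id)
    # return_tensors="pt" asks for PyTorch tensors, so torch must be installed.
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling complete("The Hugging Face Hub is") would download the checkpoint on first use and return the prompt plus a generated continuation. Swapping model_id for another causal LM should leave the rest of the code unchanged, which is the point of the Auto classes.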

Key features

  • Unified AutoModel / AutoTokenizer APIs that work across every supported architecture.
  • First-class PyTorch support. TensorFlow and JAX (Flax) backends cover many models in the 4.x releases; the 5.x releases are PyTorch-only.
  • Pipelines abstraction (pipeline("text-generation", model="...")) for one-line inference setup.
  • Quantization support (bitsandbytes, GPTQ, AWQ, FP8) for inference on consumer hardware.
  • Generate API with sampling controls, beam search, contrastive decoding, and speculative decoding.
  • Tokenizers backed by a fast Rust implementation (separate tokenizers package).
  • Tight integration with Hugging Face Hub for model + dataset + space discovery.
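
The pipeline and generation features listed above can be sketched together as follows. This is a hedged example, not a recommended configuration: generation_kwargs and generate_text are hypothetical helper names, and the specific values (temperature 0.7, top_p 0.9, 4 beams) are arbitrary. The keyword names themselves (max_new_tokens, do_sample, temperature, top_p, num_beams) are standard generate() arguments that the text-generation pipeline forwards.

```python
from typing import Any, Dict


def generation_kwargs(strategy: str, max_new_tokens: int = 64) -> Dict[str, Any]:
    """Map a decoding strategy name to generate() keyword arguments."""
    base: Dict[str, Any] = {"max_new_tokens": max_new_tokens}
    if strategy == "greedy":
        return {**base, "do_sample": False}
    if strategy == "sampling":
        # Illustrative values only; tune per model and task.
        return {**base, "do_sample": True, "temperature": 0.7, "top_p": 0.9}
    if strategy == "beam":
        return {**base, "do_sample": False, "num_beams": 4}
    raise ValueError(f"unknown strategy: {strategy!r}")


def generate_text(prompt: str, model_id: str, strategy: str = "sampling") -> str:
    """Run a text-generation pipeline with the chosen decoding strategy."""
    # Lazy import: requires transformers and a backend such as PyTorch.
    from transformers import pipeline

    generator = pipeline("text-generation", model=model_id)
    outputs = generator(prompt, **generation_kwargs(strategy))
    # The pipeline returns a list of dicts, one per generated sequence.
    return outputs[0]["generated_text"]
```

Keeping the decoding settings in a plain dictionary makes it easy to compare strategies on the same prompt without touching the model-loading code.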

Tech stack

  • Python primary.
  • PyTorch as the main backend; TensorFlow and JAX (Flax) are secondary in 4.x and dropped in 5.x.
  • Rust-backed tokenizers package for high-throughput tokenization.
  • Apache 2.0 licensed — clean for commercial use.

When to reach for it

  • You want to load and run any open-weights LLM, vision model, or audio model with a stable API.
  • You're fine-tuning a pretrained model and want a baseline trainer (or want to plug into Trainer + Accelerate).
  • You're building inference services where the model registry is the Hugging Face Hub.

When not to reach for it

  • You need maximum inference throughput in production — use vLLM, TGI, SGLang, or llama.cpp for serving.
  • You want a minimal-codebase ML framework — try tinygrad or karpathy/nanoGPT.
  • You're targeting non-transformer architectures — use the relevant specialized library.

Maturity signal

161k stars, 33k forks, Apache 2.0, last push on the day this page was generated. An 8-year-old project that defined the modern ML model-distribution stack. The 2,385 open issues largely reflect per-model breakage as new architectures land continuously; the team's release cadence (multiple releases per month) keeps pace. Hugging Face's role as primary sponsor signals long-term institutional commitment.

Alternatives

  • vllm-project/vllm, huggingface/text-generation-inference, sgl-project/sglang — production inference serving.
  • ggml-org/llama.cpp — use when you need quantized inference on consumer hardware without PyTorch.
  • pytorch/pytorch directly — use when you're implementing custom architectures outside the transformers catalog.
  • mlx-explore/mlx — use when you're targeting Apple Silicon specifically.

Notes

The "model-definition framework" framing is more accurate than "training framework": transformers defines the architectures, while training infrastructure (Accelerate, DeepSpeed, FSDP) is layered on top. The Apache 2.0 license covers both the library and the canonical model implementations; individual model weights downloaded from the Hub carry their own licenses, stated on their model cards, which downstream users must respect.
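
Because weight licenses differ from the library license, it can help to check them programmatically before shipping. The sketch below assumes the huggingface_hub package is installed and relies on the Hub convention of exposing a model's declared license as a "license:<id>" tag; license_from_tags and weights_license are hypothetical helper names. A missing tag does not mean the weights are unrestricted, so the model card remains the authority.

```python
from typing import Iterable, Optional


def license_from_tags(tags: Iterable[str]) -> Optional[str]:
    """Return the license id from Hub-style tags, or None if absent."""
    for tag in tags:
        if tag.startswith("license:"):
            return tag.split(":", 1)[1]
    return None


def weights_license(repo_id: str) -> Optional[str]:
    """Look up the declared license for a model repo on the Hub."""
    # Lazy import: needs huggingface_hub and network access.
    from huggingface_hub import model_info

    info = model_info(repo_id)
    return license_from_tags(info.tags or [])
```

A deployment script could call weights_license() for each model it pulls and fail early when the result is None or outside an approved list.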

Tags

artificial-intelligence, machine-learning, deep-learning, natural-language-processing, large-language-model, python, pytorch, transformer, huggingface, apache-license