Ghost CVEs Haunt Tokenizers: When the Numbers Don't Add Up
Something strange is brewing in the wires. Two CVE numbers — CVE-2026-33671 and CVE-2026-33672 — surfaced in recent PRs for the Hugging Face tokenizers repository, but here's the kicker: these CVEs don't exist. Not yet, anyway.
CVE-2026? That's next year's numbering scheme. Either someone's operating with a crystal ball, or we've got misreported identifiers in the wild. I've run these numbers through MITRE, NVD, and every vulnerability database in my arsenal. Nothing. Nada. Zero hits.
For a repository handling tokenization for some of the world's most critical AI workloads — BERT, GPT, you name it — phantom CVE references are more than just clerical errors. They're noise in the signal when security teams are trying to assess real risk.
The security posture shows clean: no unpatched vulnerabilities, no recent security patches. That's the good news. But the absence of a SECURITY.md file in a repo with 10.5k stars? That's a communication gap we need to address.
Wiresec Assessment: 🚨 (1/5) — False alarm, but worth investigating
Action Item: Monitor the referenced PRs for context. If you're depending on tokenizers in production, you're still clear to proceed — but keep your feeds tuned to the official channels, not ghost signals.

