InChIKey
A condensed, fixed-length identifier derived by hashing the InChI (International Chemical Identifier) of a chemical substance. It enables rapid database searching and is widely used in cheminformatics for substance identification and data integration.
InChIKey: A Unique Identifier for Chemical Substances
What is the InChIKey and how is it generated?
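The InChIKey is a 27-character identifier derived by hashing the InChI (International Chemical Identifier) string, which encodes the molecular structure of a chemical compound in a standardized, machine-readable format. It consists of three blocks of uppercase letters separated by hyphens. The first block of 14 characters is a hash of the connectivity layers of the InChI, that is, the molecular skeleton. The second block has 10 characters: 8 are a hash of the remaining layers, such as stereochemistry and isotopic information, followed by one flag character indicating whether the key was generated from a standard InChI (S) or a non-standard one (N) and one character for the InChI version (A for version 1). The final block is a single character indicating the protonation state (N for neutral). Both hashes are truncated SHA-256 digests, and the key contains no checksum. The InChIKey is not human-readable, but its fixed length and restricted alphabet make it computationally efficient for database indexing and searching.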
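The block layout can be checked mechanically. The following is a minimal sketch, assuming the layout of 14 letters, a hyphen, 10 letters (8 hash characters, a flag, and a version character), a hyphen, and one protonation character. The names `parse_inchikey` and `InChIKeyParts` are hypothetical helpers introduced for illustration, not part of any InChI library, and the pattern checks only the shape of a key, not whether it corresponds to a real structure.

```python
import re
from typing import NamedTuple

# Assumed layout: 14 letters, hyphen, 8 letters + flag + version, hyphen, 1 letter.
INCHIKEY_RE = re.compile(r"([A-Z]{14})-([A-Z]{8})([A-Z])([A-Z])-([A-Z])")


class InChIKeyParts(NamedTuple):
    connectivity: str  # hash of the connectivity (skeleton) layers
    other_layers: str  # hash of stereochemical, isotopic and other layers
    flag: str          # "S" = standard InChI, "N" = non-standard
    version: str       # "A" = InChI version 1
    protonation: str   # "N" = neutral


def parse_inchikey(key: str) -> InChIKeyParts:
    """Split an InChIKey into its blocks; raise ValueError if it is malformed."""
    match = INCHIKEY_RE.fullmatch(key.strip())
    if match is None:
        raise ValueError(f"not a well-formed InChIKey: {key!r}")
    return InChIKeyParts(*match.groups())


parts = parse_inchikey("LFQSCWFLJHTTHZ-UHFFFAOYSA-N")  # ethanol
print(parts.connectivity, parts.flag, parts.protonation)
# prints: LFQSCWFLJHTTHZ S N
```

A shape check like this is useful for rejecting truncated or mistyped keys before a database lookup, but it cannot detect a key that is well formed and simply wrong.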
How is the InChIKey used in chemical data management?
InChIKeys are extensively used in chemical databases such as PubChem, ChemSpider, and the European Chemicals Agency (ECHA) REACH database to identify compounds. Because they have a fixed length and the same InChI always yields the same key, they allow for fast lookup and cross-referencing across platforms. Unlike InChI strings, which can be long and contain characters that search engines handle poorly, InChIKeys are well suited to web-based searches, data exchange, and integration in cheminformatics workflows. They are also used in regulatory submissions, where consistent substance identification is critical for compliance with regulations and standards such as REACH, GHS, and ISO standards.
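Cross-referencing by InChIKey amounts to joining records on an exact string match. The sketch below illustrates the idea with two small in-memory dictionaries. The record contents (`sku`, `internal_id`) and the function name `cross_reference` are hypothetical and stand in for whatever two data sources are being integrated.

```python
# Hypothetical records from two sources, each keyed by InChIKey.
supplier_catalog = {
    "LFQSCWFLJHTTHZ-UHFFFAOYSA-N": {"name": "ethanol", "sku": "E-100"},
    "RYYVLZVUVIJVGH-UHFFFAOYSA-N": {"name": "caffeine", "sku": "C-200"},
}
internal_registry = {
    "RYYVLZVUVIJVGH-UHFFFAOYSA-N": {"internal_id": 17},
    "XLYOFNOQVPJJNP-UHFFFAOYSA-N": {"internal_id": 42},  # water
}


def cross_reference(left: dict, right: dict) -> dict:
    """Merge the records of every InChIKey that appears in both sources."""
    shared = left.keys() & right.keys()
    return {key: {**left[key], **right[key]} for key in sorted(shared)}


merged = cross_reference(supplier_catalog, internal_registry)
for key, record in merged.items():
    print(key, record)
```

Because the keys are short fixed-length strings, the same join can be expressed as an indexed column in a relational database without any chemistry-aware software on the query path.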
Are there limitations to using InChIKeys?
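While InChIKeys are highly effective for identification, they are not reversible: the original InChI, and therefore the structure, cannot be reconstructed from the InChIKey and must be recovered by looking the key up in a database. Because the key is a hash, two different structures can in principle share the same key, although such collisions are very rare. Additionally, different tautomers or ionisation states may produce different InChIKeys, so the same compound can appear under more than one key depending on the conditions or on how it was drawn. Therefore, careful interpretation is required when using InChIKeys in regulatory or analytical contexts.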
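One common way to relax an exact match is to compare only the first block, which groups keys that share a molecular skeleton but differ in the remaining layers. The sketch below assumes this first-block convention. The function name `group_by_skeleton` is hypothetical, and the example keys are those commonly listed for alanine and ethanol, which should be verified against a reference database before being relied on. This grouping does not merge tautomers whose connectivity differs.

```python
from collections import defaultdict


def group_by_skeleton(keys):
    """Group InChIKeys by their first (connectivity) block."""
    groups = defaultdict(list)
    for key in keys:
        skeleton = key.split("-", 1)[0]
        groups[skeleton].append(key)
    return dict(groups)


keys = [
    "QNAYBMKLOCPYGJ-REOHSZEDSA-N",  # L-alanine
    "QNAYBMKLOCPYGJ-UWTATZPHSA-N",  # D-alanine
    "QNAYBMKLOCPYGJ-UHFFFAOYSA-N",  # alanine, stereochemistry unspecified
    "LFQSCWFLJHTTHZ-UHFFFAOYSA-N",  # ethanol
]

for skeleton, members in group_by_skeleton(keys).items():
    print(skeleton, len(members))
```

Here the three alanine entries fall into one group even though their full keys differ, which is the behaviour to expect when a search should ignore stereochemistry.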
Related concepts
InChIKeys are closely related to InChI strings, CAS Registry Numbers, and SMILES notation. While CAS Registry Numbers are identifiers assigned by the Chemical Abstracts Service, InChIKeys are computed directly from the structure, so anyone with the structure can generate the same key without consulting a registry. They are also used alongside other identifiers such as EC numbers (including EINECS numbers) and molecular formulae in regulatory and procurement systems.