API documentation
This part of the documentation is automatically generated from the PubChemPy source code and comments. It contains comprehensive information on every function, class and method available in the PubChemPy library.
Search functions
- pubchempy.get_compounds(identifier: str | int | list[str | int], namespace: str = 'cid', searchtype: str | None = None, as_dataframe: bool = False, **kwargs: QueryParam) → list[Compound] | pd.DataFrame
Retrieve the specified compound records from PubChem.
- Parameters:
identifier – The compound identifier to use as a search query.
namespace – The identifier type, one of cid, name, smiles, sdf, inchi, inchikey or formula.
searchtype – The advanced search type, one of substructure, superstructure or similarity.
as_dataframe – Automatically extract the Compound properties into a pandas DataFrame and return that.
**kwargs – Additional query parameters to pass to the API request.
- Returns:
List of Compound objects, or a pandas DataFrame if as_dataframe=True.
- pubchempy.get_substances(identifier: str | int | list[str | int], namespace: str = 'sid', as_dataframe: bool = False, **kwargs: QueryParam) → list[Substance] | pd.DataFrame
Retrieve the specified substance records from PubChem.
- Parameters:
identifier – The substance identifier to use as a search query.
namespace – The identifier type, one of sid, name or sourceid/<source name>.
as_dataframe – Automatically extract the Substance properties into a pandas DataFrame and return that.
**kwargs – Additional query parameters to pass to the API request.
- Returns:
List of Substance objects, or a pandas DataFrame if as_dataframe=True.
- pubchempy.get_assays(identifier: str | int | list[str | int], namespace: str = 'aid', **kwargs: str | int | float | bool | list[str] | None) → list[Assay]
Retrieve the specified assay records from PubChem.
- Parameters:
identifier – The assay identifier to use as a search query.
namespace – The identifier type.
**kwargs – Additional query parameters to pass to the API request.
- Returns:
List of Assay objects.
- pubchempy.get_properties(properties: str | list[str], identifier: str | int | list[str | int], namespace: str = 'cid', searchtype: str | None = None, as_dataframe: bool = False, **kwargs: QueryParam) → list[dict[str, t.Any]] | pd.DataFrame
Retrieve the specified compound properties from PubChem.
- Parameters:
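properties – The property name, or list of property names, to retrieve. Names can be given as PubChem property names (e.g. MolecularWeight) or in the underscore_separated form used by Compound attributes (e.g. molecular_weight).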
identifier – The compound identifier to use as a search query.
namespace – The identifier type.
searchtype – The advanced search type, one of substructure, superstructure or similarity.
as_dataframe – Automatically extract the properties into a pandas DataFrame and return that.
**kwargs – Additional query parameters to pass to the API request.
- Returns:
List of property dictionaries, one per matching compound, or a pandas DataFrame if as_dataframe=True.
- pubchempy.get_synonyms(identifier: str | int | list[str | int], namespace: str = 'cid', domain: str = 'compound', searchtype: str | None = None, **kwargs: str | int | float | bool | list[str] | None) → list[dict[str, Any]]
Retrieve synonyms (alternative names) for the specified records from PubChem.
Synonyms include systematic names, common names, trade names, registry numbers, and other identifiers associated with compounds or substances.
- Parameters:
identifier – The identifier to use as a search query.
namespace – The identifier type (e.g., cid, name, smiles for compounds).
domain – The PubChem domain to search (compound or substance).
searchtype – The advanced search type, one of substructure, superstructure or similarity.
**kwargs – Additional parameters to pass to the request.
- Returns:
List of dictionaries containing synonym information for each matching record. Each dictionary contains the record identifier and a list of synonyms.
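As a sketch of consuming this return value, the snippet below builds a case-insensitive name-to-CID lookup from synonym records. It assumes each record uses the keys CID and Synonym, as in the PUG REST synonyms JSON; check this against real output before relying on it. The helper build_name_index is hypothetical, and the sample record is illustrative input that would normally come from a call such as get_synonyms('aspirin', 'name').

```python
from __future__ import annotations

from typing import Any


def build_name_index(records: list[dict[str, Any]]) -> dict[str, int]:
    """Map each lower-cased synonym to the CID of the record it came from.

    Assumes each record looks like {"CID": int, "Synonym": [str, ...]}.
    When two records share a synonym, the first one seen wins.
    """
    index: dict[str, int] = {}
    for record in records:
        cid = record.get("CID")
        if cid is None:
            continue  # skip records without a compound identifier
        for name in record.get("Synonym", []):
            index.setdefault(name.lower(), cid)
    return index


# Illustrative input in the assumed shape of get_synonyms() output.
sample = [{"CID": 2244, "Synonym": ["aspirin", "Acetylsalicylic acid", "50-78-2"]}]
names = build_name_index(sample)
print(names["acetylsalicylic acid"])  # 2244
```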
Objects
The PubChem database is organized into three main record types:
Substances: Raw chemical records deposited by data contributors.
Compounds: Standardized and deduplicated chemical records derived from substances.
Assays: Experimental data from biological screening and testing.
PubChemPy has classes to represent each of these record types.
- class pubchempy.Compound(record: dict[str, Any])
Represents a standardized chemical structure record from PubChem.
The PubChem Compound database contains standardized and deduplicated chemical structures derived from the Substance database. Each Compound is uniquely identified by a CID (Compound Identifier) and represents a unique chemical structure with calculated properties, descriptors, and associated experimental data.
Examples
>>> from pubchempy import Compound
>>> compound = Compound.from_cid(2244)  # Aspirin
>>> print(f"Formula: {compound.molecular_formula}")
Formula: C9H8O4
>>> print(f"IUPAC: {compound.iupac_name}")
IUPAC: 2-acetyloxybenzoic acid
>>> print(f"MW: {compound.molecular_weight}")
MW: 180.16
Initialize a Compound with a record dict from the PubChem PUG REST service.
- Parameters:
record – Compound record returned by the PubChem PUG REST service.
Note
Most users will not need to instantiate a Compound instance directly from a record. The from_cid() class method and the get_compounds() function offer more convenient ways to obtain Compound instances, as they also handle the retrieval of the record from PubChem.
- classmethod from_cid(cid: int, **kwargs: str | int | float | bool | list[str] | None) → Compound
Retrieve the Compound record for the specified CID.
- Parameters:
cid – The PubChem Compound Identifier (CID) to retrieve.
**kwargs – Additional parameters to pass to the request.
Example
c = Compound.from_cid(6819)
- to_dict(properties: list[str] | None = None) → dict[str, Any]
Return a dict containing Compound property data.
Optionally specify a list of the desired properties to include. If properties is not specified, all properties are included, with the following exceptions:
synonyms, aids and sids are not included unless explicitly specified. This is because they each require an extra request to the PubChem API to retrieve.
canonical_smiles and isomeric_smiles are not included by default, as they are deprecated and have been replaced by connectivity_smiles and smiles respectively.
- Parameters:
properties – List of desired properties.
- Returns:
Dictionary of compound data.
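The default selection rule can be expressed as a small helper. This is a hypothetical sketch of the documented behavior, not PubChemPy's implementation; the function name default_properties and the example property list are assumptions made for illustration.

```python
from __future__ import annotations

# Properties that a to_dict()-style export only includes when explicitly
# requested, following the rules described for to_dict().
EXTRA_REQUEST = {"synonyms", "aids", "sids"}          # each needs another API call
DEPRECATED = {"canonical_smiles", "isomeric_smiles"}  # replaced by newer names


def default_properties(available: list[str], requested: list[str] | None = None) -> list[str]:
    """Return the property names to export (hypothetical helper)."""
    if requested is not None:
        return list(requested)  # an explicit list is used as given
    excluded = EXTRA_REQUEST | DEPRECATED
    return [name for name in available if name not in excluded]


available = ["cid", "molecular_formula", "smiles", "canonical_smiles", "synonyms"]
print(default_properties(available))              # ['cid', 'molecular_formula', 'smiles']
print(default_properties(available, ["synonyms"]))  # ['synonyms']
```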
- to_series(properties: list[str] | None = None) → pd.Series
Return a pandas Series containing Compound data.
Optionally specify a list of the desired properties to include in the Series. If properties is not specified, all properties are included, with the following exceptions:
synonyms, aids and sids are not included unless explicitly specified. This is because they each require an extra request to the PubChem API to retrieve.
canonical_smiles and isomeric_smiles are not included by default, as they are deprecated and have been replaced by connectivity_smiles and smiles respectively.
- Parameters:
properties – List of desired properties.
- property cid: int | None
The PubChem Compound Identifier (CID).
Note
When searching using a SMILES or InChI query that is not present in the PubChem Compound database, an automatically generated record may be returned that contains properties that have been calculated on the fly. These records will not have a CID property.
- property synonyms: list[str] | None
Ranked list of all the names associated with this Compound.
Requires an extra request. Result is cached.
- property sids: list[int] | None
List of Substance Identifiers associated with this Compound.
Requires an extra request. Result is cached.
- property aids: list[int] | None
List of Assay Identifiers associated with this Compound.
Requires an extra request. Result is cached.
- property molecular_formula: str | None
Molecular formula.
The molecular formula represents the number of atoms of each element in a compound. It does not contain any information about connectivity or structure.
- property molecular_weight: float | None
Molecular weight in g/mol.
The molecular weight is the sum of all atomic weights of the constituent atoms in a compound, measured in g/mol. In the absence of explicit isotope labelling, averaged natural abundance is assumed. If an atom bears an explicit isotope label, 100% isotopic purity is assumed at this location.
- property canonical_smiles: str | None
Canonical SMILES, with no stereochemistry information (deprecated).
Deprecated since version 1.0.5: canonical_smiles is deprecated, use connectivity_smiles instead.
- property isomeric_smiles: str | None
Isomeric SMILES.
Deprecated since version 1.0.5: isomeric_smiles is deprecated, use smiles instead.
- property connectivity_smiles: str | None
Connectivity SMILES string.
A canonical SMILES string that includes connectivity information only. It excludes stereochemical and isotopic information.
Replaces the deprecated canonical_smiles property.
- property smiles: str | None
Absolute SMILES string (isomeric and canonical).
A canonical SMILES string that includes both stereochemical and isotopic information. This provides the most complete linear representation of the molecular structure.
Replaces the deprecated isomeric_smiles property.
- property inchi: str | None
Standard IUPAC International Chemical Identifier (InChI).
The InChI provides a unique, standardized representation of molecular structure that is not dependent on the software used to generate it. It includes connectivity, stereochemistry, and isotopic information in a layered format. This standard version does not allow for user selectable options in dealing with stereochemistry and tautomer layers.
- property inchikey: str | None
Standard InChIKey.
A hashed version of the full standard InChI, consisting of 27 characters divided into three blocks separated by hyphens. The InChIKey provides a fixed-length identifier that is more suitable for database indexing and web searches than the full InChI string.
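The fixed-length layout makes an InChIKey easy to sanity-check. The sketch below assumes the standard block sizes of 14, 10 and 1 uppercase letters and treats the ninth character of the second block as the standard flag (S for a standard InChIKey). The helper is hypothetical, and it checks format only: it cannot confirm that a key corresponds to a real structure.

```python
from __future__ import annotations

import re

# Three blocks of uppercase letters (14, 10 and 1) joined by hyphens: 27 characters.
INCHIKEY_RE = re.compile(r"^([A-Z]{14})-([A-Z]{10})-([A-Z])$")


def is_standard_inchikey(key: str) -> bool:
    """Return True if key has the layout of a standard InChIKey."""
    match = INCHIKEY_RE.match(key)
    if match is None:
        return False
    second_block = match.group(2)
    # Ninth character of the second block: 'S' marks a standard InChIKey.
    return second_block[8] == "S"


print(is_standard_inchikey("BSYNRYMUTXBXSQ-UHFFFAOYSA-N"))  # True (aspirin)
print(is_standard_inchikey("BSYNRYMUTXBXSQ-UHFFFAOYSA"))    # False (missing last block)
```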
- property iupac_name: str | None
Preferred IUPAC name.
The chemical name systematically determined according to IUPAC (International Union of Pure and Applied Chemistry) nomenclature rules. This is the preferred systematic name among the available IUPAC naming styles (Allowed, CAS-like Style, Preferred, Systematic, Traditional).
- property xlogp: float | None
XLogP octanol-water partition coefficient.
A computationally generated octanol-water partition coefficient that measures the hydrophilicity or hydrophobicity of a molecule. Higher values indicate more lipophilic (fat-soluble) compounds, while lower values indicate more hydrophilic (water-soluble) compounds.
- property exact_mass: float | None
Exact mass in Da (Daltons).
The mass of the most likely isotopic composition for a single molecule, corresponding to the most intense ion/molecule peak in a mass spectrum. This differs from molecular weight in that it uses the exact masses of specific isotopes rather than averaged atomic weights.
- property monoisotopic_mass: float | None
Monoisotopic mass in Da (Daltons).
The mass of a molecule calculated using the mass of the most abundant isotope of each element. This provides a single, well-defined mass value useful for high-resolution mass spectrometry applications.
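To make the difference between averaged and isotope-specific masses concrete, the sketch below computes both for aspirin (C9H8O4) from its formula. The atomic weights and isotope masses are standard values hard-coded for C, H and O only, and the formula parser is a simplified assumption that handles plain element counts, not brackets, charges or isotope labels.

```python
from __future__ import annotations

import re

# Standard atomic weights (g/mol), averaged over natural isotopic abundance.
AVERAGE_WEIGHT = {"C": 12.011, "H": 1.008, "O": 15.999}
# Mass of the most abundant isotope of each element in Da (12C, 1H, 16O).
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503207, "O": 15.99491461956}


def parse_formula(formula: str) -> dict[str, int]:
    """Parse a simple formula such as 'C9H8O4' into element counts."""
    counts: dict[str, int] = {}
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(count) if count else 1)
    return counts


def mass(formula: str, table: dict[str, float]) -> float:
    """Sum per-atom masses for the formula using the given table."""
    return sum(table[element] * n for element, n in parse_formula(formula).items())


print(round(mass("C9H8O4", AVERAGE_WEIGHT), 2))  # 180.16
print(round(mass("C9H8O4", MONOISOTOPIC), 4))    # 180.0423
```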
- property tpsa: float | None
Topological Polar Surface Area (TPSA).
The topological polar surface area computed using the algorithm described by Ertl et al. TPSA is a commonly used descriptor for predicting drug absorption, as it correlates well with passive molecular transport through membranes. Values are typically expressed in square Ångströms.
- property complexity: float | None
Molecular complexity rating.
A measure of molecular complexity computed using the Bertz/Hendrickson/Ihlenfeldt formula. This descriptor quantifies the structural complexity of a molecule based on factors such as the number of atoms, bonds, rings, and branching patterns.
- property h_bond_donor_count: int | None
Number of hydrogen-bond donors in the structure.
Counts functional groups that can donate hydrogen bonds, such as -OH, -NH, and -SH groups. This descriptor is important for predicting drug-like properties and molecular interactions.
- property h_bond_acceptor_count: int | None
Number of hydrogen-bond acceptors in the structure.
Counts functional groups that can accept hydrogen bonds, such as oxygen and nitrogen atoms with lone pairs. This descriptor is important for predicting drug-like properties and molecular interactions.
- property rotatable_bond_count: int | None
Number of rotatable bonds.
Counts single bonds that can freely rotate, excluding bonds in rings and terminal bonds to hydrogen or methyl groups.
- property fingerprint: str | None
Raw padded and hex-encoded structural fingerprint from PubChem.
Returns the raw padded and hex-encoded fingerprint as returned by the PUG REST API. This is the underlying data used to generate the human-readable binary fingerprint via the cactvs_fingerprint property. Most users should use cactvs_fingerprint instead for substructure analysis and similarity calculations.
The PubChem fingerprint data is 881 bits in length. Binary data is stored in one-byte increments. This fingerprint is, therefore, 111 bytes in length (888 bits), which includes padding of seven bits at the end to complete the last byte. A four-byte prefix, containing the bit length of the fingerprint (881 bits), increases the stored PubChem fingerprint size to 115 bytes (920 bits). This is then hex-encoded, resulting in a 230-character string.
More information at: ftp://ftp.ncbi.nlm.nih.gov/pubchem/specifications/pubchem_fingerprints.txt
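The byte layout described above is enough to decode the raw value by hand. The sketch below skips the 4-byte length prefix (8 hex characters), converts the remaining 111 bytes to bits and drops the 7 padding bits, leaving 881 bits. It follows the documented layout rather than PubChemPy's own code, and it assumes bits are stored most-significant first within each byte; the helper name decode_fingerprint is hypothetical.

```python
from __future__ import annotations

FINGERPRINT_BITS = 881
PREFIX_HEX_CHARS = 8  # 4-byte prefix holding the bit length


def decode_fingerprint(hex_fp: str) -> str:
    """Convert a 230-character hex fingerprint into an 881-character '0'/'1' string."""
    if len(hex_fp) != 230:
        raise ValueError(f"expected 230 hex characters, got {len(hex_fp)}")
    declared_bits = int(hex_fp[:PREFIX_HEX_CHARS], 16)
    if declared_bits != FINGERPRINT_BITS:
        raise ValueError(f"unexpected declared length {declared_bits}")
    payload = bytes.fromhex(hex_fp[PREFIX_HEX_CHARS:])  # 111 bytes = 888 bits
    bits = "".join(f"{byte:08b}" for byte in payload)
    return bits[:FINGERPRINT_BITS]  # drop the 7 trailing padding bits


# Build a synthetic fingerprint with only the first and last real bits set.
raw = ["0"] * 888
raw[0] = raw[880] = "1"
payload_hex = int("".join(raw), 2).to_bytes(111, "big").hex()
synthetic = f"{FINGERPRINT_BITS:08x}" + payload_hex
decoded = decode_fingerprint(synthetic)
print(len(decoded), decoded.count("1"))  # 881 2
```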
- property cactvs_fingerprint: str | None
PubChem CACTVS structural fingerprint as 881-bit binary string.
Returns a binary fingerprint string where each character is a bit representing the presence (1) or absence (0) of specific chemical substructures and features. The 881-bit fingerprint is organized into sections covering:
Section 1: Hierarchical element counts (1-115)
Section 2: Rings in a canonical ring set (116-163)
Section 3: Simple atom pairs (164-218)
Section 4: Simple atom nearest neighbors (219-242)
Section 5: Detailed atom neighborhoods (243-707)
Section 6: Simple SMARTS patterns (708-881)
This fingerprint enables efficient substructure searching, similarity calculations, and chemical clustering.
More information at: ftp://ftp.ncbi.nlm.nih.gov/pubchem/specifications/pubchem_fingerprints.txt
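Once decoded to a bit string, the fingerprint can be compared with a simple similarity measure. The sketch below uses the Tanimoto (Jaccard) coefficient, a common choice for binary fingerprints, together with a lookup from a 1-based bit position to the section ranges listed above. The helper names are hypothetical, treating two all-zero fingerprints as identical is a convention chosen here, and the inputs are short toy strings rather than real 881-bit fingerprints.

```python
from __future__ import annotations

# 1-based, inclusive bit ranges for each section of the 881-bit fingerprint.
SECTIONS = [
    (1, 115, "Hierarchical element counts"),
    (116, 163, "Rings in a canonical ring set"),
    (164, 218, "Simple atom pairs"),
    (219, 242, "Simple atom nearest neighbors"),
    (243, 707, "Detailed atom neighborhoods"),
    (708, 881, "Simple SMARTS patterns"),
]


def section_for_bit(position: int) -> str:
    """Return the section name for a 1-based bit position."""
    for start, end, name in SECTIONS:
        if start <= position <= end:
            return name
    raise ValueError(f"bit position {position} is outside 1-881")


def tanimoto(fp_a: str, fp_b: str) -> float:
    """Tanimoto coefficient of two equal-length '0'/'1' strings."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    both = sum(a == "1" and b == "1" for a, b in zip(fp_a, fp_b))
    either = sum(a == "1" or b == "1" for a, b in zip(fp_a, fp_b))
    return both / either if either else 1.0  # convention: two empty fingerprints match


print(section_for_bit(120))          # Rings in a canonical ring set
print(tanimoto("110100", "100110"))  # 0.5
```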
- property heavy_atom_count: int | None
Number of heavy atoms (non-hydrogen atoms).
Counts all atoms in the molecule except hydrogen. This is a basic descriptor of molecular size and is used in various chemical calculations and molecular property predictions.
- property isotope_atom_count: int | None
Number of atoms with enriched isotopes.
Counts atoms that are specified with non-standard isotopes (e.g., ²H, ¹³C). Most organic molecules have a value of 0 unless they are isotopically labeled for research or analytical purposes.
- property atom_stereo_count: int | None
Total number of atoms with tetrahedral (sp³) stereochemistry.
Counts atoms that have tetrahedral stereochemistry. This includes both defined and undefined stereocenters in the molecule.
- property defined_atom_stereo_count: int | None
Number of atoms with defined tetrahedral (sp³) stereochemistry.
Counts stereocenters where the absolute configuration is explicitly specified (e.g. R or S). This excludes stereocenters where the configuration is unknown or unspecified.
- property undefined_atom_stereo_count: int | None
Number of atoms with undefined tetrahedral (sp³) stereochemistry.
Counts stereocenters where the absolute configuration is not specified or is unknown. These represent potential stereocenters that could have either R or S configuration, but this is not explicitly defined.
- property volume_3d: float | None
Analytic volume of the first diverse conformer.
The 3D molecular volume calculated for the default (first diverse) conformer. This descriptor provides information about the space occupied by the molecule in three dimensions.
- property conformer_rmsd_3d: float | None
Conformer sampling RMSD in Å.
The root-mean-square deviation of atomic positions between different conformers in the conformer model. This measures the structural diversity of the generated conformer ensemble.
- property effective_rotor_count_3d: int | None
Number of effective rotors in the 3D structure.
A count of rotatable bonds that significantly contribute to conformational flexibility. This is often less than the total rotatable bond count as it excludes rotors that have restricted rotation due to steric or electronic effects.
- property pharmacophore_features_3d: list[str] | None
3D pharmacophore features present in the molecule.
A list of pharmacophore feature types identified in the 3D structure, such as hydrogen bond donors, acceptors, aromatic rings, and hydrophobic regions. These features are important for drug-target interactions.
- class pubchempy.Atom(aid: int, number: int, x: float | None = None, y: float | None = None, z: float | None = None, charge: int = 0)
Class to represent an atom in a Compound.
Initialize with an atom ID, atomic number, coordinates and optional charge.
- Parameters:
aid – Atom ID.
number – Atomic number.
x – X coordinate.
y – Y coordinate.
z – Z coordinate.
charge – Formal charge on atom.
- aid
The atom ID within the owning Compound.
- number
The atomic number for this atom.
- x
The x coordinate for this atom.
- y
The y coordinate for this atom.
- z
The z coordinate for this atom. Will be None in 2D Compound records.
- charge
The formal charge on this atom.
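Because each Atom carries only an atomic number, element symbols come from a lookup such as ELEMENTS. The sketch below uses a stand-in dataclass with the same attribute names as Atom and a small subset of the ELEMENTS mapping to derive a molecular formula in Hill order (carbon, then hydrogen, then the remaining elements alphabetically; purely alphabetical when there is no carbon). The SimpleAtom class and hill_formula helper are hypothetical, and the sketch assumes hydrogens are present as explicit atoms.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass

# Subset of pubchempy.ELEMENTS, enough for these examples.
ELEMENTS = {1: "H", 6: "C", 7: "N", 8: "O"}


@dataclass
class SimpleAtom:
    """Hypothetical stand-in with the same fields as pubchempy.Atom."""
    aid: int
    number: int
    x: float | None = None
    y: float | None = None
    z: float | None = None  # None for 2D records
    charge: int = 0


def hill_formula(atoms: list[SimpleAtom]) -> str:
    """Build a Hill-order formula string from the atoms' atomic numbers."""
    counts = Counter(ELEMENTS[atom.number] for atom in atoms)
    if "C" in counts:
        # Carbon first, hydrogen second, then the rest alphabetically.
        order = ["C"] + (["H"] if "H" in counts else [])
        order += sorted(el for el in counts if el not in ("C", "H"))
    else:
        order = sorted(counts)  # no carbon: purely alphabetical
    return "".join(el + (str(counts[el]) if counts[el] > 1 else "") for el in order)


# Aspirin-like atom list: 9 C, 8 H, 4 O (coordinates omitted).
numbers = [6] * 9 + [1] * 8 + [8] * 4
atoms = [SimpleAtom(aid=i, number=n) for i, n in enumerate(numbers, start=1)]
print(hill_formula(atoms))  # C9H8O4
```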
- class pubchempy.Bond(aid1: int, aid2: int, order: BondType = BondType.SINGLE, style: int | None = None)
Class to represent a bond between two atoms in a Compound.
Initialize with begin and end atom IDs, bond order and bond style.
- Parameters:
aid1 – Begin atom ID.
aid2 – End atom ID.
order – Bond order.
style – Bond style annotation.
- aid1
ID of the begin atom of this bond.
- aid2
ID of the end atom of this bond.
- order
Bond order.
- style
Bond style annotation.
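A list of bonds is enough to recover the connectivity of a Compound. The sketch below mirrors a few BondType values (taken from the Constants section) in a local IntEnum and uses a stand-in dataclass with the same aid1, aid2, order and style fields to build an adjacency map and count bonds by order. BondOrder, SimpleBond and adjacency are hypothetical stand-ins, not PubChemPy classes or functions.

```python
from __future__ import annotations

from collections import Counter, defaultdict
from dataclasses import dataclass
from enum import IntEnum


class BondOrder(IntEnum):
    """Local mirror of some pubchempy.BondType values."""
    SINGLE = 1
    DOUBLE = 2
    TRIPLE = 3


@dataclass
class SimpleBond:
    """Hypothetical stand-in with the same fields as pubchempy.Bond."""
    aid1: int
    aid2: int
    order: BondOrder = BondOrder.SINGLE
    style: int | None = None


def adjacency(bonds: list[SimpleBond]) -> dict[int, set[int]]:
    """Map each atom ID to the IDs of the atoms bonded to it."""
    neighbors: dict[int, set[int]] = defaultdict(set)
    for bond in bonds:
        # Bonds are undirected, so record both directions.
        neighbors[bond.aid1].add(bond.aid2)
        neighbors[bond.aid2].add(bond.aid1)
    return dict(neighbors)


# Heavy-atom skeleton of acetic acid: C1-C2, C2=O3, C2-O4.
bonds = [
    SimpleBond(1, 2),
    SimpleBond(2, 3, BondOrder.DOUBLE),
    SimpleBond(2, 4),
]
print(adjacency(bonds)[2])                   # {1, 3, 4}
print(Counter(b.order.name for b in bonds))  # Counter({'SINGLE': 2, 'DOUBLE': 1})
```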
- class pubchempy.Substance(record: dict[str, Any])
Represents a raw chemical record as originally deposited to PubChem.
The PubChem Substance database contains chemical records in their original deposited form, before standardization or processing. As a result, it contains duplicates, mixtures, and some records that don’t make chemical sense. Substance records therefore contain fewer calculated properties, but they do include additional information about the original source that deposited the record.
During PubChem’s standardization process, Substances are processed to create standardized Compound records. Multiple Substances may map to the same Compound if they represent the same unique chemical structure. Some Substances may not map to any Compound if they cannot be standardized.
Examples
>>> from pubchempy import Substance
>>> substance = Substance.from_sid(12345)
>>> print(f"Source: {substance.source_name}")
Source: KEGG
>>> print(f"Depositor ID: {substance.source_id}")
Depositor ID: C10159
>>> print(f"Standardized to CID: {substance.standardized_cid}")
Standardized to CID: 169683
Initialize a Substance with a record dict from the PubChem PUG REST service.
- Parameters:
record – Substance record returned by the PubChem PUG REST service.
Note
Most users will not need to instantiate a Substance instance directly from a record. The from_sid() class method and the get_substances() function offer more convenient ways to obtain Substance instances, as they also handle the retrieval of the record from PubChem.
- classmethod from_sid(sid: int, **kwargs: str | int | float | bool | list[str] | None) → Substance
Retrieve the Substance record for the specified SID.
- Parameters:
sid – The PubChem Substance Identifier (SID).
**kwargs – Additional parameters to pass to the request.
Example
s = Substance.from_sid(12345)
- property record: dict[str, Any]
The full substance record returned by the PubChem PUG REST service.
- to_dict(properties: list[str] | None = None) → dict[str, Any]
Return a dict containing Substance property data.
Optionally specify a list of the desired properties to include. If properties is not specified, all properties are included, with the following exception:
cids and aids are not included unless explicitly specified. This is because they each require an extra request to the PubChem API to retrieve.
- Parameters:
properties – List of desired properties.
- Returns:
Dictionary of substance data.
- to_series(properties: list[str] | None = None) → pd.Series
Return a pandas Series containing Substance data.
Optionally specify a list of the desired properties to include in the Series. If properties is not specified, all properties are included, with the following exception:
cids and aids are not included unless explicitly specified. This is because they each require an extra request to the PubChem API to retrieve.
- Parameters:
properties – List of desired properties.
- property standardized_cid: int | None
The CID of the Compound that was standardized from this Substance.
May not exist if this Substance was not standardizable.
- property standardized_compound: Compound | None
The Compound that was standardized from this Substance.
Requires an extra request. Result is cached. May not exist if this Substance was not standardizable.
- property deposited_compound: Compound | None
A Compound derived from the unstandardized Substance.
This Compound is produced from the unstandardized Substance record as deposited. It will not have a cid and will be missing most properties.
- class pubchempy.Assay(record: dict[str, Any])
Represents a biological assay record from the PubChem BioAssay database.
The PubChem BioAssay database contains experimental data from biological screening and testing programs. Each assay record describes the experimental conditions, methodology, and results for testing chemical compounds against biological targets.
BioAssay records include:
Assay protocol and experimental conditions
Target information (proteins, genes, pathways)
Activity outcome definitions and thresholds
Results data linking compounds to biological activities
Source information and literature references
Assays are identified by their AID (Assay Identifier) and can be retrieved using the from_aid() class method. The assay data provides the experimental context for understanding compound bioactivity data stored in PubChem.
Initialize an Assay with a record dict from the PubChem PUG REST service.
- Parameters:
record – Assay record returned by the PubChem PUG REST service.
Note
Most users will not need to instantiate an Assay instance directly from a record. The from_aid() class method and the get_assays() function offer more convenient ways to obtain Assay instances, as they also handle the retrieval of the record from PubChem.
- classmethod from_aid(aid: int, **kwargs: str | int | float | bool | list[str] | None) → Assay
Retrieve the Assay record for the specified AID.
- Parameters:
aid – The PubChem Assay Identifier (AID).
**kwargs – Additional parameters to pass to the request.
Example
a = Assay.from_aid(1234)
- to_dict(properties: list[str] | None = None) → dict[str, Any]
Return a dict containing Assay property data.
Optionally specify a list of the desired properties to include. If properties is not specified, all properties are included.
- Parameters:
properties – List of desired properties.
- Returns:
Dictionary of assay data.
- property project_category: ProjectCategory | None
Category to distinguish projects funded through MLSCN, MLPCN or other.
Possible values include mlscn, mlpcn, mlscn-ap, mlpcn-ap, literature-extracted, literature-author, literature-publisher, rnaigi.
- property results: list[dict[str, Any]]
A list of dictionaries containing details of the results from this Assay.
Identifier functions
- pubchempy.get_cids(identifier: str | int | list[str | int], namespace: str = 'name', domain: str = 'compound', searchtype: str | None = None, **kwargs: str | int | float | bool | list[str] | None) → list[int]
Retrieve Compound Identifiers (CIDs) for the specified query from PubChem.
CIDs are unique numerical identifiers assigned to each standardized compound record in the PubChem Compound database. This function is useful for converting between different identifier types (names, SMILES, InChI, etc.) and CIDs.
- Parameters:
identifier – The identifier to use as a search query.
namespace – The identifier type (e.g. name, smiles, inchi, formula).
domain – The PubChem domain to search (compound, substance, or assay).
searchtype – The advanced search type, one of substructure, superstructure or similarity.
**kwargs – Additional parameters to pass to the request.
- Returns:
List of CIDs (integers) that match the search criteria. Empty list if no matches found.
- pubchempy.get_sids(identifier: str | int | list[str | int], namespace: str = 'cid', domain: str = 'compound', searchtype: str | None = None, **kwargs: str | int | float | bool | list[str] | None) → list[int]
Retrieve Substance Identifiers (SIDs) for the specified query from PubChem.
SIDs are unique numerical identifiers assigned to each substance record in the PubChem Substance database. This function is useful for finding which substance records are associated with a given compound or other identifier.
- Parameters:
identifier – The identifier to use as a search query.
namespace – The identifier type (e.g., cid, name, smiles for compounds).
domain – The PubChem domain to search (compound, substance, or assay).
searchtype – The advanced search type, one of substructure, superstructure or similarity.
**kwargs – Additional parameters to pass to the request.
- Returns:
List of SIDs (integers) that match the search criteria. Empty list if no matches found.
- pubchempy.get_aids(identifier: str | int | list[str | int], namespace: str = 'cid', domain: str = 'compound', searchtype: str | None = None, **kwargs: str | int | float | bool | list[str] | None) → list[int]
Retrieve Assay Identifiers (AIDs) for the specified query from PubChem.
AIDs are unique numerical identifiers assigned to each biological assay record in the PubChem BioAssay database. This function is useful for finding which assays have tested a given compound or substance.
- Parameters:
identifier – The identifier to use as a search query.
namespace – The identifier type (e.g., cid, name, smiles).
domain – The PubChem domain to search (compound, substance, or assay).
searchtype – The advanced search type, one of substructure, superstructure or similarity.
**kwargs – Additional parameters to pass to the request.
- Returns:
List of AIDs (integers) that match the search criteria. Empty list if no matches found.
Request functions
- pubchempy.download(outformat: str, path: str | PathLike, identifier: str | int | list[str | int], namespace: str = 'cid', domain: str = 'compound', operation: str | None = None, searchtype: str | None = None, overwrite: bool = False, **kwargs: str | int | float | bool | list[str] | None) → None
Download the requested data from PubChem and write it to path. The outformat can be XML, ASNT/B, JSON, SDF, CSV, PNG or TXT. An existing file at path is only replaced if overwrite=True.
- pubchempy.request(identifier: str | int | list[str | int], namespace: str = 'cid', domain: str = 'compound', operation: str | None = None, output: str = 'JSON', searchtype: str | None = None, **kwargs: str | int | float | bool | list[str] | None) → HTTPResponse
Construct API request from parameters and return the response.
Full specification at https://pubchem.ncbi.nlm.nih.gov/docs/pug-rest
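PUG REST encodes a request in the URL path: domain, optional search type, namespace, identifier, optional operation, then output format, with extra options appended as a query string. The sketch below assembles such a URL from the same parameter names that request() takes. It is a simplified illustration, not PubChemPy's implementation: the helper build_url is hypothetical, and the real library may send some identifiers (for example SMILES or SDF) in the request body rather than the path.

```python
from __future__ import annotations

from urllib.parse import quote, urlencode

API_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def build_url(identifier, namespace="cid", domain="compound", operation=None,
              output="JSON", searchtype=None, **kwargs) -> str:
    """Assemble a PUG REST URL from request()-style parameters (hypothetical helper)."""
    if isinstance(identifier, (list, tuple)):
        # Multiple identifiers are joined with commas.
        identifier = ",".join(str(i) for i in identifier)
    parts = [domain]
    if searchtype:
        parts.append(searchtype)  # e.g. substructure, placed before the namespace
    # Percent-encode the identifier (including '/' and '='), keeping commas.
    parts += [namespace, quote(str(identifier), safe=",")]
    if operation:
        parts.append(operation)
    parts.append(output)
    url = "/".join([API_BASE] + parts)
    if kwargs:
        url += "?" + urlencode(kwargs)  # extra options become query parameters
    return url


print(build_url("aspirin", namespace="name", operation="cids"))
# https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/name/aspirin/cids/JSON
```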
- pubchempy.get(identifier: str | int | list[str | int], namespace: str = 'cid', domain: str = 'compound', operation: str | None = None, output: str = 'JSON', searchtype: str | None = None, **kwargs: str | int | float | bool | list[str] | None) → bytes
Request wrapper that automatically handles async requests.
- pubchempy.get_json(identifier: str | int | list[str | int], namespace: str = 'cid', domain: str = 'compound', operation: str | None = None, searchtype: str | None = None, **kwargs: str | int | float | bool | list[str] | None) → dict[str, Any] | None
Request wrapper that automatically parses the JSON response into a Python dict.
This function suppresses NotFoundError and returns None if no results are found.
- pubchempy.get_sdf(identifier: str | int | list[str | int], namespace: str = 'cid', domain: str = 'compound', operation: str | None = None, searchtype: str | None = None, **kwargs: str | int | float | bool | list[str] | None) → str | None
Request wrapper that automatically extracts SDF from the response.
This function suppresses NotFoundError and returns None if no results are found.
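The suppress-and-return-None behavior of get_json() and get_sdf() follows a familiar wrapper pattern. The sketch below shows that pattern with a locally defined NotFoundError and a stub fetch function, both hypothetical stand-ins for the library's exception and network call; it illustrates the documented behavior rather than reproducing PubChemPy's code. Other exceptions still propagate to the caller.

```python
from __future__ import annotations

import functools
import json
from typing import Any, Callable


class NotFoundError(Exception):
    """Local stand-in for the library's not-found exception."""


def none_if_not_found(func: Callable[..., Any]) -> Callable[..., Any]:
    """Wrap func so that NotFoundError becomes a None return value."""
    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        try:
            return func(*args, **kwargs)
        except NotFoundError:
            return None  # only "not found" is suppressed
    return wrapper


def fake_get(identifier: int) -> bytes:
    """Stub for a network call: only CID 2244 'exists'."""
    if identifier != 2244:
        raise NotFoundError(identifier)
    return b'{"IdentifierList": {"CID": [2244]}}'


@none_if_not_found
def fake_get_json(identifier: int) -> dict[str, Any]:
    """Parse the stub response body into a dict."""
    return json.loads(fake_get(identifier))


print(fake_get_json(2244))  # {'IdentifierList': {'CID': [2244]}}
print(fake_get_json(-1))    # None
```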
pandas functions
Each of the search functions get_compounds(), get_substances() and get_properties() has an as_dataframe parameter. When set to True, these functions automatically extract properties from each result in the list into a pandas DataFrame and return that instead of the results themselves.
If you already have a list of Compounds or Substances, the functions below allow a DataFrame to be constructed easily.
Constants
- pubchempy.API_BASE = 'https://pubchem.ncbi.nlm.nih.gov/rest/pug'
Base URL for the PubChem PUG REST API.
- pubchempy.ELEMENTS: dict[int, str] = {1: 'H', 2: 'He', 3: 'Li', 4: 'Be', 5: 'B', 6: 'C', 7: 'N', 8: 'O', 9: 'F', 10: 'Ne', 11: 'Na', 12: 'Mg', 13: 'Al', 14: 'Si', 15: 'P', 16: 'S', 17: 'Cl', 18: 'Ar', 19: 'K', 20: 'Ca', 21: 'Sc', 22: 'Ti', 23: 'V', 24: 'Cr', 25: 'Mn', 26: 'Fe', 27: 'Co', 28: 'Ni', 29: 'Cu', 30: 'Zn', 31: 'Ga', 32: 'Ge', 33: 'As', 34: 'Se', 35: 'Br', 36: 'Kr', 37: 'Rb', 38: 'Sr', 39: 'Y', 40: 'Zr', 41: 'Nb', 42: 'Mo', 43: 'Tc', 44: 'Ru', 45: 'Rh', 46: 'Pd', 47: 'Ag', 48: 'Cd', 49: 'In', 50: 'Sn', 51: 'Sb', 52: 'Te', 53: 'I', 54: 'Xe', 55: 'Cs', 56: 'Ba', 57: 'La', 58: 'Ce', 59: 'Pr', 60: 'Nd', 61: 'Pm', 62: 'Sm', 63: 'Eu', 64: 'Gd', 65: 'Tb', 66: 'Dy', 67: 'Ho', 68: 'Er', 69: 'Tm', 70: 'Yb', 71: 'Lu', 72: 'Hf', 73: 'Ta', 74: 'W', 75: 'Re', 76: 'Os', 77: 'Ir', 78: 'Pt', 79: 'Au', 80: 'Hg', 81: 'Tl', 82: 'Pb', 83: 'Bi', 84: 'Po', 85: 'At', 86: 'Rn', 87: 'Fr', 88: 'Ra', 89: 'Ac', 90: 'Th', 91: 'Pa', 92: 'U', 93: 'Np', 94: 'Pu', 95: 'Am', 96: 'Cm', 97: 'Bk', 98: 'Cf', 99: 'Es', 100: 'Fm', 101: 'Md', 102: 'No', 103: 'Lr', 104: 'Rf', 105: 'Db', 106: 'Sg', 107: 'Bh', 108: 'Hs', 109: 'Mt', 110: 'Ds', 111: 'Rg', 112: 'Cn', 113: 'Nh', 114: 'Fl', 115: 'Mc', 116: 'Lv', 117: 'Ts', 118: 'Og', 252: 'Lp', 253: 'R', 254: '*', 255: '*'}
Dictionary mapping atomic numbers to their element symbols.
This dictionary includes 118 standard chemical elements from Hydrogen (1) to Oganesson (118), plus special atom types used by PubChem for non-standard entities like dummy atoms, R-group labels, and lone pairs.
- pubchempy.PROPERTY_MAP: dict[str, str] = {'atom_stereo_count': 'AtomStereoCount', 'bond_stereo_count': 'BondStereoCount', 'canonical_smiles': 'CanonicalSMILES', 'charge': 'Charge', 'complexity': 'Complexity', 'conformer_count_3d': 'ConformerCount3D', 'conformer_model_rmsd_3d': 'ConformerModelRMSD3D', 'conformer_rmsd_3d': 'ConformerModelRMSD3D', 'connectivity_smiles': 'ConnectivitySMILES', 'covalent_unit_count': 'CovalentUnitCount', 'defined_atom_stereo_count': 'DefinedAtomStereoCount', 'defined_bond_stereo_count': 'DefinedBondStereoCount', 'effective_rotor_count_3d': 'EffectiveRotorCount3D', 'exact_mass': 'ExactMass', 'feature_acceptor_count_3d': 'FeatureAcceptorCount3D', 'feature_anion_count_3d': 'FeatureAnionCount3D', 'feature_cation_count_3d': 'FeatureCationCount3D', 'feature_count_3d': 'FeatureCount3D', 'feature_donor_count_3d': 'FeatureDonorCount3D', 'feature_hydrophobe_count_3d': 'FeatureHydrophobeCount3D', 'feature_ring_count_3d': 'FeatureRingCount3D', 'h_bond_acceptor_count': 'HBondAcceptorCount', 'h_bond_donor_count': 'HBondDonorCount', 'heavy_atom_count': 'HeavyAtomCount', 'inchi': 'InChI', 'inchikey': 'InChIKey', 'isomeric_smiles': 'IsomericSMILES', 'isotope_atom_count': 'IsotopeAtomCount', 'iupac_name': 'IUPACName', 'molecular_formula': 'MolecularFormula', 'molecular_weight': 'MolecularWeight', 'monoisotopic_mass': 'MonoisotopicMass', 'rotatable_bond_count': 'RotatableBondCount', 'smiles': 'SMILES', 'tpsa': 'TPSA', 'undefined_atom_stereo_count': 'UndefinedAtomStereoCount', 'undefined_bond_stereo_count': 'UndefinedBondStereoCount', 'volume_3d': 'Volume3D', 'x_steric_quadrupole_3d': 'XStericQuadrupole3D', 'xlogp': 'XLogP', 'y_steric_quadrupole_3d': 'YStericQuadrupole3D', 'z_steric_quadrupole_3d': 'ZStericQuadrupole3D'}
Dictionary mapping property names to their PubChem API equivalents.
Allows properties to optionally be specified as underscore_separated, consistent with Compound attributes.
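The sketch below shows how the translation step might look. It assumes that names not found in the map are passed through unchanged, so PubChem-style names such as `XLogP` still work. It then builds a PUG REST property-table URL of the public form `.../compound/cid/<cids>/property/<names>/JSON`. `PROPERTY_MAP_EXCERPT`, `to_api_names` and `property_url` are hypothetical, and no request is sent.

```python
# Hypothetical excerpt of the underscore_separated -> PubChem name mapping.
PROPERTY_MAP_EXCERPT: dict[str, str] = {
    "molecular_formula": "MolecularFormula",
    "molecular_weight": "MolecularWeight",
    "xlogp": "XLogP",
}

API_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def to_api_names(properties: list[str]) -> list[str]:
    """Translate property names, passing unmapped (e.g. CamelCase) names through."""
    return [PROPERTY_MAP_EXCERPT.get(p, p) for p in properties]


def property_url(cids: list[int], properties: list[str]) -> str:
    """Build a PUG REST property-table URL; nothing is fetched."""
    cid_part = ",".join(str(c) for c in cids)
    prop_part = ",".join(to_api_names(properties))
    return f"{API_BASE}/compound/cid/{cid_part}/property/{prop_part}/JSON"


print(property_url([2244], ["molecular_formula", "XLogP"]))
# https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid/2244/property/MolecularFormula,XLogP/JSON
```

Passing unknown names through unchanged means that both spellings can be mixed in one call.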
- class pubchempy.CompoundIdType(*values)
Compound record type.
- DEPOSITED = 0
Original Deposited Compound
- STANDARDIZED = 1
Standardized Form of a Deposited Compound
- COMPONENT = 2
Component of a Standardized Compound
- NEUTRALIZED = 3
Neutralized Form of a Standardized Compound
- MIXTURE = 4
Substance that is a component of a mixture
- TAUTOMER = 5
Predicted Tautomer Form
- IONIZED = 6
Predicted Ionized pKa Form
- UNKNOWN = 255
Unknown Compound Type
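Raw PubChem records carry these values as plain integers. The sketch below decodes them with a stand-in `IntEnum` that copies the values listed above, and it falls back to `UNKNOWN` for unrecognized codes. This page does not say whether PubChemPy's own class is an `IntEnum`. `decode_id_type` is hypothetical.

```python
from enum import IntEnum


class CompoundIdType(IntEnum):
    """Stand-in mirroring the documented CompoundIdType values."""
    DEPOSITED = 0
    STANDARDIZED = 1
    COMPONENT = 2
    NEUTRALIZED = 3
    MIXTURE = 4
    TAUTOMER = 5
    IONIZED = 6
    UNKNOWN = 255


def decode_id_type(raw: int) -> CompoundIdType:
    """Convert a raw integer code, treating unrecognized codes as UNKNOWN."""
    try:
        return CompoundIdType(raw)
    except ValueError:  # lookup by value raises ValueError for undefined codes
        return CompoundIdType.UNKNOWN


print(decode_id_type(1).name)   # STANDARDIZED
print(decode_id_type(42).name)  # UNKNOWN
```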
- class pubchempy.BondType(*values)
Bond Type Information.
- SINGLE = 1
Single Bond
- DOUBLE = 2
Double Bond
- TRIPLE = 3
Triple Bond
- QUADRUPLE = 4
Quadruple Bond
- DATIVE = 5
Dative Bond
- COMPLEX = 6
Complex Bond
- IONIC = 7
Ionic Bond
- UNKNOWN = 255
Unknown/Unspecified Connectivity
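Bond orders are stored as these integer codes. The sketch below tallies bonds by type from parallel `aid1`/`aid2`/`order` lists. This parallel-list layout is assumed here to match how PubChem's JSON records describe bonds. The stand-in enum copies the values above, and `count_bond_types` is hypothetical.

```python
from collections import Counter
from enum import IntEnum


class BondType(IntEnum):
    """Stand-in mirroring the documented BondType values."""
    SINGLE = 1
    DOUBLE = 2
    TRIPLE = 3
    QUADRUPLE = 4
    DATIVE = 5
    COMPLEX = 6
    IONIC = 7
    UNKNOWN = 255


def count_bond_types(aid1: list[int], aid2: list[int], order: list[int]) -> Counter[BondType]:
    """Tally bonds by type from parallel lists of atom IDs and order codes."""
    if not (len(aid1) == len(aid2) == len(order)):
        raise ValueError("aid1, aid2 and order must have the same length")
    return Counter(BondType(o) for o in order)


# Heavy-atom skeleton of acetic acid: C1-C2, C2=O3, C2-O4.
counts = count_bond_types([1, 2, 2], [2, 3, 4], [1, 2, 1])
print(counts[BondType.SINGLE], counts[BondType.DOUBLE])  # 2 1
```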
- class pubchempy.CoordinateType(*values)
Coordinate Set Type Distinctions.
- TWO_D = 1
2D Coordinates
- THREE_D = 2
3D Coordinates (should also indicate units; see the UNITS_* values below)
- SUBMITTED = 3
Depositor Provided Coordinates
- EXPERIMENTAL = 4
Experimentally Determined Coordinates
- COMPUTED = 5
Computed Coordinates
- STANDARDIZED = 6
Standardized Coordinates
- AUGMENTED = 7
Hybrid Original with Computed Coordinates (e.g., explicit H)
- ALIGNED = 8
Template used to align drawing
- COMPACT = 9
Drawing uses shorthand forms (e.g., COOH, OCH3, Et, etc.)
- UNITS_ANGSTROMS = 10
(3D) Coordinate units are Angstroms
- UNITS_NANOMETERS = 11
(3D) Coordinate units are nanometers
- UNITS_PIXEL = 12
(2D) Coordinate units are pixels
- UNITS_POINTS = 13
(2D) Coordinate units are points
- UNITS_STDBONDS = 14
(2D) Coordinate units are standard bond lengths (1.0)
- UNITS_UNKNOWN = 255
Coordinate units are unknown or unspecified
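One coordinate set can combine several of these codes: a dimensionality, a provenance and a unit. In PubChem's JSON records this combination is assumed here to appear as a list such as `[1, 5, 255]` (computed 2D coordinates in unknown units). The sketch below decodes such a list into a dimensionality label and a unit code, using a stand-in `IntEnum` with the values above. `describe_coords` and `UNIT_CODES` are hypothetical names.

```python
from enum import IntEnum


class CoordinateType(IntEnum):
    """Stand-in mirroring the documented CoordinateType values."""
    TWO_D = 1
    THREE_D = 2
    SUBMITTED = 3
    EXPERIMENTAL = 4
    COMPUTED = 5
    STANDARDIZED = 6
    AUGMENTED = 7
    ALIGNED = 8
    COMPACT = 9
    UNITS_ANGSTROMS = 10
    UNITS_NANOMETERS = 11
    UNITS_PIXEL = 12
    UNITS_POINTS = 13
    UNITS_STDBONDS = 14
    UNITS_UNKNOWN = 255


UNIT_CODES = frozenset({
    CoordinateType.UNITS_ANGSTROMS,
    CoordinateType.UNITS_NANOMETERS,
    CoordinateType.UNITS_PIXEL,
    CoordinateType.UNITS_POINTS,
    CoordinateType.UNITS_STDBONDS,
    CoordinateType.UNITS_UNKNOWN,
})


def describe_coords(type_codes: list[int]) -> tuple[str, CoordinateType]:
    """Return a dimensionality label and the unit code for a list of raw type codes."""
    codes = [CoordinateType(c) for c in type_codes]  # ValueError on undocumented codes
    if CoordinateType.THREE_D in codes:
        dim = "3D"
    elif CoordinateType.TWO_D in codes:
        dim = "2D"
    else:
        dim = "unknown"
    # The first unit code present wins; sets without one are treated as unknown units.
    units = next((c for c in codes if c in UNIT_CODES), CoordinateType.UNITS_UNKNOWN)
    return dim, units


dim, units = describe_coords([2, 5, 10])
print(dim, units.name)  # 3D UNITS_ANGSTROMS
```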
- class pubchempy.ProjectCategory(*values)
Distinguishes projects funded through MLSCN, MLPCN or other sources.
- MLSCN = 1
Assay depositions from MLSCN screen center
- MLPCN = 2
Assay depositions from MLPCN screen center
- MLSCN_AP = 3
Assay depositions from MLSCN assay provider
- MLPCN_AP = 4
Assay depositions from MLPCN assay provider
- JOURNAL_ARTICLE = 5
To be deprecated and replaced by options 7, 8 & 9
- ASSAY_VENDOR = 6
Assay depositions from assay vendors
- LITERATURE_EXTRACTED = 7
Data from literature, extracted by curators
- LITERATURE_AUTHOR = 8
Data from literature, submitted by author of articles
- LITERATURE_PUBLISHER = 9
Data from literature, submitted by journals/publishers
- RNAIGI = 10
RNAi screenings from RNAi Global Initiative
- OTHER = 255
Other project category
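Because JOURNAL_ARTICLE is slated for replacement by the three LITERATURE_* values, a filter for literature-derived assays may need to accept all four codes. The sketch below shows such a filter. Grouping the legacy code with its replacements is a reading of the note above, not documented PubChemPy behaviour. The enum is a stand-in copying the listed values, and `is_literature` is hypothetical.

```python
from enum import IntEnum


class ProjectCategory(IntEnum):
    """Stand-in mirroring the documented ProjectCategory values."""
    MLSCN = 1
    MLPCN = 2
    MLSCN_AP = 3
    MLPCN_AP = 4
    JOURNAL_ARTICLE = 5
    ASSAY_VENDOR = 6
    LITERATURE_EXTRACTED = 7
    LITERATURE_AUTHOR = 8
    LITERATURE_PUBLISHER = 9
    RNAIGI = 10
    OTHER = 255


# The legacy JOURNAL_ARTICLE code is grouped with the three codes meant to replace it.
LITERATURE = frozenset({
    ProjectCategory.JOURNAL_ARTICLE,
    ProjectCategory.LITERATURE_EXTRACTED,
    ProjectCategory.LITERATURE_AUTHOR,
    ProjectCategory.LITERATURE_PUBLISHER,
})


def is_literature(raw: int) -> bool:
    """Return True if a raw project-category code denotes literature-derived data."""
    try:
        category = ProjectCategory(raw)
    except ValueError:  # undocumented code: not treated as literature
        return False
    return category in LITERATURE


print(is_literature(5), is_literature(8), is_literature(1))  # True True False
```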
Exceptions
- exception pubchempy.ResponseParseError
Bases: PubChemPyError
PubChem response is uninterpretable.
- exception pubchempy.PubChemHTTPError(code: int, msg: str, details: list[str])
Bases: PubChemPyError
Generic error class to handle HTTP error codes.
Initialize with HTTP status code, message, and additional details.
- Parameters:
code – HTTP status code.
msg – Error message.
details – Additional error details from PubChem API.
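Every HTTP-specific exception below derives from PubChemHTTPError, which in turn derives from PubChemPyError, so callers can catch errors either narrowly or broadly. The sketch below uses stand-in classes that mirror this hierarchy and the documented `(code, msg, details)` constructor. The attribute names on the stand-in are assumptions. The `STATUS_ERRORS` table and `error_for_status` helper are hypothetical: they only illustrate how a status code could select a subclass and do not show PubChemPy's internals.

```python
class PubChemPyError(Exception):
    """Stand-in for the library's base exception."""


class PubChemHTTPError(PubChemPyError):
    """Stand-in with the documented (code, msg, details) constructor."""

    def __init__(self, code: int, msg: str, details: list[str]) -> None:
        super().__init__(code, msg, details)
        self.code = code
        self.msg = msg
        self.details = details


class NotFoundError(PubChemHTTPError):
    """404: the input record was not found."""


class ServerBusyError(PubChemHTTPError):
    """503: too many requests or server is busy."""


# Hypothetical dispatch table from status code to exception subclass.
STATUS_ERRORS: dict[int, type[PubChemHTTPError]] = {404: NotFoundError, 503: ServerBusyError}


def error_for_status(code: int, msg: str, details: list[str]) -> PubChemHTTPError:
    """Pick the most specific subclass, falling back to the generic class."""
    return STATUS_ERRORS.get(code, PubChemHTTPError)(code, msg, details)


try:
    raise error_for_status(404, "Not Found", ["No CID found"])
except PubChemHTTPError as err:  # also catches every subclass
    print(type(err).__name__, err.code)  # NotFoundError 404
```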
- exception pubchempy.BadRequestError(code: int, msg: str, details: list[str])
Bases: PubChemHTTPError
400: Request is improperly formed (e.g. syntax error in the URL or POST body).
Initialize with HTTP status code, message, and additional details.
- Parameters:
code – HTTP status code.
msg – Error message.
details – Additional error details from PubChem API.
- exception pubchempy.NotFoundError(code: int, msg: str, details: list[str])
Bases: PubChemHTTPError
404: The input record was not found (e.g. invalid CID).
Initialize with HTTP status code, message, and additional details.
- Parameters:
code – HTTP status code.
msg – Error message.
details – Additional error details from PubChem API.
- exception pubchempy.MethodNotAllowedError(code: int, msg: str, details: list[str])
Bases: PubChemHTTPError
405: Request not allowed (e.g. invalid MIME type in the HTTP Accept header).
Initialize with HTTP status code, message, and additional details.
- Parameters:
code – HTTP status code.
msg – Error message.
details – Additional error details from PubChem API.
- exception pubchempy.ServerError(code: int, msg: str, details: list[str])
Bases: PubChemHTTPError
500: Some problem on the server side (e.g. a database server down).
Initialize with HTTP status code, message, and additional details.
- Parameters:
code – HTTP status code.
msg – Error message.
details – Additional error details from PubChem API.
- exception pubchempy.UnimplementedError(code: int, msg: str, details: list[str])
Bases: PubChemHTTPError
501: The requested operation has not (yet) been implemented by the server.
Initialize with HTTP status code, message, and additional details.
- Parameters:
code – HTTP status code.
msg – Error message.
details – Additional error details from PubChem API.
- exception pubchempy.ServerBusyError(code: int, msg: str, details: list[str])
Bases: PubChemHTTPError
503: Too many requests or server is busy, retry later.
Initialize with HTTP status code, message, and additional details.
- Parameters:
code – HTTP status code.
msg – Error message.
details – Additional error details from PubChem API.
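Since the message advises retrying later, one option is exponential backoff around the failing call. The sketch below wraps any zero-argument call and uses a stand-in `ServerBusyError` so that it runs without PubChemPy or network access. `with_retries`, its default values and the injectable `sleep` parameter are hypothetical choices, not PubChemPy settings. In real code, the wrapped call might be `lambda: pubchempy.get_compounds('aspirin', 'name')`, and `pubchempy.ServerBusyError` would replace the stand-in class.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class ServerBusyError(Exception):
    """Stand-in for pubchempy.ServerBusyError (HTTP 503)."""


def with_retries(
    func: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying on ServerBusyError with exponential backoff."""
    for attempt in range(attempts):
        try:
            return func()
        except ServerBusyError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)  # waits 1 s, 2 s, 4 s, ...
    raise ValueError("attempts must be at least 1")


# Simulated endpoint that is busy twice, then succeeds.
calls = {"n": 0}
delays: list[float] = []


def flaky() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ServerBusyError("503")
    return "ok"


print(with_retries(flaky, sleep=delays.append), delays)  # ok [1.0, 2.0]
```

Injecting `sleep` keeps the example instant and makes the backoff schedule easy to inspect.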
- exception pubchempy.TimeoutError(code: int, msg: str, details: list[str])
Bases: PubChemHTTPError
504: The request timed out, from server overload or too broad a request.
See Avoiding TimeoutError for more information.
Initialize with HTTP status code, message, and additional details.
- Parameters:
code – HTTP status code.
msg – Error message.
details – Additional error details from PubChem API.
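One general way to avoid overly broad requests is to split a long identifier list into smaller batches and query each batch separately. The sketch below does this. `batched_ids` and the batch size of 100 are illustrative choices, not PubChem limits. The commented-out call assumes that `get_compounds` accepts a list of CIDs, as its signature indicates.

```python
from collections.abc import Iterator, Sequence


def batched_ids(ids: Sequence[int], size: int = 100) -> Iterator[list[int]]:
    """Yield consecutive slices of at most `size` identifiers."""
    if size < 1:
        raise ValueError("size must be positive")
    for start in range(0, len(ids), size):
        yield list(ids[start:start + size])


cids = list(range(1, 251))
batches = list(batched_ids(cids))
print([len(b) for b in batches])  # [100, 100, 50]

# With PubChemPy installed, each batch could then be fetched on its own, e.g.:
# for batch in batched_ids(cids):
#     compounds = pubchempy.get_compounds(batch)
```

When catching this exception, refer to it as `pubchempy.TimeoutError`. A bare `TimeoutError` names Python's built-in exception, which is a different class.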
Changes
- As of v1.0.3, the atoms and bonds properties on Compound objects now return lists of Atom and Bond objects, rather than dicts.
- As of v1.0.2, search functions now return an empty list instead of raising a NotFoundError exception when no results are found. NotFoundError is still raised when attempting to create a Compound using the from_cid class method with an invalid CID.
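Code that must run against versions both before and after v1.0.2 can treat the two "no results" behaviours the same way. The sketch below uses a stand-in `NotFoundError` and simulated search functions. In practice, `search` would be a real search function such as `pubchempy.get_compounds`. `search_or_empty` is a hypothetical helper.

```python
from typing import Any, Callable


class NotFoundError(Exception):
    """Stand-in for pubchempy.NotFoundError."""


def search_or_empty(search: Callable[..., list[Any]], *args: Any, **kwargs: Any) -> list[Any]:
    """Return search results, mapping the pre-1.0.2 NotFoundError to an empty list."""
    try:
        return search(*args, **kwargs)
    except NotFoundError:
        return []


def old_style_search(name: str) -> list[Any]:
    # Simulates pre-1.0.2 behaviour: no match raises instead of returning [].
    raise NotFoundError(name)


def new_style_search(name: str) -> list[Any]:
    # Simulates v1.0.2+ behaviour: no match returns an empty list.
    return []


print(search_or_empty(old_style_search, "no-such-compound"))  # []
print(search_or_empty(new_style_search, "no-such-compound"))  # []
```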