API documentation

This part of the documentation is automatically generated from the PubChemPy source code and comments. It contains comprehensive information on every function, class and method available in the PubChemPy library.

Search functions

pubchempy.get_compounds(identifier: str | int | list[str | int], namespace: str = 'cid', searchtype: str | None = None, as_dataframe: bool = False, **kwargs: QueryParam) → list[Compound] | pd.DataFrame

Retrieve the specified compound records from PubChem.

Parameters:
  • identifier – The compound identifier to use as a search query.

  • namespace – The identifier type, one of cid, name, smiles, sdf, inchi, inchikey or formula.

  • searchtype – The advanced search type, one of substructure, superstructure or similarity.

  • as_dataframe – Automatically extract the Compound properties into a pandas DataFrame and return that.

  • **kwargs – Additional query parameters to pass to the API request.

Returns:

List of Compound objects, or a pandas DataFrame if as_dataframe=True.

pubchempy.get_substances(identifier: str | int | list[str | int], namespace: str = 'sid', as_dataframe: bool = False, **kwargs: QueryParam) → list[Substance] | pd.DataFrame

Retrieve the specified substance records from PubChem.

Parameters:
  • identifier – The substance identifier to use as a search query.

  • namespace – The identifier type, one of sid, name or sourceid/<source name>.

  • as_dataframe – Automatically extract the Substance properties into a pandas DataFrame and return that.

  • **kwargs – Additional query parameters to pass to the API request.

Returns:

List of Substance objects, or a pandas DataFrame if as_dataframe=True.

pubchempy.get_assays(identifier: str | int | list[str | int], namespace: str = 'aid', **kwargs: str | int | float | bool | list[str] | None) → list[Assay]

Retrieve the specified assay records from PubChem.

Parameters:
  • identifier – The assay identifier to use as a search query.

  • namespace – The identifier type.

  • **kwargs – Additional query parameters to pass to the API request.

Returns:

List of Assay objects.

pubchempy.get_properties(properties: str | list[str], identifier: str | int | list[str | int], namespace: str = 'cid', searchtype: str | None = None, as_dataframe: bool = False, **kwargs: QueryParam) → list[dict[str, t.Any]] | pd.DataFrame

Retrieve the specified compound properties from PubChem.

Parameters:
  • properties – The properties to retrieve.

  • identifier – The compound identifier to use as a search query.

  • namespace – The identifier type.

  • searchtype – The advanced search type, one of substructure, superstructure or similarity.

  • as_dataframe – Automatically extract the properties into a pandas DataFrame.

  • **kwargs – Additional query parameters to pass to the API request.

pubchempy.get_synonyms(identifier: str | int | list[str | int], namespace: str = 'cid', domain: str = 'compound', searchtype: str | None = None, **kwargs: str | int | float | bool | list[str] | None) → list[dict[str, Any]]

Retrieve synonyms (alternative names) for the specified records from PubChem.

Synonyms include systematic names, common names, trade names, registry numbers, and other identifiers associated with compounds, substances, or assays.

Parameters:
  • identifier – The identifier to use as a search query.

  • namespace – The identifier type (e.g., cid, name, smiles for compounds).

  • domain – The PubChem domain to search (compound or substance).

  • searchtype – The advanced search type, one of substructure, superstructure or similarity.

  • **kwargs – Additional parameters to pass to the request.

Returns:

List of dictionaries containing synonym information for each matching record. Each dictionary contains the record identifier and a list of synonyms.

Objects

The PubChem database is organized into three main record types:

  • Substances: Raw chemical records deposited by data contributors.

  • Compounds: Standardized and deduplicated chemical records derived from substances.

  • Assays: Experimental data from biological screening and testing.

PubChemPy has classes to represent each of these record types.

class pubchempy.Compound(record: dict[str, Any])

Represents a standardized chemical structure record from PubChem.

The PubChem Compound database contains standardized and deduplicated chemical structures derived from the Substance database. Each Compound is uniquely identified by a CID (Compound Identifier) and represents a unique chemical structure with calculated properties, descriptors, and associated experimental data.

Examples

>>> compound = Compound.from_cid(2244)  # Aspirin
>>> print(f"Formula: {compound.molecular_formula}")
Formula: C9H8O4
>>> print(f"IUPAC: {compound.iupac_name}")
IUPAC: 2-acetyloxybenzoic acid
>>> print(f"MW: {compound.molecular_weight}")
MW: 180.16

Initialize a Compound with a record dict from the PubChem PUG REST service.

Parameters:

record – Compound record returned by the PubChem PUG REST service.

Note

Most users will not need to instantiate a Compound instance directly from a record. The from_cid() class method and the get_compounds() function offer more convenient ways to obtain Compound instances, as they also handle the retrieval of the record from PubChem.

classmethod from_cid(cid: int, **kwargs: str | int | float | bool | list[str] | None) → Compound

Retrieve the Compound record for the specified CID.

Parameters:
  • cid – The PubChem Compound Identifier (CID) to retrieve.

  • **kwargs – Additional parameters to pass to the request.

Example

c = Compound.from_cid(6819)

property record: dict[str, Any]

The full compound record returned by the PubChem PUG REST service.

to_dict(properties: list[str] | None = None) → dict[str, Any]

Return a dict containing Compound property data.

Optionally specify a list of the desired properties to include. If properties is not specified, all properties are included, with the following exceptions:

synonyms, aids and sids are not included unless explicitly specified. This is because they each require an extra request to the PubChem API to retrieve.

canonical_smiles and isomeric_smiles are not included by default, as they are deprecated and have been replaced by connectivity_smiles and smiles respectively.

Parameters:

properties – List of desired properties.

Returns:

Dictionary of compound data.

to_series(properties: list[str] | None = None) → pd.Series

Return a pandas Series containing Compound data.

Optionally specify a list of the desired properties to include as columns. If properties is not specified, all properties are included, with the following exceptions:

synonyms, aids and sids are not included unless explicitly specified. This is because they each require an extra request to the PubChem API to retrieve.

canonical_smiles and isomeric_smiles are not included by default, as they are deprecated and have been replaced by connectivity_smiles and smiles respectively.

Parameters:

properties – List of desired properties.

property cid: int | None

The PubChem Compound Identifier (CID).

Note

When searching using a SMILES or InChI query that is not present in the PubChem Compound database, an automatically generated record may be returned that contains properties that have been calculated on the fly. These records will not have a CID property.

property elements: list[str]

List of element symbols for atoms in this Compound.

property atoms: list[Atom]

List of Atoms in this Compound.

property bonds: list[Bond]

List of Bonds in this Compound.

property synonyms: list[str] | None

Ranked list of all the names associated with this Compound.

Requires an extra request. Result is cached.

property sids: list[int] | None

List of Substance Identifiers associated with this Compound.

Requires an extra request. Result is cached.

property aids: list[int] | None

List of Assay Identifiers associated with this Compound.

Requires an extra request. Result is cached.

property coordinate_type: str | None

Whether this Compound has 2D or 3D coordinates.

property charge: int

Formal charge on this Compound.

property molecular_formula: str | None

Molecular formula.

The molecular formula represents the number of atoms of each element in a compound. It does not contain any information about connectivity or structure.

property molecular_weight: float | None

Molecular weight in g/mol.

The molecular weight is the sum of all atomic weights of the constituent atoms in a compound, measured in g/mol. In the absence of explicit isotope labelling, averaged natural abundance is assumed. If an atom bears an explicit isotope label, 100% isotopic purity is assumed at this location.
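As a rough illustration of the definition above, the following sketch sums averaged atomic weights for a simple formula string. The formula parser and the small weight table are assumptions made for this example and are not part of PubChemPy; the value reported by PubChem comes from its own calculation.

```python
import re

# Standard (natural-abundance averaged) atomic weights in g/mol for a few
# common elements. Illustrative subset only, not data from PubChemPy.
ATOMIC_WEIGHTS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}


def molecular_weight(formula: str) -> float:
    """Sum averaged atomic weights for a simple formula such as 'C9H8O4'.

    Handles only element symbols followed by optional counts
    (no brackets, charges or isotope labels).
    """
    total = 0.0
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += ATOMIC_WEIGHTS[symbol] * (int(count) if count else 1)
    return total


print(round(molecular_weight("C9H8O4"), 2))  # aspirin: 180.16
```

The result agrees with the molecular_weight shown for aspirin (CID 2244) in the Compound example.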

property canonical_smiles: str | None

Canonical SMILES, with no stereochemistry information (deprecated).

Deprecated since version 1.0.5: canonical_smiles is deprecated, use connectivity_smiles instead.

property isomeric_smiles: str | None

Isomeric SMILES.

Deprecated since version 1.0.5: isomeric_smiles is deprecated, use smiles instead.

property connectivity_smiles: str | None

Connectivity SMILES string.

A canonical SMILES string that includes connectivity information only. It excludes stereochemical and isotopic information.

Replaces the deprecated canonical_smiles property.

property smiles: str | None

Absolute SMILES string (isomeric and canonical).

A canonical SMILES string that includes both stereochemical and isotopic information. This provides the most complete linear representation of the molecular structure.

Replaces the deprecated isomeric_smiles property.

property inchi: str | None

Standard IUPAC International Chemical Identifier (InChI).

The InChI provides a unique, standardized representation of molecular structure that is not dependent on the software used to generate it. It includes connectivity, stereochemistry, and isotopic information in a layered format. This standard version does not allow for user selectable options in dealing with stereochemistry and tautomer layers.

property inchikey: str | None

Standard InChIKey.

A hashed version of the full standard InChI, consisting of 27 characters divided into three blocks separated by hyphens. The InChIKey provides a fixed-length identifier that is more suitable for database indexing and web searches than the full InChI string.
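A quick format check can catch malformed keys before they are sent as a query. The sketch below assumes the usual standard InChIKey layout of 14, 10 and 1 uppercase letters (the block lengths are not stated above), and the helper name is hypothetical. It validates the shape only, not whether the key exists in PubChem.

```python
import re

# Three hyphen-separated blocks: 14 + 10 + 1 uppercase letters,
# 27 characters in total including the two hyphens.
INCHIKEY_PATTERN = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")


def looks_like_inchikey(value: str) -> bool:
    """Return True if value has the shape of a standard InChIKey."""
    return INCHIKEY_PATTERN.fullmatch(value) is not None


aspirin_key = "BSYNRYMUTXBXSQ-UHFFFAOYSA-N"
print(len(aspirin_key), looks_like_inchikey(aspirin_key))  # 27 True
print(looks_like_inchikey("aspirin"))  # False
```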

property iupac_name: str | None

Preferred IUPAC name.

The chemical name systematically determined according to IUPAC (International Union of Pure and Applied Chemistry) nomenclature rules. This is the preferred systematic name among the available IUPAC naming styles (Allowed, CAS-like Style, Preferred, Systematic, Traditional).

property xlogp: float | None

XLogP octanol-water partition coefficient.

A computationally generated octanol-water partition coefficient that measures the hydrophilicity or hydrophobicity of a molecule. Higher values indicate more lipophilic (fat-soluble) compounds, while lower values indicate more hydrophilic (water-soluble) compounds.

property exact_mass: float | None

Exact mass in Da (Daltons).

The mass of the most likely isotopic composition for a single molecule, corresponding to the most intense ion/molecule peak in a mass spectrum. This differs from molecular weight in that it uses the exact masses of specific isotopes rather than averaged atomic weights.

property monoisotopic_mass: float | None

Monoisotopic mass in Da (Daltons).

The mass of a molecule calculated using the mass of the most abundant isotope of each element. This provides a single, well-defined mass value useful for high-resolution mass spectrometry applications.
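To make the contrast with molecular weight concrete, the sketch below computes a monoisotopic mass by using the mass of the most abundant isotope of each element instead of the averaged atomic weight. The isotope mass table and the parser are assumptions for this example, not part of PubChemPy.

```python
import re

# Masses (Da) of the most abundant isotope of each element:
# 1H, 12C, 14N and 16O. Illustrative subset only.
MONOISOTOPIC_MASSES = {
    "H": 1.007825032,
    "C": 12.0,
    "N": 14.003074004,
    "O": 15.994914620,
}


def monoisotopic_mass(formula: str) -> float:
    """Sum most-abundant-isotope masses for a simple formula like 'C9H8O4'."""
    return sum(
        MONOISOTOPIC_MASSES[symbol] * (int(count) if count else 1)
        for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula)
    )


print(round(monoisotopic_mass("C9H8O4"), 4))  # aspirin: 180.0423
```

For aspirin this is about 0.12 Da lower than the averaged molecular weight of 180.16 g/mol, because heavier isotopes such as ¹³C are excluded.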

property tpsa: float | None

Topological Polar Surface Area (TPSA).

The topological polar surface area computed using the algorithm described by Ertl et al. TPSA is a commonly used descriptor for predicting drug absorption, as it correlates well with passive molecular transport through membranes. Values are typically expressed in square Ångströms.

property complexity: float | None

Molecular complexity rating.

A measure of molecular complexity computed using the Bertz/Hendrickson/Ihlenfeldt formula. This descriptor quantifies the structural complexity of a molecule based on factors such as the number of atoms, bonds, rings, and branching patterns.

property h_bond_donor_count: int | None

Number of hydrogen-bond donors in the structure.

Counts functional groups that can donate hydrogen bonds, such as -OH, -NH, and -SH groups. This descriptor is important for predicting drug-like properties and molecular interactions.

property h_bond_acceptor_count: int | None

Number of hydrogen-bond acceptors in the structure.

Counts functional groups that can accept hydrogen bonds, such as oxygen and nitrogen atoms with lone pairs. This descriptor is important for predicting drug-like properties and molecular interactions.

property rotatable_bond_count: int | None

Number of rotatable bonds.

Counts single bonds that can freely rotate, excluding bonds in rings and terminal bonds to hydrogen or methyl groups.

property fingerprint: str | None

Raw padded and hex-encoded structural fingerprint from PubChem.

Returns the raw padded and hex-encoded fingerprint as returned by the PUG REST API. This is the underlying data used to generate the human-readable binary fingerprint via the cactvs_fingerprint property. Most users should use cactvs_fingerprint instead for substructure analysis and similarity calculations.

The PubChem fingerprint data is 881 bits in length. Binary data is stored in one-byte increments, so the fingerprint occupies 111 bytes (888 bits), including seven bits of padding at the end to complete the last byte. A four-byte prefix, containing the bit length of the fingerprint (881 bits), increases the stored PubChem fingerprint size to 115 bytes (920 bits). This is then hex-encoded, resulting in a 230-character string.

More information at: ftp://ftp.ncbi.nlm.nih.gov/pubchem/specifications/pubchem_fingerprints.txt
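The byte layout described above can be unpacked by hand. The following is a minimal sketch, assuming the four-byte prefix comes first and the padding bits come last; the function name and the synthetic example fingerprint are made up for illustration, and the big-endian prefix in the example is an assumption.

```python
FINGERPRINT_BITS = 881  # meaningful bits in a PubChem fingerprint
PREFIX_BYTES = 4        # length prefix stored before the fingerprint data


def decode_fingerprint(hex_fingerprint: str) -> str:
    """Convert a 230-character hex fingerprint into an 881-character bit string."""
    raw = bytes.fromhex(hex_fingerprint)            # 115 bytes
    payload = raw[PREFIX_BYTES:]                    # 111 bytes = 888 bits
    bits = "".join(f"{byte:08b}" for byte in payload)
    return bits[:FINGERPRINT_BITS]                  # drop the 7 padding bits


# Synthetic fingerprint: length prefix (assumed big-endian) followed by
# 111 data bytes in which only the very first bit is set.
example_hex = ((881).to_bytes(4, "big") + bytes([0x80]) + bytes(110)).hex()

bits = decode_fingerprint(example_hex)
print(len(example_hex), len(bits), bits[:8])  # 230 881 10000000
```

In normal use the cactvs_fingerprint property provides the binary string directly, so this is only useful for understanding the raw format.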

property cactvs_fingerprint: str | None

PubChem CACTVS structural fingerprint as 881-bit binary string.

Returns a binary fingerprint string where each character is a bit representing the presence (1) or absence (0) of specific chemical substructures and features. The 881-bit fingerprint is organized into sections covering:

  • Section 1: Hierarchical element counts (1-115)

  • Section 2: Rings in a canonical ring set (116-163)

  • Section 3: Simple atom pairs (164-218)

  • Section 4: Simple atom nearest neighbors (219-242)

  • Section 5: Detailed atom neighborhoods (243-707)

  • Section 6: Simple SMARTS patterns (708-881)

This fingerprint enables efficient substructure searching, similarity calculations, and chemical clustering.

More information at: ftp://ftp.ncbi.nlm.nih.gov/pubchem/specifications/pubchem_fingerprints.txt
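When inspecting individual bits, it can help to know which section a position belongs to. The sketch below encodes the ranges listed above and treats positions as 1-based, as in that list; the helper name is hypothetical and not part of PubChemPy.

```python
from bisect import bisect_right

# First bit position (1-based) of each section, as listed above.
SECTION_STARTS = [1, 116, 164, 219, 243, 708]
SECTION_NAMES = [
    "Hierarchical element counts",
    "Rings in a canonical ring set",
    "Simple atom pairs",
    "Simple atom nearest neighbors",
    "Detailed atom neighborhoods",
    "Simple SMARTS patterns",
]


def fingerprint_section(position: int) -> str:
    """Return the section name for a 1-based bit position (1 to 881)."""
    if not 1 <= position <= 881:
        raise ValueError("position must be between 1 and 881")
    # bisect_right counts how many section starts are <= position.
    return SECTION_NAMES[bisect_right(SECTION_STARTS, position) - 1]


print(fingerprint_section(115))  # Hierarchical element counts
print(fingerprint_section(116))  # Rings in a canonical ring set
print(fingerprint_section(881))  # Simple SMARTS patterns
```

With this numbering, position p corresponds to index p - 1 of the binary fingerprint string.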

property heavy_atom_count: int | None

Number of heavy atoms (non-hydrogen atoms).

Counts all atoms in the molecule except hydrogen. This is a basic descriptor of molecular size and is used in various chemical calculations and molecular property predictions.

property isotope_atom_count: int | None

Number of atoms with enriched isotopes.

Counts atoms that are specified with non-standard isotopes (e.g., ²H, ¹³C). Most organic molecules have a value of 0 unless they are isotopically labeled for research or analytical purposes.

property atom_stereo_count: int | None

Total number of atoms with tetrahedral (sp³) stereochemistry.

Counts atoms that have tetrahedral stereochemistry. This includes both defined and undefined stereocenters in the molecule.

property defined_atom_stereo_count: int | None

Number of atoms with defined tetrahedral (sp³) stereochemistry.

Counts stereocenters where the absolute configuration is explicitly specified (e.g. R or S). This excludes stereocenters where the configuration is unknown or unspecified.

property undefined_atom_stereo_count: int | None

Number of atoms with undefined tetrahedral (sp³) stereochemistry.

Counts stereocenters where the absolute configuration is not specified or is unknown. These represent potential stereocenters that could have either R or S configuration, but this is not explicitly defined.

property bond_stereo_count: int | None

Bond stereocenter count.

property defined_bond_stereo_count: int | None

Defined bond stereocenter count.

property undefined_bond_stereo_count: int | None

Undefined bond stereocenter count.

property covalent_unit_count: int | None

Covalently-bonded unit count.

property volume_3d: float | None

Analytic volume of the first diverse conformer.

The 3D molecular volume calculated for the default (first diverse) conformer. This descriptor provides information about the space occupied by the molecule in three dimensions.

property conformer_rmsd_3d: float | None

Conformer sampling RMSD in Å.

The root-mean-square deviation of atomic positions between different conformers in the conformer model. This measures the structural diversity of the generated conformer ensemble.

property effective_rotor_count_3d: int | None

Number of effective rotors in the 3D structure.

A count of rotatable bonds that significantly contribute to conformational flexibility. This is often less than the total rotatable bond count as it excludes rotors that have restricted rotation due to steric or electronic effects.

property pharmacophore_features_3d: list[str] | None

3D pharmacophore features present in the molecule.

A list of pharmacophore feature types identified in the 3D structure, such as hydrogen bond donors, acceptors, aromatic rings, and hydrophobic regions. These features are important for drug-target interactions.

class pubchempy.Atom(aid: int, number: int, x: float | None = None, y: float | None = None, z: float | None = None, charge: int = 0)

Class to represent an atom in a Compound.

Initialize with an atom ID, atomic number, coordinates and optional charge.

Parameters:
  • aid – Atom ID.

  • number – Atomic number.

  • x – X coordinate.

  • y – Y coordinate.

  • z – Z coordinate.

  • charge – Formal charge on atom.

aid

The atom ID within the owning Compound.

number

The atomic number for this atom.

x

The x coordinate for this atom.

y

The y coordinate for this atom.

z

The z coordinate for this atom. Will be None in 2D Compound records.

charge

The formal charge on this atom.

property element: str

The element symbol for this atom.

to_dict() → dict[str, Any]

Return a dictionary containing Atom data.

set_coordinates(x: float, y: float, z: float | None = None) → None

Set all coordinate dimensions at once.

property coordinate_type: str

Whether this atom has 2D or 3D coordinates.

class pubchempy.Bond(aid1: int, aid2: int, order: BondType = BondType.SINGLE, style: int | None = None)

Class to represent a bond between two atoms in a Compound.

Initialize with begin and end atom IDs, bond order and bond style.

Parameters:
  • aid1 – Begin atom ID.

  • aid2 – End atom ID.

  • order – Bond order.

  • style – Bond style annotation.

aid1

ID of the begin atom of this bond.

aid2

ID of the end atom of this bond.

order

Bond order.

style

Bond style annotation.

to_dict() → dict[str, Any]

Return a dictionary containing Bond data.

class pubchempy.Substance(record: dict[str, Any])

Represents a raw chemical record as originally deposited to PubChem.

The PubChem Substance database contains chemical records in their original deposited form, before standardization or processing. As a result, it contains duplicates, mixtures, and some records that don’t make chemical sense. Substance records therefore contain fewer calculated properties; however, they do have additional information about the original source that deposited the record.

During PubChem’s standardization process, Substances are processed to create standardized Compound records. Multiple Substances may map to the same Compound if they represent the same unique chemical structure. Some Substances may not map to any Compound if they cannot be standardized.

Examples

>>> substance = Substance.from_sid(12345)
>>> print(f"Source: {substance.source_name}")
Source: KEGG
>>> print(f"Depositor ID: {substance.source_id}")
Depositor ID: C10159
>>> print(f"Standardized to CID: {substance.standardized_cid}")
Standardized to CID: 169683

Initialize a Substance with a record dict from the PubChem PUG REST service.

Parameters:

record – Substance record returned by the PubChem PUG REST service.

Note

Most users will not need to instantiate a Substance instance directly from a record. The from_sid() class method and the get_substances() function offer more convenient ways to obtain Substance instances, as they also handle the retrieval of the record from PubChem.

classmethod from_sid(sid: int, **kwargs: str | int | float | bool | list[str] | None) → Substance

Retrieve the Substance record for the specified SID.

Parameters:
  • sid – The PubChem Substance Identifier (SID).

  • **kwargs – Additional parameters to pass to the request.

Example

s = Substance.from_sid(12345)

property record: dict[str, Any]

The full substance record returned by the PubChem PUG REST service.

to_dict(properties: list[str] | None = None) → dict[str, Any]

Return a dict containing Substance property data.

Optionally specify a list of the desired properties to include. If properties is not specified, all properties are included, with the following exceptions:

cids and aids are not included unless explicitly specified. This is because they each require an extra request to the PubChem API to retrieve.

Parameters:

properties – List of desired properties.

Returns:

Dictionary of substance data.

to_series(properties: list[str] | None = None) → pd.Series

Return a pandas Series containing Substance data.

Optionally specify a list of the desired properties to include as columns. If properties is not specified, all properties are included, with the following exceptions:

cids and aids are not included unless explicitly specified. This is because they each require an extra request to the PubChem API to retrieve.

Parameters:

properties – List of desired properties.

property sid: int

The PubChem Substance Identifier (SID).

property synonyms: list[str] | None

A ranked list of all the names associated with this Substance.

property source_name: str

The name of the PubChem depositor that was the source of this Substance.

property source_id: str

Unique ID for this Substance from the PubChem depositor source.

property standardized_cid: int | None

The CID of the Compound that was standardized from this Substance.

May not exist if this Substance was not standardizable.

property standardized_compound: Compound | None

The Compound that was standardized from this Substance.

Requires an extra request. Result is cached. May not exist if this Substance was not standardizable.

property deposited_compound: Compound | None

A Compound derived from the unstandardized Substance.

This Compound is produced from the unstandardized Substance record as deposited. It will not have a cid and will be missing most properties.

property cids: list[int]

A list of all CIDs for Compounds that were standardized from this Substance.

Requires an extra request. Result is cached.

property aids: list[int]

A list of all AIDs for Assays associated with this Substance.

Requires an extra request. Result is cached.

class pubchempy.Assay(record: dict[str, Any])

Represents a biological assay record from the PubChem BioAssay database.

The PubChem BioAssay database contains experimental data from biological screening and testing programs. Each assay record describes the experimental conditions, methodology, and results for testing chemical compounds against biological targets.

BioAssay records include:

  • Assay protocol and experimental conditions

  • Target information (proteins, genes, pathways)

  • Activity outcome definitions and thresholds

  • Results data linking compounds to biological activities

  • Source information and literature references

Assays are identified by their AID (Assay Identifier) and can be retrieved using the from_aid() class method. The assay data provides the experimental context for understanding compound bioactivity data stored in PubChem.

Initialize an Assay with a record dict from the PubChem PUG REST service.

Parameters:

record – Assay record returned by the PubChem PUG REST service.

Note

Most users will not need to instantiate an Assay instance directly from a record. The from_aid() class method and the get_assays() function offer more convenient ways to obtain Assay instances, as they also handle the retrieval of the record from PubChem.

classmethod from_aid(aid: int, **kwargs: str | int | float | bool | list[str] | None) → Assay

Retrieve the Assay record for the specified AID.

Parameters:
  • aid – The PubChem Assay Identifier (AID).

  • **kwargs – Additional parameters to pass to the request.

Example

a = Assay.from_aid(1234)

property record: dict[str, Any]

The full assay record returned by the PubChem PUG REST service.

to_dict(properties: list[str] | None = None) → dict[str, Any]

Return a dict containing Assay property data.

Optionally specify a list of the desired properties to include. If properties is not specified, all properties are included.

Parameters:

properties – List of desired properties.

Returns:

Dictionary of assay data.

property aid: int

The PubChem Assay Identifier (AID).

property name: str

The short assay name, used for display purposes.

property description: str

A description of the assay.

property project_category: ProjectCategory | None

Category to distinguish projects funded through MLSCN, MLPCN or other.

Possible values include mlscn, mlpcn, mlscn-ap, mlpcn-ap, literature-extracted, literature-author, literature-publisher, rnaigi.

property comments: list[str]

Comments and additional information.

property results: list[dict[str, Any]]

A list of dictionaries containing details of the results from this Assay.

property target: list[dict[str, Any]] | None

A list of dictionaries containing details of the Assay targets.

property revision: int

Revision identifier for textual description.

property aid_version: int

Incremented when the original depositor updates the record.

Identifier functions

pubchempy.get_cids(identifier: str | int | list[str | int], namespace: str = 'name', domain: str = 'compound', searchtype: str | None = None, **kwargs: str | int | float | bool | list[str] | None) → list[int]

Retrieve Compound Identifiers (CIDs) for the specified query from PubChem.

CIDs are unique numerical identifiers assigned to each standardized compound record in the PubChem Compound database. This function is useful for converting between different identifier types (names, SMILES, InChI, etc.) and CIDs.

Parameters:
  • identifier – The identifier to use as a search query.

  • namespace – The identifier type (e.g. name, smiles, inchi, formula).

  • domain – The PubChem domain to search (compound, substance, or assay).

  • searchtype – The advanced search type, one of substructure, superstructure or similarity.

  • **kwargs – Additional parameters to pass to the request.

Returns:

List of CIDs (integers) that match the search criteria. Empty list if no matches found.

pubchempy.get_sids(identifier: str | int | list[str | int], namespace: str = 'cid', domain: str = 'compound', searchtype: str | None = None, **kwargs: str | int | float | bool | list[str] | None) → list[int]

Retrieve Substance Identifiers (SIDs) for the specified query from PubChem.

SIDs are unique numerical identifiers assigned to each substance record in the PubChem Substance database. This function is useful for finding which substance records are associated with a given compound or other identifier.

Parameters:
  • identifier – The identifier to use as a search query.

  • namespace – The identifier type (e.g., cid, name, smiles for compounds).

  • domain – The PubChem domain to search (compound, substance, or assay).

  • searchtype – The advanced search type, one of substructure, superstructure or similarity.

  • **kwargs – Additional parameters to pass to the request.

Returns:

List of SIDs (integers) that match the search criteria. Empty list if no matches found.

pubchempy.get_aids(identifier: str | int | list[str | int], namespace: str = 'cid', domain: str = 'compound', searchtype: str | None = None, **kwargs: str | int | float | bool | list[str] | None) → list[int]

Retrieve Assay Identifiers (AIDs) for the specified query from PubChem.

AIDs are unique numerical identifiers assigned to each biological assay record in the PubChem BioAssay database. This function is useful for finding which assays have tested a given compound or substance.

Parameters:
  • identifier – The identifier to use as a search query.

  • namespace – The identifier type (e.g., cid, name, smiles).

  • domain – The PubChem domain to search (compound, substance, or assay).

  • searchtype – The advanced search type, one of substructure, superstructure or similarity.

  • **kwargs – Additional parameters to pass to the request.

Returns:

List of AIDs (integers) that match the search criteria. Empty list if no matches found.

Request functions

pubchempy.download(outformat: str, path: str | PathLike, identifier: str | int | list[str | int], namespace: str = 'cid', domain: str = 'compound', operation: str | None = None, searchtype: str | None = None, overwrite: bool = False, **kwargs: str | int | float | bool | list[str] | None) → None

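Download the response for the specified request and save it to the file at path.

The output format (outformat) can be XML, ASNT, ASNB, JSON, SDF, CSV, PNG or TXT.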

pubchempy.request(identifier: str | int | list[str | int], namespace: str = 'cid', domain: str = 'compound', operation: str | None = None, output: str = 'JSON', searchtype: str | None = None, **kwargs: str | int | float | bool | list[str] | None) → HTTPResponse

Construct API request from parameters and return the response.

Full specification at https://pubchem.ncbi.nlm.nih.gov/docs/pug-rest
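To show how these parameters relate to a PUG REST address, here is a simplified sketch of the URL layout (base, domain, namespace, identifiers, operation, output). The build_url helper is hypothetical and is not how request() is implemented; among other differences, it ignores searchtype and always places identifiers in the URL path.

```python
from urllib.parse import quote, urlencode

API_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def build_url(identifier, namespace="cid", domain="compound",
              operation=None, output="JSON", **params) -> str:
    """Assemble a PUG REST style URL from its parts (illustrative only)."""
    # Multiple identifiers are joined with commas.
    if isinstance(identifier, (list, tuple)):
        identifier = ",".join(str(i) for i in identifier)
    parts = [domain, namespace, quote(str(identifier), safe=","), operation, output]
    # Skip parts that were not supplied (for example, no operation).
    url = "/".join([API_BASE, *(p for p in parts if p)])
    if params:
        url += "?" + urlencode(params)
    return url


print(build_url(2244))
# https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid/2244/JSON
print(build_url([2244, 6819], operation="property/MolecularWeight"))
# https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid/2244,6819/property/MolecularWeight/JSON
```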

pubchempy.get(identifier: str | int | list[str | int], namespace: str = 'cid', domain: str = 'compound', operation: str | None = None, output: str = 'JSON', searchtype: str | None = None, **kwargs: str | int | float | bool | list[str] | None) → bytes

Request wrapper that automatically handles async requests.

pubchempy.get_json(identifier: str | int | list[str | int], namespace: str = 'cid', domain: str = 'compound', operation: str | None = None, searchtype: str | None = None, **kwargs: str | int | float | bool | list[str] | None) → dict[str, Any] | None

Request wrapper that automatically parses the JSON response into a Python dict.

This function suppresses NotFoundError and returns None if no results are found.
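The practical consequence is that callers should check for None rather than catch an exception. The sketch below imitates that behaviour with stand-ins: the NotFoundError class, the fetch() function and its canned data are all defined locally for illustration and are not PubChemPy's own.

```python
import json


class NotFoundError(Exception):
    """Local stand-in for a 'no results' error (not imported from PubChemPy)."""


def fetch(identifier: str) -> bytes:
    """Hypothetical fetcher that returns canned bytes instead of calling PubChem."""
    records = {"2244": b'{"cid": 2244}'}
    try:
        return records[identifier]
    except KeyError:
        raise NotFoundError(identifier) from None


def fetch_json(identifier: str):
    """Parse the response as JSON, returning None when nothing is found."""
    try:
        return json.loads(fetch(identifier))
    except NotFoundError:
        return None


print(fetch_json("2244"))  # {'cid': 2244}
print(fetch_json("0"))     # None
```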

pubchempy.get_sdf(identifier: str | int | list[str | int], namespace: str = 'cid', domain: str = 'compound', operation: str | None = None, searchtype: str | None = None, **kwargs: str | int | float | bool | list[str] | None) → str | None

Request wrapper that automatically extracts SDF from the response.

This function suppresses NotFoundError and returns None if no results are found.

pandas functions

Each of the search functions get_compounds(), get_substances() and get_properties() has an as_dataframe parameter. When set to True, the function extracts the properties from each result into a pandas DataFrame and returns that instead of the list of results.

If you already have a list of Compounds or Substances, the functions below allow a DataFrame to be constructed easily.

pubchempy.compounds_to_frame(compounds: list[Compound] | Compound, properties: list[str] | None = None) → pd.DataFrame

Create a DataFrame from a Compound list.

Optionally specify the desired Compound properties to include as columns in the pandas DataFrame.

pubchempy.substances_to_frame(substances: list[Substance] | Substance, properties: list[str] | None = None) → pd.DataFrame

Create a DataFrame from a Substance list.

Optionally specify a list of the desired Substance properties to include as columns in the pandas DataFrame.

Constants

pubchempy.API_BASE = 'https://pubchem.ncbi.nlm.nih.gov/rest/pug'

Base URL for the PubChem PUG REST API.
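As an illustration of how this base URL relates to a full PUG REST request, the sketch below joins path segments onto it in the order domain, namespace, identifier, operation, output. The helper build_url is hypothetical and is not necessarily how PubChemPy assembles its requests internally; it only shows the general URL layout.

```python
from urllib.parse import quote

API_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"  # value copied from above


def build_url(domain: str, namespace: str, identifier: str,
              operation: str = "", output: str = "JSON") -> str:
    """Join PUG REST path segments onto API_BASE (hypothetical helper)."""
    # Percent-encode the identifier so spaces and slashes cannot break the path.
    segments = [domain, namespace, quote(identifier, safe="")]
    if operation:
        # The operation may itself contain '/', e.g. 'property/MolecularFormula'.
        segments.append(operation)
    segments.append(output)
    return f"{API_BASE}/" + "/".join(segments)


print(build_url("compound", "cid", "2244", "property/MolecularFormula"))
print(build_url("compound", "name", "acetic acid", "cids"))
```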

pubchempy.ELEMENTS: dict[int, str] = {1: 'H', 2: 'He', 3: 'Li', 4: 'Be', 5: 'B', 6: 'C', 7: 'N', 8: 'O', 9: 'F', 10: 'Ne', 11: 'Na', 12: 'Mg', 13: 'Al', 14: 'Si', 15: 'P', 16: 'S', 17: 'Cl', 18: 'Ar', 19: 'K', 20: 'Ca', 21: 'Sc', 22: 'Ti', 23: 'V', 24: 'Cr', 25: 'Mn', 26: 'Fe', 27: 'Co', 28: 'Ni', 29: 'Cu', 30: 'Zn', 31: 'Ga', 32: 'Ge', 33: 'As', 34: 'Se', 35: 'Br', 36: 'Kr', 37: 'Rb', 38: 'Sr', 39: 'Y', 40: 'Zr', 41: 'Nb', 42: 'Mo', 43: 'Tc', 44: 'Ru', 45: 'Rh', 46: 'Pd', 47: 'Ag', 48: 'Cd', 49: 'In', 50: 'Sn', 51: 'Sb', 52: 'Te', 53: 'I', 54: 'Xe', 55: 'Cs', 56: 'Ba', 57: 'La', 58: 'Ce', 59: 'Pr', 60: 'Nd', 61: 'Pm', 62: 'Sm', 63: 'Eu', 64: 'Gd', 65: 'Tb', 66: 'Dy', 67: 'Ho', 68: 'Er', 69: 'Tm', 70: 'Yb', 71: 'Lu', 72: 'Hf', 73: 'Ta', 74: 'W', 75: 'Re', 76: 'Os', 77: 'Ir', 78: 'Pt', 79: 'Au', 80: 'Hg', 81: 'Tl', 82: 'Pb', 83: 'Bi', 84: 'Po', 85: 'At', 86: 'Rn', 87: 'Fr', 88: 'Ra', 89: 'Ac', 90: 'Th', 91: 'Pa', 92: 'U', 93: 'Np', 94: 'Pu', 95: 'Am', 96: 'Cm', 97: 'Bk', 98: 'Cf', 99: 'Es', 100: 'Fm', 101: 'Md', 102: 'No', 103: 'Lr', 104: 'Rf', 105: 'Db', 106: 'Sg', 107: 'Bh', 108: 'Hs', 109: 'Mt', 110: 'Ds', 111: 'Rg', 112: 'Cn', 113: 'Nh', 114: 'Fl', 115: 'Mc', 116: 'Lv', 117: 'Ts', 118: 'Og', 252: 'Lp', 253: 'R', 254: '*', 255: '*'}

Dictionary mapping atomic numbers to their element symbols.

This dictionary includes the 118 standard chemical elements, from hydrogen (1) to oganesson (118), plus special atom types used by PubChem for non-standard entities such as dummy atoms, R-group labels and lone pairs.
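A short sketch of how such a mapping might be used. It works on a small excerpt of the dictionary, copied here so that the snippet runs without PubChemPy installed; the helper symbols_for and the reverse table NUMBER_BY_SYMBOL are hypothetical names, not part of the library.

```python
# Small excerpt of the ELEMENTS mapping shown above, copied so the snippet runs
# without PubChemPy installed; with the library, pubchempy.ELEMENTS would be used.
ELEMENTS = {1: "H", 6: "C", 7: "N", 8: "O", 253: "R", 254: "*"}


def symbols_for(atomic_numbers: list[int]) -> list[str]:
    """Translate atomic numbers to symbols, using '?' for unmapped numbers."""
    return [ELEMENTS.get(number, "?") for number in atomic_numbers]


# Reverse lookup: symbol -> atomic number. The '*' symbol is skipped because
# the full mapping assigns it to both 254 and 255, so it has no unique inverse.
NUMBER_BY_SYMBOL = {symbol: number for number, symbol in ELEMENTS.items() if symbol != "*"}

print(symbols_for([6, 8, 8]))  # atoms of carbon dioxide
print(NUMBER_BY_SYMBOL["N"])
```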

pubchempy.PROPERTY_MAP: dict[str, str] = {'atom_stereo_count': 'AtomStereoCount', 'bond_stereo_count': 'BondStereoCount', 'canonical_smiles': 'CanonicalSMILES', 'charge': 'Charge', 'complexity': 'Complexity', 'conformer_count_3d': 'ConformerCount3D', 'conformer_model_rmsd_3d': 'ConformerModelRMSD3D', 'conformer_rmsd_3d': 'ConformerModelRMSD3D', 'connectivity_smiles': 'ConnectivitySMILES', 'covalent_unit_count': 'CovalentUnitCount', 'defined_atom_stereo_count': 'DefinedAtomStereoCount', 'defined_bond_stereo_count': 'DefinedBondStereoCount', 'effective_rotor_count_3d': 'EffectiveRotorCount3D', 'exact_mass': 'ExactMass', 'feature_acceptor_count_3d': 'FeatureAcceptorCount3D', 'feature_anion_count_3d': 'FeatureAnionCount3D', 'feature_cation_count_3d': 'FeatureCationCount3D', 'feature_count_3d': 'FeatureCount3D', 'feature_donor_count_3d': 'FeatureDonorCount3D', 'feature_hydrophobe_count_3d': 'FeatureHydrophobeCount3D', 'feature_ring_count_3d': 'FeatureRingCount3D', 'h_bond_acceptor_count': 'HBondAcceptorCount', 'h_bond_donor_count': 'HBondDonorCount', 'heavy_atom_count': 'HeavyAtomCount', 'inchi': 'InChI', 'inchikey': 'InChIKey', 'isomeric_smiles': 'IsomericSMILES', 'isotope_atom_count': 'IsotopeAtomCount', 'iupac_name': 'IUPACName', 'molecular_formula': 'MolecularFormula', 'molecular_weight': 'MolecularWeight', 'monoisotopic_mass': 'MonoisotopicMass', 'rotatable_bond_count': 'RotatableBondCount', 'smiles': 'SMILES', 'tpsa': 'TPSA', 'undefined_atom_stereo_count': 'UndefinedAtomStereoCount', 'undefined_bond_stereo_count': 'UndefinedBondStereoCount', 'volume_3d': 'Volume3D', 'x_steric_quadrupole_3d': 'XStericQuadrupole3D', 'xlogp': 'XLogP', 'y_steric_quadrupole_3d': 'YStericQuadrupole3D', 'z_steric_quadrupole_3d': 'ZStericQuadrupole3D'}

Dictionary mapping property names to their PubChem API equivalents.

Allows properties to optionally be specified as underscore_separated, consistent with Compound attributes.
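The translation this enables can be sketched as follows, using an excerpt of the map so the snippet runs offline. The helper to_api_names is hypothetical; it assumes that names missing from the map are already in API form and should pass through unchanged, and it joins the result with commas, which is the form a PUG REST property list takes.

```python
# Excerpt of PROPERTY_MAP from above (copied so the snippet runs offline).
PROPERTY_MAP = {
    "molecular_formula": "MolecularFormula",
    "molecular_weight": "MolecularWeight",
    "xlogp": "XLogP",
    "tpsa": "TPSA",
}


def to_api_names(properties: list[str]) -> str:
    """Map underscore_separated names to API names and join them with commas."""
    # Names that are not keys in the map pass through unchanged (assumption of this sketch).
    return ",".join(PROPERTY_MAP.get(name, name) for name in properties)


print(to_api_names(["molecular_formula", "xlogp", "InChIKey"]))
```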

class pubchempy.CompoundIdType(*values)

Compound record type.

DEPOSITED = 0

Original Deposited Compound

STANDARDIZED = 1

Standardized Form of a Deposited Compound

COMPONENT = 2

Component of a Standardized Compound

NEUTRALIZED = 3

Neutralized Form of a Standardized Compound

MIXTURE = 4

Substance that is a component of a mixture

TAUTOMER = 5

Predicted Tautomer Form

IONIZED = 6

Predicted Ionized pKa Form

UNKNOWN = 255

Unknown Compound Type

class pubchempy.BondType(*values)

Bond Type Information.

SINGLE = 1

Single Bond

DOUBLE = 2

Double Bond

TRIPLE = 3

Triple Bond

QUADRUPLE = 4

Quadruple Bond

DATIVE = 5

Dative Bond

COMPLEX = 6

Complex Bond

IONIC = 7

Ionic Bond

UNKNOWN = 255

Unknown/Unspecified Connectivity
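Enumerations like this one are convenient for turning raw integer codes into readable names. The sketch below uses a local stand-in class with a few of the members listed above, and assumes it behaves like a standard enum.IntEnum; describe_bond is a hypothetical helper.

```python
import enum


class BondType(enum.IntEnum):
    """Stand-in mirroring a few of the members listed above."""

    SINGLE = 1
    DOUBLE = 2
    TRIPLE = 3
    UNKNOWN = 255


def describe_bond(order_code: int) -> str:
    """Turn a raw bond order code into a readable label."""
    try:
        return BondType(order_code).name.lower()
    except ValueError:  # the code is not a defined member
        return "unrecognised"


print(describe_bond(2))
print(describe_bond(42))
```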

class pubchempy.CoordinateType(*values)

Coordinate Set Type Distinctions.

TWO_D = 1

2D Coordinates

THREE_D = 2

3D Coordinates (should also indicate units, below)

SUBMITTED = 3

Depositor Provided Coordinates

EXPERIMENTAL = 4

Experimentally Determined Coordinates

COMPUTED = 5

Computed Coordinates

STANDARDIZED = 6

Standardized Coordinates

AUGMENTED = 7

Hybrid Original with Computed Coordinates (e.g., explicit H)

ALIGNED = 8

Template used to align drawing

COMPACT = 9

Drawing uses shorthand forms (e.g., COOH, OCH3, Et, etc.)

UNITS_ANGSTROMS = 10

(3D) Coordinate units are Angstroms

UNITS_NANOMETERS = 11

(3D) Coordinate units are nanometers

UNITS_PIXEL = 12

(2D) Coordinate units are pixels

UNITS_POINTS = 13

(2D) Coordinate units are points

UNITS_STDBONDS = 14

(2D) Coordinate units are standard bond lengths (1.0)

UNITS_UNKNOWN = 255

Coordinate units are unknown or unspecified
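The members above mix dimensionality, provenance and units, so a single coordinate set may be described by several codes at once. The sketch below assumes the codes arrive as a list of integers such as [2, 5, 10] (a made-up example) and uses a local stand-in enum with a subset of the members; summarise is a hypothetical helper.

```python
import enum


class CoordinateType(enum.IntEnum):
    """Stand-in with a subset of the members listed above."""

    TWO_D = 1
    THREE_D = 2
    COMPUTED = 5
    UNITS_ANGSTROMS = 10
    UNITS_UNKNOWN = 255


def summarise(type_codes: list[int]) -> dict[str, object]:
    """Summarise a list of coordinate type codes (raises ValueError on unknown codes)."""
    members = {CoordinateType(code) for code in type_codes}
    units = sorted(m.name for m in members if m.name.startswith("UNITS_"))
    return {
        "is_3d": CoordinateType.THREE_D in members,
        "computed": CoordinateType.COMPUTED in members,
        "units": units,
    }


print(summarise([2, 5, 10]))
```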

class pubchempy.ProjectCategory(*values)

Distinguishes projects funded through MLSCN, MLPCN or other sources.

MLSCN = 1

Assay depositions from MLSCN screen center

MLPCN = 2

Assay depositions from MLPCN screen center

MLSCN_AP = 3

Assay depositions from MLSCN assay provider

MLPCN_AP = 4

Assay depositions from MLPCN assay provider

JOURNAL_ARTICLE = 5

To be deprecated and replaced by options 7, 8 and 9 (LITERATURE_EXTRACTED, LITERATURE_AUTHOR and LITERATURE_PUBLISHER)

ASSAY_VENDOR = 6

Assay depositions from assay vendors

LITERATURE_EXTRACTED = 7

Data from literature, extracted by curators

LITERATURE_AUTHOR = 8

Data from literature, submitted by author of articles

LITERATURE_PUBLISHER = 9

Data from literature, submitted by journals/publishers

RNAIGI = 10

RNAi screenings from RNAi Global Initiative

OTHER = 255

Other project category

Exceptions

exception pubchempy.PubChemPyError

Bases: Exception

Base class for all PubChemPy exceptions.

exception pubchempy.ResponseParseError

Bases: PubChemPyError

PubChem response is uninterpretable.

exception pubchempy.PubChemHTTPError(code: int, msg: str, details: list[str])

Bases: PubChemPyError

Generic error class to handle HTTP error codes.

Initialize with HTTP status code, message, and additional details.

Parameters:
  • code – HTTP status code.

  • msg – Error message.

  • details – Additional error details from PubChem API.

exception pubchempy.BadRequestError(code: int, msg: str, details: list[str])

Bases: PubChemHTTPError

400: Request is improperly formed (e.g. syntax error in the URL or POST body).

Initialize with HTTP status code, message, and additional details.

Parameters:
  • code – HTTP status code.

  • msg – Error message.

  • details – Additional error details from PubChem API.

exception pubchempy.NotFoundError(code: int, msg: str, details: list[str])

Bases: PubChemHTTPError

404: The input record was not found (e.g. invalid CID).

Initialize with HTTP status code, message, and additional details.

Parameters:
  • code – HTTP status code.

  • msg – Error message.

  • details – Additional error details from PubChem API.

exception pubchempy.MethodNotAllowedError(code: int, msg: str, details: list[str])

Bases: PubChemHTTPError

405: Request not allowed (e.g. invalid MIME type in the HTTP Accept header).

Initialize with HTTP status code, message, and additional details.

Parameters:
  • code – HTTP status code.

  • msg – Error message.

  • details – Additional error details from PubChem API.

exception pubchempy.ServerError(code: int, msg: str, details: list[str])

Bases: PubChemHTTPError

500: Some problem on the server side (e.g. a database server down).

Initialize with HTTP status code, message, and additional details.

Parameters:
  • code – HTTP status code.

  • msg – Error message.

  • details – Additional error details from PubChem API.

exception pubchempy.UnimplementedError(code: int, msg: str, details: list[str])

Bases: PubChemHTTPError

501: The requested operation has not (yet) been implemented by the server.

Initialize with HTTP status code, message, and additional details.

Parameters:
  • code – HTTP status code.

  • msg – Error message.

  • details – Additional error details from PubChem API.

exception pubchempy.ServerBusyError(code: int, msg: str, details: list[str])

Bases: PubChemHTTPError

503: Too many requests or server is busy, retry later.

Initialize with HTTP status code, message, and additional details.

Parameters:
  • code – HTTP status code.

  • msg – Error message.

  • details – Additional error details from PubChem API.
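Since a 503 response asks the client to retry later, one common way to handle it is a retry loop with exponential backoff. The sketch below is an illustration only and is not something PubChemPy is documented to provide. It defines local stand-ins for the exception classes so that it runs offline; storing code, msg and details as attributes is an assumption, and with_retries and flaky_request are hypothetical names.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class PubChemPyError(Exception):
    """Stand-in for pubchempy.PubChemPyError."""


class PubChemHTTPError(PubChemPyError):
    """Stand-in mirroring the documented constructor signature."""

    def __init__(self, code: int, msg: str, details: list[str]) -> None:
        super().__init__(msg)
        # Storing the arguments as attributes is an assumption of this sketch.
        self.code, self.msg, self.details = code, msg, details


class ServerBusyError(PubChemHTTPError):
    """Stand-in for the 503 error."""


def with_retries(call: Callable[[], T], attempts: int = 3, base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Run call(), retrying with exponential backoff on ServerBusyError."""
    for attempt in range(attempts):
        try:
            return call()
        except ServerBusyError:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller see the error
            sleep(base_delay * 2 ** attempt)  # wait 1x, 2x, 4x, ... the base delay
    raise ValueError("attempts must be at least 1")


# Demonstration with a fake request that is busy twice, then succeeds.
outcomes = iter([ServerBusyError(503, "busy", []), ServerBusyError(503, "busy", []), "ok"])
delays: list[float] = []


def flaky_request() -> str:
    outcome = next(outcomes)
    if isinstance(outcome, Exception):
        raise outcome
    return outcome


# Record the requested delays instead of actually sleeping.
result = with_retries(flaky_request, sleep=delays.append)
print(result, delays)
```

Passing the sleep function as a parameter keeps the demonstration fast; in real use the default time.sleep would apply, and the callable would wrap an actual PubChemPy request.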

exception pubchempy.TimeoutError(code: int, msg: str, details: list[str])

Bases: PubChemHTTPError

504: The request timed out, from server overload or too broad a request.

See Avoiding TimeoutError for more information.

Initialize with HTTP status code, message, and additional details.

Parameters:
  • code – HTTP status code.

  • msg – Error message.

  • details – Additional error details from PubChem API.

exception pubchempy.PubChemPyDeprecationWarning

Bases: Warning

Warning category for deprecated features.
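Because this is a dedicated warning category, it can be filtered with the standard warnings module like any other. The sketch below uses a local stand-in class and a hypothetical deprecated function, old_api, to show two options: recording the warnings, and escalating them to errors (for example in a test suite).

```python
import warnings


class PubChemPyDeprecationWarning(Warning):
    """Stand-in for pubchempy.PubChemPyDeprecationWarning."""


def old_api() -> int:
    """Hypothetical deprecated function used only for this demonstration."""
    warnings.warn("old_api is deprecated", PubChemPyDeprecationWarning, stacklevel=2)
    return 1


# Option 1: record warnings of this category instead of printing them.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always", PubChemPyDeprecationWarning)
    old_api()
    deprecations = [w for w in caught if issubclass(w.category, PubChemPyDeprecationWarning)]

print(len(deprecations))

# Option 2: escalate the category to an error so deprecated calls fail loudly.
with warnings.catch_warnings():
    warnings.simplefilter("error", PubChemPyDeprecationWarning)
    try:
        old_api()
        escalated = False
    except PubChemPyDeprecationWarning:
        escalated = True

print(escalated)
```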

Changes

  • As of v1.0.3, the atoms and bonds properties on Compounds return lists of Atom and Bond objects rather than dicts.

  • As of v1.0.2, search functions return an empty list instead of raising a NotFoundError exception when no results are found. NotFoundError is still raised when attempting to create a Compound using the from_cid class method with an invalid CID.