MolVS: Molecule Validation and Standardization

MolVS is a molecule validation and standardization tool, written in Python using the RDKit chemistry framework.

Building a collection of chemical structures from different sources can be difficult due to differing representations, drawing conventions and mistakes. MolVS can standardize chemical structures to improve data quality, help with de-duplication and identify relationships between molecules.

There are sensible defaults that make it easy to get started:

>>> from molvs import standardize_smiles
>>> standardize_smiles('[Na]OC(=O)c1ccc(C[S+2]([O-])([O-]))cc1')
'[Na+].O=C([O-])c1ccc(CS(=O)=O)cc1'
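
As a rough illustration of how a call like the one above can help with de-duplication, the sketch below groups input SMILES by their standardized form. The helper name group_duplicates and its standardize parameter are hypothetical and not part of MolVS. The parameter only exists so that a different function can be swapped in, and it falls back to standardize_smiles when omitted.

```python
from collections import defaultdict


def group_duplicates(smiles_list, standardize=None):
    """Group input SMILES strings that share the same standardized form.

    Hypothetical helper, not part of MolVS. Returns a dict mapping each
    standardized SMILES to the list of original inputs that produced it.
    """
    if standardize is None:
        # Imported lazily so the helper can also be used with another function.
        from molvs import standardize_smiles as standardize

    groups = defaultdict(list)
    for smiles in smiles_list:
        groups[standardize(smiles)].append(smiles)
    return dict(groups)
```

Any key whose list holds more than one entry marks inputs that were drawn differently but standardize to the same structure.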

Each standardization module is also available separately, allowing the development of custom standardization processes.
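
A minimal sketch of one such custom process is shown here, assuming the Normalizer, Reionizer and LargestFragmentChooser classes from the molvs.normalize, molvs.charge and molvs.fragment modules. The function names run_steps and custom_standardize are hypothetical, and the choice and order of steps is only an example, not a recommended protocol.

```python
def run_steps(mol, steps):
    """Apply each step to the molecule in order and return the final result."""
    for step in steps:
        mol = step(mol)
    return mol


def custom_standardize(smiles):
    """Normalize, reionize, then keep the largest fragment of a SMILES string.

    Hypothetical example pipeline, not part of MolVS.
    """
    # Imported locally so run_steps stays usable on its own.
    from rdkit import Chem
    from molvs.normalize import Normalizer
    from molvs.charge import Reionizer
    from molvs.fragment import LargestFragmentChooser

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Could not parse SMILES: {smiles!r}")

    steps = [
        Normalizer().normalize,           # consistent functional group format
        Reionizer().reionize,             # strongest acids ionize first
        LargestFragmentChooser().choose,  # drop smaller fragments
    ]
    return Chem.MolToSmiles(run_steps(mol, steps))
```

Keeping the steps in a plain list makes it easy to reorder them or to leave one out for a particular data source.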

Features

  • Normalization of functional groups to a consistent format.
  • Recombination of separated charges.
  • Breaking of bonds to metal atoms.
  • Competitive reionization to ensure the strongest acids ionize first in partially ionized molecules.
  • Tautomer enumeration and canonicalization.
  • Neutralization of charges.
  • Standardization or removal of stereochemistry information.
  • Filtering of salt and solvent fragments.
  • Generation of fragment, isotope, charge, tautomer or stereochemistry insensitive parent structures.
  • Validations to identify molecules with unusual and potentially troublesome characteristics.
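
The validation feature can be used to triage a collection before standardizing it. The sketch below assumes that validate_smiles from the molvs package returns a list of log message strings for a given SMILES, with an empty list meaning that nothing was flagged. The helper name triage and its validate parameter are hypothetical.

```python
def triage(smiles_list, validate=None):
    """Split SMILES strings into unflagged and flagged inputs.

    Hypothetical helper, not part of MolVS. Returns a list of SMILES with
    no validation messages and a dict mapping each flagged SMILES to its
    messages.
    """
    if validate is None:
        # Imported lazily so another validation function can be passed in.
        from molvs import validate_smiles as validate

    clean = []
    flagged = {}
    for smiles in smiles_list:
        messages = validate(smiles)
        if messages:
            flagged[smiles] = messages
        else:
            clean.append(smiles)
    return clean, flagged
```

The flagged entries can then be reviewed by hand or passed through standardization before being checked again.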

API documentation

The API documentation covers every function, class and method. It is generated automatically from the MolVS source code and comments.