MolVS: Molecule Validation and Standardization
MolVS is a molecule validation and standardization tool, written in Python using the RDKit chemistry framework.
Building a collection of chemical structures from different sources can be difficult due to differing representations, drawing conventions and mistakes. MolVS can standardize chemical structures to improve data quality, help with de-duplication and identify relationships between molecules.
There are sensible defaults that make it easy to get started:
>>> from molvs import standardize_smiles
>>> standardize_smiles('[Na]OC(=O)c1ccc(C[S+2]([O-])([O-]))cc1')
'[Na+].O=C([O-])c1ccc(CS(=O)=O)cc1'
Each standardization module is also available separately, allowing the development of custom standardization processes. Features include:
- Normalization of functional groups to a consistent format.
- Recombination of separated charges.
- Breaking of bonds to metal atoms.
- Competitive reionization to ensure the strongest acids ionize first in partially ionized molecules.
- Tautomer enumeration and canonicalization.
- Neutralization of charges.
- Standardization or removal of stereochemistry information.
- Filtering of salt and solvent fragments.
- Generation of fragment, isotope, charge, tautomer or stereochemistry insensitive parent structures.
- Validations to identify molecules with unusual and potentially troublesome characteristics.
A step-by-step guide to getting started with MolVS.
Comprehensive API documentation with information on every function, class and method. This is automatically generated from the MolVS source code and comments.