Modern virtual screening now spans billions of synthetically accessible molecules, but gains in scale have outpaced gains in accuracy. Docking remains a core bottleneck: pose prediction and binding-affinity estimation still generalize poorly, especially to the out-of-distribution targets and chemotypes that matter most in real discovery campaigns. At Deep Origin, we are developing physics-informed machine-learning models to close this gap by unifying structural signals, energetic priors, and large-scale biochemical data.
Here, we present our latest results, which show substantial gains in pose accuracy, affinity rank-ordering, and early enrichment across retrospective benchmarks, including challenging similarity-filtered splits designed to penalize memorization. In parallel, prospective screening campaigns against difficult targets such as CD73 have yielded high hit rates and broad chemotype diversity, validating our models’ real-world predictive power. These capabilities sit within an end-to-end computational discovery pipeline (spanning docking, property prediction, virtual screening, reinforcement learning, and molecular dynamics) that is available through both a Python API and Balto, our natural-language interface for molecular modeling.
Together, these advances illustrate how physics-informed ML can unlock new chemical space and accelerate the discovery of small-molecule therapeutics.
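For readers less familiar with the "early enrichment" metric mentioned above, the following is a minimal sketch of the standard enrichment factor (EF) as it is commonly defined in virtual screening: the active rate among the top-scoring fraction of a ranked library, divided by the active rate in the whole library. It is not Deep Origin's evaluation code. The function name `enrichment_factor`, its parameters, and the toy data are hypothetical, and the sketch assumes that a higher score means a better predicted binder.

```python
from typing import Sequence


def enrichment_factor(
    scores: Sequence[float],
    is_active: Sequence[bool],
    fraction: float = 0.01,
) -> float:
    """Enrichment factor at the top `fraction` of a score-ranked library.

    Assumption: higher score = better predicted binder. Negate the scores
    first if your scoring function is energy-like (lower = better).
    """
    if len(scores) != len(is_active):
        raise ValueError("scores and is_active must have the same length")
    n_total = len(scores)
    if n_total == 0:
        raise ValueError("library is empty")
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")

    n_actives = sum(1 for a in is_active if a)
    if n_actives == 0:
        raise ValueError("library contains no actives; EF is undefined")

    # Size of the top slice; rounding avoids floating-point surprises such
    # as 0.07 * 100 evaluating to slightly more than 7.
    n_top = max(1, round(fraction * n_total))

    # Rank molecules from best to worst score and count actives in the top slice.
    ranked = sorted(zip(scores, is_active), key=lambda pair: pair[0], reverse=True)
    top_actives = sum(1 for _, active in ranked[:n_top] if active)

    hit_rate_top = top_actives / n_top
    hit_rate_all = n_actives / n_total
    return hit_rate_top / hit_rate_all


# Toy example (made-up data): 10 molecules, 2 actives, evaluated at the top 20%.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]
toy_labels = [True, False, True, False, False, False, False, False, False, False]
toy_ef = enrichment_factor(toy_scores, toy_labels, fraction=0.2)
print(f"EF at 20%: {toy_ef:.2f}")
```

An EF of 1 corresponds to random ranking, and larger values mean that actives are concentrated near the top of the list, which is what matters when only a small fraction of a billion-scale library can be tested experimentally.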
Originally presented at Drug Discovery Chemistry 2025 (Barcelona).