The application of AI in drug discovery relies on high-quality, reproducible training datasets. Traditional screening campaigns focus on identifying potent hits, but ML-driven drug discovery requires comprehensive potency evaluation across entire compound libraries. Here, we introduce a partial concentration-response curve (pCRC) approach that estimates potency using just two data points per compound. We onboarded a panel of 65 diverse kinases and screened 7,000 compounds against the panel at ATP concentrations near Kₘ to minimize modality bias, achieving a mean robust Z′ of 0.74 across all targets. A direct comparison of 100 fragments tested in both 2-point pCRC and conventional 11-point CRC formats demonstrated excellent correlation, confirming that our pCRC methodology produces high-quality data suitable for ML model training. Integrating our automation platform, including SPT Labtech’s dragonfly® discovery, with automated data pipelines enabled the generation of 221,000 high-quality, ML-ready data points per day, accelerating the training of foundation models for drug discovery.
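The two quantities mentioned above, a robust Z′ and a potency estimate from two concentrations, can be illustrated with a minimal sketch. This is not the authors' pipeline: the abstract does not describe how the pCRC fit is performed, so the example assumes a Hill equation with a fixed slope of 1 and fixed 0% and 100% inhibition asymptotes, and it uses the common median/MAD form of robust Z′. The function names `robust_z_prime` and `two_point_pic50` are hypothetical.

```python
import math
from statistics import median


def robust_z_prime(pos, neg):
    """Robust Z' from control wells, using medians and scaled MAD.

    Common definition: 1 - 3 * (sd_pos + sd_neg) / |median_pos - median_neg|,
    where each sd is 1.4826 * MAD (a robust stand-in for the standard deviation).
    """
    def mad_sd(values):
        m = median(values)
        return 1.4826 * median([abs(v - m) for v in values])

    separation = abs(median(pos) - median(neg))
    return 1.0 - 3.0 * (mad_sd(pos) + mad_sd(neg)) / separation


def two_point_pic50(points, hill=1.0):
    """Estimate pIC50 from (molar concentration, % inhibition) pairs.

    Assumption (not from the source): inhibition follows
    100 / (1 + (IC50 / c) ** hill), with asymptotes fixed at 0 and 100.
    Each informative point is inverted for IC50 and the results are
    averaged in log space. Points at or beyond the asymptotes carry no
    potency information under this model and are skipped.
    """
    estimates = []
    for conc, inh in points:
        if 0.0 < inh < 100.0:
            ic50 = conc * ((100.0 - inh) / inh) ** (1.0 / hill)
            estimates.append(-math.log10(ic50))
    if not estimates:
        return None  # potency lies outside the tested concentration window
    return sum(estimates) / len(estimates)
```

For example, `two_point_pic50([(1e-6, 50.0), (1e-5, 100 / 1.1)])` describes a compound whose two readings are both consistent with an IC50 of 1 µM. When both readings sit at 0% or 100%, the function returns `None`, which under these assumptions would correspond to a potency that can only be reported as a bound.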