Webinar
Generating Bespoke Datasets for Foundation Models in AI Drug Discovery

About this webinar

The application of AI in drug discovery relies on high-quality, reproducible training datasets. Traditional screening campaigns focus on identifying potent hits, but ML-driven drug discovery requires comprehensive potency evaluation across entire compound libraries. Here, we introduce a partial concentration-response curve (pCRC) approach that estimates potency using just two data points per compound. We onboarded a panel of 65 diverse kinases and screened 7,000 compounds against the panel at ATP concentrations near Kₘ to minimize modality bias, achieving a mean robust Z′ of 0.74 across all targets. A direct comparison of 100 fragments tested in both 2-point pCRC and conventional 11-point CRC formats demonstrated excellent correlation, confirming that our pCRC methodology produces high-quality data suitable for ML model training. Integrating our automation platform, including SPT Labtech’s dragonfly® discovery, with automated data pipelines enabled the generation of 221,000 high-quality, ML-ready data points per day, accelerating the training of foundation models for drug discovery.
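
To make the two quantities in the abstract concrete, here is a minimal Python sketch. It is not the presenters' pipeline, which the abstract does not describe. The robust Z′ function uses the common median/MAD form of the Z′ statistic. The two-point potency estimate is purely an assumption for illustration: it supposes a Hill curve with slope 1, a top of 100% and a bottom of 0% inhibition, and averages the pIC50 implied by each usable point. The function names, the 5% and 95% cut-offs, and the control layout are all hypothetical.

```python
import math
from statistics import median


def mad(values):
    """Median absolute deviation, scaled by 1.4826 so it is comparable
    to a standard deviation for normally distributed data."""
    m = median(values)
    return 1.4826 * median(abs(v - m) for v in values)


def robust_z_prime(high_controls, low_controls):
    """Robust Z' = 1 - 3 * (MAD_high + MAD_low) / |median_high - median_low|."""
    spread = 3 * (mad(high_controls) + mad(low_controls))
    window = abs(median(high_controls) - median(low_controls))
    return 1 - spread / window


def two_point_pic50(conc_molar, pct_inhibition, floor=5.0, ceiling=95.0):
    """Estimate pIC50 from a few (concentration, % inhibition) pairs.

    Assumption (not from the webinar): inhibition follows
    inh = 100 * c / (c + IC50), so each point implies
    IC50 = c * (100 - inh) / inh. Points outside [floor, ceiling]
    are skipped because they carry little potency information.
    Returns None when no point is usable.
    """
    estimates = []
    for c, inh in zip(conc_molar, pct_inhibition):
        if floor <= inh <= ceiling:
            ic50 = c * (100 - inh) / inh
            estimates.append(-math.log10(ic50))
    if not estimates:
        return None
    return sum(estimates) / len(estimates)


if __name__ == "__main__":
    # Hypothetical plate controls, in % inhibition.
    high = [100, 102, 98, 101, 99]
    low = [0, 2, -2, 1, -1]
    print("robust Z':", round(robust_z_prime(high, low), 3))

    # Hypothetical compound tested at 1 uM and 10 uM.
    print("pIC50:", two_point_pic50([1e-6, 1e-5], [50.0, 90.9]))
```

Under this simplification, a compound that is nearly inactive or nearly fully inhibited at both concentrations yields no estimate, which is one reason a real two-point design has to choose its concentrations carefully.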

Key learning objectives:

  • Recognize the importance of robust data for AI model training in drug discovery.
  • Understand why automation is critical to the data quality and throughput needed for AI/ML model training in drug discovery.
  • Learn about the partial concentration-response curve (pCRC) approach and its benefits.
  • Explore how the automation platform speeds up the training of foundation models for drug discovery.
