Creating effective and stable drug formulations shouldn’t be guesswork. It’s deeply rooted in science. Yet, many formulators still rely on labor-intensive experimental trial-and-error approaches to probe solubilities and mitigate crystallization risks. Worse, without a deep understanding of why a formulation works, scientists can’t predict how changes might affect its bioavailability, shelf life, or manufacturability.
So, how do you move from uncertainty to true understanding?
amofor’s answer lies in physical simulations. With just five solubility points, you can map your molecule’s unique “intermolecular interaction landscape” – a fingerprint that reveals how it bonds, repels, or aligns with solvents, polymers, and excipients. It guides you to craft a clear experimental roadmap.
In this blog post, we’ll explore how physical simulations turn minimal data into maximum insight and why this approach is a game-changer for formulation scientists.
From Physical Simulation to Molecular Fingerprints
Modern pharmaceutical R&D generates vast amounts of data: high-throughput experiments produce solubility data, crystallization data, stability data, and more. With the emergence of “big data” approaches, scientists now aim to systematically analyze all these data points to extract meaningful patterns for future formulation projects. But more data alone isn’t enough. Data must be of high integrity, physically meaningful, and rigorously validated. The challenge? Extracting meaningful insights without drowning in noise.
This is where physics-based models like PC-SAFT (Perturbed-Chain Statistical Associating Fluid Theory) offer a distinct advantage. Unlike many black-box approaches, PC-SAFT uses established thermodynamic principles to decode molecular interactions – hydrogen bonding, polarity, and van der Waals forces – into quantifiable parameters. It predicts how molecules will behave across temperatures, pressures, and solvent blends. This drastically reduces the data required to achieve robust predictions. Why does this matter? When a new API (Active Pharmaceutical Ingredient) enters this intermolecular interaction landscape, its solubility behavior depends on how it aligns with various solvents or excipients. This interplay influences every stage of drug development, from purification and crystallization to spray drying, tablet compression, and final stability.
With a robust in silico model in place, the next hurdle is practical: translating this complexity into actionable parameters. That’s where the five solubility points approach comes into play.
The 5-Point Solubility Method
At the heart of this approach is a simple yet powerful idea: solubilities in organic solvents encode all the physicochemical information needed to understand an API’s intermolecular interactions.
Measuring five solubility points enables you to:
- Capture the API’s unique “fingerprint”
- Accurately predict its behavior across countless conditions
To build this model, the amofor team assembled an extensive dataset of solvent solubilities from public sources and internal measurements on generic compounds. Each data point was carefully scrutinized for physical meaning so that artefacts would not distort the model (e.g., data affected by polymorphism or solvate formation were excluded). The result is a universal database of over 2,500 solubility points across 150 APIs (small molecules) and 60 solvents – an invaluable resource for training the PC-SAFT model. The power of this PC-SAFT method lies in the large-scale datasets it relies upon. By using solubility data from a wide range of solvents, we created a “network” of interaction points that can be used to derive the individual molecular forces at play. This network approach allows for the extraction of accurate interaction strengths, even for substances with limited experimental data.
The process begins with known solubility data for a series of known compounds in various solvents. These solubility values are then used to generate PC-SAFT parameters that account for molecular forces such as van der Waals forces, hydrogen bonding, and dipole interactions. In this way, the strength of each individual force can be quantified directly for every new substance based on the data network. As a result, predictions of solubility and molecular behavior are significantly more reliable.
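To see why a single solubility point carries information about molecular interactions, it helps to look at the standard solid-liquid equilibrium relation, in which the measured mole-fraction solubility x, the activity coefficient γ, and the ideal solubility are linked by x · γ = x_ideal. The sketch below is a minimal illustration of that relation only, not amofor's implementation: it uses the simplified form that neglects the heat-capacity term, and it backs γ out of a measurement rather than predicting it (predicting γ is the part a model like PC-SAFT would supply). The function names and all numeric values are hypothetical placeholders, not data for any real API.

```python
import math

R = 8.314462618  # universal gas constant in J/(mol K)


def ideal_solubility(t_kelvin, t_melt_kelvin, dh_fus_j_per_mol):
    """Ideal mole-fraction solubility of a crystalline solid.

    Simplified solid-liquid equilibrium relation; the heat-capacity
    difference between solid and melt is neglected (an assumption).
    """
    exponent = -dh_fus_j_per_mol / R * (1.0 / t_kelvin - 1.0 / t_melt_kelvin)
    return math.exp(exponent)


def activity_coefficient(x_measured, t_kelvin, t_melt_kelvin, dh_fus_j_per_mol):
    """Back out the activity coefficient from one measured solubility.

    gamma > 1 means the solvent dissolves less API than an ideal solvent
    would; gamma < 1 means favorable API-solvent interactions.
    """
    x_ideal = ideal_solubility(t_kelvin, t_melt_kelvin, dh_fus_j_per_mol)
    return x_ideal / x_measured


if __name__ == "__main__":
    # Hypothetical melting properties and measurements, for illustration only.
    t_melt, dh_fus, temperature = 430.0, 30000.0, 298.15
    measured = {"solvent A": 0.020, "solvent B": 0.002, "solvent C": 0.060}
    for solvent, x in measured.items():
        gamma = activity_coefficient(x, temperature, t_melt, dh_fus)
        print(f"{solvent}: x = {x:.3f}, gamma = {gamma:.2f}")
```

Each measured point therefore yields one activity coefficient, and it is this set of solvent-specific deviations from ideality that a thermodynamic model can be fitted to.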
As an example, the following diagram illustrates the parameter fit of celecoxib, a typical model compound with poor water solubility and bioavailability, and how it aligns with the other data from our database.
This lets us immediately quantify and judge the quality of the PC-SAFT parameters against the training data and check whether the model quality is sufficient for the subsequent predictions.
Why five points? Each solubility measurement anchors a specific aspect of the API’s interaction landscape – polarity, hydrogen bonding, van der Waals forces, or mixed-solvent synergies. Together, these points provide a comprehensive picture, enabling the model to predict solubilities with an average relative deviation of just 23–24%, far outperforming other approaches.
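For readers who want to apply the same yardstick to their own predictions: this post does not spell out how the deviation is computed, so the sketch below assumes the common definition, the mean of |predicted − measured| / measured expressed in percent. The function name and the numbers are illustrative assumptions.

```python
def average_relative_deviation(measured, predicted):
    """Mean of |predicted - measured| / measured, in percent.

    Assumes both sequences list solubilities for the same conditions,
    in the same order and units, and that every measured value is > 0.
    """
    if len(measured) != len(predicted) or not measured:
        raise ValueError("need two non-empty sequences of equal length")
    total = sum(abs(p - m) / m for m, p in zip(measured, predicted))
    return 100.0 * total / len(measured)


if __name__ == "__main__":
    # Hypothetical mole-fraction solubilities, for illustration only.
    measured = [0.010, 0.050, 0.002]
    predicted = [0.012, 0.045, 0.0025]
    print(f"ARD = {average_relative_deviation(measured, predicted):.1f} %")
```

Because solubilities span orders of magnitude, a relative measure like this weights a poorly soluble system as heavily as a highly soluble one, which an absolute error would not.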
“We’re cutting time and cost by focusing on the science behind the data”
Christian Lübbert, CEO of amofor
Why This Method Surpasses Traditional Structure-Based Models
When compared to models that rely on the chemical structure of a molecule, our data-driven PC-SAFT models offer several significant advantages. Traditional structure-based models attempt to predict molecular behavior by breaking down the chemical structure into functional groups, which are then assigned typical interaction strengths. While this approach can work for simpler molecules, it often encounters significant issues when dealing with larger, more complex molecules.
In complex molecules, the contributions from individual functional groups can deviate significantly from structure-related predictions due to steric hindrance or unusual electronic effects. For example, in large molecules with bulky groups, the predicted interactions based on group contributions may no longer accurately reflect the actual behavior of the molecule in solution. These steric and electronic effects can cause unexpected shifts in molecular behavior, making structure-based predictions unreliable for the larger molecules currently being developed in the industry.
On the other hand, data-driven PC-SAFT models account for these complexities by analyzing the behavior of molecules across a broad range of solvents and conditions. The interactions are derived directly from experimental solubility data, ensuring that factors like steric hindrance and electronic effects are implicitly incorporated into the model. As a result, this model is better equipped to handle complex molecules where traditional approaches fall short.
Practical Applications for Formulation Scientists
1. Solvent Screening in Days, Not Weeks
Need the best solvent or solvent mix for a manufacturing process? Measure five solubility points in diverse solvents. PC-SAFT calculates API parameters (size, shape, polarity, hydrogen-bonding capabilities) and predicts solubility in untested solvents or mixtures.
2. Data Integrity and Outlier Detection
An often-overlooked advantage of amofor’s solubility prediction software is the ability to detect questionable experimental data. If a measured solubility point lies far outside the predictions generated by well-validated model parameters, it can flag experimental errors (e.g., inaccurate measurement or mislabeling) or unexpected behavior (e.g., different crystal forms). Thus, even as you fit these five data points, you’re also cross-checking the validity of your experiments. This built-in quality control helps scientists pinpoint flawed data before it disrupts the entire development process.
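The cross-check described above can be pictured with a few lines of code. This is a minimal sketch under stated assumptions, not the logic of amofor's software: it compares measured and predicted solubilities on a logarithmic scale and flags any point that differs by more than a chosen factor. The function name, the default factor of 3, and the example values are all hypothetical.

```python
import math


def flag_outliers(measured, predicted, factor=3.0):
    """Return the solvents whose measured solubility differs from the
    model prediction by more than `factor` in either direction.

    `measured` and `predicted` map solvent name -> solubility (> 0).
    The comparison is done on a log scale so that over- and
    under-prediction by the same factor are treated alike.
    """
    log_limit = math.log(factor)
    flagged = []
    for solvent, x_meas in measured.items():
        x_pred = predicted[solvent]
        if abs(math.log(x_meas / x_pred)) > log_limit:
            flagged.append(solvent)
    return flagged


if __name__ == "__main__":
    # Hypothetical mole-fraction solubilities, for illustration only.
    measured = {"ethanol": 0.010, "acetone": 0.050, "heptane": 0.002}
    predicted = {"ethanol": 0.012, "acetone": 0.045, "heptane": 0.00002}
    print(flag_outliers(measured, predicted))
```

A flagged point is a prompt for a closer look rather than a verdict: it may indicate a measurement or labeling error, or a real effect such as a different crystal form.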
3. A Tailored Solution for Every Company
With our SOLCALC software, companies can unlock their internal data by combining it with the PC-SAFT framework to build a proprietary, company-specific model that evolves with their pipeline.
Implementation Steps:
- Import experimental solubility data: Even large datasets can be handled efficiently by SOLCALC.
- Train your model: The software calibrates parameters to your proprietary APIs and workflows.
- Predict with precision: Screen solvents, optimize blends, or troubleshoot stability for your molecules.
Large pharma companies with extensive internal databases find this especially valuable to reduce the need for additional experimentation and transform their data from a static archive into a dynamic, predictive asset.
An invitation to connect with us
Combining targeted experimentation with rigorous physical modeling marks a transformative shift in drug formulation practices. Many pieces discuss the value of big data or machine learning in drug formulation, but few propose such a streamlined path to achieving accurate predictions. This blend of conciseness (fewer experiments) and robustness (physics-based modeling) not only distinguishes this approach but also underscores its potential to reshape current drug formulation approaches.
More questions about our forecast model and its applications?
Contact our experts and schedule a free demo to see how your formulation team can adopt this method!