Could ignoring octanol/water LogP correlation impact the overall effectiveness of pharmaceutical discovery?

Computational drug discovery began with Hansch, Leo, and Fujita [1] reporting that multiple regression applied to octanol/water log P and other parameters from physical organic chemistry yielded correlations with biological activities. Drug discovery chemists quickly confirmed these observations (usually by means of graph paper, for until around 1980 any computer within large pharma research would belong to the statistics department and communicate via punch cards and line printer). It turned out that such QSARs based on octanol/water log P existed within currently interesting biological data sets more often than not, and occasionally could be successfully extrapolated [2].

Based on these experiences, when 3D-QSAR was originally commercialized, it was intended that the steric and electrostatic fields would be accompanied by log P and molar refractivity (MR) columns. It was generally understood that ligand action depended on transport to the (then entirely conceptual) receptor as well as appropriately activating the receptor, and that log P somehow modeled ligand transport behavior. Furthermore, one of the earliest uses of PCA in computational drug discovery [3] showed clearly that the differences among “orientation-averaged” ligand properties presumably responsible for differences in ligand transport were 95% summarized by two abstract parameters, and much more conveniently (with little information loss) by octanol/water log P and MR. So the intent was that the fields would model the obviously orientation-sensitive aspects of ligand-receptor binding and that log P and MR would handle transport, thus together addressing the most important sources of variation within an arbitrary set of biological activity data.

In summary, and very much to my surprise, adding the CLOGP/CMR columns had negligible influence on these already successful topomer CoMFA correlations. The average q^2 rose only from 0.547 to 0.579. Even more interesting, the CLOGP/CMR columns by themselves had little correlative power, with fewer than half of the fifteen data sets yielding a q^2 above 0.2. The highest CLOGP/CMR q^2 of 0.422 was very likely caused by a high correlation between the CLOGP/CMR columns and the fields, because the associated regression coefficients had unusually large negative values that reverted to the expected slightly positive values when both fields and CLOGP/CMR were used. (More details are given in a recently submitted manuscript.)
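For readers less familiar with the statistic, q^2 is the cross-validated counterpart of r^2, conventionally computed as 1 - PRESS/SS, where PRESS sums the squared errors of predictions made for compounds left out of the fit and SS is the sum of squared deviations of the observed activities from their mean. The sketch below illustrates only that definition. It assumes leave-one-out cross-validation and, for brevity, substitutes a one-descriptor least-squares line for the PLS regression that topomer CoMFA actually uses. The function names and the data values are hypothetical, not taken from the fifteen data sets discussed here.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def loo_q2(xs, ys):
    """Leave-one-out q^2 = 1 - PRESS / SS for a one-descriptor model.

    Assumption: SS is taken about the mean of all observed activities,
    which is one common convention.
    """
    n = len(ys)
    mean_y = sum(ys) / n
    ss = sum((y - mean_y) ** 2 for y in ys)
    press = 0.0
    for i in range(n):
        # Refit without compound i, then predict compound i.
        a, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (a + b * xs[i])) ** 2
    return 1.0 - press / ss


if __name__ == "__main__":
    # Hypothetical log P values and activities, for illustration only.
    logp = [0.5, 1.2, 1.9, 2.4, 3.1, 3.8]
    activity = [5.1, 5.6, 5.9, 6.4, 6.6, 7.2]
    print(round(loo_q2(logp, activity), 3))
```

A q^2 near 1 means left-out compounds are predicted almost as well as fitted ones. A q^2 near or below zero means the model predicts no better than the mean activity, which is the sense in which a q^2 below 0.2 indicates little correlative power.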

So it seems that, unlike the biological data sets of twenty years ago, today’s biological data sets include very little dependence on orientation-averaged ligand properties, as summarized by CLOGP/CMR. This transition seems quite natural once the changing nature of biological assays is recalled. When I joined SK&F in 1971, almost all biological assays except for some antimicrobial screening were performed on whole animals or organs. Today it is not uncommon to delay any animal testing of a compound until pre-clinical activities are being contemplated. There are many benefits from thus emphasizing in vitro over in vivo investigations, in particular a much lower cost per unit test and at least the promise of a much more precise understanding of underlying therapeutic mechanisms yielding better clinical therapies.

But of course the overall productivity of drug discovery research has been disappointing. There were around 40,000 compounds in SK&F’s collection in 1971, after a vigorous effort to solicit appropriate screening samples from academic chemists (sample sizes of less than a gram being of little value with the assays then in use). However, those 40,000 compounds included many therapeutically significant SK&F NCEs, such as chlorpromazine, Dyazide, several cephalosporins, the earliest H2-blockers, and amphetamine (having ID# 1). Today there are perhaps two million “drug-like” compounds among vendors’ catalogs, but the number of NCEs, summed across several decades and the entire industry, is orders of magnitude less than a proportional (2 million / 40 thousand) increase would imply. Not good.

There are surely many reasons why productivity has so diminished. However, it seems significant to me that today’s biological data sets, resulting entirely from in vitro tests, are showing so little influence from drug transport factors. Compounds which are very potent and so promising in vitro (especially with a little DMSO to hasten (well, enable) dissolution) must be drawing attention away from less receptor-potent and -selective candidates that would be better transported. But drug transport surely matters therapeutically. So, fellow CADD specialists, let us not ignore log P and MR during the incessant candidate evaluation and selection as we seek more and better therapies!

Dick Cramer

1) C. Hansch, A. Leo, T. Fujita. J. Am. Chem. Soc. 1964, 86, 1616-1626.

2) Cramer, R.D., Snader, K.M., Willis, C.R., Chakrin, L.W., Thomas, J., and Sutton, B.M. Antiallergic Pyranenamines. J. Med. Chem. 1979, 22, 714-725.

3) Cramer, R.D. J. Am. Chem. Soc. 1980, 102, 1837-1849.

4) Cramer, R.D. J. Med. Chem. 2003, 46, 374-389.

One Response to “Could ignoring octanol/water LogP correlation impact the overall effectiveness of pharmaceutical discovery?”

  1. sbowlus Says:

    Hi, Dick,

    LogP (generally) makes a small contribution to CoMFA-like models, because that is all it should do. LogP is a bulk property, and has little to do with binding. Certainly one of my assumptions when building a 3D model is that binding affinity is the limiting factor in the overall QSAR. As in classical kinetic experiments, we can only model the slow (limiting) step. Adding a “transport” term to a binding-limited QSAR model just muddies the water. In most systems, the transition from “binding-limited” regimes to “transport-limited” regimes is not smoothly continuous, and the contributions of various descriptors will change as we go across the descriptor space. For a simple linear QSAR, this means, at the least, that the function will be biphasic.

    Note also that in classical Hansch-like QSAR, logP appears primarily in the form of the pi constant, which (although it has global ramifications) is treated primarily as a local effect. Coming from an ag background, where we had lots of in vivo data to model, making these distinctions was crucial, and I spent lots of time with my chemists puzzling over how to do this. One method that was very nice was Phil Magee’s “parameter focusing” approach to define the “interesting” parts of descriptor space.

    The apparent lack of productivity of modeling approaches to drug discovery is largely a failure of expectations, which can be traced more to organizational and political arguments than to technological failure. When I made the transition from research to software sales support (late ’90s), I was struck that most of the concerns raised by both chemists and managers were the same issues raised when I moved from the bench to the terminal in the early ’80s.
