Daphne Koller on Drug Discovery and AI

This transcript is generated with the help of AI and is lightly edited for clarity.

DAPHNE KOLLER:

15 years ago, you had breast cancer. Now you no longer have breast cancer because it’s not one disease. You might have a BRCA positive breast cancer, or you might have a HER2+ breast cancer, or a triple-negative. And each of those has a different therapeutic intervention that has much greater efficacy because it ties back to some underlying core biology — which is different in different people. And you can only do that with a lot of very granular data. And the only way to interpret that granular data is with the tools of AI because the human mind just simply cannot encompass that complexity.

REID:

Hi, I’m Reid Hoffman.

ARIA:

And I’m Aria Finger.

REID:

We want to know what happens if, in the future, everything breaks humanity’s way. What we can possibly get right if we leverage technology like AI and our collective effort effectively.

ARIA:

We’re speaking with technologists, ambitious builders, and deep thinkers across many fields—AI, geopolitics, media, healthcare, education, and more.

REID:

These conversations showcase another kind of guest. Whether it’s Inflection’s Pi or OpenAI’s GPT-4 or other AI tools, each episode we use AI to enhance and advance our discussion.

ARIA:

In each episode, we seek out the brightest version of the future and learn what it’ll take to get there.

REID:

This is Possible.

ARIA:

Today we want to hone in on the role of AI, machine learning, and the future of drug discovery.

REID:

The role that AI can play in drug discovery will radically shift how we research, test, and deliver new medicines. We are already seeing progress in how machine learning can improve drug discovery. Pharmaceutical companies are teaming up with AI firms or creating in-house AI technology to improve the research, speed, and accuracy of drug development. But it is people who are fluent in both biology and AI who are really making this happen. Our guest today fits that bill. She’s at the center of the Venn diagram of machine learning and biomedicine.

ARIA:

Daphne Koller is a renowned academic, computer scientist, and entrepreneur. After serving 18 years at Stanford University as the Rajeev Motwani Professor of computer science, she pivoted to the start-up space. In 2012, she co-founded and served as co-CEO of Coursera. These days, she is focusing on her newest venture, insitro, a machine learning-driven drug discovery company. Koller’s accomplishments have earned her numerous accolades, including the MacArthur Foundation “Genius Grant,” and memberships in the National Academy of Engineering and the American Academy of Arts and Sciences.

REID:

We loved getting into the weeds with her around large language models, machine learning, and applying that to biomedicine for a healthier future. Here’s our conversation with Daphne Koller.

REID:

So you witnessed a few cycles of AI and machine learning. What did the conversations about AI look like back in your earliest memory and engagement? And were there any prophecies that have actually been fulfilled in this most recent cycle?

DAPHNE KOLLER:

You know, I’ve been in this field for a long time and I’ve lived through several AI winters and actually graduated in the course of an AI winter. When I graduated with a PhD in a field that wasn’t really AI at the time, but is now all of AI — which is the field of machine learning — you weren’t allowed to say you were doing AI. It was kind of like fringe. It was what my teenage daughter says: sus. You know, you could say we’re doing cognitive computing or statistical learning theory or some weird thing like that. But AI was completely fringe. And that was because in the 1980s there was this big boom and everyone was like, “AI is going to solve everything. In five years, we’re going to have intelligent agents.” This is 1985, mind you. So obviously it didn’t come true in five years. And so that has made me a little bit wary of making predictions. I tend to like to under promise and over deliver. I’ve come around to this point of view — which I think is due to Bill Gates, although there’s many authors who claim credit for this — which is that people tend to overestimate the value of any technology in the two year timeframe and underestimate it in the 10 year timeframe. And I think it’s important to keep that in mind as we make hyperbolic promises.

ARIA:

So Daphne, my oldest son calls me sus constantly, so you and I have that in common [laughs]. But so you know, you were talking about AI and how it’s gone through sort of AI winters and then hype cycles. But of course today we’re not only talking about AI, we’re talking about drug discovery too. And I would love for you to orient our listeners and us. For those who don’t know, what was the typical drug discovery and development process like for the last, you know, four to five years and is it markedly different today?

DAPHNE KOLLER:

So maybe we should start by orienting all of the listeners to what drug discovery and development is like — what are the main steps in the process? And then we can talk about places where there have been improvements and places where there haven’t. So the process begins with a biological insight — a therapeutic hypothesis. It looks something like: if we modulate this target in the context of this patient population, then the following benefits might accrue to the patient. So that is the beginning — that’s the first third, if you will. And then that moves into, well, okay, so we want to then modulate this target. How do we do that? What is the right chemical matter, the right drug, the right substance to actually effect that intervention? And that gives you a compound — a molecule — which you put into a patient. And then the third part is now we do clinical development where you try out the drug in patients and hope that it actually achieves the thing that we set out to achieve.

DAPHNE KOLLER:

The further you get into this process, the more expensive it gets, with the clinical development being by far the most expensive piece and also the part where most of the failures come up — to the point that, depending on which estimate you believe, of the molecules that enter that last stage of clinical development, the share that emerge with a regulatory success is five to 10%. And people often attribute that to, well, that last part is where failures happen, but that’s not true. Where failures really happen is at the beginning of the process, because most of the things that fail in clinical development fail because our therapeutic hypothesis was just wrong. We don’t understand the biology; we go after things that just have no meaningful impact on the disease. And sure enough, it manifests in that last stage — which is when we do the experiment in humans — but the failure is attributable to things that happened a lot earlier in the process. So I think that’s just an important thing to understand: that is the phenomenon driving the ridiculous costs that we have in the drug discovery and development process, where a fully loaded program costs, I think at this point, north of $2.6 billion. The reason isn’t that that end-to-end process costs $2.6 billion, it’s that when you attribute to a program that’s successful the cost of all of the ones that failed — that is what actually creates a ridiculous price point.
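
EDITOR’S NOTE:

The attribution described above can be illustrated with a little arithmetic. This is only a rough sketch: the per-program cost below is a hypothetical placeholder chosen for illustration, not a figure from the conversation, and the function name is ours. The only inputs taken from the discussion are the five to 10% success rates. If every program that enters clinical development costs about the same and failures are charged to the successes, the spend attributed to each approved drug is roughly the per-program cost divided by the success rate.

```python
def cost_per_approval(cost_per_program: float, success_rate: float) -> float:
    """Spend attributed to each approved drug when failures are charged to successes."""
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    # On average, 1 / success_rate programs are run for every approval.
    return cost_per_program / success_rate


# Hypothetical placeholder, NOT a figure from the conversation.
HYPOTHETICAL_COST_PER_PROGRAM = 200e6

# The two success rates quoted in the discussion (five to 10%).
for rate in (0.05, 0.10):
    total = cost_per_approval(HYPOTHETICAL_COST_PER_PROGRAM, rate)
    print(f"success rate {rate:.0%}: ${total / 1e9:.1f}B attributed per approved drug")
```

Under these assumptions, halving the failure-adjusted denominator doubles the attributed cost, which is why improving the therapeutic hypothesis at the start of the process matters so much for the final price point.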

ARIA:

Daphne, thank you. That context is really helpful and wild. I mean, when you think about how common it is to find out when you’re millions, even billions of dollars into this process that a particular therapeutic hypothesis is simply wrong. I mean, wow. So moving on. Can you tell us about the impact of AI today and how it’s changing our understanding of biology as well as the drug development process?

DAPHNE KOLLER:

So first of all, one of the things that changed is our ability to generate chemical matter. That middle piece is drastically changing. There is an increase in scalable technologies to screen large numbers of molecules against a target. To vet them relative to different assays. And also increasingly so — although early stages — to use AI to design molecules with a particular purpose in mind. I think that is still at early stages. We can’t say that there has been a drug that has benefited from that technology and has been approved, but I think there is an increasing belief that AI technologies with the right data set fed into them can really accelerate that middle piece of the process — taking a target and turning it into a drug. The success of AlphaFold has certainly contributed to that goal.

Pi:

Hi, I’m Pi. Allow me to offer some context on that subject. AlphaFold, the AI software developed by DeepMind and Google, has garnered widespread acclaim for its unparalleled accuracy in predicting protein structures based solely on amino acid sequences. The breakthrough marks a pivotal moment in computational biology.

DAPHNE KOLLER:

I mean, AlphaFold was a real tour-de-force in AI as applied to biology in general, and drug discovery in particular, because it took a problem that was recognized by everybody as being really hard. And where many people had beaten their head against that particular wall with very limited success and all of a sudden came an AI technology that really transformed that landscape. If you can understand the structure of a protein, you can start to think about how different molecules might attach to that protein and create a chemical matter that achieves the right effect. And I think that’s really causing a major shift in that landscape. But I would say, only at the beginnings of that journey.

ARIA:

And how about other areas of the drug discovery process that are rife with potential to leverage AI?

DAPHNE KOLLER:

Where I think there’s been considerably less progress is on the development of new therapeutic hypotheses that leverage AI or, in general, technologies to really uncover novel biological insights that provide us with something that will truly be a disease modifying intervention. And that’s been hard to push forward because, while there are technologies that allow us to measure biology at increasing scale and increasing fidelity, the question of how one interrogates those data, extracts meaningful insights from them, and pushes those forward into drug discovery programs has been developing more slowly than something that has a much clearer, better defined outcome, which is, “I have a target, I want to do something that modulates it.” That’s fairly well defined. “I want to come up with a novel insight in biology.” What does that even mean? And how do you know when you’ve succeeded? And I think that’s been one of the challenges in getting people to take on something that is a much harder and, if you will, squishier problem.

REID:

So where do you see the principal supercharging that’s coming from ML? Like where do you see the accelerations? And which accelerations, you know, society should be counting on and which do you think are kind of fictional in current discourse?

DAPHNE KOLLER:

So in terms of things that I think are very likely to work and we’re already on the path, I would say the taking of a target and turning it into chemical matter — that is already on the path. We’ve seen tremendous successes in the protein space. The language of proteins is very similar to natural language. And so the same AI methods work pretty well there, but maybe equally or even more importantly, the amount of data that one can collect in protein space is much, much greater than what we have in almost any other application. The biggest, I think, opportunity, however, lies in the space of the uncovering of new therapeutic hypotheses. And this is where we’re benefiting from an incredible opportunity, which is that, parallel to the transformation that we’ve seen in AI, we’ve seen a transformation in the collection of biological data across multiple modalities.

DAPHNE KOLLER:

When I started working in this field 25 years ago, experiments were incredibly hard and you did them all in yeast or in drosophila or in worms. So their translatability to humans was modest to non-existent, and the experiments were excruciatingly long and painful. In the last 10 years, there has been an explosion in the development of new tools for basically biological data creation. And that involves — that includes things like the ability to generate cells of any lineage in our body, whether it’s a neuron or a cardiomyocyte or a hepatocyte or a macrophage. We can take a cell from you, Reid, or from me and generate a cell of that type that carries our genetics and hence our propensity to disease. And we can start to observe how different genetics manifest in different cellular characteristics. We can edit those cells using CRISPR and create an even further propensity to disease or maybe a lower propensity to disease.

DAPHNE KOLLER:

And again, judge what that translates into at the cellular level. We can measure cells in ways that we would never have anticipated. We can look at a single cell and assess its activity level across every gene in the genome. Increasingly we can image it at subcellular resolution and see exactly where proteins localize in the cell, how they’re moving in real time. All of these are ways of looking at biology that we would never have envisioned as possible as recently as 10 years ago. So that creates the opportunity to measure biology at scale and that is then input into these sophisticated AI methods that need all this data to feed on, to then interrogate what interventions in biology do to the underlying biological system, which of them cause disease and which of them revert disease. And I think that is an incredible opportunity that is only now being unlocked.

DAPHNE KOLLER:

Now, we need to recognize that that opportunity is going to take time before it manifests in therapeutic interventions, simply because the timelines for going from an insight to a drug are only compressible up to a point. Because ultimately — sure, you can generate hypotheses faster, you can even generate drugs faster, but you can’t make human biology move faster than it does. So when you put the drug into a human, if it takes five years for cognitive decline to manifest in the current ways in which we read it, you can’t abbreviate that beyond a certain point. So you need to be very thoughtful, very deliberate in how you engage in some of those experiments in humans, which are an absolutely necessary part of the drug development process.

DAPHNE KOLLER:

So now coming to the last part of your question of, you know, what is science fiction? There’s two types of predictions that I think I would really be very cautious about making, which is, there are predictions about, “Oh, we’re going to have a hundred approved drugs in three years.” That just — the field just doesn’t work that way. There’s timelines to how long things take because they should take that long because of the need to be really careful and thoughtful before you do experiments in people. The other one that I would say is a little bit hyperbolic is this notion of a fully end-to-end AI discovered drug. I don’t even know what that means. I tell the computer that I’m working on ALS and out comes what? A molecule with a clinical development plan without any human intervention in the middle? I just don’t even understand how you would even define that. Everything therefore becomes a continuum between how much the computer did and how much the human did, which I think is as it should be. And you will always have a person helping to prioritize, helping to guide, helping to shape, maybe not always, but for a long time. That will be a critical part of how we discover and develop medicines.

REID:

That is an awesome answer. And before Aria goes into any follow, I want to do at least one quick follow-up — which is, you know, part of what you kind of gestured at is a combination of, or what we’re seeing with, you know, large language models and the prediction of tokens and all the rest is that there’s a language parallel. And that part of what we see within the protein structure in biology is certain language parallels which allows, you know, kind of development there. One of the other things that you and I have both seen in the general LLM world is this whole discussion of synthetic data and other ways of generating language data. What do you see about the possibilities of that within the biology space? Because obviously if we could generate interesting synthetic data across a variety of fronts that would accelerate our use of AI even in the therapeutic hypothesis.

DAPHNE KOLLER:

So that’s a really great question, Reid. I would point out that this notion of synthetic data for the purpose of training AI models is something that a lot of people are suspicious about and rightly so, because if you’re generating something entirely from nothing and then using that to train the model that generated it, you’re really just reinforcing what the model already knew. And I think there’s a lot of people who worry about that. Where I think there is an opportunity both in language as well as in biology, is in taking data points where you have some measurements and enhancing them. So you’re not doing something entirely from hallucination but are taking data that you have and creating from it data that you don’t have. An example of that that we’ve used in our own work in the biology space has to do again with multimodal data and taking measurements that were collected, say from cells or from human individuals across one data modality. And then using AI to generate additional data modalities that were not measured — maybe couldn’t even be measured simultaneously with the other data or at least not at scale — and using that for training.

DAPHNE KOLLER:

So I’ll give a number of examples where we’ve used that to make this concrete. So one of the most abundant data modalities in clinical care is histopathology. When you have a suspected cancer, you go, you get a biopsy, that biopsy sample is stained, and then you put that under a microscope and then you get to see what’s in the sample — where are there tumor cells and immune cells, and so on and so forth. That is called an H&E slide and it is used not only in oncology, but in multiple other settings in which tissue is taken out of the body and pathologists look at it. We have been able to show that you can take one of those slides and read off of it entire transcriptional profiles with, you know, fairly reasonable accuracy, which means you can look at that sample and say, “Here in this sample are the genes that are active and the ones that are not active and at what levels.”

DAPHNE KOLLER:

So these are actually quantitative measurements. We can absolutely use that to train a machine learning model with which you can understand how different cancers express themselves. And you can do it in a very, very scalable way because you have these so-called H&E histopathology images for hundreds of thousands of patients, millions of patients perhaps. You don’t have these gene expression profiles for millions of patients, but you can use the imputed, the generated, the synthetic gene expression profiles that are generated from the histopathology to train the AI models. This is just one example of this sort of notion of generating data for the purpose of AI, but you’re generating in a way that is anchored to some kind of underlying reality as opposed to just generating data from thin air and then hoping that it captures the underlying ground truth.
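
EDITOR’S NOTE:

The idea of generating one modality from another that was actually measured can be sketched in a few lines. This is a toy illustration only, not a description of insitro’s models: the data are simulated, plain ridge regression stands in for whatever model would really be used, and every name and size below is a hypothetical choice. A small cohort with both slide-derived features and measured gene expression is used to fit the mapping, which then imputes expression for a much larger cohort that only has slides.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins: rows are patients, columns are slide-derived features or genes.
n_paired, n_unpaired, n_features, n_genes = 200, 1000, 16, 5
true_weights = rng.normal(size=(n_features, n_genes))


def simulate(n):
    """Simulate slide features and the expression they (noisily) determine."""
    features = rng.normal(size=(n, n_features))
    expression = features @ true_weights + 0.1 * rng.normal(size=(n, n_genes))
    return features, expression


# Small cohort with both modalities measured.
paired_features, paired_expression = simulate(n_paired)
# Large cohort with slides only; the true expression is held back for checking.
unpaired_features, hidden_expression = simulate(n_unpaired)


def fit_ridge(x, y, penalty=1.0):
    """Closed-form ridge regression: solve (X^T X + penalty * I) W = X^T Y."""
    gram = x.T @ x + penalty * np.eye(x.shape[1])
    return np.linalg.solve(gram, x.T @ y)


weights = fit_ridge(paired_features, paired_expression)

# Imputed expression is anchored to real (here: simulated) slide measurements.
imputed_expression = unpaired_features @ weights

# This check is only possible because the data are simulated.
correlation = np.corrcoef(imputed_expression.ravel(), hidden_expression.ravel())[0, 1]
print(f"correlation between imputed and held-back expression: {correlation:.3f}")
```

The imputed profiles could then serve as training data for a downstream model. The key property is that each synthetic profile is tied to a measured input for a real patient rather than sampled from nothing.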

REID:

Yeah, makes total sense and agree. Where do you think the most interesting AI implications will come on the therapeutic hypothesis, like the ability to kind of say this might be something that could be interestingly effective and we should look at this a little bit more?

DAPHNE KOLLER:

I love that question. I would say that perhaps the biggest impact would be from frankly a complete redefinition of our taxonomy of disease. Right now we define disease using very coarse-grained symptomatology, some of which dates back over a century. And you say someone has Alzheimer’s disease. That Alzheimer’s disease diagnosis is based on the fact that they’re acting demented and sometimes there’s some molecular measurements associated with that. But as of this point, not broadly used as part of clinical care. It is almost certainly the case that Alzheimer’s disease is not one disease. There’s multiple biologies that give rise to the disease that are converging on ultimately the, you know, death of neurons as a consequence. But the actual biology that drives that is often different in different people. As long as we define disease in this way, we are going to have to resort to a lowest common denominator in many cases as the way of intervening in that disease.

DAPHNE KOLLER:

And I’ll give an example from a space where that transformation is already well underway, which is oncology. 15 years ago you had breast cancer, now you no longer have breast cancer because it’s not one disease. You might have a BRCA positive breast cancer, you might have a HER2+ breast cancer, or a triple-negative. And each of those has a different therapeutic intervention that has much greater efficacy because it ties back to some underlying core biology, which is different in different people. That transformation is what moved us away from the lowest common denominator, which is chemotherapy, to something that is much more effective, even if targeted at a smaller patient population. We need to do that for Alzheimer’s disease, we need to do that for diabetes, for cardiovascular disease. And the more precise we get, the more targeted and high efficacy will be our therapeutic interventions. And you can only do that with a lot of very granular data. And the only way to interpret that granular data is with the tools of AI, because the human mind just simply cannot encompass that complexity.

ARIA:

So I’m sure actually a lot of people listening who don’t have personal experience with breast cancer might not even know that, even for that disease, there’s actually a myriad of things going on and that we’re not treating it the same way. And I feel like the promise in so much of this has been personalized medicine. Like are we going to be, you know, creating drugs not for groups of people, but for individual people. And you talked about sort of this explosion over the last 10 years, and so, how close are we to narrowing that gap? Is that still science fiction and years and years away, or is that something that AI can help us with?

DAPHNE KOLLER:

So I think it’s important to distinguish personalized medicine from precision medicine. Precision medicine identifies underlying biologies that are much more well-defined than the coarse-grained symptomatology, and devises therapeutic interventions for those. And arguably the first of those is actually a breast cancer drug called Herceptin, which targets HER2+ breast cancer. Now, a drug for HER2+ breast cancer is not a personalized medicine. A lot of women have HER2+ breast cancer. But what we now understand is that that is a category and we can give that drug to an individual because of our ability to diagnose the fact that this is the subtype of breast cancer that they have. So I think that is a much more feasible and reachable goal in other therapeutic areas as well, because you can narrow in on specific biologies and you can create drugs for a group of patients.

DAPHNE KOLLER:

I wouldn’t call it personalized, which often means N=1 medicine, because I think that’s still a journey. There’s only been one or two N=1 medicines and they’ve been incredibly expensive, incredibly complex, and have only been able to intervene in situations where there is a very well-defined one gene, genetic lesion, that has driven the disease that you’re looking to basically revert that one thing. Most diseases are not that — they are the combination of a complex configuration of factors that interact with each other. We’re not going to — I think anytime soon — create N=1 drugs for that type of complex disease. Where I think personalization does come in and I think that’s where the field is going, is in combinations. And having on an individualized basis saying, “Of the slate that I have of 15 cancer drugs or you know, over time maybe 15 Alzheimer’s drugs, which are the three that I will give to you, which might be a different three that I would give to somebody else?”

DAPHNE KOLLER:

And that I think is something that is feasible by an appropriate design of therapeutics where the mechanism is understood and the drug is engineered to that mechanism, combined with an appropriate understanding of the biology that is driving a disease in a particular individual, so that I can understand that it is this particular combination that’s going to help this patient. And so all of that requires a disentangling of the complexity of human biology and human disease, which is where I think honestly the biggest opportunity in drug discovery and development comes — is in that disentanglement of human biology.

REID:

So one of the things that you’ve obviously already gestured at, but that I’m curious about, is — when you get to combining potentially these three therapeutic approaches in a kind of personalized way for this particular person and their particular challenge, which is tracked more by underlying genetics and physiology, you know, what’s going on with them, versus the pathology of the exhibited symptoms, you know, Alzheimer’s, et cetera — how is it that we’re going to close the data feedback loop for that? Is that something we’re doing in ML? Is that something we’re still doing by kind of classic human intervention, some combination of the two? What does the loop look like there?

DAPHNE KOLLER:

I think that’s a great question and has a lot to do with how we transform our healthcare delivery system because right now, a woefully limited amount of data is collected from individual patients. And what is even worse, the data that are collected from individual patients are often not recorded in their full form. There’s often just like an aggregate, typically subjective summary by a clinician that is recorded in someone’s healthcare record. And so we don’t even have the data based on which we can make that determination of what is the right therapeutic intervention for a person. So we really need to transform the way in which we collect data from patients and the way in which we store that information and make it accessible to sophisticated algorithmic methods. On the data collection side, I think that technology just keeps getting better and better. MRIs are now getting cheaper and more mobile.

DAPHNE KOLLER:

There’s portable MRIs that are now being generated. The quality’s getting better, it’s getting increasingly easy to collect high quality data from patients. It’s just that the healthcare system hasn’t appreciated the value of these data and hence integrated those instruments into clinical care. So I think that is a transformation that will have to happen. And ironically, I would say I worry that it’s actually more likely to happen in countries outside the U.S. As long as we’re in a world where you get paid on a per procedure basis, there’s just so little incentive for a care delivery system to try and really be thoughtful about collecting enough data so we can give the exact right treatment to a patient, because there’s nothing in your incentives that makes that the desirable outcome. I mean, a lot of people in the system are really well meaning and they want to do the right thing by patients, but as long as you can’t reimburse a diagnostic procedure that’s going to best identify the right therapeutic intervention, no one’s going to do it because no one’s going to pay for it. And so I think that actually what’s more likely is that these data-driven approaches will first be implemented in countries where there is an agreement of incentives between delivery of care and payment for that delivery — where there is a better kind of alignment of what is best for patients and what is best for the system, which is a sad statement.

ARIA:

I mean, that makes me so depressed. [Laughs]. I’m just like, imagine if we could capture all of this data, like you were saying, this data can springboard us to so many advancements and—

DAPHNE KOLLER:

But no one will pay for it right now.

ARIA:

All right, we got to work on that. [Laughs]. So you’ve, you’ve named so many things that you’re excited about, whether it’s even just, you know, the improvement in the last 15 years in mammography. Are there certain advancements that you are sort of most excited about that might not have gotten enough public attention? Or like what are we going to be seeing that you’re so excited about that is going to come over the next few years?

DAPHNE KOLLER:

So I think that the collection of increasingly higher content and more quantitative data from human patients — which is becoming increasingly prevalent — and the alignment of that with underlying molecular causes of disease — whether it be genetics on the one hand or different types of omic modalities on the other — are going to allow us to really map those causal pathways between, “This is the pathogenesis of a disease as it starts with the genetics goes through molecular changes in the body and ends up in these much more organism level changes.” And, “You can’t fix what you don’t understand,” I think, is really the fundamental statement here, and you can’t understand what you don’t measure. So it starts from measuring, it moves through understanding, and then that’s going to give rise to fixing. And I think that is the transformation that we’re going to see in the next 15 to 20 years that’s going to give us a completely new family of therapeutic interventions that are effectively designed as opposed to by happenstance, happen to be efficacious.

REID:

So one of the things, you know, I’ve had the delight of talking to you over a number of years is — I know you’re a systems thinker, you know, obviously a rigorous thinker. So what do you think we need to do to change the economic incentives for this collective data collection to be sharing the on target stuff that will allow the whole field to advance? You know, I’m sure you’ve thought about, “Well, look, if I could wave a wand or if X would happen, this would be great.” And so that’s essentially the question I’m gesturing at.

DAPHNE KOLLER:

There’s many things I would do with a wand, which I think are not feasible. So I’m going to scope it down to something that I think is more feasible, which is I think even within the U.S., there are pockets of alignments of incentives where one could I think demonstrate the value of collecting data so as to inform patient care. And I think by doing that you could potentially actually demonstrate in a relatively short timeframe, relatively short being — short in, you know, what is the life cycle of human health — which is the ability, for example, to appropriately select the right therapeutic intervention from among the ones that exist towards a better patient outcome. And if you have enough alignment of incentives so that you could demonstrate that by doing so you abbreviate, you cut short what is often a multi-year patient journey to find the right drug and they become healthier, and as such need less healthcare down the line, you could demonstrate that in a matter of I think a few short years. And that will create the right momentum around increased collection of data around these patients that would then drive the longer term ability to interrogate those data and come up with an even better set of therapeutic interventions. So you go from something that creates immediate, relatively short term value to something that then drives innovation in the subsequent stages of development. That would be my hope.

ARIA:

It seems like this is also one of those places where a lot of people talk about privacy, especially with AI, and it’s one of the places where we need to do something for the common good that some people might not be comfortable with. I don’t know if you have thoughts on that.

DAPHNE KOLLER:

I absolutely do have thoughts on that. It’s something that I’ve thought about a lot. I think that if you ask anyone who’s actually been a patient of a grievous disease, whether they are interested in sharing their data in an appropriately anonymized and protected way in a research environment towards the common good of finding better therapeutics, you’ll find that you will have a very high success rate in terms of people saying, “Of course I’d love to be helpful,” and yes, you need to make sure that it’s done with the appropriate parameters around, you know, anonymity and so on and so forth. But most people would like to be helpful. Right now there is not even a way for someone to be asked that question or to provide a response to that question. Think about this, by contrast to the fact that when you go into the DMV today, you can mark on a little form that you’d like to be an organ donor and everyone has that opportunity.

DAPHNE KOLLER:

That is a much bigger decision than donating your data. You’re actually donating your organs. I have the ability to mark that, but I have no place to mark the fact that I’d like my data to be used for the advancement of science and clinical care. Why is that? Why is there not a centralized repository where data from patients who consent can be collected, appropriately curated and harmonized, and provided to the scientific community with appropriate safeguards for the development of new biological insights and new therapeutic interventions? You asked me about the magic wanding. There’s a magic wand right there. A magic wand would be someone investing what is not a very large amount in a centralized healthcare data repository — which by the way, other countries have, not all of them, but a lot of them do. And we have nothing remotely like that in the U.S. That’s really a travesty.

REID:

I a thousand percent agree.

ARIA:

So Daphne, in every episode we invite AI into the conversation and we ask GPT-4o about machine learning empowered drug discovery and treatment in 2034. And then we asked GPT-4o to illustrate these transformations via a fictional story of a researcher in Cincinnati. And you’ve read this story and it’s in the show notes for listeners who want to follow along. And so again, I did not create this, so you can hate it or love it. It won’t reflect on me. What were your reflections on the story? What would you change? Like what seemed promising?

DAPHNE KOLLER:

So I think the story is actually not bad at all. I think it captures some important components including the ability to identify drugs for specific subtypes of disease — like in the subtype of genetic epilepsy. I thought that was compelling. I liked the ability to also generate new therapeutics by design, if you will, in the antibiotic resistance piece. Where I think there was a bit of a gap was in the lack of connectivity between those two, which is I think coming back to the identification of novel therapeutic hypotheses based on an understanding of this sort of fine parsing of disease mechanisms. So I think sort of saying here is an understanding of some biological process that underlies the disease, and now we’re going to close the loop and design a drug towards that goal — that wasn’t present in the story. And I think that’s where most of the opportunities are going to lie, simply because — much as I love drug repurposing — it is rare that a repurposed drug is really the ideal solution to a new disease. And so I think you oftentimes do need to make new medicines as you have a better understanding of the underlying biology. So that being said, I mean, they did a pretty good job.

ARIA:

Awesome, thank you.

REID:

One of the fun things that I love about doing podcasts — even with people I know like you, Daphne — is that I actually learned some things in the prep that I didn’t know before. Like I had known that you’ve done serial entrepreneurship and various things that are, you know, kind of impactful at scale to society and humanity. What I didn’t know is — it might have started with a kind of pivotal conversation with your postdoc advisor, Stuart Russell, shifting from conceptual work to building the things that matter at scale in the world. If that’s true, tell us about that conversation.

DAPHNE KOLLER:

Yeah, so I’m glad you’re bringing that up, Reid. And it was a fairly pivotal conversation, and I owe Stuart a great debt of gratitude for asking me that question. So this was a conversation that I had relatively early in my postdoc. I think I’d been there for a few weeks. And you know, my thesis — which you know, ironically in some sense was an award-winning thesis — was incredibly conceptual and theoretical, and there was a lot of algebraic topology in there and some really cool theorems and whatever. But part of that conversation that I had with Stuart was he took me to lunch and said, “So you got this award for your thesis. If I gave you like a team of really talented undergraduates here at Berkeley, what would you have them take from your thesis and implement?” And I think I literally sat there with my jaw hanging open, because no one had actually asked me that question before.

DAPHNE KOLLER:

And if I had to, you know, be honest about this, the answer was none of it. I think it was really beautiful conceptually, but if I were really going to solve that problem, I probably wouldn’t do this. And so that I think was a turning point for me in my thinking of, “Okay, what would it actually take to take the ideas that I had and use them to make an impact on the world?” And that drove me to really embark on the path of machine learning, which I thought was more applied. And then applied machine learning in the service of robotics and computer vision, which at the time, even then — I mean today’s different — but back then wasn’t all that useful to people. And I mean, it was more applied. You could actually write code and do something, but it wasn’t going to be used to help, you know, blind people see or anything like that anytime soon back in the, you know, mid ’90s or even early 2000s.

DAPHNE KOLLER:

And that led me to biology and medicine as something that potentially could have an even greater impact. And I would say that the thing that led me to make that final transition was actually in around the 2009, 2010 era — the final transition being away from academia into industry — where one of my PhD students and I wrote a paper — one of the earliest machine learning applications to histopathology, which we discussed earlier — and we recognized even at the time and were able to show numerically that a computer was much better able to make diagnostic predictions on patients’ outcomes from these images that were collected from these biopsy samples than a human clinician was able to do — except if they were like the best in the field at some of the best cancer care centers in the world, but your average clinician was considerably outperformed by an AI algorithm.

DAPHNE KOLLER:

And this is even back in the 2009, 2010 era. And we went around and talked to a whole bunch of companies and said, “Look, here’s this thing,” like companies that make diagnostic devices, “Here’s this thing, do you not want to use it to make outcomes for patients better?” And they were like, “So what’s your product that you’re selling us?” And I was like, “I’m not trying to sell you a product, but we have this technology.” And they said, “Come back when you have a product.” And so we realized that you can’t really do tech transfer with a paper, you can do tech transfer with a product. The PhD student that worked on this actually went to Harvard as a faculty member and continued the journey of trying to make an impact within academia and similarly was unable to do that. And so he ultimately ended up founding a company to do exactly that, histopath for diagnosis.

DAPHNE KOLLER:

I ended up going to Coursera because I felt like if I was going to make an impact, I needed to actually build a product. And so when we launched the first Stanford massive open online courses, I realized that if I wasn’t going to take that forward, it just wasn’t going to happen because you need to actually build something in order for people to use it. So that to me was kind of like that final step in the journey of: if I was going to make an impact, I needed to actually get my hands dirty and do it myself.

ARIA:

I mean, I love that story. [Laughs] So I’m sure you get this too — and Reid, I’m sure you get this as well. Like, people come up to me all the time and they’re like, “Hey, why are you so excited about AI? You talk about it all the time. Like, my life hasn’t changed. Like, I don’t see any goodness coming. Like, what is the excitement?” And so my question for you is, if members of the general population, companies, even the scientific community, if they’re reluctant to embrace AI, like what do you say to them? Where is your optimism centered that you’re like, “Well, not a huge amount has changed, but it’s about to,” or, “You don’t know so much has changed.” Like, where does that optimism come from?

DAPHNE KOLLER:

So I think at this point a lot has changed and oftentimes it’s just underneath the hood. AI is underneath the hood and people don’t even realize that it’s there in, you know, everything from optical character recognition — I’m talking, like, going back five or seven years, even before this latest generation of, you know, LLMs and so on. Google Translate is something that existed, you know, several years ago, pre-LLMs. Now of course it’s much, much better. But I think people just didn’t realize it was there. It wasn’t as blatantly obvious as all of the new ChatGPT and other LLM models. I think now people realize that it’s there and at this point the transformation is so incredibly rapid that I think it’s just even hard to keep track of. And I think it’s just that we’ve gotten used to the magic. The fact that now translation is entirely seamless and highly, highly accurate.

DAPHNE KOLLER:

I mean, people are like taking it for granted, but that’s AI underneath the hood. The fact that we’re able to do searches in our photo collection based on a verbal description of what is in the photo — again, completely inconceivable 10 years ago. And now it’s just like, of course we can do that. I think it’s honestly, there’s been this weird irony in AI that it’s always a moving target — that whenever AI achieves success at a certain thing, it becomes not AI because obviously computers can do it, so it’s not AI. So I think it’s more about the matter of people’s mindset than about the actual reality of the technology. And I will bet you that when a combination of AI and the human invent or discover a new medicine, people will also say it’s not AI. [Laughs] One other thing that I think makes me optimistic is the realization that in AI we’ve really been living on an exponential curve for probably the last few decades.

DAPHNE KOLLER:

And the thing about exponential curves is that they’re deceptive things. When you’re at the early stages of the curve and people typically look at last year versus this year and they interpolate as a linear line, the progress that you anticipate you’ll make in 10 years looks pathetically meager. And so that’s where the field has been for a lot of the last years. And then as the exponential curve begins to move up, even linear interpolation is like, “Oh my God, we’re going to be in an incredible place in the next few years.” But even then, people don’t realize that this is truly an exponential curve. So I think oftentimes people ask me, you know, “Where do you think the world will be in five years?” And my answer is, “I have no idea.” Because when you have an exponential curve, slight differences in how quickly that exponent is moving give rise to dramatically different outcomes in terms of when you’re going to achieve certain milestones. And so, but what I do know or I firmly believe, is that we are on this exponential curve. So where we’ll be in five years is dramatically different than where we are today. And the capabilities that we’ll end up with at that point are probably way beyond our ability to conceive of that amount of change in that short amount of time.

REID:

I cannot more strongly agree. One of the questions that our conversation brings to mind that I think is actually extremely important is, what should be the developmental dance in AI between academia and businesses? Because the traditional academic mindset is pretty off on this stuff because they don’t get the scale compute, they don’t understand the use of scale data, they’re generally not using scale teams. And yet they go, “Well, we train these really bright people, we do these broad hypotheses.” Our normal thing has been the science and invention leading that goes into industry, but now it’s a very tangled web and you have played on both sides. And part of the most urgent question is not to say, “Look, it’s changing.” The question is: What should we help it change to, such that it’s better for society and humanity, and how should academia play and how should industry play? What’s your thinking about that, in particular when it comes to AI, but it also could obviously be the whole medical therapeutic drug discovery and development industry as well?

DAPHNE KOLLER:

You know, it’s an interesting time to be an academic in AI, because you’re absolutely right. They have access to a lot fewer resources than industry does. Compute — even more than compute, data. The ability to actually create a team of people working jointly towards a problem, which is really in some ways an antithesis to a lot of academic research where it’s like the PI and a student maybe with some support from other people. Whereas what you need is much more of a team effort, which is much easier to muster in an industry environment. All of those make it challenging to do the kind of scientific advancements in academia that you see in industry. And sure enough, when you look at where the big progress in AI has come up in the last few years — I mean the transformer model didn’t originate in academia, it originated in industry, right?

DAPHNE KOLLER:

Certainly there’s still room for academia. There’s room in maybe some problems that are sort of at the frontier that don’t have immediate practical ramifications, but will. I’ll give you an example in the work that we do — I think there’s an enormous role that causality has to play when you’re talking about making interventions in the world. And pattern recognition that basically generates associational correlations is not going to get you there. You need to be able to make predictions about if I intervene in the world, what would happen? And I think there is an important role for foundational work there that maybe does belong in academia. So I think that’s one role. I also think that there is a really important role in training our next generation of researchers in ways that are oftentimes different than the kind of mindset that industry provides.

DAPHNE KOLLER:

And I will tell you that in looking at a lot of the younger generation of researchers to come out in the last few years, a lot of them are — they’ve lost the ability to think deeply about problems because the problems have become easier to solve. So when I grew up — and now I sound like one of those old people, “Oh, in my day,” you know, but even so — when AI was really hard and data was really hard to come by, you really needed to eke out the insights from the data that you had and you needed to think deeply about how to model things and how to extract the max amount of value. When data is abundant and AI models are the kind of thing you can download from GitHub, and you just toss your data in and out comes something that is, in most cases, pretty plausible, people just do that. And the other muscle, of thinking really deep and hard about hard problems, often atrophies a little bit. And I think a model in which academia really teaches people to think about problems that are truly hard, that aren’t solvable by tossing data into an existing model where, you know, something sensible typically comes out, I think that frontier is something that is really within the purview of academia and I hope that more effort is focused on that.

ARIA:

Daphne, you have managed to build a company that is at the centerpiece of two different disciplines and that requires two different cultures coming together. Like how do you do that? How do you build that company?

DAPHNE KOLLER:

That is probably one of the hardest and yet most important things that we’ve had to do, because you take even incredibly smart, well-meaning collaborative people from these two different disciplines. You put them in the room together, they will end up talking entirely past each other. Not only because the jargon is different and the concepts are different, but because their way of thinking, the way in which an engineer approaches the world and a discovery scientist approaches the world, are completely different ways of thinking about what’s interesting and what matters the most. And so that has been an incredibly important build for us and it involves setting up the right set of values to begin with. So we have values that speak to, for example, we create together, not in silos. We engage with each other openly, constructively and with respect — because oftentimes you find that people with different cultures come in with a great amount of hubris about what their discipline offers and diminishing the contributions of a discipline that they’re less familiar with.

DAPHNE KOLLER:

And we really basically set up a culture where that is just a complete no-no and people come in with tremendous humility and respect for what everybody can offer and bring to the table. And I think we’ve done that probably better than anywhere else that I’ve seen in terms of creating a culture where people really do come together. And not only come up with better solutions, but come up with problems that neither of them would’ve come up with on their own. And when I look at the future, given how quickly the technology is shifting, both in AI as well as in the life sciences, our competitive advantage as an organization isn’t going to be on any single technology that we have today because that technology will likely be obsolete in five years. It is in creating the right culture where people are able to create something as a synthesis of these two disciplines. That is the competitive advantage that will I think make us unique and differentiated in the coming years.

REID:

Rapid-fire questions. Is there a movie, song, or book that fills you with optimism for the future?

DAPHNE KOLLER:

You know, I think optimism is challenging these days given the overall state of the world, but one that I like is a book called Not the End of the World: How We Can Be the First Generation to Build a Sustainable Planet. I think climate change is something that scares me a lot, I think scares a lot of us a lot. And this book presented I think a somewhat more optimistic perspective, and importantly one that’s grounded in data, which I really appreciate.

ARIA:

And is there a question — it could be personal or professional — that you wish people would ask you more often?

DAPHNE KOLLER:

I actually wish people would ask me fewer questions, because I get asked a lot of questions. And I like to hear what people — other people think because I learn from that.

REID:

Yes. All right, so where do you see progress or momentum outside of your industry that inspires you?

DAPHNE KOLLER:

I think the intersection of the life sciences and AI broadly construed — even outside of healthcare — has a tremendous amount of opportunity because I think we finally have the ability, by combining the ability to interrogate biology at unprecedented fidelity and scale, interrogate what we see using computational tools and then engineer biology to do something that it normally otherwise wouldn’t do. We have the opportunity to transform the environment, agriculture and many other things as well.

ARIA:

So Daphne, can you leave us with a final thought on what you think is possible to achieve if everything breaks humanity’s way in the next 15 years? And what’s our first step to get there?

DAPHNE KOLLER:

We’re making huge advancements in science across multiple frontiers in biology, in AI, in physics with fusion. And I think we have the opportunity to take all of those advancements and completely reshape the world to be a much better place and address many of the issues that currently plague us, such as, hopefully, global warming and human health and many other things. The pessimist in me says that we can’t get out of our own way. And so it’s not about having things break our way, it’s about our ability to take those incredibly lucky breaks and do the right thing with them.

REID:

All right, Daphne. Awesome, as expected. And an intellectual tour-de-force.

ARIA:

Daphne, thank you so much.

DAPHNE KOLLER:

Thank you.

REID:

Possible is produced by Wonder Media Network. It’s hosted by Aria Finger and me, Reid Hoffman. Our showrunner is Shaun Young. Possible is produced by Katie Sanders, Edie Allard, Sara Schleede, Adrien Behn, and Paloma Moreno Jiménez. Jenny Kaplan is our executive producer and editor.

ARIA:

Special thanks to Surya Yalamanchili, Saida Sapieva, Ian Alas, Greg Beato, Ben Relles, and Parth Patil. And a big thanks to Karrie Huang, Steve Meadows, Gwynne Oosterbaan, and Little Monster Media Company.