On AI and genetic risk factors: a new scientific horizon
Imagine the following situation: you have a friend, 42 years old, who sends you a message saying that she would like to talk with you about something. It turns out that her mother died of breast cancer in her 60s and her elder sister of the same cause at age 47. She has just had a genetic screening and found that she is heterozygous for a mutation in the BRCA1 gene. This gene is known to have variants that strongly promote breast cancer, and one such variant was probably the cause of her mother’s and sister’s deaths. The risk factor (probability) of her getting breast cancer in the next 10 years is “only” 25-30% (still much greater than for most women in her age group) but rises to over 50% for carriers in their 70s. (See “Risk factors”: an introduction.) She is considering a double mastectomy as a preventive measure.
She has already discussed the pros and cons with her partner, who reluctantly agreed to the operation, but she wants to discuss it with you as a good friend who knows her well. She is not asking for advice per se, let alone for you to make the decision for her, but she wants your perspective. What are you likely to say? And how will you reach your view? There are clearly multiple factors to consider, but above all there is the uncertainty inherent in the risk assessment if she does nothing. After all, a 50% chance of developing breast cancer in her 70s means that 50% of the women in her situation will get through that decade without coming down with it. And a mastectomy would be irreversible. Should she or shouldn’t she do this?
This situation is obviously a difficult one and we will come back to it in a moment. First, though, let us take a brief detour to look at the biology, in particular the puzzling fact that mutant genes do not always show their effects in the people who carry them. There is a genetic term for this situation: penetrance. The per cent penetrance of a genetic variant is the percentage of the people carrying it who show the mutant effect. If someone does not show the effect, however, one might ask: how does one know that they have the variant in their genome? The answer comes either from their children, some of whom show the effect even though the other parent did not carry the variant, or, these days, from DNA sequencing, which shows that the variant is present in the genome even though it did not give the trait.
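The definition of per cent penetrance can be written out as a short calculation. What follows is only an illustrative sketch: the function name and the cohort numbers are hypothetical and are not taken from any study.

```python
def percent_penetrance(affected_carriers: int, total_carriers: int) -> float:
    """Percentage of the carriers of a variant who show the associated trait."""
    if total_carriers <= 0:
        raise ValueError("total_carriers must be positive")
    if not 0 <= affected_carriers <= total_carriers:
        raise ValueError("affected_carriers must lie between 0 and total_carriers")
    return 100.0 * affected_carriers / total_carriers


# Hypothetical cohort (made-up numbers): 200 carriers, 60 of whom show the trait.
example = percent_penetrance(60, 200)
print(f"{example:.0f}% penetrance")
```

Note that only carriers enter the calculation; people without the variant are not part of the denominator.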
How does one explain lack of penetrance? In the early days of genetics, which really only began at the start of the 20th century, investigators paid little attention to it and may have seen it only rarely. Even earlier, Mendel did not discuss it in his few papers. But by the second decade of the 20th century, with more and more work in genetics, it was an inescapable fact. It had to mean that other factors could influence, indeed suppress, the development of the mutant trait. These factors might come from the environment (e.g. temperature, or a dietary element) or from the effects of other genetic variants in the genome of the animal or plant under investigation. Most often these additional genetic factors were in genes that had not been identified. In those cases, investigators would attribute the modifying effects to the “genetic background”.
In other cases, however, the genes responsible for the modifying effects had been identified. In some, it was even possible to understand exactly what was going on. Take for instance the basis of eye color in the fruit fly, an organism that had been recruited for genetics research in 1910. In this insect, the eye color is normally a dark red.
However, certain fruit fly mutants were known that had altered eye color. In one, the color was a bright red and the gene responsible was named “scarlet”, symbolized as “st”. Another mutant strain had brown eyes, and the altered gene was “brown” or “bwn”. Both were recessive and complete loss-of-function (l-o-f) mutants (see Should genes be seen as controllers or nudgers of biological development?). Hence the genetic constitution of scarlet-eyed flies was st/st and of brown-eyed flies, bwn/bwn. What happened if you made a fly mutant for both genes, so that it was st/st; bwn/bwn? It turned out that these flies had white eyes!
The explanation was soon apparent. Normal or “wild-type” flies have dark red eyes because they contain two pigments, one that is bright red, the other a dark brown; the brown pigment darkens the overall color. Underlying these two pigments are two distinct biochemical pathways, each consisting of a sequence of enzymes that leads to the production of one of the two pigments. Each enzyme in each pathway is specified by a particular gene. The wild-type st gene encodes an enzyme that helps make the brown pigment. The l-o-f mutation in that gene eliminates the brown pigment, producing eyes that have only the bright red pigment. In contrast, the wild-type bwn gene specifies an enzyme that makes the bright red pigment, and the l-o-f mutation in that gene eliminates that pigment, leading to a brown-eyed fly. If the fly cannot make either pigment, as in the double mutant, you get a white-eyed fly. It all makes sense as long as you remember that each gene is named for the eye color of its mutant, which is the color of the one pigment the mutant can still make, not the pigment the gene helps to produce.
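The logic of the two pigment pathways can be sketched as a small truth table in code. This is a deliberately simplified model that covers only what is described here: the function names and the "+/st" style of writing genotypes are illustrative assumptions, and the gene symbols follow the ones used in this text.

```python
def has_working_copy(genotype: str, mutant_symbol: str) -> bool:
    """A recessive l-o-f mutation abolishes function only if both copies are mutant."""
    alleles = genotype.split("/")
    return any(allele != mutant_symbol for allele in alleles)


def eye_color(st_genotype: str, bwn_genotype: str) -> str:
    """Predict eye color from the genotypes at the two genes (simplified model)."""
    # A working st gene is needed for the brown pigment.
    makes_brown_pigment = has_working_copy(st_genotype, "st")
    # A working bwn gene is needed for the bright red pigment.
    makes_red_pigment = has_working_copy(bwn_genotype, "bwn")

    if makes_brown_pigment and makes_red_pigment:
        return "dark red (wild type)"
    if makes_red_pigment:
        return "bright red (scarlet)"
    if makes_brown_pigment:
        return "brown"
    return "white"


# "+" stands for the wild-type allele.
for st, bwn in [("+/+", "+/+"), ("st/st", "+/+"), ("+/+", "bwn/bwn"), ("st/st", "bwn/bwn")]:
    print(f"{st}; {bwn} -> {eye_color(st, bwn)}")
```

The model treats each pathway as simply on or off, which is enough to reproduce the four eye colors discussed but ignores any intermediate pigment levels.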
This is a nice example of genetic variants having cumulative effects when they affect different pathways in development that nevertheless contribute to the same trait, in this case eye color. However, one can also have genetic variants in different genes that affect the same pathway. In this situation, the two effects can be synergistic, as when two relatively slight l-o-f mutations together produce a strong loss of function of a trait. This is often the case in various diseases where there are cumulative effects from genetic variants, each of slight effect. It is these situations that often determine whether the mutation in a major gene, such as BRCA1, has a big effect; the additive small effects increase the probability that the major gene’s effect becomes penetrant and produces the mutant trait.
The above was a rather large digression into basic genetics from where we started out, the hypothetical dilemma of your friend considering major surgery for breast cancer. Her dilemma has existed for thousands of women in the past two decades or so. One thing that has changed for the better, however, is increased genomic knowledge. Due to sophisticated techniques, we now know the identity of many genes in the “background genotype” whose l-o-f mutations almost certainly contribute to the progression of many diseases. For the great majority of these diseases, we do not yet know what the wild-type forms of those genes do or how l-o-f mutations in them contribute to the disease, but just knowing their identities is a step forward. Growing knowledge of what specifically many of these genes do is helping to assess how to tailor treatments for specific diseases, including breast cancer. [1]
Recently a new approach, focused on relating penetrance to the kinds of mutations that occur, promises to accelerate progress even further. It relies on a technique central to artificial intelligence (AI), namely machine learning (ML), which here means massive data crunching of clinical records, genomic findings on genetic risk factors, and the basic biology of those genes and their products. ML is a highly iterative process that involves the continual accretion of new data, to sort out what is significant from what is not, or at least less so. The approach is described in two papers, one a research report, the other a commentary on it, in a recent issue of Science. [2]
The research report describes the results for 10 diseases: adult hypophosphatasia, arrhythmogenic right ventricular cardiomyopathy, familial breast cancer, familial hypercholesterolemia, hypertrophic cardiomyopathy, long QT syndrome, Lynch syndrome, monogenic diabetes, polycystic kidney disease and von Willebrand disease. In this study, the health and genomic data of 1.347 million people were screened. Among 1648 rare variants in 31 disease-predisposing autosomal dominant genes, variants were classified as pathogenic (P), benign (B) or previously unknown l-o-f, while the remainder were classified as variants of unknown significance (VUS).
The most striking finding was that penetrance of a mutant allele tended to correlate with strength of the symptoms associated with that variant for that particular disease. This might be expected but this is the first large-scale study to support this idea. Another significant finding was that half of the pathogenic dominant variants were associated with loss of stability of cellular contacts in the affected tissue. This is a potentially significant insight into the nature of pathogenic conditions more generally.
These findings are real advances in understanding these ten conditions, which collectively account for a significant disease burden in human populations. Beyond the specific findings, however, the study constitutes progress in how to approach diseases with a complex genetic basis in general. Individuals diagnosed with one of these conditions and then faced with a difficult decision about treatment, which is often the case, need not be reduced to a coin flip or to asking their friends for intuitive advice. Being aware of the growing body of knowledge about the genetic risk factors involved in their condition, they can now at least ask their doctors for more specific information to better assess the odds and what they need to do. Such information won’t eliminate the uncertainties but it might considerably increase one’s sense of having an accurate view of the actual risks. In dealing with the mysteries of penetrance, this is real progress.
Ultimately, this progress is due to the wonderful growth in knowledge about the human genome that has accumulated over the last 25 years or so, plus the development and improvement of the techniques that are contributing to AI. In recent years, many fears about AI and its possibilities have been expressed, and much of this anxiety has a real foundation. However, in the worlds of basic science, for instance genomics and protein structures, and medicine, such as the study described here, AI has produced wonderful advances already.
[1] See Sarhangi, S. et al. (2022). Breast cancer in the era of precision medicine. Molecular Biology Reports 49: 10023-10037. doi.org/10.1007/s11033-022-07571-2
[2] For the one-page summary of the research report, and the link to the full article, see Forrest, I.A. et al. (2025). Machine learning-based penetrance of genetic variants. Science, 28 August, 2025, p. 894. For the commentary, see Raiken, H. and A. Stein (2025). Penetrance and variant consequences – Two sides of the same coin? Science, 28 August, 2025, pp. 880-881.



