Assen Jablensky - Symposium 2003


Phenotype-Genotype Relationships in Psychiatric Disorders

Assen Jablensky
School of Psychiatry & Clinical Neurosciences
University of Western Australia

My topic is phenotype-genotype relationships in psychiatric disorders, and I will use the term "phenotype" in a very broad sense. Height, weight, skin colour, cognitive test performance and MRI brain volumetry are all examples of phenotypes. They range from simple elementary entities that we can pinpoint and measure to more complex levels of phenomena, to which we attach diagnoses and disease labels. I will first mention some important developments emerging in the aftermath of the human genome project, now that the human genome is fully sequenced and annotated. I shall then briefly discuss the problem of phenotype-genotype linkages, also called phenogenetic relationships. I will try to persuade you that in psychiatric genetics the phenotype is likely to be the major "rate-limiting factor" in exploiting the novel developments stemming from the human genome project. Next, I will present data from our family studies of schizophrenia in Western Australia as a "proof of principle" that paying greater attention to detail in phenotype characterisation gains power for genetic studies. Finally, I shall refer to an initiative that has recently been floated in the literature: that after the human genome project, we need something like a global human phenome project.

Following the successful completion of the human genome project, the major question that arises concerns the function of the identified genes. These supposedly number between 30 and 40 thousand, although some still believe there are about 50,000 functional genes in the human genome. What do they do, and how do they relate to human diseases? Several hundred mendelian, relatively rare single-gene diseases have already been dissected genetically with great success and efficiency. We are now faced, however, with the worldwide prevalence of the so-called common or complex diseases, ranging from asthma, cancers, cardiovascular disease, osteoporosis and arthritis to schizophrenia and bipolar disorder. There are proposals for tackling their genetic complexity. One is to collect enormously large samples of people affected with those diseases, as well as healthy controls, and then systematically sequence all the possible genes, as in the human genome project, to find structural variants associated with morbidity. That would be a mammoth enterprise, never attempted on such a scale in any human endeavour. Another idea is first to identify all of the so-called single nucleotide polymorphisms (SNPs). These are very simple genetic markers, spaced almost evenly across the genome. Their individual variation is so high that a combination of SNPs can identify the genetic uniqueness of any individual on the planet. It has been estimated that about 500,000 such SNPs are required to provide the information needed to construct a so-called SNP haplotype map of the human species. Haplotypes are combinations or patterns formed by such SNPs, and the map would enable us to identify how such combinations relate to common diseases.
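The claim that a combination of SNPs can single out any individual rests on simple combinatorics. The Python sketch below illustrates this under idealised assumptions only. It assumes independent biallelic SNPs, each with three equally likely genotypes (AA, Aa, aa), and it ignores allele frequencies and linkage disequilibrium, both of which push the real number far higher. The function name and the round figure of ten billion people are hypothetical choices made for illustration.

```python
def min_snps_to_distinguish(n_people: int, genotypes_per_snp: int = 3) -> int:
    """Return the smallest number of SNPs whose genotype combinations are
    at least as numerous as n_people.

    Hypothetical, idealised assumption: SNPs are independent and every
    multi-SNP genotype combination is equally likely. Real SNPs violate both
    assumptions, so this is a lower bound, not a practical estimate.
    """
    n_snps = 0
    combinations = 1  # zero SNPs give exactly one (empty) combination
    while combinations < n_people:
        n_snps += 1
        combinations *= genotypes_per_snp
    return n_snps


if __name__ == "__main__":
    # Ten billion: a round figure comfortably above the world population.
    print(min_snps_to_distinguish(10_000_000_000))  # 21
```

In this ideal case a couple of dozen markers would be enough to tell people apart. The much larger figure quoted for a haplotype map reflects a different goal: capturing the correlation structure of common variation across the whole genome.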
There are also major technical developments, such as DNA microarrays. These allow the expression of something like 11 or 12 thousand genes to be measured in a single sweep, on a miniaturised component: a microchip smaller than a fingernail. One of the fascinating future applications of this technology is the capacity to combine, in a single study, conventional positional cloning and association with microarrays, so as to map genes as well as measure their expression. However, all these novel developments bordering on science fiction currently face a major problem. The amount of information generated, and that can potentially be generated, is orders of magnitude larger than anything our statistical models have handled so far. This problem may not be insurmountable, but it is creating difficulties, so the statistical treatment of complexity remains an extremely important challenge.

In talking about genotype-phenotype relationships, and the collateral problem of gene-environment interactions, one should realise that there is an implicit tendency, also referred to as the "modern dogma". This is the tendency to interpret the results of the human genome project in terms of a simplistic view that the causation of disease is ultimately gene-based. In the past we have had models such as "one gene - one disease". Now, with the common diseases, we tend to acknowledge that the situation is not as simple as one-to-one, but is probably something like "many genes - one disease". I think that this is very likely to be wrong. First, consider some basic points about phenotypes and genotypes that have recently been stated with persuasive clarity by Weiss and Buchanan (2003). Evolution works by screening the phenotype, not the genotype. The genetic history of the species, as well as the formation of any specific trait we are trying to analyse in our studies, therefore results either from evolutionary pressure on the phenotype or from the lack of such pressure. Secondly, there are multiple non-genetic or epigenetic influences and processes. After transcription, factors with stochastic properties, which therefore behave randomly, contribute significantly to phenotype variation. To mention just a few, these include:
- alternative splicing;
- incomplete folding of the proteins produced by the genes, which renders them dysfunctional (this happens in Alzheimer's disease and in other neurodegenerative diseases);
- variation in the concentrations of transcription factors in the cell, depending on a number of parameters that are difficult to control;
- varying amounts of gene products with inhibitory action on function.

Charging ahead, I should like now to refer to the results of a major epidemiological study conducted by the World Health Organization, in which I had the privilege to play a role (Jablensky et al., 1992). Known as the WHO Ten-country Study on Schizophrenia, it applied identical methodologies in geographically defined areas on four continents. The aim was to identify all incident cases of schizophrenia, assess their clinical characteristics and re-examine them at annual follow-ups. This enabled us to calculate the incidence of schizophrenia per 10,000 people aged 15-54 in quite diverse populations and cultures. The incidence rates did, of course, vary, but the band of variation was rather narrow, and the overall conclusion was that schizophrenia has a similar incidence in different populations. In fact, both incidence rates and symptoms were strikingly similar across countries such as Denmark, India, Nigeria, Russia and Japan. One could be tempted to say: "If such striking similarities exist, and we know that schizophrenia has a strong genetic component in its causation, then there are probably common genes causing schizophrenia across those populations." I was initially inclined to take such a point of view myself. I now think that this interpretation of the findings of the schizophrenia study, and of studies of other complex diseases, is likely to be wrong, for the following reasons. In complex diseases, the phenotype-genotype relationships are not just "many genotypes to one phenotype" but rather "many genotypes to many phenotypes". A given genotype can be associated with a range of phenotypes, and the same phenotype can be associated with many different genotypes.
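Below is a minimal sketch of the arithmetic behind an incidence rate expressed per 10,000 persons at risk. The catchment areas and counts are entirely hypothetical; they are not data from the WHO study. The function name is also an illustrative assumption. The sketch assumes a simple rate: new cases divided by the population at risk multiplied by the years of case finding.

```python
def incidence_per_10000(new_cases: int, population_at_risk: int, years: float = 1.0) -> float:
    """Annual incidence rate per 10,000 persons at risk (simple rate, no age adjustment)."""
    if population_at_risk <= 0 or years <= 0:
        raise ValueError("population_at_risk and years must be positive")
    return new_cases * 10_000 / (population_at_risk * years)


# Hypothetical catchment areas (NOT data from the WHO study):
# (new cases observed, population aged 15-54, years of case finding)
sites = {
    "site_A": (60, 200_000, 2.0),
    "site_B": (48, 150_000, 2.0),
    "site_C": (33, 120_000, 2.0),
}

rates = {name: incidence_per_10000(*counts) for name, counts in sites.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:.2f} per 10,000 per year")

# A narrow band of variation shows up as a highest/lowest ratio close to 1.
print(f"highest/lowest ratio: {max(rates.values()) / min(rates.values()):.2f}")
```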
Complex diseases illustrate well the phenomenon called phenogenetic drift. In the course of evolution, the association of a disease with a particular genotype becomes looser and looser if the disease tends to be non-fatal and to have a later onset. Diseases like dementias, diabetes, myocardial infarction and schizophrenia rarely, if ever, have their onset at birth. Schizophrenia typically manifests itself in late adolescence or early adulthood, while many other complex diseases have their onset in middle age. This means that the selection pressure to eliminate or alter the disease-predisposing genotype becomes weaker and weaker over evolutionary time. The result is the uncoupling of the specific relationship between a particular set of primordial genotypes and a given phenotype. Moreover, what tends to occur increasingly over time is so-called phenotype conservation or phenotypic convergence. Being only weakly exposed to selection pressure, a phenotype like schizophrenia (described by its symptoms) may remain relatively unchanged. It does so against a background of increasing genotype divergence associated with population migration. In the long run, schizophrenia would become associated with a number of different genotypes, some of them rare, some relatively common. This hypothesis may be supported by adequate evidence and withstand critical testing. If so, the genetic heterogeneity (both locus and allelic) of schizophrenia will indeed pose great challenges to our search for susceptibility genes.

At this point I should like to address a misconception revealed by the use of terms like "genes for schizophrenia". Weiss and Buchanan (2003) refer to this as the "Lamarckian illusion": naming genes as if the function through which they were discovered is their evolutionary reason for being. What we actually do in research into the genetics of diseases like schizophrenia is to screen the population, through our health services, by clinical phenotype. However, the clinical phenotypes we identify are very likely to represent just the tail end of the phenotype's true distribution, which extends beyond the clinic into the general population. At present we know very little about the nature of the "non-clinical" dimension of the phenotype that, in its severe form, manifests itself with the clinical features of schizophrenia. Is that dimension predominantly expressed as personality traits, as patterns of cognitive functioning, or as something else? What we see in our health services and in our studies is only the tip of the iceberg.

The "hopeful" scenario is that common genetic variants underlie the common diseases. However difficult it may be to prove, the alternative is also very likely. The accumulation of rare alleles will result in significant genetic heterogeneity, so that a disease may be common in the population precisely because multiple, different and rare genetic variants are involved. At present we do not have a critical test that could conclusively arbitrate between these two models of complex diseases. Schizophrenia is genetically complex for reasons that are well known:
- its inheritance does not follow mendelian rules;
- it is very likely associated with multiple genes, each of small effect;
- environmental factors probably contribute, although so far we have not been able to identify any specific one;
- significant gene-environment and gene-gene interactions are probably an important part of the picture.

The search for susceptibility genes in schizophrenia, as in other similar disorders, follows a classical scheme. First, we use genetic markers in a linkage analysis to find out which marker alleles travel together with the disease across generations within families, and where they map on the genome. Then we try to refine the linkage region by genotyping additional microsatellite markers (now also SNPs) within it. Following that, we conduct linkage disequilibrium analysis. This estimates the statistical association between polymorphisms within the region and the disease in a sample of cases, and compares it with a sample of controls without the disease. Hopefully, we then find biologically plausible candidate genes, using the rapidly accumulating information in annotated genomic databases. We then conduct further association studies until mutations or variants in those genes are found.
The next step, the most complex and difficult, is to understand their role: what they do and whether they are expressed in the relevant tissue. This requires functional studies, including animal models.
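The first step of this scheme, linkage analysis, summarises evidence as a LOD score. The LOD score is the base-10 logarithm of the likelihood of the data at a recombination fraction θ, relative to free recombination (θ = 0.5). The Python sketch below covers only the simplest textbook case: phase-known, fully informative meioses. Real family studies of schizophrenia rely on multipoint analyses with uncertain phase, penetrance models or liability classes, which this does not attempt. The function names are hypothetical.

```python
import math


def lod_score(recombinants: int, nonrecombinants: int, theta: float) -> float:
    """Two-point LOD for phase-known, fully informative meioses:
    LOD(theta) = log10[theta^R * (1 - theta)^NR / 0.5^(R + NR)].
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    if theta == 0.0 and recombinants > 0:
        return float("-inf")  # any recombinant rules out complete linkage
    log_alt = 0.0
    if recombinants:
        log_alt += recombinants * math.log10(theta)
    if nonrecombinants:
        log_alt += nonrecombinants * math.log10(1.0 - theta)
    log_null = (recombinants + nonrecombinants) * math.log10(0.5)
    return log_alt - log_null


def max_lod(recombinants: int, nonrecombinants: int, step: float = 0.01):
    """Maximise the LOD over a grid of theta values in [0, 0.5]; returns (lod, theta)."""
    n_steps = int(round(0.5 / step))
    grid = [min(i * step, 0.5) for i in range(n_steps + 1)]
    return max((lod_score(recombinants, nonrecombinants, t), t) for t in grid)


if __name__ == "__main__":
    # Hypothetical family data: 2 recombinants among 20 informative meioses.
    best_lod, best_theta = max_lod(2, 18)
    print(f"max LOD = {best_lod:.2f} at theta = {best_theta:.2f}")
```

The grid search is used for transparency. For this simple model the maximum-likelihood estimate is R / (R + NR) whenever that is below 0.5.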

The study designs include multiply affected families, as well as designs based on nuclear families. These are either affected sib pairs, in which two siblings share the same phenotype, or triads of two unaffected parents and an affected child. At present, close to thirty studies have carried out complete genome scans of families with schizophrenia, and tentative findings of linkage have been reported on more than half of the human chromosomes. The findings are suggestive, but hardly any has been definitively replicated, although in recent years at least some have begun to be replicated successfully. Case-control association studies using SNPs or SNP haplotypes in candidate genes face a particular difficulty in ruling out false positive results. This is due to the lack of strong prior hypotheses about the probability that a "true" association exists.
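The point about prior probabilities can be made concrete with a standard Bayesian argument. The Python sketch below runs a Pearson chi-square test on a 2×2 table of allele counts from a case-control study. It then asks how often a result with p < 0.05 reflects a real association, given an assumed prior probability that the tested variant is truly associated. All counts, the power of 0.8 and the priors are hypothetical, and the function names are illustrative.

```python
import math


def allelic_chi_square(case_a: int, case_b: int, ctrl_a: int, ctrl_b: int):
    """Pearson chi-square (1 df, no continuity correction) on a 2x2 table of
    allele counts; returns (statistic, p_value)."""
    n = case_a + case_b + ctrl_a + ctrl_b
    row_case, row_ctrl = case_a + case_b, ctrl_a + ctrl_b
    col_a, col_b = case_a + ctrl_a, case_b + ctrl_b
    if min(row_case, row_ctrl, col_a, col_b) == 0:
        raise ValueError("every row and column of the table needs a non-zero total")
    stat = n * (case_a * ctrl_b - case_b * ctrl_a) ** 2 / (row_case * row_ctrl * col_a * col_b)
    p_value = math.erfc(math.sqrt(stat / 2.0))  # upper tail of chi-square with 1 df
    return stat, p_value


def positive_predictive_value(prior: float, power: float, alpha: float) -> float:
    """Probability that a 'significant' association is real, given the prior."""
    true_pos = power * prior
    false_pos = alpha * (1.0 - prior)
    return true_pos / (true_pos + false_pos)


if __name__ == "__main__":
    # Hypothetical allele counts: 200 cases and 200 controls (400 alleles each).
    stat, p = allelic_chi_square(240, 160, 200, 200)
    print(f"chi-square = {stat:.2f}, p = {p:.4f}")
    # The same p < 0.05 means very different things depending on the prior.
    for prior in (0.5, 0.01, 0.001):
        ppv = positive_predictive_value(prior, power=0.8, alpha=0.05)
        print(f"prior {prior}: P(association is real | p < 0.05) = {ppv:.3f}")
```

With a weak prior, most nominally significant associations are expected to be false positives, even when the test itself is performed correctly.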

Another problem is that the majority of studies have so far used the clinical diagnosis of schizophrenia as the phenotype. The reliability of clinical diagnosis is now higher than it was before explicit diagnostic criteria were introduced in classifications such as ICD-10 and DSM-IV. Even so, it remains questionable whether such diagnoses are the best phenotype for genetic research. Increasing doubts and questions are emerging. To quote just one: "genes do not code for hallucinations and delusions or thought disorganisation per se…the biological effects of genes are likely to be more predictable in terms of the underlying abnormalities in brain function rather than in terms of highly variable and subtle subjective experiences of hallucinations or delusions" (Weinberger, 2002). What we need are phenotypes capturing structural abnormalities of the brain, particular brain dysfunctions and behavioural traits, rather than only the clinical diagnosis, which is probably too general. We can postulate that the more precisely or narrowly a phenotype is defined, the more likely it is to lie close to causal physiological pathways involving a limited number of genes. Our current diagnostic categories of schizophrenia, bipolar disorder, anxiety disorders and so on may actually represent conflations of several interacting narrow phenotypes. These would operate at a deeper level, below the surface of clinical presentation. How can we increase the genetic informativeness of phenotypes for psychiatric research? First, we can try to reduce the amount of diagnostic misclassification in our samples; some estimates put diagnostic error as high as 30%. Secondly, we may divide our sample into clinically more homogeneous groups, using approaches such as:
- candidate symptoms (e.g. primary or idiopathic negative symptoms);
- target features like anhedonia;
- cognitive dysmetria, a concept proposed by Andreasen (1999).

However, a more general pointer in the right direction is to maximise the risk ratio (or prevalence ratio, lambda, λ). This is the ratio between the risk of disease in first-degree relatives (e.g. siblings) and the risk of disease in the general population (Risch, 1990). For example, a sibling of a person with schizophrenia has about a 10-fold increase in risk compared with a randomly selected person from the general population, where the risk is about 1%. This gives a risk ratio of 10. We will therefore be looking for alternative phenotypes, correlated with schizophrenia, that have at least that or an even higher risk ratio. There are studies indicating that the lambda for the Continuous Performance Test, Identical Pairs version (CPT-IP) is on the order of 30. Using such cognitive tasks, we may be able to define component traits, often referred to as endophenotypes (Gottesman and Gould, 2003) or schizophrenia-related variants (SRVs). They are based on the hypothesis that currently known cognitive, neuroanatomical and biobehavioural markers might account for more of the genetic variation than does the clinical diagnosis of schizophrenia (Cromwell, 1986). Such schizophrenia-related variants are associated with the presence of schizophrenia but play no part in its clinical diagnosis. They tend to emerge before the onset of clinical symptoms, and we are likely to find them also among the mentally healthy relatives of patients. For the time being, we can only hope, without definitive evidence, that such SRVs involve the same biological pathways as the disease. We hope, too, that they are less remote from the relevant gene action than are our diagnostic categories.
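The risk ratio is a simple quotient, λ = (risk in a class of relatives) / (population risk). The Python sketch below applies it to the figures quoted above for the clinical diagnosis: roughly 10% risk in siblings against 1% in the population. It sets these beside a hypothetical alternative phenotype whose numbers are invented purely to show how a λ of about 30 could arise. They are not measurements from any study, and the function name is illustrative.

```python
def risk_ratio(risk_in_relatives: float, population_risk: float) -> float:
    """Relative risk ratio (lambda): risk in a class of relatives / population risk."""
    if population_risk <= 0:
        raise ValueError("population_risk must be positive")
    return risk_in_relatives / population_risk


# Figures quoted in the text for the clinical diagnosis of schizophrenia (siblings).
lambda_s_diagnosis = risk_ratio(risk_in_relatives=0.10, population_risk=0.01)

# Hypothetical alternative phenotype (illustrative numbers only, not from any study):
# rarer in the population but strongly concentrated among siblings.
lambda_s_trait = risk_ratio(risk_in_relatives=0.15, population_risk=0.005)

print(f"lambda_s, clinical diagnosis: {lambda_s_diagnosis:.1f}")
print(f"lambda_s, hypothetical trait:  {lambda_s_trait:.1f}")
```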
Since the early work of Kraepelin (1919), who not only identified schizophrenia as a disease but also provided the first evidence that it is basically a cognitive disorder, we have good reasons to look at cognitive dysfunction in schizophrenia as the source of possibly useful phenotypes.

I shall now try to reinforce some of these rather theoretical points with empirical data from the Western Australian Family Study of Schizophrenia. This study has three aims:
- to explore alternative ways of refining the phenotype;
- to carry out a detailed, multi-level assessment of patients, relatives and controls;
- to conduct molecular genetic studies using a combination of clinical diagnosis and neurocognitive phenotypes.

We have now completed a genome scan of 116 families, comprising a total of 412 individuals, each of whom has had a full assessment. This involved a standardised clinical diagnostic assessment, personal and family history, and neurological and physical examination. It also included two sets of measures of temperament and personality traits and a neurocognitive test battery. In a subset of those families, it further included brain potential studies, saccadic eye movements and structural MRI. We have DNA samples from all members of those families. The neurocognitive assessment includes measures of prior and current intelligence, executive attention, verbal fluency and verbal memory, speed of neural processing, and handedness and laterality. On most of these measures we find, as expected, that the probands with schizophrenia differ greatly from the normal controls. The first-degree relatives of patients fall somewhere in between. In fact, roughly 50% of the relatives are very similar to the patients in their neurocognitive performance without having any of the clinical features of the disorder. The problem we faced was how to analyse this large volume of multi-domain data. Should it be treated as multiple single tests, or as a multivariate composite picture combining all these measurements in some way? Analysing them as a composite neurocognitive mosaic could be expected to produce a picture that reflects the underlying neurobiology better than individual variables do.
Measures such as the CPT-IP have a high prevalence ratio, but their effect size is small to modest, which makes individual tests less promising candidates for genetic analysis. To combine all the variables in ways that capture complex patterns, we decided to use a form of latent class analysis known as grade of membership (GoM) analysis. This generates a set of so-called pure types: latent classes described in terms of probabilities for the variables constituting the neurocognitive set of measurements. The advantage of the method is that it generates the pure types while simultaneously estimating (by maximum likelihood) the extent to which each subject in the sample fits each of the pure-type profiles. Every individual then receives a quantifiable grade of membership in more than one of those latent classes. Using this method, we obtained two major neurocognitive pure types, each including a subset of the probands with schizophrenia and a proportion of their biological relatives. The first, deficit type is characterised by very high probabilities that its members will be significantly impaired on several measures. These are general ability, the continuous performance tasks, verbal memory, verbal fluency and neural processing speed, and members will also have high scores on soft neurological signs. The second, non-deficit type shows little impairment on the majority of neurocognitive measures. Interestingly, however, it carries an almost 100% probability of high schizotypy scores (on Raine's Schizotypal Personality Questionnaire). It also carries high harm avoidance and self-transcendence scores (on Cloninger's Temperament and Character Inventory). Heritability, in terms of familial aggregation, was highly significant for the deficit type and only marginally significant for the non-deficit type.
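In the GoM model as usually formulated, each person i has membership scores g_ik ≥ 0 that sum to 1 across pure types k. The probability of response l on variable j is then the membership-weighted average of the pure types' profiles: Pr(x_ij = l) = Σ_k g_ik λ_kjl. The Python sketch below only evaluates this mixing rule and the resulting log-likelihood for given parameters. The maximum-likelihood fitting of both g and λ is not shown. To keep it short, it simplifies each measure to a binary "impaired / not impaired" outcome, and all profile numbers are hypothetical rather than estimates from the study.

```python
import math

# Hypothetical pure-type profiles (illustrative numbers, not study estimates):
# PURE_TYPES[k][j] = probability that a member of pure type k is impaired on measure j.
MEASURES = ["general ability", "CPT-IP", "verbal memory"]
PURE_TYPES = {
    "deficit": [0.90, 0.85, 0.80],
    "non_deficit": [0.10, 0.15, 0.20],
}


def response_probabilities(membership, profiles=PURE_TYPES):
    """GoM mixing rule: P(impaired on j) = sum_k g_k * lambda_kj."""
    scores = membership.values()
    if any(g < 0 for g in scores) or not math.isclose(sum(scores), 1.0):
        raise ValueError("membership scores must be non-negative and sum to 1")
    n_measures = len(next(iter(profiles.values())))
    return [
        sum(membership[k] * profiles[k][j] for k in profiles)
        for j in range(n_measures)
    ]


def log_likelihood(impaired, membership, profiles=PURE_TYPES):
    """Log-likelihood of one person's binary outcomes under the GoM model."""
    probs = response_probabilities(membership, profiles)
    return sum(math.log(p if x else 1.0 - p) for x, p in zip(impaired, probs))


if __name__ == "__main__":
    g = {"deficit": 0.7, "non_deficit": 0.3}  # a person mostly, not wholly, of the deficit type
    for name, p in zip(MEASURES, response_probabilities(g)):
        print(f"{name}: P(impaired) = {p:.3f}")
    print(f"log-likelihood = {log_likelihood([True, True, False], g):.3f}")
```

Unlike classical latent class analysis, which assigns each person to a single class, the membership scores let one individual sit partly in both pure types.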

The clinical features of the patients classified into these two major types showed some striking differences. The deficit type included a high proportion of non-paranoid clinical subtypes (such as undifferentiated, hebephrenic and simple schizophrenia). The non-deficit type, by contrast, included predominantly cases of paranoid schizophrenia. Our initial hypothesis was that the non-deficit type might represent an earlier stage of the disorder, with milder deficits that become more severe over time. This hypothesis was rejected when we examined the length of illness in the two clusters. Nearly 50% of the non-deficit cases had been ill for over thirteen years, compared with 23% of the deficit type. It is likely, therefore, that these two neurocognitive types diverge from one another early in the course of the disorder, and that the deficit type is not a late stage of the non-deficit type. Furthermore, the deficit type tended to require higher doses of both typical and atypical antipsychotic medications.

The "proof of principle" that we may be dealing with two genetically distinct forms of schizophrenia comes from the genetic analysis. We conducted a whole-genome scan using 400 microsatellite markers. To stratify the sample by liability classes, we used a combined phenotype. This comprised the clinical diagnosis (of those affected) and the composite neurocognitive profiles of all probands and family members (affected and unaffected). The main finding was a genome-wide significant linkage peak, with a lod score close to 4, within a relatively narrow region on chromosome 6p. Importantly, this linkage was almost entirely explained by the cognitive deficit phenotype; the non-deficit phenotype did not show any linkage in this region. The other finding was on chromosome 10q, where we obtained suggestive (bordering on significant) evidence of linkage, with a lod score of 3.5. Again, it was mainly the cognitive deficit phenotype that was linked to that region.
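A lod score is a base-10 log likelihood ratio, so a lod of 4 corresponds to odds of 10,000:1 in favour of linkage at that position. The Python sketch below converts lod scores into odds and into an asymptotic pointwise p-value. It assumes the classical result that 2·ln(10)·LOD behaves, under no linkage, like a 50:50 mixture of zero and a chi-square with 1 degree of freedom. This is an approximation and may not match the liability-class analysis used in the study. Pointwise p-values are also not genome-wide p-values: declaring genome-wide significance requires allowing for the many positions examined in a whole-genome scan. The function names are hypothetical.

```python
import math


def lod_to_likelihood_ratio(lod: float) -> float:
    """A LOD score is log10 of a likelihood ratio, so the odds are 10 ** LOD."""
    return 10.0 ** lod


def lod_to_pointwise_p(lod: float) -> float:
    """Asymptotic pointwise p-value for a LOD maximised over theta in [0, 0.5].

    Assumes 2 * ln(10) * LOD ~ 50:50 mixture of 0 and chi-square(1 df) under no
    linkage, so p = 0.5 * P(chi2_1 > x), with P(chi2_1 > x) = erfc(sqrt(x / 2)).
    """
    if lod <= 0:
        return 1.0  # no evidence for linkage
    chi_square = 2.0 * math.log(10.0) * lod
    return 0.5 * math.erfc(math.sqrt(chi_square / 2.0))


if __name__ == "__main__":
    for lod in (3.5, 4.0):  # the two peaks mentioned in the talk
        print(
            f"LOD {lod}: odds {lod_to_likelihood_ratio(lod):,.0f}:1, "
            f"pointwise p ~ {lod_to_pointwise_p(lod):.1e}"
        )
```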

There is a growing list of potential candidate genes. To mention just a few: close to our area of linkage on chromosome 6 is the dysbindin gene, which codes for a protein with a function in synapse formation. Several groups, including ours, are now investigating this gene for association with schizophrenia. Another gene, neuritin 1, located more distally in the same region, is also involved in synaptogenesis and neural plasticity. We are now sequencing the entire gene and will have results in the near future. However, I think that the main "proof of principle" comes from partitioning cognitive dysfunction in schizophrenia into subtypes, which yields correlated phenotypes useful for genetic studies. If we manage to integrate multiple correlated measurements into a composite cognitive trait, we gain power to detect genetic linkage. This is because not only the individuals affected with the disease but also their clinically unaffected relatives become genetically informative.

My conclusion concerns a recent publication (Freimer and Sabatti, 2003) that signals something important and related to what I have been trying to convey up to this point: a proposal for a Human Phenome Project. We need new strategies for the systematic study of phenotypes in order to identify variants associated with complex traits like schizophrenia. Such a project would enable us to invert the mapping strategy, moving from the traditional search for shared genotypes to a search for shared phenotypes. It would do so by analysing large, comprehensively phenotyped samples from the general population, as well as samples ascertained for disease-related phenotypes. Such research could be modelled on the existing network of NIH clinical research centres. I have no doubt that these ideas will eventually get strong support in the United States. The question on which I will end my talk today is: will there be a prospect for an Australian Brain and Mind phenome research network?

© 2004, Brain and Mind Australia Inc.