Practical Significance vs. Statistical Significance in Astrological Research
by Jose E. Becerra
This paper was first published in Kosmos, XVI, 2, now known as The International
Astrologer, and a subscription to this publication is a benefit of membership in ISAR.
Abstract
The author examines the strength of proven astrological findings and, consequently,
their relevance for astrological counseling. He advises reporting not only the
“significance” of a result but also the “strength of association” it indicates, and
gives a formula for it based on the number of cases in the sample and the theoretical
frequency of the result.
The introduction of microcomputers into the daily practice of astrological
consultation is certainly an event that will continue to revolutionize the field. The
impetus provided by software developers and research-oriented astrologers is
invaluable and should eventually prove decisive in establishing the scientific basis
of astrology.
Although I share Richard Nolle’s enthusiasm and optimistic outlook (1986) regarding
the prospects for astrological research, certain caveats are in order if we are to
avoid the pitfalls that flawed early research efforts in other scientific fields.
Basic Principles
A basic principle of any data-computerization plan has been summarized by the acronym
GIGO: “garbage in, garbage out”! No matter how sophisticated our computers and
statistical methods are, if our data are not good, our analyses and the conclusions we
draw from them cannot fare any better.
The quality of data we need depends, in part, on the type of hypotheses we want to
test. If we are interested in testing hypotheses whose application is restricted to
the particular experience at hand, then almost any data that come through our hands
would suffice. The conclusions drawn from such data might be valid, but certainly not
generalizable. If, as is usually the case, we are interested in results that are both
valid and generalizable, the strict rules of study design and sampling should be
followed.
Examples
The dispute over the Gauquelins’ research illustrates this point. Very few scientists
have challenged the statistical methodology used (Bok & Jerome, 1977, are an
exception). The principal point of contention has been how representative the samples
used by the Gauquelins and by the Committee for the Scientific Investigation of Claims
of the Paranormal are (see the account of the dispute in “Brain/Mind Bulletin,” 1981).
I personally side with the Gauquelins on this issue. But I do so because of my trust in
their sampling methodology and study design, not because of any significant
statistical tests per se.
Another specific example may help to illustrate the principles of good study design.
A basic question when proposing a study is: Am I going to collect prospective data
(for instance, following up everyone born in a specific period) or retrospective data
(cases with some outcome, plus referent controls)?
It is usually the second option that we have available. If so, in gathering the cases,
we are dealing with either prevalent cases (the outcome occurred before the study
period but still persists) or incident cases (the outcome occurs for the first time
during the study period).
The danger of using prevalent cases has been described by Nolle (19 ): “Further
adjustment would be required to account for…differing…survival rates affecting the
number of ‘prevalent’ people.” This is referred to in the epidemiological literature
as “survival bias”: those who have died of the outcome under study will not be
available for our study, and we will therefore be measuring only the characteristics
of the survivors of the outcome rather than the characteristics of all who have that
outcome.
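Survival bias can be made concrete with a little arithmetic. The Python sketch below uses purely hypothetical numbers, not taken from any study: it assumes that a fixed share of all (incident) cases carry some exposure and that exposed and unexposed cases survive with different probabilities. The function name `exposed_share_among_prevalent` is illustrative only.

```python
def exposed_share_among_prevalent(p_exposed: float,
                                  survival_exposed: float,
                                  survival_unexposed: float) -> float:
    """Share of exposed people among the surviving ('prevalent') cases,
    given the share among all cases and each group's survival probability."""
    surviving_exposed = p_exposed * survival_exposed
    surviving_unexposed = (1 - p_exposed) * survival_unexposed
    return surviving_exposed / (surviving_exposed + surviving_unexposed)

# Hypothetical figures: 30% of all cases are exposed, but exposed cases
# survive less often (50%) than unexposed ones (80%).
share = exposed_share_among_prevalent(0.30, 0.50, 0.80)
print(f"Exposed share among all cases:       30.0%")
print(f"Exposed share among prevalent cases: {share:.1%}")  # 21.1%
```

Under these assumed figures, the exposure accounts for 30% of all cases but only about 21% of the surviving ones, so a study limited to prevalent cases would understate the association. If survival did not depend on the exposure, the two shares would coincide.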
The point I wish to make is that the stage of planning a good study design cannot be
skipped. The validity and generalizability of our conclusions will ultimately rest on
the quality of that design, and this is true with or without a computer. Some time ago
the “Congress of Astrological Organizations Research Committee” put together research
guidelines that might serve as a starting point, to be modified in light of
experience. I do not know the present status of this Committee, but it would seem wise
to follow up on their work. A more detailed and scholarly presentation of the issues
involved has been compiled by Geoffrey Dean (1977).
Measures of Association
There is an important difference between causation and association. I do not think
that the field of astrology is in a position to prove causation.
The present controversy over smoking and cancer shows that even the most conclusive
observational evidence does not absolutely prove causation. Only an experimental
design (a double-blind randomized trial) can settle the issue of causation.
I know of no one other than God and the Lords of Karma who could randomize the
“exposure” to astrological influences. Nevertheless, within the observational
framework (including prospective and retrospective studies), we should be able to
present reasonable evidence associating astrological factors with a well-chosen
outcome.
What are adequate measures of association? Before considering them, I would like to
point out the importance of asking the “right questions.” Few people would claim that
astrological influences explain 100% of the occurrence of an outcome. Therefore, the
right question is not whether planet A conjunct planet B in sector X explains outcome
C but, accepting multifactorial causation, how much weight should be attributed to
astrological factors alone in the etiology of outcome C. The fact that astrological
factors may have low relevance for certain outcomes does not disprove astrology. Such
evidence rather qualifies the conditions under which astrological influences operate.
The test most widely used to assess association, and almost the only one used in
astrological research, is the chi-square test. Its result is usually reported as a
“p value”: the probability of obtaining an association at least as strong as the one
found if chance alone were at work. A low p value (below 0.05) indicates that an
association this strong would arise by chance less than 1 time in 20. Consequently, if
we run many tests at once, for example analyzing the house positions of both the Sun
and the Moon in a sample (12 + 12 = 24 possible exposures), we should expect about one
result (24 × 0.05 ≈ 1.2) to reach statistical significance purely by chance. One way
around this problem of multiple comparisons is to require a p value so low that the
objection becomes irrelevant. This was an important part of the Gauquelins’ approach,
in addition to a sound study design. They demonstrated, for instance, a statistical
association between the position of Mars and Olympic champions. This approach,
however, depends on both the strength of the association and the sample size, and a
p value gives us no information as to which of these two components is mainly
responsible for the statistical significance.
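The multiple-comparisons arithmetic can be sketched in a few lines of Python. This is an illustration under a simplifying assumption: it treats the 24 tests as independent, which the 12 mutually exclusive house positions of a single body are not, so the family-wise figure is approximate. The Bonferroni correction shown is one standard remedy, not mentioned in the original text; the function names are hypothetical.

```python
ALPHA = 0.05  # conventional significance threshold
N_TESTS = 24  # e.g. 12 house positions for the Sun + 12 for the Moon

def familywise_error_rate(alpha: float, k: int) -> float:
    """Chance of at least one 'significant' result among k independent
    tests when no real effect exists (assumes independence)."""
    return 1 - (1 - alpha) ** k

def bonferroni_threshold(alpha: float, k: int) -> float:
    """Per-test p-value threshold that keeps the family-wise error rate
    at or below alpha (valid with or without independence)."""
    return alpha / k

print(f"Expected chance 'hits': {ALPHA * N_TESTS:.1f}")                        # 1.2
print(f"P(at least one hit):    {familywise_error_rate(ALPHA, N_TESTS):.0%}")  # 71%
print(f"Bonferroni threshold:   {bonferroni_threshold(ALPHA, N_TESTS):.4f}")   # 0.0021
```

A p value below 0.000001, like the one reported for the Mars effect, clears even this stricter per-test threshold by a wide margin.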
What do I mean by “strength of association”? The Gauquelins found that, among sports
champions, scientists, actors and writers, 2,286 out of 8,737 (26.2%) with typical
sports champion’s personality traits had Mars in the sectors following the horizon and
the meridian, whereas the theoretical frequency was 20.4% (see figure 1). The p value
for this finding is well below 0.000001; that is, there is less than 1 chance in a
million that it is attributable to chance. However, the average “risk” of having
typical sports champion’s personality traits (the outcome) if one is born with this
“Mars effect” (the exposure) is approximated by the ratio of the observed odds to the
theoretical odds (an odds ratio):
$$\frac{2286}{8737 - 2286} \times \frac{100 - 20.4}{20.4} \approx 1.38$$
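The formula divides the observed odds of the Mars placement (2,286 against 8,737 − 2,286) by its theoretical odds (20.4 against 79.6). A minimal Python sketch of that calculation follows; the function name `strength_of_association` and its parameters are illustrative, not taken from the original study.

```python
def strength_of_association(hits: int, n: int, p0: float) -> float:
    """Ratio of observed odds to theoretical odds (an odds ratio).

    hits: cases showing the astrological factor
    n:    all cases in the sample
    p0:   theoretical frequency of the factor, as a fraction (0-1)
    """
    observed_odds = hits / (n - hits)
    theoretical_odds = p0 / (1 - p0)
    return observed_odds / theoretical_odds

# Gauquelin figures quoted in the text: 2,286 of 8,737, theoretical 20.4%.
print(round(strength_of_association(2286, 8737, 0.204), 2))  # 1.38
```

Because an odds ratio is symmetric, the same value describes the factor among those with the outcome and the outcome among those with the factor; it approximates a relative risk when the outcome is uncommon in the population.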
An astrologer counseling an individual with the Mars effect can tell him/her that
(s)he is approximately 1.4 times as likely as others (roughly 40% more likely) to have
sports champion’s personality traits, a value of 1.0 being the baseline of no special
Martian personality. It would be important, however, to test for gender differences in
the Mars effect (see Becerra, 1986).
This figure of 1.4 is what I refer to as “the strength of the association.” It applies
population-based research to the individual. For comparison, the strength of
association between smoking and lung cancer is on the order of 10, and that between
cholesterol and coronary heart disease around 2.
Clinicians often distinguish between clinical significance and statistical
significance. Even when the strength of association is very low (say 1.1), it is still
possible to obtain statistically significant results with a very large sample. Such
results would be important from the theoretical point of view, since they prove the
existence of the effect, but would be clinically useless, because the clinician could
not use that information to advise an individual patient. For instance, if the
Gauquelins had obtained a sample of 100 cases instead of the 8,737 they used, they
would not have come up with statistically significant results. Therefore, for factors
with a low strength of association (usually defined as below 1.5), the statistical
significance of any finding depends almost entirely on the sample size. Very small
p values, like those reported by the Gauquelins, while important for proving an
effect, are not by themselves “clinically” significant enough to help the astrologer
advise an individual. On the other hand, it is possible to have a strength of
association of 10 and still obtain results that are not statistically significant,
because a small sample size precludes adequate assessment of the significance of the
finding.
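The dependence of significance on sample size can be checked numerically. The Python sketch below is an illustration, not the Gauquelins’ own computation: it assumes a simple one-degree-of-freedom chi-square goodness-of-fit test of hits against the theoretical frequency of 20.4%, together with the odds-ratio formula given earlier. The 100-case and 100,000-case scenarios are hypothetical, constructed only to hold the strength of association near 1.4 or near 1.1 while the sample size changes.

```python
import math

def chi_square_1df(hits: int, n: int, p0: float) -> tuple[float, float]:
    """Chi-square goodness of fit (1 degree of freedom) of `hits` out of `n`
    against a theoretical frequency `p0`; returns (statistic, p value)."""
    expected_hits = n * p0
    expected_misses = n * (1 - p0)
    stat = ((hits - expected_hits) ** 2 / expected_hits
            + ((n - hits) - expected_misses) ** 2 / expected_misses)
    # With 1 degree of freedom, P(chi-square > stat) = erfc(sqrt(stat / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))

def odds_ratio(hits: int, n: int, p0: float) -> float:
    """Observed odds of the factor divided by its theoretical odds."""
    return (hits / (n - hits)) / (p0 / (1 - p0))

P0 = 0.204  # theoretical frequency from the Mars example
scenarios = {
    "Gauquelin sample (2,286 of 8,737)": (2286, 8737),
    "Hypothetical, similar proportion (26 of 100)": (26, 100),
    "Hypothetical weak effect (21,992 of 100,000)": (21992, 100000),
}
for label, (hits, n) in scenarios.items():
    stat, p = chi_square_1df(hits, n, P0)
    print(f"{label}: strength = {odds_ratio(hits, n, P0):.2f}, "
          f"chi-square = {stat:.1f}, p = {p:.2g}")
```

Under these assumptions, the 100-case sample has almost the same strength of association as the full sample (about 1.37) but a p value of about 0.16, while the weak effect of about 1.1 reaches a p value far below 0.05 only because of its large sample.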
Conclusion
Therefore, it is important to report both the strength of association and the
statistical significance in any study. If the advent of computers is going to change
the practice of the science and art of astrology dramatically, research should be
understood as having both practical and theoretical utility. A counseling astrologer,
like a physician, has the responsibility to advise individuals in addition to
promoting the cause of science. A strength of association of around 2 should be used
as a criterion of practical usefulness at the individual level; such a criterion
should guard us against the temptation to chase significant p values simply by
increasing the sample size, and with it the cost, of a study.