What Should You Think About Before Analyzing Data?

People with little experience in data analysis often decide to simply dive in and start analyzing their data. As a professional statistical consulting service, we can only advise against this. Before any analysis, you should think about the objectives of the analysis, its subject-matter background, and so on. Faraway (2005) raises several points on this in the introduction to his book that laypeople, and professional statistical consultants too, should take to heart. We have decided to quote those recommendations verbatim:

"Before You Start

Statistics starts with a problem, proceeds with the collection of data, continues with the data analysis and finishes with conclusions. It is a common mistake of inexperienced statisticians to plunge into a complex analysis without paying attention to what the objectives are or even whether the data are appropriate for the proposed analysis. Look before you leap!

The formulation of a problem is often more essential than its solution which may be merely a matter of mathematical or experimental skill. Albert Einstein

  1. Understand the physical background. Statisticians often work in collaboration with others and need to understand something about the subject area. Regard this as an opportunity to learn something new rather than a chore.
  2. Understand the objective. Again, often you will be working with a collaborator who may not be clear about what the objectives are. Beware of "fishing expeditions" - if you look hard enough, you will almost always find something, but that something may just be a coincidence.
  3. Make sure you know what the client wants. You can often do quite different analyses on the same dataset. Sometimes statisticians perform an analysis far more complicated than the client really needed. You may find that simple descriptive statistics are all that are needed.
  4. Put the problem into statistical terms. This is a challenging step and where irreparable errors are sometimes made. Once the problem is translated into the language of statistics, the solution is often routine. Difficulties with this step explain why artificial intelligence techniques have yet to make much impact in application to statistics. Defining the problem is hard to program.

That a statistical method can read in and process the data is not enough. The results of an inapt analysis may be meaningless.

It is also important to understand how data were collected.

  • Are the data observational or experimental? Are the data a sample of convenience or were they obtained via a designed sample survey? How the data were collected has a crucial impact on what conclusions can be made.
  • Is there nonresponse? The data you do not see may be just as important as the data you do see.
  • Are there missing values? This is a common problem that is troublesome and time consuming to handle.
  • How are the data coded? In particular, how are the qualitative variables represented?
  • What are the units of measurement?
  • Beware of data entry errors and other corruption of the data. This problem is all too common - almost a certainty in any dataset of at least moderate size. Perform some data sanity checks."
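
The "data sanity checks" recommended at the end of the quotation could look roughly like the following Python sketch. It is our own illustration, not code from Faraway's book (which works in R), and it makes several assumptions: the records are already loaded as dictionaries of strings, the column names (id, age, sex), the plausible age range and the allowed codes are invented for the example, and the markers treated as missing values are a guess that has to be adapted to the dataset at hand.

```python
from collections import Counter

# Assumed markers for missing values; adapt to how the data were coded.
MISSING = {"", "NA", "NaN", None}


def sanity_report(rows, numeric_ranges, allowed_codes):
    """Collect simple data-quality findings for a list of dict records."""
    report = {"missing": Counter(), "out_of_range": [],
              "bad_code": [], "duplicates": 0}
    seen = set()
    for i, row in enumerate(rows):
        # An exact repeat of an earlier record often signals an entry error.
        key = tuple(sorted(row.items()))
        if key in seen:
            report["duplicates"] += 1
        seen.add(key)
        for column, value in row.items():
            if value in MISSING:
                report["missing"][column] += 1
                continue
            if column in numeric_ranges:
                low, high = numeric_ranges[column]
                try:
                    number = float(value)
                except (TypeError, ValueError):
                    # Not a number at all, so it cannot be in range.
                    report["out_of_range"].append((i, column, value))
                    continue
                if not low <= number <= high:
                    report["out_of_range"].append((i, column, value))
            if column in allowed_codes and value not in allowed_codes[column]:
                report["bad_code"].append((i, column, value))
    return report


# Hypothetical records with typical problems built in.
rows = [
    {"id": "1", "age": "34", "sex": "f"},
    {"id": "2", "age": "340", "sex": "m"},  # implausible age (typing error)
    {"id": "3", "age": "", "sex": "M"},     # missing age, inconsistent code
    {"id": "3", "age": "", "sex": "M"},     # record entered twice
]
report = sanity_report(rows, {"age": (0, 120)}, {"sex": {"f", "m"}})
print(dict(report["missing"]))
print(report["out_of_range"])
print(report["bad_code"])
print(report["duplicates"])
```

A report like this only flags candidates. Whether an age of 340 is a typing error for 34 or for 40 can only be settled by going back to the original records or to the person who collected the data.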


Faraway, J. J. (2005): Linear Models with R, 1st edition, Chapman & Hall, Boca Raton.
