Vol.1-2/2002

Sense and nonsense of the measurement of mental strain and workload

On some problems of assessing, measuring and evaluating mental workload

On the reliability, validity, sensitivity and diagnosticity of heart rate and heart rate variability indices for measuring mental work strain

Determination of the condition related reliability of the effort scale

Examination of the sensitivity of NASA-TLX and NASA-TLX-ZEIS in a flight control task

Multivariate analysis of the condition related reliability of strain indicators

Diagnostic of job demands and cumulating strain consequences in call center jobs

Mental workload and its effects: The position of the German Federal Institute for Occupational Safety and Health

Analysis of human errors in order-picking systems with laboratory experiments

The load on the lumbar spine during vertical lifting and horizontal placing of objects

 

Vol.3/2002

Study in spatial knowledge acquisition large-scale real world environments by training in virtual environments

The effectiveness of various virtual reality elements in incident training courses

The application of a questionnaire (D-MEQ) for the identification of the diurnal type as criterion for shiftwork

Hand-Eye-Coordination and training during simulated monitor endoscopy

Study of human-seat interface pressure distribution depending on seat type, postures, and anthropometric characteristics

 

Vol.5/2002

Does successful product design have particular process characteristics?

Cause and Effect Analysis of the Leadership-Employee Relationship

The Job-Exposure-Matrix as a tool for work-related analysis of morbidity data from Health Insurance Institutions

The everlasting dream of the abolition of the assembly line: some thoughts on the past, present and future of a classical model of work organization

 

Sense and nonsense of the measurement of mental strain and workload (Vol. 1-2/2002)

Author: Heinz Schmidtke

Keywords: strain, stress, testing methods, criteria for evaluation

Summary: Assured knowledge in the field of mental strain and workload can be gained only if the object of research is accessible to measurement. Measuring mental or informational strain has been impossible up to now because there is no frame of reference for the extreme diversity of tasks. Muscular strain can be measured in N or N/m and workload in kJ or watts. But we are helpless when asked to name or measure the strain factors behind the solution of a mathematical problem or a translation task. Even a clear distinction between physiological and psychological strain seems problematic.

Measurement problems are not confined to strain; they also arise for mental workload, as the titles of the articles in this issue show. The authors speak of "acquisition", "determination" and "stipulation", terms that conceal the impossibility of measurement in a physical-technical sense.
This is not a criticism; it demonstrates the authors' insight that we are far from real measurement.

If we try to measure mental workload, we first need a working hypothesis about the effect of workload on the central nervous system and other physiological functions. The second step is the selection of measurement tools.
Here we are between the devil and the deep blue sea. Over the last hundred years nearly all physiological functions that might covary with workload have been tested for their suitability. The results were very modest.
Neither heart rate nor cardiac arrhythmia has supplied information that is strongly related to the type and duration of the preceding informational task.
Attempts to extract indicators from the complex functions of the EEG have not been very successful either. The only information with high reliability that could be derived from the spontaneous EEG of healthy subjects was the differentiation between open and closed eyes! We believe there are cheaper ways of obtaining this result.

The third step is the selection of tasks. Many tasks used in experiments look quite artificial compared with real tasks in industrial or office environments. What insight do we gain from an experiment showing that a workload indicator responds positively to increasing task complexity and test duration, while the largest share of variance is attributable to differences between test subjects? The outcome of many experiments in this area is no more than common sense would suggest. Researchers have frequently found a correlation between the deflection of a workload indicator and working time, but one afflicted with considerable statistical uncertainty. Using a watch to measure working time would yield much more precise results!

The last step is the selection of test subjects. In the literature, subjects in workload studies are usually students, who can be recruited more easily than employees from industry. But it must be questioned whether results obtained with students can be generalised, because their motivation and emotional state seem to be quite different from those of employees in industry and office work.

Research in this area can succeed only if the measurement methods are validated against a defined criterion. Furthermore, it must be clear which criterion is to be selected, and the criterion (e.g. reasonability or tolerability) must be defined explicitly. The measurement methods must be sufficiently sensitive, and methods that capture only single aspects of workload lead to wrong conclusions.

It makes sense for basic research to search for methods that can differentiate with high reliability between different types of mental workload. It also makes sense to define the validation criteria very thoroughly.
But it makes no sense to warm up, as a result of careless literature searches, measuring methods that have contributed nothing to problem solving. Indeed, it seems irresponsible to derive standards for tolerable human workload in informational tasks from the research approaches criticised here.

Finally, the prevalence of project research forces researchers to publish preliminary reports, progress reports, abstracts, and final reports. This lengthens individual publication lists enormously. But who believes that 20 or 30 publications per author per year can be based on serious research, even if the same content is printed in different journals under slightly modified titles? Moreover, as a general rule only research that yields positive results in the eyes of the funder will be sponsored. This constraint can lead researchers to omit unsuitable data, labelled as artefacts, or to invent data. Such fraudulent falsification of data, particularly in biomedical science, has been a subject of discussion in recent years. But the increasing quantity of publications has another negative effect: it promotes disinformation. Even in a narrow special field, researchers can frequently read only titles. Unfortunately, in many cases titles describe the content in a deliberately incomplete or wrong way.
Citation on the basis of titles may benefit the impact factor of a journal but does not promote information transfer within the scientific community. Evaluating research results by the impact factor of the publishing journal may prove to be the largest blunder in present-day science.

Practical Relevance: The rising flow of publications cannot disguise the fact that ergonomics presently has no valid methods for measuring mental strain and workload, even if authors in youthful enthusiasm frequently mistake evaluation for measurement.


On some problems of assessing, measuring and evaluating mental workload (Vol. 1-2/2002)

Author: Friedhelm Nachreiner

Keywords: mental stress, mental strain, measurement, psychometric criteria

Summary: Based on the increasing importance of mental workload, both as a component of total workload and legal or quasi legal requirements in the EU, instruments for assessing, measuring or evaluating mental workload are required which allow for an objective, reliable, valid, sensitive and diagnostic assessment of the intended object of measurement, i.e. mental stress, mental strain or the effects of mental strain in the operator, e.g. mental fatigue, monotony or satiation (as defined in ISO 10075). It is argued that it is most important to make clear distinctions between these intended objects of measurement, because blurring the distinction will lead to unreliable and invalid results. This is demonstrated by reference to studies using generalizability theory (in this issue), where it could be shown that some of the proposed indicators of mental workload, e.g. heart rate variability, are not able to differentiate different kinds (diagnosticity) and levels (sensitivity) of cognitive workload, but instead allow for a differentiation among operators according to their (habitual) level of arousal. It should thus be made clear whether the intention is to measure and differentiate between people (e.g. according to their strain) or working conditions (with regard to the workload they impose upon the operator). These intentions require different approaches to evaluating the psychometric properties of measurement instruments, as described in ISO/CD 10075-3, and it is argued that a generalizability-theoretical approach can be used to solve these problems and to estimate the psychometric properties of an instrument.

It is further argued that no general measure of mental stress or mental strain can be developed, since neither is a unidimensional concept varying on a dimension from low to high; adequate measurement instruments can be developed only for components that differentiate specific aspects of mental stress or strain, as has been done for some of the effects of mental strain in the operator. It does not make sense to allow cognitive complexity to be compensated by time restrictions, since these seem to place demands on different resources, which of course may interact. Nor does it make sense to offset fatigue against monotony, or simply not to discriminate between them.

As a consequence, measurement of mental workload would first of all require more elaborate theoretical analyses of mental workload, which, if accompanied by more sophisticated measurement approaches, should allow for the development of adequate measurement instruments.

Practical Relevance: This discussion of problems relating to the measurement of mental workload is intended to encourage potential users of instruments for measuring mental workload to critically evaluate the suitability and usability of these instruments and to prevent their uncritical use.


On the reliability, validity, sensitivity and diagnosticity of heart rate and heart rate variability indices for measuring mental work strain (Vol. 1-2/2002)

Authors: Peter Nickel, Karin Eilers, Liane Seehase, Friedhelm Nachreiner

Keywords: heart rate variability, mental workload, reliability, validity, sensitivity, diagnosticity

Summary: Mental workload is directly connected with impairing effects of work on the operator and with the development of the operator’s capabilities and skills. Mental workload therefore is a relevant criterion to evaluate the quality of work system design.
Consequently, legal regulations in the EU require such an evaluation of mental workload which calls for suitable methods for this purpose (Nachreiner 2002).

Due to their practical utility and relative simplicity in both application and subsequent interpretation, when compared with alternative psychophysiological techniques, heart rate (HR) and its derivatives are currently the most often used methods for the assessment of mental workload. HR, on the other hand, is one parameter among others in the complex homeostatic cardiovascular system, reflecting energetic, thermic, respiratory, emotional and hypothetically even cognitive processes. Heart rate variability (HRV) measures are thus used in order to decompose the processes which affect cardiovascular regulation and to separate out the effects of mental workload. Currently, the 0.1 Hz component of HRV is considered an attractive, valid, and therefore nearly standard indicator of mental strain (Boucsein & Backs 2000; Mulder et al. 2000). However, a closer inspection of the relevant literature raises some severe doubts about the psychometric properties of this and other indicators (Wilson et al. 1995).
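
As a rough illustration of what a "0.1 Hz component" refers to, the following Python sketch estimates the spectral power of an inter-beat (RR) interval series in a band around 0.1 Hz. The band limits (0.07 to 0.14 Hz), the 4 Hz resampling rate, the function names and the synthetic data are all assumptions made for illustration; this is not the procedure used in the studies summarised here.

```python
import numpy as np

def synthetic_rr(duration_s=300.0, mean_rr=0.8, amp=0.05, f_mod=0.1):
    """Toy RR intervals (seconds) whose length oscillates at f_mod Hz."""
    rr, t = [], 0.0
    while t < duration_s:
        interval = mean_rr + amp * np.sin(2.0 * np.pi * f_mod * t)
        rr.append(interval)
        t += interval
    return np.array(rr)

def band_power(rr_s, fs=4.0, band=(0.07, 0.14)):
    """Power of an RR series within `band` (Hz), from a plain periodogram.

    fs and band are assumed values; conventions differ between labs.
    The scaling is two-sided, so only relative comparisons are meaningful.
    """
    rr_s = np.asarray(rr_s, dtype=float)
    beat_times = np.cumsum(rr_s)                      # time of each beat
    # Resample the unevenly spaced series onto a regular time grid.
    t = np.arange(beat_times[0], beat_times[-1], 1.0 / fs)
    rr_even = np.interp(t, beat_times, rr_s)
    rr_even = rr_even - rr_even.mean()                # remove the mean level
    spectrum = np.fft.rfft(rr_even)
    freqs = np.fft.rfftfreq(rr_even.size, d=1.0 / fs)
    psd = np.abs(spectrum) ** 2 / (fs * rr_even.size)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    return float(psd[in_band].sum() * (freqs[1] - freqs[0]))

modulated = band_power(synthetic_rr(amp=0.05))   # series with a 0.1 Hz rhythm
flat = band_power(synthetic_rr(amp=0.0))         # series without it
```

In this toy example the modulated series carries far more power in the band than the flat one. Whether differences in such a measure reflect task demands rather than differences between persons is exactly what the studies below examine.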

Therefore, the psychometric properties of different HR and HRV measures as indicators of mental strain and/or as an indirect measure of the mental stress imposed upon the operator by different tasks have been investigated systematically in four experimental studies; this paper gives an overview of their results.

In a study by Eilers (1999), various HR and HRV measures of ten participants were analyzed using a generalizability theory approach. The measures were recorded during rest periods and during subsequent vigilance task performance of about two hours, with two different types of discrimination and three levels of discriminability at six different times of day. According to the analysis of the rest periods, these measures showed an acceptable reliability as indicators of person parameters. However, none of these measures showed an effect for type of task (and thus no diagnosticity) or level of difficulty (and thus no sensitivity); significant effects were found only for work vs. rest, time on task, and time of day. These results are not in agreement with theoretical assumptions and do not correspond to the results for performance or perceived effort data.

Thus there is no support for any validity of these measures for task related mental workload effects.

In the study by Lasner (1997) two signal detection tasks at two different levels of difficulty were presented for 40 minutes each to ten participants. The results for the 0.1 Hz component of HRV did not show effects for time on task, type of task (consistent vs. varied mapping), or level of difficulty, whereas these variations were reflected in performance measures. Furthermore, an ANCOVA including a baseline task as a covariate revealed a complete disappearance of variance for 'level' of workload. Variance in the HRV measure was completely due to interindividual, but not to intersituational (workload or mental stress) differences; thus there was again no evidence for sensitivity or diagnosticity of HRV for mental workload.

In the experiment by Nickel (2001), 14 tasks (from the STRES battery; AGARD 1989) differing in type and level of workload, each followed by a rest period of 5 minutes, were performed by 14 participants in a repeated measurement design. The ANOVA for the 0.1 Hz component of HRV supported sensitivity for the discrimination between task and rest periods, but again showed no support for a more fine-grained discrimination between different levels of difficulty within the same type of work task. Furthermore, there was no support for any diagnosticity of this measure to discriminate different tasks and thus types of workload. These results were again inconsistent with theoretical assumptions and with performance and perceived difficulty data.

Since it appeared that most of the tasks of this experiment, with one exception, might have been influenced by "pacing" effects, the following tasks were presented to ten participants in another experiment: the choice reaction task (machine-paced) and the grammatical reasoning task in a self-paced and in a reconstructed machine-paced version (in order to induce different levels of pacing). ANOVA for the 0.1 Hz component showed an effect for pacing only but not for type of task, with the two tasks with comparable cognitive demands but different pacing conditions leading to different effects in the HRV measure. However, analyses of the performance data and the perceived difficulties showed the expected effect for different types of tasks.

Therefore, HRV might indicate emotional strain (or stress reactions) or general activation (through time pressure) rather than mental or especially cognitive strain. It is thus concluded that HR and HRV measures are neither valid, nor sensitive, nor diagnostic for the assessment of mental strain. Since these measures do not meet conventional psychometric requirements they should not be used in mental and especially cognitive workload evaluation.

Practical Relevance: For practical purposes, HR and HRV measures may indicate dysfunctional emotional strain or stress reactions. However, using these measures for evaluating the mental workload of a work system may lead to adverse consequences when highly demanding tasks that produce no emotional or stress reactions are evaluated as acceptable, whereas in fact they may lead to mental fatigue and thus to errors due to cognitive strain. An evaluation of emotional strain alone is thus definitely not sufficient for an ergonomic evaluation of work systems; in fact it might be misleading and dangerous, since errors with severe consequences might otherwise result.


Determination of the condition related reliability of the effort scale (Vol. 1-2/2002)

Author: Martin Schütte

Keywords: Generalizability theory, load combinations, effort scale

Summary: The increasing use of modern technologies has led to work tasks that primarily require the correct intake, processing and conversion of information. As a result, psychological and especially mental demands are most relevant for the resulting workload. At the same time, legal regulations (the directives on machinery or on VDU work) call for the determination and evaluation of psychological demands. Questionnaires are often used to measure mental workload. Nevertheless, selecting an adequate method is difficult because in most cases no data exist on the range of application of an instrument. Therefore, two new criteria are proposed that describe the condition related precision of a measurement procedure. Diagnosticity indicates whether a method can discriminate between different kinds of workload. Sensitivity specifies how an instrument reacts to different levels of workload. Both criteria facilitate the choice of an instrument. Accordingly, in the sphere of standardization there is an intention to require that every measurement instrument be analysed with respect to its diagnosticity and sensitivity.

The aim of the present experiment was to examine the diagnosticity and sensitivity of the effort scale. For creation of load conditions, a Sternberg task was used allowing the manipulation of perceptual as well as memory related demands. For this purpose, sequences of letters and numbers were used differing in length and recognizability.
The task started with the learning of a sequence. This was followed by the presentation of further sequences of the same length and recognizability. The test persons (N=19) had to decide whether the currently presented sequence corresponded completely with the previously learned one. They completed all conditions resulting from the combination of length and recognizability of items. After completing all tasks belonging to an experimental condition, the test persons had to estimate their experienced effort.

Data analysis was based on generalizability theory, which builds on analysis of variance and partitions variance into sources corresponding to systematic variance among the objects of measurement (in this case visual and memory load), to multiple error sources and to their interactions. The generalizability coefficient (G-coefficient) is a reliability-like parameter indicating how well an observed score is likely to locate conditions relative to other members of the corresponding population (relative G-coefficient), or how well an observed score is likely to locate conditions without regard to other members of the corresponding population (absolute G-coefficient). The results of the three-factorial analysis of variance show that the variance component for persons accounts for about 48.6 % of the variance in the scores. Furthermore, the variance component for the interaction between memory load, visual load and persons is considerable (40.6 %), pointing to a reciprocal influence of all three variables. Accordingly, there are two main sources of measurement error: one results from the individuality of persons and the other from the specific effects of all experimental factors. The variance components for load conditions are much smaller. Judging by the explained proportion of variance, the effort scale reacts only to visual load. The calculated G-coefficients reflect these results.
For memory load the G-coefficient takes a value of 0.28 for relative decisions and 0.06 for absolute decisions. The effort scale therefore differentiates insufficiently between levels of memory load. For visual load the relative G-coefficient, at 0.79, does not reach the usual limit of 0.80. Consequently, the scale is not able to reflect differences in levels of visual load with sufficient precision. The confidence interval shows that the scale discriminates only between low and high levels of visual load. Reliable statements about the absolute level of experienced effort are also not possible, as the corresponding G-coefficient of 0.50 reveals. Nevertheless, the relative G-coefficient for visual load reaches a level that justifies further examination of the effort needed to increase the coefficient. The results show that a relative G-coefficient of 0.85 can be expected if the database is enlarged by 12 %. A coefficient of 0.90 presupposes doubling the number of observations, and a coefficient of 0.95 requires a dataset four times larger than the present one.
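
The difference between relative and absolute G-coefficients can be made concrete with a small sketch. The Python code below uses a deliberately simplified two-way design (persons crossed with conditions, one score per cell) rather than the three-factorial design of the study; conditions are treated as the object of measurement and persons as the facet over which scores are averaged. The function name and the example scores are hypothetical and do not reproduce the study's data or its projections.

```python
import numpy as np

def g_coefficients(scores):
    """Relative and absolute G-coefficients for conditions.

    scores: 2-D array, rows = persons, columns = load conditions,
            one observation per cell (simplifying assumption).
    """
    x = np.asarray(scores, dtype=float)
    n_p, n_c = x.shape
    grand = x.mean()
    person_means = x.mean(axis=1)
    cond_means = x.mean(axis=0)

    # Mean squares of a two-way ANOVA without replication.
    ms_p = n_c * np.sum((person_means - grand) ** 2) / (n_p - 1)
    ms_c = n_p * np.sum((cond_means - grand) ** 2) / (n_c - 1)
    resid = x - person_means[:, None] - cond_means[None, :] + grand
    ms_res = np.sum(resid ** 2) / ((n_p - 1) * (n_c - 1))

    # Variance components from expected mean squares
    # (negative estimates are set to zero).
    var_res = ms_res
    var_p = max((ms_p - ms_res) / n_c, 0.0)
    var_c = max((ms_c - ms_res) / n_p, 0.0)

    # Relative error ignores person main effects; absolute error includes them.
    rel_error = var_res / n_p
    abs_error = (var_p + var_res) / n_p
    g_rel = var_c / (var_c + rel_error)
    g_abs = var_c / (var_c + abs_error)
    return g_rel, g_abs

# Hypothetical effort ratings: 4 persons x 3 load conditions.
example = [[10, 12, 15],
           [30, 33, 34],
           [50, 51, 56],
           [20, 23, 24]]
g_rel, g_abs = g_coefficients(example)
```

In this made-up example the persons differ far more from each other than the conditions do, so the conditions are ranked consistently (high relative coefficient) while their absolute level cannot be located reliably (low absolute coefficient).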

Altogether, the effort scale is usable only for determining strain resulting from visual load, and even then only for screening purposes and relative decisions. Because the scale differentiates exclusively between low and high workload conditions, its range of application is presumably limited, since in practice medium load levels probably predominate. Beyond that, the applicability of the scale is further restricted because the sample sizes necessary to enhance reliability are hardly attainable in industry.
Nevertheless, it seems reasonable to explore the differences between persons in more detail. The interindividual deviations in the effort scores are possibly caused by variations in the initial level of strain or by differences in abilities relevant for task execution. It cannot be excluded that these variables contribute significantly to error variance. Knowledge of the effect of such factors would indicate which conditions need to be considered additionally during strain measurement. Besides this, the sensitivity of the effort scale could be expected to improve.

Practical Relevance: The present experiment gives information on the practicability of the effort scale. The results show that the effort scale is usable only for determining strain resulting from visual load. In this case, the instrument can be used only for screening purposes and for relative decisions.


Examination of the sensitivity of NASA-TLX and NASA-TLX-ZEIS in a flight control task (Vol. 1-2/2002)

Authors: Claudius Pfendler and Martin Schütte

Keywords: Workload Measurement, Sensitivity, Generalizability Theory, Flight Control Task

Summary: Flight control tasks involve high safety, sensorimotor, and cognitive demands as well as time pressure. Therefore, there is a special need for suitable workload measurement methods. One method that has been used successfully with flight control tasks is the NASA Task Load Index (NASA-TLX). As the German translation of NASA-TLX has not yet been completely satisfactory with respect to reliability, an attempt was made to overcome these problems by combining it with the so-called Sequential Judgement Scale (ZEIS). According to the literature, integrating the hierarchical decision structure of ZEIS with NASA-TLX seemed a promising way to improve the measurement properties of the scale. The new method, called NASA-TLX-ZEIS, and NASA-TLX were evaluated empirically with respect to sensitivity in a simulated flight control task using generalizability theory. Sensitivity is an important criterion for workload measurement methods, indicating the degree to which a measure can differentiate between different levels of input load. Generalizability theory is based on analysis of variance, which also gives estimates of variance components.
These indicate the strength of effect of the independent variables of the experiment, including the variables regarded as error factors. At the same time, the generalizability coefficient as an indicator of sensitivity shows, on a scale from 0 to 1, the extent to which a score can be generalized accurately to some wider set of situations. The relative generalizability coefficient (r²_Rel) gives information on the sensitivity of scores for relative decisions (with regard to other values). The absolute generalizability coefficient (r²_Abs) gives information on sensitivity for absolute decisions (without regard to other values).

The flight control task was performed in a fixed base simulator with a sidestick. The outside view was projected on a screen in front of the subject showing the flight channel the subject had to follow as accurately as possible.
Input load was varied in five levels by increasing the speed and the amount of the course and altitude changes with task difficulty. As the 20 subjects were not pilots, they had to be trained first. After the training each task was presented three times to each subject, with half of the subjects using NASA-TLX and the other half NASA-TLX-ZEIS.

As a precondition for the sensitivity analysis, it was first tested on the basis of lateral stick activity whether the five task levels were significantly different. As a result, the most difficult task was eliminated from further data analysis. Furthermore, it could be shown that the two samples did not differ significantly with respect to lateral stick activity.

Workload data were analysed with generalizability theory using two different approaches.
The first analysis, based on averaging the workload scores of NASA-TLX and NASA-TLX-ZEIS, used a completely random model with the factors input load level of the flight control task, time of measurement, method (version of questionnaire), and persons.
The estimated variance components of these factors show that the factor method and the respective interactions do not contribute relevantly to variance. Therefore, NASA-TLX and NASA-TLX-ZEIS scores rank load levels and times of measurement highly consistently. Nevertheless, not all interactions can be estimated because of nested factors, which restricts a comparison of the two methods. A separate analysis of NASA-TLX and NASA-TLX-ZEIS therefore seemed advisable.

In this respect, NASA-TLX WWL scores document a continuous workload increase with load level and a decrease with time of measurement due to learning effects. Analysis of variance according to a random model shows an explained variance of 46% for input load, 33% for persons, and 7% for time of measurement. The generalizability coefficients reach values of 0.98 (r²_Rel) and 0.87 (r²_Abs). According to the confidence intervals, NASA-TLX allows discrimination between two load levels only, namely the lowest and the highest.

NASA-TLX-ZEIS results analysed according to the same model show a similar increase in workload with difficulty level. On the other hand, the explained variance is higher for persons (41%) than for input load (29%), and learning effects are less well reflected (1%). Nevertheless, the generalizability coefficients reach values of 0.95 (r²_Rel) and 0.82 (r²_Abs). The calculated confidence interval for the WWL scores shows that NASA-TLX-ZEIS also differentiates only between low and high workload conditions.
In contrast to NASA-TLX, the explained variance for persons is higher than for load level. This result shows that it was not possible to improve measurement characteristics with NASA-TLX-ZEIS. It is tentatively hypothesized that these shortcomings are caused by the instructions.

Considering both methods, the results justify the conclusion that the original form, NASA-TLX, is more adequate for workload measurement if screening is intended. Both instruments share the problem that they are based on summated workload scores, in which high scores can be offset by low ones, so that ergonomic design shortcomings may go undetected when the subscale values are ignored. Furthermore, the range of the subscale weights may be too small to represent the experienced differences, and the WWL score may include different subscales for different subjects. To solve these problems, multivariate generalizability theory is recommended.
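
The compensation problem of summated scores can be illustrated with a short sketch. The Python code below computes a weighted workload (WWL) score in the usual NASA-TLX manner, as the sum of the six subscale ratings multiplied by their pairwise-comparison weights and divided by 15. The ratings and weights are invented for illustration and are not data from the study; the function name is hypothetical.

```python
SUBSCALES = ("mental", "physical", "temporal",
             "performance", "effort", "frustration")

def wwl_score(ratings, weights):
    """Weighted workload score: sum(rating * weight) / 15.

    ratings: subscale ratings on a 0-100 scale.
    weights: tallies from the 15 pairwise comparisons (must sum to 15).
    """
    if sum(weights[s] for s in SUBSCALES) != 15:
        raise ValueError("weights must sum to 15")
    return sum(ratings[s] * weights[s] for s in SUBSCALES) / 15.0

# Invented weights for one hypothetical subject.
weights = {"mental": 4, "physical": 0, "temporal": 4,
           "performance": 2, "effort": 3, "frustration": 2}

# Profile A: moderate ratings on every subscale.
balanced = {s: 50 for s in SUBSCALES}

# Profile B: very high mental demand offset by very low temporal demand.
uneven = dict(balanced, mental=90, temporal=10)

score_balanced = wwl_score(balanced, weights)
score_uneven = wwl_score(uneven, weights)
```

Both profiles yield the same overall score although the second contains an extreme mental demand rating, which is why the subscale values need to be inspected alongside the summated score.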

Practical Relevance: Practical measurement of workload requires a sufficient sensitivity of methods used. The results show that NASA-TLX and NASA-TLX-ZEIS can be applied for workload measurement in flight control tasks for screening purposes. However, NASA-TLX should be preferred because of its higher sensitivity.


Multivariate analysis of the condition related reliability of strain indicators (Vol. 1-2/2002)

Authors: Martin Schütte and Peter Nickel

Keywords: Mental load, heart rate variability, performance quality, performance quantity, multivariate generalizability theory

Summary: Although the number of workplaces characterized by psychological or mental demands is increasing, a uniform theoretical model does not exist. However, concepts related to activation and attention are favoured, especially those considering both aspects simultaneously, because they allow a better description of the human processing systems and their degree of participation in task execution. Accordingly, the measurement of the effects of workload on the person has to take different criteria into account.
Consequently, a multivariate measurement procedure should be used. Here one important aspect is the reliability of profiles, especially their condition related sensitivity. Multivariate generalizability theory (GT) can be applied to test the measurement characteristics of such instruments. This method permits the determination of those experimental conditions that contribute to the covariation between dependent variables. A relative and an absolute G-coefficient can be calculated describing the correlation between universe and observed scores. The weights of the strain indicators on the extracted canonical variables indicate which indicators are relevant for the interpretation of a strain profile.

The study analyses the reliability of a measurement concept taking into consideration psychophysiological as well as performance related parameters. For creation of load conditions the AGARD-STRES was chosen consisting of 14 tasks from which the five reaction time tasks (reaction time basic, coded, inverse, double, uncertainty) and three more complex tasks (mathematical and spatial processing, grammatical reasoning) were selected. As dependent variables the execution times, the percentage of incorrect tasks solutions and the 0,1 Hz component of the heart rate variability (HRV) were registered.
The study was carried out with 14 employees of the University of Oldenburg. The persons performed the tasks twice in order to obtain information on the stability of the scores. At first all dependent variables were analysed separately according to a three factorial ANOVA (random effect model). The results for HRV show that the variance component for persons accounts for about 65 % of total variance. Furthermore, the component for the interaction between load conditions, replications and persons is noticeable (13 %) pointing to a reciprocal influence of all three factors. The component for replication is insignificant indicating that the scores are stable over time. The load conditions have only a small effect (6 %). The relative G-coefficient amounts to 0,85. Therefore, the HRV differentiates between load conditions with satisfactory precision. The absolute G-coefficient takes a value of 0,51 indicating that reliable statements concerning the absolute level of HRV are not possible. Task execution times are strongly influenced by load conditions (76 %).
The replications show no substantial effect. The variance component for persons takes a value of 9 %. Significant interactions do not occur. The high condition-related precision of task execution times is not surprising, since the reaction time tasks force persons into fast reactions, whereas the complex tasks set lower standards concerning time demands. The number of failures depends on load conditions (33 %). The factor persons also has an important effect (17 %). Replications show no substantial effect. The interaction between load conditions and persons, however, produced a remarkable effect (16 %), indicating that the number of failures is specific to persons and tasks. The three-way interaction (29 %) represents a further source of measurement error. The G-coefficients were 0.92 (relative) and 0.84 (absolute). Accordingly, the number of failures permits a precise discrimination of load conditions. The results of the multivariate analyses show that load conditions contribute to the covariances, since execution times are related both to HRV and to the number of failures. For the reaction time tasks the HRV takes small values, justifying the assumption that this variable primarily reflects time pressure. The values for the grammatical reasoning task are different, indicating not only that time pressure is less dominant here but also that the covariation depends heavily on this task. The concordance of execution times and the number of failures is better. Tasks with shorter execution times are characterized by a smaller number of failures and vice versa.
Based on the covariances, multivariate G-coefficients were calculated for the factor load conditions. For relative decisions, two canonical variables have a relative G-coefficient above 0.90.
Considering the weights of the variables, the first canonical variable represents task execution times. For the second canonical variable, HRV and the number of failures are relevant. For absolute decisions, two canonical variables again reach high G-coefficients of 0.97 and 0.84, respectively. The first canonical variable can be described by task execution times and the second by the number of failures.

The results are surprising because statements concerning differences in load conditions are possible using task execution times alone. The data do not allow unequivocal inferences about the causes of this result, but the covariance between HRV and execution times justifies the assumption that time pressure may be at work as an implicit factor. Furthermore, it should be investigated whether the interindividual variance of HRV can be reduced.
In general, multivariate GT not only supplies information on the reliability of a set of dependent variables but also indicates which conditions contribute to error variance. Both aspects facilitate the selection and application of instruments. In this study the weights of the variables were determined empirically. This is adequate if there is no precise knowledge about the relevance of the particular variables. Multivariate GT can also handle weights defined a priori. It is therefore possible to test the reliability of the same instrument for various applications requiring different weighting schemes.
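
The relation between variance components and the two G-coefficients can be illustrated with a small univariate sketch. It assumes a fully crossed conditions x persons x replications random effects design in which load conditions are the object of measurement; the function name, the dictionary keys and the example values are hypothetical and are not taken from the study.

```python
def g_coefficients(var, n_persons, n_replications):
    """Relative and absolute G-coefficients for load conditions (c).

    `var` maps effect names to variance components (any common unit,
    e.g. percent of total variance): c, p, r, cp, cr, pr, cpr.
    Assumes a crossed c x p x r design with one score per cell.
    """
    c = var["c"]
    # Effects that interact with conditions change the ranking of the
    # conditions and therefore count as relative error.
    rel_error = (
        var["cp"] / n_persons
        + var["cr"] / n_replications
        + var["cpr"] / (n_persons * n_replications)
    )
    # Absolute decisions are additionally affected by the effects of
    # the facets that are averaged over (persons, replications).
    abs_error = rel_error + (
        var["p"] / n_persons
        + var["r"] / n_replications
        + var["pr"] / (n_persons * n_replications)
    )
    return c / (c + rel_error), c / (c + abs_error)


# Hypothetical variance components, for illustration only.
example = {"c": 40, "p": 30, "r": 0, "cp": 10, "cr": 0, "pr": 5, "cpr": 15}
relative_g, absolute_g = g_coefficients(example, n_persons=14, n_replications=2)
print(round(relative_g, 3), round(absolute_g, 3))
```

Under these assumptions a large person component lowers only the absolute coefficient, which matches the pattern reported for HRV: a high relative and a clearly lower absolute G-coefficient.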

Practical Relevance: The study describes how multivariate G-theory can be used to determine the reliability of an instrument that takes several dependent variables into account. In particular, the experiment shows how this procedure can be applied to investigate the condition-related measurement characteristics of such instruments.


Diagnostic of job demands and cumulating strain consequences in call center jobs (Vol. 1-2/2002)

Authors: Peter Richter, Uwe Debitz and Frank Schulze

Keywords: load-strain measurement, task analysis, strain consequences, Call Center jobs

Summary: Health promotion in a permanently changing work environment requires a prospective design of flexible task sequences. Further development of evaluation and design methods based on a load-strain-coping model is necessary. This paper introduces an application of the objective analysis instrument TDS/REBA, which is used to examine highly interactive job demands as they are found in Call Centers. This method, combined with the BMS questionnaire technique, is designed to reveal strain consequences such as fatigue and monotony. Furthermore, the paper presents the factor structure of monopolar rating scales, the validity of which is tested for estimating short-term (hourly) strain consequences (oriented towards DIN EN ISO 10075-1). While fatigue and monotony clearly load on different factors, satiation and stress are traceable to one joint factor.
Weak but consistent correlations between the BMS and the rating scales are found. The presented approach will be developed further and tested on representative samples and under other work conditions.

The Tayloristic job structure has negative effects: it leads to significantly increased monotony and psychic satiation. These negative strain consequences do not occur in the first days of the week, however, but only on the fourth and fifth day of a continuous working week in the regular day shift. Enriched task structures (measurement 2000) show no strain changes and indicate stable well-being during the week. This significant increase of monotony and satiation, but not of fatigue and stress, under conditions of reduced job variety can be interpreted as a cumulation of insufficiently compensated underload.

In this study, 20 monopolar rating scales were used for the short-term diagnosis of strain in hourly measurements with n = 156 agents. The factor structure of a reduced set of 12 items proved to be extraordinarily stable over the eight recorded hours. The similarity coefficient Q = 98 between the first and the eighth working hour underlines this structural stability.
The factor structure of the entire data set could therefore be used. All factors show satisfactorily high internal consistency (Cronbach's alpha). Two factors can be interpreted unambiguously as fatigue and monotony. One factor shows a mixed structure combining the phenomenology of psychic satiation and of stress according to the theory of Lazarus.
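
For readers unfamiliar with the internal consistency measure, the following is a minimal sketch of Cronbach's alpha computed from raw item ratings. The function name and the toy ratings are hypothetical; the study's own data and software are not known from this abstract.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents.

    `scores` is a list of rows, one row per respondent, each row
    holding that respondent's ratings on the k items of one factor.
    """
    k = len(scores[0])
    # Sample variance of each item across respondents.
    item_variances = [variance(column) for column in zip(*scores)]
    # Sample variance of the summed scale score.
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical ratings of four agents on a three-item factor.
ratings = [[1, 2, 1], [2, 2, 3], [3, 4, 3], [4, 4, 5]]
print(round(cronbach_alpha(ratings), 2))
```

Values close to 1 indicate that the items of a factor vary together; what counts as satisfactory depends on the application.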

It therefore appears meaningful to label this factor satiation/stress. The first factor is related to the concept of positive affectivity and reflects positive engagement and well-being during work.

Correlations with the dimensions of the interval-scaled BMS questionnaire support this factor interpretation, although the phenomenological equivalence found in this study is not satisfactory. The correlations between the two procedures show the overlap of strain experiences. This corresponds with the frequently reported intercorrelations between the BMS scales, which reduce their diagnosticity. This concerns first of all the concept of psychic satiation, which, with its emotional "interferences", overlaps strongly with the phenomenology of fatigue and monotony. These results underline the methodological requirement for a multivariate diagnosis of performance/behaviour criteria, perceived strain and psychophysiological activation. Only their relation to objective profiles of job demands can enable a design-oriented analysis and strain evaluation.

The weak correlations between perceived mental effort (Zijlstra & van Doorn 1985) and the perceived strain recorded at the same time suggest that the two procedures identify different psychological states. Perceived mental effort is regarded as the phenomenological counterpart of the psychological costs of action regulation. As the correlations show, these costs increase with fatigue and stress consequences and reduce the commitment to continue working. However, the correlations are so weak that no generalisation can be made. Further investigations in other technologies are necessary.

Practical Relevance: A practicable method for estimating short-term (hourly) strain consequences is described. A Call Center design study shows that the diagnosis of strain effects should cover a full week in order to identify the cumulating effects of satiation and monotony found under conditions with reduced task content.


Mental workload and its effects: The position of the German Federal Institute for Occupational Safety and Health (Vol. 1-2/2002)

Authors: Peter Ullsperger and Armin Windel

Keywords: mental workload, occupational safety and health, ergonomic design principles, standardisation

Summary: The assessment of mental stress and strain and the ergonomic design of work-systems are essential for avoiding negative consequences not only on safety and health of the employees but also on their performance. As a consequence one of the most important activities of the Federal Institute for Occupational Safety and Health centres on mental workload. In this context the ISO 10075 is seen as a basis giving guidance on general terms and definitions and providing information for choosing appropriate methods depending on the purpose of the assessment.


Analysis of human errors in order-picking systems with laboratory experiments (Vol. 1-2/2002)

Author: Andreas Lolling

Keywords: order-picking, human reliability, human error, pick error, human error probability, design measures

Summary: Objective: The aim of the laboratory experiments is the investigation of selected influence factors with reference to human errors in order-picking systems.

Order-picking: Order-picking is the collecting of articles of an assortment of different articles in a warehouse or order-picking system in order to fulfill customer orders. Due to the flexibility of humans, order-picking is done manually in most companies. Therefore the pick error has a high effect on the quality which can be achieved when customer orders are order-picked.

Pick errors and their classification: A human error in order-picking systems, called pick error, can occur in four different ways: type error, quantity error, omission error, state error.

Examined influence factors and results: The laboratory experiments were carried out in an order-picking system for small parts which was designed for this purpose. Eight influence factors on the error rate with two different states each are examined.

For the statistical analysis a Poisson regression model is selected; the analysis is full-factorial. The descriptive analysis yields the following differences between the two states of each influence factor (in percent; the first state is the one with the smaller error rate):

· Work flow organisation (serial and parallel order-picking; 24 %)

· Pick list design (structured and unstructured pick list; 55 %)

· Identification of the storage location (structured and unstructured identification number; 39 %)

· Phase (phase 2 and phase 1; 39 %)

· Presentation of information (pick list and mobile data terminal; 150 %)

· Control (with and without article name; 103 %)

· Type of removal (in order and chaotic; 7 %)

· Payment (bonus payment and time payment; 14 %)

The descriptive statistical result for the "presentation of information" shows a difference of 150 % in favour of the mobile data terminal. This result is supported by the Poisson regression, which shows significance at an α level of 0.2 % (the test was designed to show significance at an α level below 5 %). Significance at the 5 % level could also be shown for the influence factors "pick list design" (3.4 % level) and "phase" (1.9 % level), but not for "control", although the descriptive statistics show a difference of more than 100 % between the two states. Despite the missing statistical significance for the factor "control" and the other influence factors, the descriptive percentages can be used to identify potential for reducing the error rate.
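
The percentages above can be read with a small sketch. It assumes that each difference is expressed relative to the smaller of the two error rates; the abstract does not state the base explicitly, and the function names and example rates are hypothetical.

```python
def percent_difference(lower_rate, higher_rate):
    """Difference between two error rates in percent of the lower rate.

    Assumption: the lower rate is the reference value.
    """
    return (higher_rate - lower_rate) / lower_rate * 100.0


def rate_ratio(percent):
    """Ratio of higher to lower error rate implied by a percent difference."""
    return 1.0 + percent / 100.0


# Hypothetical error rates (errors per 100 picks), for illustration only.
print(percent_difference(0.2, 0.5))
# Under the stated assumption, a difference of 150 % corresponds to this ratio:
print(rate_ratio(150))
```

Read this way, a difference of 150 % means that the less favourable state produces two and a half times as many errors as the more favourable one.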

Design measures: The use of mobile data terminals instead of pick lists has the highest potential of the examined influence factors to reduce the error rate in order-picking systems. If such technical equipment is introduced in an order-picking system, the work flow process should be redesigned as well in order to adjust it to the new technique.

If not already done, the employees should be given the possibility to control the removal of articles by providing the article names on the pick list or display of the mobile data terminal and at the storage location. This measure can be implemented with little investment.

Other effective measures which do not require large expenses are the structuring of the information on pick lists and in article or storage identification numbers. Some design rules are the combined use of numbers and letters for identification numbers, the structuring of information with hyphens or spaces, the use of not more than five digits in one group, the arrangement of related information in defined areas, and the application of a minimal writing size [mm] which equals the reading distance [mm] divided by 200.
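
Two of these design rules lend themselves to a quick check. The sketch below assumes that "group" means a part of an identification number separated by hyphens or spaces; the function names and sample identifiers are hypothetical.

```python
import re


def min_writing_size_mm(reading_distance_mm):
    """Minimal writing size: reading distance divided by 200 (both in mm)."""
    return reading_distance_mm / 200.0


def digit_groups_ok(identifier, max_digits=5):
    """Check that no hyphen- or space-separated group has too many digits.

    Assumption: groups are delimited by hyphens or whitespace.
    """
    groups = re.split(r"[-\s]+", identifier.strip())
    return all(sum(ch.isdigit() for ch in group) <= max_digits for group in groups)


print(min_writing_size_mm(600))      # reading distance of 60 cm
print(digit_groups_ok("A12-345"))    # structured identifier
print(digit_groups_ok("1234567"))    # unstructured identifier
```

A check of this kind could be run over an article master list before pick lists or storage labels are printed.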

Conclusion: If order-picking systems show error rates which are above average, it is often possible to introduce simple measures for error reduction without high investments. If technical aids are introduced which are supposed to support the employees (such as mobile data terminals), it is often necessary to redesign the work flow to adjust it to the new technique and to train the employees in the use of the technique. Apart from the potential of mobile data terminals or other technical equipment to reduce the error rate, in many small and medium sized companies manual order-picking with just a paper pick list is the most economical solution without major investments.

Practical Relevance: Due to the flexibility of humans, order-picking is done manually in most companies. Therefore the pick error has a high effect on the quality which can be achieved when customer orders are order-picked. Since it is necessary to know the decisive influence factors and their consequences in order to reduce pick errors effectively, the analysis carried out helps to evaluate some of the influence factors and to implement design measures.


The load on the lumbar spine during vertical lifting and horizontal placing of objects (Vol. 1-2/2002)

Authors: Matthias Jäger, Rainer Göllner, Claus Jordan, Andreas Theilmeier and Alwin Luttmann

Keywords: lifting, placing, lumbar load, biomechanics, assessment criteria

Summary: Manual materials handling is unavoidable in solving logistical problems such as the commissioning of goods. Considering the resultant stress on the working persons, the corresponding mechanical load on the lumbar spine in particular should be quantified and assessed with respect to possible overloading, since disease frequency and absenteeism are increased for the lower back. In this study, systematic load quantifications are provided for two typical manipulation tasks, vertical lifting and horizontal placing.

As, for ethical reasons, invasive measurement of mechanical indicators of the load on the lumbar spine cannot be performed in ergonomic investigations, the analyses of this paper are based on biomechanical model calculations using the 3-D dynamic software tool The Dortmunder which was formerly developed at the Institute for Occupational Physiology at the University of Dortmund. With the help of this system of biomechanical modellings of skeletal and muscular structures, movement and intra-abdominal pressure effects, lumbar load indicators – such as bending and torsional moments, compressive and shear forces, or pressure at the intervertebral discs – can be calculated for a wide variety of occupational manual materials handling tasks. The handling task under study is described regarding spatial orientation of the body segments, spinal curvature, position and distribution of the handled objects, action forces as well as the modifications of those input data during task execution, i.e. during the movement of body and load.

Vertical lifting was analysed for two-handed operations with symmetrical body and load configurations only. Grasp positions were assumed at floor or knee joint height, whereas load release was assumed at knee, hip, abdominal or shoulder height. Horizontal positions of the lifted objects were either close to or far from the body. Further variation of object weight (0 to 40 kg) and task performance velocity (duration 1 to 2 s) and inclusion or exclusion of inertial effects (dynamic vs. static analysis) led to lumbar load quantities for approx. 200 different typical lifting tasks. Horizontal placing of objects was studied for two-handed handling performed at abdominal or hip height, i.e. object transfer in an upright or almost upright posture. Five horizontal paths with an initially more lateral and finally a more medial position were assumed. Varying task conditions resulted in quantified lumbar load for 60 different typical placing tasks. For both vertical lifting and horizontal placing, time courses of predicted disc compression are presented as examples to show the principal temporal behaviour and the effect of the main influencing factors on lumbar load. As characteristics usable for later work design analyses, the maximum values for disc compression and shear were tabulated for all studied lifting and placing tasks.

The so-called Dortmund Recommendations, based on the structural strength of lumbar spine elements, permit the evaluation of manipulation tasks with regard to mechanical overload. As a typical application of the provided load values for single actions, the cumulated spinal load is described for a commissioning task, as an example of a more complex activity sequence, using dose models.
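
The idea of cumulating single-action loads into a dose can be sketched as follows. The abstract does not specify which dose models were used, so the power-law weighting, the default exponent and all numbers below are assumptions chosen for illustration only.

```python
def cumulative_dose(actions, exponent=2.0):
    """Cumulated spinal load for a sequence of handling actions.

    `actions` is an iterable of (disc_compression_N, duration_s, repetitions).
    Assumption: each action contributes force**exponent * duration; an
    exponent of 1 gives a linear dose, larger exponents weight high
    forces more strongly.
    """
    return sum(
        repetitions * (force ** exponent) * duration
        for force, duration, repetitions in actions
    )


# Hypothetical commissioning sequence: two lifting variants, one placing task.
sequence = [
    (3000.0, 2.0, 40),   # lifting from floor height
    (2000.0, 1.5, 60),   # lifting from knee height
    (1200.0, 1.0, 100),  # horizontal placing
]
print(cumulative_dose(sequence, exponent=1.0))
print(cumulative_dose(sequence, exponent=2.0))
```

Comparing the two exponents shows how strongly the choice of dose model shifts the relative weight of heavy but rare actions against light but frequent ones.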

Practical Relevance: Overloading the spine is one of the main reasons for diseases, on the one hand, and for absenteeism, on the other. To identify working situations that cause an excessive spinal load, quantitative measures of the load level during typical handling activities are provided together with assessment criteria. From the preventive point of view, such overloading can thus be avoided in the future.

 


Vol. 3/2002

Study in spatial knowledge acquisition of large-scale real world environments by training in virtual environments (Vol. 3/2002)

Authors: Dirk Schlender, Olaf H. Peters

Keywords: localisation, navigation, virtual environment

Summary: Acquisition of spatial knowledge in large-scale environments is usually based on the exploration of the "real world". If this is not possible, other training methods must be considered. In addition to maps or route descriptions, virtual environments present new possibilities to acquire knowledge. This study presents an experiment to examine how effectively virtual environments can be used to obtain spatial knowledge of a specific large-scale scenery. Our virtual environment generating equipment consisted of a graphics workstation, a binocular head-mounted display, a head-tracking system and a MIDI-based sound system. The generated environment contained a complex path network of 119 trail segments in a large wooded area. The size of the database as well as of the landscape was approximately 4 km x 3 km, and all objects in the database were shaded with photo-realistic textures. The virtual environment training was intended to allow the participants to become familiar with the actual landscape. Participants were passively moved through the virtual environment at a constant velocity of 40 km/h, but could control their field of view by physically rotating their head. No location in the environment offered an overview of all the paths. The duration of training was 30 minutes. Thirty-two subjects were randomly assigned to two training and two testing conditions: the training conditions were a) virtual environment training and b) no virtual environment training. During the test phase we compared one group equipped with an additional map and one group without a map. The supplied map contained the position of the starting point and four target points, but no information about the road network. Navigation performance data were collected in the real landscape (large wooded area). Because of the travelling distances, subjects used a bicycle in individual sessions. There was a time limit of 40 minutes for the complete navigation test.
We performed statistical analyses on the number of target points reached and on the distance travelled to reach the first target point. The results from both training conditions proved to be nearly the same. The training in the simulator yielded no better results. Independent of training conditions, navigation performance was better with a map than without. After the test in the real world, participants in all groups were required to draw the travelled route into a sketch map. The results of statistical analyses showed no correlation between test conditions and the ability to sketch the path. This indicates that after the experiment the participants showed no substantial differences in spatial knowledge. For the future, we are planning further investigations to compare different concepts for long-time training in large virtual worlds for a better understanding of the effect of different training methods and training time.

Practical Relevance: This research tested future applications for virtual environments in private, industrial and military tasks. Furthermore, these findings will help to design future telematics and navigation systems.

 

The effectiveness of various virtual reality elements in incident training courses (Vol. 3/2002)

Author: Ludwig Hub

Keywords: Process Safety, Incident Training, Process Simulation, Virtual Reality

Summary: The technique of dynamic simulation is increasingly used in teaching, instruction and incident training of plant personnel in the chemical industry. The basic task of the simulation is to calculate the time-dependent course of all relevant variables that characterise the real process. For the efficient training of plant personnel an additional feature of the simulation becomes important, namely the presentation of the results. It was found that training efficiency can be significantly increased when working with the simulator reproduces the trainee's daily experience with his real process. An ideal training simulation system confronts the trainee on the computer screen with a virtual reality which in relevant aspects is indistinguishable from the appearance of his workplace and from the behaviour of the plant he is in charge of. However, the development of a simulator that provides true and detailed virtual reality for a large number of different chemical processes is a demanding and expensive task. There are many facets of virtual reality, such as the depth and accuracy of the mathematical model used, the visual appearance of the control panel, or the possibility of a spatial view from the angle of a moving observer. For practical applications it was necessary to investigate which features of virtual reality are most important for the training of plant personnel. Based on experience with the simulation system ISIS, the impact of various features of the simulator on the efficiency of the training was evaluated. It was found that the choice of the simulated process, the visual representation of dynamic variables and the response of the model to the actions of the trainee are most influential. The mathematical model chosen for the training of plant operators should be generally identical with the process they are trained for.
Using the same names of materials, the same basic properties of the equipment and similar process behaviour shortens the time required for initial familiarisation with the simulator. It was also observed that even teaching some phenomena that are common to many different chemical reactions (e.g. the conditions that trigger a runaway reaction) is much more efficient when presented in terms of familiar circumstances. Thus, the participants displayed significantly increased attention when confronted with a situation that they considered to be a potential problem of "their" own process, compared with the presentation of the same situation on a "text-book" example or on an unknown reaction. Repetition of the same instructional material at a later time also indicated that the acquired knowledge was remembered much better than with classical teaching methods.
For the development of a suitable mathematical model that represents the reality of the participants it is not necessary to elaborate every detail of the process; the accuracy of the different parts can be adjusted depending on the goals of the training. More highly educated personnel, such as safety specialists or plant managers, showed higher flexibility with respect to the chosen process. When presented with some "foreign" reaction, they could more easily project the newly acquired knowledge onto their own process.
For the visual representation of the control panel great attention must be paid to the instruments that capture the time-dependent variables, e.g. recorders. Many plant operators observe the form of the recorded curves and use them as an important criterion to determine the state of the plant. Even the use of different axis scales can cause confusion. The shape of the simulated instruments, on the other hand, is of little importance. Round instead of rectangular meters and digital or analogue displays are readily accepted. Supervisors and plant managers often show similar difficulties when they evaluate time-dependent variables with different scales. Also important for efficient training are the expected responses of the simulator to the actions of the plant operator. If the model behaves in a familiar way, the trainee accepts the virtual reality as the representation of his plant. The instructor can easily check the surveillance abilities of the operator by introducing small disturbances. For this purpose the mathematical model has to be sufficiently accurate.
Deviations in the behaviour of the model from reality usually disturb the progress of teaching, because they are often interpreted as potential failures. In some cases, however, an observed difference between the expected and the simulated behaviour can lead to a better understanding of the process by the instructor or by the development chemist who supplied the data for the mathematical model. In general, plant operators are fairly sensitive to unexpected reactions of the plant. This sensitivity could be trained for the early detection of potentially dangerous situations. The method of dynamic simulation also offers the possibility of a new form of virtual reality. The simulator can provide information on variables that are usually difficult or impossible to measure, like the concentration of unstable intermediate products, the heat production of individual reactions in a complex reaction mechanism or the course of extremely fast processes. Currently available hardware and software tools make it possible to achieve efficient virtual reality with reasonable effort and expense. Training with the simulator can reduce the costs of education. Careful choice of the elements of the virtual reality supports the optimal development of a training simulator.

Practical Relevance: Training and instruction of plant personnel in the chemical industry with the help of simulators reduces the risk of losses caused by unnecessary incidents. The training itself is often expensive. The study of the effectiveness of various virtual reality elements enables the efficient development of incident training systems.

 

The application of a questionnaire (D-MEQ) for the identification of the diurnal type as criterion for shiftwork (Vol. 3/2002)

Author: Barbara Griefahn

Keywords: shiftwork, diurnal type, questionnaire

Summary: In the European Union shiftwork is performed by almost one fifth of the employees, of whom 20 to 30 % give it up within the first 2 or 3 years due to medical symptoms. A major cause of this dropout is probably an individual intolerance of shiftwork, which is particularly expected in persons with extreme circadian phase positions. These persons already have some problems adapting to a normal daily schedule, and these problems increase with abrupt alterations of the temporal regime, i.e. when performing shiftwork. Shiftwork is frequently accompanied by partial sleep deprivation, which is expected particularly in morning types during nightshift periods and in evening types during periods with early morning shifts, and, during night shifts, by a dissociation of physiological and psychomental functions. A resynchronisation of the sleep-activity cycle with the various physiological functions (adaptation) is possible for evening types and requires one day per hour of relative time shift. In morning types, however, the physiological functions remain dissociated throughout the whole nightshift period. To avoid deleterious consequences for health in the long run, it is advisable to perform a careful medical and psychological examination before assigning a person to shiftwork, and this examination should include the determination of the individual circadian phase position. Knowledge of the chronotype then facilitates the decision about the type of shift a person is able to cope with. As the phase position is a long-term personal trait, its identification should be considered before vocational training and retraining. For the identification of the diurnal type Horne and Östberg (1976) developed the Morningness-Eveningness-Questionnaire (MEQ).
The German version of the MEQ (the D-MEQ) was validated against the nadir of core body temperature and the temporal parameters of the individual courses of melatonin synthesis, which were determined during constant routines. To test its reliability, the D-MEQ was filled in twice with an interval of 7 to 12 weeks. It was furthermore completed by persons who live under different, either self-determined or heteronomous, daily schedules and whose subjective circadian phase position was again confirmed by the courses of melatonin concentrations. The results reveal that the D-MEQ is as valid and reliable an instrument for the determination of the individual diurnal type as the original English version. The D-MEQ is accessible via the internet. A printable hardcopy is available at www.ifado.de/fb.pdf, and an online version at www.ifado.de/chronotyp/index.asp. The D-MEQ consists of 19 questions, which can easily be answered within approximately 10 minutes. The manual evaluation of the hardcopy is also easy and takes a few minutes, whereas the online version provides the categorized phase position (diurnal type) immediately after completion. Knowledge of the diurnal type enables the introduction of individually directed preventive measures. These range from flexible working hours, through the application of bright light to accelerate the process of adaptation (resynchronization) to nightwork, to the exclusion of a person from certain jobs.
Early shifts are in general safe for morning types, whereas evening types are unable to sleep in advance and develop considerable sleep deficits, which cumulate over several successive shifts. Evening types should work this shift only occasionally. Late shifts are as a rule safe for both morning and evening types. Night shifts are, from the physiological point of view, undoubtedly the most problematic, as each change of shift is generally associated with partial sleep deprivation and internal dissociation of the physiological and psychomental rhythms. As these alterations are much more pronounced in morning types, they should perform night shifts only occasionally. These recommendations are by no means rigid rules. However, the identification of the diurnal type by means of the D-MEQ should be included in a broad medical and psychological evaluation and considered for shiftworkers.
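
How a questionnaire score could be turned into a diurnal type and a shift recommendation is sketched below. The cut-off values are those commonly cited for the original Horne and Östberg questionnaire (total score 16 to 86); whether the D-MEQ uses exactly the same limits is an assumption here, and the function names are hypothetical.

```python
def diurnal_type(score):
    """Map a total questionnaire score to a diurnal type.

    Assumption: the D-MEQ uses the cut-offs of the original MEQ.
    """
    if not 16 <= score <= 86:
        raise ValueError("score must lie between 16 and 86")
    if score >= 70:
        return "definite morning type"
    if score >= 59:
        return "moderate morning type"
    if score >= 42:
        return "neither type"
    if score >= 31:
        return "moderate evening type"
    return "definite evening type"


def shifts_to_limit(type_name):
    """Shifts that should be worked only occasionally, per the summary above."""
    if "morning" in type_name:
        return ["night shift"]
    if "evening" in type_name:
        return ["early shift"]
    return []


for score in (72, 50, 28):
    kind = diurnal_type(score)
    print(score, kind, shifts_to_limit(kind))
```

As the summary stresses, such output is a starting point for a medical and psychological evaluation rather than a rigid rule.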

Practical Relevance: This article presents a measuring instrument for recording the important aspect of the individual circadian phase position. It gives company doctors and industrial psychologists, as well as affected persons and decision makers, relevant information for designing working time.

 

Hand-Eye-Coordination and training during simulated monitor endoscopy (Vol. 3/2002)

Author: Felix Klimmer

Keywords: monitor endoscopy, hand-eye-coordination, transformations, training

Summary: Endoscopic surgery allows operations to be performed through small incisions. It is used in various disciplines such as laparoscopy and urology. To an increasing degree, these operations use video-based feedback instead of a direct view through the endoscope’s ocular. With video-based endoscopy, a camera is mounted on the endoscope and the image is presented on a video monitor. This may facilitate endoscopic surgery by enhancing the endoscopist’s view and reducing the need for the surgeon to assume awkward positions in order to peer into the endoscope. On the other hand, it also implies that the surgeon looks at a position other than the one where the actual operation is carried out, i.e., the surgeon is physically close to the patient, but the location of the visual scene (monitor) is not the same as the actual operative area (operation). This produces a new problem for endoscopists in that it forces them to develop transformations for translating visual information into movement. To study the development of such transformations, an apparatus was built that consisted of a closed 20 x 20 x 20 cm box with a hole in the center of its front. It contained ring-shaped targets at ten different spatial locations. These locations combined five horizontal angles (40° or 20° left, 0°, 20° or 40° right) and two depths (10 cm or 15 cm). Participants were required to hit the centre of each target. Hitting accuracy (3 concentric circles) was indicated by different frequencies of a feedback tone. Another tone indicated if the target was hit with too much force. This information is important because excessive force should be avoided in real operations as well.
Performance indicators were the speed of reaching the target and radial accuracy. This layout enabled the simulation of a video-endoscopic resection of the urinary bladder, as well as the acquisition of specific hand-eye transformations for such a task. One research question concerned the optimal regime for learning or training the most exact execution of hand and arm movements, especially if only a limited number of trials is available for learning an endoscopic task. The alternatives were blocked practice (i.e., each version of a task is practiced separately in its own block) and random practice (i.e., all versions of a task occur in random order). The results show that, for targets at all spatial positions, random practice produced a greater improvement in velocity and accuracy than blocked practice.
However, performance did depend on the spatial target location. The task was carried out more slowly the greater the horizontal angle (left and right), but significant differences between 0° and 40° were found only after blocked practice for a depth of 15 cm. Furthermore, the differences between the central target position and the right or left position were significant, but again only for 15 cm depth, and with higher values for targets at the right. The latter finding probably resulted from the position of the body relative to the ‘operating area’. This suggests that future research should look more specifically into the role of the design of the work situation and of the body position during endoscopy. The simulator box used for the investigations makes it possible to learn the transformations needed for the execution of an endoscopic task. For the learning and training of such tasks, random practice is recommended.
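
The difference between the two practice regimes can be made concrete with a small sketch that builds trial orders for the ten target locations. It is not the authors' procedure: the number of trials per target, the seed and the function names are assumptions chosen for illustration, and left angles are encoded as negative values.

```python
import random
from itertools import product

# Ten target locations: five horizontal angles (negative = left) x two depths.
ANGLES_DEG = [-40, -20, 0, 20, 40]
DEPTHS_CM = [10, 15]
TARGETS = list(product(ANGLES_DEG, DEPTHS_CM))


def blocked_schedule(targets, trials_per_target):
    """Blocked practice: all trials for one target, then the next target."""
    return [t for t in targets for _ in range(trials_per_target)]


def random_schedule(targets, trials_per_target, seed=0):
    """Random practice: the same trials, but in shuffled order."""
    schedule = blocked_schedule(targets, trials_per_target)
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    rng.shuffle(schedule)
    return schedule
```

Both schedules contain exactly the same trials, so any difference in learning can be attributed to the order of practice rather than to its amount.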

Practical Relevance: Video-based endoscopy may spare the endoscopist much of the fatigue associated with traditional urologic endoscopy. Yet the problems of this technology, concerning e.g. observation, manipulation, and eye-hand coordination, need to be identified and resolved. This is a prerequisite for the safe and efficient performance of such complex surgical tasks. Important further research topics are the learning and training of endoscopic tasks, especially with novices, as well as the improvement of endoscopic work conditions.

 

Study of human-seat interface pressure distribution depending on seat type, postures, and anthropometric characteristics (Vol. 3/2002)

Authors: Barbara Hinz, Lutz Gericke, Jürgen Keitel, Gerhard Menzel, Helmut Seidel

Keywords: Human-seat interface pressure, Driver seating, Backrest, Posture, Anthropometry

Summary: The article summarises the state of the art concerning pressure measurements at the human-seat interface in the work environment. Quantitative results have not been reported so far for different seat types and postures with consideration of real working conditions. The examination of the human-seat interface is essential for the further development of models of a sitting driver. An experimental study was performed in order to test the pressure distributions at the human-seat interface and to study quantitative results depending on seat types, postures and anthropometric characteristics. Nine male subjects with a body mass between 68.5 and 85.8 kg (mean value 75.2 kg) and a body height between 170.0 and 186.3 cm (mean value 175.8 cm) were selected for this study. Several anthropometric measures during standing and sitting were determined. 4 seats were included in the study: 2 damped driver seats (S1, S2), 1 hard seat (S4, without backrest), 1 automotive seat (S5). One driver seat (S1) and the automotive seat (S5) were also used without a backrest (S3 and S5, respectively). The subjects sat in 5 postures: (1) in a driving and (2) a bent forward posture, (3) in an erect upright position with arms crossed in front of the body, or (4) the hands on the legs, and (5) in a bent backward posture (only for seats without backrest). The measurement of pressure distribution at the human-seat interface was performed using the Pliance-system (novel, Munich). The measurement system comprises two pressure sensing mats (1024 and 256 sensors, thickness 2.5 mm) with capacitive sensors, two analysers with analogue amplifiers, a control-interface module and a data acquisition system. The pressure is measured perpendicular to the surface of the mat. The resulting forces (FZ/FX), the contact areas (AS/AR), and the maximum pressures (pS/pR) were determined for each experimental condition at the seat and backrest, respectively.
The coordinates of markers positioned over the joints were measured optically by means of the MacReflex (Qualisys) movement analysis system. 10 angles between body parts were calculated on the basis of the coordinates of joints.
The mean values of pressure parameters during sitting with or without backrest contact were compared by t-tests for paired samples. They exhibited significant differences. Regression analyses were performed to obtain regression equations for the prediction of the pressure parameters from the anthropometric characteristics. The results reflected the interaction between body shape, cushion material, cushion form and the actual posture. For the seats with backrest, the mean values of FZ at the seat ranged from 323 to 391 N, those of FX at the backrest from 19 to 90 N, with the lowest values during the bent forward posture. The contact areas at the seat displayed a remarkable dependence on the seat type. The smallest contact areas at the seat (914 - 935 cm²) were measured with S2, whereas the largest ones (1413 - 1417 cm²) were observed with S5. The mean values of the maximum pressure values were inversely related to the contact areas at the seat. For S2 the highest maximum pressure values (1.7 - 2.1 N/cm²) were registered, whereas the smallest maximum pressure values for S1 and S5 were in the range from 0.7 to 0.9 N/cm². With the hard seat (S4), FZ and the contact area were lower, and the maximum pressure was higher than for the soft seats. The effects of different postures were similar at seats with and without backrest. The mean values of FZ/FX, contact area and maximum pressure were significantly different for the conditions with and without backrest contact. Generally, higher mean values for FZ were found without backrest contact. Regression equations for the prediction of the variables FZ/FX, AS/AR and pS/pR were characterised by sufficiently high coefficients of determination.
The cumulative force distributions along the sagittal or frontal axes were presented as "force profiles", i.e. sums of forces accumulated across rows or columns of pressure sensors, respectively. These profiles help to identify and to compare pressure distributions under different conditions. The effects of posture on the normalised body weight supported by the seat were in conformity with results of previous studies. The differences between pressure values caused by various postures and the use of the backrest resembled those obtained with intradiscal pressure measurements (Wilke et al. 2001). The hypothesis is put forward that the use of the backrest can contribute to a substantial relief of the intraspinal load. The high shares of variance explained by regression equations suggest a promising method for a future prediction of pressure parameters at the human-seat interface. The distinct effects of seat type, posture and subject characteristics underline the significance of more sophisticated modelling of this interface. Further research is needed to provide comprehensive data, based on a larger group of subjects typical for the working population.
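
How the three pressure parameters and the force profiles relate to the raw mat readings can be sketched as below. This is a minimal illustration, not the Pliance software: the grid layout, the sensor area, the contact threshold and the function names are assumptions, and which mat direction corresponds to the sagittal or frontal axis depends on how the mat is placed.

```python
def seat_parameters(pressure, sensor_area_cm2, threshold=0.0):
    """Return (force in N, contact area in cm², maximum pressure in N/cm²).

    pressure: list of rows of sensor readings in N/cm².
    sensor_area_cm2: area of one sensor (hypothetical value supplied by caller).
    threshold: readings above this value count as contact (assumption).
    """
    readings = [p for row in pressure for p in row]
    force = sum(p * sensor_area_cm2 for p in readings)
    contact_area = sum(1 for p in readings if p > threshold) * sensor_area_cm2
    return force, contact_area, max(readings)


def force_profiles(pressure, sensor_area_cm2):
    """Sum the forces across each row and across each column of sensors."""
    by_row = [sum(row) * sensor_area_cm2 for row in pressure]
    by_column = [sum(col) * sensor_area_cm2 for col in zip(*pressure)]
    return by_row, by_column


# Made-up 3 x 3 reading (N/cm²) with 2 cm² per sensor.
mat = [
    [0.0, 0.5, 0.0],
    [0.5, 1.0, 0.5],
    [0.0, 0.5, 0.0],
]
force, area, p_max = seat_parameters(mat, sensor_area_cm2=2.0)
rows, cols = force_profiles(mat, sensor_area_cm2=2.0)
```

Each profile sums to the total force, which is why the profiles are a convenient way to compare how the same load is distributed under different seats and postures.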

Practical Relevance: New data and knowledge about the pressure distribution at the seat interface can contribute to a further development of an ergonomic driver seat design.

 


Vol. 5/2002

 

Does successful product design have particular process characteristics? (Vol. 5 – 2002)

Authors: Winfried Hacker, Annekatrin Wetzstein & Anne Römer

Keywords: Design problem solving · design activity · characteristics of the design

Summary

Innovative products are decisive for economic development. These products, whether machines, software or everyday objects, are designed by humans. However, little is known about the process of design and the possible characteristics of successful designing. Empirical investigations about the engineering design process have given heterogeneous results. Besides, explanations are missing as to why different procedures lead to solutions of comparable quality. Up till now no optimal strategy has been found, instead only single characteristics of successful design procedures in different investigations.

In analogy with epidemiological risk analysis, the aim of our study was to find those characteristics which are repeatedly found in connection with unsuccessful solutions, and those which can be observed in connection with successful solutions. We expected that the following process characteristics would occur more frequently with successful solutions than with unsuccessful solutions:

- a more extensive analysis of the work order,

- consideration of alternative solution possibilities,

- starting work on the whole design and then continuing with parts of the design,

- multimodal, graphic and conceptual representation of solutions,

- more frequent changing between mental and external, e.g. graphic, steps,

- alternating between a systematic procedure, where the work order is decomposed into parts, and a nonsystematic (mixed) procedure.

In our three studies the participants were volunteers without a design or engineering background and without knowledge of specific design problem solving strategies (n1 = 71, n2 = 73, n3 = 60). They had to design and sketch a garden grill that had to meet several requirements. The quality of the solutions, the working time required, and the different characteristics of the procedure were recorded and analysed. The results of the first study concerning the quality of the designed object, the sequence of design, the design of alternatives and the form of design could be replicated and shown more clearly in the second study with a different sample: three quarters of the participants combined in their procedure at least one whole design with the design of subparts, and more than three quarters of the participants (80%) began with a rough sketch of the whole design and then switched to the design of subparts. Fewer than one fifth of the participants began locally with the design of subparts. The working time increased with the frequency of the changes between the whole design and the design of subparts. Only one third of the persons submitted alternative solutions; in the first study this proportion was only one quarter. Almost 90% of the participants in addition described their design conceptually. A comparison of the extreme groups and an analysis of the times of the partial activities showed a positive relationship between the solution quality and the total working time, the times interpreted as thinking time, the designing of subparts in addition to designing the whole, the frequency of changes between the whole design and the design of subparts, and the designing of alternative solution possibilities.

In the third study we asked whether different strategy instructions (either systematic, opportunistic, or no strategy instruction) have an influence on the solution quality and which relations exist between the strategy instructions and the individual characteristics of the procedure. The results showed that for the untrained, "naive" participants, the three strategies did not differ significantly in their solution quality. That means that the hypothetically optimal strategy of the systematic-decomposing procedure does not produce a higher solution quality than a free or opportunistic procedure.

Nevertheless, some single characteristics that were associated with a higher solution quality in the first two studies, e.g. production of alternatives, occurred more frequently and more pronouncedly with the systematic strategy than in the no strategy group. A rather unexpected result was that the systematic strategy was on the one hand experienced as hampering the search for solutions whilst on the other hand there was a higher satisfaction with the solution than with the opportunistic strategy.

Thus, we assume that some of the solution-favouring single characteristics tend to occur in a systematic-decomposing procedure. However, this procedure is not appropriate as the only strategy, and it seems that the flexible application of mixed procedures such as an opportunistic procedure with systematic episodes is likely to be more successful with regard to performance and experienced workload.

Practical Relevance: The results of the studies presented here establish, firstly, important and necessary aspects of construction or design training, which should be considered in its didactical concept. Secondly, they reveal important factors for the task-appropriate design of work tools, e.g. CAD software.

 

Cause and Effect Analysis of the Leadership-Employee Relationship (Vol. 5 – 2002)

Authors: Dirk Mackau, Matthias Brüggmann & Holger Luczak

Keywords: Leadership · Employee Questioning · Measurement System · Evaluation

Summary

Customer requirements today force service-oriented companies to reorganise their business processes. This calls for closer contacts with customers and suppliers and for more flexible and efficient internal processes. An increasing number of management instruments, e.g. the Balanced Scorecard (BSC), have been developed to support these processes through the introduction of measurement indices. Nevertheless, the most common reason for low efficiency and flexibility is the lack of information in middle and lower management. This leads to problems in analysing their strategic programs, defined aims and related measures. They also lack knowledge about why the required degree of achievement has not been reached. In most cases today, the indices chosen are those that are simple to quantify. Unfortunately, the effect of the leadership-employee relationship has rarely been considered, owing to a lack of valid methods and indices for generating quantified information. When a BSC is used to analyse the human resource focus, the indices required mostly reflect the process of leadership. But this does not make it possible to discover most potentials for learning and development or other two-sided relations. The key to surviving competition on the market is therefore missed. A model-based system is used to describe leadership as a first criterion. Through abstraction and reduction it is thus possible to draw a close picture of the relations between cause and effect. This leads to two further criteria: the use of variables for transformation, and verification through statistics. This is the only way to give a valid presumption of the model. In this case a questionnaire was developed on the basis of a literature analysis, consisting of the Managerial Practice Survey (MPS) (based on the Multiple-Linkage Model by Yukl) and the Job Diagnostic Survey (JDS) (based on the Job Characteristics Model by Hackman and Oldham).
The remainder of this paper deals with the experience gained with this instrument in a German service business. Altogether 655 questionnaires were distributed in 20 decentralised maintenance units in this survey. The 344 evaluated questionnaires were split into two groups, one answered by the leadership (N = 112) and the other by the employees (N = 232). The data obtained were first examined for their meaningfulness. Then the JDS data were subjected to a correlation analysis with the intention of verifying the predictions made by the theoretical model. This paper shows that the results matched the model’s predictions; an examination of the MPS follows. Afterwards it is investigated to what extent there is a connection between the JDS items and the MPS. At least weak correlations are confirmed here as well. The paper closes with an examination of possible causes and effects. On the basis of the data on leadership behaviour and the average number of days absent from work, the samples are separated into four groups. These considerations end with a discussion of the reasons why each organisation unit belongs to its group. The available results allow more exact conclusions to be drawn about the effects of the leadership-employee relationship. As a consequence, the analysis of the connection between the state of health and the leadership behaviour makes it possible to diagnose the state of the individual organisation units. This diagnosis allows conclusions about potentials for improvement regarding the learning and development perspectives of the respective organisation unit.

Practical Relevance: This paper describes a method to analyse the causes and effects in complex situations like leadership behaviour in organisation units.

 

The Job-Exposure-Matrix as a tool for work-related analysis of morbidity data from Health Insurance Institutions (Vol. 5 – 2002)

Author: Wolfgang Boedeker

Keywords: Job-Exposure-Matrix · sickness absence data · drug prescriptions · occupational epidemiology · statistical models

Summary

Morbidity data of health insurance institutions are routinely used for work-related health monitoring. Unfortunately, apart from crude job codes these data do not include any information on the specific field of occupational activity and on risk factors at work. Consequently, in order to utilise health insurance data in occupational epidemiology, work load information has to be supplemented first. This could be done by means of a Job-Exposure-Matrix (JEM). The aim of this paper is to describe a JEM with respect to the job types and work load factors studied and to review suitable statistical models for the analysis of the relations between work load and morbidity.

In a JEM, work load information gathered by experts in a standardised way (the columns) is related to defined job types (the rows). While often exclusively applied to chemical exposures, this method has been extended to non-chemical work load and was even used to address psychological demands by quantifying work organization restraints. The JEM described in this study was set up for some 300 homogeneous job types covering approx. 150,000 employees. Work load information was compiled by an expert panel with respect to 70 work load items. Every job type was assessed with respect to frequency and partly to intensity of exposure. For quantification, items were a priori allocated to risk factors. Building up a JEM by job titles or job codes might lead to considerable misclassification, because working environment and tasks may differ tremendously within a job code. It is therefore recommended to build up job types by combining multiple sources of information.
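
The row-and-column structure of a JEM, and the way it supplements insurance records with work load information, can be sketched as follows. All job types, work load items, scores and field names below are made up for illustration; they are not taken from the JEM described in the paper.

```python
# Hypothetical JEM: rows are job types, columns are work load items,
# cells are expert ratings (here on an invented 0 to 3 scale).
JEM = {
    "warehouse picker": {"heavy lifting": 3, "time pressure": 2},
    "call center agent": {"heavy lifting": 0, "time pressure": 3},
}


def attach_exposure(records, jem, item):
    """Add the JEM rating for one work load item to each insurance record.

    Records whose job type is not covered by the JEM are skipped.
    """
    return [
        dict(record, exposure=jem[record["job_type"]][item])
        for record in records
        if record["job_type"] in jem
    ]


# Invented insurance records: job type plus number of sickness spells.
records = [
    {"job_type": "warehouse picker", "sickness_spells": 2},
    {"job_type": "call center agent", "sickness_spells": 1},
    {"job_type": "unknown job", "sickness_spells": 0},
]
linked = attach_exposure(records, JEM, "heavy lifting")
```

Every person in a job type receives the same rating, which is why heterogeneous job codes lead to the misclassification mentioned above.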

After assessing the work load, associations with the morbidity of employees can be analysed by use of suitable epidemiological-statistical models. The selection of these models depends basically on the scale of the morbidity parameter. Count data (e.g. number of sickness spells) can be analysed by multiple Poisson regression and negative binomial regression. For ordinal outcomes (e.g. health status bad, good, very good), in contrast, specific models within logistic regression are available. Most of the models can be treated within the framework of generalized linear models by selection of suitable link functions and error distributions. This is of great computational convenience, since these models are implemented in widely used statistical software packages.
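
As a rough illustration of the count-data case, the sketch below fits a Poisson regression with log link to sickness-spell counts and one binary work load factor, using plain Newton-Raphson steps instead of a statistics package. The data are invented and the function name is hypothetical; a real analysis would use several covariates and established software.

```python
import math


def poisson_rate_ratio(counts, exposed, iterations=25):
    """Fit log(mu) = b0 + b1 * exposed by Newton-Raphson; return exp(b1)."""
    b0 = math.log(sum(counts) / len(counts))  # start at the overall mean
    b1 = 0.0
    for _ in range(iterations):
        mu = [math.exp(b0 + b1 * x) for x in exposed]
        # Score vector (gradient of the log-likelihood).
        s0 = sum(y - m for y, m in zip(counts, mu))
        s1 = sum((y - m) * x for y, m, x in zip(counts, mu, exposed))
        # Information matrix [[i00, i01], [i01, i11]].
        i00 = sum(mu)
        i01 = sum(m * x for m, x in zip(mu, exposed))
        i11 = sum(m * x * x for m, x in zip(mu, exposed))
        det = i00 * i11 - i01 * i01
        # Newton step: solve the 2 x 2 system by hand.
        b0 += (i11 * s0 - i01 * s1) / det
        b1 += (i00 * s1 - i01 * s0) / det
    return math.exp(b1)


# Invented data: sickness spells of four unexposed and four exposed employees.
spells = [1, 0, 2, 1, 2, 3, 1, 2]
exposure = [0, 0, 0, 0, 1, 1, 1, 1]
rate_ratio = poisson_rate_ratio(spells, exposure)
```

With a single binary factor the fitted rate ratio equals the ratio of the two group means (here 2.0 against 1.0), which gives a simple check on the fit.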

In order to study the suitability of models for typical data of health insurance institutions and to provide recommendations for model selection for routine application, we used the following criteria:

- models should provide relative risk figures instead of odds ratios, since the latter in most cases overestimate the true risk;

- models should give the association between work load factor and morbidity parameter in only one estimate even when more than one category is under observation (e.g. good, bad, fair);

- models should be robust concerning the classification of the morbidity parameter, e.g. the overall estimation for a risk factor should not differ when sickness frequency is classified as 1-2 vs. 3-4 spells or 1-3 vs. 4 spells.
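
The first criterion can be illustrated with a 2 x 2 table. The sketch below computes both measures from invented counts; the function names and numbers are assumptions for illustration only.

```python
def relative_risk(a, b, c, d):
    """a, b: exposed cases / non-cases; c, d: unexposed cases / non-cases."""
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a, b, c, d):
    """Same cell layout as relative_risk."""
    return (a * d) / (b * c)


# Invented table: 40 of 100 exposed and 20 of 100 unexposed employees are ill.
rr = relative_risk(40, 60, 20, 80)
odds = odds_ratio(40, 60, 20, 80)
```

Here the risk is doubled, whereas the odds ratio is about 2.67. The gap widens as the outcome becomes more common, which is typical of sickness absence data.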

In the light of these criteria, no model studied is universal. With respect to count data, the situation is rather simple and Poisson regression is recommended. For ordinal data the so called sequential-binary-logistic-regression does not rely on specific assumptions and therefore could be used generally. However, by this method an odds ratio is calculated for every morbidity category. In contrast to widespread practice, the proportional odds model should not be used without checking the proportionality assumption.
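
A crude descriptive look at the proportionality assumption is to dichotomise the ordinal outcome at every possible cut point and compare the resulting odds ratios. The sketch below does this for invented counts; it is not a formal test, and the function name is hypothetical.

```python
def cumulative_odds_ratios(exposed, unexposed):
    """Odds ratio of 'worse than the cut' for each cut point.

    exposed, unexposed: counts per outcome category, ordered from best
    to worst health status.
    """
    ratios = []
    for cut in range(1, len(exposed)):
        a, b = sum(exposed[cut:]), sum(exposed[:cut])      # worse / better
        c, d = sum(unexposed[cut:]), sum(unexposed[:cut])  # worse / better
        ratios.append((a * d) / (b * c))
    return ratios


# Invented counts for the categories very good, good, bad.
ratios = cumulative_odds_ratios([20, 30, 50], [50, 30, 20])
```

In this made-up example both cut points give an odds ratio of 4.0, so a single proportional odds estimate would summarise the data well. Clearly different values across cut points would argue against the proportional odds model.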

Practical Relevance: A simple procedure for work-related health monitoring with data of the health insurance is introduced. Job-types are assessed with respect to the physical and psychosocial work load. The associations between work load and morbidity can then routinely be quantified by suitable statistical models.

 

The everlasting dream of the abolition of the assembly line: some thoughts on the past, present and future of a classical model of work organization (Vol. 5 – 2002)

Author: Ben Dankbaar

Keywords: assembly line · work organization · automation · quality of work · service work · car industry

Summary

The moving assembly line was a standard model for the organization of work during most of the twentieth century, not just in the automobile industry, but throughout the economy. The assembly line introduced the idea of flow production, which was known already in slaughterhouses, and combined it with the methods of scientific management (Taylor) for the analysis and calculation of work times. It has been argued that Taylor’s focus on hierarchical control should be differentiated from Ford’s emphasis on flow production, as the two have potentially different effects on work organization. This paper argues that this distinction can be used to understand the discussions concerning alternatives to the assembly line, which have existed almost as long as the assembly line has existed as a model of work organization. The alternatives proposed for the assembly line have traditionally focused on improving the quality of work and returning a measure of control to the workers. In this sense they had something to offer. However, the alternatives have not been convincing in their economic effectiveness, which was the driving force behind Fordist flow production. In the course of the 1990s, the traditional assembly line with short-cycled work regained the upper hand even in countries like Germany and Sweden, where alternative models had received considerable support. The first criticisms of short-cycled work on the assembly line were voiced by the Human Relations School in the USA in the 1930s. In the early 1950s the sociotechnical systems approach developed the semi-autonomous team as an alternative to Taylorist conceptions of work organization. Volvo was the first car manufacturer to apply these concepts in its Kalmar factory in 1974. It introduced automatically guided vehicles, which served as assembly platforms in parallel stations, allowing for longer assembly times per work station. This experiment was motivated mainly by a shortage of labour.
It was hoped that by improving the quality of work, work in car factories would become more attractive. The experiments were not driven by the desire to create more efficient assembly systems, even though car production had moved in a direction which potentially aggravated the inherent problems of assembly line production. The increasing number of functions available in a car as well as the increasing number of options offered to customers have greatly increased the logistical complexity of car production and consequently the size of unavoidable system losses. In a setup with parallel work stations it is potentially easier to deal with variety. These advantages, however, were not systematically explored in Sweden. Throughout the 1970s, various experiments in work organization were carried out in the same tradition, supported by large government programmes in Germany and other Northern European countries. The focus of these programmes was on the quality of work, and it turned out to be difficult to prove the economic viability of the alternative forms of work organization, because they were usually limited to re-arranging direct production work. It was argued that their economic viability would become visible if the enlargement of work on the shop floor were accompanied by the reduction of the indirect work force that it made possible. This hardly ever happened. In the 1980s, experiments with work organization attracted less attention, although they were still continued in various places. Instead, the focus of attention shifted to automation. The abolition of the assembly line was now expected from the introduction of ever increasing levels of automation. New concepts of work organization were developed for highly automated production environments, which incorporated some of the same elements as the alternative concepts of the 1970s: more autonomy on the shop floor, team work, upgrading of skills, more complex tasks, etc.
And indeed, automation did proceed in parts of the assembly line, especially in the body shop, where robots took over. However, over the past twenty years automation did not make much progress in other parts of the assembly line. Again, if you take the perspective of control, automation seems to hold great promise, but if you take the perspective of cost, there are definite limits to what automation can accomplish. In the course of the 1990s, whatever hesitant trend there was to move away from the assembly line in car production was clearly and drastically reversed. Under the influence of the productive successes of the Japanese car manufacturers, as proclaimed and explained by the famous MIT study on "lean production", car manufacturers all over Europe re-emphasized the importance of the traditional short-cycled assembly line as the core element of every automobile plant. It is true that in the modern literature on Japanese production concepts teams are frequently mentioned as an important element of work organization. In practice, however, such teams are not much more than a number of individual workers employed on a line segment. They have very little to do with the notions of autonomy and self-regulation connected to the sociotechnical concepts of the 1960s and ’70s.

The 20th century therefore ended in the same manner as it began: with the discovery of the assembly line. There is an important difference, however. A hundred years ago, the assembly line was the system of the future, paving the way for an economy characterized by mass production and mass consumption of industrial goods. The economy of the 21st century, on the other hand, is first and foremost a post-industrial service economy. Some of the proponents of "lean production" went as far as to proclaim their model as the model of this new century. This is extremely unlikely. On the contrary, models of work organization for the new age take their inspiration from the customer-oriented, knowledge-intensive, computer-supported environment of the professional service worker. In fact, this poses a completely new threat to the old-fashioned assembly line. To the extent that professional service work provides the overall model for work organization in our times, it will become increasingly difficult to make work on the assembly line attractive for modern workers.

Practical Relevance: This contribution argues that there is no convincing alternative to the assembly line in spite of various efforts to create such alternatives. The assembly line as model for work organization will nevertheless come under increasing pressure, because it cannot meet the standards of work in a post-industrial service economy.