Effect of the judge and definition of the trait for horse free jumping evaluation

The presented research investigates the effect of the judge on scores for horse free jumping skills, the agreement between judges' scores and the relations of these scores with jumping parameters measured by video image analysis, in order to recognise judging preferences and trait definition. The investigation was based on a group of 32 warm-blooded stallions that were judged in free jumping by six experienced judges in five routinely evaluated jumping traits. The horses were filmed simultaneously during jumping and linear jumping parameters were measured. Additional jumping parameters were calculated to describe the jumping style in more detail. The effect of the judge was estimated by analysis of variance, and the relationships between judges' scores and jumping parameters by analysis of correlations. The effect of the judge was statistically significant for all traits. The correlations between individual judges' scores were not equal. The scores for particular traits were in some cases more highly correlated with scores for other traits than with scores for the same trait. Mean scores for the evaluated traits were correlated with each other above 0.6. Correlations between judges' scores and measured jumping parameters were low to medium. Some jumping parameters were correlated with all traits whereas others were not correlated at all. The results showed that the definition of the evaluated traits is not the same for all judges. However, for all judges the distance of landing, elevation of the body and lifting of the front limbs were the most important parameters of the jump.


Introduction
The quality of breeding always depends on correct selection, which should be founded on proper and adequate evaluation of the selected traits. This is of special importance for traits that are scored subjectively, as is common in sport horse breeding. Identical or comparable definitions are also of main importance for every step of selection. Comparability of data definitions is the basic requirement for further national or international horse breeding evaluation (Koenen & Aldrige 2002). Decision-making is a very complicated task, especially for biological objects such as moving horses that must be evaluated in real time. The judges' visual cognition may be affected by unforeseen effects. Psychometric tests for static discrimination between two opposing stimuli (dots with parts placed »up« or »down«) may lead to only 65 % of positive judgements (Braddick 1997). The evaluation of objects in movement might therefore be even more difficult and affected by judges' expectations or bias in observation. The practice of judging requires certainty and understanding of the task, a stable performance level and a learned ability to activate specific parts of the brain (Braddick 1997). According to Funder (1999, cited by Morris et al. 2002), good judging depends on the trait (some traits are judged more easily than others), the judge (some persons are better judges than others) and the target (some individuals are judged more easily than others).
Recent research widely underlines the role of motivation and emotion in judging and in the decision-making process (Maner et al. 2007). Agreement between judges is treated as an indicator of good judging (Morris et al. 2002, Fuller et al. 2006, Lloyd et al. 2007). The subjective evaluation of traits provides basic information on the performance skills of a young horse. From the point of view of genetic research, evaluations made by judges seem to be a good predictor of the horse's future performance (Thorén-Hellsten et al. 2006). However, individual horse evaluations are not always comparable across countries (Koenen & Aldrige 2002, Koenen et al. 2004). Biomechanical studies underline the possible bias in the evaluation of horse jumping skills (Santamaria et al. 2006). Other authors wrote about possible systematic manipulation by owners and breeders as well as about insufficient and overlapping definitions (König von Borstel et al. 2012). Further investigations could be helpful for a better understanding of judging systems. The tools for studying horse biomechanics are advanced (Colborne 2004) and ready to serve as methods for the calibration of judgements. The aim of the present paper was to investigate the effect of the judge on horse free jumping evaluation, the agreement between judges' scores and their correlations with jumping parameters measured by video image analysis. The description of the evaluated skills and the judges' preferences will be ascertained by an analysis of the relations between intra- and inter-judge scores for jumping and of the relationships between these scores and kinematic measurements of the jump.

Horses and traits
The material was collected at a performance test station for young horses. The investigated group consisted of 32 warm-blooded stallions at the age of three years. The horses had been trained for a period of 100 days and the jumping test was conducted at the end of that period. Six experienced judges were asked to evaluate the horses' jumping skills according to the rules used for the free jumping evaluation of young horses. The following traits describing the jump were scored: »willingness to jump«, »ease of the jump«, »work of the front«, »work of the back« as well as »work of the trunk, head and neck«. The traits were not described more precisely. Horses were evaluated on the scale designed for performance test stations, ranging from 0 to 10 points for every separate trait.

Video analysis
The horses were simultaneously filmed during jumping by a digital Panasonic AG-EZ 35 camera (25 FPS) (Panasonic Corp., Osaka, Japan) set up at a distance of 10 m from the main obstacle, perpendicular to the horse's movement. Horses jumped in the riding hall and the line of obstacles was placed in the following order: 1) an indicator pole on the ground at a distance of 2.5 m before the first vertical obstacle, 2) the first vertical obstacle measuring 0.6 m at a distance of 6.4 m, 3) the second vertical obstacle of 0.6 m at a distance of 6.8 m and 4) a spread obstacle (doublebarre) with a fixed width of 0.8 m and a height ranging from 0.9 m to 1.2 m. The warm-up of all horses consisted of 20 min on the lunge (mostly trot). Horses jumped every height of the obstacle (0.9 m, 1 m, 1.1 m and 1.2 m) once or twice depending on the decision of the trainer. Almost all repetitions were observed at the fourth height of the obstacle. The investigated group of 32 horses performed 156 jumps in total. The filmed material was analysed with a manual programme for video image analysis (Cytowski & Sakowski 1998). The scale for measurements was obtained by measuring real distances on the wall behind the path of the horses' movement. The following linear kinematic parameters were considered: taking off, landing, lifting of the limbs and elevation of specific points of the trunk (bascule points) above the obstacle. In order to describe the silhouette of the horse during the jump in more detail, some additional parameters were calculated from the basic measurements: symmetry of limb lifting over the obstacle, »work« of the front and hind limbs, »work« of the head and croup, and curve of the upper body line. All measured and calculated parameters are defined in Table 1.
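The calculated parameters are simple differences and ratios of the measured ones, so they can be illustrated with a short sketch. The item numbers (1) to (9) and the formulas follow the definitions given in Table 1; the class name, field names and example values are hypothetical and serve only as illustration.

```python
from dataclasses import dataclass


@dataclass
class JumpMeasurements:
    """Measured linear parameters of one jump (metres); numbering as in Table 1."""
    take_off: float      # (1) taking off distance
    landing: float       # (2) landing distance
    lift_fl: float       # (3) lifting of front left limb
    lift_fr: float       # (4) lifting of front right limb
    lift_hl: float       # (5) lifting of hind left limb
    lift_hr: float       # (6) lifting of hind right limb
    elev_head: float     # (7) elevation of head
    elev_withers: float  # (8) elevation of withers
    elev_croup: float    # (9) elevation of croup


def derived_parameters(m: JumpMeasurements) -> dict:
    """Calculated parameters (10) to (17) as defined in Table 1."""
    front_mean = (m.lift_fl + m.lift_fr) / 2
    hind_mean = (m.lift_hl + m.lift_hr) / 2
    return {
        "symmetry_jump": m.take_off / m.landing,            # (10)
        "symmetry_front": m.lift_fl - m.lift_fr,            # (11)
        "symmetry_hind": m.lift_hl - m.lift_hr,             # (12)
        "work_head": m.elev_withers - m.elev_head,          # (13)
        "work_croup": m.elev_withers - m.elev_croup,        # (14)
        "curve_upper_line": m.elev_croup - m.elev_head,     # (15)
        "work_front_limbs": m.elev_withers - front_mean,    # (16)
        "work_hind_limbs": m.elev_withers - hind_mean,      # (17)
    }


# Made-up values for a single jump, not data from the study.
example = JumpMeasurements(
    take_off=2.0, landing=2.5,
    lift_fl=0.30, lift_fr=0.26, lift_hl=0.40, lift_hr=0.44,
    elev_head=1.10, elev_withers=0.95, elev_croup=1.00,
)
params = derived_parameters(example)
```

Keeping the measured values in one record per jump makes it straightforward to recompute all derived parameters whenever a measurement is corrected.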

Statistical analysis
The data were analysed with the statistical package SAS v9.1 (SAS Institute Inc., Cary, NC, USA) using different procedures. The aim of the study was addressed by the following analyses: the effect of the judge on the horse evaluation, the correlations between individual judges' scores and their means, and the correlations between judges' scores and jumping parameters.
The effect of the judge was estimated by analysis of variance using the MIXED procedure with the following statistical model:

$$y_{ijk} = \mu + a_i + J_j + e_{ijk}$$

where $y_{ijk}$ is the judge's score, $\mu$ is the mean, $a_i$ is the random effect of the horse ($i = 1, \ldots, 32$), $J_j$ is the fixed effect of the judge ($j = 1, \ldots, 6$) and $e_{ijk}$ is the error.

Table 1 (items 11 to 17, calculated parameters)
11. Symmetry of front limbs: difference between the lifting of front left (3) and the lifting of front right (4)
12. Symmetry of hind limbs: difference between the lifting of hind left (5) and the lifting of hind right (6)
13. »Work« of head: difference between the elevation of withers (8) and the elevation of head (7)
14. »Work« of croup: difference between the elevation of withers (8) and the elevation of croup (9)
15. Curve of the upper line: difference between the elevation of croup (9) and the elevation of head (7)
16. »Work« of front limbs: difference between the elevation of withers (8) and the mean of lifting of front limbs (3, 4)
17. »Work« of hind limbs: difference between the elevation of withers (8) and the mean of lifting of hind limbs (5, 6)

The relationships between the judges' scores for every trait were calculated as Pearson's correlations using the CORR procedure of the programme. The same procedure was used to find the relationships of each individual judge's score with the mean of all judges. Results were presented as the range of correlations between individual judges, the correlations between each individual judge and the mean of all judges, and the correlations between the means of all judges.
The relationships between measured jumping parameters and scores for the single traits evaluated for jumping skills were calculated as correlations corrected for effects that influence jumping parameters (height of the obstacle, successive number of the jump) using the MANOVA option in the GLM procedure. This analysis was carried out separately for every judge's score as well as for the mean score.
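The test of the judge effect in the model above can be illustrated with a minimal sketch. It does not reproduce the MIXED procedure: it assumes a complete, balanced table with exactly one score per horse and judge, for which the classical randomised-block F ratio (judge mean square over residual mean square, with horses as blocks) is the statistic a mixed-model fit would be expected to reproduce. The function name and the toy scores are hypothetical.

```python
def judge_effect_f(scores):
    """F ratio for the judge effect in a horses x judges table.

    scores[i][j] is the score of horse i given by judge j.
    Assumes a complete table with one score per cell (an assumption of
    this sketch, not a statement about the study's data layout).
    Returns (F, df_judge, df_error).
    """
    n_horses = len(scores)
    n_judges = len(scores[0])
    grand = sum(sum(row) for row in scores) / (n_horses * n_judges)
    horse_means = [sum(row) / n_judges for row in scores]
    judge_means = [
        sum(row[j] for row in scores) / n_horses for j in range(n_judges)
    ]
    # Sum of squares between judges.
    ss_judge = n_horses * sum((m - grand) ** 2 for m in judge_means)
    # Residual sum of squares after removing horse and judge effects.
    ss_error = sum(
        (scores[i][j] - horse_means[i] - judge_means[j] + grand) ** 2
        for i in range(n_horses)
        for j in range(n_judges)
    )
    df_judge = n_judges - 1
    df_error = (n_horses - 1) * (n_judges - 1)
    f_ratio = (ss_judge / df_judge) / (ss_error / df_error)
    return f_ratio, df_judge, df_error


# Toy example: 3 horses scored by 3 judges (made-up values).
toy_scores = [
    [6.0, 7.0, 8.0],
    [5.0, 7.0, 6.0],
    [7.0, 8.0, 9.0],
]
f_ratio, df_judge, df_error = judge_effect_f(toy_scores)
```

The significance level would then be read from the F distribution with the returned degrees of freedom; that step is left out here because it needs a statistics library.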
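The idea of a correlation corrected for effects that influence the jumping parameters can be sketched as follows. This is a simplified illustration, not the GLM procedure itself: it assumes a single classification effect (for example the height of the obstacle), removes the class means from both variables and correlates the residuals. With several effects the residuals would have to come from a model fitted to all of them. Function names and data are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def within_group_residuals(values, groups):
    """Subtract the mean of its own group from every value."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for value, group in zip(values, groups):
        totals[group] += value
        counts[group] += 1
    return [v - totals[g] / counts[g] for v, g in zip(values, groups)]


def adjusted_correlation(x, y, groups):
    """Correlation of x and y after removing one classification effect."""
    return pearson(
        within_group_residuals(x, groups),
        within_group_residuals(y, groups),
    )


# Made-up data: landing distance grows with obstacle height, the scores do not.
heights = [0.9, 0.9, 0.9, 1.2, 1.2, 1.2]
landing = [2.0, 2.2, 2.4, 2.6, 2.8, 3.0]
score = [6.0, 6.5, 7.0, 6.0, 6.5, 7.0]

raw_r = pearson(landing, score)
adjusted_r = adjusted_correlation(landing, score, heights)
```

In this toy example the raw correlation is diluted by the obstacle height, while the adjusted correlation recovers the relationship that holds within each height.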

Variance of judges' scores and jumping parameters
The mean scores of the six judges and their basic characteristics are presented in Table 2. Most of the scores were between 6 and 7 with a standard deviation of less than 1. Most scores were given within a range of 3 points; in some cases they were within a wider range of 4 points (13 %), and one judge used half of the scale (5 points) for only one trait. The investigated judging is comparable with the results obtained by other authors and in other countries. The cited values were between 6 and 7 with a standard deviation between 0.7 and 1.7 (Bruns et al. 2001). Some values of standard deviation in the presented study were below the level given by the cited authors working on the international horse breeding evaluation (Interstallion). This was mainly observed for the trait »willingness to jump«. The effect of the judge was statistically significant for all traits (P<0.001). In the case of »willingness to jump« and »work of front« one judge differed markedly from the others. All differences are presented in Table 3. The jumping parameters obtained in this study are presented in Table 2. The results were similar to the values measured on other groups of young horses in free jumping (Lewczuk 2007). Only lifting of the hind limbs seems to be higher in the group of horses presented in this paper. Comparable values were obtained for adult sport horses (Puchała 2005, Pietrzak et al. 2006). However, the standard deviations of the measured parameters of young horses were higher. Similarly, the coefficient of variation reached higher values.
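The basic characteristics discussed above (mean, standard deviation, range and coefficient of variation) can be computed as in the following sketch. It assumes the sample standard deviation with the n − 1 denominator; the function name and the three example scores are hypothetical.

```python
from statistics import mean, stdev


def describe(values):
    """Mean, sample SD, coefficient of variation (%) and range of a sample."""
    m = mean(values)
    sd = stdev(values)  # sample standard deviation (n - 1 denominator)
    return {
        "mean": m,
        "sd": sd,
        "cv_percent": 100 * sd / m,
        "range": max(values) - min(values),
    }


# Made-up scores of one judge for one trait.
summary = describe([6.0, 7.0, 8.0])
```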

Correlations between individual judges' scores and means for different traits and within a trait
The ranges of correlations between individual judges' scores for each trait are presented in Table 4. The results obtained were on a medium level. The highest correlations between individual judges' scores for the same trait were calculated for »ease of the jump« and »work of the front«. The lowest values were obtained for the traits »work of the back« and »work of trunk, head and neck«. In some cases correlations between individual judges' scores for different traits were higher than correlations between individual judges' scores for the same trait. This was observed for the traits »ease of the jump« and »work of the front« (judge 3), as well as for the traits »willingness to jump« and »ease of the jump« (judges 3 and 4). Some high correlations, on the level of 0.8-0.9, were also calculated between the judges' scores for different traits (judges 4 and 5). The lack of correlation between some individual judges' scores for the same trait was rather surprising. This was observed for »work of the trunk, head and neck«, »willingness« and »work of the front«.
The correlations between the means of the judges' scores for every trait and the correlations between every individual judge's score and the mean score for each trait are also presented in Table 4. The correlations between the mean scores for different traits were high and ranged from 0.62 for the traits »willingness to jump« and »work of the trunk, head and neck« to the highest value of 0.88 for the traits »ease of the jump« and »work of the front«. Some differences in the style of judging could be noticed on the basis of the correlation between the individual judge's score and the mean for the traits. It seems that two tendencies could be observed for the traits »ease of the jump« and »willingness to jump«. The correlations of three judges (1, 2, 3) were on an equal level of 0.5, while the correlations for the other judges (4, 5, 6) reached higher values of about 0.6-0.7. Another trend in judging could be observed on the basis of the calculations for the traits »ease of the jump« and »work of the front«. Except for two extreme values (0.6, 0.8), all others were around 0.65. The correlations between individual judges' scores and the mean of all judges ranged from 0.51 to 0.86. The most even values were calculated for the traits »work of the back« and »work of the trunk, head and neck«. The relationship between individual judges' scores and the mean of all judges was more differentiated for »work of the front« and »willingness to jump« than for other traits.
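The two kinds of summary reported here for one trait (the range of pairwise correlations between judges and the correlation of each judge with the mean of all judges) can be sketched as below. The sketch assumes that the mean includes the judge's own score, which is how "the mean of all judges" reads but is not stated explicitly; the function names and the toy scores are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def judge_agreement(scores_by_judge):
    """Summarise agreement between judges for one trait.

    scores_by_judge[j] holds the scores of judge j for the same horses
    in the same order. Returns ((lowest, highest) pairwise correlation,
    list of each judge's correlation with the mean of all judges).
    """
    pairwise = [pearson(a, b) for a, b in combinations(scores_by_judge, 2)]
    n_judges = len(scores_by_judge)
    n_horses = len(scores_by_judge[0])
    mean_scores = [
        sum(judge[i] for judge in scores_by_judge) / n_judges
        for i in range(n_horses)
    ]
    with_mean = [pearson(judge, mean_scores) for judge in scores_by_judge]
    return (min(pairwise), max(pairwise)), with_mean


# Made-up scores of three judges for four horses.
toy = [
    [6.0, 7.0, 8.0, 7.0],
    [6.5, 7.5, 8.5, 7.5],  # same ranking as judge 1, half a point higher
    [7.0, 6.0, 8.0, 7.0],
]
(lowest, highest), with_mean = judge_agreement(toy)
```

Because each judge contributes to the mean, the correlation with the mean is inflated relative to a mean computed from the remaining judges only, which is worth keeping in mind when comparing it with the pairwise values.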

Correlations between judges' scores and measured jumping parameters
The results obtained are presented in Table 5. All correlations were on a low to medium level, ranging from −0.29 to 0.42. Some jumping parameters, such as lifting of the hind limbs and symmetry of the hind limbs, were not correlated with the judges' scores at all, while others, such as »work« of the croup or the taking off distance, were correlated only with individual judges' scores. The landing distance and lifting of the front limbs were correlated with all traits evaluated by the judges. Elevation of the trunk (bascule points) was correlated with the traits »willingness«, »ease of the jump« and »work of the back«.

Table 5 (items 3 to 17 as extracted; for each trait the range of correlations for judges 1-6 is followed, where given, by the correlation for the mean score; the positions of empty cells were lost)
3. Lifting FL 0-0.20 0.15 0-0.23 0.18 0-0.24 0.18 0-0.24 0-0.24 0.21
4. Lifting FR 0.14-0.24 0.24 0.14-0.37 0.36 0-0.32 0.31 0-0.30 0.27 0.20-0.38 0.37
5. Lifting HL -0.16-0 -0.11-0 0-0.23 0.14
6. Lifting HR -0.20-0 -0.17-0 0-0.16
7. Elevation - head 0-0.33 0.25 0-0.25 0.17 -0.24-0.32 0-0.24 0.14 -0.29-0
8. Elevation - withers 0-0.25 0.21 0-0.27 0.21 -0.17-0.31 0-0.19 0.20 0-0.21
9. Elevation - croup 0-0.24 0.24 0-0.28 0.24 -0.14-0.35 0-0.27 0.23 0-0.23
10. Symmetry - jump -0.36-0 -0.31 -0.16-0
11. Symmetry - front -0.17-0 -0.26-0 -0.21 -0.21-0 -0.20-0 -0.22 -0.29-0.17-0.19
12. Symmetry - hind 0-0.15 0-0.18 0-0.20 0-0.18
13. »Work« of head -0.18-0 0-0.21 0-0.16 0-0.33 0.22
14. »Work« of croup -0.17-0 -0.18-0 -0.23-0 -0.20-0 0-0.16
15. Curve - upper line -0.15-0 0-0.23 0-0.20 0-0.31 0.24
16. »Work« of front limbs 0.15-0 -0.18-0 -0.29-0 -0.16 -0.16-0 -0.19-0
17. »Work« of hind limbs 0-0.42 0.28 0-0.32 0-0.39 0-0.22 0-0.18
J 1-6: range for judges 1-6

The highest correlations for the mean of all judges were obtained between the distance of landing and the trait »willingness to jump« (0.4) as well as the trait »ease of the jump« (0.37). Some correlations between lifting of the front limbs and the individual judges' scores reached values above 0.3. The highest correlations for individual judges' scores (0.45) were calculated between the parameter landing and »ease of the jump«. The most even correlations across individual judges' scores were noted for lifting of the front limb and »work of the trunk, head and neck«, for the distance of landing and the trait »willingness to jump«, and for lifting of the front limb and the trait »willingness to jump«. The same tendencies in judging were observed for four out of six judges in relation to the parameters of elevation of the trunk and limbs and the traits »willingness to jump« and »ease of the jump«. The most differentiated correlations between individual judges' scores for a specific trait were calculated for the parameter elevation of the head and the trait »work of front«; the results ranged from −0.24 to 0.32. The additional parameters »work« of the head and curve of the upper body line were correlated with the trait »work of the trunk, head and neck« on a low level of about 0.22 and 0.24, respectively. The parameter which was in negative correlation with the judges' scores was symmetry of the front limbs. The parameters which were related only to some individual judges' scores were »work« of front limbs, lifting of the hind limbs and symmetry of the hind limbs. Symmetry of the jump was the only parameter that was negatively correlated with all judges' scores. The level of the correlations between traits and jumping parameters is comparable with the French results, in which kinematic measurements were correlated with the judges' scores in the range between −0.36 and 0.18 (Dufosset & Langlois 1984).
The consistency of the judges' opinions is widely used as a predictor of good judgement. However, the interpretation of the values obtained has not always been the same. Landis & Koch (1977) suggested that values above 0.8 be classified as »excellent« and values above 0.4 as »moderate«. Martin & Bonnett (1987) qualified relationships between 0.3-0.5 as »acceptable«, between 0.5-0.7 as »good« and above 0.7 as »excellent«. According to Fleiss (1981, cited by Fuller et al. 2006), values above 0.4 are »acceptable«. The same level of importance was accepted by Keegan et al. (1998). Most of the relationships are presented as kappa (κ) values or Kendall's (W) statistic. However, as the scale of all of these coefficients is the same, ranging from 0 to 1, relationships calculated in different ways seem to be comparable in meaning. Most correlations obtained in the presented study for the same traits were above 0.6, therefore reaching the commonly accepted level.
Correlations between individual judges' scores were not always on an acceptable level, especially where there was a lack of correlation between some of them. On the basis of relations between the judges' scores, the character of the traits could also be discussed (Morris et al. 2002). The traits »willingness to jump«, »ease of the jump« and »work of front« could be considered easier to judge because of the judges' agreement on their evaluation. Higher correlations between individual judges' scores for different traits than between individual judges' scores for the same trait implied that the definition of traits may not be the same for all judges. The same could be concluded on the basis of the connections between judges' scores for the evaluated traits and jumping parameters. The discrepancy in the definition of the investigated traits is also observed on the basis of the correlations between an individual judge's scores and the mean, which in some cases showed two rather different ways of judging.
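As an illustration of how such thresholds can be applied, the following sketch labels a coefficient according to the classes attributed above to Martin & Bonnett (1987). The handling of values lying exactly on a boundary and the label for values below 0.3 are assumptions of this sketch, not part of the cited classification; the function name is hypothetical.

```python
def martin_bonnett_label(value):
    """Label an agreement coefficient using the classes quoted in the text.

    0.3-0.5 "acceptable", 0.5-0.7 "good", above 0.7 "excellent".
    Boundary values go to the higher class except 0.7 itself, and the
    label for values under 0.3 is a placeholder (both are assumptions).
    """
    if value > 0.7:
        return "excellent"
    if value >= 0.5:
        return "good"
    if value >= 0.3:
        return "acceptable"
    return "below acceptable"


# Values mentioned in the study: correlations between mean trait scores.
labels = [martin_bonnett_label(v) for v in (0.62, 0.88)]
```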
The need for a global system of scoring has been emphasized in veterinary medicine (Fuller et al. 2006) and in judging for breeding purposes (Thorén-Hellsten et al. 2006). It is also known that the scale used for judging may influence the results (Fuller et al. 2006). However, it seemed that even the same scale does not guarantee the same results. The way in which different scales are used may influence the results. The judge's personal point of reference, which is the basis for her/his judgement, seemed to affect the results even more significantly. The veterinary scale for judging lameness could be a very good example of this problem. Besides the scales of 0-5 points or 0-10 points, a correction relative to the last observation is also used (Fuller et al. 2006). This creates a situation in which the earlier observation is the reference point for the next judgement. The role of experience in judgement is obvious. It was suggested that people who handle horses regularly are much more likely to agree on scores (Lloyd et al. 2007). Subjective judgement and the problems connected with this kind of evaluation have been underlined in many papers on horse evaluation. Some authors suggested a new methodology for breeding value estimation based on official performance tests judged in the traditional way, precisely because of the narrow scale of evaluation used (Borowska et al. 2011). New weights for traits in horse indexes were also suggested because of differences in heritabilities of almost the same traits (observed in different time periods) judged by different experts (Dietl et al. 2005). Another possibility of correcting horse evaluation was discussed in a paper that studied the benefits of descriptive (linear) evaluation (Rustin et al. 2009). The usefulness of biokinematics in research on dressage horse judgement was underlined as an explanation for the introduction of new traits for gait evaluation (Becker et al. 2012, Becker et al. 2013). Some new possibilities were also seen in the use of behavioural tests (König von Borstel et al. 2012). This suggests that the existing lack of definitions and irregular judgement is a long-lasting international problem that should be solved by the introduction of new technologies and ideas.
In conclusion, it seems that the style of judging could be predicted on the basis of correlations between measured jumping parameters and the judges' scores. The correlations between these two sources of data obtained in the study were low to medium. The diversified results of individual judges make it possible to recognise some different tendencies in judging. Comparison of judges' scores within and between the evaluated partial traits indicates that the free jumping score is not precisely defined. Scores for detailed traits could be helpful in establishing the definition of the overall trait. Imprecise trait definition is hidden if only the mean score of the judges is analysed. Linear kinematic measurements provide a starting point for trait definition in horse free jumping. However, it could be expected that the correlations between the judges' scores and the measured jumping parameters would be on a more equal level for every individual judge. Judges' preferences become evident through the analysis of kinematic parameters and detailed traits. For all judges, distance of landing, elevation of the body and lifting of the front limbs were the most important parameters of the jump.
Table 1 (items 4 to 10, measured parameters)
4. Lifting of FR: distance between the highest pole of second stand of obstacle and the lowest point of front right limb above that pole
5. Lifting of HL: distance between the highest pole of second stand of obstacle and the lowest point of hind left limb above that pole
6. Lifting of HR: distance between the highest pole of second stand of obstacle and the lowest point of hind right limb above that pole
7. Elevation of head: distance between the highest pole of second stand of obstacle and the highest point of head (occiput) on bascule frame
8. Elevation of withers: distance between the highest pole of second stand of obstacle and the highest point of withers on bascule frame
9. Elevation of croup: distance between the highest pole of second stand of obstacle and the highest point of croup (sacrum) on bascule frame
10. Symmetry of the jump: ratio of the taking off distance (1) to the landing distance (2)

Table 2
Statistical characteristics of the jumping measurements and traits evaluated by judges

Table 3
The effect of the judge on the jumping traits
Means marked with the same letter differ significantly: small letters P≤0.05, capitals P≤0.001

Table 4
Correlations between scores of six judges for every evaluated trait
Corr: range of correlations

Table 5
Correlations between the individual judges' scores and measured jumping parameters