The least absolute shrinkage and selection operator (LASSO) and adaptive
LASSO (ALASSO) methods have become popular in the last decade, especially for
data with a multicollinearity problem. This study was conducted to estimate the
live weight (LW) of Hair goats from biometric measurements and to select
variables in order to reduce model complexity by using the penalized
regression methods LASSO and adaptive LASSO for

Native goat breeds play important socio-economic roles in the livelihood strategies of poorer farmers, especially those in rural and hard-to-reach areas of the world. Turkey has one of the largest goat populations in the world and has one of the highest breeding rates. The total number of goats in the country is about 10.3 million and the dominant goat breed is the “Common”, or “Hair”, goat, which constitutes approximately 92 % of the total goat population in the country (TUIK, 2017). Goats have been kept for milk, meat, skin, and hair for several centuries in Anatolia (Gokdal, 2013).

Studies defining adult live weights and body measurements are of great importance for the characterization of farm animal breeds. Predicting body weight (BW) and determining its relationships with other biometric measurements generate considerable knowledge for breeding research on meat production per animal (Iqbal et al., 2013; Yılmaz et al., 2013; Khan et al., 2014). Multiple linear regression (MLR) based on ordinary least squares (OLS) is a traditional, simple method that researchers have used to model the complex relationship between live weight and body measurements in goats, sheep, cattle, fish, and other species (Francis et al., 2002; Pesmen and Yardimci, 2008; Yılmaz et al., 2013). However, when multicollinearity exists among the explanatory variables, the OLS method produces poor predictions (Montgomery et al., 2001; Yakubu, 2010; Dormann et al., 2013; Khan et al., 2014). Multicollinearity inflates the standard errors of the regression coefficients, which makes it difficult to assess the accuracy and robustness of the prediction models (Weisberg, 2005; Yakubu, 2009, 2010; Sangun et al., 2009).

Penalized methods, which minimize the residual sum of squares subject to a
penalty on the size of the coefficients, are an alternative to the OLS method
for data with multicollinearity problems. Ridge regression is one of them; it
overcomes the multicollinearity problem by using
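As a minimal illustration of how ridge regression stabilizes coefficients under multicollinearity, the Python sketch below (assuming NumPy is available) computes the closed-form ridge estimate (X'X + λI)^(-1) X'y on standardized predictors and a centered response; λ = 0 reduces it to OLS. The function names, λ = 10, and the simulated data are hypothetical and are neither the goat data nor the settings used in this study.

```python
import numpy as np

def standardize(X):
    """Center each column and scale it to unit standard deviation."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

def ridge_coefficients(X, y, lam):
    """Closed-form ridge slopes (X'X + lam*I)^-1 X'y on standardized X
    and centered y; lam = 0 gives the OLS slopes."""
    Xs = standardize(X)
    yc = y - y.mean()
    p = Xs.shape[1]
    return np.linalg.solve(Xs.T @ Xs + lam * np.eye(p), Xs.T @ yc)

# Hypothetical data: two nearly collinear "height" columns plus one
# independent column (not the Hair goat measurements).
rng = np.random.default_rng(0)
n = 35
height = rng.normal(70.0, 4.0, n)
X = np.column_stack([
    height,
    height + rng.normal(0.0, 0.2, n),   # almost a copy of column 0
    rng.normal(80.0, 5.0, n),
])
y = 0.4 * X[:, 0] + 0.5 * X[:, 2] + rng.normal(0.0, 2.0, n)

b_ols = ridge_coefficients(X, y, lam=0.0)
b_ridge = ridge_coefficients(X, y, lam=10.0)
print("OLS:  ", np.round(b_ols, 2))
print("Ridge:", np.round(b_ridge, 2))
```

With two nearly collinear columns, the OLS slopes for that pair tend to be unstable, whereas the ridge penalty shrinks the coefficient vector and spreads the effect over the correlated columns.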

The aim of this study was to estimate the LW of Hair goats from biometric
measurements for use in selection for genetic improvement and in field
breeding programs, to select variables in order to reduce model
complexity, and to determine the best model for explaining the variation in LW
by using ALASSO. To this end, multiple linear regression was first performed
to detect a potential multicollinearity problem; then the Ridge, LASSO, and ALASSO methods for

The data of the study comprised measurements from a total of 132 Hair goats from the Honaz district of Denizli province in Turkey. The data included age, gender, live weight, and 10 biometric measurements, all recorded in the breeding season: forehead width (FW), ear length (EL), head length (HL), chest width (CW), rump height (RH), withers height (WH), back height (BH), chest depth (CD), chest girth (CG), and body length (BL). Live weights of the goats were determined with a digital scale. CW, RH, WH, BH, CD, and BL were measured with a measuring stick, and FW, EL, HL, and CG were measured with a measuring tape.

The basic multiple linear regression model used to predict live weight
with the LASSO and ALASSO methods:
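For reference, the standard textbook forms of this model and of the criteria minimized by OLS, Ridge, LASSO, and ALASSO are written out below. The notation is assumed rather than taken from the original equations: y_i is the LW of goat i, x_ij is the j-th age-corrected biometric measurement (p = 10), λ ≥ 0 is the tuning parameter, γ > 0 is the ALASSO exponent, and the intercept is not penalized. Basing the ALASSO weights on OLS estimates is the usual choice, but it is an assumption here.

```latex
\begin{aligned}
y_i &= \beta_0 + \sum_{j=1}^{p} \beta_j x_{ij} + \varepsilon_i, \qquad i = 1, \dots, n, \\
\mathrm{RSS}(\boldsymbol{\beta}) &= \sum_{i=1}^{n} \Big( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \Big)^2, \\
\hat{\boldsymbol{\beta}}^{\mathrm{OLS}} &= \arg\min_{\boldsymbol{\beta}} \; \mathrm{RSS}(\boldsymbol{\beta}), \\
\hat{\boldsymbol{\beta}}^{\mathrm{Ridge}} &= \arg\min_{\boldsymbol{\beta}} \; \mathrm{RSS}(\boldsymbol{\beta}) + \lambda \sum_{j=1}^{p} \beta_j^2, \\
\hat{\boldsymbol{\beta}}^{\mathrm{LASSO}} &= \arg\min_{\boldsymbol{\beta}} \; \mathrm{RSS}(\boldsymbol{\beta}) + \lambda \sum_{j=1}^{p} |\beta_j|, \\
\hat{\boldsymbol{\beta}}^{\mathrm{ALASSO}} &= \arg\min_{\boldsymbol{\beta}} \; \mathrm{RSS}(\boldsymbol{\beta}) + \lambda \sum_{j=1}^{p} \hat{w}_j |\beta_j|,
\qquad \hat{w}_j = \frac{1}{|\hat{\beta}_j^{\mathrm{OLS}}|^{\gamma}}.
\end{aligned}
```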

In the OLS method,

ALASSO modifies the original LASSO penalty by assigning a weight to each
parameter in the penalty term. These weights are data-dependent,
The adjusted coefficient of determination (
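The usual definition of the adjusted coefficient of determination for n observations and p predictors (intercept excluded) is R²adj = 1 − (1 − R²)(n − 1)/(n − p − 1); whether this study counted p in exactly this way is an assumption. A small Python function with hypothetical inputs:

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 for a model with p predictors (intercept excluded)
    fitted to n observations."""
    if n - p - 1 <= 0:
        raise ValueError("need n > p + 1")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

# Hypothetical: R^2 = 0.80 from 35 animals and 7 selected predictors.
print(round(adjusted_r2(0.80, 35, 7), 4))  # 0.7481
```

The correction penalizes each extra predictor, so a smaller ALASSO model can match or beat a full model on R²adj even when its raw R² is slightly lower.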

The statistical evaluations were performed with the MEANS, CORR, GLM, and GLMSELECT procedures in SAS (2014). R software was used to create the correlation figure. The GLM procedure was used to remove the age effect before OLS was performed, and then the Ridge, LASSO, and ALASSO methods were applied.
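One common way to remove an age effect, sketched below under the assumption that age is treated as a classification factor (the text does not say how age entered the GLM model), is to fit a one-way model on age class and keep the residuals plus the overall mean. The function name and the numbers are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def age_corrected(values, age_classes):
    """Remove an age-class effect: subtract each class mean and add back the
    overall mean (residuals of a one-way model plus the grand mean)."""
    groups = defaultdict(list)
    for v, a in zip(values, age_classes):
        groups[a].append(v)
    class_mean = {a: mean(vs) for a, vs in groups.items()}
    grand = mean(values)
    return [v - class_mean[a] + grand for v, a in zip(values, age_classes)]

# Hypothetical live weights (kg) and age classes (years).
lw = [30.0, 34.0, 42.0, 46.0, 50.0, 54.0]
age = [1, 1, 2, 2, 3, 3]
corrected = age_corrected(lw, age)
print([round(v, 2) for v in corrected])
```

After correction every age class has the same mean, so the remaining variation reflects differences between animals rather than age.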

Descriptive statistics regarding live weight and some body measurements for male and female goats.

There were 35 male (26.52 %) and 97 female (73.48 %) goats in the study.
Descriptive statistics for LW and the biometric measurements (CW, RH, WH,
BH, CD, BL, FW, EL, HL, CG, and age) and the results of the univariate analysis
of variance for all variables in both genders are given in Table 1.
Significant differences were observed (

All analyses were performed after the data had been corrected for age.
Pearson correlation coefficients describing the relationships between live
weight and body measurements of Hair goats are presented by gender in Fig. 1:
Fig. 1a shows the values for males and Fig. 1b those for females. In Fig. 1,
correlation coefficients greater than 0.5 were statistically significant
for males (
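The significance of a Pearson correlation is usually tested with t = r·√(n − 2)/√(1 − r²) on n − 2 degrees of freedom. For example, r = 0.5 with n = 35 (the number of males, ignoring any degrees of freedom used by the age correction) gives t = √11 ≈ 3.32 on 33 df. A pure-Python sketch; the function names are hypothetical:

```python
from math import sqrt

def pearson_r(x, y):
    """Sample Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def t_statistic(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r^2), compared with Student's t, n - 2 df."""
    return r * sqrt(n - 2) / sqrt(1.0 - r * r)

print(round(t_statistic(0.5, 35), 3))  # 3.317
```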

OLS coefficients in multiple linear regression, tolerance, and VIF values for male and female goats.

Pearson correlation coefficients between live weight and biometric
body measurements for male

Regression coefficients, standard errors, tolerance values (TVs), and variance inflation factor (VIF) values for both genders are shown in Table 2. All explanatory variables in the model together explained 88.62 % of the variation in LW for males and 76.45 % for females. As shown in Table 2, several VIF values exceeded 10: the VIF values for RH, WH, BH, and CD in males were 77, 21, 51, and 11, respectively, and those for RH, WH, and BH in females were 18, 11, and 13, respectively.
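Tolerance and VIF are two views of the same quantity: tolerance_j = 1 − R_j², where R_j² comes from regressing predictor j on all the other predictors, and VIF_j = 1/tolerance_j. The rules "tolerance < 0.1" and "VIF > 10" therefore flag exactly the same variables. A minimal NumPy sketch with simulated, hypothetical measurements (not the study data):

```python
import numpy as np

def vif_and_tolerance(X):
    """For each column j, regress X_j on the other columns (with intercept);
    tolerance_j = 1 - R_j^2 and VIF_j = 1 / tolerance_j."""
    n, p = X.shape
    tol = np.empty(p)
    for j in range(p):
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
        resid = X[:, j] - A @ coef
        ss_res = resid @ resid
        ss_tot = ((X[:, j] - X[:, j].mean()) ** 2).sum()
        tol[j] = ss_res / ss_tot          # equals 1 - R_j^2
    return tol, 1.0 / tol

# Hypothetical heights: BH is almost WH; CG is unrelated to both.
rng = np.random.default_rng(2)
n = 97
wh = rng.normal(68.0, 4.0, n)
bh = wh + rng.normal(0.0, 0.5, n)
cg = rng.normal(85.0, 6.0, n)
X = np.column_stack([wh, bh, cg])
tol, vif = vif_and_tolerance(X)
flagged = vif > 10                        # same as tol < 0.1
print(np.round(tol, 4), np.round(vif, 1), flagged)
```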

The coefficients and the standardized coefficients of Ridge, LASSO, and
ALASSO (
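Standardized coefficients are usually obtained by rescaling each raw coefficient by a ratio of standard deviations, b*_j = b_j·s_xj/s_y, which makes coefficients of measurements taken in different units comparable; that the tables use exactly this definition is an assumption. A pure-Python sketch with hypothetical numbers:

```python
from statistics import stdev

def standardized_coefficients(coefs, x_columns, y):
    """b*_j = b_j * s_xj / s_y, using sample standard deviations."""
    s_y = stdev(y)
    return [b * stdev(x) / s_y for b, x in zip(coefs, x_columns)]

# Hypothetical: LW built exactly as 2 * x1, so x1 has b = 2 and x2 has b = 0.
x1 = [60.0, 64.0, 68.0, 72.0]
x2 = [80.0, 83.0, 79.0, 84.0]
lw = [2.0 * v for v in x1]
std_b = standardized_coefficients([2.0, 0.0], [x1, x2], lw)
print([round(v, 6) for v in std_b])
```

A predictor that LW depends on perfectly gets a standardized coefficient of 1, and a coefficient set to zero by LASSO or ALASSO stays zero after standardization.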

Coefficients and standardized coefficients of Ridge, LASSO, and ALASSO
(

VN: variable name; LW: live weight; FW: forehead width; EL: ear length; HL: head length; CW: chest width; RH: rump height; WH: withers height; BH: back height; CD: chest depth; CG: chest girth; BL: body length.

Coefficients and standardized coefficients of Ridge, LASSO, and ALASSO
(

VN: variable name; LW: live weight; FW: forehead width; EL: ear length; HL: head length; CW: chest width; RH: rump height; WH: withers height; BH: back height; CD: chest depth; CG: chest girth; BL: body length.

Coefficient progression with ALASSO (

Goodness-of-fit criteria for estimation equations of Ridge, LASSO, and ALASSO (

GFC: goodness-of-fit criteria; NV: number of variables;
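The goodness-of-fit criteria compared here (RMSE, AIC, SBC, and ASE) can be computed from the error sum of squares (SSE), the number of observations n, and the number of estimated parameters p. The sketch uses common textbook forms. The SAS GLMSELECT documentation gives its own definitions, which may add terms that do not depend on the model and therefore do not change the ranking of models fitted to the same data. The inputs are hypothetical.

```python
from math import log, sqrt

def fit_criteria(sse, n, p):
    """Common forms of the criteria; p counts all estimated parameters,
    including the intercept. Lower values indicate a better fit."""
    return {
        "RMSE": sqrt(sse / (n - p)),
        "ASE": sse / n,                         # average squared error
        "AIC": n * log(sse / n) + 2 * p,
        "SBC": n * log(sse / n) + p * log(n),   # Schwarz Bayesian criterion
    }

# Hypothetical: SSE = 100 from 35 animals, 7 predictors + intercept.
for name, value in fit_criteria(100.0, 35, 8).items():
    print(name, round(value, 3))
```

SBC penalizes each parameter by ln(n) instead of 2, so for n above about 7 it favors smaller models than AIC does.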

Because AIC was used as the selection criterion, the coefficient progression is presented against AIC in Fig. 2a and b, which visualizes the selection process. Variable selection was completed at the step that gave the lowest AIC value. As seen in Fig. 2, seven explanatory variables were selected for males (FW, EL, HL, WH, BH, CD, and CG) and five for females (FW, CW, WH, CG, and BL).
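The stopping rule described above, keeping the step of the selection path with the lowest AIC, can be sketched in Python as follows. The path, its SSE values, and the variable order are invented for illustration. In GLMSELECT the path and the SSE at each step come from the fitted LASSO/ALASSO coefficients, and its AIC definition may include additional constants.

```python
from math import log

def aic(sse, n, p):
    """Common form of AIC; p counts the intercept as a parameter."""
    return n * log(sse / n) + 2 * p

def select_by_aic(path, n):
    """path: list of (variables, SSE) in the order the selection adds terms.
    Returns the variable set and AIC of the step with the smallest AIC."""
    scores = [aic(sse, n, len(vs) + 1) for vs, sse in path]
    best = min(range(len(path)), key=scores.__getitem__)
    return path[best][0], scores[best]

# Hypothetical path for 35 animals (SSE values are invented).
path = [
    ([], 900.0),
    (["CG"], 300.0),
    (["CG", "WH"], 250.0),
    (["CG", "WH", "FW"], 245.0),
    (["CG", "WH", "FW", "EL"], 244.0),
]
chosen, best_aic = select_by_aic(path, n=35)
print(chosen, round(best_aic, 2))
```

Adding FW or EL lowers SSE only slightly, not enough to offset the 2-point AIC penalty per parameter, so selection stops at CG and WH in this made-up example.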

The present results show a significant difference between the genders
in body measurements (

The correlation between LW and CG was 0.87 for males and 0.83 for females (Fig. 1); CG had the highest correlation with LW in both genders. This agrees with the findings of previous studies (Pesmen and Yardimci, 2008; Cam et al., 2010; Tsegaye et al., 2013; Das and Yadav, 2015; Sam et al., 2016). The present study also focused on the correlations among the explanatory variables. Because these correlations were high and significant, we examined whether there was a multicollinearity problem. Previous studies have reported that tolerance values below 0.1 and VIF values above 10 indicate a multicollinearity problem (Montgomery et al., 2001; Yakubu, 2010; Dormann et al., 2013). According to the OLS results in MLR, the tolerance values for RH, WH, BH, and CD in males were 0.01255, 0.04779, 0.01947, and 0.08894, respectively, and the corresponding VIF values were 77, 21, 51, and 11 (Table 2). In females, the tolerance values for RH, WH, and BH were 0.05589, 0.09356, and 0.07891, respectively, with corresponding VIF values of 18, 11, and 13 (Table 2). These results revealed that the current data set had a multicollinearity problem for both genders. As emphasized by previous researchers, multicollinearity inflates the standard errors of the regression coefficients, which makes it difficult to assess the accuracy and robustness of the prediction models (Weisberg, 2005; Yakubu, 2009, 2010; Sangun et al., 2009).

In this study, where variable selection for data with multicollinearity is important, stepwise regression was not considered because previous studies have shown that stepwise regression has some limitations and problems (Fan and Li, 2001; Shen and Ye, 2002; Whittingham et al., 2006). Body weight has been predicted from body structural and udder morphological traits in Frizarta dairy sheep, and it was reported that stepwise and LASSO regression selected the same variables with equal goodness-of-fit measurements (Kominakis et al., 2009). However, Kominakis et al. (2009) did not address the multicollinearity problem.

In Ridge regression (in which coefficients of all explanatory variables are
estimated), the adjusted

When considering goodness-of-fit measurements for all methods (RMSE, AIC,
SBC, and ASE), except for Ridge regression, ALASSO (

In this study, the results from ALASSO (

In this study, LW was predicted from biometric measurements with high
accuracy for both male and female Hair goats by using ALASSO (

Data sets are available upon request by contacting the corresponding author.

The author declares that there is no conflict of interest.

I would like to thank Ibrahim Cemal and agricultural engineer Mustafa Varol for their permission to use the project data.

Edited by: Manfred Mielenz
Reviewed by: Ghobad Asgari Jafarabadi and one anonymous referee