Looking for breed differentiating SNP loci and for a SNP set for parentage testing in Mangalica

The whole genome of Mangalica animals has been screened on the Illumina porcine chip giving the possibility (1) to replace the previously applied ten microsatellite markers by nine SNP loci to classify the Blond, Swallow-Belly and Red Mangalica individuals into three different breed groups (P>0.95); (2) to propose 54 SNP loci for parentage testing in Mangalica pigs where the exclusion probability is 0.999115 if one parent is known and the probability of identity is 1.54×10⁻²³.


Introduction
Since 40K-60K or even 500K SNP chips became commercially available, genome-wide screening and genome-wide association studies have been applied in domestic animals to establish SNPs as genetic markers or to identify genes through causative SNPs responsible for a given trait (Andersson 2009). To identify loci associated with mono- or multigenic traits, the number of animals selected for genome-wide association studies has varied from 10 to 50 in dogs (Andersson 2009), horses (Orr et al. 2010) and cattle (Huang et al. 2010), up to 311-820 individuals in cattle and pig (Kim et al. 2011, Fan et al. 2011). Searching for SNPs sensitive to population structure in sheep (Kijas et al. 2009) or pig (Matsumoto et al. 2012) has also been successful; the average number of animals per breed in these studies was 14 and 17, respectively.
SNP chips used in genome-wide association studies are also useful for selecting SNPs applicable in parentage testing (Matukumalli et al. 2009). In pigs, based on the Duroc, Landrace, Hampshire and Yorkshire breeds, 60 SNPs proved to be more powerful than ten microsatellite loci (Rohrer et al. 2007).
The present study aimed to lay the basis for replacing our previously applied microsatellite set, which was used both for differentiating the three Mangalica breed variants (Zsolnai et al. 2006) and for assigning individuals in paternity or forensic tests (unpublished). This effort is described here, starting with a comparison of genetic distances obtained with markers differing in type and number, continuing with genome-wide screening for a handful of SNP loci capable of breed differentiation, and ending with screening for an SNP set for parentage or identity tests.
As previously described (Egerszegi et al. 2003), Mangalica was bred in the 19th century by crossing now-extinct Hungarian and Mediterranean pig breeds (Figure 1). The breed was kept in very large numbers between the late 1800s and the mid 1900s. However, twice in its history it nearly disappeared. Scientific and breeding efforts supported its recovery (Brüssow et al. 2005). Nowadays, Mangalica has three existing varieties, Blond (BM), Red (RM) and Swallow-Belly (SBM), and the Mangalica population is growing considerably in Hungary. This is due to the high quality of its meat and meat products, which are favoured in different cuisines and for niche food products.

Analysis
For the calculation of pairwise population F_ST, SNPs with a call rate of 1 and a minor allele frequency greater than 0.05 were selected from the entire SNP dataset for each breed pair. The number of selected SNPs was 15297, 18208, 21057, 32273 and 33872 for the BM-SBM, BM-RM, SBM-RM, BM-Duroc and BM-Large White pairs, respectively. To calculate the F_ST values between breeds, a preinstalled Eigensoft package (Patterson et al. 2006) was used on a BioSmack Linux platform (Hong et al. 2012).
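As a rough illustration of how a pairwise F_ST can be computed from the filtered SNPs, the Python sketch below uses a Hudson-type ratio-of-averages estimator on per-breed allele frequencies. Whether this matches the estimator applied by the Eigensoft version used in the study is an assumption, and the function name and input layout are hypothetical.

```python
from typing import Sequence


def hudson_fst(
    freqs_a: Sequence[float],
    freqs_b: Sequence[float],
    n_chrom_a: int,
    n_chrom_b: int,
) -> float:
    """Hudson-type F_ST between two breeds over a set of biallelic SNPs.

    freqs_a / freqs_b: frequency of the same reference allele per SNP.
    n_chrom_a / n_chrom_b: sampled chromosomes per breed (2 x animals),
    taken as constant across SNPs because the call rate is 1.
    """
    if len(freqs_a) != len(freqs_b):
        raise ValueError("both breeds need one frequency per SNP")
    numerator = 0.0
    denominator = 0.0
    for p1, p2 in zip(freqs_a, freqs_b):
        # Squared frequency difference, corrected for sampling variance.
        numerator += (
            (p1 - p2) ** 2
            - p1 * (1 - p1) / (n_chrom_a - 1)
            - p2 * (1 - p2) / (n_chrom_b - 1)
        )
        # Probability that two alleles drawn from different breeds differ.
        denominator += p1 * (1 - p2) + p2 * (1 - p1)
    if denominator == 0:
        raise ValueError("no informative SNPs")
    # Ratio of sums (not mean of per-SNP ratios).
    return numerator / denominator
```

For the BM-SBM pair, the chromosome counts would be 48 and 66, corresponding to the 24 BM and 33 SBM animals genotyped.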
To find SNP loci able to differentiate the Mangalica breeds, we searched for differences in allele effect in pairwise comparisons in which each of Blond, Swallow-Belly and Red Mangalica was set against all other groups: BM vs. (RM+SBM+White), RM vs. (BM+SBM+White) and SBM vs. (RM+BM+White). We used SVS7 (Golden Helix, Bozeman, MT, USA) for principal component analysis-aided quality checks and genotype association tests; Genalex (Peakall & Smouse 2006) for building Structure-format datasets and for calculating exclusion and identity probability values; and Structure (Falush et al. 2003) for population assignment, with 10 000 burn-in and 50 000 MCMC steps. In the SVS program the effect of alleles was tested using the genotypic model and the chi-square test. Missing genotypes were not used as predictors.
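The genotypic chi-square comparison of one breed against all other groups can be illustrated with a generic Pearson test on a 2 x 3 table of genotype counts. This is a minimal sketch, not the SVS7 implementation; the function names and the handling of empty genotype classes are assumptions.

```python
import math
from typing import List, Tuple


def genotypic_chi_square(
    target: List[int], others: List[int]
) -> Tuple[float, int, float]:
    """Pearson chi-square on genotype counts (AA, AB, BB) of two groups.

    target: counts in the breed of interest (e.g. BM).
    others: counts in all remaining animals (e.g. RM+SBM+White).
    Returns (chi2, degrees_of_freedom, p_value).
    """
    if len(target) != 3 or len(others) != 3:
        raise ValueError("expected three genotype counts per group")
    # Genotype classes carried by nobody add no information; drop them.
    cols = [(t, o) for t, o in zip(target, others) if t + o > 0]
    row_t = sum(t for t, _ in cols)
    row_o = sum(o for _, o in cols)
    total = row_t + row_o
    df = len(cols) - 1
    if df < 1 or row_t == 0 or row_o == 0:
        return 0.0, 0, 1.0
    chi2 = 0.0
    for t, o in cols:
        col = t + o
        exp_t = row_t * col / total
        exp_o = row_o * col / total
        chi2 += (t - exp_t) ** 2 / exp_t + (o - exp_o) ** 2 / exp_o
    # Closed-form chi-square tail probabilities for 2 and 1 degrees of freedom.
    if df == 2:
        p_value = math.exp(-chi2 / 2)
    else:
        p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, df, p_value


def neg_log10_p(p_value: float) -> float:
    """Score used for ranking SNPs; larger means stronger association."""
    return math.inf if p_value <= 0 else -math.log10(p_value)
```

Ranking SNPs by `neg_log10_p` in each of the three comparisons gives the kind of ordered candidate list from which breed-discriminating loci can be picked.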
For parentage testing, SNPs with a call rate of 100 % and a minor allele frequency higher than 0.40 were selected. The chromosomal assignment and position of the 202 loci meeting these criteria were obtained from the marker list of the PorcineSNP60 chip (http://www.illumina.com/products/porcinesnp60_dna_analysis_kit.ilmn). The physical distance between adjacent SNPs positioned on the same chromosome was calculated from that list. Data filtering was performed with the SVS7 software.
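The call-rate and minor-allele-frequency filter described above can be sketched in Python as follows. The 0/1/2 genotype coding (copies of one allele, `None` for a missing call), the nested dictionary layout and the function names are assumptions made for illustration; the study itself performed the filtering in SVS7.

```python
from typing import Dict, List, Optional

Genotype = Optional[int]  # 0, 1 or 2 copies of allele B; None = no call


def call_rate(genotypes: List[Genotype]) -> float:
    """Fraction of animals with a called genotype."""
    if not genotypes:
        return 0.0
    return sum(g is not None for g in genotypes) / len(genotypes)


def minor_allele_frequency(genotypes: List[Genotype]) -> float:
    """Frequency of the rarer allele among called genotypes."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    freq_b = sum(called) / (2 * len(called))
    return min(freq_b, 1 - freq_b)


def select_snps(
    data: Dict[str, Dict[str, List[Genotype]]], min_maf: float
) -> List[str]:
    """Keep SNPs with call rate 1 and MAF > min_maf in every breed.

    data[snp_id][breed] holds the genotypes of that breed's animals.
    """
    kept = []
    for snp_id, by_breed in data.items():
        if all(
            call_rate(g) == 1.0 and minor_allele_frequency(g) > min_maf
            for g in by_breed.values()
        ):
            kept.append(snp_id)
    return kept
```

Calling `select_snps(data, 0.40)` corresponds to the parentage criterion, while a threshold of 0.05 corresponds to the F_ST criterion.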

Results
Before SNP genotyping, the Mangalica samples were checked with microsatellite loci. Population assignment against our core population data (Zsolnai et al. 2006) confirmed that the animals fell into their expected categories (P>0.9, data not shown). F_ST values were calculated pairwise for the three Mangalica breed variants using the SNPs selected for the described pairs, and were compared with the previously obtained microsatellite F_ST values (Zsolnai et al. 2006). The microsatellite- and SNP-based F_ST values displayed a strong correlation (r²=0.788). F_ST values are presented in Table 1.
The 24 breed-discriminating SNPs (Table 2) were tested with the Structure program to determine their ability to separate the Mangalica breed variants. All White animals were also included in the iterations. The 24 SNPs assigned the individuals to their corresponding groups with a probability higher than 0.8. Then, by systematically excluding the SNPs with lower −log10 P values from the iterations, sets of twelve, nine and six SNPs were used in the calculations and the assignment probabilities were determined. The best result for all animals (P>0.95) was achieved with nine SNPs (Figure 2). With the top-ranked six SNPs the group identification of Mangalica individuals improved further (P>0.98), but one White animal became misclassified (data not shown).
For pedigree control, 202 markers with a call rate equal to one and a minor allele frequency higher than 0.4 in each Mangalica breed were selected from the entire SNP data set.
The 202 loci were reduced to 54, based on the chromosomal location of the markers and the distance between two adjacent SNPs. The average physical distance between two adjacent loci on a given chromosome was 3.7×10⁷ bp. The 54 loci covered all but the Y chromosome of the swine genome (Table 3).
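The exact rule used to reduce the 202 loci to 54 is not given here, so the following Python sketch only illustrates one plausible approach: a greedy thinning that keeps a SNP when it lies at least a chosen distance from the previously kept SNP on the same chromosome. The `min_gap_bp` parameter, the tuple layout and both function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Locus = Tuple[str, str, int]  # (snp_id, chromosome, position_bp)


def thin_by_distance(loci: List[Locus], min_gap_bp: int) -> List[Locus]:
    """Greedy per-chromosome thinning by minimum spacing (assumed rule)."""
    by_chrom: Dict[str, List[Tuple[int, str]]] = defaultdict(list)
    for snp_id, chrom, pos in loci:
        by_chrom[chrom].append((pos, snp_id))
    kept: List[Locus] = []
    for chrom in sorted(by_chrom):
        last_pos = None
        for pos, snp_id in sorted(by_chrom[chrom]):
            # Always keep the first SNP; later ones need enough distance.
            if last_pos is None or pos - last_pos >= min_gap_bp:
                kept.append((snp_id, chrom, pos))
                last_pos = pos
    return kept


def mean_adjacent_distance(loci: List[Locus]) -> float:
    """Average gap between neighbouring loci on the same chromosome."""
    by_chrom: Dict[str, List[int]] = defaultdict(list)
    for _, chrom, pos in loci:
        by_chrom[chrom].append(pos)
    gaps: List[int] = []
    for positions in by_chrom.values():
        positions.sort()
        gaps.extend(b - a for a, b in zip(positions, positions[1:]))
    return sum(gaps) / len(gaps) if gaps else 0.0
```

Applying `mean_adjacent_distance` to a thinned panel gives the kind of summary figure reported above for the 54 loci.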
Using our set of 54 SNPs, the number of loci needed to reach different identity (the chance that two animals have the same genotype) and exclusion (excluding one animal as a parent) probability levels was determined. For identity, the numbers of SNPs needed at the 0.01, 0.001 and 0.0001 probability levels were five, eight and ten, respectively. For exclusion, 23 and 34 SNPs were needed at the 0.99 and 0.999 probability levels, respectively, when both parents were known.
When only one parent was known in the simulation, the corresponding numbers of SNPs were 36 and 54, respectively. When all 54 SNPs were included in the calculation, the exclusion probabilities were 0.999115 and 0.999985 for one and two parents, respectively, while the probability of identity was 1.54×10⁻²³. In White pigs, using a panel of 60 SNPs (Rohrer et al. 2007), the corresponding values were 0.997391 and 0.999982 for one- and two-parent exclusion, respectively, and the probability of identity was 4.55×10⁻²³.
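To show how such combined probabilities arise from per-locus values, the Python sketch below uses textbook Hardy-Weinberg formulas for a biallelic locus with allele frequencies p and q = 1 − p. It assumes unlinked loci and unrelated animals; the formulas actually applied by Genalex may differ in detail, and all function names are hypothetical.

```python
from typing import Callable, Iterable, Optional


def locus_identity(p: float) -> float:
    """Chance that two unrelated animals share a genotype at one locus."""
    q = 1 - p
    return p ** 4 + (2 * p * q) ** 2 + q ** 4


def locus_exclusion_other_parent_known(p: float) -> float:
    """Exclusion power when the other parent's genotype is available."""
    q = 1 - p
    return p * q * (1 - p * q)


def locus_exclusion_single_parent(p: float) -> float:
    """Exclusion power when only the candidate parent is genotyped."""
    q = 1 - p
    return 2 * (p * q) ** 2


def combined_identity(freqs: Iterable[float]) -> float:
    """Product of per-locus identity probabilities."""
    result = 1.0
    for p in freqs:
        result *= locus_identity(p)
    return result


def combined_exclusion(
    freqs: Iterable[float], per_locus: Callable[[float], float]
) -> float:
    """One minus the chance that no locus excludes a false parent."""
    not_excluded = 1.0
    for p in freqs:
        not_excluded *= 1 - per_locus(p)
    return 1 - not_excluded


def loci_needed_for_identity(
    freqs: Iterable[float], threshold: float
) -> Optional[int]:
    """Number of leading loci needed to push identity below threshold."""
    combined = 1.0
    for count, p in enumerate(freqs, start=1):
        combined *= locus_identity(p)
        if combined < threshold:
            return count
    return None
```

Under the idealised assumption that every locus has an allele frequency of exactly 0.5, the per-locus values are 0.375, 0.1875 and 0.125, respectively, and 54 such loci give a combined identity probability of the order of 10⁻²³. Real panels deviate from this because the observed allele frequencies are not exactly 0.5.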

Discussion
We have found in this breed differentiation study that F_ST values obtained from thousands of SNP loci and from microsatellites (Zsolnai et al. 2006) are highly correlated. The observed F_ST values very likely reflect the breeding history of Mangalica (see Figure 1). It is assumed that BM was developed first, by crossbreeding older Hungarian pig varieties with the Serbian Sumadia breed, and that BM was then used to breed both RM and SBM (Egerszegi et al. 2003). The F_ST data also indicated that the strongest genetic relationship is between BM and SBM and the weakest between RM and SBM, while the strength of the BM-RM relationship lies between them. Our study indicated that analysing thousands of SNP loci in dozens of animals per Mangalica breed group was as useful for characterisation as using ten microsatellite loci with more than 50 animals per breed. Similar SNP-based approaches have also been applied in Meishan and White crossbred pigs (Matsumoto et al. 2012).
Identification of a few SNPs for genotyping is desirable (Wang & Shete 2011). Whole SNP panelling is still expensive, while using only a few SNPs is more feasible both technically and financially. It was reported (Wilkinson et al. 2011) that about 200 SNPs are needed to separate sixteen closely related cattle breeds. Frkonja et al. (2011) systematically reduced the number of SNPs to 96 and 48 to detect the Red Holstein Friesian ratio in Swiss Fleckvieh individuals. We could demonstrate that as few as nine SNPs were sufficient for breed separation at a P>0.95 probability level in Mangalica pigs. However, it should be noted that such a nine-SNP approach can no longer describe the genetic variability in the studied breeds. On the other hand, these nine SNPs were chosen just for differentiation purposes. They not only distinguish between Mangalica varieties but also separate White pigs from Mangalica. It is suggested that, after validation, the intensity values (Huang et al. 2010) of such SNPs could be used to distinguish and quantify Mangalica and non-Mangalica ingredients in processed food. Thereby, quality control of trademarked (Mangalica) food products can be promoted. Garcia et al. (2006) described a similar application to determine the ratio of Iberico in ham products by microsatellites. In addition, an effective SNP panel could also be useful for forensic applications where degraded samples might prohibit microsatellite typing (Dixon et al. 2006).


The SNP-based comparison of breeds applied in our study could also be useful when ancestry-linked loci are sought among genetically more distinct breeds. In our example, a list of loci could be drawn up to show the great influence of selection procedures in the Mangalica breed.
We have also identified SNPs for parentage testing in Mangalica. Identity and exclusion testing can be achieved at a very high probability level with the selected 54 SNP loci. Rohrer et al. (2007) reported that 60 SNPs were useful in White pigs for parental exclusion, and their exclusion and identity probabilities are similar to the values obtained in our study.
In summary, it has become evident that microsatellite genotyping can be replaced successfully by SNP genotyping. Employing the Illumina PorcineSNP60 chip, breed characterization and parentage testing can be performed to describe the Mangalica breed variants in more detail.
Figure 1 Breeding scheme according to herd books

Samples
Blood samples of 80 Mangalica (24 BM, 33 SBM and 23 RM) and 63 non-Mangalica (White) pigs (10 Pietrain, 12 Large White, 3 H39 Hybrid, 12 Landrace, 12 Hampshire and 14 Duroc) were obtained from the Hungarian Pig Tissue Biobank, collected at different farms by the research consortium MANGFOOD. Samples were stored at −20 °C until DNA preparation. DNA was isolated from the samples using the Genomic DNA Maxi Kit (Geneaid, New Taipei City, Taiwan) according to the manufacturer's Frozen Blood Kit Protocol.

Figure 2
Assignment of Mangalica and White pigs into four clusters with the Structure program using nine selected SNPs (n=23 RM, 24 BM and 33 SBM, and 63 White pigs). Each animal is represented by a vertical bar broken into four segments, showing the individual's estimated membership fraction in each of the four clusters.

Table 1
F_ST values between Mangalica breed variants. Microsatellite values are based on ten microsatellite markers; SNP values are based on SNP markers with minor allele frequency greater than 0.05 in the BM, SBM and RM groups.
The SNP-based genetic distances between BM and other breeds were 0.24 to Duroc and 0.18 to Large White, which is similar to the microsatellite-based F_ST values reported by others (0.27 to Duroc and 0.21 to Large White; Garcia et al. 2006). The next experimental setups yielded 24 SNPs with high −log10 P values (Table 2).

Table 2
24 breed-discriminating SNP loci. A given Mangalica breed variant was compared to the other two Mangalica variants plus the White groups involved in the study to identify potential breed-discriminating SNP loci.

Table 3
Chromosomal positions of the 54 SNPs used for parentage testing in Mangalica pigs