Title: | Genome Wide Association Studies |
---|---|
Description: | Fast single trait Genome Wide Association Studies (GWAS) following the method described in Kang et al. (2010), <doi:10.1038/ng.548>. One of a series of statistical genetic packages for streamlining the analysis of typical plant breeding experiments developed by Biometris. |
Authors: | Bart-Jan van Rossum [aut, cre] , Willem Kruijer [aut] , Fred van Eeuwijk [ctb] , Martin Boer [ctb] , Marcos Malosetti [ctb] , Daniela Bustos-Korts [ctb] , Emilie Millet [ctb] , Joao Paulo [ctb] , Maikel Verouden [ctb] , Ron Wehrens [ctb] , Choazhi Zheng [ctb] |
Maintainer: | Bart-Jan van Rossum <[email protected]> |
License: | GPL-3 |
Version: | 1.0.9.9000 |
Built: | 2024-10-12 04:26:50 UTC |
Source: | https://github.com/biometris/statgengwas |
codeMarkers
codes markers in a gData
object and optionally
performs imputation of missing values as well.
The function performs the following steps:
replace strings in naStrings
by NA
.
remove genotypes with a fraction of missing values higher than
nMissGeno
.
remove SNPs with a fraction of missing values higher than
nMiss
.
recode SNPs to numerical values.
remove SNPs with a minor allele frequency lower than MAF
.
optionally remove duplicate SNPs.
optionally impute missing values.
repeat steps 5. and 6. if missing values are imputed.
codeMarkers( gData, refAll = "minor", nMissGeno = 1, nMiss = 1, MAF = NULL, MAC = NULL, removeDuplicates = TRUE, keep = NULL, impute = TRUE, imputeType = c("random", "fixed", "beagle"), fixedValue = NULL, naStrings = NA, verbose = FALSE )
codeMarkers( gData, refAll = "minor", nMissGeno = 1, nMiss = 1, MAF = NULL, MAC = NULL, removeDuplicates = TRUE, keep = NULL, impute = TRUE, imputeType = c("random", "fixed", "beagle"), fixedValue = NULL, naStrings = NA, verbose = FALSE )
gData |
An object of class |
refAll |
A character string indicating the reference allele used when
recoding markers. |
nMissGeno |
A numerical value between 0 and 1. Genotypes with a
fraction of missing values higher than |
nMiss |
A numerical value between 0 and 1. SNPs with a fraction of
missing values higher than |
MAF |
A numerical value between 0 and 1. SNPs with a Minor Allele
Frequency (MAF) below this value will be removed. Only one of |
MAC |
A numerical value. SNPs with Minor Allele Count (MAC) below this
value will be removed. Only one of |
removeDuplicates |
Should duplicate SNPs be removed? |
keep |
A vector of SNPs that should never be removed in the whole process. |
impute |
Should imputation of missing values be done? |
imputeType |
A character string indicating what kind of imputation of
values should be done.
|
fixedValue |
A numerical value used for replacing missing values in
case |
naStrings |
A character vector of strings to be treated as NA. |
verbose |
Should a summary of the performed steps be printed? |
A copy of the input gData
object with markers replaced by
coded and imputed markers.
S R Browning and B L Browning (2007) Rapid and accurate haplotype phasing and missing data inference for whole genome association studies by use of localized haplotype clustering. Am J Hum Genet 81:1084-1097. doi:10.1086/521987
## Create markers markers <- matrix(c( "AA", "AB", "AA", "BB", "BA", "AB", "AA", "AA", NA, "AA", "AA", "AA", "BB", "BB", "AA", "AA", "BB", "AA", NA, "AA", "AA", "BA", "AB", "BB", "AB", "AB", "AA", "BB", NA, "AA", "AA", "AA", "BB", "BB", "AA", "AA", "AA", "AA", NA, "AA", "AA", "AA", "BB", "BB", "AA", "BB", "BB", "BB", "AB", "AA", "AA", "AA", "BB", "BB", "AA", NA, "BB", "AA", NA, "AA", "AB", "AB", "BB", "BB", "BB", "AA", "BB", "BB", NA, "AB", "AA", "AA", NA, "BB", NA, "AA", "AA", "AA", "AA", "AA", "AA", NA, NA, "BB", "BB", "BB", "BB", "BB", "AA", "AA", "AA", NA, "AA", "BB", "BB", "BB", "AA", "AA", NA, "AA"), ncol = 10, byrow = TRUE, dimnames = list(paste0("IND", 1:10), paste0("SNP", 1:10))) ## create object of class 'gData'. gData <- createGData(geno = markers) ## Code markers by minor allele, no imputation. gDataCoded1 <- codeMarkers(gData = gData, impute = FALSE) ## Code markers by reference alleles, impute missings by fixed value. gDataCoded2 <- codeMarkers(gData = gData, refAll = rep(x = c("A", "B"), times = 5), impute = TRUE, imputeType = "fixed", fixedValue = 1) ## Code markers by minor allele, impute by random value. gDataCoded3 <- codeMarkers(gData = gData, impute = TRUE, imputeType = "random")
## Create markers markers <- matrix(c( "AA", "AB", "AA", "BB", "BA", "AB", "AA", "AA", NA, "AA", "AA", "AA", "BB", "BB", "AA", "AA", "BB", "AA", NA, "AA", "AA", "BA", "AB", "BB", "AB", "AB", "AA", "BB", NA, "AA", "AA", "AA", "BB", "BB", "AA", "AA", "AA", "AA", NA, "AA", "AA", "AA", "BB", "BB", "AA", "BB", "BB", "BB", "AB", "AA", "AA", "AA", "BB", "BB", "AA", NA, "BB", "AA", NA, "AA", "AB", "AB", "BB", "BB", "BB", "AA", "BB", "BB", NA, "AB", "AA", "AA", NA, "BB", NA, "AA", "AA", "AA", "AA", "AA", "AA", NA, NA, "BB", "BB", "BB", "BB", "BB", "AA", "AA", "AA", NA, "AA", "BB", "BB", "BB", "AA", "AA", NA, "AA"), ncol = 10, byrow = TRUE, dimnames = list(paste0("IND", 1:10), paste0("SNP", 1:10))) ## create object of class 'gData'. gData <- createGData(geno = markers) ## Code markers by minor allele, no imputation. gDataCoded1 <- codeMarkers(gData = gData, impute = FALSE) ## Code markers by reference alleles, impute missings by fixed value. gDataCoded2 <- codeMarkers(gData = gData, refAll = rep(x = c("A", "B"), times = 5), impute = TRUE, imputeType = "fixed", fixedValue = 1) ## Code markers by minor allele, impute by random value. gDataCoded3 <- codeMarkers(gData = gData, impute = TRUE, imputeType = "random")
This dataset comes from the European Union project DROPS (DROught-tolerant
yielding PlantS). A panel of 256 maize hybrids was grown with two water
regimes (irrigated or rainfed), in seven fields in 2012 and 2013,
respectively, spread along a climatic transect from western to eastern
Europe, plus one site in Chile in 2013. This resulted in 28 experiments
defined as the combination of one year, one site and one water regime, with
two and three repetitions for rainfed and irrigated treatments, respectively.
A detailed environmental characterisation was carried out, with hourly
records of micrometeorological data and soil water status, and associated
with precise measurement of phenology. Grain yield and its components were
measured at the end of the experiment.
10 experiments have been selected from the full data set, two for each of
the five main environmental scenarios that were identified in the data. The
scenarios have been added to the data as well as a classification of the
genotypes in four genetic groups.
The main purpose of this dataset
consists in using the environmental characterization to quantify the genetic
variability of maize grain yield in response to the environmental drivers
for genotype-by-environment interaction. For instance, allelic effects at
QTLs identified over the field network are consistent within a scenario but
largely differ between scenarios.
The data is split in three separate data.frames.
This data.frame contains the 50K genotyping
matrix coded in allelic dose (012) filtered and imputed. Genotyping of
41,722 loci on 246 parental lines were obtained using 50K Illumina Infinium
HD arrays (Ganal et al., 2011). Genotype were coded in allelic dose with 0
for the minor allele, 1 for the heterozygote, and 2 for the major allele.
Genotype were filtered (MAF > 1%) and missing data imputed using Beagle v3.
A data.frame with 246 rows and 41723 columns.
name of the genotype
coded QTLs
This data.frame contains the description of the
41,722 loci genotyped by 50K Illumina Infinium Array on the 246 lines.
A data.frame with 41722 rows and 5 columns.
name of the SNP
number of the B73 reference genome V2
position on the B73 reference genome V2 in basepairs
first original allele (A, T, G or C)
second original allele (A, T, G or C)
This data.frame contains the genotypic means
(Best Linear Unbiased Estimators, BLUEs), with one value per experiment
(Location × year × water regime) per genotype.
A data.frame with 2460 rows and 19 columns.
experiments ID described by the three first letters of the city’s name followed by the year of experiment and the water regime with W for watered and R for rain-fed.
identifier of donor dent line
identifier of the genotype
project in which the genetic material was generated
genotypic mean for yield adjusted at 15% grain moisture, in ton per hectare (t ha^-1)
genotypic mean for number of grain per square meter
genotypic mean for individual grain weight in milligram (mg)
genotypic mean for male flowering (pollen shed), in thermal time cumulated since emergence (d_20°C)
genotypic mean for female flowering (silking emergence), in thermal time cumulated since emergence (d_20°C)
genotypic mean for plant height, from ground level to the base of the flag leaf (highest) leaf in centimeter (cm)
genotypic mean for plant height including tassel, from ground level to the highest point of the tassel in centimeter (cm)
genotypic mean for ear insertion height, from ground level to ligule of the highest ear leaf in centimeter (cm)
year in which the experiment was performed
location where the experiment was performed, a three letter abbreviation
water scenario for the experiment, well watered (WW) or water deficit (WD)
temperature scenario for the experiment, Cool, Hot or Hot(Day)
the full scenario for the experiment, a combination of scenarioWater and scenarioTemp
the genetic group to which the genotype belongs
From the source data, the experiments from 2011 have been removed since these do not contain all genotypes. Also the experiment Gra13W has been removed.
Millet, E. J., Pommier, C., et al. (2019). A multi-site experiment in a network of European fields for assessing the maize yield response to environmental scenarios - Data set. doi:10.15454/IASSTN
Ganal MW, et al. (2011) A Large Maize (Zea mays L.) SNP Genotyping Array: Development and Germplasm Genotyping, and Genetic Mapping to Compare with the B73 Reference Genome. PLoS ONE 6(12): e28334. doi:10.1371/journal.pone.0028334
createGData
creates an object of S3 class gData with genotypic and
phenotypic data for usage in further analysis. All input to the function is
optional, however at least one input should be provided. It is possible to
provide an existing gData
object as additional input in which case
data is added to this object. Existing data will be overwritten with a
warning.
createGData( gData = NULL, geno = NULL, map = NULL, kin = NULL, pheno = NULL, covar = NULL )
createGData( gData = NULL, geno = NULL, map = NULL, kin = NULL, pheno = NULL, covar = NULL )
gData |
An optional gData object to be modified. If |
geno |
A matrix or data.frame with genotypes in the rows and markers in
the columns. A matrix from the |
map |
A data.frame with columns |
kin |
A kinship matrix or list of kinship matrices with genotype in
rows and colums. These matrices can be from the |
pheno |
A data.frame or a list of data.frames with phenotypic data,
with genotypes in the first column |
covar |
A data.frame with extra covariates per genotype. Genotypes should be in the rows. |
An object of class gData
with the following components:
map |
a data.frame containing map data. Map is sorted by chromosome and position. |
markers |
a matrix containing marker information. |
pheno |
a list of data.frames containing phenotypic data. |
kinship |
a kinship matrix. |
covar |
a data.frame with extra covariates. |
Bart-Jan van Rossum
set.seed(1234) ## Create genotypic data. geno <- matrix(sample(x = c(0, 1, 2), size = 15, replace = TRUE), nrow = 3) dimnames(geno) <- list(paste0("G", 1:3), paste0("M", 1:5)) ## Construct map. map <- data.frame(chr = c(1, 1, 2, 2, 2), pos = 1:5, row.names = paste0("M", 1:5)) ## Compute kinship matrix. kin <- kinship(X = geno, method = "IBS") ## Create phenotypic data. pheno <- data.frame(paste0("G", 1:3), matrix(rnorm(n = 12, mean = 50, sd = 5), nrow = 3), stringsAsFactors = FALSE) dimnames(pheno) = list(paste0("G", 1:3), c("genotype", paste0("T", 1:4))) ## Combine all data in gData object. gData <- createGData(geno = geno, map = map, kin = kin, pheno = pheno) summary(gData) ## Construct covariate. covar <- data.frame(C1 = c("a", "a", "b"), row.names = paste0("G", 1:3)) ## Compute alternative kinship matrix. kin2 <- kinship(X = geno, method = "astle") ## Add covariates to previously created gData object and overwrite ## current kinship matrix by newly computed one. gData2 <- createGData(gData = gData, kin = kin2, covar = covar)
set.seed(1234) ## Create genotypic data. geno <- matrix(sample(x = c(0, 1, 2), size = 15, replace = TRUE), nrow = 3) dimnames(geno) <- list(paste0("G", 1:3), paste0("M", 1:5)) ## Construct map. map <- data.frame(chr = c(1, 1, 2, 2, 2), pos = 1:5, row.names = paste0("M", 1:5)) ## Compute kinship matrix. kin <- kinship(X = geno, method = "IBS") ## Create phenotypic data. pheno <- data.frame(paste0("G", 1:3), matrix(rnorm(n = 12, mean = 50, sd = 5), nrow = 3), stringsAsFactors = FALSE) dimnames(pheno) = list(paste0("G", 1:3), c("genotype", paste0("T", 1:4))) ## Combine all data in gData object. gData <- createGData(geno = geno, map = map, kin = kin, pheno = pheno) summary(gData) ## Construct covariate. covar <- data.frame(C1 = c("a", "a", "b"), row.names = paste0("G", 1:3)) ## Compute alternative kinship matrix. kin2 <- kinship(X = geno, method = "astle") ## Add covariates to previously created gData object and overwrite ## current kinship matrix by newly computed one. gData2 <- createGData(gData = gData, kin = kin2, covar = covar)
A collection of functions for calculating kinship matrices using different algorithms. The following algorithms are included: astle (Astle and Balding, 2009), Identity By State (IBS) and VanRaden (VanRaden, 2008) for marker matrices. For method identity an identity kinship matrix is returned.
kinship( X, method = c("astle", "IBS", "vanRaden", "identity"), MAF = NULL, denominator = NULL )
kinship( X, method = c("astle", "IBS", "vanRaden", "identity"), MAF = NULL, denominator = NULL )
X |
An n x m marker matrix with genotypes in the rows (n) and markers in the columns (m). |
method |
The method used for computing the kinship matrix. |
MAF |
The minor allele frequency (MAF) threshold used in kinship computation. A numerical value between 0 and 1. SNPs with MAF below this value are not taken into account when computing the kinship. If NULL all markers are used regardless of their allele frequency. |
denominator |
A numerical value. See details. |
An n x n kinship matrix.
In all algorithms the input matrix X
is first cleaned, i.e. markers
with a variance of 0 are excluded from the calculation of the kinship matrix.
Then some form of scaling is done which differs per algorithm. This gives a
scaled matrix Z
. The matrix is returned.
By default the denominator is equal to the number of columns in
Z
for
astle
and IBS
and where
for
vanRaden
. This denominator
can be overwritten by the user, e.g. when computing kinship matrices by
splitting X
in smaller matrices and then adding the results together
in the end.
Astle, William, and David J. Balding. 2009. “Population Structure and Cryptic Relatedness in Genetic Association Studies.” Statistical Science 24 (4): 451–71. doi:10.1214/09-sts307.
VanRaden P.M. (2008) Efficient methods to compute genomic predictions. Journal of Dairy Science 91 (11): 4414–23. doi:10.3168/jds.2007-0980.
## Create example matrix. M <- matrix(c(1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 1, 1), nrow = 4) ## Compute kinship matrices using different methods. kinship(M, method = "astle") kinship(M, method = "IBS") kinship(M, method = "vanRaden") ## Only use markers with a Minor Allele Frequency of 0.3 or more. kinship(M, method = "astle", MAF = 0.3) ## Compute kinship matrix using astle and balding method with denominator 2. kinship(M, method = "astle", denominator = 2)
## Create example matrix. M <- matrix(c(1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 1, 1), nrow = 4) ## Compute kinship matrices using different methods. kinship(M, method = "astle") kinship(M, method = "IBS") kinship(M, method = "vanRaden") ## Only use markers with a Minor Allele Frequency of 0.3 or more. kinship(M, method = "astle", MAF = 0.3) ## Compute kinship matrix using astle and balding method with denominator 2. kinship(M, method = "astle", denominator = 2)
gData
Creates a plot of the genetic map in an object of S3 class gData
. A
plot of the genetic map showing the length of the chromosomes and the
positions of the markers in the genetic map is created.
## S3 method for class 'gData' plot(x, ..., highlight = NULL, title = NULL, output = TRUE)
## S3 method for class 'gData' plot(x, ..., highlight = NULL, title = NULL, output = TRUE)
x |
An object of class |
... |
Not used. |
highlight |
A data.frame with at least columns chr and pos, containing marker positions that should be highlighted in the plot. If a column "name" is present that is used for annotation, otherwise the highlighted markers are annotated as chr\@pos#' |
title |
A character string, the title of the plot. |
output |
Should the plot be output to the current device? If
|
An object of class ggplot
is invisibly returned.
set.seed(1234) ## Create genotypic data. geno <- matrix(sample(x = c(0, 1, 2), size = 15, replace = TRUE), nrow = 3) dimnames(geno) <- list(paste0("G", 1:3), paste0("M", 1:5)) ## Construct map. map <- data.frame(chr = c(1, 1, 2, 2, 2), pos = 1:5, row.names = paste0("M", 1:5)) ## Compute kinship matrix. kin <- kinship(X = geno, method = "IBS") ## Create phenotypic data. pheno <- data.frame(paste0("G", 1:3), matrix(rnorm(n = 12, mean = 50, sd = 5), nrow = 3), stringsAsFactors = FALSE) dimnames(pheno) = list(paste0("G", 1:3), c("genotype", paste0("T", 1:4))) ## Combine all data in gData object. gData <- createGData(geno = geno, map = map, kin = kin, pheno = pheno) ## Plot genetic map. plot(gData) ## Plot genetic map. Highlight first marker in map. plot(gData, highlight = map[1, ])
set.seed(1234) ## Create genotypic data. geno <- matrix(sample(x = c(0, 1, 2), size = 15, replace = TRUE), nrow = 3) dimnames(geno) <- list(paste0("G", 1:3), paste0("M", 1:5)) ## Construct map. map <- data.frame(chr = c(1, 1, 2, 2, 2), pos = 1:5, row.names = paste0("M", 1:5)) ## Compute kinship matrix. kin <- kinship(X = geno, method = "IBS") ## Create phenotypic data. pheno <- data.frame(paste0("G", 1:3), matrix(rnorm(n = 12, mean = 50, sd = 5), nrow = 3), stringsAsFactors = FALSE) dimnames(pheno) = list(paste0("G", 1:3), c("genotype", paste0("T", 1:4))) ## Combine all data in gData object. gData <- createGData(geno = geno, map = map, kin = kin, pheno = pheno) ## Plot genetic map. plot(gData) ## Plot genetic map. Highlight first marker in map. plot(gData, highlight = map[1, ])
GWAS
Creates a plot of an object of S3 class GWAS
. The following types of
plot can be made:
a manhattan plot, i.e. a plot of LOD-scores per SNP
a QQ plot of observed LOD-scores versus expected LOD-scores
a qtl plot of effect sizes and directions for multiple traits
Manhattan plots and QQ plots are made for a single trait which
should be indicated using the parameter trait
. If the analysis was
done for only one trait, it is detected automatically. The qtl plot will plot
all traits analyzed.
See details for a detailed description of the plots and the plot options
specific to the different plots.
## S3 method for class 'GWAS' plot( x, ..., plotType = c("manhattan", "qq", "qtl"), trial = NULL, trait = NULL, title = NULL, output = TRUE )
## S3 method for class 'GWAS' plot( x, ..., plotType = c("manhattan", "qq", "qtl"), trial = NULL, trait = NULL, title = NULL, output = TRUE )
x |
An object of class |
... |
Further arguments to be passed on to the actual plotting functions. |
plotType |
A character string indicating the type of plot to be made. One of "manhattan", "qq" and "qtl". |
trial |
A character string or numeric index indicating for which
trial the plot should be made. If |
trait |
A character string indicating for which trait the results
should be plotted. For |
title |
A character string, the title of the plot. |
output |
Should the plot be output to the current device? If
|
A LOD-profile of all marker positions and corresponding LOD-scores is
plotted. Significant markers are highlighted with red dots. By default these
are taken from the result of the GWAS analysis however the LOD-threshold for
significant parameters may be modified using the parameter yThr
. The
threshold is plotted as a horizontal line. If there are previously known
marker effect, false positives and true negatives can also be marked.
Extra parameter options:
xLab
A character string, the x-axis label. Default =
"Chromosomes"
yLab
A character string, the y-axis label. Default =
-log10(p)
effects
A character vector, indicating which SNPs correspond to a real (known) effect. Used for determining true/false positives and false negatives. True positives are colored green, false positives orange and false negatives yellow.
colPalette
A color palette used for plotting. Default coloring is done by chromosome, using black and grey.
yThr
A numerical value for the LOD-threshold. The value from the GWAS analysis is used as default.
signLwd
A numerical value giving the thickness of the points that are false/true positives/negatives. Default = 0.6
lod
A positive numerical value. For the SNPs with a LOD-value below this value, only 5% is plotted. The chance of a SNP being plotted is proportional to its LOD-score. This option can be useful when plotting a large number of SNPs. The 5% of SNPs plotted is selected randomly. For reproducible results use set.seed before calling the function.
chr
A vector of chromosomes to be plotted. By default, all chromosomes are plotted. Using this option allows restricting the plot to a subset of chromosomes.
startPos
A numerical value indicating the start position for
the plot. Using this option allows restricting the plot to a part of a
selected chromosome. Only used if exactly one chromosome is specified in
chr
.
endPos
A numerical value indicating the end position for
the plot. Using this option allows restricting the plot to a part of a
selected chromosome. Only used if exactly one chromosome is specified in
chr
.
From the LOD-scores calculated in the GWAS analysis, a QQ-plot is generated with observed LOD-scores versus expected LOD-scores. Code is adapted from Segura et al. (2012).
A plot of effect sizes for the significant SNPs found in the GWAS analysis
is created. Each horizontal line contains QTLs of one trait,
phenotypic trait or trial. Optionally, vertical white lines can indicate
chromosome subdivision, genes of interest, known QTL, etc. Circle diameters
are proportional to the absolute value of allelic effect. Colors indicate the
direction of the effect: green when the allele increases the trait value,
and blue when it decreases the value.
Extra parameter options:
normalize
Should the snpEffect be normalized? Default =
FALSE
sortData
Should the data be sorted before plotting? Either
FALSE
, if no sorting should be done, or a character string indicating
the data column to use for sorting. This should be a numerical column.
Default = FALSE
binPositions
An optional data.frame containing at leasts two
columns, chr(omosome) and pos(ition). Vertical lines are plotted at those
positions. Default = NULL
printVertGrid
Should default vertical grid lines be plotted.
Default = TRUE
yLab
A character string, the y-axis label. Default =
"Traits"
yThr
A numerical value for the LOD-threshold. The value from the GWAS analysis is used as default.
chr
A vector of chromosomes to be plotted. By default all chromosomes are plotted. Using this option this can be restricted to a subset of chromosomes.
exportPptx
Should the plot be exported to a .pptx file?
Default = FALSE
pptxName
A character string, the name of the .pptx file to
which the plot is exported. Ignored if exportPptx = FALSE
.
Millet et al. (2016) Genome-wide analysis of yield in Europe: Allelic effects vary with drought and heat scenarios. Plant Physiology, October 2016, Vol. 172, p. 749–764
Segura et al. (2012) An efficient multi-locus mixed-model approach for genome-wide association studies in structured populations. Nature Genetics, June 2012, Vol. 44, p. 825–830.
## Create a gData object Using the data from the DROPS project. ## See the included vignette for a more extensive description on the steps. data(dropsMarkers) data(dropsMap) data(dropsPheno) ## Add genotypes as row names of dropsMarkers and drop Ind column. rownames(dropsMarkers) <- dropsMarkers[["Ind"]] dropsMarkers <- dropsMarkers[colnames(dropsMarkers) != "Ind"] ## Add genotypes as row names of dropsMap. rownames(dropsMap) <- dropsMap[["SNP.names"]] ## Rename Chomosome and Position columns. colnames(dropsMap)[match(c("Chromosome", "Position"), colnames(dropsMap))] <- c("chr", "pos") ## Convert phenotypic data to a list. dropsPhenoList <- split(x = dropsPheno, f = dropsPheno[["Experiment"]]) ## Rename Variety_ID to genotype and select relevant columns. dropsPhenoList <- lapply(X = dropsPhenoList, FUN = function(trial) { colnames(trial)[colnames(trial) == "Variety_ID"] <- "genotype" trial <- trial[c("genotype", "grain.yield", "grain.number", "seed.size", "anthesis", "silking", "plant.height", "tassel.height", "ear.height")] return(trial) }) ## Create gData object. gDataDrops <- createGData(geno = dropsMarkers, map = dropsMap, pheno = dropsPhenoList) ## Run single trait GWAS for trait 'grain.yield' for trial Mur13W. GWASDrops <- runSingleTraitGwas(gData = gDataDrops, trials = "Mur13W", traits = "grain.yield") ## Create a manhattan plot. plot(GWASDrops) ## Manually set a threshold for significant snps and add a title. plot(GWASDrops, yThr = 3.5, title = "Manhattan plot for Mur13W") ## Restrict plot to part of chr 6. plot(GWASDrops, yThr = 3.5, chr = 6, startPos = 0, endPos = 110000000) ## Create a qq plot. plot(GWASDrops, plotType = "qq", title = "QQ plot for Mur13W") ## Create a QTL plot. plot(GWASDrops, plotType = "qtl", title = "QTL plot for Mur13W") ## Manually set a threshold and don't show vertical lines. plot(GWASDrops, plotType = "qtl", yThr = 3.5, printVertGrid = FALSE, title = "QTL plot for Mur13W")
## Create a gData object Using the data from the DROPS project. ## See the included vignette for a more extensive description on the steps. data(dropsMarkers) data(dropsMap) data(dropsPheno) ## Add genotypes as row names of dropsMarkers and drop Ind column. rownames(dropsMarkers) <- dropsMarkers[["Ind"]] dropsMarkers <- dropsMarkers[colnames(dropsMarkers) != "Ind"] ## Add genotypes as row names of dropsMap. rownames(dropsMap) <- dropsMap[["SNP.names"]] ## Rename Chomosome and Position columns. colnames(dropsMap)[match(c("Chromosome", "Position"), colnames(dropsMap))] <- c("chr", "pos") ## Convert phenotypic data to a list. dropsPhenoList <- split(x = dropsPheno, f = dropsPheno[["Experiment"]]) ## Rename Variety_ID to genotype and select relevant columns. dropsPhenoList <- lapply(X = dropsPhenoList, FUN = function(trial) { colnames(trial)[colnames(trial) == "Variety_ID"] <- "genotype" trial <- trial[c("genotype", "grain.yield", "grain.number", "seed.size", "anthesis", "silking", "plant.height", "tassel.height", "ear.height")] return(trial) }) ## Create gData object. gDataDrops <- createGData(geno = dropsMarkers, map = dropsMap, pheno = dropsPhenoList) ## Run single trait GWAS for trait 'grain.yield' for trial Mur13W. GWASDrops <- runSingleTraitGwas(gData = gDataDrops, trials = "Mur13W", traits = "grain.yield") ## Create a manhattan plot. plot(GWASDrops) ## Manually set a threshold for significant snps and add a title. plot(GWASDrops, yThr = 3.5, title = "Manhattan plot for Mur13W") ## Restrict plot to part of chr 6. plot(GWASDrops, yThr = 3.5, chr = 6, startPos = 0, endPos = 110000000) ## Create a qq plot. plot(GWASDrops, plotType = "qq", title = "QQ plot for Mur13W") ## Create a QTL plot. plot(GWASDrops, plotType = "qtl", title = "QTL plot for Mur13W") ## Manually set a threshold and don't show vertical lines. plot(GWASDrops, plotType = "qtl", yThr = 3.5, printVertGrid = FALSE, title = "QTL plot for Mur13W")
Read PLINK binary data and save in gData format. This is a wrapper around
read.plink
in the Bioconductor package
snpStats
. This package needs to be installed for the function to
work.
readPLINK(bed, bim, fam, ...)
readPLINK(bed, bim, fam, ...)
bed |
The name of the file containing the packed binary SNP genotype data. It should have the extension .bed; If it doesn't, then this extension will be appended. |
bim |
The file containing the SNP descriptions. If not specified
|
fam |
The file containing subject (and, possibly, family) identifiers.
This is basically a tab-delimited "pedfile". If not specified
|
... |
Further arguments passed to |
An object of class gData
.
runSingleTraitGwas
performs a single-trait Genome Wide Association
Study (GWAS) on phenotypic and genotypic data contained in a gData
object. A covariance matrix is computed using the EMMA algorithm (Kang et
al., 2008) or the Newton-Raphson algorithm (Tunnicliffe, 1989) in the
sommer
package. Then a Generalized Least Squares (GLS) method is used
for estimating the marker effects and corresponding p-values. This is done
using either one kinship matrix for all chromosomes or different
chromosome-specific kinship matrices for each chromosome. Significant SNPs
are selected based on a user defined threshold.
runSingleTraitGwas( gData, traits = NULL, trials = NULL, covar = NULL, snpCov = NULL, kin = NULL, kinshipMethod = c("astle", "IBS", "vanRaden"), remlAlgo = c("EMMA", "NR"), GLSMethod = c("single", "multi"), useMAF = TRUE, MAF = 0.01, MAC = 10, genomicControl = FALSE, thrType = c("bonf", "fixed", "small", "fdr"), alpha = 0.05, LODThr = 4, nSnpLOD = 10, pThr = 0.05, rho = 0.5, sizeInclRegion = 0, minR2 = 0.5, nCores = NULL )
runSingleTraitGwas( gData, traits = NULL, trials = NULL, covar = NULL, snpCov = NULL, kin = NULL, kinshipMethod = c("astle", "IBS", "vanRaden"), remlAlgo = c("EMMA", "NR"), GLSMethod = c("single", "multi"), useMAF = TRUE, MAF = 0.01, MAC = 10, genomicControl = FALSE, thrType = c("bonf", "fixed", "small", "fdr"), alpha = 0.05, LODThr = 4, nSnpLOD = 10, pThr = 0.05, rho = 0.5, sizeInclRegion = 0, minR2 = 0.5, nCores = NULL )
gData |
An object of class |
traits |
A vector of traits on which to run GWAS. These can be either
numeric indices or character names of columns in |
trials |
A vector of trials on which to run GWAS. These can
be either numeric indices or character names of list items in |
covar |
An optional vector of covariates taken into account when
running GWAS. These can be either numeric indices or character names of
columns in |
snpCov |
An optional character vector of snps to be included as covariates. |
kin |
An optional kinship matrix or list of kinship matrices. These
matrices can be from the |
kinshipMethod |
An optional character indicating the method used for
calculating the kinship matrix(ces). Currently "astle" (Astle and Balding,
2009), "IBS" and "vanRaden" (VanRaden, 2008) are supported. If a
kinship matrix is supplied either in |
remlAlgo |
A character string indicating the algorithm used to estimate
the variance components. Either |
GLSMethod |
A character string indicating the method used to estimate
the marker effects. Either |
useMAF |
Should the minor allele frequency be used for selecting SNPs
for the analysis. If |
MAF |
The minor allele frequency (MAF) threshold used in GWAS. A
numerical value between 0 and 1. SNPs with MAF below this value are not taken
into account in the analysis, i.e. p-values and effect sizes are put to
missing ( |
MAC |
A numerical value. SNPs with minor allele count below this value
are not taken into account for the analysis, i.e. p-values and effect sizes
are set to missing ( |
genomicControl |
Should genomic control correction as in Devlin and Roeder (1999) be applied? |
thrType |
A character string indicating the type of threshold used for the selection of candidate loci. See Details. |
alpha |
A numerical value used for calculating the LOD-threshold for
|
LODThr |
A numerical value used as a LOD-threshold when
|
nSnpLOD |
A numerical value indicating the number of SNPs with the
smallest p-values that are selected when |
pThr |
A numerical value just as the cut off value for p-Values for
|
rho |
A numerical value used a the minimum value for SNPs to be
considered correlated when using |
sizeInclRegion |
An integer. Should the results for SNPs close to significant SNPs be included? If so, the size of the region in centimorgan or base pairs. Otherwise 0. |
minR2 |
A numerical value between 0 and 1. Restricts the SNPs included
in the region close to significant SNPs to only those SNPs that are in
sufficient Linkage Disequilibrium (LD) with the significant snp, where LD
is measured in terms of |
nCores |
A numerical value indicating the number of cores to be used by
the parallel part of the algorithm. If |
An object of class GWAS
.
For the selection of candidate loci from the GWAS output four different
methods can be used. The method used can be specified in the function
parameter thrType
. Further parameters can be used to fine tune the
method.
The Bonferroni threshold, a LOD-threshold of
, where m is the number of SNPs and alpha can be
specified by the function parameter
alpha
A fixed LOD-threshold, specified by the function parameter
LODThr
The n SNPs with with the smallest p-Values are selected. n can
be specified in nSnpLOD
Following the algorithm proposed by Brzyski D. et al. (2017),
SNPs are selected in such a way that the False Discovery Rate (fdr) is
minimized. To do this, first the SNPs are restricted to the SNPs with a
p-Value below pThr
. Then clusters of SNPs are created using a
two step iterative process in which first the SNP with the lowest p-Value is
selected as cluster representative. This SNP and all SNPs that have a
correlation with this SNP of or higher will form a cluster.
can be specified as an argument in the function and has a default
value of 0.5, which is a recommended starting value in practice.
The selected SNPs are removed from the list and the procedure repeated until
no SNPs are left. Finally to determine the number of significant clusters
the first cluster is determined for which the p-Value of the cluster
representative is not smaller than
where
m is the number of SNPs and alpha can be specified by the function parameter
alpha
. All previous clusters are selected as significant.
Astle, William, and David J. Balding. 2009. Population Structure and Cryptic Relatedness in Genetic Association Studies. Statistical Science 24 (4): 451–71. doi:10.1214/09-sts307.
Brzyski D. et al. (2017) Controlling the Rate of GWAS False Discoveries. Genetics 205 (1): 61-75. doi:10.1534/genetics.116.193987
Devlin, B., and Kathryn Roeder. 1999. Genomic Control for Association Studies. Biometrics 55 (4): 997–1004. doi:10.1111/j.0006-341x.1999.00997.x.
Kang et al. (2008) Efficient Control of Population Structure in Model Organism Association Mapping. Genetics 178 (3): 1709–23. doi:10.1534/genetics.107.080101.
Millet, E. J., Pommier, C., et al. (2019). A multi-site experiment in a network of European fields for assessing the maize yield response to environmental scenarios - Data set. doi:10.15454/IASSTN
Rincent et al. (2014) Recovering power in association mapping panels with variable levels of linkage disequilibrium. Genetics 197 (1): 375–87. doi:10.1534/genetics.113.159731.
Segura et al. (2012) An efficient multi-locus mixed-model approach for genome-wide association studies in structured populations. Nature Genetics 44 (7): 825–30. doi:10.1038/ng.2314.
Sun et al. (2010) Variation explained in mixed-model association mapping. Heredity 105 (4): 333–40. doi:10.1038/hdy.2010.11.
Tunnicliffe W. (1989) On the use of marginal likelihood in time series model estimation. JRSS 51 (1): 15–27.
VanRaden P.M. (2008) Efficient methods to compute genomic predictions. Journal of Dairy Science 91 (11): 4414–23. doi:10.3168/jds.2007-0980.
## Create a gData object Using the data from the DROPS project. ## See the included vignette for a more extensive description on the steps. data(dropsMarkers) data(dropsMap) data(dropsPheno) ## Add genotypes as row names of dropsMarkers and drop Ind column. rownames(dropsMarkers) <- dropsMarkers[["Ind"]] dropsMarkers <- dropsMarkers[colnames(dropsMarkers) != "Ind"] ## Add genotypes as row names of dropsMap. rownames(dropsMap) <- dropsMap[["SNP.names"]] ## Rename Chomosome and Position columns. colnames(dropsMap)[match(c("Chromosome", "Position"), colnames(dropsMap))] <- c("chr", "pos") ## Convert phenotypic data to a list. dropsPhenoList <- split(x = dropsPheno, f = dropsPheno[["Experiment"]]) ## Rename Variety_ID to genotype and select relevant columns. dropsPhenoList <- lapply(X = dropsPhenoList, FUN = function(trial) { colnames(trial)[colnames(trial) == "Variety_ID"] <- "genotype" trial <- trial[c("genotype", "grain.yield", "grain.number", "seed.size", "anthesis", "silking", "plant.height", "tassel.height", "ear.height")] return(trial) }) ## Create gData object. gDataDrops <- createGData(geno = dropsMarkers, map = dropsMap, pheno = dropsPhenoList) ## Run single trait GWAS for trait 'grain.yield' for trial Mur13W. GWASDrops <- runSingleTraitGwas(gData = gDataDrops, trials = "Mur13W", traits = "grain.yield") ## Run single trait GWAS for trait 'grain.yield' for trial Mur13W. ## Use chromosome specific kinship matrices calculated using vanRaden method. GWASDropsMult <- runSingleTraitGwas(gData = gDataDrops, trials = "Mur13W", traits = "grain.yield", kinshipMethod = "vanRaden", GLSMethod = "multi")
## Create a gData object Using the data from the DROPS project. ## See the included vignette for a more extensive description on the steps. data(dropsMarkers) data(dropsMap) data(dropsPheno) ## Add genotypes as row names of dropsMarkers and drop Ind column. rownames(dropsMarkers) <- dropsMarkers[["Ind"]] dropsMarkers <- dropsMarkers[colnames(dropsMarkers) != "Ind"] ## Add genotypes as row names of dropsMap. rownames(dropsMap) <- dropsMap[["SNP.names"]] ## Rename Chomosome and Position columns. colnames(dropsMap)[match(c("Chromosome", "Position"), colnames(dropsMap))] <- c("chr", "pos") ## Convert phenotypic data to a list. dropsPhenoList <- split(x = dropsPheno, f = dropsPheno[["Experiment"]]) ## Rename Variety_ID to genotype and select relevant columns. dropsPhenoList <- lapply(X = dropsPhenoList, FUN = function(trial) { colnames(trial)[colnames(trial) == "Variety_ID"] <- "genotype" trial <- trial[c("genotype", "grain.yield", "grain.number", "seed.size", "anthesis", "silking", "plant.height", "tassel.height", "ear.height")] return(trial) }) ## Create gData object. gDataDrops <- createGData(geno = dropsMarkers, map = dropsMap, pheno = dropsPhenoList) ## Run single trait GWAS for trait 'grain.yield' for trial Mur13W. GWASDrops <- runSingleTraitGwas(gData = gDataDrops, trials = "Mur13W", traits = "grain.yield") ## Run single trait GWAS for trait 'grain.yield' for trial Mur13W. ## Use chromosome specific kinship matrices calculated using vanRaden method. GWASDropsMult <- runSingleTraitGwas(gData = gDataDrops, trials = "Mur13W", traits = "grain.yield", kinshipMethod = "vanRaden", GLSMethod = "multi")
gData
Gives a summary for an object of S3 class gData
.
## S3 method for class 'gData' summary(object, ..., trials = NULL)
## S3 method for class 'gData' summary(object, ..., trials = NULL)
object |
An object of class |
... |
Not used. |
trials |
A vector of trials to include in the summary. These can
be either numeric indices or character names of list items in |
A list with a most four components:
A list with number of markers and number of chromosomes in the map.
A list with number of markers, number of genotypes and the distribution of the values within the markers.
A list of data.frames, one per trial with a summary of all traits within the trial.
A list of data.frames, one per trial with a summary of all covariates within the trial.
All components are only present in the output if the corresponding content is present in the gData object.
GWAS
Gives a summary for an object of S3 class GWAS
.
## S3 method for class 'GWAS' summary(object, ..., trials = NULL, traits = NULL)
## S3 method for class 'GWAS' summary(object, ..., trials = NULL, traits = NULL)
object |
An object of class |
... |
Not used. |
trials |
A vector of strings or numeric indices indicating for
which trials the summary should be made. If |
traits |
A vector of strings indicating which traits to include in the
summary. If |