Genetic risk score 101
2025-01-03
GRS is the score used to estimate the early risk of disease susceptibility by accumulating the effect of risk alleles from SNPs in our genome1, defined as
\[GRS=\sum_{n=1}^{k} RA_{k}\cdot W_{k}\]Where RA is the number of risk alleles of SNP k, along with the effect size W from GWAS studies. The effect sizes for continuous and binary trait are beta-coefficient and odd ratio, respectively2.
In certain cases, the effect size W might not be available for calculating the weighted GRS (wGRS). We can address this by simply summing the number of risk alleles instead, known as unweighted GRS (uGRS). Although it lacks the effect size, the classification capability of uGRS is comparable with wGRS3.
So, let’s calculate uGRS with simple R code.
First let simulate the genomic dataset of type 2 diabetes (T2D) containing 10,000 samples with 1,000 SNPs each where 0,1,and 2 repersent homozygous of non-risk allele, heterozygous of risk allele, and homozygous of risk alelles, respectively.
set.seed(123)
data <- matrix(round(runif(10000000, min=0, max=2),0),10000,1000)
colnames(data) <- paste0("SNP_",1:1000)
rownames(data) <- paste0("sample_",1:10000)
data[c(1:10),c(1:10)]
## SNP_1 SNP_2 SNP_3 SNP_4 SNP_5 SNP_6 SNP_7 SNP_8 SNP_9 SNP_10
## sample_1 1 1 2 0 0 0 1 0 1 2
## sample_2 2 1 1 2 2 1 2 0 1 1
## sample_3 1 2 1 1 1 0 1 1 2 1
## sample_4 2 1 0 1 1 1 0 2 2 2
## sample_5 2 0 2 0 0 1 1 1 1 1
## sample_6 0 1 0 1 1 2 1 1 2 1
## sample_7 1 2 1 2 0 1 0 0 1 1
## sample_8 2 2 1 1 1 1 1 2 1 1
## sample_9 1 2 1 1 0 1 2 1 2 1
## sample_10 1 0 1 0 1 1 0 1 0 0
Then, we can simply sum the number of risk alleles from 1,000 SNPs across samples and plot it as the uGRS distribution.
hist(rowSums(data),main ="uGRS distribution across 10,000 samples",xlab="uGRS",ylab="Frequency")

One way to interpret this is to (1) calculate the risk percentile or (2) estimate the risk fold compared with the population average.
Here is an example of sample_400’s percentile risk (red line) compared
with population average (blue line).
sample_400_uGRS <- unlist(rowSums(data)[400])
population_average <- mean(rowSums(data))
prk <- dplyr::percent_rank(rowSums(data))[400]
fold_ <- sample_400_uGRS/population_average
hist(rowSums(data),
main = "The risk of sample_400 compared with population",
xlab="uGRS",ylab="Frequency")
abline(v = population_average,col="blue",lwd = 3)
abline(v = sample_400_uGRS,col="red",lwd = 3)

Interpretation: sample_400 will have risk of T2D at 87.9 percentile (red line), or 1.026 times compared with population average (blue line).
References
-
Igo, R. P., Jr, Kinzy, T. G., & Cooke Bailey, J. N. (2019). Genetic Risk Scores. Current protocols in human genetics, 104(1), e95. https://doi.org/10.1002/cphg.95. ↩
-
Sollis, E., Mosaku, A., Abid, A., Buniello, A., Cerezo, M., Gil, L., Groza, T., Güneş, O., Hall, P., Hayhurst, J., Ibrahim, A., Ji, Y., John, S., Lewis, E., MacArthur, J. A. L., McMahon, A., Osumi-Sutherland, D., Panoutsopoulou, K., Pendlington, Z., Ramachandran, S., … Harris, L. W. (2023). The NHGRI-EBI GWAS Catalog: knowledgebase and deposition resource. Nucleic acids research, 51(D1), D977–D985. https://doi.org/10.1093/nar/gkac1010. ↩
-
Seral-Cortes, M., Sabroso-Lasa, S., De Miguel-Etayo, P., Gonzalez-Gross, M., Gesteiro, E., Molina-Hidalgo, C., De Henauw, S., Gottrand, F., Mavrogianni, C., Manios, Y., Plada, M., Widhalm, K., Kafatos, A., Erhardt, É., Meirhaeghe, A., Salazar-Tortosa, D., Ruiz, J., Moreno, L. A., Esteban, L. M., & Labayen, I. (2021). Development of a Genetic Risk Score to predict the risk of overweight and obesity in European adolescents from the HELENA study. Scientific reports, 11(1), 3067. https://doi.org/10.1038/s41598-021-82712-4. ↩