Epistasis and genetic association testing

    W-test: The hypothesis underlying this epistasis test is that if two or more genetic variables contribute to disease outcome, then their joint probability distribution should differ between the disease group and the control group. The W-test is thus proposed as a non-parametric approach to compare this distributional difference by aggregating the log odds ratios of the contingency table cells. In addition to the classical framework of aggregating odds ratios into a chi-square statistic, we added another property: the degrees of freedom of the statistic are estimated from the working data set by bootstrapping samples. This data-adaptive probability distribution is very useful for non-parametric testing in genetic datasets, whose genetic architecture varies across data sets of different sample sizes and population structures. For example, testing the interaction effect of (X1, X2) uses the 3x3 table of their genotype combinations; comparing these 9 cells between cases and controls gives a classical chi-square statistic with degrees of freedom (df) = 9 - 1 = 8, and this integer df would be adopted across all types of data sets. In the W-test, by contrast, the df is bootstrap-estimated and may take continuous values. Therefore, more accurate p-values can be obtained efficiently, making the test more robust to various genetic architectures and helping to maintain association testing power at reduced sample sizes. (R package: wtest)
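    The cell-wise aggregation and data-adaptive df described above can be sketched in Python. This is a minimal illustration under our own assumptions, not the wtest package's implementation: each of the 9 genotype cells contributes a squared Wald statistic for its log odds ratio (that cell versus all other cells, cases versus controls, with a 0.5 Haldane correction); null samples are drawn by permuting case/control labels; and the scale and continuous df are obtained by matching the mean and variance of a scaled chi-square distribution. The function names `cell_wald_sum` and `w_test_sketch` are hypothetical.

```python
import numpy as np
from scipy.stats import chi2


def cell_wald_sum(x1, x2, y):
    """Sum over the 9 (X1, X2) genotype cells of (log OR / SE)^2.

    Each cell's odds ratio compares carrying that genotype pair versus any
    other pair, between cases (y == 1) and controls (y == 0).
    """
    cells = 3 * x1 + x2                      # encode the (X1, X2) pair as 0..8
    n_case, n_ctrl = np.sum(y == 1), np.sum(y == 0)
    total = 0.0
    for cell in range(9):
        in_cell = cells == cell
        k_case = np.sum(in_cell & (y == 1))
        k_ctrl = np.sum(in_cell & (y == 0))
        # Haldane correction: add 0.5 to each count so empty cells stay finite
        a, b = k_case + 0.5, k_ctrl + 0.5
        c, d = n_case - k_case + 0.5, n_ctrl - k_ctrl + 0.5
        log_or = np.log((a * d) / (b * c))
        se = np.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
        total += (log_or / se) ** 2
    return total


def w_test_sketch(x1, x2, y, n_null=200, seed=0):
    """Scaled chi-square test whose df is estimated from null resamples."""
    rng = np.random.default_rng(seed)
    observed = cell_wald_sum(x1, x2, y)
    # Null resamples (assumption): permuting labels breaks any association
    null = np.array([cell_wald_sum(x1, x2, rng.permutation(y))
                     for _ in range(n_null)])
    m, v = null.mean(), null.var(ddof=1)
    # Moment matching: if T ~ h * chi2(f), then E[T] = h*f and Var[T] = 2*h^2*f
    h = v / (2 * m)
    f = 2 * m ** 2 / v
    w = observed / h
    return w, f, chi2.sf(w, f)               # chi2.sf accepts non-integer df


# Usage on simulated data (hypothetical effect sizes)
rng = np.random.default_rng(1)
n = 2000
x1 = rng.integers(0, 3, n)                   # genotypes coded 0/1/2
x2 = rng.integers(0, 3, n)
# Interaction: risk rises only when both loci carry genotype 2
logit = -0.5 + 1.5 * ((x1 == 2) & (x2 == 2))
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

w, f, p = w_test_sketch(x1, x2, y)
print(f"W = {w:.1f}, estimated df = {f:.2f}, p = {p:.2e}")
```

    Because the nine cell statistics are correlated, their sum does not follow a chi-square distribution with a fixed integer df; matching moments on null resamples lets the data decide the effective, possibly non-integer, df.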

    Wang MH, Sun R, Guo J, Weng H, Lee J, Hu I, Sham PC, Zee BC. A fast and powerful W-test for pairwise epistasis testing. Nucleic Acids Research. 2016 Jul 8;44(12):e115. https://doi.org/10.1093/nar/gkw347

    Sun R, Weng H, Wang MH*. W-Test for Genetic Epistasis Testing. Methods in Molecular Biology. 2021; 45-53.

    Sun R, Weng H, Hu I, Guo J, Wu WK, Zee BC, Wang MH*. A W-test collapsing method for rare variant association testing in exome sequencing data. Genetic Epidemiology. 2016 Nov;40(7):591-6.

    Sun R, Weng H, Xia X, Chong K, Zee BC, Wang MH*. Gene-methylation epistasis analysis by a W-test identified enriched signals of neuronal genes in patients undergone lipid control treatment. BMC Proceedings.


    Zoom-focus algorithm (ZFA): We observed that in whole-genome sequencing data a gene may contain thousands of rare variants, yet only a small fraction of these variants are functional. ZFA is devised to identify the optimal genomic segment for testing, one enriched with clustered signals, within a given gene. A key component of this method is how to adjust for the multiple testing incurred by searching over many candidate segments. (R package: zfa)

    The method was inspired by stargazing with a telescope: first, a grid search of the sky is performed with an eyepiece to identify the rough position of a target planet, and then the observer adjusts the focal length of the telescope to get a high-resolution view of the target. Looking for informative loci in a deeply sequenced genome can proceed the same way: a coarse scan locates a promising region, which is then refined.
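    A two-stage search in the spirit of this analogy is sketched below in Python. It is a hypothetical illustration, not the zfa package: the coarse "eyepiece" stage scans fixed, non-overlapping blocks of `block` variants with a burden score test; the "focus" stage lets the start and end of the best block each move by up to one block; and the multiple-testing adjustment reruns the whole search on permuted labels (a max-statistic permutation p-value). The block size, burden statistic and permutation scheme are our assumptions; the published ZFA may define them differently.

```python
import numpy as np


def burden_stat(C, y, s, e):
    """Burden score statistic (approx. chi-square, 1 df) for variants s..e-1.

    C holds per-person cumulative allele counts with a leading zero column,
    so the burden of window [s, e) is C[:, e] - C[:, s].
    """
    b = (C[:, e] - C[:, s]).astype(float)
    bc = b - b.mean()
    ybar = y.mean()
    denom = ybar * (1 - ybar) * np.sum(bc ** 2)
    return 0.0 if denom == 0 else np.sum((y - ybar) * bc) ** 2 / denom


def zoom_focus(C, y, block):
    """Stage 1: coarse grid of fixed blocks; stage 2: refine the best block's ends."""
    m = C.shape[1] - 1                       # number of variants
    coarse = [(burden_stat(C, y, s, min(s + block, m)), s)
              for s in range(0, m, block)]
    _, s0 = max(coarse)
    e0 = min(s0 + block, m)
    best = (-1.0, s0, e0)
    # Zoom in: let each boundary move by up to one block in either direction
    for s in range(max(0, s0 - block), min(m, s0 + block)):
        for e in range(max(s + 1, e0 - block), min(m, e0 + block) + 1):
            t = burden_stat(C, y, s, e)
            if t > best[0]:
                best = (t, s, e)
    return best


def zfa_sketch(G, y, block=10, n_perm=99, seed=0):
    rng = np.random.default_rng(seed)
    C = np.hstack([np.zeros((G.shape[0], 1), dtype=int), np.cumsum(G, axis=1)])
    t_obs, s, e = zoom_focus(C, y, block)
    # Re-run the whole two-stage search on permuted labels, so the p-value
    # accounts for every window the search examined
    t_null = [zoom_focus(C, rng.permutation(y), block)[0] for _ in range(n_perm)]
    p = (1 + sum(t >= t_obs for t in t_null)) / (n_perm + 1)
    return (s, e), t_obs, p


# Usage on simulated rare-variant data (hypothetical causal cluster)
rng = np.random.default_rng(2)
n, m = 1000, 80
G = rng.binomial(2, 0.02, size=(n, m))       # rare variants, MAF 0.02
logit = -1.0 + 1.5 * G[:, 30:40].sum(axis=1) # variants 30..39 are "functional"
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

(s, e), t, p = zfa_sketch(G, y)
print(f"best window: variants {s}..{e - 1}, statistic = {t:.1f}, permutation p = {p:.3f}")
```

    Repeating the entire search inside each permutation is what keeps the p-value honest: the best window found on the real labels is compared with the best window found on null labels, not with a single pre-specified region.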

    Wang MH, Weng H, Sun R, Lee J, Wu WK, Chong KC, Zee BC. A Zoom-Focus algorithm (ZFA) to locate the optimal testing region for rare variant association tests. Bioinformatics. 2017 Aug 1;33(15):2330-6.


Disease prediction

  • Disease prediction in genotype data:
  • The Prism Vote (PV): The PV takes a novel view of an individual's trait as a composite risk from subpopulations, in which stratum-specific predictors can be built from data with more homogeneous genetic structure. Since each individual is represented by a composition of subpopulation memberships, the framework enables individualized risk characterization.

    Wang MH, Xia X, Zee BCY. Prediction Models Incorporating Stratification of Data. US Provisional Patent No. 62/915,459 (19/MED/902), filed 15 October 2019.

    Xia X, Zhang Y, Sun R, Wei Y, Li Q, Chong MKC, Wu WKK, Zee BCY, Tang H, Wang MH*. A Prism Vote Method for Individualized Risk Prediction of Traits in Genotype Data of Multi-population. PLOS Genetics. 2022 (accepted).

  • Trait classification in gene expression data:
  • Xia X, Weng H, Sun R, Chong K, Zee BC*, Wang MH*. Incorporating methylation genome information improves prediction accuracy for drug treatment responses. BMC Genetics. 2018 Sep;19(1):78.

    Wang MH, Lo SH, Zheng T, Hu I. A Classification Method Incorporating Interactions among Variables for High-dimensional Data. Bioinformatics. 2012;28(21):2834-42.

  • Prediction in breath mass spectrometry:
  • Wang MH, Chong K, Chung H, Storer M, Pickering J, Endre Z, Lau S, Kwok C, Lai M, Zee BC. Use of a Least Absolute Shrinkage and Selection Operator (LASSO) Model to Selected Ion Flow Tube Mass Spectrometry (SIFT-MS) Analysis of Exhaled Breath to Predict the Efficacy of Dialysis. Journal of Breath Research. 2016 Sep 28;10(4):046004.

    Wang MH, Lau SY, Kwok C, Chong K, Lai M, Chung AH, Ho C, Szeto C, Zee BC. Estimation of clinical parameters of chronic kidney disease (CKD) by exhaled breath full-scan mass spectrometry data and Iterative PCA with Intensity Screening (IPS) algorithm. Journal of Breath Research. 2017 Aug 21;11(3):036007.

  • Book Chapters:
  • "Statistical methods for disease risk prediction with genotype data", with Xia X, Zhang Y, Wei Y. Springer Nature, to appear, 2022.

    "Genetic Test, Risk Prediction Model, and Genetic Counselling", with Haoyi Weng, Book Title: Translational Informatics in Smart Healthcare, Springer, 2017
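    The Prism Vote idea in the entries above can be sketched as a membership-weighted combination of stratum-specific predictors. This is a hypothetical illustration, not the published PV method: we assume each individual's subpopulation membership proportions are already known (e.g. from an ancestry-estimation tool), use a ridge-stabilised logistic regression as the stratum-specific predictor, and take the individual's risk as the membership-weighted average of the stratum risks. All function names are ours.

```python
import numpy as np


def fit_logistic(X, y, n_iter=25, ridge=1e-3):
    """Ridge-stabilised logistic regression fitted by Newton-Raphson."""
    Xd = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        p = 1 / (1 + np.exp(-Xd @ beta))
        H = Xd.T @ (Xd * (p * (1 - p))[:, None]) + ridge * np.eye(len(beta))
        beta += np.linalg.solve(H, Xd.T @ (y - p) - ridge * beta)
    return beta


def predict_risk(beta, X):
    Xd = np.column_stack([np.ones(len(X)), X])
    return 1 / (1 + np.exp(-Xd @ beta))


def prism_vote(models, X, memberships):
    """Individual risk = membership-weighted average of stratum-specific risks.

    memberships: (n, K) array whose rows sum to 1 (e.g. ancestry proportions).
    """
    risks = np.column_stack([predict_risk(b, X) for b in models])   # (n, K)
    return np.sum(memberships * risks, axis=1)


# Hypothetical training data: two strata in which the same SNPs act differently
rng = np.random.default_rng(3)
models = []
for effects in ([1.0, 0.0, 0.5], [0.0, 1.0, 0.5]):
    X = rng.integers(0, 3, size=(500, 3)).astype(float)
    risk = 1 / (1 + np.exp(-(X @ np.array(effects) - 1.5)))
    y = (rng.random(500) < risk).astype(float)
    models.append(fit_logistic(X, y))    # one predictor per stratum

X_new = np.array([[2.0, 0.0, 1.0], [2.0, 0.0, 1.0]])
memberships = np.array([[1.0, 0.0], [0.3, 0.7]])   # pure stratum 0 vs admixed
print(prism_vote(models, X_new, memberships))
```

    An individual with membership (1, 0) receives stratum 0's prediction unchanged, while an admixed individual with the same genotypes receives a blend of both strata, which is what makes the risk individualized.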
