Predict ancestry using this VAE Reference Latent Space


To improve visualizations of genetic variance using ancestry informative markers (AIMs), we have developed a web-based tool adapted from the popVAE algorithm. The webtool provides a two-dimensional ancestry inference for an unknown individual, based on their genetic latent coordinates across 104 ancestry informative DNA variants. Of these, 102 are autosomal ancestry variants from the published VISAGE ET (Enhanced Tool for Appearance and Ancestry) panel, and 2 are additional variants found to boost resolution within Eurasia.

The input data template contains a test individual from a Sanger Wellcome Trust dataset; information on that dataset can be found here.

Below, we describe the specific adaptations made to the downloadable GitHub version of popVAE.

How this webtool was built.

1) Installed popVAE as a conda environment.
2) Normalized both biallelic and multiallelic genotypes (n=104) to values between 0 and 1 prior to input.
3) Handled missing genotypes* for samples by imputing the average genotype across each population cluster group.
4) Chose the best model out of 20 runs. We selected the training and testing sets solely from the 1KG and HGDP Anchor references**. The models were balanced and adjusted for a random selection of 5 test samples from each population cluster.
5) Assessed visualization performance on an independent test set, using the samples' genotype codes (e.g., TC, CC, AA) to project their predicted latent space distribution into the 1KG and HGDP Anchor ancestry space, and saved the optimal model.
6) Finally, the AIM reference latent space provided here serves as the base onto which unknown samples are projected into the AIM genetic space, using their available input data.
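
The normalization in step 2 can be illustrated with a minimal Python sketch. It is not the tool's actual code: the function name `encode_genotype`, the allele ordering, and in particular the scaling used for multiallelic variants are assumptions made for illustration, since the exact scheme is not described here. Each allele contributes its position in a per-variant allele list, and the sum is divided by the largest possible sum so the result lies between 0 and 1.

```python
from typing import Optional, Sequence


def encode_genotype(genotype: Optional[str], alleles: Sequence[str]) -> Optional[float]:
    """Scale a two-letter genotype code (e.g. "TC") to a value in [0, 1].

    `alleles` lists the alleles known for this variant (hypothetical ordering,
    first allele scores 0). Returns None for missing input (blank or "NA").
    """
    if genotype is None or genotype.strip().upper() in ("", "NA"):
        return None  # missing genotype; imputation is handled separately

    calls = genotype.strip().upper()
    if len(calls) != 2:
        raise ValueError(f"expected a two-letter genotype, got {genotype!r}")
    if len(alleles) < 2:
        raise ValueError("a variant needs at least two alleles")

    # Each allele scores its index in the allele list; an unknown allele
    # raises ValueError from .index().
    total = sum(alleles.index(allele) for allele in calls)

    # Largest possible score: two copies of the last allele in the list.
    return total / (2 * (len(alleles) - 1))


if __name__ == "__main__":
    print(encode_genotype("CC", ["C", "T"]))       # biallelic homozygote
    print(encode_genotype("TC", ["C", "T"]))       # biallelic heterozygote
    print(encode_genotype("GG", ["A", "C", "G"]))  # triallelic variant
```

For biallelic variants this reduces to the usual allele count divided by 2. For multiallelic variants, this particular scaling maps some distinct genotypes to the same value, so a different encoding may be preferable in practice.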

* Although the prediction can handle missing input data (a blank value or NA) by substituting the average allelic frequency of that genotype in the training data, we encourage keeping the amount of missing data to a minimum for the best possible results.
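
The missing-data handling described in this note can be sketched as follows. This is an illustrative example under stated assumptions, not the tool's implementation: the function names `variant_means` and `impute_missing` are hypothetical, genotypes are assumed to be already scaled to values between 0 and 1, and a missing value is represented as `None`.

```python
from statistics import fmean
from typing import List, Optional


def variant_means(training: List[List[float]]) -> List[float]:
    """Mean normalized genotype per variant (column) across training samples."""
    return [fmean(column) for column in zip(*training)]


def impute_missing(sample: List[Optional[float]], means: List[float]) -> List[float]:
    """Replace each missing value (None) with the training mean for that variant."""
    if len(sample) != len(means):
        raise ValueError("sample and means must cover the same variants")
    return [mean if value is None else value for value, mean in zip(sample, means)]


if __name__ == "__main__":
    # Two training samples, two variants (toy values, not real data).
    training = [[0.0, 1.0], [1.0, 0.5]]
    means = variant_means(training)
    # The first variant is missing and is filled from the training mean.
    print(impute_missing([None, 1.0], means))
```

Because every imputed value pulls a sample toward the training average, a sample with many missing variants is likely to be projected less distinctly, which is consistent with the advice to keep missing data to a minimum.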

** Genotypic information on these Anchor Reference Samples can be found here.

If you use this tool, please credit this website and the published work associated with it, including the VISAGE ET autosomal ancestry panel.