Showing posts from June, 2018

Improved structural variation prioritisation

Matt Field has made some significant improvements to the prioritisation of structural variants (SVs), and we've updated our database to reflect them. The changes include a combined report for both SV callers, prioritisation of SVs most likely to impact exons, a maximum length filter applied to most SV types, and an indication of whether each event is novel or known. Together they dramatically reduced the number of high-priority SVs from more than 3,000 to around 90, with 449 SVs now classed as medium priority. Please note that we do not retrospectively re-analyse or update the SV reports for previous records; the changes apply only to new data.
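As a rough illustration of how criteria like these could combine into priority tiers, here is a minimal Python sketch. It is not Matt Field's actual implementation. The `StructuralVariant` fields, the `prioritise` function, the 1 Mb length cut-off, the exemption of breakends (`BND`) from the length filter and the use of agreement between both callers as a secondary signal are all hypothetical assumptions, because the post does not give the real rules or thresholds.

```python
from dataclasses import dataclass

# Hypothetical cut-off: the post says a maximum length filter applies to
# most SV types but does not give the real value.
MAX_SV_LENGTH = 1_000_000
# Hypothetical: SV types exempt from the length filter.
LENGTH_EXEMPT_TYPES = {"BND"}


@dataclass
class StructuralVariant:
    sv_type: str          # e.g. "DEL", "DUP", "INV", "BND"
    length: int           # event length in bp (0 for breakends)
    overlaps_exon: bool   # True if the event is predicted to hit an exon
    is_known: bool        # True if the event matches a known SV entry
    callers: frozenset    # names of the SV callers that reported the event


def prioritise(sv: StructuralVariant) -> str:
    """Assign a hypothetical priority tier: "high", "medium" or "low"."""
    # Length filter first: oversized events never reach the upper tiers.
    if sv.sv_type not in LENGTH_EXEMPT_TYPES and sv.length > MAX_SV_LENGTH:
        return "low"
    # Novel events that are likely to hit an exon are the most interesting.
    if sv.overlaps_exon and not sv.is_known:
        return "high"
    # Known exonic events, or events reported by more than one caller
    # (an assumption about how the combined report might be used).
    if sv.overlaps_exon or len(sv.callers) > 1:
        return "medium"
    return "low"
```

Applying the length filter before anything else means very large events can never land in the high tier, which is one way a filter like this could shrink the high-priority list so sharply.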

Handling control samples

We don't necessarily want to see variants from our control samples in the database, but we still want to be able to download their VCFs and run SNP validation to make sure there are no sample mix-ups. We've therefore created a separate page listing the control samples, each with its corresponding VCF available for download.
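SNP validation of this kind typically comes down to genotype concordance between two call sets that should come from the same person. The sketch below assumes uncompressed, single-sample VCFs read as lists of lines, and matches sites by chromosome and position only (it assumes REF and ALT agree at shared sites). `read_genotypes` and `concordance` are hypothetical helpers, and any pass/fail threshold is left open.

```python
def read_genotypes(vcf_lines):
    """Map (chrom, pos) -> normalised GT string for the first sample in a VCF."""
    genotypes = {}
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header lines and blank lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos = fields[0], int(fields[1])
        fmt_keys = fields[8].split(":")
        sample_values = fields[9].split(":")
        gt = dict(zip(fmt_keys, sample_values)).get("GT")
        if gt is None or "." in gt:
            continue  # skip missing or partially missing calls
        # Treat phased and unphased calls alike: "1|0" and "0/1" both become "0/1".
        alleles = sorted(gt.replace("|", "/").split("/"), key=int)
        genotypes[(chrom, pos)] = "/".join(alleles)
    return genotypes


def concordance(a, b):
    """Fraction of shared sites with identical genotypes, or None if none are shared."""
    shared = a.keys() & b.keys()
    if not shared:
        return None
    matches = sum(1 for site in shared if a[site] == b[site])
    return matches / len(shared)
```

A concordance well below what would be expected for the same individual at shared sites would point to a possible mix-up worth investigating.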

Automated archiving of BAM files

Keeping BAM files available for download is a real capacity challenge, and we face constant pressure to free up disk space as more projects come on board. We've come up with a way to automatically archive BAM files older than one year to tape storage without any human intervention.
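The post doesn't describe how the archiving is implemented. One piece of any such job is selecting which files to move, and this minimal Python sketch covers only that step. It assumes file age is judged by modification time and that a year is 365 days; `find_bams_to_archive` is a hypothetical name, and the copy to tape (and removal of the disk copy) is omitted because it depends entirely on the local tape system.

```python
import time
from pathlib import Path

# Assumption: "older than 1 year" is measured as 365 days since last modification.
ONE_YEAR_SECONDS = 365 * 24 * 60 * 60


def find_bams_to_archive(root, now=None, max_age_seconds=ONE_YEAR_SECONDS):
    """Return, sorted, the .bam files under `root` whose mtime is older than max_age_seconds."""
    now = time.time() if now is None else now
    candidates = []
    for path in Path(root).rglob("*.bam"):  # recursive search for BAM files
        if now - path.stat().st_mtime > max_age_seconds:
            candidates.append(path)
    return sorted(candidates)
```

Passing `now` explicitly keeps the selection deterministic and easy to test; a scheduled job would simply call it with the default.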

Health reports: GWAS

In addition to ClinVar and SNPedia, we've recently added the GWAS Catalog to the health reports, matched on a patient's rs numbers. Genome-wide association studies (GWAS) are particularly useful in a research context: they compare variant frequencies in an affected population against a control (healthy) population and use statistical analysis to establish a hypothesised link between variants and disease traits. Under GWAS in the health report, we've added the following columns: disease traits, studies, risk allele, initial sample size, replication sample size, p-value and risk allele frequency. False positives (spurious associations between a variant and a disease) have been shown to be fairly common in GWAS because of uncontrolled biases, so it's important to check whether any replication studies were done, since replication gives more confidence in a hypothesised association.
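To illustrate how a patient's rs numbers might be matched against catalogue entries, with replicated associations surfaced first, here is a minimal Python sketch. The `GwasAssociation` record, its field names and the `gwas_report` and `has_replication` helpers are hypothetical; the real GWAS Catalog download uses its own column names, and treating an empty or "NA" replication sample size as "no replication" is an assumption.

```python
from dataclasses import dataclass


@dataclass
class GwasAssociation:
    # Hypothetical record mirroring the health report columns.
    rs_id: str
    disease_trait: str
    study: str
    risk_allele: str
    initial_sample_size: str
    replication_sample_size: str  # free text; "" or "NA" assumed to mean none
    p_value: float
    risk_allele_frequency: str


def has_replication(assoc):
    """True if a replication sample size is reported (assumed convention)."""
    return assoc.replication_sample_size.strip().upper() not in {"", "NA"}


def gwas_report(patient_rs_ids, catalog):
    """Return associations for the patient's rs numbers, replicated ones first."""
    wanted = set(patient_rs_ids)
    hits = [a for a in catalog if a.rs_id in wanted]
    # Sort key: replicated before unreplicated, then smallest p-value first.
    return sorted(hits, key=lambda a: (not has_replication(a), a.p_value))
```

Ordering by replication before p-value reflects the point above: a very small p-value from a single unreplicated study deserves less weight than a replicated association.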