Variant Database

Matt Field has made some significant improvements to the prioritisation of structural variants (SVs), and we've updated our database to reflect those changes. They include a combined report for both SV callers, prioritisation of SVs where exons are most likely to be impacted, a maximum length filter applied to most SV types, and a flag for whether an event is novel or known. These changes dramatically reduced the number of high priority SVs from >3000 to around 90, with 449 medium priority SVs. Please note that we do not retrospectively re-analyse or update the SV reports for any previous records; the changes only affect new data.

Handling control samples

We don't necessarily want to see variants from our control samples in the database, but we still want to be able to download the VCFs and do SNP validation to ensure we don't have sample mix-ups. We've created a separate page listing the control samples and their corresponding VCFs for download.

Automated archiving of BAM files

Keeping BAM files available for download is a real challenge for our storage capacity, and we face constant pressure to free up disk space as more projects come on board. We've come up with a way to automatically archive BAM files that are older than 1 year to tape storage without any human intervention.
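
As a rough illustration of the selection step only, the sketch below lists BAM files that are older than one year. It is not the production archiving job: the function name `find_bams_to_archive`, the use of file modification time as the file's age, and the 365-day definition of a year are all assumptions, and the transfer to tape is left out.

```python
import time
from pathlib import Path

# Assumption: "1 year" is treated as 365 days.
ONE_YEAR_SECONDS = 365 * 24 * 60 * 60


def find_bams_to_archive(root, now=None, max_age_seconds=ONE_YEAR_SECONDS):
    """Return BAM files under `root` whose modification time is older than the cutoff.

    Assumption: age is measured from the file's modification time.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_seconds
    return sorted(
        path
        for path in Path(root).rglob("*.bam")
        if path.is_file() and path.stat().st_mtime < cutoff
    )
```

A scheduled job could call this function and hand each returned path to whatever tool performs the tape transfer.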

Health reports: GWAS

In addition to Clinvar and Snpedia, we've recently added the GWAS Catalog to the health reports, based on the rsNumbers for a patient. GWAS is particularly useful in a research context: it compares variant frequencies in the affected population against a control (healthy) population, using statistical analysis to establish a hypothetical link between variants and disease traits. In the health report under GWAS, we've added the following columns: disease traits, studies, risk allele, initial sample size, replication sample size, p-value and risk allele frequency. False positives (false associations between variant and disease) are not uncommon in GWAS due to uncontrolled biases, so it's important to consider whether any replication studies were done, which gives more confidence in the hypothesized association.

Health reports: Clinvar & Snpedia

May 22, 2018

We've added a new feature where users can generate health reports, downloadable in Excel format, from multiple data sources including Clinvar and Snpedia, based on the patient's rsNumbers/variants and genotype. The health reports indicate the patient's risk factor associated with a particular disease/trait. A report can take up to 20 minutes to generate, after which an email is sent with the health report attached. Magnitude is a subjective measure of interest ranging from 0 to 10; the higher the number, the more significant. A magnitude score of 2 or higher is probably worth investigating. A magnitude score of 4 or higher is definitely worth investigating. More info at: https://www.snpedia.com/index.php/Magnitude.
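
The magnitude thresholds above can be written down as a small helper. This is only a sketch: the function name `magnitude_guidance` and the "lower interest" label for scores below 2 are assumptions made for illustration, not part of the health report itself.

```python
def magnitude_guidance(magnitude):
    """Map a magnitude score (0 to 10) to the guidance described in the text."""
    if not 0 <= magnitude <= 10:
        raise ValueError("magnitude is expected to lie between 0 and 10")
    if magnitude >= 4:
        return "definitely worth investigating"
    if magnitude >= 2:
        return "probably worth investigating"
    # Assumption: the text gives no wording for scores below 2.
    return "lower interest"
```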

Excluding variants from search based on Patient study codes

April 17, 2018

When doing our own variant analysis, we often seek variants that are shared between affected individuals, and we already provide this capability using the 'shared' filter. We recently added a new filter to take this search one step further by removing variants found in unaffected individuals (usually from the same family). There is a new textbox called 'Exclude variants' where users can add patient study codes, so that the variants found in these individuals are excluded from the variants found in the other individuals in a single search operation. Keep in mind that each person carries thousands of variants, so filtering in this way can be quite slow if no other filters are applied. We recommend applying as many filters as possible to narrow the search before using this functionality.
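
Conceptually, the exclusion is a set difference: take the variants shared by the affected individuals, then remove anything seen in the excluded individuals. The sketch below shows that logic in memory only. The function name, the study codes and the (chromosome, position, ref, alt) variant keys are hypothetical, and the real search runs against the database rather than Python sets.

```python
def shared_minus_excluded(variants_by_patient, affected, excluded):
    """Variants carried by every affected patient and by no excluded patient.

    `variants_by_patient` maps a patient study code to an iterable of
    hashable variant keys, e.g. (chromosome, position, ref, alt).
    """
    if not affected:
        return set()
    shared = set.intersection(*(set(variants_by_patient[p]) for p in affected))
    unwanted = set().union(*(variants_by_patient[p] for p in excluded))
    return shared - unwanted


# Hypothetical family: two affected children and one unaffected parent.
family = {
    "CHILD1": {("1", 100, "A", "G"), ("2", 200, "C", "T"), ("3", 300, "G", "A")},
    "CHILD2": {("1", 100, "A", "G"), ("2", 200, "C", "T")},
    "PARENT": {("2", 200, "C", "T")},
}
candidates = shared_minus_excluded(family, ["CHILD1", "CHILD2"], ["PARENT"])
```

Here `candidates` contains only the chromosome 1 variant, because the chromosome 2 variant is also present in the unaffected parent.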

gnomAD ethnic frequencies exportable

April 17, 2018

We've added a new option for users to export gnomAD ethnic frequencies to Excel. These include South Asian, East Asian, African American, Jewish, Non-Finnish European, Finnish and other minor allele frequencies (MAF). The option is off by default because we don't actually store the gnomAD frequencies in our database and have to fetch them from elsewhere, which makes the export slower, especially when exporting thousands of variants. It's best to filter as much as you can before enabling this option.

Affected statuses: Database vs Pipeline

April 04, 2018

We recently introduced a new filter called 'Pipeline affected status'. This is not to be confused with the other 'Affected status' or 'Disease status' filter, which is taken from our Patient Database. The 'Pipeline affected status' differs in that you can reconfigure the pipeline to use a different affected status from the one set in the database, in order to produce different cohort reports. This is useful when the affected status applies to multiple phenotypes or diagnoses and you want to do repeated cohort analysis under different conditions.

Phenotype to Genotype based variant searching

February 15, 2018

Expand your variant search based on known phenotype-genotype relationships. This filter only works if you have specified patient IDs in the filters. The phenotypes collected from the specified patients are used to query OMIM for gene relationships. A new tab called 'Phenotype-Genotype' is displayed in the results, showing the relationships between phenotypes and genes. This only works well for patients who have a good number of phenotypes captured in our databases.

RS number filter

February 15, 2018

Users can now search by rsNumbers in our search filters.

Variants from cohort reports are now included

February 15, 2018

Previously only the variants from the SNV, INDEL and SV reports were included in our database. We've recently rebuilt our database to include all variants found in the cohort report, even the questionable ones of poor quality, because the pipeline pedigree analysis found some suggestion of an inheritance pattern for them. This means there are more variants for you to browse than there were before.

Gene interactions - Genes don't work in isolation, and your gene lists shouldn't either

Genes don't work in isolation, and your gene lists shouldn't either. Researchers will often have a list of known genes to look for when prioritizing variants based on the patient's clinical diagnosis. But what should you do if no candidate variants can be found based solely on your gene list? There are many approaches, but one option is to expand the gene list based on known gene interactions and pathways. We rely on the highly curated database called BioGRID to expand the gene list to include the network of genes known to interact either directly or through protein-to-protein interactions. To use this new feature, tick the new checkbox called 'Gene interactions' to expand your gene-based search in this way.
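
The idea of expanding a gene list can be sketched as a lookup against an interaction table. The example below is an assumption-laden illustration: the function name, the placeholder gene names and the in-memory `interactions` dictionary are hypothetical, it only follows direct (one-hop) interactions, and it does not query BioGRID.

```python
def expand_gene_list(genes, interactions):
    """Return the original genes plus their direct interaction partners.

    `interactions` maps a gene name to an iterable of partner gene names.
    """
    expanded = set(genes)
    for gene in genes:
        expanded.update(interactions.get(gene, ()))
    return expanded


# Hypothetical interaction table with placeholder gene names.
interactions = {
    "GENE_A": {"GENE_B", "GENE_C"},
    "GENE_B": {"GENE_A", "GENE_D"},
}
search_genes = expand_gene_list(["GENE_A"], interactions)
```

In this sketch `search_genes` holds GENE_A, GENE_B and GENE_C; GENE_D is two hops away and is therefore not added.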

Search profiles

Users can now create their own search profiles to store commonly used search filters, without having to choose the same options over and over again. One example is to include your gene lists in a search profile. Search profiles are associated with the individual user only and are not shared.

BAI - BAM index files downloadable

The BAM index files, known as BAI files, are now available for download along with the BAM file. This is particularly useful when using IGV on your desktop.

New search filters: gnomAD frequency and INDEL ExAC frequencies

The bioinformatics pipeline has been updated to include gnomAD frequencies and INDEL ExAC frequencies. Any new data generated from Dec 2017 onwards will have these new fields. However, none of the previously analyzed datasets will have them; they will have to be reanalyzed if you want these new fields populated. To go along with these new fields, we've added a new gnomAD frequency filter to our search page.

Exon coverage search

December 13, 2017

The sequencing and alignment process isn't perfect and often there are regions of poor coverage as a result of the pipeline analysis. Previously we made the coverage reports available for download as part of our datasets as 'exonReports'. We've taken it a step further by allowing users to search through these coverage reports based on gene, patient ID and coverage type (NO_COVERAGE, POOR_COVERAGE, PARTIAL_COVERAGE). To use this new feature, in the menus, choose 'Search exon coverage'. Furthermore, we added a new tab to display exon coverage to go along with the variant search results. The tab will only have results if users search by Patient ID and Gene. This way users can browse variants and the coverage results side-by-side providing a broader view over the quality of the variants being presented. In particular, this will be useful for difficult to diagnose patients for which no causal variants have been identified, where potentially disease-causing variants m...

CACPIC Frequencies exportable

December 13, 2017

Chinese frequencies calculated from our healthy Chinese controls are now exportable to Excel as an optional column. We've made it optional because these frequencies are calculated at runtime during the export process and can delay the completion of the export. For those not interested in the CACPIC frequencies, leave the checkbox unticked.

Supplementary information available for download

December 13, 2017

As part of our datasets for download, we've added some supplementary information (generated as TXT files by the pipeline) to go along with the VCF and BAM files. The files contain information such as the cutoffs used to qualify variants as a PASS, including read depth cutoffs, median quality score cutoffs and so on. The files also contain general statistics about the total variants passed, the proportion of variants that are exonic or at splice sites, the number of distinct genes, and averages of read depth and median quality. Another file called the 'readReport.summary' includes information about how many reads were paired, mispaired, aligned, and unaligned. The supplementary files can be found under the 'Datasets' section.

Measuring Variant Conservation with GERP Score and Siphy

December 08, 2017

The more annotations, the better! We've recently added two new annotations that measure variant conservation to assist with variant prioritisation: GERP scores and Siphy. GERP stands for Genomic Evolutionary Rate Profiling. Conceptually, GERP is a method for the identification of slowly evolving regions in a multiple sequence alignment, defined as ‘constrained elements’. "Constrained elements are identified by comparing the observed to the expected rates of evolution for each window, and defining all those regions whose collective observed rates of evolution are significantly lower than would be expected under a null model." More simply, it is a score measuring the conservation of each nucleotide in a multi-species alignment, ranging from -12.3 to 6.17, with 6.17 being the most conserved. Positive scores (observed fewer than expected) indicate that a site is under evolutionary constraint. Negative scores may be weak evidence of accelerated rates of...

Allele Frequency filtering

December 01, 2017

The pipeline does the best it can in assigning variants a zygosity based on the allele frequencies and counts. Usually the cutoff is around 90%. Below this threshold, however, the zygosity becomes less clear. Hence we now allow users to filter by allele frequency, which is particularly useful in cases where zygosity is not clear. Users can filter on the VARIABLE allele frequency as well as the REFERENCE allele frequency.
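
To make the two frequencies concrete, the sketch below derives them from read counts and keeps only calls within a chosen range. It rests on assumptions: that each frequency is the fraction of reads supporting that allele at the position, that only two alleles are present, and that the field names `ref_reads` and `alt_reads` exist. The pipeline's own calculation may differ.

```python
def allele_frequencies(ref_reads, alt_reads):
    """Return (reference_af, variant_af) as fractions of the total read count."""
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("no reads cover this position")
    return ref_reads / total, alt_reads / total


def filter_by_variant_af(calls, min_af=0.0, max_af=1.0):
    """Keep calls whose variant allele frequency lies within [min_af, max_af]."""
    kept = []
    for call in calls:
        _, variant_af = allele_frequencies(call["ref_reads"], call["alt_reads"])
        if min_af <= variant_af <= max_af:
            kept.append(call)
    return kept
```

For example, `filter_by_variant_af(calls, 0.5, 0.9)` would keep calls that sit in a band below the roughly 90% cutoff, where the assigned zygosity may deserve a second look.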

gnomAD frequencies based on ethnicity

December 01, 2017

Previously the gnomAD frequency shown reflected the European frequency. We now display the gnomAD frequency for all ethnicities under the 'Latest annotations' tab.

Gene synonym searching

December 01, 2017

Previously, searching by gene required an exact match on the provided gene name, without considering how gene names evolve over time. Genes are often given synonyms, and old gene names are replaced with new ones. With this change, users have the option to expand gene name searching to include synonyms. The expanded list of synonyms is shown on the results page. Our testing has shown that this can greatly affect the results returned.
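
A minimal sketch of synonym expansion is shown below. The function name and the in-memory `synonym_groups` table are assumptions made for illustration (KMT2A, previously named MLL, serves as the example entry), and the actual source of synonyms used by the search is not shown here.

```python
def expand_with_synonyms(gene, synonym_groups):
    """Return every name in the group containing `gene`, matched case-insensitively.

    `synonym_groups` is an iterable of sets, each holding the names used for
    one gene. If the gene is not found, only the name as typed is returned.
    """
    wanted = gene.upper()
    for group in synonym_groups:
        if wanted in {name.upper() for name in group}:
            return sorted(group)
    return [gene]


# Illustrative table with a single entry.
synonym_groups = [{"KMT2A", "MLL"}]
names_to_search = expand_with_synonyms("mll", synonym_groups)
```

With this table, a search for the older name also matches variants annotated with the current one.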

CACPI control frequency calculation correction

November 17, 2017

Previously there was an error in how the CACPI control frequency was calculated. This has now been corrected.

Provisional Variants

November 08, 2017

Users can now save variants against a particular individual as provisional variants to be ranked and prioritised. Once added to the patient, these provisional variants can be used to nominate suspected variants for Sanger sequencing confirmation. When confirmed, the variant can then move on to the next stage and be marked as the 'Genetic Diagnosis'.

Shared variants as a percentage

November 06, 2017

Previously, when filtering for shared variants between individuals, the variants returned were always present in 100% of the specified individuals. This has now been changed to allow users to specify the sharing of variants between individuals AS A PERCENTAGE. For example, to find all variants shared between 2 out of 3 individuals, use a percentage value of 66%. The variants returned will be shared by AT LEAST 2 out of the 3 individuals. This is especially useful in large cohorts of unrelated individuals known to share similar phenotypes, where there may be a suspicion of multifactorial gene expression or other complex gene interactions.
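
The percentage rule can be illustrated with a short sketch: count how many of the specified individuals carry each variant and keep those that reach the threshold. The function name, the patient codes and the variant labels below are hypothetical, and the comparison uses integer arithmetic so that 2 out of 3 (66.7%) passes a 66% threshold.

```python
from collections import Counter


def shared_variants(variants_by_patient, min_percent):
    """Variants carried by at least `min_percent` percent of the patients."""
    total = len(variants_by_patient)
    counts = Counter(
        variant
        for variants in variants_by_patient.values()
        for variant in set(variants)  # count each patient once per variant
    )
    return {v for v, n in counts.items() if n * 100 >= min_percent * total}


# Hypothetical cohort of three individuals.
cohort = {
    "P1": {"varA", "varB", "varC"},
    "P2": {"varA", "varB"},
    "P3": {"varA"},
}
at_least_two_of_three = shared_variants(cohort, 66)
```

In this cohort a 66% threshold returns varA and varB, while a 100% threshold would return only varA.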

Zygosity exportable

November 06, 2017

The zygosity status assigned to each variant is now exported to Excel.

Login Changes: Australian Access Federation integration

We've recently changed our authentication to use the Australian Access Federation (AAF). The main benefit is that users from other universities across Australia can use their own institutional credentials to access our database, without having to create a new username/password. However, we still support generic usernames and passwords for our friends overseas. The other benefit of making this switch is a better user experience in terms of data integration across our ecosystem of databases via single sign-on. We can seamlessly pull information from various places to provide an aggregated view of data. Australian users should use the [Login via AAF] option, while others should use [Basic Login].

IGV viewer for whole family

Our first release of the IGV viewer only allowed a single individual to be displayed at a time, with a single VCF and single BAM file loaded up. With the latest release, IGV viewer now loads all family members.

SNP Validation

The process of sequencing and analysis goes through a series of steps with much human involvement, and is therefore prone to error. To verify that a patient does indeed have a particular variant, we can run SNP Validation for a batch of patients. We've created a tool that can be invoked through our web portal to determine the smallest combination of SNPs, from a predefined SNP panel, required to uniquely identify each patient within a batch. This process requires that the BAM file be available and can take a while, depending on the size of the batch.
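
One way to picture "the smallest combination of SNPs" is a brute-force search over subsets of the panel, smallest first, stopping at the first subset that gives every patient a distinct genotype signature. The sketch below is only that illustration and not the tool's actual algorithm: the function name, the placeholder SNP names and the genotype strings are hypothetical, and the search becomes slow for large panels.

```python
from itertools import combinations


def smallest_identifying_panel(genotypes, panel):
    """Return the smallest tuple of SNPs that tells all patients apart, or None.

    `genotypes` maps patient -> {snp: genotype}; `panel` lists candidate SNPs.
    """
    patients = list(genotypes)
    for size in range(1, len(panel) + 1):
        for subset in combinations(panel, size):
            signatures = {
                tuple(genotypes[p][snp] for snp in subset) for p in patients
            }
            if len(signatures) == len(patients):  # every patient is unique
                return subset
    return None


# Hypothetical batch of three patients genotyped on a three-SNP panel.
batch = {
    "P1": {"snp1": "AA", "snp2": "AG", "snp3": "GG"},
    "P2": {"snp1": "AA", "snp2": "GG", "snp3": "GG"},
    "P3": {"snp1": "AG", "snp2": "GG", "snp3": "GG"},
}
chosen = smallest_identifying_panel(batch, ["snp1", "snp2", "snp3"])
```

No single SNP separates all three patients in this batch, so the search settles on the pair snp1 and snp2.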

Chinese frequencies

Displays the frequency of a variant in the Chinese population using our own database of healthy Chinese controls. Currently only SNVs are supported. INDELs and structural variants are UNSUPPORTED (at this stage) and will show 0%.

Coverage reports available

The sequencing and variant calling process can sometimes be selective in the regions of the genome it covers. It's therefore important to know which regions were not well covered when considering false negatives. These coverage reports are now included as part of the 'summary report' downloads. Look for file names containing 'exonReport'.

On-demand recent annotations

At the push of a button, users can retrieve the most recent annotations for a variant from Ensembl VEP. These supplement the annotations provided by the pipeline, whose sources of information may not be completely up to date. The new annotations include clinical significance, PubMed IDs and links, Mutation Taster predictions, rsNumbers and much more. Because of their dynamic nature, these annotations are unfortunately NOT searchable, but they are there to supplement the annotations from the pipeline.

Restore BAM files