6 Querying Additional Datasets
Once users have finished filtering with the Basic Filters on the left-hand side of the p53motifDB Shiny App (covered in Chapter 3), they are left with a Main Data Table containing the p53RE that have the characteristics of interest. Chapter 5 describes how to export the Main Data Table, and this chapter describes how to download a series of “accessory” datasets that provide additional information about the p53RE in that filtered Main Data Table. This works much like cross-referencing tables in SQL or another relational database system. Once the Main Data Table filters are applied, users can export accessory data using the buttons at the bottom of the App (shown in Figure 3.1); each button exports a different accessory dataset for the filtered p53RE. Data are exported as comma-separated value (.csv) files via a system dialog box, where files can be named and saved to the location of the user's choice.
Data are exported for ALL p53RE found in the filtered Main Data Table, even if there are no records for the particular accessory data. If there are no records, the p53RE unique_id (chromosome number, start location, and stop location) is displayed, but other columns will display “NA”.
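When working with an exported file, it can help to separate the p53RE that have accessory records from those that were exported with only “NA” values. The following is a minimal Python sketch using only the standard library. The sample rows are invented for illustration, and the sketch assumes that missing values appear as the literal string `NA` in the .csv file.

```python
import csv
import io

# Hypothetical excerpt of an exported accessory file (values invented for illustration).
SAMPLE_EXPORT = """unique_id,rs_id,dbSNP_ref,dbSNP_alt
chr1_1000_1020,rs0000001,A,G
chr2_5000_5020,NA,NA,NA
"""

def split_by_records(rows, id_column="unique_id"):
    """Split rows into those with accessory data and those that are all NA."""
    with_records, without_records = [], []
    for row in rows:
        # Assumption: missing values are written as the literal string "NA".
        others = [value for key, value in row.items() if key != id_column]
        if all(value == "NA" for value in others):
            without_records.append(row)
        else:
            with_records.append(row)
    return with_records, without_records

rows = list(csv.DictReader(io.StringIO(SAMPLE_EXPORT)))
with_records, without_records = split_by_records(rows)
print([r["unique_id"] for r in with_records])
print([r["unique_id"] for r in without_records])
```

To use this with a real export, replace `io.StringIO(SAMPLE_EXPORT)` with an open file handle for the saved .csv file.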
We recommend applying multiple Basic Filters before exporting accessory data. Users interested in the complete accessory data tables can find them in our Zenodo repository, which also contains a pre-compiled SQLite database for users who want to write more advanced queries.
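For readers who plan to query the SQLite database directly, the export behavior described above resembles a SQL `LEFT JOIN` from the filtered p53RE to an accessory table. The sketch below builds a small in-memory database to illustrate the idea. The table names `main_table` and `clinvar` and all rows are hypothetical; the schema of the pre-compiled database may differ, so inspect it before adapting the query.

```python
import sqlite3

# In-memory database with HYPOTHETICAL table names and invented rows;
# the schema of the pre-compiled database on Zenodo may differ.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE main_table (unique_id TEXT PRIMARY KEY);
CREATE TABLE clinvar (unique_id TEXT, clinVar_id TEXT, clinVar_type TEXT);
INSERT INTO main_table VALUES ('chr1_1000_1020'), ('chr2_5000_5020');
INSERT INTO clinvar VALUES ('chr1_1000_1020', '0000001', 'SNV');
""")

# LEFT JOIN keeps every p53RE from the filtered table, even without a match.
query = """
SELECT m.unique_id, c.clinVar_id, c.clinVar_type
FROM main_table AS m
LEFT JOIN clinvar AS c ON c.unique_id = m.unique_id
ORDER BY m.unique_id
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
conn.close()
```

Unmatched p53RE come back with `None` (SQL `NULL`) in the accessory columns, which corresponds to the “NA” entries in the exported .csv files.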
In the next subsections, we will go through the types of data that users can export from these accessory tables.
6.1 ClinVar
column_name | information |
---|---|
unique_id | chromosome_start_stop for p53RE |
clinVar_chrom | chromosome where the clinVar variant is located |
clinVar_start | start of chromosome position of the clinVar variant |
clinVar_stop | stop/end of chromosome position of the clinVar variant |
clinVar_id | clinVar database standard ID |
clinVar_ref | the reference allele |
clinVar_alt | the alternate allele |
clinVar_type | the type of clinVar variant; options include INSERTION, DELETION, SNV, MNV |
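Because `unique_id` encodes the p53RE location as chromosome_start_stop, it can be split back into its parts to relate a variant position to the p53RE. The sketch below assumes identifiers such as `chr17_7687000_7687020` (an invented example) and assumes that the variant and p53RE coordinates use the same convention. Both function names are hypothetical helpers, not part of the App.

```python
def parse_unique_id(unique_id):
    """Split 'chrom_start_stop' into (chrom, start, stop)."""
    # Split from the right so chromosome names containing underscores stay intact.
    chrom, start, stop = unique_id.rsplit("_", 2)
    return chrom, int(start), int(stop)

def variant_offset(unique_id, variant_start):
    """Distance of a variant start from the p53RE start, in base pairs."""
    _, re_start, _ = parse_unique_id(unique_id)
    return variant_start - re_start

# Invented example values, for illustration only.
print(parse_unique_id("chr17_7687000_7687020"))
print(variant_offset("chr17_7687000_7687020", 7687005))
```

Splitting from the right is a deliberate choice: some reference contigs have underscores in their names, and a plain left-to-right split would break on them.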
6.2 dbSNP
Single p53RE locations may contain multiple genomic variants in the dbSNP accessory dataset, so the same p53RE unique_id (location) can be found in multiple rows of the exported data. Each row will contain unique data reflecting the specific genomic variant, dbSNP ID, and other information found in additional columns.
column_name | information |
---|---|
unique_id | chromosome_start_stop for p53RE |
dbSNP_chrom | chromosome where the dbSNP variant is located |
dbSNP_start | start of chromosome position of the dbSNP variant |
dbSNP_stop | stop/end of chromosome position of the dbSNP variant |
rs_id | dbSNP standard ID |
dbSNP_ref | the reference allele |
dbSNP_alt | the alternate allele |
dbSNP_type | the type of dbSNP variant; options include INSERTION, DELETION, SNV, MNV |
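Since one p53RE can occupy several rows in this export, a common first step is to group the rows by `unique_id`. The following is a minimal sketch with invented sample rows; it assumes that a p53RE without dbSNP records carries “NA” in the `rs_id` column.

```python
import csv
import io
from collections import defaultdict

# Hypothetical excerpt of a dbSNP export (IDs invented for illustration).
SAMPLE_DBSNP = """unique_id,rs_id,dbSNP_type
chr1_1000_1020,rs0000001,SNV
chr1_1000_1020,rs0000002,DELETION
chr2_5000_5020,NA,NA
"""

def variants_per_p53re(rows):
    """Return {unique_id: [rs_id, ...]}, keeping p53RE that have no variants."""
    grouped = defaultdict(list)
    for row in rows:
        variants = grouped[row["unique_id"]]  # creates an empty list on first sight
        # Assumption: a p53RE without dbSNP records carries "NA" in rs_id.
        if row["rs_id"] != "NA":
            variants.append(row["rs_id"])
    return dict(grouped)

dbsnp_rows = list(csv.DictReader(io.StringIO(SAMPLE_DBSNP)))
variant_map = variants_per_p53re(dbsnp_rows)
for unique_id, rs_ids in variant_map.items():
    print(unique_id, len(rs_ids), rs_ids)
```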
6.3 ENCODE DHS
DNase Hypersensitive Sites (DHS) have been surveyed across hundreds of cell types and conditions. Therefore, the same p53RE unique_id (location) may be found in multiple rows of the exported data, reflecting the observed cell types where that genomic location is accessible to DNase.
column_name | information |
---|---|
unique_id | chromosome_start_stop for p53RE |
dhs_loc | chromosome_start_stop for the DHS (DNase Hypersensitive Site) |
dhs_intersection | Is the p53RE found in a DHS region (YES/NO) |
dhs_cell_id | the ENCODE cell line designation/number |
dhs_score | the score of the DHS, ranges from 1-1000. The larger the number, the larger the DHS signal. |
dhs_cell_type | the cell type where the DHS was identified; linked to the dhs_cell_id |
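One way to summarize this export is to list, for each p53RE, the cell types in which the DHS signal exceeds a chosen cutoff. The sketch below uses invented rows and placeholder cell type names. The cutoff of 500 is an arbitrary illustration rather than a recommended threshold, and the sketch assumes that rows with `dhs_intersection` set to NO carry “NA” in the remaining columns.

```python
import csv
import io

# Hypothetical excerpt of an ENCODE DHS export (all values invented for illustration).
SAMPLE_DHS = """unique_id,dhs_intersection,dhs_score,dhs_cell_type
chr1_1000_1020,YES,850,cell_type_A
chr1_1000_1020,YES,120,cell_type_B
chr1_1000_1020,YES,640,cell_type_C
chr2_5000_5020,NO,NA,NA
"""

def accessible_cell_types(rows, min_score):
    """Map each p53RE to the sorted cell types whose DHS score is >= min_score."""
    selected = {}
    for row in rows:
        # Skip p53RE outside any DHS region (assumed to have NA in other columns).
        if row["dhs_intersection"] != "YES":
            continue
        if float(row["dhs_score"]) >= min_score:
            selected.setdefault(row["unique_id"], set()).add(row["dhs_cell_type"])
    return {uid: sorted(cells) for uid, cells in selected.items()}

dhs_rows = list(csv.DictReader(io.StringIO(SAMPLE_DHS)))
# The cutoff of 500 is an arbitrary illustration, not a recommended threshold.
strong_dhs = accessible_cell_types(dhs_rows, min_score=500)
print(strong_dhs)
```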
6.4 RepeatMasker
There are many different types of repetitive elements, ranging from simple DNA repeats to virus-derived elements. This accessory dataset provides additional information about the full location of the repetitive element, as well as the repeat element name, family, and class.
column_name | information |
---|---|
unique_id | chromosome_start_stop for p53RE |
rmsk_chrom | chromosome where the repetitive element is located |
rmsk_start | start of chromosome position of the repetitive element |
rmsk_stop | stop/end of chromosome position of the repetitive element |
repeat_name | the standardized name of the repeat |
repeat_class | the class of the repeat element |
repeat_family | the family of the repeat element |
rmsk_intersection | is the p53RE found in a repetitive element (YES/NO) |
6.5 Riege p53 Meta
column_name | information |
---|---|
unique_id | chromosome_start_stop for p53RE |
fischer_chrom | chromosome where the full p53 ChIP-seq peak is located |
fischer_start | start of chromosome position of the full p53 ChIP-seq peak |
fischer_stop | stop/end of chromosome position of the full p53 ChIP-seq peak |
fischer_loc | the full location of the p53 ChIP-seq peak in chrom_start_stop format |
fischer_obs | The number of datasets from the Riege et al. meta-analysis where this location is a called p53 ChIP-seq peak/binding site (range of 5-28) |
fischer_percentages | The percent of datasets where this location is a p53 binding site/peak. |
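The relationship between the observation count and the percentage can be sketched as a simple ratio. The total number of datasets is not stated in this chapter, so the function below takes it as an explicit argument. The value of 28 used in the example is an assumption inferred from the upper end of the stated range, and the exported column may be scaled differently (for example, as a fraction rather than a percent).

```python
def percent_of_datasets(observed, total_datasets):
    """Percent of datasets in which a location is a called p53 peak."""
    if total_datasets <= 0:
        raise ValueError("total_datasets must be positive")
    if not 0 <= observed <= total_datasets:
        raise ValueError("observed must be between 0 and total_datasets")
    return 100.0 * observed / total_datasets

# Assumption: 28 datasets in total, inferred from the upper end of the stated range.
print(percent_of_datasets(14, 28))
print(percent_of_datasets(5, 28))
```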
6.6 ReMap 2022
column_name | information |
---|---|
unique_id | chromosome_start_stop for p53RE |
remap_loc | the full location of the p53 ChIP-seq peak in chrom_start_stop format |
remap_chrom | chromosome where the full p53 ChIP-seq peak is located |
remap_start | start of chromosome position of the full p53 ChIP-seq peak |
remap_stop | stop/end of chromosome position of the full p53 ChIP-seq peak |
remap_cell | The type of cell where the p53 ChIP experiment was performed |
remap_obs | The number of ReMap datasets where this location was a p53 peak/binding site |
remap_percentage | The percent of datasets where this location was a p53 peak/binding site |
6.7 ABC 3D Genome
The Activity by Contact (ABC) dataset aims to “connect” p53RE to potentially regulated genes. The original manuscript describes this work in more detail, but users of this database can obtain relevant information about potential direct p53 target genes, including the gene connected to the p53RE, the type of cell where this interaction takes place, and the location of the p53RE relative to the gene.
column_name | information |
---|---|
unique_id | chromosome_start_stop for p53RE |
abc_chr | chromosome where the gene connected to the p53RE is located |
abc_TargetGeneTSS_start | start position for the TSS of the gene |
abc_TargetGeneTSS_end | stop position for the TSS of the gene |
abc_TargetGene_loc | TSS location for the p53RE-connected gene in chrom_start_stop format |
TargetGene | name of the Target Gene in HGNC/gene symbol format |
class | the type of regulatory element where the p53RE is located (promoter, intergenic, genic) |
TargetGeneIsExpressed | Based on RNA-seq or other evidence, is the target gene expressed in the cell type of interest |
isSelfPromoter | Is the location of the p53RE within a promoter, and does that promoter likely regulate the gene |
ABC.Score | a proprietary score from the ABC model reflecting the confidence that the p53RE-containing element regulates the listed gene. |
CellType | the type of cell where the prediction for regulation of the promoter by the p53RE-containing element was made |
ABC_intersection | Is this p53RE found in a region covered by the ABC predictions |
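A p53RE can be linked to several genes and cell types in this export, so it is often useful to pick the highest-scoring prediction for each p53RE. The following is a minimal sketch with invented rows, placeholder gene and cell type names, and the assumption that p53RE without predictions carry “NA” in the `ABC.Score` column.

```python
import csv
import io

# Hypothetical excerpt of an ABC export (all values invented for illustration).
SAMPLE_ABC = """unique_id,TargetGene,ABC.Score,CellType
chr1_1000_1020,GENE_A,0.02,cell_type_A
chr1_1000_1020,GENE_B,0.15,cell_type_A
chr1_1000_1020,GENE_A,0.08,cell_type_B
chr2_5000_5020,NA,NA,NA
"""

def best_target_per_p53re(rows):
    """Return {unique_id: (gene, score, cell_type)} for the top ABC.Score."""
    best = {}
    for row in rows:
        # Assumption: p53RE without ABC predictions carry "NA" in ABC.Score.
        if row["ABC.Score"] == "NA":
            continue
        score = float(row["ABC.Score"])
        current = best.get(row["unique_id"])
        if current is None or score > current[1]:
            best[row["unique_id"]] = (row["TargetGene"], score, row["CellType"])
    return best

abc_rows = list(csv.DictReader(io.StringIO(SAMPLE_ABC)))
best_targets = best_target_per_p53re(abc_rows)
print(best_targets)
```

Keeping only the top score discards weaker links that may still be of interest, so this is a convenience summary rather than a replacement for the full export.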
6.8 GeneHancer
Like the ABC dataset, the GeneHancer dataset aims to “connect” p53RE to potentially regulated genes. The original manuscript describes this work in more detail, but the current release is proprietary and owned by the academic institution and non-profit that runs the GeneCards website. Academic and non-profit users can request the full GeneHancer dataset from that organization.
In our database, users can obtain relevant information about potential direct p53 target genes determined by the GeneHancer group, including the gene connected to the p53RE, the “strength” of the association, and the predicted type of regulatory element where the p53RE is located.
column_name | information |
---|---|
unique_id | chromosome_start_stop for p53RE |
gh_loc | the location of the p53RE-containing regulatory element connected to the gene of interest. |
GHid | a unique identifier for the GeneHancer database |
gh_connected_gene | name of the Target Gene in HGNC/gene symbol format |
gh_score | a proprietary score from the GeneHancer model reflecting the confidence that the p53RE-containing element regulates the listed gene. |
gh_elite | Elite sites represent the highest confidence that the gene is regulated by the p53RE-containing element (0=no, 1=yes) |
gh_regulatory_element_type | the type of regulatory element where the p53RE is found, as defined by GeneHancer |
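To focus on the highest-confidence connections, the export can be restricted to elite rows and ranked by score. The sketch below uses invented rows with placeholder identifiers and gene names. It assumes that `gh_elite` is written to the .csv file as the text `1` or `0`, which should be checked against a real export.

```python
import csv
import io

# Hypothetical excerpt of a GeneHancer export (all values invented for illustration).
SAMPLE_GH = """unique_id,GHid,gh_connected_gene,gh_score,gh_elite
chr1_1000_1020,GH_EXAMPLE_1,GENE_A,12.5,1
chr1_1000_1020,GH_EXAMPLE_1,GENE_B,3.2,0
chr3_9000_9020,GH_EXAMPLE_2,GENE_C,25.0,1
chr2_5000_5020,NA,NA,NA,NA
"""

def elite_connections(rows):
    """Return elite rows sorted from highest to lowest gh_score."""
    # Assumption: gh_elite is exported as the text "1" (elite) or "0" (not elite).
    elite = [row for row in rows if row["gh_elite"] == "1"]
    return sorted(elite, key=lambda row: float(row["gh_score"]), reverse=True)

gh_rows = list(csv.DictReader(io.StringIO(SAMPLE_GH)))
top_connections = elite_connections(gh_rows)
for row in top_connections:
    print(row["unique_id"], row["gh_connected_gene"], row["gh_score"])
```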
6.9 MCF10A MicroC
column_name | information |
---|---|
unique_id | chromosome_start_stop for p53RE |
bait_chrom | chromosome location for the bait region |
bait_start | chromosome start position for the bait |
bait_end | chromosome stop position for the bait |
prey_chrom | chromosome location for the prey region |
prey_start | chromosome start position for the prey |
prey_end | chromosome stop position for the prey |
bait_gene_name | the HGNC/gene symbol for the gene of interest |
baitprey_interaction | is the p53RE found within the bait region or the prey region? |
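The bait/prey assignment can be thought of as an interval comparison between the p53RE and the two interacting regions. The following sketch illustrates that idea and is not the procedure used to build the database. It assumes inclusive coordinates, treats any overlap as “found within”, and uses hypothetical return labels that may not match the values in the `baitprey_interaction` column.

```python
def overlaps(chrom_a, start_a, end_a, chrom_b, start_b, end_b):
    """True if two intervals on the same chromosome share at least one position."""
    # Assumption: start and end coordinates are both inclusive.
    return chrom_a == chrom_b and start_a <= end_b and start_b <= end_a

def classify_interaction(row):
    """Return 'bait', 'prey', 'both', or 'neither' (labels are hypothetical)."""
    # unique_id is chromosome_start_stop; split from the right to keep the name intact.
    chrom, start, stop = row["unique_id"].rsplit("_", 2)
    start, stop = int(start), int(stop)
    in_bait = overlaps(chrom, start, stop, row["bait_chrom"],
                       int(row["bait_start"]), int(row["bait_end"]))
    in_prey = overlaps(chrom, start, stop, row["prey_chrom"],
                       int(row["prey_start"]), int(row["prey_end"]))
    if in_bait and in_prey:
        return "both"
    if in_bait:
        return "bait"
    if in_prey:
        return "prey"
    return "neither"

# Invented example row, for illustration only.
example_row = {
    "unique_id": "chr1_1000_1020",
    "bait_chrom": "chr1", "bait_start": "50000", "bait_end": "55000",
    "prey_chrom": "chr1", "prey_start": "900", "prey_end": "2000",
}
print(classify_interaction(example_row))
```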
6.10 HCT116 pcHIC
column_name | information |
---|---|
unique_id | chromosome_start_stop for p53RE |
bait_chrom | chromosome location for the bait region |
bait_start | chromosome start position for the bait |
bait_end | chromosome stop position for the bait |
prey_chrom | chromosome location for the prey region |
prey_start | chromosome start position for the prey |
prey_end | chromosome stop position for the prey |
baitprey_interaction | is the p53RE found within the bait region or the prey region? |
bait_ensembl_gene_id | the ENSEMBL id for the gene of interest |
bait_hgnc | the HGNC/gene symbol for the gene of interest |