6 Querying Additional Datasets

Once users have completed their filtering with the Basic Filters on the left-hand side of the p53motifDB Shiny App (covered in Chapter 3), they are left with a Main Data Table containing the p53RE that have the characteristics of interest. Chapter 5 describes how to export the Main Data Table information; this chapter describes how users can download a series of “accessory” datasets that provide additional information about the p53RE in that filtered Main Data Table. This works much like cross-referencing in SQL or other relational database schemes. Once the Main Data Table filters are applied, users can export accessory data using the buttons at the bottom of the App (shown in Figure 3.1); each button exports a different type of information about the filtered p53RE. Data are exported as comma-separated value (.csv) files via a system dialog box where files can be named and saved to the location of the user's choice.

Data are exported for ALL p53RE found in the filtered Main Data Table, even if there are no records for the particular accessory data. If there are no records, the p53RE unique_id (chromosome number, start location, and stop location) is displayed, but other columns will display “NA”.
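
The export behaviour described above resembles a SQL left join: every p53RE in the filtered Main Data Table is kept, and accessory columns are empty where no record exists. The sketch below illustrates the idea with Python's built-in sqlite3 module. The table names (main_table, clinvar) and the example rows are assumptions made for illustration, not the actual schema or contents of the App.

```python
import sqlite3

# Hypothetical tables and rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE main_table (unique_id TEXT PRIMARY KEY);
    CREATE TABLE clinvar (unique_id TEXT, clinVar_id TEXT, clinVar_type TEXT);
    INSERT INTO main_table VALUES ('chr1_1000_1020'), ('chr2_5000_5020');
    INSERT INTO clinvar VALUES ('chr1_1000_1020', 'example_id_1', 'SNV');
""")

# LEFT JOIN keeps every p53RE from the main table, even when the
# accessory table has no matching record (those columns come back NULL).
rows = conn.execute("""
    SELECT m.unique_id, c.clinVar_id, c.clinVar_type
    FROM main_table AS m
    LEFT JOIN clinvar AS c ON c.unique_id = m.unique_id
    ORDER BY m.unique_id
""").fetchall()

# Render missing values as "NA", mirroring the exported .csv files.
export_rows = [tuple("NA" if value is None else value for value in row)
               for row in rows]
for row in export_rows:
    print(row)
```

In this sketch the second p53RE has no ClinVar record, so its row carries only the unique_id followed by “NA” values.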

We recommend applying multiple Basic Filters before exporting accessory data. Users interested in all of the information contained in the accessory data tables can find the full tables on our Zenodo repository, which also contains a pre-compiled SQLite database for advanced querying.

In the next subsections, we will go through the types of data that users can export from these accessory tables.

6.1 ClinVar

column_name information
unique_id chromosome_start_stop for p53RE
clinVar_chrom chromosome where the clinVar variant is located
clinVar_start start of chromosome position of the clinVar variant
clinVar_stop stop/end of chromosome position of the clinVar variant
clinVar_id clinVar database standard ID
clinVar_ref the reference allele
clinVar_alt the alternate allele
clinVar_type the type of clinVar variant; options include INSERTION, DELETION, SNV, MNV
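
Because unique_id packs the p53RE coordinates into a single chromosome_start_stop string, it can be useful to split it back into parts when working with an exported file. The following is a minimal sketch; the example values are invented, the helper names are hypothetical, and the offset calculation assumes that the p53RE and variant coordinates use the same coordinate convention, which should be checked against the source data.

```python
from typing import NamedTuple


class Region(NamedTuple):
    chrom: str
    start: int
    stop: int


def parse_unique_id(unique_id: str) -> Region:
    """Split a chromosome_start_stop identifier into its three parts."""
    # Split from the right so that a chromosome name which itself
    # contains an underscore is left intact.
    chrom, start, stop = unique_id.rsplit("_", 2)
    return Region(chrom, int(start), int(stop))


def offset_in_region(region: Region, variant_start: int) -> int:
    """Distance of a variant start from the p53RE start, in base pairs."""
    return variant_start - region.start


# Hypothetical exported row (values invented for illustration).
row = {"unique_id": "chr1_1000_1020",
       "clinVar_chrom": "chr1",
       "clinVar_start": "1005"}

region = parse_unique_id(row["unique_id"])
offset = offset_in_region(region, int(row["clinVar_start"]))
print(region, offset)
```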

6.2 dbSNP

Single p53RE locations may contain multiple genomic variants in the dbSNP accessory dataset, so the same p53RE unique_id (location) can be found in multiple rows of the exported data. Each row will contain unique data reflecting the specific genomic variant, dbSNP ID, and other information found in additional columns.

column_name information
unique_id chromosome_start_stop for p53RE
dbSNP_chrom chromosome where the dbSNP variant is located
dbSNP_start start of chromosome position of the dbSNP variant
dbSNP_stop stop/end of chromosome position of the dbSNP variant
rs_id dbSNP standard ID
dbSNP_ref the reference allele
dbSNP_alt the alternate allele
dbSNP_type the type of dbSNP variant; options include INSERTION, DELETION, SNV, MNV
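
Since one p53RE can occupy several rows in this export, a common first step is to group rows by unique_id. The sketch below counts variants per p53RE using only the Python standard library. The sample data and rs_id values are placeholders, and it assumes that a p53RE without dbSNP records appears as a single row with “NA” in the accessory columns.

```python
import csv
import io
from collections import defaultdict

# Placeholder data standing in for an exported dbSNP .csv file.
sample_csv = """unique_id,rs_id,dbSNP_type
chr1_1000_1020,rs0000001,SNV
chr1_1000_1020,rs0000002,DELETION
chr2_5000_5020,NA,NA
"""

variants_by_re = defaultdict(list)
for row in csv.DictReader(io.StringIO(sample_csv)):
    # Accessing the key first ensures that p53RE without variants
    # still appear in the result, with an empty list.
    ids = variants_by_re[row["unique_id"]]
    if row["rs_id"] != "NA":
        ids.append(row["rs_id"])

counts = {uid: len(ids) for uid, ids in variants_by_re.items()}
print(counts)
```

To use this with a real export, replace io.StringIO(sample_csv) with an opened file handle for the downloaded .csv file.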

6.3 ENCODE DHS

DNase Hypersensitive Sites (DHS) have been surveyed across hundreds of cell types and conditions. Therefore, the same p53RE unique_id (location) may be found in multiple rows of the exported data, reflecting the observed cell types where that genomic location is accessible to DNase.

column_name information
unique_id chromosome_start_stop for p53RE
dhs_loc chromosome_start_stop for the DHS (DNase Hypersensitive Site)
dhs_intersection Is the p53RE found in a DHS region (YES/NO)
dhs_cell_id the ENCODE cell line designation/number
dhs_score the score of the DHS, ranging from 1 to 1000. The larger the number, the stronger the DHS signal.
dhs_cell_type the cell type where the DHS was identified; linked to the dhs_cell_id
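
As an illustration of how the exported DHS rows might be used, the sketch below collects, for each p53RE, the cell types whose dhs_score meets a chosen threshold. The sample rows, the cell type labels, and the threshold of 500 are arbitrary placeholders; the function name is hypothetical, and the code assumes that rows without a DHS carry “NA” in the score column.

```python
import csv
import io

# Placeholder data standing in for an exported ENCODE DHS .csv file.
sample_csv = """unique_id,dhs_intersection,dhs_score,dhs_cell_type
chr1_1000_1020,YES,850,cell_type_A
chr1_1000_1020,YES,120,cell_type_B
chr2_5000_5020,NO,NA,NA
"""


def accessible_cell_types(rows, min_score):
    """Map each p53RE to the cell types with dhs_score >= min_score."""
    result = {}
    for row in rows:
        # Skip p53RE that do not fall in any DHS, or that lack a score.
        if row["dhs_intersection"] != "YES" or row["dhs_score"] == "NA":
            continue
        if float(row["dhs_score"]) >= min_score:
            result.setdefault(row["unique_id"], set()).add(row["dhs_cell_type"])
    return result


rows = list(csv.DictReader(io.StringIO(sample_csv)))
strong_sites = accessible_cell_types(rows, min_score=500)
print(strong_sites)
```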

6.4 RepeatMasker

There are many different types of repetitive elements, ranging from simple DNA repeats to virus-derived elements. This accessory dataset provides additional information about the full location of the repetitive element, as well as the repeat element name, family, and class.

column_name information
unique_id chromosome_start_stop for p53RE
rmsk_chrom chromosome where the repetitive element is located
rmsk_start start of chromosome position of the repetitive element
rmsk_stop stop/end of chromosome position of the repetitive element
repeat_name the standardized name of the repeat
repeat_class the class of the repeat element
repeat_family the family of the repeat element
rmsk_intersection is the p53RE found in a repetitive element (YES/NO)

6.5 Riege p53 Meta

column_name information
unique_id chromosome_start_stop for p53RE
fischer_chrom chromosome where the full p53 ChIP-seq peak is located
fischer_start start of chromosome position of the full p53 ChIP-seq peak
fischer_stop stop/end of chromosome position of the full p53 ChIP-seq peak
fischer_loc the full location of the p53 ChIP-seq peak in chrom_start_stop format
fischer_obs The number of datasets from the Riege et al. meta-analysis where this location is a called p53 ChIP-seq peak/binding site (range of 5-28)
fischer_percentages The percent of datasets where this location is a p53 binding site/peak.

6.6 ReMap 2022

column_name information
unique_id chromosome_start_stop for p53RE
remap_loc the full location of the p53 ChIP-seq peak in chrom_start_stop format
remap_chrom chromosome where the full p53 ChIP-seq peak is located
remap_start start of chromosome position of the full p53 ChIP-seq peak
remap_stop stop/end of chromosome position of the full p53 ChIP-seq peak
remap_cell The type of cell where the p53 ChIP experiment was performed
remap_obs The number of ReMap datasets where this location was a p53 peak/binding site
remap_percentage The percent of datasets where this location was a p53 peak/binding site

6.7 ABC 3D Genome

The Activity by Contact (ABC) dataset aims to “connect” p53RE to potential regulated genes. The original manuscript describes this work in more detail, but users of this database can obtain relevant information about potential direct p53 target genes, including the gene connected to the p53RE, the type of cell where this interaction takes place, and the location of the p53RE relative to the gene.

column_name information
unique_id chromosome_start_stop for p53RE
abc_chr chromosome where the gene connected to the p53RE is located
abc_TargetGeneTSS_start start position for the TSS of the gene
abc_TargetGeneTSS_end stop position for the TSS of the gene
abc_TargetGene_loc TSS location for the p53RE-connected gene in chrom_start_stop format
TargetGene name of the Target Gene in HGNC/gene symbol format
class the type of regulatory element where the p53RE is located (promoter, intergenic, genic)
TargetGeneIsExpressed Based on RNA-seq or other evidence, is the target gene expressed in the cell type of interest
isSelfPromoter Is the location of the p53RE within a promoter, and does that promoter likely regulate the gene
ABC.Score a proprietary score from the ABC model reflecting the confidence that the p53RE-containing element regulates the listed gene.
CellType the type of cell where the prediction for regulation of the promoter by the p53RE-containing element was made
ABC_intersection Is this p53RE found in a region covered by the ABC predictions
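
A p53RE may be linked to more than one gene, or to the same gene in more than one cell type, so users may want to reduce the export to one candidate gene per p53RE. The sketch below keeps the row with the highest ABC.Score for each unique_id. The gene names, scores, and cell type labels are placeholders, and picking the single top score is only one possible way to summarise the data, not a recommendation from the ABC authors.

```python
import csv
import io

# Placeholder data standing in for an exported ABC .csv file.
sample_csv = """unique_id,TargetGene,ABC.Score,CellType
chr1_1000_1020,GENE_A,0.02,cell_type_A
chr1_1000_1020,GENE_B,0.15,cell_type_A
chr2_5000_5020,NA,NA,NA
"""

best = {}
for row in csv.DictReader(io.StringIO(sample_csv)):
    # p53RE without ABC records carry "NA" and are skipped here.
    if row["ABC.Score"] == "NA":
        continue
    score = float(row["ABC.Score"])
    current = best.get(row["unique_id"])
    # Keep only the highest-scoring gene seen so far for this p53RE.
    if current is None or score > current[1]:
        best[row["unique_id"]] = (row["TargetGene"], score, row["CellType"])

print(best)
```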

6.8 GeneHancer

Like the ABC dataset, the GeneHancer dataset aims to “connect” p53RE to potential regulated genes. The original manuscript describes this work in more detail, but the current release is proprietary and owned by the academic institution and non-profit that runs the GeneCards website. Academic and non-profit users can obtain the full dataset for ALL GeneHancer data from the owners by request.

In our database, users can obtain relevant information about potential direct p53 target genes determined by the GeneHancer group, including the gene connected to the p53RE, the “strength” of the association, and the predicted type of regulatory element where the p53RE is located.

column_name information
unique_id chromosome_start_stop for p53RE
gh_loc the location of the p53RE-containing regulatory element connected to the gene of interest.
GHid a unique identifier for the GeneHancer database
gh_connected_gene name of the Target Gene in HGNC/gene symbol format
gh_score a proprietary score from the GeneHancer model reflecting the confidence that the p53RE-containing element regulates the listed gene.
gh_elite Elite sites represent the highest confidence that the gene is regulated by the p53RE-containing element (0=no, 1=yes)
gh_regulatory_element_type the type of regulatory element where the p53RE is found, as defined by GeneHancer

6.9 MCF10A MicroC

column_name information
unique_id chromosome_start_stop for p53RE
bait_chrom chromosome location for the bait region
bait_start chromosome start position for the bait
bait_end chromosome stop position for the bait
prey_chrom chromosome location for the prey region
prey_start chromosome start position for the prey
prey_end chromosome stop position for the prey
bait_gene_name the HGNC/gene symbol for the gene of interest
baitprey_interaction is the p53RE found within the bait region or the prey region?

6.10 HCT116 pcHIC

column_name information
unique_id chromosome_start_stop for p53RE
bait_chrom chromosome location for the bait region
bait_start chromosome start position for the bait
bait_end chromosome stop position for the bait
prey_chrom chromosome location for the prey region
prey_start chromosome start position for the prey
prey_end chromosome stop position for the prey
baitprey_interaction is the p53RE found within the bait region or the prey region?
bait_ensembl_gene_id the ENSEMBL id for the gene of interest
bait_hgnc the HGNC/gene symbol for the gene of interest