Revision as of 11:18, 17 January 2021 by Mirimia
- 1 How do I cite PastDB?
- 2 How is the inclusion level (PSI) of a given AS event quantified?
- 3 How is the gene expression (GE) of a given gene quantified?
- 4 What AS events are displayed in PastDB?
- 5 What do the colors and block thickness in the UCSC track mean?
- 6 How are the splice site scores calculated?
- 7 How is the impact on the ORF predicted?
- 8 How should I interpret the domain information?
- 9 How are the primers for RT-PCR validation designed?
- 10 What are the quality scores (QC) in the PSI plots?
- 11 Where does the PastDB logo come from?
How do I cite PastDB?
If you use data from PastDB, please cite our paper in Genome Biology:
Martin, G., Márquez, Y., Duque, P., Irimia, M. (2021). Alternative splicing landscapes in Arabidopsis thaliana across tissues and stress conditions highlight major functional differences with animals. Genome Biol, 22:35.
How is the inclusion level (PSI) of a given AS event quantified?
AS event quantification is performed using vast-tools. vast-tools uses different modules to quantify cassette exons, microexons, alternative 5' and 3' splice sites and intron retention (reflected in the 'vast-tools module' field in the ‘VastDB Features’ section of each event). For detailed information about how the quantification works, please refer to the Supplementary Information of Irimia et al., Cell 2014. Current inclusion data in PastDB corresponds to vast-tools v2.5.1.
How is the gene expression (GE) of a given gene quantified?
GE quantification is also performed using vast-tools. vast-tools maps the first 50 nucleotides of each read (when reads are longer than 50 nt), using only the forward read for paired-end data, to a library with one reference transcript per gene. GE levels are provided using the cRPKM metric (corrected [for mappability] Reads Per Kilobasepair and Million mapped reads), as detailed in Labbé et al., Stem Cells 2012. cRPKM values can be converted to TPMs by applying the following formula, where the sum runs over all genes in the sample: TPM = 10^6 * cRPKM/sum_all(cRPKM). Moreover, vast-tools can provide tables with TPMs and raw counts.
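As a minimal sketch of the conversion formula above, assuming the cRPKM values of one sample are held in a dictionary keyed by gene ID (the function name and the example gene IDs are hypothetical and not part of vast-tools):

```python
def crpkm_to_tpm(crpkm_by_gene):
    """Convert per-gene cRPKM values of ONE sample to TPM.

    Implements TPM = 10^6 * cRPKM / sum_all(cRPKM), where the sum runs
    over every gene in the sample.
    """
    total = sum(crpkm_by_gene.values())
    if total <= 0:
        raise ValueError("the sum of cRPKM values must be positive")
    return {gene: 1e6 * value / total for gene, value in crpkm_by_gene.items()}


# Hypothetical example values, for illustration only.
sample = {"geneA": 1.0, "geneB": 3.0}
tpm = crpkm_to_tpm(sample)
print(tpm)
```

By construction, the TPM values of a sample add up to 10^6, so they are comparable across samples as relative abundances.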
What AS events are displayed in PastDB?
PastDB contains information for all AS events detected and quantified in vast-tools. However, only a selection of them is displayed in the UCSC track and in the Gene page. These are the events that have the highest PSI variation across samples. If you are interested in an event that is not displayed, you can look for it directly using the search box on the main page.
What do the colors and block thickness in the UCSC track mean?
The colors signify the different types of AS events, whereas the block thickness indicates the type of sequence.
- For any individual cassette exon event (including microexons), the C1, A and C2 exons are all represented. The alternative exon (A) is the one in the middle.
- Blue: simple cassette exon. “Simple” is defined as cassette exons for which ≥95% of the reads used to quantify their PSI come from the three reference exon-exon junctions, which are C1A, AC2 and C1C2. It corresponds to “S” or “MIC_S” in ‘Average complexity’.
- Purple: cassette exon event of intermediate complexity. This is defined as those alternative exons for which ≥50% and <95% of the reads used to quantify their PSI come from the three reference exon-exon junctions. Corresponds to “C1” or “C2” in ‘Average complexity’.
- Red: complex cassette exon event, for which <50% of the reads used to quantify their PSI come from the three reference exon-exon junctions. Corresponds to “C3”, “ME” or “MIC_M” in ‘Average complexity’.
- Black: groups multiple neighboring cassette exon events. Black tracks are only informative and do not link to any page in VASTDB.
- For Intron Retention events: Orange track. Thick blocks correspond to the intronic sequence, and the thin blocks to the adjoining exons (C1 and C2).
- For Alternative 3' and 5' splice site choice events: Dark Green and Light Green, respectively. In both cases, the thick block corresponds to the alternative sequence, whereas the thin blocks are the constant exonic sequences (C1 and C2). For these events, at least two tracks are shown: for sequence exclusion (the most internal splice site; EventID-1/N) and for sequence inclusion.
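The three cassette exon colors above follow from the fraction of reads that come from the reference junctions (C1A, AC2 and C1C2). A minimal sketch of that rule, assuming the two read counts are already available (the function name and arguments are hypothetical; in PastDB the color is tied to the ‘Average complexity’ label rather than recomputed like this):

```python
def cassette_track_color(reference_reads, total_reads):
    """Return the UCSC track color for a cassette exon event.

    reference_reads: reads from the C1A, AC2 and C1C2 junctions.
    total_reads: all reads used to quantify the PSI of the event.
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    fraction = reference_reads / total_reads
    if fraction >= 0.95:
        return "blue"    # simple event
    if fraction >= 0.50:
        return "purple"  # intermediate complexity
    return "red"         # complex event


for ref, total in [(96, 100), (70, 100), (30, 100)]:
    print(ref, total, cassette_track_color(ref, total))
```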
How are the splice site scores calculated?
These scores were calculated using score5.pl and score3.pl from Yeo and Burge, 2004. This method uses a position weight matrix and calculates deviation from the consensus. For 5’ splice sites, three exonic and six intronic positions surrounding the exon-intron junction were analyzed, and for the 3’ splice sites, 20 intronic and 3 exonic positions were analyzed.
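To make the analyzed windows concrete, the sketch below only extracts the 9-nt (3 exonic + 6 intronic) and 23-nt (20 intronic + 3 exonic) sequences around an exon; it does not compute any score. It assumes a sequence given in the 5'→3' orientation of the transcript and 0-based, end-exclusive exon coordinates; the function name and the toy sequence are hypothetical.

```python
def splice_site_windows(seq, exon_start, exon_end):
    """Return (three_prime_window, five_prime_window) for one exon.

    seq: pre-mRNA sequence in transcript orientation.
    exon_start, exon_end: 0-based, end-exclusive exon coordinates in seq.
    3' splice site window: 20 intronic + 3 exonic nt (upstream exon boundary).
    5' splice site window: 3 exonic + 6 intronic nt (downstream exon boundary).
    """
    if exon_start - 20 < 0 or exon_end + 6 > len(seq):
        raise ValueError("not enough flanking intronic sequence")
    if exon_end - exon_start < 3:
        raise ValueError("exon shorter than 3 nt")
    three_prime = seq[exon_start - 20:exon_start + 3]
    five_prime = seq[exon_end - 3:exon_end + 6]
    return three_prime, five_prime


# Toy sequence: 20 nt of upstream intron, a 10-nt exon, 10 nt of downstream intron.
toy = "T" * 18 + "AG" + "GCCAAAACAG" + "GTAAGTCCCC"
acceptor, donor = splice_site_windows(toy, 20, 30)
print(acceptor, len(acceptor))
print(donor, len(donor))
```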
How is the impact on the ORF predicted?
The pipeline to predict ORF impact is described in Irimia et al., 2014. Several things must be kept in mind when using this information as is:
- The prediction is based on the impact that the specific alternative sequence is likely to have when included or excluded from the transcript in isolation. That is, if there are other associated AS events (e.g. mutually exclusive or coordinated exons) the prediction may not be accurate.
- We keep improving and polishing these annotations, and new versions are often released. Make sure you use the most up-to-date version.
- Like any other prediction, our annotations may be inaccurate. Please check your results carefully and, as with any other dataset in PastDB, use at your own risk.
How should I interpret the domain information?
Domain information is currently available for cassette exons as well as for adjacent constitutive regions for INT, ALTA and ALTD events. When an exon (either C1, A or C2) overlaps a PROSITE or PFAM domain, the following information is shown:
The meaning of each field is explained below:
- Dom_ID: Domain ID in either PROSITE or PFAM databases. For PROSITE, domains with ID P0* (high frequency motifs) are excluded.
- Dom_Name: Domain name as provided by PROSITE or PFAM databases.
- Type_Overlap: There are four possible ways in which an exon can overlap a protein domain:
- The whole exonic sequence fully overlaps with a domain (FE, Full Exon).
- The whole domain is fully encoded within an exon (WD, Whole Domain).
- The upstream (5') of the exon overlaps the domain (PU, Partial Upstream).
- The downstream (3') of the exon overlaps the domain (PD, Partial Downstream).
- %Dom_overlap: percent of the domain encoded by the exon.
- %Exon_overlap: percent of the exon that overlaps the domain.
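The four overlap types and the two percentages can be illustrated with a short sketch. It assumes that exon and domain are given as inclusive start/end coordinates on the same 5'→3' axis, and that an exon and domain with identical coordinates are reported as FE; both are assumptions made for illustration, and the function name is hypothetical.

```python
from typing import Optional, Tuple


def classify_domain_overlap(
    exon_start: int, exon_end: int, dom_start: int, dom_end: int
) -> Optional[Tuple[str, float, float]]:
    """Return (Type_Overlap, %Dom_overlap, %Exon_overlap), or None if no overlap.

    All coordinates are inclusive and share the same 5'->3' axis.
    """
    ov_start = max(exon_start, dom_start)
    ov_end = min(exon_end, dom_end)
    if ov_start > ov_end:
        return None  # exon and domain do not overlap
    overlap = ov_end - ov_start + 1
    exon_len = exon_end - exon_start + 1
    dom_len = dom_end - dom_start + 1
    if dom_start <= exon_start and exon_end <= dom_end:
        kind = "FE"  # the whole exon lies within the domain
    elif exon_start <= dom_start and dom_end <= exon_end:
        kind = "WD"  # the whole domain lies within the exon
    elif dom_start < exon_start:
        kind = "PU"  # the domain covers the upstream (5') part of the exon
    else:
        kind = "PD"  # the domain covers the downstream (3') part of the exon
    return kind, 100.0 * overlap / dom_len, 100.0 * overlap / exon_len


print(classify_domain_overlap(10, 19, 5, 14))
print(classify_domain_overlap(10, 19, 12, 15))
```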
How are the primers for RT-PCR validation designed?
Primers are designed automatically using Primer3 (optimal primer length = 21 nt; optimal Tm = 61 °C). As a general rule, primers are located in the C1 and C2 exonic sequences, so two RT-PCR products will be produced: a shorter one (from C1 to C2, skipping the A sequence) and a longer one (including the A sequence). These sizes are provided in ‘Band lengths’. To minimize PCR amplification bias towards shorter amplicons (i.e. over-representation of the skipping form) and, at the same time, optimize the visualization in agarose gels, primers are designed based on the size relationship between the two predicted amplicons, where LE is the length of the alternative sequence. This is based on the following rules:
- Alternative sequence LE < 15 nt => optimal skipping band size = 100 nt.
- Alternative sequence 15 ≤ LE < 25 nt => optimal skipping band size = 110 nt.
- Alternative sequence 25 ≤ LE < 40 nt => optimal skipping band size = 120 nt.
- Alternative sequence 40 ≤ LE < 65 nt => optimal skipping band size = 140 nt.
- Alternative sequence 65 ≤ LE < 100 nt => optimal skipping band size = 175 nt.
- Alternative sequence 100 ≤ LE < 200 nt => optimal skipping band size = 250 nt.
- Alternative sequence 200 ≤ LE < 300 nt => optimal skipping band size = 300 nt.
- Alternative sequence 300 ≤ LE < 1000 nt => optimal skipping band size = 350 nt.
- Alternative sequence LE ≥ 1000 nt => primers not designed. A three-primer strategy is recommended.
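The rules above amount to a lookup table from the alternative sequence length (LE) to a target skipping band size. A minimal sketch, with hypothetical names, and assuming that an LE of exactly 1000 nt is treated like longer sequences (no primers designed):

```python
# (exclusive upper bound on LE in nt, optimal skipping band size in nt)
SKIPPING_BAND_RULES = [
    (15, 100),
    (25, 110),
    (40, 120),
    (65, 140),
    (100, 175),
    (200, 250),
    (300, 300),
    (1000, 350),
]


def optimal_skipping_band(le):
    """Return the optimal skipping band size for an alternative sequence
    of length `le`, or None when no primers are designed (long sequences)."""
    if le < 0:
        raise ValueError("le must be non-negative")
    for upper_bound, band_size in SKIPPING_BAND_RULES:
        if le < upper_bound:
            return band_size
    return None  # a three-primer strategy is recommended instead


for le in (10, 30, 150, 1500):
    skipping = optimal_skipping_band(le)
    # The inclusion band is the skipping band plus the alternative sequence.
    inclusion = None if skipping is None else skipping + le
    print(le, skipping, inclusion)
```

These are target sizes; the band lengths actually reported depend on where Primer3 places the primers.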
What are the quality scores (QC) in the PSI plots?
As provided by vast-tools; from the README: Quality scores, and number of corrected inclusion and exclusion reads (qual@inc,exc):
- Score 1: Read coverage, based on actual reads (as used in Irimia et al., Cell 2014):
- For EX: OK/LOW/VLOW: (i) ≥20/15/10 actual reads (i.e. before mappability correction) mapping to all exclusion splice junctions, OR (ii) ≥20/15/10 actual reads mapping to one of the two groups of inclusion splice junctions (upstream or downstream of the alternative exon), and ≥15/10/5 to the other group of inclusion splice junctions.
- For EX (microexon module): OK/LOW/VLOW: (i) ≥20/15/10 actual reads mapping to the sum of exclusion splice junctions, OR (ii) ≥20/15/10 actual reads mapping to the sum of inclusion splice junctions.
- For INT: OK/LOW/VLOW: (i) ≥20/15/10 actual reads mapping to the sum of skipping splice junctions, OR (ii) ≥20/15/10 actual reads mapping to one of the two inclusion exon-intron junctions (the 5' or 3' of the intron), and ≥15/10/5 to the other inclusion junction.
- For ALTD and ALTA: OK/LOW/VLOW: (i) ≥40/20/10 actual reads mapping to the sum of all splice junctions involved in the specific event.
- For any type of event: SOK: same thresholds as OK, but a total number of reads ≥100.
- For any type of event: N: does not meet the minimum threshold (VLOW).
- Score 2: Read coverage, based on corrected reads (similar values as per Score 1).
- Score 3: Read coverage, based on uncorrected reads mapping only to the reference C1A, AC2 or C1C2 splice junctions (similar values as per Score 1). Always NA for intron retention events.
- Score 4: Imbalance of reads mapping to inclusion splice junctions (only for exon skipping events quantified by the splice site-based or transcript-based modules). For intron retention events, this field instead gives the numbers of reads mapping to the upstream exon-intron junction, downstream intron-exon junction, and exon-exon junction in the format A=B=C.
- OK: the ratio between the total number of reads supporting inclusion for splice junctions upstream and downstream of the alternative exon is < 2.
- B1: the ratio between the total number of reads supporting inclusion for splice junctions upstream and downstream of the alternative exon is > 2 but < 5.
- B2: the ratio between the total number of reads supporting inclusion for splice junctions upstream and downstream of the alternative exon is > 5.
- Bl/Bn: low/no read coverage for splice junctions supporting inclusion.
- Score 5: Complexity of the event (only for exon skipping events quantified by the splice site-based or transcript-based modules). For intron retention events, this field instead gives the p-value of a binomial test of balance between reads mapping to the upstream and downstream exon-intron junctions, modified by reads mapping to a 200-bp window in the centre of the intron (see Braunschweig et al., 2014).
- S: percent of complex reads (i.e. those inclusion- and exclusion-supporting reads that do not map to the reference C1A, AC2 or C1C2 splice junctions) is < 5%.
- C1: percent of complex reads is > 5% but < 20%.
- C2: percent of complex reads is > 20% but < 50%.
- C3: percent of complex reads is > 50%.
- NA: low coverage event.
- inc,exc: total number of reads, corrected for mappability, supporting inclusion and exclusion.
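As an illustration of how the Score 1 thresholds for EX events combine, here is a minimal sketch. It is not the vast-tools implementation: the function name is hypothetical, and it assumes that the "total number of reads" used for SOK is the sum of all actual exclusion and inclusion reads.

```python
# (label, threshold for the main group, threshold for the other inclusion group)
EX_COVERAGE_LEVELS = [("OK", 20, 15), ("LOW", 15, 10), ("VLOW", 10, 5)]


def ex_coverage_score(exc_reads, inc_up_reads, inc_down_reads):
    """Return SOK/OK/LOW/VLOW/N for an EX event from actual read counts.

    exc_reads: reads mapping to all exclusion splice junctions.
    inc_up_reads / inc_down_reads: reads mapping to the upstream and
    downstream groups of inclusion splice junctions.
    """
    total = exc_reads + inc_up_reads + inc_down_reads  # assumption, see lead-in
    higher = max(inc_up_reads, inc_down_reads)
    lower = min(inc_up_reads, inc_down_reads)
    for label, main_min, other_min in EX_COVERAGE_LEVELS:
        by_exclusion = exc_reads >= main_min
        by_inclusion = higher >= main_min and lower >= other_min
        if by_exclusion or by_inclusion:
            if label == "OK" and total >= 100:
                return "SOK"
            return label
    return "N"  # does not reach the VLOW thresholds


for counts in [(120, 0, 0), (0, 20, 15), (0, 20, 14), (0, 10, 5), (0, 9, 9)]:
    print(counts, ex_coverage_score(*counts))
```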
Where does the PastDB logo come from?
The image depicts a pair of alternative splice acceptor sites (yellow) as the bridge between a seedling and a mature plant, representing plant development. The image is an original design by Yamile Márquez.