Downloads

v1.1 - 1000 amino-acid maximum length

All isoform summary | .xlsx | .csv

MANE comparison summary | .xlsx | .csv

MANE comparison summary, filtered subset | .xlsx | .csv


Protein annotation on hg38 | .gtf | .gff

Transcript CDS | nucleotide .fa | amino-acid .faa

Predicted isoform protein structures | *.pdb.tar.gz

Additional isoform structure info | *.json.tar.gz

v1.0 - 500 amino-acid maximum length

All isoform summary | .xlsx | .csv

MANE comparison summary | .xlsx | .csv

MANE comparison summary, filtered subset | .xlsx | .csv


Protein annotation on hg38 | .gtf | .gff

Transcript CDS | nucleotide .fa | amino-acid .faa

Predicted isoform protein structures | *.pdb.tar.gz

Additional isoform structure info | *.json.tar.gz

README

All isoform summary:

Supplementary Table S1 in Sommer et al. 2022. Folding scores from ColabFold for all transcripts from a preliminary new build of the CHESS database that contained a protein-coding sequence (CDS) that was at most 500 aa in length. For transcripts already contained in the published CHESS 2.2 database, the identifier from that database is provided. If the transcript maps to a known gene locus X but is a novel isoform, it is shown with the identifier CHS.X.altY. If a transcript occurs at a novel locus X, the identifier is hypothetical.X.Y. In both cases, Y identifies the isoform number. Additional columns show the gene name, the RefSeq ID (release 110), the GENCODE ID (release 40), the pLDDT (folding) score, and a flag indicating whether all intron boundaries (for multi-exon genes) are conserved in the mouse genome.
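The identifier scheme described above can be sketched as a small classifier. This is only an illustration: it assumes that the locus X and isoform number Y are plain integers and that identifiers carried over from CHESS 2.2 have the form CHS.X.Y, neither of which is stated explicitly here. The function name and labels are hypothetical.

```python
import re

# Patterns follow the naming scheme in the README text.
# ASSUMPTION: X and Y are integers, and identifiers already present in
# CHESS 2.2 look like CHS.X.Y.
PATTERNS = (
    ("novel_isoform", re.compile(r"^CHS\.(\d+)\.alt(\d+)$")),
    ("novel_locus", re.compile(r"^hypothetical\.(\d+)\.(\d+)$")),
    ("known", re.compile(r"^CHS\.(\d+)\.(\d+)$")),
)


def classify_transcript_id(transcript_id):
    """Return (category, locus, isoform_number) for a transcript identifier."""
    for label, pattern in PATTERNS:
        match = pattern.match(transcript_id)
        if match:
            return label, int(match.group(1)), int(match.group(2))
    # Anything else is reported rather than guessed at.
    return "unrecognized", None, None
```

For example, classify_transcript_id("CHS.12.alt3") yields the category "novel_isoform" with locus 12 and isoform number 3 under these assumptions.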

MANE comparison summary:

Supplementary Table S2 in Sommer et al. 2022. Folding scores and additional data for all CHESS transcripts that match genes in the MANE v1.0 dataset, limited to protein sequences under 500 aa in length. Transcripts must overlap the annotated CDS of the MANE transcript to be included. Columns include:

CHESS_ID_isoform: the CHESS identifier of the alternate isoform transcript.
CHESS_ID_MANE: the CHESS identifier of the MANE transcript.
gene: the gene name.
aa_length_isoform: the amino-acid length of the alternate isoform transcript CDS.
aa_length_MANE: the amino-acid length of the MANE transcript CDS.
length_ratio: the ratio of the alternate isoform length to the MANE isoform length.
pLDDT_isoform: the predicted folding score of the alternate isoform.
pLDDT_MANE: the predicted folding score of the MANE isoform.
pLDDT_ratio: the ratio of the alternate isoform folding score to the MANE isoform folding score.
GTEx_samples_observed_isoform: the total number of GTEx samples where the alternate isoform was observed at least once.
GTEx_samples_observed_MANE: the total number of GTEx samples where the MANE isoform was observed at least once.
GTEx_top_tissue_name_isoform: the name of the tissue in which the alternate isoform was observed in the highest number of samples.
GTEx_top_tissue_name_MANE: the name of the tissue in which the MANE isoform was observed in the highest number of samples.
GTEx_top_tissue_TPM_isoform: the observed TPM of the alternate isoform in the named tissue.
GTEx_top_tissue_TPM_MANE: the observed TPM of the MANE isoform in the named tissue.
introns_conserved_in_mouse_isoform: an indicator of whether introns are conserved between the alternate human isoform and any annotated isoform in the GRCm38 mouse reference genome.
introns_conserved_in_mouse_MANE: an indicator of whether introns are conserved between the MANE human isoform and any annotated isoform in the GRCm38 mouse reference genome.
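As a sanity check on the two ratio columns, the definitions above can be recomputed from the length and pLDDT columns. The sketch below assumes the .csv download has a header row using exactly the column names listed above. The sample row is made up for illustration and is not taken from the dataset.

```python
import csv
import io


def recompute_ratios(row):
    """Recompute (length_ratio, pLDDT_ratio) as isoform value / MANE value."""
    length_ratio = float(row["aa_length_isoform"]) / float(row["aa_length_MANE"])
    plddt_ratio = float(row["pLDDT_isoform"]) / float(row["pLDDT_MANE"])
    return length_ratio, plddt_ratio


# Made-up example row; only the columns needed for the ratios are included.
SAMPLE_CSV = """CHESS_ID_isoform,CHESS_ID_MANE,gene,aa_length_isoform,aa_length_MANE,pLDDT_isoform,pLDDT_MANE
CHS.1.alt1,CHS.1.1,GENE1,150,300,80.0,64.0
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
ratios = [recompute_ratios(row) for row in rows]
```

With the real file, io.StringIO(SAMPLE_CSV) would be replaced by an open file handle. A pLDDT_ratio above 1 means the alternate isoform has a higher predicted folding score than the MANE isoform.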

MANE comparison summary, filtered subset:

Supplementary Table S3 in Sommer et al. 2022. A filtered set of CHESS transcripts compared to MANE according to the criteria detailed in the “Filtering MANE comparisons” section of the Methods. Uses the same column names as Supplementary Table S2.

Protein annotation on hg38:

Annotations of all full-length transcripts and exons on hg38 in .GTF and .GFF format.
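One way to cross-check the annotation against the amino-acid lengths in the summary tables is to total the CDS bases per transcript. The sketch below assumes the .gtf file follows the usual nine-column, tab-separated GTF layout, contains CDS feature lines, and carries a transcript_id attribute. The sample lines are invented for illustration.

```python
import re
from collections import defaultdict

# Matches GTF attributes of the form: key "value";
ATTRIBUTE = re.compile(r'(\S+) "([^"]*)"')


def cds_lengths(lines):
    """Sum CDS nucleotide lengths per transcript_id from GTF lines."""
    totals = defaultdict(int)
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] != "CDS":
            continue  # only CDS features contribute
        attributes = dict(ATTRIBUTE.findall(fields[8]))
        transcript_id = attributes.get("transcript_id")
        if transcript_id is None:
            continue
        start, end = int(fields[3]), int(fields[4])
        totals[transcript_id] += end - start + 1  # GTF coordinates are 1-based, inclusive
    return dict(totals)


# Invented example: one exon line (ignored) and two CDS segments.
ATTRS = 'gene_id "CHS.1"; transcript_id "CHS.1.alt1";'
SAMPLE_GTF = [
    "\t".join(["chr1", "CHESS", "exon", "100", "400", ".", "+", ".", ATTRS]),
    "\t".join(["chr1", "CHESS", "CDS", "100", "189", ".", "+", "0", ATTRS]),
    "\t".join(["chr1", "CHESS", "CDS", "300", "359", ".", "+", "0", ATTRS]),
]
lengths = cds_lengths(SAMPLE_GTF)
```

Dividing a total by 3 gives an approximate codon count. Whether the stop codon is included in the CDS features depends on how the file was produced, so small off-by-one differences from the reported amino-acid lengths are possible.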

Transcript CDS sequences:

Nucleotide and amino-acid sequences of the CDS for each transcript in FASTA format.
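A minimal FASTA reader is enough to pair each transcript with its sequence, for example to check amino-acid lengths against the summary tables. The sketch below assumes that the first whitespace-delimited token of each header line is the transcript identifier and that a trailing "*" may mark the stop codon. The sample records are made up.

```python
import io


def read_fasta(handle):
    """Yield (identifier, sequence) pairs from a FASTA-formatted text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            parts = line[1:].split()
            # ASSUMPTION: the identifier is the first token of the header.
            header, chunks = (parts[0] if parts else ""), []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Made-up records, with a sequence wrapped over two lines.
SAMPLE_FAA = """>CHS.1.alt1 example
MKT
AAL*
>hypothetical.2.1
MS
"""

aa_lengths = {
    name: len(seq.rstrip("*"))  # drop a trailing stop symbol if present
    for name, seq in read_fasta(io.StringIO(SAMPLE_FAA))
}
```

The same function works for the nucleotide .fa file, since it makes no assumption about the alphabet.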

Predicted isoform protein structures:

All predicted protein structures in .pdb format, compressed as a .tar.gz archive.
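The structures can be read directly from the archive without unpacking it to disk. ColabFold and AlphaFold normally write the per-residue pLDDT into the B-factor column of each ATOM record, so a mean over the C-alpha atoms gives a per-structure score. The sketch below assumes that convention holds for these files and that archive members end in .pdb. The function names are hypothetical.

```python
import io
import tarfile


def mean_plddt(pdb_text):
    """Mean of the B-factor column over C-alpha ATOM records, or None if absent."""
    scores = []
    for line in pdb_text.splitlines():
        # PDB is fixed-width: atom name in columns 13-16, B-factor in 61-66.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores.append(float(line[60:66]))
    return sum(scores) / len(scores) if scores else None


def iter_pdb_scores(fileobj):
    """Yield (member_name, mean pLDDT) for each .pdb file in a tar.gz stream."""
    with tarfile.open(fileobj=fileobj, mode="r:gz") as tar:
        for member in tar:
            if member.isfile() and member.name.endswith(".pdb"):
                text = tar.extractfile(member).read().decode("ascii", errors="replace")
                yield member.name, mean_plddt(text)
```

To use it, open the downloaded archive in binary mode and iterate, for example: with open(path, "rb") as fh: scores = dict(iter_pdb_scores(fh)). The resulting values may differ slightly from the pLDDT column in the summary tables if those were averaged differently.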

Additional isoform structure info:

All data used to generate contact and distogram plots for all predicted protein structures in .json format, compressed as a .tar.gz archive.