Data Resources


Welcome to our Data Resources page. Below you will find a list of relevant biological databases and analysis tools, as well as some links to free online training.

You can find a list of available data sets from VALIDATE projects on our Data Sharing page.




Addgene


A plasmid repository that archives and distributes plasmids for scientists, while also providing free molecular biology resources.



Basic Local Alignment Search Tool

BLAST finds regions of similarity between biological sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of the matches.


BioCyc


A microbial genome web portal that combines thousands of genomes. It provides an extensive range of query tools, visualization services and analysis software.



Cattle Gene Atlas

A website that shows the expression of genes of interest, queried by Ensembl gene ID or gene symbol, and plots them by tissue type.



Center for Genomic Epidemiology

A website for the analysis of bacterial genomes.



ChIP-Atlas


ChIP-Atlas is an integrative and comprehensive database for visualizing and making use of public ChIP-seq data. ChIP-Atlas covers almost all public ChIP-seq data submitted to the SRA (Sequence Read Archive) in NCBI, DDBJ, or ENA, and is based on over 144,000 experiments.


DNAVaxDB


A web-based DNA vaccine database and analysis system that curates, stores, and analyzes DNA vaccines and DNA vaccine plasmid vectors. DNAVaxDB includes only those DNA vaccines that have been verified to induce protection in at least one laboratory animal model.






An interactive and collaborative gene list enrichment analysis tool.



Ensembl


Ensembl is a genome browser for vertebrate genomes that supports research in comparative genomics, evolution, sequence variation and transcriptional regulation. Ensembl annotates genes, computes multiple alignments, predicts regulatory function and collects disease data. Ensembl tools include BLAST, BLAT, BioMart and the Variant Effect Predictor (VEP) for all supported species.



VEuPathDB


A web portal for accessing genomic-scale datasets associated with diverse eukaryotic microbes.



g:Profiler


A gene-centric data integrator with web UI and API services.



Gene ATLAS


Gene ATLAS is a large database of associations between hundreds of traits and millions of variants using the UK Biobank cohort.




A Gene Ontology enrichment analysis and visualization tool.


Huvax: Licensed Human Vaccines


A web-based database of licensed human vaccines. Huvax collects, annotates and analyzes licensed human vaccines around the world. It currently contains all licensed human vaccines in the US and Canada, and many licensed human vaccines from other countries. Huvax provides a user-friendly web interface for you to search, compare, and analyze different vaccines.





A website in support of the NIH mission to share data with the public.




Immune Epitope Database and Analysis Resource

IEDB catalogs experimental data on antibody and T cell epitopes studied in different species in the context of infectious disease, allergy, autoimmunity and transplantation. IEDB also provides tools to help predict and analyze epitopes.



InnateDB


A publicly available database of the genes, proteins, experimentally verified interactions and signaling pathways involved in the innate immune response of humans, mice and bovines to microbial infection. The database improves coverage of the innate immunity interactome by integrating known interactions and pathways from major public databases, together with manually curated data, into a centralised resource.




Integrated DNA Technologies

A tool to design qPCR primers.



Metascape


Metascape is a free gene annotation and analysis resource that helps biologists make sense of one or multiple gene lists.




NetPrimer

Primer analysis software. It analyzes primer secondary structure and melting temperature and identifies the best primer pairs for given experimental conditions.



DTU

NetCTL-1.2


A server that predicts CTL epitopes in protein sequences.

NetMHCIIpan-4.0

The NetMHCIIpan-4.0 server predicts peptide binding to any MHC II molecule of known sequence using artificial neural networks.


PlasmoDB


Genome database for the genus Plasmodium.



PolyPhen-2

PolyPhen-2 (Polymorphism Phenotyping v2) is a tool that predicts the possible impact of an amino acid substitution on the structure and function of a human protein using straightforward physical and comparative considerations.


TB Database


TBDB is an integrated database of genome sequence, expression data and literature for tuberculosis. It contains genome sequence data for Mycobacterium tuberculosis strains and other sequenced Mycobacteria, and offers a collection of tools for visualization, analysis and data download.


The Human Protein Atlas

This Atlas contains information on the expression profiles of human genes at both the mRNA and protein level. The protein expression data from 44 normal human tissue types is derived from antibody-based protein profiling using immunohistochemistry. The protein data covers 15313 genes (78%) for which antibodies are available. The mRNA expression data is derived from deep sequencing of RNA (RNA-seq) from 37 different normal tissue types. It also contains information about the expression and spatio-temporal distribution of proteins within human cells.



Tm Calculator

This tool calculates the Tm of primers and estimates an appropriate annealing temperature when using different DNA polymerases.



Vaxign: Vaccine Design

Vaxign (Vaccine Design) is a vaccine target prediction and analysis system based on the principle of reverse vaccinology.




A server for alignment-independent prediction of protective antigens and subunit vaccines.



Vaxjo

A program to analyze vaccine adjuvants used in the vaccines collected in the VIOLIN vaccine database.



Vevax: Licensed Veterinary Vaccines


A web-based database of licensed veterinary vaccines. Vevax collects, annotates and analyzes licensed veterinary vaccines around the world. Vevax currently focuses on USA-licensed veterinary vaccines and contains all licensed veterinary vaccines in the US. Vevax provides a user-friendly web interface for you to search, compare, and analyze different vaccines.




A database that collects and analyzes vaccine vectors used in vaccine development and research for diseases important to public health.




A database of virulence genes used for the development of live attenuated vaccines.

Public Access Training


A basic task in the analysis of count data from RNA-seq is the detection of differentially expressed genes. The count data are presented as a table which reports, for each sample, the number of sequence fragments that have been assigned to each gene. Analogous data also arise for other assay types, including comparative ChIP-Seq, HiC, shRNA screening, and mass spectrometry. An important analysis question is the quantification and statistical inference of systematic changes between conditions, as compared to within-condition variability. The package DESeq2 provides methods to test for differential expression by use of negative binomial generalized linear models; the estimates of dispersion and logarithmic fold changes incorporate data-driven prior distributions. This vignette explains the use of the package and demonstrates typical workflows. An RNA-seq workflow on the Bioconductor website covers similar material to this vignette but at a slower pace, including the generation of count matrices from FASTQ files. DESeq2 package version: 1.30.0

Michael I. Love, Simon Anders, and Wolfgang Huber

Find the full course online.


In this course you will discuss some of the questions that can be addressed using scRNA-seq, as well as the computational and statistical methods available. The course is taught through the University of Cambridge Bioinformatics Training Unit, but the material found on these pages is meant to be used by anyone interested in learning about computational analysis of scRNA-seq data. The course is taught twice per year and the material here is updated prior to each event.

Find out more online.

R

R is a free programming language for statistical computing and data science, and has been used to visualise everything from market trends to vaccine efficacy. The R language is widely used among statisticians and data miners for developing statistical software and performing data analysis.

Mirvat has selected training that will help you enhance your bioinformatics skills:


Online Courses: