PharmaGEO Help Guide
Module 1: RNA-seq Datasets
Purpose
The RNA-seq Datasets module provides access to drug-related bulk RNA-seq and microarray transcriptomic datasets systematically curated from the Gene Expression Omnibus (GEO) database. This section serves as the primary entry point for dataset-specific analyses of pharmaco-transcriptomic studies.

Key Functionalities
1. Dataset Selection and Exploration
- Interactive Table: Browse through 7,780 curated pharmaco-transcriptomic datasets
- Metadata Filtering: Filter datasets by drug name, organism, cell type, dose, duration, and experimental conditions
- GEO Integration: Direct links to the original GEO series accessions (GSE) for data provenance
- Comprehensive Metadata: View dataset-specific information including drug name, dose, duration, cell type, organism, and platform details

- Standardized Annotations: Each dataset includes the following metadata fields:
- standard_name: Standardized drug name using PubChem database nomenclature
- drug_name: Original drug name as recorded in the GEO database
- organism: Species of the sequenced samples
- cell_type: Cell line or tissue type used in the sequencing experiment
- dose: Drug dose or concentration administered
- duration: Duration of drug treatment
- gse_id: Unique identifier for each dataset in the GEO database (clickable link to GEO page)
- GPL: GEO platform accession identifying the array or sequencing platform, providing technical parameter information
- ctrl_ids: Sample identifiers used as control group in differential expression analysis
- pert_ids: Sample identifiers used as treatment group in differential expression analysis
- type: Perturbation type. The PharmaGEO database exclusively contains drug-type perturbations
- Exp_type: Experimental platform type. The database includes 4,438 microarray datasets (Expression profiling by array) and 3,342 high-throughput sequencing datasets (Expression profiling by high throughput sequencing)
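The metadata fields above lend themselves to simple programmatic filtering. The following is a minimal sketch, assuming the table has been exported as a list of records keyed by the field names listed above. The sample records and the `filter_datasets` helper are hypothetical and are not part of PharmaGEO.

```python
# Hypothetical records using the metadata field names listed above.
# All values are illustrative placeholders, not real PharmaGEO entries.
datasets = [
    {"standard_name": "drug_a", "organism": "Homo sapiens",
     "gse_id": "GSE_EXAMPLE_1", "Exp_type": "Expression profiling by array"},
    {"standard_name": "drug_b", "organism": "Homo sapiens",
     "gse_id": "GSE_EXAMPLE_2",
     "Exp_type": "Expression profiling by high throughput sequencing"},
    {"standard_name": "drug_a", "organism": "Mus musculus",
     "gse_id": "GSE_EXAMPLE_3", "Exp_type": "Expression profiling by array"},
]


def filter_datasets(records, **criteria):
    """Return records whose fields equal every criterion (case-insensitive)."""
    def matches(record):
        return all(
            str(record.get(field, "")).lower() == str(value).lower()
            for field, value in criteria.items()
        )
    return [record for record in records if matches(record)]


hits = filter_datasets(datasets, standard_name="drug_a", organism="homo sapiens")
print([record["gse_id"] for record in hits])
```

Exact, case-insensitive matching is used here for simplicity; the web interface may match differently.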

3. Differential Expression Analysis
- Volcano Plot Visualization: Interactive plots showing log fold change vs. statistical significance
- Gene Expression Tables: Comprehensive lists of differentially expressed genes with:
- genes: Gene symbols with direct links to GeneCards
- logFC: Log fold change values
- p.value: Statistical significance of the expression change; only genes with p.value < 0.05 are listed
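As an illustration of how such a table maps onto a volcano plot, the sketch below keeps rows under the 0.05 cutoff and derives plot coordinates. It assumes rows with the three columns listed above; the sample values, the `volcano_points` helper, and the use of -log10(p.value) on the y-axis (a common convention, not confirmed by this guide) are assumptions.

```python
import math

# Hypothetical differential expression rows with the columns listed above.
de_rows = [
    {"genes": "GENE_A", "logFC": 2.1, "p.value": 0.001},
    {"genes": "GENE_B", "logFC": -1.4, "p.value": 0.03},
    {"genes": "GENE_C", "logFC": 0.2, "p.value": 0.40},
]

P_CUTOFF = 0.05  # significance threshold stated in the guide


def volcano_points(rows, p_cutoff=P_CUTOFF):
    """Keep significant rows and compute volcano plot coordinates."""
    points = []
    for row in rows:
        if row["p.value"] >= p_cutoff:
            continue  # not significant, excluded from the table
        points.append({
            "genes": row["genes"],
            "x": row["logFC"],
            # ASSUMPTION: y-axis is -log10(p.value), a common convention.
            "y": -math.log10(row["p.value"]),
            "direction": "up" if row["logFC"] > 0 else "down",
        })
    return points


for point in volcano_points(de_rows):
    print(point["genes"], point["direction"])
```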

4. Enrichment Analysis
Three complementary enrichment analysis approaches with interactive dot plot visualizations:
Kyoto Encyclopedia of Genes and Genomes (KEGG) Pathway Analysis
- Interactive dot plots showing enriched pathways
- X-axis: Gene count, Y-axis: Pathway descriptions
- Color coding by statistical significance (p-value)
- Point size representing gene count
Gene Ontology (GO) Analysis
- Interactive dot plots for biological processes, molecular functions, and cellular components
- X-axis: Gene count, Y-axis: GO term descriptions
- Color coding by statistical significance (p-value)
- Point size representing gene count
Gene Set Enrichment Analysis (GSEA)
- Interactive dot plots showing gene set enrichment
- X-axis: Normalized enrichment score (RichFactor), Y-axis: Gene set descriptions
- Color coding by statistical significance (p-value)
- Point size representing gene count

Module 2: Drug Information
Purpose
The Drug Information module enables detailed exploration of individual drugs and their associated gene signatures. This section focuses on drug-centric analysis with emphasis on consistency and variability metrics.

Key Functionalities
1. Drug Selection and Properties
- Comprehensive Drug Database: Access to 1,311 unique drugs

- Molecular Information: Detailed drug properties including:
- Chemical Structure Visualization: 2D molecular structure display
- Drug Name: Standardized drug name
- CID: PubChem Compound Identifier
- Formula: Molecular formula
- Weight: Molecular weight in g/mol
- Description: Comprehensive pharmacological description
- IUPAC Name: International Union of Pure and Applied Chemistry nomenclature
- InChI: International Chemical Identifier string
- InChIKey: Standardized InChI key for database searching
- SMILES: Simplified Molecular Input Line Entry System representation

2. Highly Consistent Gene Analysis
Gene Mean Consistency Score (GMCS)
- Formula: GMCS = (∑ᵢ₌₁ⁿ countᵢ) / n
- Purpose: Measures gene expression consistency across datasets
- Interpretation: Higher values indicate more consistent drug-gene associations
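The GMCS formula is an arithmetic mean, which can be sketched in a few lines. The guide does not define countᵢ or n, so the example assumes that n is the number of datasets available for a drug and that countᵢ is a per-dataset value for the gene (here 1 if the gene is differentially expressed in dataset i, otherwise 0). Both readings are assumptions made for illustration only.

```python
def gmcs(counts):
    """Gene Mean Consistency Score: sum(count_i) / n."""
    if not counts:
        raise ValueError("at least one dataset is required")
    return sum(counts) / len(counts)


# ASSUMPTION: one entry per dataset; 1 = gene differentially expressed
# in that dataset, 0 = not. The guide does not define count_i.
counts_for_gene = [1, 1, 0, 1]
print(gmcs(counts_for_gene))  # 0.75
```

Under this reading, a gene that responds in three of four datasets scores 0.75, and a gene that responds in every dataset scores 1.0.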
3. Gene Association Tables
- High-Confidence Genes: Restricted to genes in the top 25% by consistency and the bottom 25% by variability
- Literature Support: PubMed co-occurrence counts for drug-gene pairs
- ATC Classification: Anatomical Therapeutic Chemical classification system integration
- Cross-References: Direct links to external databases (GeneCards, PubChem, WHO-ATC)
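The high-confidence filter described above can be sketched as a pair of quartile cutoffs. This is a minimal illustration, assuming each gene carries one consistency value and one variability value; the gene names, the values, the `high_confidence_genes` helper, and the choice of quartile method are hypothetical, since the guide does not specify how the cutoffs are computed.

```python
from statistics import quantiles

# Hypothetical per-gene (consistency, variability) values.
gene_stats = {
    "G1": (0.1, 0.9), "G2": (0.2, 0.8), "G3": (0.3, 0.7), "G4": (0.4, 0.6),
    "G5": (0.5, 0.5), "G6": (0.6, 0.1), "G7": (0.9, 0.2), "G8": (1.0, 0.4),
}


def high_confidence_genes(stats):
    """Keep genes in the top 25% by consistency and bottom 25% by variability."""
    consistency = [c for c, _ in stats.values()]
    variability = [v for _, v in stats.values()]
    # quantiles(..., n=4) returns the three quartile cut points.
    consistency_q3 = quantiles(consistency, n=4)[2]
    variability_q1 = quantiles(variability, n=4)[0]
    return sorted(
        gene for gene, (c, v) in stats.items()
        if c >= consistency_q3 and v <= variability_q1
    )


print(high_confidence_genes(gene_stats))
```

Only genes that pass both cutoffs are kept, so a highly consistent gene with high variability is still excluded.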

Module 3: Gene Information
Purpose
The Gene Information module provides gene-centric analysis, allowing users to explore which drugs affect specific genes and understand gene-drug relationship patterns.

Key Functionalities
1. Gene Selection Options
- Hot Genes: Quick access to 20 frequently studied genes (TP53, EGFR, KRAS, etc.)
- Comprehensive Search: Search through complete gene database with live filtering

2. Gene Annotation
- GeneCards Integration: Comprehensive gene information including:
- Official gene names and symbols
- Chromosomal location
- Entrez and MIM identifiers
- Gene aliases and alternative names
- Quick link to GeneCards

3. Drug Association Analysis
- Associated Drugs Table: Comprehensive list of drugs affecting the selected gene
- Quality Metrics: GMCS values for each drug-gene pair
- Therapeutic Classification: ATC Level 3 codes and descriptions
- Literature Evidence: PubMed co-occurrence statistics

Module 4: Drug-Drug Interaction
Purpose
The Drug-Drug Interaction module represents the first systematic transcriptomic-level analysis of drug interactions, providing mechanistic insights through shared gene targets.

Key Functionalities
1. Drug Pair Selection
- Interactive Selection: Choose two drugs from the comprehensive database
- Validation: Automatic filtering for available interaction data
- Real-time Updates: Dynamic drug B options based on drug A selection
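The dependent drug B list can be pictured as a lookup over the drug pairs that have interaction data. The sketch below assumes such pairs are available as unordered tuples; the pair data and the `drug_b_options` helper are hypothetical and do not reflect the actual implementation.

```python
# Hypothetical set of drug pairs for which interaction data exists.
available_pairs = {
    ("drug_a", "drug_b"),
    ("drug_a", "drug_c"),
    ("drug_b", "drug_d"),
}


def drug_b_options(drug_a, pairs):
    """Return the drugs that can be paired with drug_a, in sorted order."""
    options = set()
    for first, second in pairs:
        # Pairs are treated as unordered, so check both positions.
        if first == drug_a:
            options.add(second)
        elif second == drug_a:
            options.add(first)
    return sorted(options)


print(drug_b_options("drug_a", available_pairs))
```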

- Dual Drug Profiles: Side-by-side comparison of drug properties
- Chemical Structures: Visual representation of molecular structures
- Pharmacological Properties: Detailed molecular and pharmacokinetic information

3. Gene Intersection Analysis
- Shared Target Identification: Genes affected by both drugs
- Directional Analysis: Expression change patterns for each drug
- Interaction Categories (where A = Drug A, B = Drug B):
- A_up_B_down: Gene upregulated by Drug A but downregulated by Drug B
- A_down_B_up: Gene downregulated by Drug A but upregulated by Drug B
- A_up_B_up: Gene upregulated by both Drug A and Drug B
- A_down_B_down: Gene downregulated by both Drug A and Drug B
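The four categories above follow directly from the direction of change under each drug. The sketch below assumes that direction is taken from the sign of logFC and that each drug's signature is a mapping from gene symbol to logFC; the sample values and helper names are hypothetical.

```python
def interaction_category(logfc_a, logfc_b):
    """Map two logFC values to one of the four category labels."""
    direction_a = "up" if logfc_a > 0 else "down"
    direction_b = "up" if logfc_b > 0 else "down"
    return f"A_{direction_a}_B_{direction_b}"


def shared_gene_categories(de_a, de_b):
    """Categorize genes affected by both drugs."""
    categories = {}
    for gene in sorted(de_a.keys() & de_b.keys()):
        # A logFC of exactly 0 has no direction, so such genes are skipped.
        if de_a[gene] == 0 or de_b[gene] == 0:
            continue
        categories[gene] = interaction_category(de_a[gene], de_b[gene])
    return categories


# Hypothetical gene -> logFC signatures for Drug A and Drug B.
de_a = {"GENE_A": 1.5, "GENE_B": -2.0, "GENE_C": 0.8, "GENE_D": -1.1}
de_b = {"GENE_A": -0.9, "GENE_B": 1.2, "GENE_C": 2.3, "GENE_D": -0.4,
        "GENE_E": 1.0}

for gene, category in shared_gene_categories(de_a, de_b).items():
    print(gene, category)
```

Genes present in only one signature, such as GENE_E here, are not shared targets and receive no category.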

4. Multi-Source DDI Evidence
Integration of three major drug interaction databases:
- DDInter: Comprehensive drug-drug interaction database
- MecDDI: Mechanism-based drug-drug interactions
- RxNav: Clinical drug interaction information

5. Network Visualization
- Interactive Network Graph: ECharts-based visualization showing:
- Drug nodes (distinct colors)
- Gene nodes (categorized by interaction type)
- Connection patterns between drugs and genes
- Customizable Display: Adjustable top N genes per category
- Dynamic Updates: Real-time network modification
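To show how a "top N genes per category" setting shapes the graph, the sketch below builds plain node and link lists from categorized shared genes. The ranking score, the sample data, and the `build_network` helper are hypothetical; the guide does not state how genes are ranked, and converting these lists into the ECharts option format is left out.

```python
from collections import defaultdict


def build_network(drug_a, drug_b, gene_categories, scores, top_n=5):
    """Build node and link lists for two drugs and their shared genes."""
    by_category = defaultdict(list)
    for gene, category in gene_categories.items():
        by_category[category].append(gene)

    nodes = [
        {"name": drug_a, "category": "drug_a"},
        {"name": drug_b, "category": "drug_b"},
    ]
    links = []
    for category, genes in sorted(by_category.items()):
        # ASSUMPTION: genes are ranked by a hypothetical score, highest first.
        ranked = sorted(genes, key=lambda g: scores.get(g, 0.0), reverse=True)
        for gene in ranked[:top_n]:
            nodes.append({"name": gene, "category": category})
            # Each shared gene connects to both drugs.
            links.append({"source": drug_a, "target": gene})
            links.append({"source": drug_b, "target": gene})
    return {"nodes": nodes, "links": links}


gene_categories = {
    "GENE_A": "A_up_B_up", "GENE_B": "A_up_B_up", "GENE_C": "A_up_B_up",
    "GENE_D": "A_up_B_down",
}
scores = {"GENE_A": 3.0, "GENE_B": 1.0, "GENE_C": 2.0, "GENE_D": 0.5}

network = build_network("drug_a", "drug_b", gene_categories, scores, top_n=2)
print(len(network["nodes"]), len(network["links"]))
```

Lowering `top_n` shrinks each category independently, which keeps small categories visible when one category dominates.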
