Automated Chest X-ray Classification with Deep Learning
Developed a multimodal AI system that classifies chest X-ray abnormalities and generates diagnostic reports. Achieved strong accuracy on common pathologies using ResNet-18 and LLaMA-3.2-11B Vision models trained on 220,000+ medical images.
Bridging the gap in medical imaging with AI
Chest X-rays remain the most common medical imaging procedure worldwide, yet their interpretation is prone to error and requires significant expertise. The MIMIC-CXR dataset provides an unprecedented opportunity to develop AI systems that can assist radiologists by automating initial screening and generating preliminary reports.
This project leverages the MIMIC-CXR database containing over 370,000 chest X-ray images with corresponding radiology reports. Our goal was to build a system capable of both classifying chest abnormalities and generating coherent diagnostic reports, essentially replicating key aspects of a radiologist's workflow.
The challenge extends beyond simple image classification. Medical imaging requires understanding subtle visual patterns, handling significant class imbalance (rare diseases), and generating reports that use precise medical terminology while remaining clinically useful.
A three-stage pipeline from raw images to diagnostic insights
We processed the massive MIMIC-CXR dataset by first filtering for posterior-anterior (PA) views to ensure diagnostic consistency. Images underwent standardization to 224×224 resolution with normalization (μ=0.5, σ=0.5). We merged metadata from multiple sources including patient records, CheXpert labels, and DICOM headers to create unified training manifests.
The final curated dataset contained 4,742 PA chest X-rays with binary labels (case/control) across 14 pathologies. We implemented medical-aware data augmentation including controlled rotation (±10°) and horizontal flips while preserving anatomical validity.
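The preprocessing described above can be sketched in PyTorch. This is a minimal illustration, not the project's actual code: it resizes a grayscale image tensor to 224×224, normalizes with μ=0.5 and σ=0.5, and applies a random horizontal flip (the ±10° rotation is omitted here for brevity).

```python
import torch
import torch.nn.functional as F

def preprocess(img: torch.Tensor) -> torch.Tensor:
    """Resize a (1, H, W) grayscale image in [0, 1] to 224x224 and normalize."""
    img = F.interpolate(
        img.unsqueeze(0), size=(224, 224),
        mode="bilinear", align_corners=False,
    ).squeeze(0)
    return (img - 0.5) / 0.5  # normalize with mu=0.5, sigma=0.5 -> values in [-1, 1]

def augment(img: torch.Tensor, flip_p: float = 0.5) -> torch.Tensor:
    """Random horizontal flip; rotation augmentation would be applied similarly."""
    if torch.rand(1).item() < flip_p:
        img = torch.flip(img, dims=[-1])  # flip along the width axis
    return img
```

In practice a `torchvision.transforms` pipeline would typically bundle these steps, but the tensor operations above show what the transforms compute.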
We evaluated multiple state-of-the-art CNN architectures including VGG-16, ResNet-18, ResNet-50, and DenseNet-121. Each model was initialized with ImageNet pretrained weights and fine-tuned on our medical imaging dataset. The final fully connected layer was modified for binary classification per pathology.
ResNet-18 emerged as the optimal architecture, balancing accuracy with computational efficiency. Training employed the AdamW optimizer (learning rate: 1e-4, weight decay: 0.01) with a cosine annealing schedule. We implemented early stopping with a patience of 5 epochs on validation loss to prevent overfitting.
Training configuration: batch size 32, gradient accumulation for an effective batch size of 128, mixed-precision training on NVIDIA A100 GPUs. The model processed approximately 150 images/second during inference.
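A training loop with this accumulation scheme can be sketched as follows (a minimal illustration, assuming a model with a single-logit head; mixed precision via `torch.autocast` and a GradScaler would wrap the forward/backward passes on GPU):

```python
import torch
import torch.nn as nn

def train_one_epoch(model, loader, optimizer, criterion, accum_steps=4):
    """Gradient accumulation: the optimizer steps once every accum_steps
    mini-batches, so batch 32 x accum_steps 4 = effective batch 128."""
    model.train()
    optimizer.zero_grad()
    for step, (images, labels) in enumerate(loader):
        logits = model(images).squeeze(1)
        # Divide so accumulated gradients average over the effective batch.
        loss = criterion(logits, labels.float()) / accum_steps
        loss.backward()
        if (step + 1) % accum_steps == 0:
            optimizer.step()
            optimizer.zero_grad()
```

Early stopping and the cosine schedule would sit in the outer epoch loop, stepping `CosineAnnealingLR` once per epoch and tracking the best validation loss.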
For report generation, we fine-tuned LLaMA-3.2-11B-Vision-Instruct, a cutting-edge multimodal transformer. To fit the model's massive parameter count on limited hardware, we implemented several memory and compute optimizations.
The model was trained with instruction-based prompting using expert radiographer templates. Training spanned 2 epochs with 200 maximum steps, warmup over 5 steps, and a learning rate of 1e-4. We used an effective batch size of 32 (8 per device × 4 gradient accumulation steps) on 2,562 training samples.
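The hyperparameters above can be collected into a configuration sketch; the key names below are hypothetical (chosen to mirror common fine-tuning configs), and the values come directly from the text:

```python
# Hypothetical config keys; values taken from the training setup described above.
train_config = {
    "num_train_epochs": 2,
    "max_steps": 200,
    "warmup_steps": 5,
    "learning_rate": 1e-4,
    "per_device_train_batch_size": 8,
    "gradient_accumulation_steps": 4,
}

# Effective batch size: 8 per device x 4 accumulation steps = 32.
effective_batch = (train_config["per_device_train_batch_size"]
                   * train_config["gradient_accumulation_steps"])
```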
Performance evaluation across multiple chest pathologies
Our ResNet-18 model achieved varying performance across different pathologies, with notable success in detecting common abnormalities. The results demonstrate both the promise and challenges of automated chest X-ray interpretation.
Disease | Accuracy | F1 Score | ROC-AUC
---|---|---|---
Pleural Effusion | 73.85% | 0.7733 | 0.8618
Atelectasis | 76.47% | 0.8421 | 0.8279
Consolidation | 73.17% | 0.2667 | 0.7270
Pneumonia | 67.92% | 0.5405 | 0.6914
Pneumothorax | 68.00% | 0.3333 | 0.6667
Cardiomegaly | 27.27% | 0.1111 | 0.6000
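For reference, the accuracy and F1 figures in the table follow the standard definitions over confusion-matrix counts, which can be computed as:

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int):
    """Accuracy and F1 from confusion-matrix counts (true/false positives/negatives)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return accuracy, f1
```

Under class imbalance, F1 and ROC-AUC are far more informative than raw accuracy, which is why the table reports all three.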
Strong Performance on Common Pathologies: The model excelled at detecting pleural effusion (AUC: 0.862) and atelectasis (F1: 0.842). These conditions present clear visual markers that CNNs can reliably identify: fluid levels for effusions and collapsed lung regions for atelectasis. The high F1 score for atelectasis indicates balanced precision-recall performance crucial for clinical deployment.
Challenges with Rare Conditions: Cardiomegaly detection proved particularly challenging (accuracy: 27.27%), likely due to severe class imbalance in the training data. This pathology requires subtle assessment of cardiac silhouette size relative to the thoracic cavity, a task that benefits from more training examples and potentially ensemble approaches.
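One common mitigation for such imbalance (not claimed as part of this project's pipeline) is to weight the positive class in the loss by the negative-to-positive ratio, which PyTorch supports directly via `pos_weight` in `BCEWithLogitsLoss`:

```python
import torch
import torch.nn as nn

# Hypothetical counts for a rare pathology: 50 cases vs 950 controls.
n_pos, n_neg = 50, 950
pos_weight = torch.tensor([n_neg / n_pos])  # 19x weight on positive examples
criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weight)
```

This rescales each positive example's contribution so the gradient signal from rare cases is not drowned out by the majority class.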
Report Generation Quality: LLaMA-3.2-11B-Vision successfully generated coherent diagnostic narratives that appropriately used medical terminology. The model learned to structure reports with findings, impressions, and recommendations sections. However, occasional hallucinations of findings not present in images highlight the need for human oversight in clinical settings.
Our results suggest that AI-assisted chest X-ray interpretation is approaching clinical viability for common pathologies. The system could serve as an effective initial screening tool, prioritizing cases for radiologist review and providing preliminary reports to accelerate workflows.
The performance gap between common and rare conditions underscores a critical challenge in medical AI: dataset representation. Future work should focus on targeted data collection for underrepresented pathologies and synthetic data generation techniques to balance training distributions.
Several avenues could improve model performance: targeted data collection for underrepresented pathologies, synthetic data generation to balance class distributions, and ensemble approaches for subtle findings such as cardiomegaly.
Access the complete research paper and source code for the MIMIC-CXR medical imaging system.