Malware Benchmark Dataset
The Malware Benchmark Dataset Application streamlines the creation and management of standardized malware datasets. It lets researchers and practitioners analyze malware features efficiently through a secure, user-friendly interface.
Overview
The application is composed of the following components:
- Web Interface: A Flask application that handles user interaction.
- Database: PostgreSQL provides secure, efficient data storage.
- Docker Containers: Run feature extraction scripts in isolated, reproducible environments.
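As a rough illustration of isolated execution, the sketch below builds a `docker run` command for analyzing a single sample. The helper `build_docker_command`, the image name `feature-extractor:latest`, the `/samples` mount point, and the resource limits are hypothetical; the application's actual container setup may differ. The flags themselves (`--rm`, `--network none`, `--read-only`, `--memory`, `--cpus`, `-v ...:ro`) are standard Docker CLI options.

```python
import shlex
from pathlib import Path


def build_docker_command(sample_path: str, image: str = "feature-extractor:latest") -> list[str]:
    """Build a `docker run` argument list that analyzes one sample in isolation.

    The image name and the /samples mount point are hypothetical placeholders.
    """
    sample = Path(sample_path).resolve()
    return [
        "docker", "run",
        "--rm",                 # remove the container when it exits
        "--network", "none",    # no network access during analysis
        "--read-only",          # read-only root filesystem inside the container
        "--memory", "512m",     # cap memory use
        "--cpus", "1",          # cap CPU use
        "-v", f"{sample.parent}:/samples:ro",  # mount the sample directory read-only
        image,
        f"/samples/{sample.name}",  # argument passed to the image's entrypoint
    ]


if __name__ == "__main__":
    cmd = build_docker_command("/tmp/uploads/sample.bin")
    print(shlex.join(cmd))
    # To execute it: subprocess.run(cmd, capture_output=True, text=True, check=True)
```

Passing the command as a list (rather than a shell string) avoids shell interpretation of file names; storing uploads under a server-chosen name, such as their hash, further limits what user input reaches the command.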
Features
- File Uploads: Users can securely upload raw binary files for analysis.
- Feature Extraction: Automated analysis, including string extraction and EMBER dataset analysis.
- User Management: Provides secure login and user profiles.
- Result Visualization: Lets users view and download analysis results.
- Secure Execution: Docker ensures environment isolation and reproducibility.
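String extraction can be approximated in pure Python. The hypothetical helper `extract_strings` below mimics the Unix `strings` utility by returning runs of printable ASCII at least `min_len` bytes long, with an optional mode for UTF-16LE ("wide") strings, which are common in Windows binaries. It is a sketch, not the application's extraction script, and the EMBER analysis is not shown.

```python
import re


def extract_strings(data: bytes, min_len: int = 4, wide: bool = False) -> list[str]:
    """Return printable-ASCII runs of at least `min_len` characters.

    With wide=True, look for UTF-16LE strings instead (each ASCII character
    followed by a 0x00 byte), as commonly found in Windows PE files.
    """
    if wide:
        pattern = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)
        return [m.group().decode("utf-16-le") for m in pattern.finditer(data)]
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


if __name__ == "__main__":
    sample = b"\x00\x01MZ\x90hello world\x00ab\x00kernel32.dll\xff"
    print(extract_strings(sample))  # ['hello world', 'kernel32.dll']
```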
Architecture
The diagrams below show how the web interface, database, Docker containers, and user workflows interact.
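The upload-to-results flow can be sketched with Python's built-in `sqlite3` module standing in for PostgreSQL, so the example runs without a database server. The schema, table names, and helper functions are hypothetical. In this sketch, samples are keyed by their SHA-256 digest so identical uploads are stored once, and each extractor's output is saved as JSON.

```python
import hashlib
import json
import sqlite3
from typing import Optional


def init_db(conn: sqlite3.Connection) -> None:
    # Hypothetical schema; the real application uses PostgreSQL.
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS samples (
            sha256 TEXT PRIMARY KEY,
            owner  TEXT NOT NULL,
            size   INTEGER NOT NULL
        );
        CREATE TABLE IF NOT EXISTS results (
            sha256    TEXT NOT NULL REFERENCES samples(sha256),
            extractor TEXT NOT NULL,
            features  TEXT NOT NULL,  -- JSON-encoded extractor output
            PRIMARY KEY (sha256, extractor)
        );
        """
    )


def register_upload(conn: sqlite3.Connection, owner: str, data: bytes) -> str:
    """Record sample metadata keyed by SHA-256; repeat uploads keep the first owner."""
    digest = hashlib.sha256(data).hexdigest()
    with conn:  # commits on success
        conn.execute(
            "INSERT OR IGNORE INTO samples (sha256, owner, size) VALUES (?, ?, ?)",
            (digest, owner, len(data)),
        )
    return digest


def save_result(conn: sqlite3.Connection, digest: str, extractor: str, features: dict) -> None:
    """Store (or overwrite) one extractor's output for a sample."""
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO results (sha256, extractor, features) VALUES (?, ?, ?)",
            (digest, extractor, json.dumps(features)),
        )


def load_result(conn: sqlite3.Connection, digest: str, extractor: str) -> Optional[dict]:
    row = conn.execute(
        "SELECT features FROM results WHERE sha256 = ? AND extractor = ?",
        (digest, extractor),
    ).fetchone()
    return None if row is None else json.loads(row[0])


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    init_db(conn)
    digest = register_upload(conn, "alice", b"MZ\x90\x00example")
    save_result(conn, digest, "strings", {"strings": ["example"]})
    print(load_result(conn, digest, "strings"))  # {'strings': ['example']}
```

In PostgreSQL, `INSERT OR IGNORE` and `INSERT OR REPLACE` would become `INSERT ... ON CONFLICT DO NOTHING` and `INSERT ... ON CONFLICT ... DO UPDATE`, and a driver such as psycopg uses `%s` placeholders instead of `?`.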
System Diagram

Sequence Diagram

Getting Started
Follow these steps to set up and run the application:
Prerequisites
- Python 3.9+: Required for running the Flask application.
- PostgreSQL: For database management.
- Docker: For running isolated analysis containers.
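Before following the setup steps, a quick check like the one below can confirm that the prerequisites appear to be installed. The helper `check_prerequisites` is hypothetical and not part of the repository. It only looks for the `docker` and `psql` executables on `PATH`; it does not verify that the Docker daemon or a PostgreSQL server is running.

```python
import shutil
import sys


def check_prerequisites() -> dict[str, bool]:
    """Report whether the listed prerequisites appear to be available."""
    return {
        "python>=3.9": sys.version_info >= (3, 9),
        "docker": shutil.which("docker") is not None,
        # psql is PostgreSQL's command-line client; the server itself may run elsewhere.
        "psql": shutil.which("psql") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name:12} {'OK' if ok else 'MISSING'}")
```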
Follow the steps outlined in the Application_Setup_Steps.txt file.
Future Work
- Automated Container Orchestration: Transition to Kubernetes for automated scaling, deployment, and monitoring of Docker containers.
- Expanded Feature Extraction Tools:
  - Integration with Ghidra for reverse engineering.
  - Graphical byte views for visual analysis.
  - Entropy analysis to help identify packed or encrypted regions.
- Enhanced User Management: Implement role-based access control (RBAC) for tailored user permissions.
- Improved Visualization: Develop dashboards with enhanced analytics for malware pattern comparison.
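For the planned entropy analysis, one common approach is Shannon entropy over byte values, computed for the whole file and for fixed-size windows. The helpers below are a hypothetical sketch of that idea, not an existing feature of the application; the 256-byte window and step are arbitrary choices.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte, from 0.0 (constant) to 8.0 (uniform)."""
    if not data:
        return 0.0
    n = len(data)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(data).values())


def windowed_entropy(data: bytes, window: int = 256, step: int = 256) -> list[float]:
    """Entropy of successive windows; a trailing partial window is dropped.

    Inputs shorter than `window` yield a single value for the whole input.
    Sustained values near 8.0 often indicate compressed or encrypted content.
    """
    last_start = max(len(data) - window, 0)
    return [shannon_entropy(data[i:i + window]) for i in range(0, last_start + 1, step)]


if __name__ == "__main__":
    blob = b"\x00" * 256 + bytes(range(256))
    print(windowed_entropy(blob))  # [0.0, 8.0]
```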