Privacy First
Your file is processed entirely in memory and never stored.
All data is cleared automatically after analysis.
We use dynamic rendering: nothing is saved to disk or to a database.
Upload Your Dataset
Supported formats: .csv, .xlsx, .xls, .json, .feather
How It Works
This app runs a full data profiling and quality audit using statistical methods, all locally in your browser session. No code, no setup, just results.
Comprehensive Column Classification
Columns are automatically categorised as numeric, boolean, categorical, datetime, or timeseries. Timeseries checks include monotonicity and time span detection.
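To illustrate the kind of classification described above, here is a minimal sketch in plain Python. The function names `classify_column` and `timeseries_summary` are hypothetical, and the rules are assumptions for illustration. The app's actual heuristics are not documented here.

```python
from datetime import datetime

def classify_column(values):
    """Return a coarse type label for a list of cell values (None = missing)."""
    present = [v for v in values if v is not None]
    if not present:
        return "empty"
    if all(isinstance(v, bool) for v in present):
        return "boolean"
    # bool is a subclass of int, so exclude it explicitly from the numeric test
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in present):
        return "numeric"
    if all(isinstance(v, datetime) for v in present):
        return "datetime"
    return "categorical"

def timeseries_summary(timestamps):
    """Report monotonicity and total time span for a non-empty list of datetimes."""
    pairs = list(zip(timestamps, timestamps[1:]))
    increasing = all(a <= b for a, b in pairs)
    decreasing = all(a >= b for a, b in pairs)
    return {
        "monotonic": increasing or decreasing,
        "span": max(timestamps) - min(timestamps),
    }
```

A datetime column that turns out to be monotonic could then be promoted to the timeseries category. That promotion rule is also an assumption.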
Descriptive Statistics
For every numeric column, we compute the mean, standard deviation, min, max, variance, skewness, and kurtosis. Outliers are detected using Z-scores with thresholds at 3σ, 4σ, and 5σ.
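As a rough sketch of Z-score outlier detection at several thresholds, the following uses only the standard library. The helper names `describe` and `zscore_outliers` are hypothetical. Using the population standard deviation (rather than the sample one) is an assumption.

```python
import statistics

def describe(values):
    """Basic summary statistics for a list of numbers (population formulas assumed)."""
    return {
        "mean": statistics.fmean(values),
        "std": statistics.pstdev(values),
        "variance": statistics.pvariance(values),
        "min": min(values),
        "max": max(values),
    }

def zscore_outliers(values, thresholds=(3, 4, 5)):
    """Map each threshold t to the values whose |z| exceeds t."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    result = {t: [] for t in thresholds}
    if std == 0:
        return result  # a constant column has no Z-score outliers
    for v in values:
        z = abs(v - mean) / std
        for t in thresholds:
            if z > t:
                result[t].append(v)
    return result
```

A value that appears under the 5σ key also appears under 3σ and 4σ, so the most severe tier for a point is the largest threshold that lists it.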
Distribution Diagnostics
Skewed columns are flagged and classified as moderate or severe. You'll get transformation suggestions such as log, Box-Cox, and Yeo-Johnson, with histograms for visual inspection.
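The skew check can be sketched as follows. The cut-offs of 0.5 (moderate) and 1.0 (severe) are a common rule of thumb and an assumption here, not the app's documented values. The function names are hypothetical. Only the log transform is shown, because Box-Cox and Yeo-Johnson need a fitted lambda and are usually taken from a library such as SciPy (`scipy.stats.boxcox`, `scipy.stats.yeojohnson`).

```python
import math
import statistics

def skewness(values):
    """Population skewness: the mean of cubed standardized deviations."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return 0.0
    return sum(((v - mean) / std) ** 3 for v in values) / len(values)

def classify_skew(skew, moderate=0.5, severe=1.0):
    """Label a skewness value; the thresholds are assumed, not taken from the app."""
    magnitude = abs(skew)
    if magnitude >= severe:
        return "severe"
    if magnitude >= moderate:
        return "moderate"
    return "symmetric"

def log_transform(values):
    """Apply log(1 + x); only valid when every value is greater than -1."""
    return [math.log1p(v) for v in values]
```

For right-skewed, non-negative data, comparing `skewness(values)` with `skewness(log_transform(values))` shows whether the log transform helps.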
Outlier Detection & Visualisation
Outliers are split by severity level and shown with plots. This helps identify influential points or data errors before they affect your models.
Data Quality Audit
The app automatically checks for:
- Missing data (two severity levels: 0-29% or 30%+ missing)
- Fully null columns
- Duplicate rows
- Constant and low-variance columns (including categorical and boolean)
- High and medium cardinality features
- Imbalanced boolean features (over 70% one class)
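Several of the checks listed above can be sketched in a few lines of plain Python. This is an illustrative outline only. The `audit` function, its flag names, and the row-as-tuple layout are assumptions. The 30% and 70% thresholds come from the list above.

```python
def audit(rows, columns):
    """rows: list of equal-length tuples, with None marking a missing cell."""
    n = len(rows)
    report = {"duplicate_rows": n - len(set(rows)), "columns": {}}
    for i, name in enumerate(columns):
        col = [r[i] for r in rows]
        present = [v for v in col if v is not None]
        missing = 1 - len(present) / n if n else 0.0
        flags = []
        if not present:
            flags.append("fully_null")
        else:
            if missing >= 0.30:
                flags.append("missing_high")
            elif missing > 0:
                flags.append("missing_low")
            if len(set(present)) == 1:
                flags.append("constant")
            if all(isinstance(v, bool) for v in present):
                majority = max(present.count(True), present.count(False))
                if majority / len(present) > 0.70:
                    flags.append("imbalanced_boolean")
        report["columns"][name] = flags
    return report
```

Cardinality and low-variance checks are left out to keep the sketch short. They would follow the same per-column pattern.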
Correlation Analysis
Computes Pearson correlation for numeric features and Cramér's V for categorical features. Visualises numeric correlations as heatmaps and flags highly collinear pairs.
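For reference, both measures can be computed with the standard library alone. The sketch below uses the plain (uncorrected) form of Cramér's V. Whether the app applies a bias correction is not stated, so treat that choice as an assumption.

```python
import math
from collections import Counter

def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return float("nan")  # undefined for a constant column
    return cov / (sx * sy)

def cramers_v(x, y):
    """Cramér's V from the chi-squared statistic of the contingency table."""
    n = len(x)
    joint = Counter(zip(x, y))
    cx, cy = Counter(x), Counter(y)
    chi2 = 0.0
    for a in cx:
        for b in cy:
            expected = cx[a] * cy[b] / n
            chi2 += (joint[(a, b)] - expected) ** 2 / expected
    k = min(len(cx), len(cy)) - 1
    if k == 0:
        return float("nan")  # undefined when a column has a single category
    return math.sqrt(chi2 / (n * k))
```

Pearson ranges from -1 to 1, while Cramér's V ranges from 0 (no association) to 1 (one column fully determines the other).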
Multicollinearity Detection (VIF)
Applies preprocessing (null filtering, constant-column removal, imputation), then calculates Variance Inflation Factors. Warns about numeric features with VIF above 5 or above 10.
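One way to compute VIF without external libraries relies on the fact that the VIF of feature j equals the j-th diagonal entry of the inverse of the feature correlation matrix. The sketch below assumes preprocessing is already done (no missing values, no constant columns). All function names are hypothetical, and the app may well use a regression-based routine instead.

```python
import math

def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equal-length numeric columns."""
    def corr(x, y):
        n = len(x)
        mx, my = sum(x) / n, sum(y) / n
        cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
        sx = math.sqrt(sum((a - mx) ** 2 for a in x))
        sy = math.sqrt(sum((b - my) ** 2 for b in y))
        if sx == 0 or sy == 0:
            raise ValueError("constant column: drop it before computing VIF")
        return cov / (sx * sy)
    return [[corr(a, b) for b in columns] for a in columns]

def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: some columns are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def vif(columns):
    """Variance Inflation Factor per column: diagonal of the inverse correlation matrix."""
    inv = invert(correlation_matrix(columns))
    return [inv[i][i] for i in range(len(columns))]
```

Each returned value can then be compared against the warning levels of 5 and 10 mentioned above.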
Actionable Insights
Recommendations are shown for each issue, complete with example code, severity badges, and justifications, so you can clean data efficiently.
All data stays in memory: nothing is stored or shared. This is a fully stateless, secure analysis workflow.