Research
Formal investigations in machine learning, quantitative modeling, and data-driven analysis, oriented toward reproducibility and tractable results.
Studies & analyses
DistressX
Financial distress prediction modeled across approximately 105,000 firm-year observations. The pipeline combines structured financial signals with temporal sequence models. Gradient-boosted ensembles and LSTM variants are used for multi-horizon forecasting and evaluated on holdout sets.
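The models are not described in enough detail to reproduce, but the multi-horizon setup can be illustrated. The sketch below assumes a hypothetical panel of (firm_id, year, distressed) records. It builds one label per horizon h from the firm's status h years later, then splits train and holdout rows by year so that no training label comes from inside the holdout window. The record schema, function names, and year-based split rule are illustrative assumptions, not the DistressX pipeline.

```python
def multi_horizon_labels(records, horizons=(1, 2, 3)):
    """records: iterable of (firm_id, year, distressed) tuples, distressed in {0, 1}.

    Returns {(firm_id, year): {h: label}}, where label is the firm's distress
    status h years later, or None if that firm-year is not in the panel.
    """
    status = {(firm, year): distressed for firm, year, distressed in records}
    return {
        (firm, year): {h: status.get((firm, year + h)) for h in horizons}
        for (firm, year) in status
    }


def temporal_split(labels, cutoff_year, horizon):
    """Split rows for one horizon into (train, holdout) lists of (firm_id, year, label)."""
    train, holdout = [], []
    for (firm, year), by_horizon in labels.items():
        target = by_horizon[horizon]
        if target is None:
            continue  # outcome year not observed for this firm
        if year + horizon <= cutoff_year:
            train.append((firm, year, target))  # outcome known by the cutoff
        elif year > cutoff_year:
            holdout.append((firm, year, target))  # entirely after the cutoff
        # otherwise: observed before the cutoff, outcome after it, so dropped
    return train, holdout
```

Rows observed at or before the cutoff whose outcome falls after it are dropped rather than assigned to either side, which keeps future outcomes out of the training labels.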
Student Engagement Analysis
Clustering and PCA applied to longitudinal student engagement records from the ProofX infrastructure. The analysis identifies behavioral cohorts and participation-trajectory signals in institutional data. Conducted in collaboration with MIT E14 Lab partners and documented as a technical report.
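A minimal, dependency-free sketch of the PCA-then-cluster pattern: z-score each engagement feature, project onto the leading principal components (found here by power iteration with deflation), then run k-means with farthest-first initialization on the projected points. The feature rows, component and cluster counts, and function names are hypothetical, and the technical report may use a different clustering method; in practice a library implementation such as scikit-learn's PCA and KMeans would usually replace this code.

```python
import math
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def standardize(rows):
    """Z-score each feature column (population SD; constant columns end up at 0)."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(r, means, sds)] for r in rows]


def principal_components(X, k, iters=500, seed=0):
    """Top-k eigenvectors of the covariance of centered X, by power iteration + deflation."""
    n, d = len(X), len(X[0])
    cov = [[sum(r[i] * r[j] for r in X) / n for j in range(d)] for i in range(d)]
    rng = random.Random(seed)
    comps = []
    for _comp in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _it in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break  # nothing left to explain in this direction
            v = [x / norm for x in w]
        lam = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
        # Deflate: subtract the found component so the next pass finds the next one.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
        comps.append(v)
    return comps


def kmeans(points, k, iters=100):
    """Lloyd's algorithm with deterministic farthest-first initialization."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            new_centers.append([sum(xs) / len(members) for xs in zip(*members)]
                               if members else centers[c])
        if new_centers == centers:
            break  # assignments are stable
        centers = new_centers
    return labels, centers


def engagement_cohorts(rows, n_components=2, n_clusters=2):
    """Standardize, project onto principal components, then cluster the projections."""
    X = standardize(rows)
    comps = principal_components(X, n_components)
    projected = [[sum(a * b for a, b in zip(r, c)) for c in comps] for r in X]
    labels, _ = kmeans(projected, n_clusters)
    return labels
```

Clustering in the reduced space damps noise from correlated engagement metrics; the trade-off is that cohorts are only as interpretable as the retained components.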
Integer Explorations
A working archive of exploratory mathematical inquiry, not a catalog of proven results. Entries document observations, computational traces, and open questions encountered while building and running ProofX infrastructure. The focus is on process and tractable partial findings rather than claims of resolution.
Collatz Stopping Time Distribution
Distribution of total stopping times for initial values up to 2^28, where each seed's stopping time is the number of steps its trajectory takes to reach 1. The distribution is heavy-tailed, with outliers clustering near powers of 2. No deterministic pattern in outlier placement has been identified.
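Total stopping times can be tabulated with a cache: each seed n is iterated only until its trajectory drops below n, and the cached count for that smaller value is added. This is a sketch, not the ProofX code; the function names are hypothetical, and it assumes every trajectory reaches 1, which has been verified computationally far beyond any range this script can cover. Counts are stored in an unsigned 16-bit `array('H')` to keep memory near two bytes per seed; pure Python is only practical for modest limits, but the same logic ports directly to a compiled language.

```python
from array import array
from collections import Counter


def stopping_times(limit):
    """Entry n is the total stopping time of n (steps to reach 1), for 1 <= n <= limit."""
    # 'H' is unsigned 16-bit; assumes every stopping time fits, which holds
    # comfortably for limits in the hundreds of millions.
    steps = array("H", [0]) * (limit + 1)  # steps[1] = 0; steps[0] unused
    for n in range(2, limit + 1):
        x, count = n, 0
        # Iterate until the trajectory drops below n; that value is already cached.
        while x >= n:
            x = x // 2 if x % 2 == 0 else 3 * x + 1
            count += 1
        steps[n] = count + steps[x]
    return steps


def stopping_time_histogram(limit):
    """Map each stopping time to the number of seeds in 1..limit that have it."""
    return Counter(stopping_times(limit)[1:])


if __name__ == "__main__":
    hist = stopping_time_histogram(10_000)
    for t in sorted(hist)[-5:]:  # the longest stopping times form the tail
        print(t, hist[t])
```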
Parity Transition Density
Computed odd-to-even transition ratios within Collatz trajectories across a stratified sample of initial values. Investigated whether parity alternation frequency correlates with stopping time. Preliminary analysis shows no consistent monotonic relationship.
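The sketch below makes one reading of this measurement concrete. The odd-to-even ratio is taken as the share of consecutive trajectory pairs (a, b) with a odd and b even, the strata are dyadic bands [2^j, 2^(j+1)) sampled uniformly, and Spearman's rank correlation tests for a monotonic link with stopping time. All three choices, and the function names, are assumptions. Note that under the standard map every odd term is followed by 3n + 1, which is even, so this ratio reduces to the fraction of steps that are odd steps.

```python
import random


def trajectory(n):
    """Collatz trajectory from n down to 1, inclusive."""
    seq = [n]
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        seq.append(n)
    return seq


def odd_to_even_ratio(seq):
    """Share of consecutive pairs (a, b) in seq with a odd and b even."""
    pairs = list(zip(seq, seq[1:]))
    if not pairs:
        return 0.0
    return sum(1 for a, b in pairs if a % 2 == 1 and b % 2 == 0) / len(pairs)


def ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=values.__getitem__)
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            r[order[pos]] = (i + j) / 2 + 1
        i = j + 1
    return r


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of ranks (undefined if an input is constant)."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


def stratified_sample(max_exp, per_stratum, seed=0):
    """Draw per_stratum seeds uniformly from each band [2^j, 2^(j+1)), 1 <= j < max_exp."""
    rng = random.Random(seed)
    return [rng.randrange(2 ** j, 2 ** (j + 1))
            for j in range(1, max_exp) for _ in range(per_stratum)]


if __name__ == "__main__":
    trajs = [trajectory(s) for s in stratified_sample(20, 50)]
    rho = spearman([odd_to_even_ratio(t) for t in trajs], [len(t) - 1 for t in trajs])
    print(f"Spearman rho = {rho:.3f}")
```

Computing rho separately within each stratum, rather than only over the pooled sample, is one way to check whether a relationship holds consistently across magnitudes.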
Generalized 3n + k Extensions
Investigated Collatz-type variants in which odd n maps to 3n + k for odd values of k (even n still maps to n/2). Examined cycle existence and divergence behavior across selected parameter values. Some variants exhibit verifiably periodic behavior under constrained domains; the general case remains open.
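Cycle detection for a fixed k can be sketched by iterating the map from each seed and recording visited values until one repeats. Because divergence cannot be confirmed by finite computation, the sketch reports seeds that exceed a step or magnitude bound as unresolved rather than divergent. The bounds and function names are illustrative assumptions.

```python
def step(n, k):
    """One application of the 3n + k map (k odd): halve even n, send odd n to 3n + k."""
    return n // 2 if n % 2 == 0 else 3 * n + k


def find_cycle(n, k, max_steps=10_000, max_value=10**12):
    """Iterate from seed n until a value repeats.

    Returns the cycle as a tuple rotated to start at its smallest element, or
    None if the step or magnitude bound is hit first. None means "unresolved
    within these bounds", not "divergent".
    """
    seen = set()
    steps = 0
    while n not in seen:
        if steps >= max_steps or n > max_value:
            return None
        seen.add(n)
        n = step(n, k)
        steps += 1
    # n is the first repeated value, so it lies on the cycle; walk it once.
    cycle = [n]
    m = step(n, k)
    while m != n:
        cycle.append(m)
        m = step(m, k)
    j = cycle.index(min(cycle))
    return tuple(cycle[j:] + cycle[:j])


def cycles_for(k, seeds):
    """Collect the distinct cycles reached from the given positive seeds."""
    found, unresolved = set(), []
    for s in seeds:
        c = find_cycle(s, k)
        if c is None:
            unresolved.append(s)
        else:
            found.add(c)
    return found, unresolved
```

For k = 5, seeds 1, 5, and 19 already fall into three different cycles: (1, 8, 4, 2), (5, 20, 10), and (19, 62, 31, 98, 49, 152, 76, 38). Multiple coexisting cycles of this kind are what separate such variants from k = 1.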
Sequence Compression Analysis
Applied Lempel-Ziv complexity to binary-encoded Collatz trajectory strings. Compression ratios suggest moderate structure; sequences are neither maximally compressible nor maximally random under LZ encoding. Boundary behavior at convergence accounts for a disproportionate fraction of compressible segments.
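A sketch of the measurement, under two assumptions: each trajectory is encoded as a bit string with 1 for an odd term and 0 for an even one, and complexity is the 1976 Lempel-Ziv phrase count computed with the Kaspar-Schuster scan, normalized as c(n) log2(n) / n. Every trajectory ends in a descent through powers of 2, which this encoding renders as a run of 0s followed by a final 1. The function names are hypothetical.

```python
import math


def trajectory(n):
    """Collatz trajectory from n down to 1, inclusive."""
    seq = [n]
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        seq.append(n)
    return seq


def parity_string(n):
    """Encode a Collatz trajectory as a bit string: '1' for odd terms, '0' for even."""
    return "".join("1" if x % 2 else "0" for x in trajectory(n))


def lz76_complexity(s):
    """Phrase count of the Lempel-Ziv (1976) parsing of s, via the Kaspar-Schuster scan."""
    n = len(s)
    if n < 2:
        return n
    i, k, l = 0, 1, 1  # i: candidate match start, k: match length, l: current phrase start
    c, k_max = 1, 1
    while True:
        if s[i + k - 1] == s[l + k - 1]:
            k += 1
            if l + k > n:  # current phrase runs to the end of s
                c += 1
                break
        else:
            k_max = max(k, k_max)
            i += 1
            if i == l:  # no earlier start extends the match: close the phrase
                c += 1
                l += k_max
                if l + 1 > n:
                    break
                i, k, k_max = 0, 1, 1
            else:
                k = 1
    return c


def normalized_lz(s):
    """c(n) * log2(n) / n; approaches 1 for long i.i.d. uniform binary strings."""
    n = len(s)
    if n < 2:
        return 0.0
    return lz76_complexity(s) * math.log2(n) / n
```

Comparing `normalized_lz` on the full parity string against the same string with its trailing power-of-2 descent removed is one way to isolate how much the convergence boundary contributes.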