Bridging Science & Industry Through AI
I'm a programmer with deep academic research roots who recently completed an ML engineering bootcamp focused on building dependable AI systems. My passion lies in creating AI that's not just smart but trustworthy and adaptable: systems that learn from data and solve real problems.
Currently: Developing cloud migration strategies and intelligent AI tools that help scientific research teams work faster.
My Journey: Where Science Meets AI Innovation
The Problem I Solve: Complex research questions need software that researchers can rely on. I’ve spent over a decade turning those questions into elegant software solutions, and I completed an intensive ML engineering bootcamp to build AI systems that are dependable, adaptable, and impactful.
My unique perspective combines rigorous scientific methodology with modern ML engineering practices—creating solutions that researchers and industry professionals can trust with their most important data.
My Mission: Transform cutting-edge research into production-ready AI systems that researchers, organizations, and communities can actually use and trust.
⭐ Featured Projects
AI4citations: Combating Scientific Misinformation with AI
The Challenge: Scientific misinformation spreads when citations don’t actually support the claims being made.
The Solution: Shuffled training across multiple datasets, which raised F1 on citation verification above state-of-the-art models.
Impact & Innovation:
- Developed the pyvers library (based on PyTorch) for data preprocessing and shuffled training
- 7% improvement in F1 score over state-of-the-art models through shuffled training methodology
- Real-time web app with continuous feedback collection for model improvement
- Production-ready deployment with CI/CD pipeline ensuring reliability with every update
- Processes 500k+ verifiable claims with normalized multi-dataset training
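The idea of shuffled, normalized multi-dataset training can be illustrated with a minimal sketch. This is not the pyvers API: the dataset names, label vocabularies, and the `normalize` and `shuffled_batches` helpers are hypothetical, and "normalized" is assumed here to mean mapping each dataset's labels onto one shared label set.

```python
import random
from typing import Dict, Iterator, List, Tuple

Example = Tuple[str, str, str]  # (claim, evidence, label)

# Hypothetical label vocabularies, mapped onto one shared label set.
LABEL_MAP: Dict[str, Dict[str, str]] = {
    "dataset_a": {"SUPPORT": "support", "CONTRADICT": "refute", "NEI": "nei"},
    "dataset_b": {"supports": "support", "refutes": "refute"},
}


def normalize(name: str, examples: List[Example]) -> List[Example]:
    """Rewrite one dataset's labels into the shared label set."""
    mapping = LABEL_MAP[name]
    return [(claim, evidence, mapping[label]) for claim, evidence, label in examples]


def shuffled_batches(
    datasets: Dict[str, List[Example]], batch_size: int, seed: int
) -> Iterator[List[Example]]:
    """Pool all datasets, shuffle the pool, and yield mixed batches.

    Each call shuffles once, so calling it with a new seed per epoch
    gives a different mix of datasets in every batch.
    """
    pool = [ex for name, exs in datasets.items() for ex in normalize(name, exs)]
    random.Random(seed).shuffle(pool)
    for start in range(0, len(pool), batch_size):
        yield pool[start:start + batch_size]


# Toy data, invented for illustration only.
datasets = {
    "dataset_a": [("c1", "e1", "SUPPORT"), ("c2", "e2", "CONTRADICT"), ("c3", "e3", "NEI")],
    "dataset_b": [("c4", "e4", "supports"), ("c5", "e5", "refutes")],
}
batches = list(shuffled_batches(datasets, batch_size=2, seed=0))
```

In a PyTorch setting the same effect is commonly obtained by concatenating datasets and using a shuffling data loader; the plain-Python version above only shows the ordering logic.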
This project showcases my ability to take research from concept to production, demonstrating skills in model optimization, deployment, and building systems that improve over time.
R-help chat: Making Knowledge Accessible Through Conversational AI
The Challenge: Decades of valuable programming discussions are buried in email archives that are difficult to search effectively.
My Innovation: A RAG-powered chatbot that transforms static archives into interactive knowledge discovery.
Technical Achievements:
- Local models for better privacy and lower cost than OpenAI
- 10%+ accuracy improvement through hybrid dense+sparse retrieval
- LangGraph implementation with source citations for trustworthy responses
- Multi-turn conversation interface for complex technical queries
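Hybrid dense+sparse retrieval can be sketched in a few lines. The example below is an illustration under assumptions, not the project's implementation: the documents and two-dimensional "embeddings" are invented, the sparse scorer is a simple term count rather than BM25, and reciprocal rank fusion (RRF) is one common way to merge the two rankings, which the project may or may not use.

```python
import math
from collections import Counter
from typing import Dict, List


def sparse_scores(query: str, docs: Dict[str, str]) -> Dict[str, float]:
    """Keyword score: how often the query terms occur in each document."""
    terms = query.lower().split()
    scores = {}
    for doc_id, text in docs.items():
        counts = Counter(text.lower().split())
        scores[doc_id] = float(sum(counts[t] for t in terms))
    return scores


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity, returning 0.0 for zero-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def dense_scores(query_vec: List[float], doc_vecs: Dict[str, List[float]]) -> Dict[str, float]:
    """Embedding score: cosine similarity between query and document vectors."""
    return {doc_id: cosine(query_vec, vec) for doc_id, vec in doc_vecs.items()}


def ranking(scores: Dict[str, float]) -> List[str]:
    """Document ids ordered by descending score, ties broken by id."""
    return sorted(scores, key=lambda d: (-scores[d], d))


def rrf(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Reciprocal rank fusion: sum 1 / (k + rank) over all input rankings."""
    fused: Dict[str, float] = {}
    for ranked in rankings:
        for rank, doc_id in enumerate(ranked, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    return ranking(fused)


# Toy corpus and made-up embeddings, for illustration only.
docs = {
    "msg1": "how to merge two data frames in R",
    "msg2": "joining tables with dplyr",
    "msg3": "plotting data with ggplot2",
}
doc_vecs = {"msg1": [0.8, 0.6], "msg2": [0.9, 0.1], "msg3": [0.0, 1.0]}

sparse = sparse_scores("merge data frames", docs)
dense = dense_scores([1.0, 0.0], doc_vecs)
fused = rrf([ranking(sparse), ranking(dense)])
```

In this toy case the keyword ranking places `msg2` last because it shares no words with the query, while the dense ranking places it first. Fusing the two keeps the exact keyword match on top and still surfaces the semantically related message.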
This project demonstrates my expertise in modern NLP architectures, cost-effective AI deployment, and creating user experiences that unlock hidden value in existing data.
CHNOSZ: Building Scientific Infrastructure That Lasts
The Vision: Scientific software that researchers worldwide can depend on for their most critical work.
15+ Years of Impact:
- 200+ citations from researchers worldwide since 2009
- 90% test coverage with automated data consistency checks
- Maintained on CRAN for 15+ years through multiple R versions
- Active community supported through GitHub Discussions
Architecture for Longevity:
- Extensible API supporting third-party integrations (Shiny frontend, Python interface)
- Comprehensive documentation ecosystem (help pages, examples, demos, vignettes)
- Automated data validation to catch common data entry errors
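Automated data validation of this kind can be sketched as follows. This is a hypothetical example written in Python for illustration, not CHNOSZ code: the field names, tolerance, and records are invented, and the consistency rule assumed here is the standard relation ΔG = ΔH − TΔS at 298.15 K, with energies in J/mol and entropy changes in J/(mol K).

```python
from typing import Dict, List, Optional

T_REF = 298.15  # reference temperature in kelvin
REQUIRED = ("name", "formula", "dG", "dH", "dS")

Record = Dict[str, Optional[object]]


def validate(records: List[Record], tol: float = 100.0) -> List[str]:
    """Return a list of human-readable problems found in the records.

    Checks, in order: missing fields, duplicate (name, formula) pairs,
    and whether dG agrees with dH - T_REF * dS within `tol` J/mol.
    """
    problems: List[str] = []
    seen = set()
    for i, rec in enumerate(records):
        missing = [field for field in REQUIRED if rec.get(field) is None]
        if missing:
            problems.append(f"row {i}: missing {', '.join(missing)}")
            continue  # cannot run the remaining checks on incomplete rows
        key = (rec["name"], rec["formula"])
        if key in seen:
            problems.append(f"row {i}: duplicate entry for {rec['name']}")
        seen.add(key)
        expected = rec["dH"] - T_REF * rec["dS"]
        if abs(rec["dG"] - expected) > tol:
            problems.append(
                f"row {i}: dG={rec['dG']} differs from dH - T*dS={expected:.1f}"
            )
    return problems


# Made-up records (not real thermodynamic data).
records = [
    {"name": "A", "formula": "X2", "dG": -85092.5, "dH": -100000.0, "dS": -50.0},
    # Transposed digits in dG: should be close to -170185.
    {"name": "B", "formula": "XY", "dG": -107185.0, "dH": -200000.0, "dS": -100.0},
    {"name": "C", "formula": "Y2", "dG": -1000.0, "dH": -2000.0, "dS": None},
    {"name": "A", "formula": "X2", "dG": -85092.5, "dH": -100000.0, "dS": -50.0},
]
problems = validate(records)
```

A check like this runs cheaply in a test suite, so a mistyped value is flagged when the data file changes rather than when a user notices a wrong result.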
This isn’t just software; it’s infrastructure that enables scientific discovery. Its longevity and reliability demonstrate my commitment to building systems that stand the test of time.
🔬 Emerging Projects: Pushing AI Boundaries
Statistical AI Agents: Autonomous Data Analysis
🚧 [In Development] 🚧 Building AI agents that can independently perform statistical analysis and generate insights
Docker Microservices for Science: Scalable Computing Architecture
🚧 [Planning Phase] 🚧 Containerized scientific computing services for cloud-native research workflows
💡 What Sets Me Apart
I don’t just implement algorithms—I solve meaningful problems:
🎯 Problem-First Thinking: Academic training taught me to ask the right questions before building solutions.
🛠️ Production-Ready Mindset: 15+ years maintaining production software means I build for reliability, scalability, and long-term sustainability from day one.
🤝 Community Builder: Successfully grew and maintained global research communities. I understand that great AI systems require great user experiences and ongoing support.
📊 Data Storyteller: Published 70+ peer-reviewed papers requiring clear communication of complex technical concepts to diverse audiences.
🔄 Continuous Learner: From R packages to PyTorch models to LangChain applications—I adapt to new technologies while maintaining deep expertise.
🔧 Core Technical Skills
AI & Machine Learning: PyTorch • scikit-learn • NLP • Large Language Models • Fine-tuning • RAG Systems
MLOps & Production: Docker • AWS • CI/CD • Testing • Monitoring • Model Deployment • Hugging Face
Data Engineering: Python • SQL • R • Data Pipelines • Multi-source Integration • Quality Validation
Development: Git • Linux • Shell • Jupyter • API Design • Open Source Development
🎓 Academic Foundation Meets Industry Innovation
My academic background isn’t just about degrees—it’s about transferable skills that make me a stronger ML engineer:
🔬 Research Methodology: Hypothesis formation, experimental design, and rigorous evaluation
📝 Technical Communication: Translating complex concepts for diverse stakeholders
🏆 Project Leadership: Managing long-term projects from conception to community adoption
🌍 Global Collaboration: Working with international teams across time zones and cultures
⚡ Innovation Under Constraints: Creating solutions with limited resources and high quality standards
Let’s build AI that doesn’t just work today—but works reliably for years to come.