Where Science Meets AI Innovation

I’ve spent over a decade turning complex research questions into elegant software solutions, and I’m always looking for opportunities to build AI systems that are dependable, adaptable, and impactful.

My perspective combines rigorous scientific methodology with modern ML engineering practice, creating solutions that researchers and industry professionals can trust with their most important data.


Making knowledge accessible through conversational AI

Agentic retrieval over 10+ years of email archives, designed for deep research and continual data updates. A walkthrough video is available.
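
Here is a minimal sketch of the retrieval core, assuming an off-the-shelf sentence-embedding model; the EmailIndex class, the model name, and the in-memory store are illustrative stand-ins, not the production design:

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model

class EmailIndex:
    """Toy in-memory index supporting incremental (continual) updates."""

    def __init__(self):
        self.texts: list[str] = []
        self.vectors = np.empty((0, model.get_sentence_embedding_dimension()))

    def add(self, emails: list[str]) -> None:
        # Incremental update: embed only the new emails and append.
        new = model.encode(emails, normalize_embeddings=True)
        self.texts.extend(emails)
        self.vectors = np.vstack([self.vectors, new])

    def search(self, query: str, k: int = 5) -> list[str]:
        q = model.encode([query], normalize_embeddings=True)[0]
        scores = self.vectors @ q  # cosine similarity on unit vectors
        return [self.texts[i] for i in np.argsort(scores)[::-1][:k]]
```

In the agentic setup, a planner would call a tool like search repeatedly, refining its queries until it has gathered enough context to answer a deep-research question.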

AI alignment for detecting meaningful changes

This project pairs model alignment with human-in-the-loop feedback to build a practical filter that separates meaningful document changes from noise.
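
A hedged sketch of that loop: diff two document versions, score each change, auto-dismiss clear noise, and queue borderline cases for a human. The significance function below is a toy stand-in for the trained model, and the thresholds are illustrative:

```python
import difflib

def changed_spans(old: str, new: str) -> list[str]:
    """Return the text spans that were inserted or replaced."""
    sm = difflib.SequenceMatcher(a=old.split(), b=new.split())
    return [" ".join(sm.b[j1:j2])
            for op, _, _, j1, j2 in sm.get_opcodes()
            if op in ("insert", "replace")]

def significance(span: str) -> float:
    # Placeholder for the model: returns P(change is meaningful).
    return 0.9 if any(ch.isdigit() for ch in span) else 0.2

def triage(old: str, new: str, lo: float = 0.3, hi: float = 0.7):
    alerts, review = [], []
    for span in changed_spans(old, new):
        p = significance(span)
        if p >= hi:
            alerts.append(span)   # confident: surface to the user
        elif p >= lo:
            review.append(span)   # uncertain: route to a human labeler
        # below lo: treated as noise; human labels feed back into training
    return alerts, review
```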

Finding inaccurate citations with AI

Taking citation verification from research to production through model optimization, deployment, and feedback collection.
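
The verification step itself can be sketched with an NLI-style model: does the cited passage entail the claim that cites it? The checkpoint name here is an assumption, and the production system adds batching, optimization, and feedback logging on top:

```python
from transformers import pipeline

nli = pipeline("text-classification", model="facebook/bart-large-mnli")

def verify_citation(claim: str, cited_passage: str) -> dict:
    # MNLI-style checkpoints score a premise/hypothesis pair as
    # entailment, neutral, or contradiction.
    result = nli([{"text": cited_passage, "text_pair": claim}])[0]
    return {"claim": claim, "label": result["label"], "score": result["score"]}
```

A "contradiction" or low-confidence "entailment" result flags the citation for closer review.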

Building scientific software that lasts

Enabling scientific discovery through chemical data analysis and visualization workflows, backed by a commitment to testing and long-term maintenance.
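
In practice, "lasts" mostly means tests that pin behavior down. A small illustration with a hypothetical analysis function (normalize_spectrum is a stand-in, not a real API):

```python
import numpy as np
import pytest

def normalize_spectrum(intensities: np.ndarray) -> np.ndarray:
    """Scale intensities so the strongest peak equals 1.0."""
    peak = intensities.max()
    if peak <= 0:
        raise ValueError("spectrum has no positive signal")
    return intensities / peak

def test_peak_is_normalized_to_one():
    out = normalize_spectrum(np.array([1.0, 4.0, 2.0]))
    np.testing.assert_allclose(out, [0.25, 1.0, 0.5])

def test_empty_signal_is_rejected():
    with pytest.raises(ValueError):
        normalize_spectrum(np.zeros(3))
```

Regression tests like these let a workflow be refactored years later without silently changing published results.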

Combating concept drift in LLM applications

Your LLM application works beautifully at launch. The model understands user preferences, accuracy metrics look great, and feedback is positive. Then, gradually—or sometimes suddenly—performance starts to degrade. Users complain. Metrics slide. What happened?
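
One way to catch the slide early, sketched with illustrative window sizes and thresholds: compare a rolling window of recent quality scores against a launch-time baseline and alert when the gap grows:

```python
from collections import deque
from statistics import mean

class DriftMonitor:
    def __init__(self, baseline: float, window: int = 200, tolerance: float = 0.05):
        self.baseline = baseline            # quality measured at launch
        self.recent = deque(maxlen=window)  # most recent evaluation scores
        self.tolerance = tolerance

    def record(self, score: float) -> bool:
        """Log one evaluation score; return True if drift is suspected."""
        self.recent.append(score)
        if len(self.recent) < self.recent.maxlen:
            return False                    # not enough data yet
        return self.baseline - mean(self.recent) > self.tolerance
```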

Modern understanding of overfitting and generalization in machine learning

The conventional wisdom about the bias-variance tradeoff in machine learning has been dramatically challenged by modern neural networks. Traditionally, we believed that increasing model complexity would decrease bias but increase variance, so test error was expected to follow a U-shaped curve. Recent research instead reveals that highly overparameterized models often generalize exceptionally well despite fitting the training data perfectly, a phenomenon now known as double descent.
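
The effect can be reproduced in a few lines with minimum-norm linear regression on random Fourier features: as the feature count passes the number of training samples (the interpolation threshold), training error hits zero, yet test error often falls again for much wider models. The data, widths, and seed below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n_train = 40
x_tr = rng.uniform(-1, 1, n_train)
x_te = rng.uniform(-1, 1, 500)
target = lambda x: np.sin(2 * np.pi * x)
y_tr = target(x_tr) + 0.1 * rng.standard_normal(n_train)

def features(x, freqs, phases):
    # Random Fourier features: a crude proxy for model "width".
    return np.cos(np.outer(x, freqs) + phases) / np.sqrt(len(freqs))

for width in [5, 20, 40, 200, 1000]:       # 40 = interpolation threshold
    freqs = rng.normal(0, 10, width)
    phases = rng.uniform(0, 2 * np.pi, width)
    Phi = features(x_tr, freqs, phases)
    w = np.linalg.pinv(Phi) @ y_tr          # minimum-norm least squares
    train_mse = np.mean((Phi @ w - y_tr) ** 2)
    test_mse = np.mean((features(x_te, freqs, phases) @ w - target(x_te)) ** 2)
    print(f"width={width:5d}  train MSE={train_mse:.4f}  test MSE={test_mse:.4f}")
```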

Experimenting with transformer models for citation verification

This article provides a systematic comparison of transformer-based architectures for scientific claim verification, evaluating their performance across multiple datasets and examining the trade-offs between model complexity, computational efficiency, and generalization capabilities.
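
The evaluation harness behind such a comparison can be sketched as follows; the checkpoints and the two-example eval set are placeholders for the models and datasets the article actually covers:

```python
import time
from transformers import pipeline

CHECKPOINTS = ["facebook/bart-large-mnli", "cross-encoder/nli-distilroberta-base"]
EVAL = [  # (claim, evidence, gold label)
    ("Aspirin reduces fever.", "Aspirin is an antipyretic that lowers fever.", "entailment"),
    ("The sample was heated.", "The sample was kept at room temperature.", "contradiction"),
]

for name in CHECKPOINTS:
    clf = pipeline("text-classification", model=name)
    correct, start = 0, time.perf_counter()
    for claim, evidence, gold in EVAL:
        pred = clf([{"text": evidence, "text_pair": claim}])[0]["label"].lower()
        correct += int(gold == pred)
    seconds = (time.perf_counter() - start) / len(EVAL)
    print(f"{name}: accuracy={correct / len(EVAL):.2f}, {seconds:.2f}s/pair")
```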