My AI research assistant read hundreds of pages of climate action plans to help my team brainstorm ideas for Milton's Climate Action Plan. Here's how I built it.
If you work on operational tools and products powered by AI/ML, you may find that deployment in production is highly automated, while the model development process is still frustratingly manual and dominated by prototyping tools like Jupyter notebooks. Automating model development as well doesn’t just help Data Science teams get results more quickly; it also makes those results better and more trustworthy. Over the last year, I’ve been working with Mission Lane’s incredibly talented Data Science team to put this principle into practice.
Among those who create and use data analyses, it is commonly understood that a trustworthy analysis must be reproducible. In other words, given the same problem statement, an independent analyst should be able to gather their own data and analyze it to reach the same conclusions. If this isn’t true, it raises strong suspicions about the accuracy of the original analysis.
In practice, there are rarely sufficient resources available to fully replicate an analysis from scratch, except for the most controversial, significant, or risky decisions.
This may be the understatement of the year, but Generative AI is having a moment. Even as a long-time practitioner in Machine Learning, I was amazed at how suddenly AI passed the Turing test and how rapidly it has advanced since hitting the mass market. Like many professionals, I’ve tried hard to incorporate it into my day-to-day workflows, both because it obviously has immense promise and because I’m afraid of being left behind if I ignore it.
If you’re in the data business, analytical integrity is critical

Selling data and analytics is a bit like practicing medicine. Your product is complex and error-prone, and if it’s not delivered with quality and accuracy you can do more harm than good. But your customer, while desperate to get a quality product from you, is in no position to judge the quality of your work directly. So instead they judge you on shallow proxies that are easier to see: how fast they can get an appointment, how professional your office decor looks, your bedside manner, etc.
I recently did an exploratory analysis combining several publicly available geospatial datasets about the town of Milton, MA (where I live!) and surrounding communities. I had a few goals in putting this together:
Provide an example of a Python project structure that is portable, transparent, and reproducible using modern tools for data versioning, document generation, and Python environment management: namely, dvc for data version control and DAGs, sphinx for document generation, and conda for environment management (see the sketch below).
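To give a flavor of what "data version control and DAGs" means in practice, here is a minimal `dvc.yaml` sketch. The stage names, scripts, and paths are hypothetical illustrations, not taken from the actual project:

```yaml
# dvc.yaml: a hypothetical two-stage pipeline sketch.
# Each stage declares its command, input dependencies, and outputs;
# dvc hashes these to decide which stages actually need to re-run.
stages:
  prepare:
    cmd: python src/prepare.py data/raw data/interim
    deps:
      - src/prepare.py
      - data/raw
    outs:
      - data/interim
  analyze:
    cmd: python src/analyze.py data/interim reports/figures
    deps:
      - src/analyze.py
      - data/interim
    outs:
      - reports/figures
```

With stages declared this way, `dvc repro` re-runs only the stages whose inputs have changed, so an independent reader can reproduce the whole analysis end to end with a single command.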