This book covers the whole process of putting a machine learning model into production, including the engineering work that data scientists are sometimes unaware of or take for granted. It does not explain algorithms or how to train a model; it focuses on what to do before and after training one. I highly recommend getting this book, especially if you are:
- About to start a new machine learning development
- Coming from the engineering side of software
- Curious about what it takes for your new model to become production-ready
The book itself is light on code; the author uses a GitHub repo to showcase a complete use case throughout the book. However, you can read the book without looking at the code, since the code is not the main reason someone would buy it.
For each successful result published in a research paper or a corporate blog, there are hundreds of reasonable-sounding ideas that have entirely failed.
(...) much of the challenge in ML is similar to one of the biggest challenges in software: resisting the urge to build pieces that are not needed yet.
An ML program doesn't just have to run; it should produce accurate predictive outputs.
Testing a model's behavior is hard. The majority of code in an ML pipeline is not about the training pipeline or the model itself, however.
In reality, most datasets are a collection of approximate measurements that ignore a larger context.
- The idea of using clustering algorithms to guide exploratory data analysis when examining individual datapoints, rather than just selecting instances at random.
- What needs to happen for a model to be deployed on a mobile device.
- The idea of having a filtering model, before our actual inference model, that predicts whether the current input will yield an acceptable answer.
- The idea of multi-armed bandits to test variants of experiments.
- The idea of federated learning.
- Automated front-end development using deep learning
- pix2code: Generating Code from a Graphical User Interface Screenshot
- Entity Embeddings of Categorical Variables
- The Debugging Guide by The University of Chicago
- The ML Test Score: A Rubric for ML Production Readiness and Technical Debt Reduction
- Model Cards for Model Reporting
- On Challenges in Machine Learning Model Management
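
To make one of the ideas listed above more concrete, here is a minimal sketch of using a multi-armed bandit to test experiment variants. It is not taken from the book or its repo: the epsilon-greedy policy, the function name `epsilon_greedy`, and the conversion rates in `true_rates` are all assumptions chosen for illustration. The policy mostly serves the variant with the best observed success rate and occasionally tries a random one, so traffic tends to shift toward better variants while the experiment is still running.

```python
import random


def epsilon_greedy(true_rates, rounds=5000, epsilon=0.1, seed=0):
    """Simulate an epsilon-greedy bandit over experiment variants.

    true_rates holds hypothetical success probabilities, one per variant.
    The policy never reads them directly; they only generate rewards.
    Returns (pulls, estimates), each a list with one entry per variant.
    """
    rng = random.Random(seed)
    n_variants = len(true_rates)
    pulls = [0] * n_variants          # how often each variant was served
    estimates = [0.0] * n_variants    # running mean reward per variant

    for _ in range(rounds):
        if rng.random() < epsilon:
            # Explore: serve a variant chosen uniformly at random.
            arm = rng.randrange(n_variants)
        else:
            # Exploit: serve the variant with the best estimate so far.
            arm = max(range(n_variants), key=lambda i: estimates[i])

        # Simulated user response: 1.0 on success, 0.0 otherwise.
        reward = 1.0 if rng.random() < true_rates[arm] else 0.0

        # Incremental update of the running mean for the served variant.
        pulls[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / pulls[arm]

    return pulls, estimates


if __name__ == "__main__":
    # Three hypothetical variants with made-up conversion rates.
    pulls, estimates = epsilon_greedy([0.05, 0.10, 0.20])
    for i, (count, est) in enumerate(zip(pulls, estimates)):
        print(f"variant {i}: served {count} times, estimated rate {est:.3f}")
```

Compared with a fixed-split A/B test, this trades some statistical cleanliness for sending less traffic to weak variants; `epsilon` controls how much traffic is reserved for exploration.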