Designing Machine Learning Systems: An Iterative Process for Production-Ready Applications, ISBN-13: 978-1098107963

Original price was: $50.00. Current price is: $14.99.

Description

[PDF eBook eTextbook] – Available Instantly

  • Publisher: O’Reilly Media; 1st edition (June 21, 2022)
  • Language: English
  • 386 pages
  • ISBN-10: 1098107969
  • ISBN-13: 978-1098107963

Machine learning systems are both complex and unique. Complex because they consist of many different components and involve many different stakeholders. Unique because they’re data dependent, with data varying wildly from one use case to the next. In this book, you’ll learn a holistic approach to designing ML systems that are reliable, scalable, maintainable, and adaptive to changing environments and business requirements.

Author Chip Huyen, co-founder of Claypot AI, considers each design decision (such as how to process and create training data, which features to use, how often to retrain models, and what to monitor) in the context of how it can help your system as a whole achieve its objectives. The iterative framework in this book uses actual case studies backed by ample references.

This book will help you tackle scenarios such as:

  • Engineering data and choosing the right metrics to solve a business problem
  • Automating the process for continually developing, evaluating, deploying, and updating models
  • Developing a monitoring system to quickly detect and address issues your models might encounter in production
  • Architecting an ML platform that serves across use cases
  • Developing responsible ML systems

Table of Contents:

Preface
Who This Book Is For
What This Book Is Not
Navigating This Book
GitHub Repository and Community
Conventions Used in This Book
Using Code Examples
O’Reilly Online Learning
How to Contact Us
Acknowledgments
1. Overview of Machine Learning Systems
When to Use Machine Learning
Machine Learning Use Cases
Understanding Machine Learning Systems
Machine Learning in Research Versus in Production
Machine Learning Systems Versus Traditional Software
Summary
2. Introduction to Machine Learning Systems Design
Business and ML Objectives
Requirements for ML Systems
Reliability
Scalability
Maintainability
Adaptability
Iterative Process
Framing ML Problems
Types of ML Tasks
Objective Functions
Mind Versus Data
Summary
3. Data Engineering Fundamentals
Data Sources
Data Formats
JSON
Row-Major Versus Column-Major Format
Text Versus Binary Format
Data Models
Relational Model
NoSQL
Structured Versus Unstructured Data
Data Storage Engines and Processing
Transactional and Analytical Processing
ETL: Extract, Transform, and Load
Modes of Dataflow
Data Passing Through Databases
Data Passing Through Services
Data Passing Through Real-Time Transport
Batch Processing Versus Stream Processing
Summary
4. Training Data
Sampling
Nonprobability Sampling
Simple Random Sampling
Stratified Sampling
Weighted Sampling
Reservoir Sampling
Importance Sampling
Labeling
Hand Labels
Natural Labels
Handling the Lack of Labels
Class Imbalance
Challenges of Class Imbalance
Handling Class Imbalance
Data Augmentation
Simple Label-Preserving Transformations
Perturbation
Data Synthesis
Summary
5. Feature Engineering
Learned Features Versus Engineered Features
Common Feature Engineering Operations
Handling Missing Values
Scaling
Discretization
Encoding Categorical Features
Feature Crossing
Discrete and Continuous Positional Embeddings
Data Leakage
Common Causes for Data Leakage
Detecting Data Leakage
Engineering Good Features
Feature Importance
Feature Generalization
Summary
6. Model Development and Offline Evaluation
Model Development and Training
Evaluating ML Models
Ensembles
Experiment Tracking and Versioning
Distributed Training
AutoML
Model Offline Evaluation
Baselines
Evaluation Methods
Summary
7. Model Deployment and Prediction Service
Machine Learning Deployment Myths
Myth 1: You Only Deploy One or Two ML Models at a Time
Myth 2: If We Don’t Do Anything, Model Performance Remains the Same
Myth 3: You Won’t Need to Update Your Models as Much
Myth 4: Most ML Engineers Don’t Need to Worry About Scale
Batch Prediction Versus Online Prediction
From Batch Prediction to Online Prediction
Unifying Batch Pipeline and Streaming Pipeline
Model Compression
Low-Rank Factorization
Knowledge Distillation
Pruning
Quantization
ML on the Cloud and on the Edge
Compiling and Optimizing Models for Edge Devices
ML in Browsers
Summary
8. Data Distribution Shifts and Monitoring
Causes of ML System Failures
Software System Failures
ML-Specific Failures
Data Distribution Shifts
Types of Data Distribution Shifts
General Data Distribution Shifts
Detecting Data Distribution Shifts
Addressing Data Distribution Shifts
Monitoring and Observability
ML-Specific Metrics
Monitoring Toolbox
Observability
Summary
9. Continual Learning and Test in Production
Continual Learning
Stateless Retraining Versus Stateful Training
Why Continual Learning?
Continual Learning Challenges
Four Stages of Continual Learning
How Often to Update Your Models
Test in Production
Shadow Deployment
A/B Testing
Canary Release
Interleaving Experiments
Bandits
Summary
10. Infrastructure and Tooling for MLOps
Storage and Compute
Public Cloud Versus Private Data Centers
Development Environment
Dev Environment Setup
Standardizing Dev Environments
From Dev to Prod: Containers
Resource Management
Cron, Schedulers, and Orchestrators
Data Science Workflow Management
ML Platform
Model Deployment
Model Store
Feature Store
Build Versus Buy
Summary
11. The Human Side of Machine Learning
User Experience
Ensuring User Experience Consistency
Combatting “Mostly Correct” Predictions
Smooth Failing
Team Structure
Cross-functional Teams Collaboration
End-to-End Data Scientists
Responsible AI
Irresponsible AI: Case Studies
A Framework for Responsible AI
Summary
Epilogue
Index

About the Author

Chip Huyen (https://huyenchip.com) is a co-founder of Claypot AI, a platform for real-time machine learning. Through her work at NVIDIA, Netflix, and Snorkel AI, she has helped some of the world’s largest organizations develop and deploy machine learning systems. She teaches CS 329S: Machine Learning Systems Design at Stanford; this book is based on that course’s lecture notes.

LinkedIn included her among Top Voices in Software Development (2019) and Top Voices in Data Science & AI (2020). She is the author of four bestselling Vietnamese books, including the series Xach ba lo len va Di (Pack Your Bag and Go), and runs a Discord server on MLOps with over 6,000 members.
