Model optimization, quantization, and deployment to mobile and IoT
Quantization (INT8, INT4, dynamic vs static), pruning, knowledge distillation, and neural architecture search for efficiency
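A minimal sketch of the dynamic-vs-static distinction using the TF Lite converter; the toy Keras model and the random calibration batches are stand-ins for a real network and real data. Dynamic-range quantization needs no calibration data, while full-integer (static) quantization calibrates activation ranges from a representative dataset:

```python
import numpy as np
import tensorflow as tf

# Toy model standing in for the real network (assumption for this sketch).
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(28, 28, 1)),
    tf.keras.layers.Conv2D(8, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10),
])

# Dynamic-range quantization: weights become INT8, activations stay float,
# no calibration data required.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
dynamic_tflite = converter.convert()

# Static (full-integer) quantization: a representative dataset calibrates
# activation ranges so weights and activations both run in INT8.
def representative_data():
    for _ in range(100):
        yield [np.random.rand(1, 28, 28, 1).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8
static_tflite = converter.convert()

print(len(dynamic_tflite), len(static_tflite))  # compare serialized sizes
```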
TF Lite conversion and inference, Core ML with coremltools, ONNX Runtime, TensorRT, and model benchmarking
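A rough sketch of loading a converted model with the TF Lite interpreter and timing it; the path "model_int8.tflite" is a placeholder, and dedicated tools such as the TF Lite benchmark tool measure latency far more rigorously than this wall-clock loop:

```python
import time
import numpy as np
import tensorflow as tf

# Load a converted model; "model_int8.tflite" is a placeholder path.
interpreter = tf.lite.Interpreter(model_path="model_int8.tflite")
interpreter.allocate_tensors()

input_detail = interpreter.get_input_details()[0]
output_detail = interpreter.get_output_details()[0]

# One dummy input matching the model's expected shape and dtype.
dummy = np.zeros(input_detail["shape"], dtype=input_detail["dtype"])

# Warm-up runs, then a simple wall-clock latency benchmark.
for _ in range(10):
    interpreter.set_tensor(input_detail["index"], dummy)
    interpreter.invoke()

runs = 100
start = time.perf_counter()
for _ in range(runs):
    interpreter.set_tensor(input_detail["index"], dummy)
    interpreter.invoke()
elapsed = time.perf_counter() - start

result = interpreter.get_tensor(output_detail["index"])
print(f"mean latency: {1000 * elapsed / runs:.2f} ms, output shape: {result.shape}")
```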
TF Lite Micro, Edge TPUs, NVIDIA Jetson deployment, real-world constraints, and OTA model updates
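One common OTA pattern is to download a new model artifact, verify its checksum, and swap it in atomically before reloading the interpreter. The sketch below assumes hypothetical values for the model URL, the expected SHA-256 digest, and the on-device path; it is an illustrative pattern, not a specific library's API:

```python
import hashlib
import os
import tempfile
import urllib.request

import tensorflow as tf

MODEL_URL = "https://example.com/models/v2/model_int8.tflite"  # hypothetical
EXPECTED_SHA256 = "..."              # digest published alongside the model
ACTIVE_MODEL_PATH = "/data/models/active.tflite"               # hypothetical device path

def fetch_and_activate_model():
    """Download a new model, verify its checksum, and swap it in atomically."""
    with urllib.request.urlopen(MODEL_URL) as resp:
        data = resp.read()
    # Refuse to activate a corrupted or tampered download.
    if hashlib.sha256(data).hexdigest() != EXPECTED_SHA256:
        raise ValueError("checksum mismatch, keeping the current model")
    # Write next to the active model, then swap with os.replace, which is
    # atomic on the same filesystem, so inference never sees a partial file.
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(ACTIVE_MODEL_PATH))
    with os.fdopen(fd, "wb") as tmp:
        tmp.write(data)
    os.replace(tmp_path, ACTIVE_MODEL_PATH)
    # Reload the interpreter against the newly activated model.
    interpreter = tf.lite.Interpreter(model_path=ACTIVE_MODEL_PATH)
    interpreter.allocate_tensors()
    return interpreter
```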