When I started the electronic component sorting project, I assumed the hard part would be the model. It wasn't. The hard part was the data, and the lighting.
Across four classes I ended up labeling more than 3,000 images. The mAP@50 number that eventually landed at 95% only stabilized once I rebuilt the dataset around the actual conveyor: the same backdrop, the same camera angle, and crucially, the same LED strip that the production rig would use.
A few things I'd do differently next time: start with a hardware-locked capture rig before labeling anything, hold back a true cold-start test set captured on a different day, and treat motion blur as a first-class failure mode rather than something to filter out.
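
For the cold-start test set, the key is to split by capture day instead of shuffling images at random, so no frame from the held-out session leaks into training. Here is a minimal sketch of that idea, not the script I actually used. It assumes each image has a known capture date; the `Capture` record, the `split_by_capture_day` helper, and the file names are all hypothetical.

```python
from datetime import date
from typing import Iterable, NamedTuple


class Capture(NamedTuple):
    """Hypothetical record: one labeled image and the day it was shot."""
    path: str
    captured_on: date


def split_by_capture_day(
    captures: Iterable[Capture], holdout_days: Iterable[date]
) -> tuple[list[Capture], list[Capture]]:
    """Send every image from the held-out days to the test set.

    Splitting on the day (not on the individual image) keeps lighting and
    rig drift from that session out of the training data entirely.
    """
    holdout = set(holdout_days)
    train: list[Capture] = []
    test: list[Capture] = []
    for capture in captures:
        (test if capture.captured_on in holdout else train).append(capture)
    return train, test


# Illustrative data only: two training-day frames and one cold-start frame.
captures = [
    Capture("day1/img_0001.jpg", date(2024, 3, 4)),
    Capture("day1/img_0002.jpg", date(2024, 3, 4)),
    Capture("day2/img_0001.jpg", date(2024, 3, 11)),
]
train, test = split_by_capture_day(captures, [date(2024, 3, 11)])
print(len(train), len(test))  # prints "2 1"
```

The same grouping works for any other session-level variable, for example holding out everything shot under a different LED strip.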