Audition Form Scanner
Local macOS desktop app that uses a connected webcam to live-preview a talent competition feedback form, auto-detects document alignment, captures a photo, runs OCR (including handwriting recognition) fully locally with no cloud calls, and appends the extracted data to a fixed-schema CSV.
Brief
Judges at a talent competition fill out paper forms by hand. As each form is completed, it gets scanned live by placing it under the camera, one form at a time, so the results can be tallied quickly and the scores and handwritten feedback returned anonymized and averaged. I was able to test it on audition forms where performers appeared in front of a panel of judges who each filled out the same form.
Stack
Python 3.11+. Tkinter for the camera preview GUI. OpenCV for the webcam feed and document-edge detection. Ollama running locally with llama3.2-vision:11b (about 7.9 GB, fits in 16 GB unified RAM on an M4 Mac Mini) as the primary OCR. Tesseract via pytesseract as the printed-text fallback. Document detection via adaptive-threshold contour and four-point perspective warp. Output: Python csv module writing to a fixed-schema CSV.
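The CSV output step can be sketched with the standard library alone. This is a minimal illustration, not the project's actual code: the column names in FIELDNAMES and the append_row helper are hypothetical, since the real schema is not listed here. It assumes the OCR step hands back a dict of extracted fields.

```python
import csv
from pathlib import Path

# Hypothetical schema: the real column names are not given in this README.
FIELDNAMES = ["performer_id", "judge_id", "score", "feedback"]


def append_row(csv_path, extracted):
    """Append one scanned form to the CSV, writing the header on first use."""
    path = Path(csv_path)
    write_header = not path.exists() or path.stat().st_size == 0

    # Keep only the known columns; missing or None values become "".
    row = {}
    for name in FIELDNAMES:
        value = extracted.get(name)
        row[name] = "" if value is None else str(value).strip()

    # newline="" lets the csv module control line endings itself.
    with path.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDNAMES)
        if write_header:
            writer.writeheader()
        writer.writerow(row)
    return row
```

Dropping unknown keys and filling missing ones keeps every row the same width, so a partly failed OCR pass still produces a row that a spreadsheet can load.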
Why local
The forms include performer information that should not leave the machine doing the scanning. Local-only OCR keeps the entire pipeline on the device, so the tool can sit at an audition desk without a network connection.
Outcome
Ultimately I just ended up using my iPhone with its built-in OCR, for a couple of reasons. 1) The software just wasn't picking up the edges of the page very well. I didn't want to have to set up a tripod to use this, and the amount of time it required me to "hold still" felt laborious. I just wanted to get this done. 2) I needed to move on, because this project was an actual task with someone waiting on the results (the results of the auditions) at the other end of it. So I decided to cut my losses on this project and used the tools I already had right in my pocket.
"There's no reason to try and reinvent the wheel" has never felt more true.