Image Recognition

This guide covers image recognition configuration. Two recognition methods are supported: OCR (optical character recognition) and VLM (vision language model).

When enabled, text recognition runs automatically whenever an image is uploaded.

Select OCR or VLM as the primary recognition method.

OCR uses traditional optical character recognition algorithms to recognize text in images.

Select the language packs needed for recognition. Multiple languages are supported; separate them with commas, e.g., chi_sim,eng (Simplified Chinese and English).

When a new language pack is added, its data files are downloaded automatically.

The OCR engine is currently Tesseract, which supports multiple language packs.
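
The comma-separated language setting differs from what Tesseract itself accepts: the Tesseract command line joins multiple languages with a plus sign in its -l option. The following is a minimal sketch of that conversion, not the application's actual code; the function name to_tesseract_lang is hypothetical.

```python
def to_tesseract_lang(setting: str) -> str:
    """Convert a comma-separated language setting into Tesseract's '+'-joined form.

    Hypothetical helper: the name and the de-duplication behavior are
    assumptions made for illustration.
    """
    codes = []
    for part in setting.split(","):
        code = part.strip()
        # Skip empty entries (e.g. a trailing comma) and repeated codes.
        if code and code not in codes:
            codes.append(code)
    if not codes:
        raise ValueError("at least one language pack is required")
    return "+".join(codes)
```

With the example setting chi_sim,eng this yields chi_sim+eng, the form used in a command such as: tesseract image.png out -l chi_sim+eng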

VLM uses large AI models for image understanding and text recognition.

Select the AI model used for image recognition. The model must already be set up in model configuration.
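
The settings described so far could be checked before saving, along the lines of the sketch below. Every name here (ImageRecognitionSettings, its fields, validate, configured_models) is hypothetical and chosen only for illustration; the real application may store and validate these options differently.

```python
from dataclasses import dataclass


@dataclass
class ImageRecognitionSettings:
    # Hypothetical field names mirroring the options described in this guide.
    enabled: bool = False
    method: str = "ocr"          # primary recognition method: "ocr" or "vlm"
    ocr_languages: str = "eng"   # comma-separated language packs
    vlm_model: str = ""          # name of a model from model configuration


def validate(settings: ImageRecognitionSettings, configured_models: set) -> list:
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    if settings.method not in ("ocr", "vlm"):
        errors.append("method must be 'ocr' or 'vlm'")
    if settings.method == "ocr" and not settings.ocr_languages.strip():
        errors.append("OCR needs at least one language pack")
    if settings.method == "vlm" and settings.vlm_model not in configured_models:
        errors.append("VLM model must be pre-configured in model configuration")
    return errors
```

For example, choosing the VLM method with a model name that is absent from the configured models produces one error, while the default OCR settings produce none.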

Feature	OCR	VLM
Speed	Fast	Slower
Accuracy	Higher for standard text	Better for complex scenes
Network Requirement	Local execution	Network required
Cost	Free	May incur costs
Understanding	Text only	Can understand image content

Pure text extraction: Use OCR, which is fast and free
Complex scenarios: Use VLM for better recognition of tables, handwriting, complex layouts
Batch processing: Use OCR to save costs and time
High accuracy: Use VLM for more accurate recognition results
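
These recommendations can be condensed into a small decision helper. The sketch below is one possible reading of them, not part of the product: the function name choose_method and its parameters are hypothetical, and falling back to OCR when no network is available is an assumption drawn from the comparison table.

```python
def choose_method(complex_layout: bool = False,
                  needs_image_understanding: bool = False,
                  network_available: bool = True) -> str:
    """Pick "ocr" or "vlm" following the recommendations in this guide.

    Hypothetical helper. VLM is chosen only for complex content (tables,
    handwriting, complex layouts) or when image content must be understood,
    and only if a network connection is available.
    """
    wants_vlm = complex_layout or needs_image_understanding
    if wants_vlm and network_available:
        return "vlm"
    # Plain text extraction and batch processing: OCR is fast, local, and free.
    return "ocr"
```

Called with no arguments, the helper returns "ocr", which matches the advice for plain text extraction and batch processing.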