LLM Image Description Benchmark Report
Models Tested
- llava (LLaVA (7B)): llava:latest
Prompt
You are a helpful alt-text generator assisting visually impaired users. Generate a clear and concise caption (15-30 words) that highlights the most important subject and action. Focus only on essential details, avoiding unnecessary background elements. Use simple, everyday language and avoid overly descriptive or poetic words.
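The prompt asks for captions of 15-30 words, so a simple length check can be run against each returned description. The following is a minimal sketch, not part of the benchmark itself: the helper names `word_count` and `within_word_range` are hypothetical, and words are counted by splitting on whitespace, which is an assumption about how the limit should be interpreted.

```python
def word_count(caption: str) -> int:
    """Count whitespace-separated words in a caption."""
    return len(caption.split())


def within_word_range(caption: str, lo: int = 15, hi: int = 30) -> bool:
    """Return True if the caption length falls inside the range the prompt requests."""
    return lo <= word_count(caption) <= hi


if __name__ == "__main__":
    # Two descriptions copied from the Detailed Results section.
    samples = [
        "A coffee press with ground coffee being added.",
        "Hand holding a jar filled with colorful origami stars, "
        "illuminated by a string of lights.",
    ]
    for caption in samples:
        print(word_count(caption), within_word_range(caption), caption)
```

A check like this only measures length; it says nothing about whether the caption is accurate or useful as alt text.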
Model Settings
- temperature: 0.1
- top_k: 40
- top_p: 0.7
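The report does not say which runtime served the model. The `llava:latest` tag resembles an Ollama model tag, so the sketch below assumes Ollama's `/api/generate` endpoint and shows how the settings above could be passed in its `options` field. The function name `build_request` is hypothetical, and the endpoint URL in the comment is Ollama's default, not something stated in the report.

```python
import base64
import json

# Sampling settings as listed under Model Settings.
MODEL_OPTIONS = {"temperature": 0.1, "top_k": 40, "top_p": 0.7}


def build_request(image_bytes: bytes, prompt: str, model: str = "llava:latest") -> dict:
    """Build a JSON-serializable payload for an Ollama-style generate call."""
    return {
        "model": model,
        "prompt": prompt,
        # Images are sent as base64-encoded strings.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
        "options": dict(MODEL_OPTIONS),
    }


if __name__ == "__main__":
    payload = build_request(b"placeholder image bytes", "Describe this image.")
    # Assumption: this body would be POSTed to http://localhost:11434/api/generate
    print(json.dumps(payload, indent=2))
```

Keeping the payload construction separate from the network call makes it possible to inspect or log the exact settings used for each run.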
Performance Summary
| Model | Success Rate | Avg Time (s) | Min Time (s) | Max Time (s) | Total Time (s) |
| --- | --- | --- | --- | --- | --- |
| llava | 100.0% | 2388.90 | 2315.18 | 2591.84 | 19111.23 |
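The summary figures can be reproduced from the per-image times listed under Detailed Results. This is a minimal sketch of that arithmetic; the function name `summarize` is hypothetical, and rounding to two decimals is assumed to match the report's formatting.

```python
from statistics import mean

# Per-image times in seconds, in the order they appear under Detailed Results.
TIMES_S = [2458.79, 2591.84, 2389.46, 2342.74, 2347.29, 2336.91, 2315.18, 2329.02]


def summarize(times: list[float]) -> dict[str, float]:
    """Compute the average, minimum, maximum, and total of a list of timings."""
    return {
        "avg": round(mean(times), 2),
        "min": min(times),
        "max": max(times),
        "total": round(sum(times), 2),
    }


if __name__ == "__main__":
    for name, value in summarize(TIMES_S).items():
        print(f"{name}: {value:.2f}")
```

Recomputing the summary this way is a quick consistency check between the summary row and the detailed rows.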
Detailed Results
HMOriginal-3F57EEFD-CE24-4DA8-BED1-9AAF8B4C0817.jpeg
| Model | Time (s) | Description |
| --- | --- | --- |
| llava | 2458.79 | A coffee press with ground coffee being added. |
HMOriginal-70F34298-C096-4323-9E2C-B17712554CA9.jpeg
| Model | Time (s) | Description |
| --- | --- | --- |
| llava | 2591.84 | A delicious meal of fried eggs, sausage, potatoes, salad, and beer. |
HMOriginal-7C20DE26-6983-48A2-A3EB-80A724393751.jpeg
| Model | Time (s) | Description |
| --- | --- | --- |
| llava | 2389.46 | A coffee press with a spoon inside, ready for use. |
HMOriginal-8F4A51DA-AF46-4EAF-8B75-21EF89F78BB5.jpeg
| Model | Time (s) | Description |
| --- | --- | --- |
| llava | 2342.74 | Hand holding a jar filled with colorful origami stars, illuminated by a string of lights. |
HMOriginal-C6F176D6-4FAA-4AC8-AEB8-C2B722717DEC.jpeg
| Model | Time (s) | Description |
| --- | --- | --- |
| llava | 2347.29 | A tray of sushi rolls with salmon and avocado toppings. |
IMG_6386.jpeg
| Model | Time (s) | Description |
| --- | --- | --- |
| llava | 2336.91 | Person holding a ruler next to a laptop with a video playing. |
IMG_6408.jpeg
| Model | Time (s) | Description |
| --- | --- | --- |
| llava | 2315.18 | Sunlight shining through trees onto a serene lake. |
IMG_6411.jpeg
| Model | Time (s) | Description |
| --- | --- | --- |
| llava | 2329.02 | A rustic wooden shed with logs stacked outside at night. |