Microsoft Model Performance

7 Microsoft models evaluated

Model Performance

Rank	Model	Accuracy	Correct	Total	Incorrect	Errors
1	`Microsoft/Phi-4-Reasoning-Plus`	90.4 ± 5.7%	53	58	4	1
2	`Microsoft/Phi-4-Multimodal-Instruct`	73.9 ± 13.4%	21	28	7	0
3	`Microsoft/Phi-4`	65.5 ± 18.2%	12	18	5	1
4	`Microsoft/Wizardlm-2-8x22b`	46.0 ± 26.4%	5	11	2	4
5	`Microsoft/Phi-3-Medium-128k-Instruct`	22.8 ± 35.0%	1	6	2	3
6	`Microsoft/Phi-3-Mini-128k-Instruct`	12.9 ± 39.2%	0	4	2	2
6	`Microsoft/Phi-3.5-Mini-128k-Instruct`	12.9 ± 39.2%	0	4	1	3