We tried to improve Mishkal's speed by profiling the source code. Profiling showed that a large share of the time is spent in Qalsadi, so we improved Qalsadi in 2023 and released the result as version 0.5.0.
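As a rough illustration of that kind of profiling, here is a minimal sketch using Python's standard `cProfile` and `pstats` modules. The function `analyze_words` is a hypothetical stand-in for the analysis step, not Mishkal's or Qalsadi's actual API. The report it prints starts with a "function calls" header similar to the figures quoted in the report.

```python
import cProfile
import io
import pstats


def analyze_words(words):
    """Hypothetical stand-in for the morphological analysis step."""
    return [word[::-1] for word in words]


def profile_call(func, *args):
    """Run func under cProfile and return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Sort by cumulative time so the most expensive call chains come first.
    stats.sort_stats("cumulative").print_stats(10)
    return result, stream.getvalue()


result, report = profile_call(analyze_words, ["abc", "de"])
print(report)
```

Replacing `analyze_words` with the real entry point would show which functions dominate the run.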

Jul 19 '25 15:07 linuxscout

📊 Mishkal Diacritization Evaluation Report

🗓️ Date: 19 July 2025 – CET

🧪 Evaluation on `samples/vocalized/aljazeera.txt`

🔹 Using Qalsadi 0.4.5

Function calls: 426,394,179 (426,391,468 primitive)
Time: 225.037 seconds

ID	Fully Correct	Strip Correct	Fully WER	Strip WER	LER	Total	Line Fully Correct	Line Strip Correct
1447	76.98%	89.80%	9601	4256	12168	41716	90.00%	90.00%
1448	76.98%	89.79%	9614	4264	12187	41756	67.50%	80.00%
1449	76.98%	89.79%	9614	4264	12187	41756	—	—

🔹 Using Qalsadi 0.5.0 (speedup) – Improved Version

Function calls: 333,277,281 (333,274,551 primitive)
Time: 173.994 seconds

ID	Fully Correct	Strip Correct	Fully WER	Strip WER	LER	Total	Line Fully Correct	Line Strip Correct
1447	77.62%	90.38%	9336	4011	11792	41716	90.00%	90.00%
1448	77.61%	90.38%	9348	4019	11809	41756	70.00%	80.00%
1449	77.61%	90.38%	9348	4019	11809	41756	—	—

🧪 Evaluation on `samples/vocalized/rndlines.txt`

🔹 Using Qalsadi 0.5.0 (speedup)

Function calls: 7,342,645,429 (7,342,642,699 primitive)
Time: 3428.477 seconds

ID	Fully Correct	Strip Correct	Fully WER	Strip WER	LER	Total	Line Fully Correct	Line Strip Correct
9996	73.88%	87.74%	252,311	127,245	357,698	1,037,636	77.59%	87.93%
9997	73.88%	87.74%	252,337	127,259	357,736	1,037,746	76.36%	87.27%

🔹 Using Qalsadi 0.4.5

Function calls: 10,561,428,832 (10,561,426,121 primitive)
Time: 5091.545 seconds

ID	Fully Correct	Strip Correct	Fully WER	Strip WER	LER	Total	Line Fully Correct	Line Strip Correct
9996	71.98%	86.77%	272,017	137,320	380,440	1,037,636	74.14%	84.48%
9997	71.98%	86.77%	272,044	137,335	380,479	1,037,746	75.45%	86.36%

✅ Summary

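Accuracy: Qalsadi 0.5.0 scores better than 0.4.5 on every token- and letter-level metric on both datasets; line-level scores are equal or better.
Speed: Execution time reduced by about 23% on `aljazeera.txt` and about 33% on `rndlines.txt`.
Efficiency: Function calls reduced by about 22% on `aljazeera.txt` and about 30% on `rndlines.txt`, in line with the runtime gain.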
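The percentages above follow directly from the reported times and call counts. A short sketch of the arithmetic, where `relative_reduction` is a helper name introduced here for illustration:

```python
def relative_reduction(before, after):
    """Return the percentage by which `after` is lower than `before`."""
    return (before - after) / before * 100


# (Qalsadi 0.4.5, Qalsadi 0.5.0) figures copied from the report above.
runs = {
    "aljazeera.txt": {
        "time": (225.037, 173.994),
        "calls": (426_394_179, 333_277_281),
    },
    "rndlines.txt": {
        "time": (5091.545, 3428.477),
        "calls": (10_561_428_832, 7_342_645_429),
    },
}

for dataset, figures in runs.items():
    for name, (old, new) in figures.items():
        print(f"{dataset} {name}: {relative_reduction(old, new):.1f}% lower")
```

This gives 22.7% (time) and 21.8% (calls) for `aljazeera.txt`, and 32.7% (time) and 30.5% (calls) for `rndlines.txt`.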

Jul 19 '25 15:07 linuxscout

Analysis

Metric	Description
Fully Correct	% of tokens where full diacritization matches exactly.
Strip Correct	% of tokens correct ignoring short vowels (i.e., stripped tashkeel).
WER	Word Error Rate, reported here as the number of incorrect tokens.
LER	Letter Error Rate, reported here as the number of incorrect letters; finer-grained than WER.
Total	Total number of tokens evaluated.
Function Calls	Profiling data (used for performance analysis).
Time (s)	Total runtime in seconds.
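To show how the count columns relate to the percentage columns, the sketch below recomputes Strip Correct from Strip WER and Total. This relation is inferred from the reported numbers, not taken from the evaluation script. The same identity reproduces Fully Correct on `aljazeera.txt`, but on `rndlines.txt` it gives roughly 75.7% where the report shows 73.88%, so treat it as an observation about these figures only.

```python
def percent_correct(error_count, total_tokens):
    """Share of tokens without an error, as a percentage."""
    return (1 - error_count / total_tokens) * 100


# (Strip WER, Total) as reported above for each dataset and Qalsadi version.
reported = {
    ("aljazeera.txt", "0.4.5"): (4264, 41756),
    ("aljazeera.txt", "0.5.0"): (4019, 41756),
    ("rndlines.txt", "0.4.5"): (137335, 1037746),
    ("rndlines.txt", "0.5.0"): (127259, 1037746),
}

for (dataset, version), (strip_wer, total) in reported.items():
    print(f"{dataset} {version}: {percent_correct(strip_wer, total):.2f}%")
```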

✅ Results Summary & Comparison

📰 Dataset: `aljazeera.txt`

Metric	Qalsadi 0.4.5	Qalsadi 0.5.0	Improvement
Fully Correct (%)	76.98%	77.61%	✅ +0.63 points
Strip Correct (%)	89.79%	90.38%	✅ +0.59 points
Fully WER	9614	9348	✅ lower (−266)
Strip WER	4264	4019	✅ lower (−245)
LER	12187	11809	✅ lower (−378)
Total Tokens	41756	41756	⏸️ same
Runtime (sec)	225.037	173.994	✅ faster (−22.7%)
Function Calls	426M	333M	✅ fewer (−21.8%)

🧠 Insight: Qalsadi 0.5.0 provides both higher accuracy and better performance on aljazeera.txt, showing it is a clear improvement.

📄 Dataset: `rndlines.txt`

Metric	Qalsadi 0.4.5	Qalsadi 0.5.0	Improvement
Fully Correct (%)	71.98%	73.88%	✅ +1.90 points
Strip Correct (%)	86.77%	87.74%	✅ +0.97 points
Fully WER	272044	252337	✅ lower (−19707)
Strip WER	137335	127259	✅ lower (−10076)
LER	380479	357736	✅ lower (−22743)
Total Tokens	1037746	1037746	⏸️ same
Runtime (sec)	5091.545	3428.477	✅ faster (−32.7%)
Function Calls	10.56B	7.34B	✅ fewer (−30.5%)

🧠 Insight: Qalsadi 0.5.0 shows clear improvements in both speed and quality, particularly reducing word and letter errors.

🧾 Overall Conclusion

Aspect	Result
Accuracy	✅ Qalsadi 0.5.0 improved `Fully Correct` by 0.6–1.9 percentage points.
Speed	✅ Reduced processing time by ~23% on `aljazeera.txt` and ~33% on `rndlines.txt`.
Efficiency	✅ Function calls reduced by ~22% on `aljazeera.txt` and ~30% on `rndlines.txt`.
Stability	✅ The improvement holds on both datasets.

🔧 Recommendations

Adopt Qalsadi 0.5.0 as the default: it is better in both accuracy and speed.
Plot the metrics across versions to track progress.
Break down WER/LER by error type for finer-grained evaluation.
Add a benchmark suite to the repo to automate these comparisons.
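One possible shape for the error-type breakdown is sketched below. It is an assumption about how such a breakdown could look, not existing Mishkal code: it separates tokens that differ only in their final marks from tokens that differ elsewhere. The names `strip_final_marks`, `classify`, and `error_breakdown` are hypothetical, and stripping every trailing mark (including a final shadda) is a deliberate simplification.

```python
from collections import Counter

# Arabic tashkeel marks: fathatan (U+064B) through sukun (U+0652).
TASHKEEL = {chr(code) for code in range(0x064B, 0x0653)}


def strip_final_marks(word):
    """Drop the diacritics at the very end of the word."""
    end = len(word)
    while end > 0 and word[end - 1] in TASHKEEL:
        end -= 1
    return word[:end]


def classify(reference, output):
    """Label one token pair as correct, final_mark_error, or inner_error."""
    if reference == output:
        return "correct"
    if strip_final_marks(reference) == strip_final_marks(output):
        return "final_mark_error"
    return "inner_error"


def error_breakdown(reference_tokens, output_tokens):
    """Count each label over aligned reference/output token lists."""
    pairs = zip(reference_tokens, output_tokens)
    return Counter(classify(ref, out) for ref, out in pairs)


# Escaped code points keep the example unambiguous in any editor.
KATABA = "\u0643\u064e\u062a\u064e\u0628\u064e"  # reference form
KATABU = "\u0643\u064e\u062a\u064e\u0628\u064f"  # differs only in the final mark
KUTIBA = "\u0643\u064f\u062a\u0650\u0628\u064e"  # differs in inner marks

breakdown = error_breakdown([KATABA, KATABA, KATABA], [KATABA, KATABU, KUTIBA])
print(breakdown)
```

Counting the two error classes separately would show whether a version change mainly affects final marks or word-internal vocalization.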

Jul 19 '25 15:07 linuxscout

Improve tashkeel speed
