Add Mahan Kosh dictionary / reference
Hi guys, first of all, keep up the good work. It's amazing that you have an open-source project related to Gurbani. May Waheguru grant even more grace.
Is there any interest in adding Mahankosh by Kahn Singh Nabha?
Yes there is interest, we do have the database file for Mahan Kosh.
The problem is that certain Gurmukhi characters were created just for Mahan Kosh, and we haven't spent much time working out how to display them properly in a way that follows the standards.
Related to #35
May I know what kind of characters? I was also working on the same thing but stopped because I can't read or convert Urdu. Now I have found someone to help and will start again soon.
@harpreetkhalsagtbit These characters are listed in the introduction at the beginning of the Mahan Kosh. They represent special Hindi and Urdu characters/sounds.
Are they written in Gurmukhi, though? I'm confused about why we can't convert these characters to their respective languages, or convert all the characters to some Unicode format.
Thanks for the picture. It looks like letters from different languages are expressed in some kind of custom akhar.
Could someone replace the custom akhars found in Mahan Kosh with the relevant language's Unicode characters?
Could the Gurmukhi also be converted to Unicode? Or are there any sequences that break (I think you know which examples I'm thinking of)?
The only issue I see is in the 1st column, 3rd row, where two Hindi letters are expressed with one custom akhar.
Regarding converting these characters to their respective languages' Unicode: this is what I intended to do.
I actually started on this, converting them into Unicode, 6 years ago. Now I want to start it again.
https://github.com/harpreetkhalsagtbit/MahankoshUnicode
Bingo, I figured this out: 'कष्प', 'कश्प'. But these half characters can't be used alone.
I guess the 1st column, 4th row is this character:
कळ्प
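The half forms above are not separately encoded in Unicode: a Devanagari conjunct such as कष्प is stored as full consonants with a virama (U+094D) between them, and the font decides whether to draw the first consonant as a half form. A small Python sketch using only the standard `unicodedata` module makes the stored code points visible; the helper name `code_points` is just for illustration.

```python
import unicodedata


def code_points(text: str) -> list[tuple[str, str]]:
    """Return (U+XXXX, Unicode character name) for each code point in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in text]


if __name__ == "__main__":
    # The three conjuncts discussed in this thread.
    for word in ["कष्प", "कश्प", "कळ्प"]:
        print(word)
        for cp, name in code_points(word):
            print(f"  {cp}  {name}")
```

Each word should print a DEVANAGARI SIGN VIRAMA between the first two consonants, which is why the half character has no code point of its own.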
@bhajneet The database is already in Unicode, but it uses non-standard, out-of-spec ways to display these characters. I am not sure how to display them properly in Unicode.
https://unicode.org/L2/L2006/06030-gurmukhi.pdf https://unicode.org/L2/L2006/06233-gurmukhi.pdf
Some of these were proposed for encoding; I will look into what was recommended.
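To gauge how many entries are affected, one could scan the database text for code points that no standard assigns. This is a rough Python sketch that assumes (not confirmed in this thread) the out-of-spec characters are stored as Private Use Area (`Co`) or unassigned (`Cn`) code points; if they are instead out-of-spec sequences of standard Gurmukhi characters, this check will miss them. `find_nonstandard` and the sample data are hypothetical.

```python
import unicodedata
from collections import Counter

# Assumption: custom glyphs live in Private Use ("Co") or unassigned ("Cn")
# code points. Out-of-spec sequences of standard characters are NOT caught.
SUSPECT_CATEGORIES = {"Co", "Cn"}


def find_nonstandard(entries: dict[str, str]) -> Counter:
    """Count suspect code points across all definition texts in entries."""
    counts: Counter = Counter()
    for text in entries.values():
        for ch in text:
            if unicodedata.category(ch) in SUSPECT_CATEGORIES:
                counts[f"U+{ord(ch):04X}"] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample; U+E000 stands in for a custom Mahan Kosh glyph.
    sample = {"ਕ": "ਕ\ue000ਰ", "ਖ": "ਖਰ"}
    print(find_nonstandard(sample))
```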
I honestly think we can just map them directly to the Hindi characters, but Urdu might be difficult.
@harpreetkhalsagtbit The database file is from the same team that made the digital Mahan Kosh PDF. Not sure if decoding is needed.
Depends https://github.com/GurbaniNow/gurmukhi-fonts/issues/8
Can I have the database file, if it is not proprietary?
It will eventually be added to the Shabad OS DB, or as an add-on DB.
Originally posted in Slack by @Harjot1Singh
Considerations
- [x] One definition to many words
- [x] One word to many definitions
- [x] Dekho smart-linking
- [ ] Font research
Structure
{
  "An": {
    "definitions": ["definitionsArray1", "definitionsArray2"],
    "synonyms": []
  },
  "Har": {
    "definitions": ["definitionsArray4"],
    "synonyms": ["Hari", "Parmaatma"]
  }
}
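One way to read this structure, sketched in Python as an assumption rather than the project's implementation: each headword holds a list of definitions (one word to many definitions), and a reverse index from definition to headwords covers one definition to many words. Treating synonyms as alternate lookup keys is also an assumption, and `build_indexes` and `lookup` are hypothetical helpers.

```python
from collections import defaultdict

# Sample mirroring the proposed structure; definition strings are the
# placeholders from the proposal, not real Mahan Kosh content.
DICTIONARY = {
    "An": {"definitions": ["definitionsArray1", "definitionsArray2"], "synonyms": []},
    "Har": {"definitions": ["definitionsArray4"], "synonyms": ["Hari", "Parmaatma"]},
}


def build_indexes(dictionary):
    """Build reverse lookups: definition -> headwords, synonym -> headwords."""
    words_by_definition = defaultdict(list)
    headwords_by_synonym = defaultdict(list)
    for word, entry in dictionary.items():
        for definition in entry["definitions"]:
            words_by_definition[definition].append(word)
        for synonym in entry["synonyms"]:
            headwords_by_synonym[synonym].append(word)
    return words_by_definition, headwords_by_synonym


def lookup(dictionary, synonym_index, term):
    """Return definitions for term, falling back to headwords it is a synonym of."""
    if term in dictionary:
        return list(dictionary[term]["definitions"])
    results = []
    # .get() does not insert missing keys into the defaultdict.
    for headword in synonym_index.get(term, []):
        results.extend(dictionary[headword]["definitions"])
    return results
```

For example, `lookup(DICTIONARY, build_indexes(DICTIONARY)[1], "Hari")` resolves through the synonym to the definitions stored under "Har".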
Storage
dictionaries/Mahan Kosh.json
etc.
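A minimal sketch of the storage step, assuming one JSON file per dictionary under `dictionaries/` as listed above; `save_dictionary` and `load_dictionary` are hypothetical names. The one design choice worth noting is `ensure_ascii=False`, which keeps Gurmukhi readable in the file instead of writing `\u` escapes.

```python
import json
from pathlib import Path


def save_dictionary(dictionary: dict, name: str, root: Path = Path("dictionaries")) -> Path:
    """Write dictionary to root/<name>.json as UTF-8 and return the path."""
    root.mkdir(parents=True, exist_ok=True)
    path = root / f"{name}.json"
    # ensure_ascii=False stores Gurmukhi characters as-is, not as \u escapes.
    path.write_text(json.dumps(dictionary, ensure_ascii=False, indent=2), encoding="utf-8")
    return path


def load_dictionary(name: str, root: Path = Path("dictionaries")) -> dict:
    """Read root/<name>.json back into a dict."""
    path = root / f"{name}.json"
    return json.loads(path.read_text(encoding="utf-8"))
```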
Smart Linking
Automatically detect "dekho" references and store the correct destination as a separate field in the database.
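A rough sketch of dekho smart-linking in Python. It assumes, without confirmation from the thread, that a cross-reference is written as ਦੇਖੋ ("see") followed by an optional comma and a single-word headword; real Mahan Kosh entries may use other punctuation or multi-word targets, so the pattern would need adjusting. The `dekho` field name and both helpers are hypothetical. Only references that resolve to an existing headword are linked.

```python
import re

# Assumption: references look like "ਦੇਖੋ, <headword>". The target runs until
# whitespace or punctuation (comma, full stop, semicolon, danda).
DEKHO = "ਦੇਖੋ"
DEKHO_PATTERN = re.compile(re.escape(DEKHO) + r"[,\s]+([^\s,.;।]+)")


def find_dekho_refs(definition: str) -> list[str]:
    """Return every headword that follows a dekho marker in definition."""
    return DEKHO_PATTERN.findall(definition)


def link_dekho(dictionary: dict) -> dict:
    """Add a hypothetical 'dekho' field to each entry, in place.

    The field lists referenced headwords that exist in the dictionary,
    without duplicates, in order of first appearance.
    """
    for entry in dictionary.values():
        targets = []
        for definition in entry["definitions"]:
            for ref in find_dekho_refs(definition):
                if ref in dictionary and ref not in targets:
                    targets.append(ref)
        entry["dekho"] = targets
    return dictionary
```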
Output Transformation
dictionary table, with a dictionary_source_id column
that references another table, dictionary_sources
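The two-table layout can be sketched with Python's built-in `sqlite3`. Only the names `dictionary`, `dictionary_sources` and `dictionary_source_id` come from the proposal; the other columns (`name`, `word`, `definition`) and the helper functions are hypothetical placeholders. SQLite leaves foreign-key enforcement off by default, so the sketch turns it on explicitly.

```python
import sqlite3

# Only dictionary, dictionary_sources and dictionary_source_id come from the
# proposal; name, word and definition are hypothetical placeholder columns.
SCHEMA = """
CREATE TABLE dictionary_sources (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE dictionary (
    id                   INTEGER PRIMARY KEY,
    word                 TEXT NOT NULL,
    definition           TEXT NOT NULL,
    dictionary_source_id INTEGER NOT NULL REFERENCES dictionary_sources(id)
);
"""


def create_db() -> sqlite3.Connection:
    """Create an in-memory database with the two tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    conn.executescript(SCHEMA)
    return conn


def add_source(conn: sqlite3.Connection, name: str) -> int:
    """Insert a dictionary source (e.g. Mahan Kosh) and return its id."""
    cur = conn.execute("INSERT INTO dictionary_sources (name) VALUES (?)", (name,))
    return cur.lastrowid


def add_definition(conn: sqlite3.Connection, source_id: int, word: str, definition: str) -> None:
    """Insert one definition row; a word with many definitions gets many rows."""
    conn.execute(
        "INSERT INTO dictionary (word, definition, dictionary_source_id) VALUES (?, ?, ?)",
        (word, definition, source_id),
    )
```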
@Harjot1Singh points for instigating? Not for adding any content.
Regarding my earlier suggestion to map these directly to the Hindi characters: looking back on this, it is not possible. Mahan Kosh has Hindi and Urdu words, so their charmaps cannot be used (and it would still break the standard).
The custom characters in Mahan Kosh are not part of the standard, nor will they be for the foreseeable future. Meeting the standard should not be a goal, since it's unattainable.
Can the Unicode characters for each language be used? Perhaps using them together breaks rendering of particular words/phrases, but I don't see evidence of that yet.
(In reply to Sarabveer Singh's comment of Mon, Jun 22, 2020, 00:45: https://github.com/ShabadOS/database/issues/867#issuecomment-647272622)
Yes, it would break, because both Hindi and Urdu are used in the dictionary.
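To see how many entries would actually mix scripts if each language's own Unicode characters were used, a quick audit could flag text containing letters from more than one script. This Python sketch approximates the Unicode Script property with the first word of each character's name (GURMUKHI, DEVANAGARI, ARABIC), since the standard library does not expose the Script property directly; the helper names are hypothetical.

```python
import unicodedata


def scripts_in(text: str) -> set[str]:
    """Approximate the scripts used in text from each character's Unicode name.

    Only letters (L*) and combining marks (M*) are considered, so spaces,
    digits and punctuation do not count as a script.
    """
    scripts = set()
    for ch in text:
        if unicodedata.category(ch)[0] in "LM":
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def mixed_script_entries(entries: dict[str, str]) -> dict[str, set[str]]:
    """Return the entries whose text uses more than one script."""
    mixed = {}
    for word, text in entries.items():
        found = scripts_in(text)
        if len(found) > 1:
            mixed[word] = found
    return mixed
```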