Add Mahan Kosh dictionary / reference

Open dakkusingh opened this issue 4 years ago • 22 comments

Hi guys, firstly, keep up the good work. It's amazing that you have an open source project related to Gurbani. Waheguru hor vi kirpa karey.

Is there any interest in adding Mahankosh by Kahn Singh Nabha?

dakkusingh avatar Aug 15 '19 16:08 dakkusingh

Yes, there is interest, and we do have the database file for Mahan Kosh.

The problem is that there are certain Gurmukhi characters created just for Mahan Kosh and we haven't spent much time on how to display them in a proper manner that follows standards.

sarabveer avatar Aug 16 '19 02:08 sarabveer

Related to #35

sarabveer avatar Oct 17 '19 17:10 sarabveer

May I know what kind of characters? I was also doing the same but stopped because I can't read and convert Urdu. Now I have found someone who can, and I will start again soon.

harpreetkhalsagtbit avatar Jan 31 '20 11:01 harpreetkhalsagtbit

@harpreetkhalsagtbit these characters are given in the Mahan Kosh, in the beginning introduction. They represent special Hindi and Urdu characters/sounds.

sarabveer avatar Jan 31 '20 16:01 sarabveer

Are they written in Gurmukhi though, or what? I'm confused about why we can't convert these characters to their respective languages, or all the characters to some Unicode format.

bhajneet avatar Jan 31 '20 16:01 bhajneet

sarabveer avatar Jan 31 '20 17:01 sarabveer

Thanks for the picture. It looks like letters from different languages are expressed in some kind of custom akhar.

Could someone replace the custom akhars found in Mahan Kosh with the relevant language's Unicode?

Could the Gurmukhi also be converted to Unicode? Or are there any sequences which break (I think you know which examples I'm thinking of)?

bhajneet avatar Jan 31 '20 17:01 bhajneet

The only issue I see is in the 1st column, 3rd row, where two Hindi letters are expressed with one custom akhar.

bhajneet avatar Jan 31 '20 17:01 bhajneet

> Are they written in Gurmukhi though, or what? I'm confused about why we can't convert these characters to their respective languages, or all the characters to some Unicode format.

This is what I intended to do.

harpreetkhalsagtbit avatar Jan 31 '20 17:01 harpreetkhalsagtbit

> Could someone replace the custom akhars found in Mahan Kosh with the relevant language's Unicode?

I actually started on this 6 years ago, converting them into Unicode. Now I want to start again.

https://github.com/harpreetkhalsagtbit/MahankoshUnicode

harpreetkhalsagtbit avatar Jan 31 '20 17:01 harpreetkhalsagtbit

Bingo, I figured this out: 'कष्प', 'कश्प'. But these half characters can't be used alone.

harpreetkhalsagtbit avatar Jan 31 '20 17:01 harpreetkhalsagtbit

I guess 1st col 4th row is this character:

कळ्प

harpreetkhalsagtbit avatar Jan 31 '20 17:01 harpreetkhalsagtbit

@bhajneet the database is already in Unicode, but it uses non-standard, out-of-spec ways to display these characters. I am not sure how to display them properly in Unicode.

https://unicode.org/L2/L2006/06030-gurmukhi.pdf https://unicode.org/L2/L2006/06233-gurmukhi.pdf

Some of these were requested to be encoded. I will look into what was recommended.

I honestly think we can just map it to the Hindi characters directly. But Urdu might be difficult.

sarabveer avatar Jan 31 '20 18:01 sarabveer

@harpreetkhalsagtbit The Database file is from the same team that made the digital Mahan Kosh pdf. Not sure if decoding is needed.

sarabveer avatar Jan 31 '20 18:01 sarabveer

Depends https://github.com/GurbaniNow/gurmukhi-fonts/issues/8

sarabveer avatar Jan 31 '20 21:01 sarabveer

> @harpreetkhalsagtbit The Database file is from the same team that made the digital Mahan Kosh pdf. Not sure if decoding is needed.

Can I have that? If it is not some proprietary thing.

harpreetkhalsagtbit avatar Feb 01 '20 13:02 harpreetkhalsagtbit

> > @harpreetkhalsagtbit The Database file is from the same team that made the digital Mahan Kosh pdf. Not sure if decoding is needed.
>
> Can I have that? If it is not some proprietary thing.

It will eventually be added to the Shabad OS DB, or as an add-on DB.

sarabveer avatar Feb 02 '20 18:02 sarabveer

Originally Posted in Slack by @Harjot1Singh

Considerations

  • [x] One definition to many words
  • [x] One word to many definitions
  • [x] Dekho smart-linking
  • [ ] Font research

Structure

{
   An: {
       definitions: ['definitionsArray1', 'definitionsArray2'],
       synonyms: []
   },
   Har: {
       definitions: ['definitionsArray4'],
       synonyms: ['Hari', 'Parmaatma']
   },
}
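
A minimal sketch of how this structure could serve both "one word to many definitions" and "one definition to many words", assuming synonyms should resolve to their headword. The names `sampleKosh`, `buildSynonymIndex`, and `lookup` are hypothetical and not part of the Shabad OS codebase.

```javascript
// Hypothetical sample data following the proposed structure.
const sampleKosh = {
  An: { definitions: ['definitionsArray1', 'definitionsArray2'], synonyms: [] },
  Har: { definitions: ['definitionsArray4'], synonyms: ['Hari', 'Parmaatma'] },
};

// Build a reverse index so that each synonym points back to its headword.
const buildSynonymIndex = (dictionary) => {
  const index = new Map();
  for (const [headword, { synonyms }] of Object.entries(dictionary)) {
    for (const synonym of synonyms) index.set(synonym, headword);
  }
  return index;
};

// Look a word up directly, falling back to the synonym index.
// Returns an empty array when the word is unknown.
const lookup = (dictionary, index, word) => {
  const isHeadword = Object.prototype.hasOwnProperty.call(dictionary, word);
  const headword = isHeadword ? word : index.get(word);
  return headword === undefined ? [] : dictionary[headword].definitions;
};
```

With this shape, `lookup(sampleKosh, buildSynonymIndex(sampleKosh), 'Hari')` would return the definitions stored under `Har`.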

Storage

dictionaries/Mahan Kosh.json etc

Smart Linking

Automatically pick up "dekho" references and add the correct destination as a separate field in the database
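
A rough sketch of the smart-linking idea, under a loud assumption: that a reference appears in a definition as the word "ਦੇਖੋ" (dekho), an optional comma, and then a single Gurmukhi headword. The real Mahan Kosh text may format references differently, and the `links` field name and both function names are hypothetical.

```javascript
// "ਦੇਖੋ" (dekho, "see"), written with escapes so the source survives re-encoding.
const DEKHO = '\u0A26\u0A47\u0A16\u0A4B';

// Assumed pattern: dekho, an optional comma, whitespace, then one run of
// characters from the Gurmukhi block (U+0A00 to U+0A7F).
const DEKHO_PATTERN = new RegExp(`${DEKHO},?\\s+([\\u0A00-\\u0A7F]+)`, 'gu');

// Return every word that a definition refers to via dekho.
const extractDekhoTargets = (definition) =>
  [...definition.matchAll(DEKHO_PATTERN)].map((match) => match[1]);

// Add a separate `links` field to each entry, keeping only the targets
// that actually exist as headwords in the dictionary.
const withDekhoLinks = (dictionary) =>
  Object.fromEntries(
    Object.entries(dictionary).map(([word, entry]) => {
      const links = entry.definitions
        .flatMap((definition) => extractDekhoTargets(definition))
        .filter((target) => Object.prototype.hasOwnProperty.call(dictionary, target));
      return [word, { ...entry, links }];
    }),
  );
```

Targets that do not match a headword are dropped here. A real importer would more likely log them for manual review.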

Output Transformation

A dictionary table, with a dictionary_source_id column that references another table, dictionary_sources.
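
One way the transformation could look, as a sketch only. The table names `dictionary` and `dictionary_sources` and the column `dictionary_source_id` come from the note above. The other column names (`id`, `name`, `word`, `definition`) and the function `toRows` are assumptions made for illustration.

```javascript
// Flatten one dictionary JSON object into rows for the two tables.
// Each definition becomes its own row, so one word can map to many rows.
const toRows = (sourceName, sourceId, dictionary) => ({
  dictionary_sources: [{ id: sourceId, name: sourceName }],
  dictionary: Object.entries(dictionary).flatMap(([word, { definitions }]) =>
    definitions.map((definition) => ({
      word,
      definition,
      dictionary_source_id: sourceId, // references dictionary_sources.id
    }))),
});

// Hypothetical input following the proposed structure.
const exampleRows = toRows('Mahan Kosh', 1, {
  An: { definitions: ['definitionsArray1', 'definitionsArray2'], synonyms: [] },
  Har: { definitions: ['definitionsArray4'], synonyms: ['Hari', 'Parmaatma'] },
});
```

Here `exampleRows.dictionary` holds three rows, two for `An` and one for `Har`, all pointing at the single `Mahan Kosh` source row.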

bhajneet avatar Feb 09 '20 20:02 bhajneet

@Harjot1Singh points for instigating? Not for adding any content.

bhajneet avatar Mar 06 '20 22:03 bhajneet

> I honestly think we can just map it to the Hindi characters directly. But Urdu might be difficult.

Looking back on this, this is not possible. Mahan Kosh has Hindi and Urdu words, so their charmaps cannot be used (and it would still break the standard).

sarabveer avatar Jun 22 '20 04:06 sarabveer

The custom characters in Mahan Kosh do not follow the standards, nor will they for the foreseeable future. Meeting standards should not be a goal, since it's unattainable.

Can the Unicode characters for each language be used? Perhaps using them together breaks rendering of particular words/phrases, but I don't see evidence of that yet.
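
To gather that evidence, one could first find which entries mix scripts. A small sketch using Unicode script property escapes, assuming Hindi is stored in Devanagari and Urdu in the Arabic script; the helper name `scriptsIn` is hypothetical:

```javascript
// One regular expression per script of interest. No `g` flag, so that
// repeated calls to test() carry no state between them.
const SCRIPTS = {
  Gurmukhi: /\p{Script=Gurmukhi}/u,
  Devanagari: /\p{Script=Devanagari}/u,
  Arabic: /\p{Script=Arabic}/u,
};

// List the scripts that occur anywhere in a piece of text.
const scriptsIn = (text) =>
  Object.keys(SCRIPTS).filter((name) => SCRIPTS[name].test(text));
```

Entries where `scriptsIn` reports more than one script would be the ones to check for rendering problems.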

bhajneet avatar Jun 22 '20 06:06 bhajneet

> Can the Unicode characters for each language be used? Perhaps using them together breaks rendering of particular words/phrases, but I don't see evidence of that yet.

Yes, it would break, because both Hindi and Urdu are used in the dictionary.

sarabveer avatar Jun 22 '20 19:06 sarabveer