Conversion of diacritics

Open jsmm opened this issue 5 years ago • 8 comments

I would like to let you know that this script doesn't convert diacritics correctly. For example, the Spanish words generación, podría, engañar come out on the clipboard as generaci√≥n, podr√≠a, enga√±ar.

Thank you for making this script available.

jsmm avatar Dec 31 '19 01:12 jsmm

It also does this for single quotes and some dashes. I think it has to do with the way HTML text entities are being converted.

lswank avatar Jan 01 '20 02:01 lswank

It looks like this is not an HTML problem, but a problem with the encoding on the command line: pbcopy seems to mangle the encoding. A workaround that fixes it for me is to pipe the output through iconv before pbcopy:

/Users/YOUR-USER/.nvm/versions/node/v13.5.0/bin/node ~/PATH-TO-PROJECT/hypothesis-to-bullet/index.mjs SOURCE-URL | iconv -t utf8 | pbcopy
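
To illustrate why the output looks the way it does, here is a minimal sketch, assuming the UTF-8 bytes are being read as Mac OS Roman somewhere on the way to the clipboard. The helper `misreadAsMacRoman` and its partial lookup table are made up for this demo and are not part of the script.

```javascript
// Partial Mac OS Roman table: only the four bytes this demo needs.
const MAC_ROMAN = {
  0xc3: "\u221a", // √
  0xb3: "\u2265", // ≥
  0xad: "\u2260", // ≠
  0xb1: "\u00b1", // ±
};

// Hypothetical helper: encode text as UTF-8, then decode each byte
// as if it were Mac OS Roman.
function misreadAsMacRoman(text) {
  const bytes = new TextEncoder().encode(text.normalize("NFC"));
  let out = "";
  for (const b of bytes) {
    if (b < 0x80) {
      out += String.fromCharCode(b); // ASCII is the same in both encodings
    } else if (b in MAC_ROMAN) {
      out += MAC_ROMAN[b];
    } else {
      out += "?"; // byte not covered by the partial table
    }
  }
  return out;
}

for (const word of ["generaci\u00f3n", "podr\u00eda", "enga\u00f1ar"]) {
  console.log(word, "->", misreadAsMacRoman(word));
}
```

Each accented letter is two bytes in UTF-8, and both bytes get shown as separate Mac OS Roman characters, which matches the garbled words in the original report.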

lswank avatar Jan 04 '20 08:01 lswank

Thanks for the feedback. I did add decoding for entities in the Twitter part. However, the issue above seems to be UTF-8 related. I haven't seen it myself, but I have this in my terminal exports. Do you?

declare -x LC_ALL="en_US.UTF-8"
declare -x LC_CTYPE="UTF-8"

houshuang avatar Jan 04 '20 18:01 houshuang

I don't, but running locale in the terminal yields this:

LANG="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_CTYPE="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_ALL=

The last one is blank, and may be the issue. I believe this is the default on macOS 10.15, but I am not sure.
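
For reference, a small sketch of how the effective character locale is usually resolved under the POSIX rules: a non-empty LC_ALL wins, then LC_CTYPE, then LANG. The function names here are made up, and the fallback to "C" is the common default rather than something this thread confirms for macOS.

```javascript
// Resolve the effective character-type locale in POSIX precedence order:
// LC_ALL (if non-empty), then LC_CTYPE, then LANG, else the "C" default.
function effectiveCtype(env) {
  return env.LC_ALL || env.LC_CTYPE || env.LANG || "C";
}

// Rough check: does the locale name mention UTF-8?
function looksUtf8(locale) {
  return /utf-?8/i.test(locale);
}

// Inspect the environment this node process was actually started with.
const ctype = effectiveCtype(process.env);
console.log(`effective LC_CTYPE: ${ctype}, UTF-8: ${looksUtf8(ctype)}`);
```

Under these rules an empty LC_ALL just falls through to LC_CTYPE. It may be worth running the check from wherever the script is really launched, since that environment may not carry the same variables as an interactive terminal.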

lswank avatar Jan 04 '20 19:01 lswank

(I'm also using Fish shell so there are many variables in play)

houshuang avatar Jan 04 '20 19:01 houshuang

If we can get a third person to confirm that my workaround is the path forward, we should just add it to the README. It's been working beautifully for me.

lswank avatar Jan 04 '20 19:01 lswank

Just go ahead and make a PR, I don't think it hurts at the very least :)

houshuang avatar Jan 04 '20 19:01 houshuang

I also had the problem that pbcopy was not encoding quotation marks correctly. I couldn't be bothered to learn that much about character encoding and copied a solution from here: http://hints.macworld.com/article.php?story=20081231012753422

Hence I changed the AppleScript to be like so:

do shell script "export __CF_USER_TEXT_ENCODING=0x1F5:0x8000100:0x8000100; /usr/local/bin/node ~/hypothesis-to-bullet/index.mjs " & theText & " and so on.
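
In case it helps anyone adapting that line, here is a rough sketch of what the value seems to contain. As far as I understand it, `__CF_USER_TEXT_ENCODING` holds three colon-separated hex fields (user id, text encoding, region); that reading is my assumption, and `parseCfUserTextEncoding` is a made-up helper, not a system API.

```javascript
// CoreFoundation's numeric constant for UTF-8 (kCFStringEncodingUTF8).
const KCF_STRING_ENCODING_UTF8 = 0x08000100;

// Hypothetical helper: split the value into its three hex fields.
// The field meanings (uid, encoding, region) are an assumption.
function parseCfUserTextEncoding(value) {
  const [uid, encoding, region] = value
    .split(":")
    .map((field) => Number.parseInt(field, 16));
  return { uid, encoding, region };
}

const parsed = parseCfUserTextEncoding("0x1F5:0x8000100:0x8000100");
console.log(parsed.uid); // 501
console.log(parsed.encoding === KCF_STRING_ENCODING_UTF8); // true
```

If that reading is right, the first field (0x1F5, i.e. 501) is specific to the account, so it is worth checking `id -u` on your own machine before copying the value verbatim.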

Thanks for making this! 😎

coyotespike avatar Feb 18 '20 19:02 coyotespike