Non-ASCII chars in train file
Hi, I was training redshift on input containing some non-ASCII characters and ran into errors. I got past them by replacing those characters, but my goal is to train it on Persian data, which will certainly hit the same errors. I've heard about solutions like transliteration, but I know nothing about it. Is that the best solution, or would you suggest something better? Thanks.
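To illustrate what character-level transliteration means, here is a minimal Python 3 sketch. The table `PERSIAN_TO_LATIN` and the function `transliterate` are hypothetical, cover only a handful of Persian letters, and are not part of redshift.

```python
# Hypothetical, deliberately tiny Persian-to-Latin table; a real scheme
# would need to cover the whole alphabet plus digits and punctuation.
PERSIAN_TO_LATIN = {
    "\u0627": "a",  # alef
    "\u0628": "b",  # be
    "\u067E": "p",  # pe
    "\u0633": "s",  # sin
    "\u0644": "l",  # lam
    "\u0645": "m",  # mim
}

# str.translate expects a table keyed by code point (int), not by character.
_TABLE = {ord(ch): latin for ch, latin in PERSIAN_TO_LATIN.items()}


def transliterate(text):
    """Replace mapped Persian letters with Latin strings; leave all other characters unchanged."""
    return text.translate(_TABLE)


print(transliterate("سلام"))  # slam
```

Note that the output is lossy: Persian normally omits short vowels, so سلام comes out as "slam" rather than "salam", and a full table would have to give similar-sounding letters (for example ث, س and ص) distinct Latin strings to stay reversible. If the parser handles unicode correctly, training on the decoded Persian text directly avoids that loss.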
Hi,
I haven't tried this just yet, but first: are you sure you've decoded the bytes into unicode text correctly before you pass it to the parser?
(Skip this if you know it, but: unicode is "serialized" into a byte-stream by the method unicode.encode(
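A minimal Python 3 sketch of that round trip with Persian text. In Python 2, which this thread uses, the text type is `unicode`, the literal would be written u"سلام", and the file would need a `# -*- coding: utf-8 -*-` line at the top.

```python
# A Persian word as a unicode string (str in Python 3).
text = "سلام"

# Encoding serializes unicode text into bytes; UTF-8 uses 2 bytes per Arabic-script letter.
data = text.encode("utf-8")
print(len(text), len(data))  # 4 8

# Decoding turns bytes back into text. Decoding with the wrong codec fails loudly.
try:
    data.decode("ascii")
except UnicodeDecodeError as err:
    print("ascii cannot decode:", err.reason)

# Decoding with the codec that was used for encoding recovers the original text.
assert data.decode("utf-8") == text
```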
I made no changes to the input text file. Do you mean I should encode the input file that contains the training data and then pass it to the script, or that you'd edit your code to encode the file after reading it? (I hope I understood you correctly.)
Hi,
Sorry to leave you hanging on such a simple problem.
It turns out that I wasn't decoding the input bytes into unicode properly in my train.py and parse.py scripts: the files I've been running my experiments on have all been ASCII, and I'm using Python 2.
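A minimal sketch of reading a training file as UTF-8 text before handing lines to a parser. The helpers `read_training_lines` and `write_lines` are hypothetical, not the functions in train.py or parse.py. `io.open` with an `encoding` argument exists in both Python 2.6+ and Python 3; the example runs under Python 3 as written.

```python
import io
import os
import tempfile


def read_training_lines(path, encoding="utf-8"):
    """Hypothetical helper: decode a training file into a list of unicode lines."""
    # io.open decodes the file's bytes into text as it reads.
    with io.open(path, "r", encoding=encoding) as f:
        return [line.rstrip("\n") for line in f]


def write_lines(path, lines, encoding="utf-8"):
    """Hypothetical helper: encode unicode lines back to bytes when writing."""
    with io.open(path, "w", encoding=encoding) as f:
        for line in lines:
            f.write(line + "\n")


if __name__ == "__main__":
    sample = ["من به مدرسه رفتم", "hello"]
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        write_lines(path, sample)
        # Reading back with the same encoding recovers the original lines.
        print(read_training_lines(path) == sample)
    finally:
        os.remove(path)
```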
I've pushed a quick patch for the train.py and parse.py scripts to the "develop" branch, but unfortunately I haven't tested it yet. I thought I'd get this out now rather than wait until I have time to do it properly.
I'll be returning to development on this project in about a month. At the moment I'm finishing a tokenizer and lexicon, which will also improve unicode support for the parser and tagger. After that I'll clean up the parser and finally write documentation.
So: check out the branch "develop", try again, and let me know how you go.
Hi again, and special thanks for your attention. I checked out the develop branch (I think) and tried to use it, but I got lots of module errors: first for index.lexicon and then for perception. I'm not sure my checkout was right, so it may be my mistake. If you could check it yourself that would be great, or could you help me with the module problem? I got past the index.lexicon error by running $ pip install index, though I'm not sure that was the right fix. Many thanks again.
Did you recompile?
fab clean make
No, I cloned again and did everything in the README again.
Okay, well, apart from some OS X-specific installation problems (grr), cloning and checking out develop works for me. Can you run git log and tell me which SHA hash your develop branch is on? It should say:
commit d4460a048116e79a9e635b47695c2a69d84fb20b
Merge: 1478a44 35e9db5
Author: Matthew Honnibal
Date: Fri Sep 5 21:10:37 2014 +0200
* Merge
Otherwise, maybe try running "git fetch" and then "git pull origin develop"? Or... something. I'm not sure how git syncs remote branches for you.
I ran git log and it showed what it should, but the line from redshift.sentence import Input fails, and so do other similar imports.