NLP Problems

Open nbogda opened this issue 4 years ago • 1 comments

I would like to label some text strings with a binary label. My data is a CSV file with two columns, one with the string of text, and another with human labels (for ground-truth purposes). The text fields are pretty long, with some reaching 4,000 characters. The file has 6,281 rows. Whenever I try to upload the CSV, I get the following error:

image

I figured it might have been an encoding problem, so I changed all string encoding in the file to UTF-8 and uploaded that version instead. Whenever I upload the UTF-8 version it hangs on "processing" for a long time, and opening the file reveals the image below. This is the first row of the data, truncated at 83 characters, and repeating 12 times. However, this particular string only appears in the data set twice.

image

I tried shortening the data set to only 50 rows and got the same behavior as above. Then I tried shortening the text strings themselves to 50 characters, since I figured the string length might be the issue. Uploading the full file with all fields truncated at 50 characters produces the behavior below:

image

Then I tried shortening the text even more, to 10 characters, and found that it managed to upload the file! However, it is still stuck in "processing". I also discovered that the upload with only 10 characters works for both the original data and the UTF-8 encoded data, but longer text strings will throw the error shown in the first image in this issue.

image

My question is, is there a way to do NLP with the long text strings? Is there a limit on how long the text strings can be? Thanks in advance.

nbogda avatar Apr 24 '20 15:04 nbogda

You need to upgrade Tornado to the latest version by running git pull. NLP data should be a one-column CSV that contains the text. Have a look at https://www.youtube.com/watch?v=xcX-95iGKxY for more details.
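
If your file has two columns (text plus ground-truth labels), one way to get a one-column CSV is to split it before uploading. The following is only a sketch using Python's standard csv module, not part of Tornado: the function name, the file paths, and the assumption that the text is in the first column and the label in the second are all hypothetical. Every field is quoted so that commas and line breaks inside long text stay in a single cell. A header row, if present, is copied like any other row.

```python
import csv


def split_text_and_labels(src_path, text_path, labels_path,
                          text_col=0, label_col=1, encoding="utf-8"):
    """Copy the text column to one single-column CSV and the labels to another.

    Column positions and the source encoding are assumptions; adjust them
    to match your file. Both output files are written as UTF-8.
    Returns the number of rows written.
    """
    rows_written = 0
    with open(src_path, newline="", encoding=encoding) as src, \
         open(text_path, "w", newline="", encoding="utf-8") as text_out, \
         open(labels_path, "w", newline="", encoding="utf-8") as labels_out:
        # Quote every text field so embedded commas/newlines stay in one cell.
        text_writer = csv.writer(text_out, quoting=csv.QUOTE_ALL)
        label_writer = csv.writer(labels_out)
        for row in csv.reader(src):
            if len(row) <= max(text_col, label_col):
                continue  # skip blank or malformed rows
            text_writer.writerow([row[text_col]])
            label_writer.writerow([row[label_col]])
            rows_written += 1
    return rows_written
```

For example, `split_text_and_labels("data.csv", "text_only.csv", "labels.csv")` (hypothetical file names) would leave `text_only.csv` ready to upload, while `labels.csv` keeps the human labels in the same row order for later comparison.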

slrbl avatar Apr 28 '20 18:04 slrbl