self-critical.pytorch

Getting an error when running the script make_bu_data.py on Python 3

Open · gsrivas4 opened this issue 4 years ago · 6 comments

I am getting an error when running the script make_bu_data.py in a Python 3 environment; the same script runs fine under Python 2. The error is raised at this line: https://github.com/ruotianluo/self-critical.pytorch/blob/master/scripts/make_bu_data.py#L40. The message is _csv.Error: iterator should return strings, not bytes (did you open the file in text mode?). I tried to port the code to Python 3 but could not find the equivalent code.
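In Python 3 the csv module parses text, not bytes, so a reader wrapped around a file opened in binary mode ("r+b") receives bytes lines and raises exactly this error. The sketch below reproduces it with in-memory files standing in for the downloaded TSV. The field names and the sample row are hypothetical and are not copied from make_bu_data.py.

```python
import csv
import io

# Hypothetical field names and row, standing in for one line of the downloaded TSV.
FIELDNAMES = ["image_id", "num_boxes", "features"]
ROW = "42\t1\tAAAAAA==\n"


def read_rows(fileobj):
    """Parse tab-separated rows with csv.DictReader."""
    return list(csv.DictReader(fileobj, delimiter="\t", fieldnames=FIELDNAMES))


# Binary-mode stand-in (like open(..., "r+b")): iteration yields bytes, so csv raises.
try:
    read_rows(io.BytesIO(ROW.encode("ascii")))
    binary_error = None
except csv.Error as exc:
    binary_error = str(exc)

# Text-mode stand-in (like open(..., "r")): iteration yields str, so parsing works.
text_rows = read_rows(io.StringIO(ROW))

print(binary_error)
print(text_rows[0]["image_id"])  # fields are now str, not bytes
```

For a real file, the text-mode equivalent would be open(path, "r", newline=""); the csv module documentation recommends newline="" so that the reader handles line endings itself.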

gsrivas4, Jul 13 '20 17:07

Have you tried changing open(os.path.join(args.downloaded_feats, infile), "r+b") to open(os.path.join(args.downloaded_feats, infile), "r")?

ruotianluo, Jul 13 '20 17:07

I tried that. It produced a different error, TypeError: memoryview: a bytes-like object is required, not 'str', on these lines: https://github.com/ruotianluo/self-critical.pytorch/blob/master/scripts/make_bu_data.py#L44-L45.
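Opening the file in text mode means every field now comes back as str. base64.decodestring is a deprecated wrapper around base64.decodebytes, which accepts only bytes-like input, hence the memoryview TypeError. A minimal reproduction, using a hypothetical base64 string in place of item[field]:

```python
import base64

# Hypothetical stand-in for item[field] after reading the TSV in text mode.
field_text = "AACAPwAAAEA="  # base64 of the bytes for float32 1.0 and 2.0 (little-endian)

# decodebytes needs a bytes-like object; passing a str raises TypeError.
try:
    base64.decodebytes(field_text)
    str_error = None
except TypeError as exc:
    str_error = exc

# Encoding the str to bytes first makes the same call succeed.
raw = base64.decodebytes(field_text.encode("ascii"))
print(type(str_error).__name__, len(raw))
```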

gsrivas4, Jul 13 '20 17:07

Try base64.decodestring(item[field].encode('ascii')) or base64.decodestring(item[field].encode('utf-8')).

This may fix the memoryview error.
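Putting the two changes together, a Python 3 version of the reading loop might look like the sketch below. It is an illustration under assumptions, not the actual make_bu_data.py: the column names, the in-memory TSV, and the parse_features helper are hypothetical. It swaps in base64.b64decode for base64.decodestring, because decodestring was deprecated since Python 3.1 and removed in Python 3.9, and b64decode accepts an ASCII-only str directly, so no explicit .encode() is needed.

```python
import base64
import csv
import io

import numpy as np

# Hypothetical column layout; the real script defines its own field list.
FIELDNAMES = ["image_id", "num_boxes", "boxes", "features"]

# Base64 fields can be very long, so raise csv's per-field size limit.
# 2**31 - 1 fits a 32-bit C long, unlike sys.maxsize on some platforms.
csv.field_size_limit(2**31 - 1)


def parse_features(fileobj):
    """Yield (image_id, boxes, features) from a text-mode TSV file object."""
    reader = csv.DictReader(fileobj, delimiter="\t", fieldnames=FIELDNAMES)
    for item in reader:
        num_boxes = int(item["num_boxes"])
        arrays = {}
        for field in ("boxes", "features"):
            # b64decode accepts an ASCII-only str, so no .encode() is required.
            raw = base64.b64decode(item[field])
            arrays[field] = np.frombuffer(raw, dtype=np.float32).reshape((num_boxes, -1))
        yield int(item["image_id"]), arrays["boxes"], arrays["features"]


# In-memory stand-in for a file opened with open(path, "r", newline="").
tsv = io.StringIO("7\t1\tAACAPwAAAEA=\tAACAPwAAAEA=\n")
results = list(parse_features(tsv))
for image_id, boxes, features in results:
    print(image_id, boxes.shape, features.tolist())
```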

ruotianluo, Jul 13 '20 17:07

Both the ascii and utf-8 encodings work. How would I know which one produces the correct output files? I guess one way to check is to compare the evaluation results generated with each encoding; the one producing the expected results should be the correct one, right?

gsrivas4, Jul 13 '20 18:07

That's interesting. You can check the size of item[field]. My guess was that only one would work, because numpy would tell you whether the decoded bytes are valid.

Can you check the output of item[field] with the different encodings? My guess is that the two encodings give the same result.
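The size check suggested here can be made explicit. np.frombuffer raises ValueError when the byte count is not a multiple of the dtype's item size, and reshape raises ValueError when the element count does not divide into num_boxes rows, so a badly decoded field would usually fail loudly. The sketch uses a hypothetical decode_field helper and sample strings. It only catches size mismatches; bytes of the right length but wrong content would still decode silently.

```python
import base64

import numpy as np


def decode_field(text, num_boxes):
    """Hypothetical helper: decode a base64 field into a (num_boxes, -1) float32 array."""
    raw = base64.b64decode(text)
    # frombuffer raises ValueError if len(raw) is not a multiple of 4 bytes.
    values = np.frombuffer(raw, dtype=np.float32)
    # reshape raises ValueError if the element count does not split into num_boxes rows.
    return values.reshape((num_boxes, -1))


good = decode_field("AACAPwAAAEA=", num_boxes=2)  # 8 bytes -> two float32 values
print(good.shape, good.ravel().tolist())

try:
    decode_field("AACAPwAA", num_boxes=1)  # 6 bytes: not a whole number of float32s
    truncated_rejected = False
except ValueError:
    truncated_rejected = True
print(truncated_rejected)
```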

ruotianluo, Jul 13 '20 18:07

I checked the output of item[field] for both encodings, 'utf-8' and 'ascii', and they produce identical arrays.
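This outcome is expected. The base64 alphabet (A-Z, a-z, 0-9, +, / and = for padding) is entirely ASCII, and UTF-8 encodes every ASCII character as the same single byte that ASCII does. As a result, .encode('ascii') and .encode('utf-8') yield identical bytes for any base64 string, and the decoded arrays cannot differ. Here is a quick standard-library check over the whole alphabet, using a hypothetical sample field:

```python
import base64
import string

# Every character that can appear in standard base64 output, including padding.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/="

# UTF-8 and ASCII agree byte-for-byte on ASCII characters.
same_bytes = ALPHABET.encode("ascii") == ALPHABET.encode("utf-8")

# Hence decoding either encoding of a base64 field gives identical bytes.
field_text = "AACAPwAAAEA="  # hypothetical sample field
decoded_ascii = base64.b64decode(field_text.encode("ascii"))
decoded_utf8 = base64.b64decode(field_text.encode("utf-8"))
same_decoded = decoded_ascii == decoded_utf8

print(same_bytes, same_decoded)  # True True
```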

gsrivas4, Jul 13 '20 20:07