comment_parser
Suggestion: add an option to ignore special encoding characters
Hi, this tool works well in many cases, but I found two problems.
- Encoding problem
If a file contains characters in other encodings, e.g., Chinese characters or ½, the extract_comments method raises an exception.
On my local machine I added errors='ignore' to the following statement. With it, the special characters above are skipped and the rest of each comment is still extracted.
def extract_comments(filename, mime=None):
    with open(filename, 'r', errors='ignore') as code:
        ...  # rest of the function unchanged
So I think we could expose this as an option and let users decide whether to ignore such characters.
- Complex string
The tool throws an exception when parsing this Java file. I think the cause may be the complex string on line 99.
Thanks for your tool; it helps me a lot. Hope it keeps getting better~
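The errors='ignore' option suggested under the encoding problem could look roughly like the sketch below. It is a minimal sketch assuming a hypothetical read_source helper (not comment_parser's actual API) that forwards an errors argument to Python's built-in open(); it shows how 'strict', 'ignore' and 'replace' treat a "½" stored as the Latin-1 byte 0xBD, which is not valid UTF-8.

```python
import os
import tempfile


def read_source(filename, encoding="utf-8", errors="strict"):
    """Read a source file as text (hypothetical helper, not comment_parser's API).

    `errors` is passed straight to open(): 'strict' raises on undecodable
    bytes, 'ignore' drops them, and 'replace' substitutes U+FFFD.
    """
    with open(filename, "r", encoding=encoding, errors=errors) as code:
        return code.read()


# Write a one-line C comment whose "½" is stored as the Latin-1 byte 0xBD,
# which is not valid UTF-8.
with tempfile.NamedTemporaryFile("wb", suffix=".c", delete=False) as f:
    f.write("// Prints ½\n".encode("latin-1"))
    path = f.name

try:
    read_source(path)  # default errors="strict"
    strict_failed = False
except UnicodeDecodeError as exc:
    strict_failed = True
    print("strict:", exc.reason)

ignored = read_source(path, errors="ignore")    # the 0xBD byte is dropped
replaced = read_source(path, errors="replace")  # the 0xBD byte becomes U+FFFD
os.remove(path)

print("ignore:", ascii(ignored))
print("replace:", ascii(replaced))
```

'ignore' silently loses the character, while 'replace' keeps a visible marker of where decoding failed, so exposing the full errors value rather than a yes/no flag would let users pick either behaviour.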
> If a file contains characters in other encodings, e.g., Chinese characters or ½, the extract_comments method raises an exception.
Do you have an example that reproduces this? I played around with some Chinese characters and everything worked as it should, including the Server.java you linked.
> The tool throws an exception when parsing this Java file. I think the cause may be the complex string on line 99.
Thanks for pointing this out; I fixed this yesterday in #26.
$ wget https://raw.githubusercontent.com/88250/symphony/master/src/main/java/org/b3log/symphony/Server.java
$ python3
>>> from comment_parser import comment_parser
>>> len(comment_parser.extract_comments('Server.java', 'text/x-java'))
7
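The thread doesn't show what line 99 of Server.java contained or what #26 changed. As a hypothetical illustration only (not comment_parser's actual code), a comment scanner generally has to step over string and char literals, including escaped quotes, so that "//" or "/*" inside a literal is not mistaken for the start of a comment. The function name extract_c_style_comments is made up for this sketch.

```python
def extract_c_style_comments(source):
    """Return the bodies of // and /* */ comments in C-like source text.

    Hypothetical sketch: string and char literals (with backslash escapes)
    are skipped, so "//" or "/*" inside a literal is not taken as a comment.
    Java text blocks and other edge cases are not handled.
    """
    comments = []
    i, n = 0, len(source)
    while i < n:
        ch = source[i]
        if ch in ('"', "'"):
            # Skip a string or char literal, honouring backslash escapes.
            quote = ch
            i += 1
            while i < n and source[i] != quote:
                if source[i] == "\\":
                    i += 1  # also skip the escaped character, e.g. \"
                i += 1
            i += 1  # step past the closing quote
        elif source.startswith("//", i):
            end = source.find("\n", i)
            if end == -1:
                end = n
            comments.append(source[i + 2:end])
            i = end
        elif source.startswith("/*", i):
            end = source.find("*/", i + 2)
            if end == -1:
                raise ValueError("unterminated /* comment")
            comments.append(source[i + 2:end])
            i = end + 2
        else:
            i += 1
    return comments


java_src = (
    'String s = "a \\"//not a comment\\" /* nor this */";\n'
    "char q = '\"';  // real comment\n"
    "int x = 1; /* block */\n"
)
print(extract_c_style_comments(java_src))
```

A naive regex such as //.* would instead report the "//not a comment" text inside the first string literal, and without char-literal handling the '"' on the second line would be read as the start of a string.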
#include <stdio.h>

int main(int argc, char **argv) {
    // Prints ½,
    printf("½\n");
    // Prints 你好,世界
    printf("你好,世界\n");
    return 0;
}
$ python3 -m comment_parser.comment_parser test.c
Prints ½,
Prints 你好,世界
I got a Unicode error a few weeks back when running it on Linux, but not on Windows. I passed encoding='utf-8' when opening a .java file, but that didn't solve the issue either.
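One possible explanation for the Linux/Windows difference (an assumption, not confirmed in the thread): when open() is called without encoding=, Python uses a locale-dependent default, which is often UTF-8 on Linux and a legacy code page such as cp1252 on Windows. If the .java file is not actually UTF-8, passing encoding='utf-8' fails as well. The sketch below uses a hypothetical read_with_fallback helper and an arbitrarily chosen encoding list to try several encodings in turn.

```python
import locale
import os
import tempfile


def read_with_fallback(filename, encodings=("utf-8", "cp1252", "latin-1")):
    """Hypothetical helper: try each encoding in order and return
    (text, encoding_used). latin-1 maps every byte to a character, so as
    the last entry it always succeeds, though the text may be mojibake."""
    for enc in encodings:
        try:
            with open(filename, "r", encoding=enc) as f:
                return f.read(), enc
        except UnicodeDecodeError:
            continue  # not valid in this encoding; try the next one
    raise ValueError(f"{filename} is not valid in any of {encodings}")


# Roughly what open() uses when no encoding is given (outside UTF-8 mode);
# this commonly differs between Linux and Windows.
print("platform default:", locale.getpreferredencoding(False))


def write_tmp(data):
    """Write raw bytes to a temporary .java file and return its path."""
    with tempfile.NamedTemporaryFile("wb", suffix=".java", delete=False) as f:
        f.write(data)
        return f.name


utf8_path = write_tmp("// 你好,世界\n".encode("utf-8"))
cp1252_path = write_tmp(b"// Prints \xbd\n")  # 0xBD is "½" in cp1252

utf8_result = read_with_fallback(utf8_path)
cp1252_result = read_with_fallback(cp1252_path)
for p in (utf8_path, cp1252_path):
    os.remove(p)

print(ascii(utf8_result), ascii(cp1252_result))
```

The order of the list matters: cp1252 accepts most byte sequences, so a file in another legacy encoding (for example GBK) could be decoded without error but produce the wrong characters. A fallback chain like this is a heuristic, not a substitute for knowing the file's real encoding.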