ResumeParser
Parse Chinese CVs
This is a good project, but it cannot deal with Chinese resumes. Do you have any ideas on how to make full use of a Chinese parser to parse Chinese resumes?
Thanks. The parser relies heavily on the structure of English grammar and its usage of verbs, nouns and adjectives. Chinese is a fundamentally different language, and I don't know it well enough to give you pointers. Sorry.
Thank you all the same! I will read through your code and study how to adapt it to Chinese resumes. I will share any good results here.
much appreciated..
By the way, I developed the code for a competition, so I did not bother with much documentation. Ping me if you have questions.
Hi,
I need to add more fields, such as "PERSONAL PROFILE" and "Date of Birth". Kindly help.
Hi, what do you mean? Do you want to add the fields in Chinese ("PERSONAL PROFILE" is “个人档案” or “个人简介”, and "Date of Birth" is “出生日期”), or add the fields to the program? Liang
Name: Liang Tian (田亮) Lab: Natural Language Processing& Portuguese Chinese Machine Translation Laboratory (澳门大学自然语言处理与中葡机器翻译实验室NLP2CT) Email: [email protected] Phone: (+853)62059782
(+86)18676421801
Hi Antony Deepak, I just tried your solution. It works really well. I never thought GATE was so powerful. I have never worked with GATE and I want to start learning, so any pointers would be really helpful. Also, what are the obvious next steps to improve the accuracy here?
Can you explain your approach and how you built it? Consider me a newbie in GATE, though I have enough background in NLP and ML.
Hello - Yes, GATE provides a lot of language tools that you can build on. I wrote the code in about 4 days for a contest, so it is not neat, but my general approach is to use all the annotations (nouns, adjectives, adverbs, etc.) provided by GATE and use them to parse out text. I basically thought about how people use English grammar when writing a resume and wrote JAPE rules to extract that piece of text. For example, if you want to find a section heading, you know that people use title case and use more adjectives and nouns. So I wrote rules for that, ran them against different resumes, and tweaked the grammar based on my findings.
There are lots of ways to improve it. One approach would be to use machine learning to find patterns and use those patterns to improve the grammar. You could also go one step further and write a tool that lets people correct wrongly identified sections, so that the system learns from its mistakes.
Let me know if you want to know about anything specific in the code. I wanted to work more on this but just don't have much time. :)
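To make the section-heading idea above concrete, here is a minimal plain-Python sketch of the title-case part of the heuristic only. It is not the JAPE grammar from the project and does not use GATE's part-of-speech annotations; the function names `looks_like_heading` and `split_sections`, the five-word limit, and the list of minor words are all assumptions made for illustration.

```python
# Hypothetical illustration of a title-case heading heuristic.
# Not the project's JAPE rules; thresholds and names are assumptions.

MINOR_WORDS = {"and", "of", "the", "in", "for", "to", "&"}


def looks_like_heading(line, max_words=5):
    """Return True if a line is short, title-cased and not a sentence."""
    text = line.strip().rstrip(":")
    words = text.split()
    if not words or len(words) > max_words:
        return False
    # Lines ending in sentence punctuation are treated as body text.
    if text.endswith((".", "?", "!")):
        return False
    # The first word must be capitalized; later words may be minor words.
    if not words[0][0].isupper():
        return False
    return all(w in MINOR_WORDS or w[0].isupper() for w in words)


def split_sections(lines):
    """Group non-empty lines under the most recent heading-like line."""
    sections = {}
    current = "PREAMBLE"  # bucket for text that precedes any heading
    for line in lines:
        if looks_like_heading(line):
            current = line.strip().rstrip(":")
            sections.setdefault(current, [])
        elif line.strip():
            sections.setdefault(current, []).append(line.strip())
    return sections
```

A candidate's name such as "John Smith" also passes this test, which is one reason a purely typographic rule is not enough and is presumably where the noun and adjective annotations mentioned above come in.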
Thanks Antony
Thank you. Really nice approach. I guess I have to dig deeper into how to write JAPE rules and how to annotate using ANNIE. I would love to work on it. (Initially I thought I would use computer vision to identify bold fields and map them back to some vocabulary to standardize the fields, but I would have to research computer vision more.)
Once I get my hands on GATE, I'll try to push some improvements. Thanks for making it open source. Really appreciate it :)
Regards, Ashutosh
I am more interested in identifying information patterns in documents and extracting them in a standard format, such as identifying product information on e-commerce pages using some model. Every webpage is different, like every resume, but serves the same purpose of conveying some kind of information (only the organization is different).
I did not optimize the program for handling multiple resumes because I created this project as a hack. However, you can accomplish the same using PowerShell. It could be slow, but it will get the job done.
The PS statement below assumes that you want all the files under the folder "UnitTests" processed. Feel free to modify the statement.
ls .\UnitTests\ | % {java -cp '.\bin\*;..\GATEFiles\lib\*;..\GATEFILES\bin\gate.jar;.\lib\*' code4goal.antony.resumeparser.ResumeParserProgram $_.fullname ([io.path]::GetFileNameWithoutExtension($_.name)+".json")}
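For anyone not on Windows, the same loop can be sketched in Python. This is only an illustration under stated assumptions: the main class name and the classpath entries are taken from the PowerShell statement above (with the wildcard entries read as directory wildcards), while the helper names `build_commands` and `run_all` are hypothetical. `os.pathsep` is used because Java expects `;` between classpath entries on Windows and `:` elsewhere.

```python
import os
import subprocess
from pathlib import Path

# Main class as it appears in the PowerShell statement in this thread.
MAIN_CLASS = "code4goal.antony.resumeparser.ResumeParserProgram"

# Classpath entries from the same statement; adjust to your checkout.
DEFAULT_CLASSPATH = [
    "bin/*",
    "../GATEFiles/lib/*",
    "../GATEFILES/bin/gate.jar",
    "lib/*",
]


def build_commands(input_dir, classpath_entries=DEFAULT_CLASSPATH, output_dir="."):
    """Build one java command per file in input_dir (hypothetical helper)."""
    classpath = os.pathsep.join(classpath_entries)
    commands = []
    for resume in sorted(Path(input_dir).iterdir()):
        if not resume.is_file():
            continue
        # Output file is named after the resume, with a .json extension.
        output = Path(output_dir) / (resume.stem + ".json")
        commands.append(
            ["java", "-cp", classpath, MAIN_CLASS, str(resume), str(output)]
        )
    return commands


def run_all(commands):
    """Run the commands one after another; return those that failed."""
    failed = []
    for cmd in commands:
        result = subprocess.run(cmd)
        if result.returncode != 0:
            failed.append(cmd)
    return failed
```

Calling `run_all(build_commands("UnitTests"))` from the project folder would mirror the PowerShell statement. Like that statement, it starts a new JVM per resume, so it is just as slow.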
Yes, if the parser cannot find certain information, it will not output it. For example, if it cannot find an address, the output will have no address field. You can modify this behavior in ResumeParserProgram.java.
I think you may have to write a custom script to convert the JSON to CSV. Try digging around in the PowerShell cmdlets; you may find something.
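As a rough sketch of such a custom script, the Python below merges a folder of JSON outputs into one CSV whose columns are the union of all field names, so that resumes with missing or differently ordered fields still line up. It assumes each output file holds a single JSON object at the top level; nested values are written as JSON text in one cell because the exact output structure is not described in this thread. The function names and the `source_file` column are hypothetical.

```python
import csv
import json
from pathlib import Path


def load_records(json_dir):
    """Read every *.json file in json_dir into one flat dict per resume."""
    records = []
    for path in sorted(Path(json_dir).glob("*.json")):
        with open(path, encoding="utf-8") as fh:
            data = json.load(fh)  # assumed to be a JSON object
        row = {"source_file": path.name}
        for key, value in data.items():
            # Keep plain strings as-is; serialize lists/objects into one cell.
            if isinstance(value, str):
                row[key] = value
            else:
                row[key] = json.dumps(value, ensure_ascii=False)
        records.append(row)
    return records


def write_csv(records, csv_path):
    """Write records with a shared header; missing fields become empty cells."""
    all_keys = {key for row in records for key in row}
    fieldnames = ["source_file"] + sorted(all_keys - {"source_file"})
    with open(csv_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames, restval="")
        writer.writeheader()
        writer.writerows(records)
    return fieldnames
```

Sorting the field names gives every row the same column order regardless of the order in which the parser emitted the fields, and `restval=""` fills in the fields a given resume did not produce.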
On Thu, Mar 10, 2016 at 11:23 AM, bravedream wrote:
Hello Antony. Thanks for the great script. I wonder if there is a way to convert multiple (say 300) resumes at the same time from the CLI, rather than one by one? Also, the parsed field names do not align across resumes (one has education, name, gender and the next has gender, skills, education). Is there a way to solve this, or do I have to do it in Excel VBA when converting JSON to CSV? Thanks!
Thanks Antony. The script worked for batch conversion. I did not have any luck finding the right JSON-to-CSV converter for "resume" type data. Most of them are generic converters, which do not work well here. But I appreciate all your help so far! If you come across a good JSON converter, please let me know.
Thanks!
Xiang Ji