Relation-Classification-using-Bidirectional-LSTM-Tree

Can we use our own data set to train the models and predict our own test set?

Open xinxu1018 opened this issue 6 years ago • 15 comments

Can we use our own data set to train the models and predict our own test set?

xinxu1018 avatar Oct 17 '18 01:10 xinxu1018

You can, just make sure the input format remains the same.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/Sshanu/Relation-Classification/issues/14, or mute the thread https://github.com/notifications/unsubscribe-auth/APvEHzSOPblJDTxAoi1H6gAbCHnDKeb7ks5ulo3DgaJpZM4Xi-p_ .

Sshanu avatar Oct 17 '18 09:10 Sshanu

@Sshanu Thanks so much for your quick response! Please allow me to ask one more question. Since I am using word embeddings trained on my own corpus instead of the GloVe embeddings you provide, how can I get my embeddings into the same format as the GloVe embedding file in your data folder and use them in your LSTM model?

All the best!

xinxu1018 avatar Oct 17 '18 15:10 xinxu1018

I first extracted the words and stored them in a list named vocab, then extracted the word embeddings and stored them in a numpy array. The rows follow the order of the list: if the 2nd word in vocab is "the", the 2nd row of the numpy array holds the embedding for "the". I then saved both vocab and the embedding array using pickle. So you can either create a similar array and vocab, or change the code that loads the embeddings.
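
A minimal sketch of the layout described above, assuming the custom embeddings sit in a plain-text file with one word per line followed by its vector values and no header line. The function names, the file paths, and the choice to pickle both objects as a single tuple are assumptions for illustration; check how the repository's loading code unpickles its file and match that.

```python
import pickle
import numpy as np

def read_text_embeddings(path):
    """Parse 'word v1 v2 ... vd' lines into a vocab list and an aligned array."""
    vocab, vectors = [], []
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            if len(parts) < 2:
                continue  # skip blank or malformed lines
            vocab.append(parts[0])
            vectors.append([float(x) for x in parts[1:]])
    # Row i of the array holds the embedding of vocab[i].
    return vocab, np.asarray(vectors, dtype=np.float32)

def save_embeddings(vocab, embeddings, out_path):
    # Assumption: both objects are stored in one pickle file as a tuple.
    with open(out_path, "wb") as f:
        pickle.dump((vocab, embeddings), f)
```

Calling `read_text_embeddings("my_vectors.txt")` and then `save_embeddings(vocab, embeddings, "my_vectors.pkl")` (both paths hypothetical) would produce a list and an array that share the same ordering.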

Sshanu avatar Oct 17 '18 17:10 Sshanu

@xinxu1018 That is so informative! Thanks a lot!

xinxu1018 avatar Oct 17 '18 19:10 xinxu1018

@Sshanu It works! Thanks a lot! Can I ask a follow-up question? If I want to classify relations between multi-word terms (in your case the terms are single words), how should I preprocess the sentences before the dependency path extraction step? One way I am considering is to join the words of each multi-word term with underscores (for example, "system configuration" becomes "system_configuration"), treat the result as a one-word term, and then follow your procedure. I am not sure whether it will work. Do you have any suggestions?

Many thanks!
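
The underscore idea above could be sketched as follows. This is only an illustration: the function name `join_terms` and the list of known multi-word terms are hypothetical, and matching is case-sensitive and whole-word.

```python
import re

def join_terms(sentence, terms):
    """Replace each multi-word term with an underscore-joined single token."""
    # Handle longer terms first so a shorter term cannot split a longer one.
    for term in sorted(terms, key=len, reverse=True):
        words = term.split()
        joined = "_".join(words)
        pattern = r"\b" + r"\s+".join(re.escape(w) for w in words) + r"\b"
        # A function replacement avoids any escape handling in the joined token.
        sentence = re.sub(pattern, lambda m, j=joined: j, sentence)
    return sentence

print(join_terms("The system configuration affects the boot time .",
                 ["system configuration", "boot time"]))
# The system_configuration affects the boot_time .
```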

xinxu1018 avatar Oct 17 '18 19:10 xinxu1018

You can try this approach, but its shortcoming is that you won't have a word embedding for the joined multi-word term. An alternative: create the dependency tree and, of the two words in the term, choose as the entity the one that sits below the other in the tree. The LSTM tree will still compute the information for the other word, so instead of using only the features of the LCA (lowest common ancestor), entity1, and entity2 from the LSTM tree for relation classification, also use the features of that other word.
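
One way to read "below the other in the tree" is "deeper", that is, farther from the root. A minimal sketch under that assumption, with the parse given as a hypothetical list of head indices (`heads[i]` is the index of token i's parent, and -1 marks the root); the function names are illustrative and not part of the repository.

```python
def depth(heads, i):
    """Number of edges from token i up to the root (whose head is -1)."""
    d = 0
    while heads[i] != -1:
        i = heads[i]
        d += 1
    return d

def pick_entity_word(heads, term_indices):
    """Return the deepest token of the term and the remaining term tokens."""
    entity = max(term_indices, key=lambda i: depth(heads, i))
    others = [i for i in term_indices if i != entity]
    return entity, others

# Tokens: 0 The, 1 system, 2 configuration, 3 failed
# Hypothetical parse: "failed" is the root, "configuration" attaches to "failed",
# and "The" and "system" attach to "configuration".
heads = [2, 2, 3, -1]
print(pick_entity_word(heads, [1, 2]))  # (1, [2])
```

The indices in `others` are the words whose LSTM-tree features would be added alongside those of the LCA and the two entities.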

I have not worked on relation classification or any related field since this project. It was my first project in NLP, which is why I know very little about relation classification or extraction.

Sshanu avatar Oct 18 '18 01:10 Sshanu

@Sshanu Thanks a lot! Hope everything goes well for you!

xinxu1018 avatar Oct 18 '18 03:10 xinxu1018

@Sshanu Could you please provide your word_embd_wiki file? I cannot find the embedding file in the data folder. Thanks for your help!

Best,

xinxu1018 avatar Oct 18 '18 15:10 xinxu1018

My Google Drive is full. Please share a folder with me, and I will upload the word_embed file there.

Sshanu avatar Oct 18 '18 16:10 Sshanu

@Sshanu How can I share a folder with you? What's your address? Sorry, I am new here!

xinxu1018 avatar Oct 18 '18 16:10 xinxu1018

@Sshanu Hi Sshanu, I just shared a Google Drive folder with the email address in your GitHub profile. Not sure if I am doing it right! Many thanks!

xinxu1018 avatar Oct 18 '18 16:10 xinxu1018

Oh, please share it with [email protected]

Sshanu avatar Oct 19 '18 04:10 Sshanu

@Sshanu I have shared it with your Gmail address. Please check, and many thanks!

xinxu1018 avatar Oct 19 '18 04:10 xinxu1018

@Sshanu Hi Sshanu, got your shared file! You helped me a lot! I am wondering whether you still have the original code that split the embedding file into the vocab and word_embedding arrays. With it I could convert my own trained embeddings into the format your method expects. Could you please share the code with me? Thanks again!

xinxu1018 avatar Oct 19 '18 04:10 xinxu1018

I don't have that file. If you are having trouble generating the exact file, simply store the embeddings in a numpy array and the vocabulary in a list; my code will work after that.
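
Since this relies on the list and the array sharing the same order, a quick sanity check before training may help. The sketch below is an assumption about how one might verify the alignment; `build_lookup` is a hypothetical helper, not part of the repository.

```python
import numpy as np

def build_lookup(vocab, embeddings):
    """Check that vocab and embeddings are aligned and return a word -> row index map."""
    if len(vocab) != embeddings.shape[0]:
        raise ValueError("vocab and embeddings must have the same number of entries")
    return {word: i for i, word in enumerate(vocab)}

# Toy data standing in for the real vocabulary and embedding array.
vocab = ["cat", "the"]
embeddings = np.array([[0.1, 0.2], [0.3, 0.4]])
word2id = build_lookup(vocab, embeddings)
print(embeddings[word2id["the"]])  # the row stored for "the"
```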

Sshanu avatar Oct 19 '18 06:10 Sshanu