rasa icon indicating copy to clipboard operation
rasa copied to clipboard

Watson Training Data importer

Open souvikg10 opened this issue 4 years ago • 28 comments

Proposed changes:

  • Added a watson data importer that takes exported watson training for and converts them into rasa format for training an NLU model Currently accepted transformations
  • Intents
  • Entities
  • Entity Synonyms

Links to Issue #1154

Status (please check what you already did):

  • [x] added some tests for the functionality
  • [x] updated the documentation
  • [x] updated the changelog (please check changelog for instructions)
  • [x] reformat files using black (please check Readme for instructions)

souvikg10 avatar Feb 10 '21 18:02 souvikg10

CLA assistant check
All committers have signed the CLA.

CLAassistant avatar Feb 10 '21 18:02 CLAassistant

Thanks for opening a draft pull request 🚀If you have any questions, you can direct them to @alopez ✨

sara-tagger avatar Feb 11 '21 07:02 sara-tagger

@akelad , @tmbo - I am not sure whether the reviewer is active, it automatically assigned.

Thanks for checking. I may be missing some prep for a PR but happy to add whatever is missing.

souvikg10 avatar Feb 15 '21 16:02 souvikg10

Hey @souvikg10

Thanks for working on this! I will review the PR this week 👍

degiz avatar Mar 09 '21 13:03 degiz

@degiz - any update on this?

thanks

souvikg10 avatar Mar 30 '21 07:03 souvikg10

👋 Thanks for your contribution. The failing step wait for docs test in the CI is an error and it is fixed. If you update the PR with the base branch, the wait for docs test step should be green. 🙌🏻

HotThoughts avatar Apr 08 '21 18:04 HotThoughts

@degiz Hi, I added some updates to use .get() for the dictionaries to handle better the loops if keys dont exist. Could you review please.

Thanks

souvikg10 avatar Apr 23 '21 08:04 souvikg10

@degiz - i made the changes as requested. Thanks for the feedback.

souvikg10 avatar May 10 '21 09:05 souvikg10

Hey @souvikg10 Thanks for addressing the comments! 🚀 You can re-request the review if the changes are ready 👍

degiz avatar May 11 '21 11:05 degiz

Hi @degiz , not sure if i did it right, ,but i think i requested review.. can you please check and let me know if there is anything else to add on this?

souvikg10 avatar Jun 01 '21 14:06 souvikg10

Hey @souvikg10

Could you please merge Rasa main into your branch? Let's see if the tests are passing now 🤔

degiz avatar Jun 01 '21 14:06 degiz

I rebased and fixed the failling tests. i hope :) @degiz - let's see if this passes

souvikg10 avatar Jun 04 '21 12:06 souvikg10

All tests are passing from my side. @degiz . there is just one about unused imports but i am not sure why

souvikg10 avatar Jun 08 '21 07:06 souvikg10

rasa/shared/nlu/training_data/formats/watson.py:4:1: F401 'rasa.shared.nlu.constants.VALID_FEATURE_TYPES' imported but unused

You need to remove the unsed import there.

rasa/shared/nlu/training_data/formats/init.py:10:1: F401 'rasa.shared.nlu.training_data.formats.rasa.RasaReader' imported but unused

Please put # noqa: F401 on the same line as import

degiz avatar Jun 09 '21 14:06 degiz

@wochinge - updated changelog as well.

souvikg10 avatar Jun 17 '21 11:06 souvikg10

@wochinge yeah it makes sense to convert the data first specially I only wrote the part for watson NLU data and not the conversation part which i don't even know how to integrate just. could be something i can add in the future, convert watson flows into rules in Rasa.

I actually followed the Dialogflow implementation. I suppose there will be one for Dialogflow training data converter as well. I can work with that. no problem.

I will let you know if I am stuck at some point.

souvikg10 avatar Jun 18 '21 08:06 souvikg10

could be something i can add in the future, convert watson flows into rules in Rasa.

That sounds like a super cool idea!

suppose there will be one for Dialogflow training data converter as well.

Actually there isn't 😬 For historic reasons we treat DialogFlow etc. as first class citizen training data (cc @TyDunn ). It's done by this part of the code.

wochinge avatar Jun 18 '21 08:06 wochinge

@wochinge - sorry for the delay.

I have added the ref to create a watson training data converter. Could you take a look. I will remove the other implementation if this is all okay. wrote some tests but i don't know where to add the docs..

I had to add a param in the CLI to indicate the source type as watson using -s arg. let me know if that is okay...

souvikg10 avatar Jul 16 '21 13:07 souvikg10

Thanks @souvikg10 I'm just heading out and on vacation next week: Will have a look the week after 🤞🏻

wochinge avatar Jul 16 '21 15:07 wochinge

Sorry @wochinge I have been taking a loong time to get this done 😢 Time constraints

souvikg10 avatar Aug 09 '21 15:08 souvikg10

@souvikg10 How is it going with this? Any way I can help you out?

wochinge avatar Oct 18 '21 08:10 wochinge

@wochinge - sorry, i thought the latest build was successful, because i refactored and removed the older implementation out. Just need to fix some tests it seems. I will do it now. Was off for a couple of weeks.

souvikg10 avatar Oct 19 '21 07:10 souvikg10

@wochinge - this is becoming too big for me 😢 I don't understand some of the failing tests. the implementation is done but could you please take a look at the failing tests. I will try my best to get them sorted but i am under a lot of workload recently. sorry for this dragging this PR for a long time

souvikg10 avatar Nov 08 '21 08:11 souvikg10

@wochinge - i have some code complexity issues here. Cognitive complexity is too high ☹️ I am going to try to rework some of it

souvikg10 avatar Dec 02 '21 10:12 souvikg10

@souvikg10 Code complexity is not a required check and we haven't adjusted the code climate setting for this setting yet to the codebase. Unless it's an easy change / a very obvious thing, I'd ignore it.

I'll have a look at your changes right away. Sorry having you wait so long!

wochinge avatar Dec 03 '21 14:12 wochinge

@souvikg10 Do you still plan on completing this PR? If you don't have time for it right now, I'd suggest we close it for now.

indam23 avatar Apr 22 '22 11:04 indam23

@melindaloubser1 - Unfortunately i won't have time on this until summer when things are a little relaxed :( there has been a lot of review on this including a rewrite few months back. I didn't get the time afterwards to work on the tests as tobias reviewed. my apologies

souvikg10 avatar Apr 25 '22 07:04 souvikg10

All good, just wanted to check if you needed support on it - stalebot might close it in the future, but you can reopen it if you have capacity later in the summer.

indam23 avatar Apr 25 '22 09:04 indam23