rasa
rasa copied to clipboard
Watson Training Data importer
Proposed changes:
- Added a watson data importer that takes exported watson training for and converts them into rasa format for training an NLU model Currently accepted transformations
- Intents
- Entities
- Entity Synonyms
Links to Issue #1154
Status (please check what you already did):
Thanks for opening a draft pull request 🚀If you have any questions, you can direct them to @alopez ✨
@akelad , @tmbo - I am not sure whether the reviewer is active, it automatically assigned.
Thanks for checking. I may be missing some prep for a PR but happy to add whatever is missing.
Hey @souvikg10
Thanks for working on this! I will review the PR this week 👍
@degiz - any update on this?
thanks
👋 Thanks for your contribution. The failing step wait for docs test
in the CI is an error and it is fixed. If you update the PR with the base branch, the wait for docs test
step should be green. 🙌🏻
@degiz Hi, I added some updates to use .get() for the dictionaries to handle better the loops if keys dont exist. Could you review please.
Thanks
@degiz - i made the changes as requested. Thanks for the feedback.
Hey @souvikg10 Thanks for addressing the comments! 🚀 You can re-request the review if the changes are ready 👍
Hi @degiz , not sure if i did it right, ,but i think i requested review.. can you please check and let me know if there is anything else to add on this?
Hey @souvikg10
Could you please merge Rasa main
into your branch? Let's see if the tests are passing now 🤔
I rebased and fixed the failling tests. i hope :) @degiz - let's see if this passes
All tests are passing from my side. @degiz . there is just one about unused imports but i am not sure why
rasa/shared/nlu/training_data/formats/watson.py:4:1: F401 'rasa.shared.nlu.constants.VALID_FEATURE_TYPES' imported but unused
You need to remove the unsed import there.
rasa/shared/nlu/training_data/formats/init.py:10:1: F401 'rasa.shared.nlu.training_data.formats.rasa.RasaReader' imported but unused
Please put # noqa: F401
on the same line as import
@wochinge - updated changelog as well.
@wochinge yeah it makes sense to convert the data first specially I only wrote the part for watson NLU data and not the conversation part which i don't even know how to integrate just. could be something i can add in the future, convert watson flows into rules in Rasa.
I actually followed the Dialogflow implementation. I suppose there will be one for Dialogflow training data converter as well. I can work with that. no problem.
I will let you know if I am stuck at some point.
could be something i can add in the future, convert watson flows into rules in Rasa.
That sounds like a super cool idea!
suppose there will be one for Dialogflow training data converter as well.
Actually there isn't 😬 For historic reasons we treat DialogFlow etc. as first class citizen training data (cc @TyDunn ). It's done by this part of the code.
@wochinge - sorry for the delay.
I have added the ref to create a watson training data converter. Could you take a look. I will remove the other implementation if this is all okay. wrote some tests but i don't know where to add the docs..
I had to add a param in the CLI to indicate the source type as watson using -s
arg. let me know if that is okay...
Thanks @souvikg10 I'm just heading out and on vacation next week: Will have a look the week after 🤞🏻
Sorry @wochinge I have been taking a loong time to get this done 😢 Time constraints
@souvikg10 How is it going with this? Any way I can help you out?
@wochinge - sorry, i thought the latest build was successful, because i refactored and removed the older implementation out. Just need to fix some tests it seems. I will do it now. Was off for a couple of weeks.
@wochinge - this is becoming too big for me 😢 I don't understand some of the failing tests. the implementation is done but could you please take a look at the failing tests. I will try my best to get them sorted but i am under a lot of workload recently. sorry for this dragging this PR for a long time
@wochinge - i have some code complexity issues here. Cognitive complexity is too high ☹️ I am going to try to rework some of it
@souvikg10 Code complexity is not a required check and we haven't adjusted the code climate setting for this setting yet to the codebase. Unless it's an easy change / a very obvious thing, I'd ignore it.
I'll have a look at your changes right away. Sorry having you wait so long!
@souvikg10 Do you still plan on completing this PR? If you don't have time for it right now, I'd suggest we close it for now.
@melindaloubser1 - Unfortunately i won't have time on this until summer when things are a little relaxed :( there has been a lot of review on this including a rewrite few months back. I didn't get the time afterwards to work on the tests as tobias reviewed. my apologies
All good, just wanted to check if you needed support on it - stalebot might close it in the future, but you can reopen it if you have capacity later in the summer.