
Issues with Transfer Learning for Default Labeler

Open DylanVig opened this issue 2 months ago • 0 comments

I tried to create a custom labeler by following the transfer learning example in the documentation. I wanted the labeler to have three labels: Name, Datetime (which is already a category in the default labeler), and Nationality. Below are the code I used to train the labeler and the CSV I trained it on.

large_fake_data.csv
Screenshot 2024-06-25 at 3 06 44 PM

I then ran this custom labeler on a CSV with 6 columns: Name, Datetime, Phone Number, SSN, Email, and Nationality. It correctly identified the names, datetimes, and nationalities. However, it also mislabeled the phone numbers, SSNs, and emails (usually labeling emails and SSNs as nationality and phone numbers as datetime). When I run the same file through the default labeler, those three fields are picked up just fine. Is there a problem with how I am programming my labeler, how I am training it, or something else? Below are the code I used to test the labeler and the CSV that produced these results.

three_cat_labeler_test_data.csv
Screenshot 2024-06-25 at 3 11 02 PM
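One possible explanation, offered as an assumption and not confirmed anywhere in this thread: if retraining replaces the labeler's label set with only the labels present in the training CSV, the model has no Phone Number, SSN, or Email class left to predict, so those columns would be forced into one of the remaining labels. The sketch below uses only the standard library and does not call DataProfiler. The label names and both helper functions (`merge_labels`, `dropped_labels`) are hypothetical, chosen only to illustrate checking which labels a retrained label set no longer covers.

```python
# Hypothetical label names for illustration only. The real label set should be
# read from the loaded labeler, not hard-coded like this.
DEFAULT_LABELS = ["PAD", "UNKNOWN", "DATETIME", "PHONE_NUMBER", "SSN", "EMAIL_ADDRESS"]
CUSTOM_LABELS = ["PAD", "UNKNOWN", "NAME", "DATETIME", "NATIONALITY"]


def merge_labels(existing, new):
    """Return the existing labels followed by any new labels not already present."""
    merged = list(existing)
    seen = set(existing)
    for label in new:
        if label not in seen:
            merged.append(label)
            seen.add(label)
    return merged


def dropped_labels(existing, trained):
    """Return labels the original set had that the retrained set lacks."""
    trained_set = set(trained)
    return [label for label in existing if label not in trained_set]


# Labels that would need to be kept (and represented in the training data)
# for the retrained labeler to still recognize them.
missing = dropped_labels(DEFAULT_LABELS, CUSTOM_LABELS)

# A combined label set that keeps the old labels and appends the new ones.
merged = merge_labels(DEFAULT_LABELS, CUSTOM_LABELS)

print("missing from custom label set:", missing)
print("merged label set:", merged)
```

If this assumption holds, a combined label set alone would likely not be enough: the training CSV would also need example rows for the kept labels so the model does not forget them during retraining.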

Thank you!

DylanVig, Jun 25 '24 19:06