data-prep-kit
[Feature] Add license filtering for code modules
Search before asking
- [X] I searched the issues and found no similar issues.
Component
Transforms/code/code_quality
Feature
Add a new module that filters any new code data by permissive license.
Are you willing to submit a PR?
- [x] Yes I am willing to submit a PR!
Needed for release 1
@Bytes-Explorer I caved on proglang_select, but now we're repeating this pattern for a different column name? We can't have a new transform for every attribute we want to annotate or filter on. I recommend one of the following:
- Extend the filter transform so that, instead of removing rows, it can optionally keep all rows and add a new column that is set to True for rows that "would have been filtered". cc: @cmadam
- Create a new transform that generalizes what proglang_select does for an arbitrary column and set of values.
I would strongly recommend one of these alternatives to creating a new special purpose transform. We're a toolkit after all.
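To make the two alternatives concrete, here is a minimal sketch of a single generic helper that covers both: it either drops rows whose column value is not in an allowed set, or keeps every row and adds a flag column. Everything in it is an assumption for illustration: the function name `select_by_values`, the `annotate_only` and `flag_column` parameters, the `license` and `language` column names, and the sample license values are hypothetical and are not part of the existing filter or proglang_select transforms. It also uses plain Python dicts rather than the toolkit's actual table type.

```python
from typing import Any, Iterable

Row = dict[str, Any]


def select_by_values(
    rows: Iterable[Row],
    column: str,
    allowed: Iterable[str],
    annotate_only: bool = False,
    flag_column: str = "would_be_filtered",  # hypothetical column name
) -> list[Row]:
    """Filter or annotate rows by membership of `column` in `allowed`.

    annotate_only=False: return only the rows whose value is allowed.
    annotate_only=True:  return all rows, adding `flag_column`, which is
                         True for rows that would have been removed.
    Matching is case-insensitive; missing or non-string values never match.
    """
    allowed_set = {value.lower() for value in allowed}
    result: list[Row] = []
    for row in rows:
        value = row.get(column)
        keep = isinstance(value, str) and value.lower() in allowed_set
        if annotate_only:
            # Copy the row so the caller's data is not mutated.
            result.append({**row, flag_column: not keep})
        elif keep:
            result.append(row)
    return result


# Illustrative data only; the column names and values are made up.
sample = [
    {"path": "a.py", "language": "Python", "license": "MIT"},
    {"path": "b.c", "language": "C", "license": "GPL-3.0"},
    {"path": "c.go", "language": "Go", "license": None},
]
permissive = ["mit", "apache-2.0"]

filtered = select_by_values(sample, "license", permissive)
annotated = select_by_values(sample, "license", permissive, annotate_only=True)
by_language = select_by_values(sample, "language", ["python", "go"])
```

The same call handles the license case and the programming-language case by changing only `column` and `allowed`, which is the point of generalizing rather than adding one transform per attribute.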
These are two separate functionalities, and we need to think about how an end user will use them. Many transforms may rely on filtering as a final step. However, coupling all of these functionalities together may make the toolkit hard for an end user to use, especially as the number of functionalities grows.
This is in progress in PR #257