data
data copied to clipboard
[Linter] Ability to disable some lints
🚀 The feature
There are several options to disable specific linters.
Option 1. Disable with linter-ignore: code
Pros:
- Similar to known syntax of various linters
Cons:
- Need to modify code of datasets to disable something
datapipe = datapipe.sharding_filter().shuffle() # linter-ignore: shuffle-shard
Option 2. Global & Context disables
Pros:
- Can control datasets without modification of the code
Cons:
- Global might disable important errors
- Context requires additional indent
- Syntax feels weird
- Annoying to disable construct time linters (see below)
from torchdata import linter
linter.disable('shuffle-shard') # global
with linter.disable('shuffle-shard'): # context based
dl = DataLoader2(...)
Option 3. DLv2 argument / ReadingService argument
Pros:
- Local to specific DataLoader
- Can control datasets without modication of the code
Cons:
- Syntax feels weird
- Some linters might trigger/not in various ReadingServices
- Annoying to disable construct time linters (see below)
dl = DataLoader2(dp_graph, [adapter], disable_lint = ['shuffle-shard'])
Option 4. DataPipe 'attribute'
Pros:
- Can be defined by DataSet developer or by the user
- Can impact construct time error handling
Cons:
- Syntax feels weird
datapipe = datapipe.sharding_filter().shuffle().disable_lint('shuffle-shard')
and/or (as we can have an adapter to do the same job)
dl = DataLoader(dp_graph,[DisableLint('shuffle-shard')], ...)
Personally, I prefer the last variant, but I'm open to discussion.
PTAL @ejguan @NivekT
There is another problem with the option 1 (even though IMO the syntax seems the cleanest):
- DL linter would be a runtime linter rather than a static linter. It would be hard to implement such linter to detect comment in the dataset.
- If the targeted
DataPipes
are separated into multiple lines, it's unknown for us where to place the comment to disable the linter.
datapipe = datapipe. sharding_filter()
datapipe = datapipe.shuffle()
And, comparing with the other three options, I prefer the syntax of using Adapter
from the last option: dl = DataLoader(dp_graph,[DisableLint('shuffle-shard')], ...)
But, I am worried about disabling linter via an attribute from DataPipe
. If the Dataset is more complicated like
datapipe = datapipe.sharding_filter().map().shuffle().map().filter()....
, where should developers place the disable_lint()
in the pipeline?. And, do we want to allow developers to disable multiple linters at different places in the pipeline?
How about we always append a linter DataPipe at the end of the pipeline if it's not presented at the end? And, the Adapter
is used to disable certain linters. For Dataset
developers, they could provide the graph of DataPipe
with a Linter
DataPipe appended by explicitly disabling the linters.
And, before iteration, we can drop the linter DataPipe from the graph.
I share the same concern as @ejguan regarding option 1 since we need the linter to work during runtime.
I personally prefer Option 2, specially the Global based rather than context based, since that will allow linter to work outside of DataLoaderV2
(i.e. users call just call linter whenever they want on a DataPipeGraph
. Option 3 can also work but users will not be able to call/disable certain linters wherever they want, and linter can only work when passed into a DataLoaderV2
.