data [Linter] Ability to disable some lints

🚀 The feature

There are several options to disable specific linters.

Option 1. Disable with linter-ignore: code

Pros:

Similar to known syntax of various linters

Cons:

Need to modify code of datasets to disable something

datapipe = datapipe.sharding_filter().shuffle()  # linter-ignore: shuffle-shard

Option 2. Global & Context disables

Pros:

Can control datasets without modification of the code

Cons:

Global might disable important errors
Context requires additional indent
Syntax feels weird
Annoying to disable construct time linters (see below)

from torchdata import linter
linter.disable('shuffle-shard') # global
with linter.disable('shuffle-shard'): # context based
    dl = DataLoader2(...)

Option 3. DLv2 argument / ReadingService argument

Pros:

Local to specific DataLoader
Can control datasets without modication of the code

Cons:

Syntax feels weird
Some linters might trigger/not in various ReadingServices
Annoying to disable construct time linters (see below)

dl = DataLoader2(dp_graph, [adapter], disable_lint = ['shuffle-shard'])

Option 4. DataPipe 'attribute'

Pros:

Can be defined by DataSet developer or by the user
Can impact construct time error handling

Cons:

Syntax feels weird

datapipe = datapipe.sharding_filter().shuffle().disable_lint('shuffle-shard')

and/or (as we can have an adapter to do the same job)

dl = DataLoader(dp_graph,[DisableLint('shuffle-shard')], ...)

Personally, I prefer the last variant, but I'm open to discussion.

Jul 08 '22 17:07 VitalyFedyunin

PTAL @ejguan @NivekT

Jul 08 '22 17:07 VitalyFedyunin

There is another problem with the option 1 (even though IMO the syntax seems the cleanest):

DL linter would be a runtime linter rather than a static linter. It would be hard to implement such linter to detect comment in the dataset.
If the targeted DataPipes are separated into multiple lines, it's unknown for us where to place the comment to disable the linter.

datapipe = datapipe. sharding_filter()
datapipe = datapipe.shuffle()

And, comparing with the other three options, I prefer the syntax of using Adapter from the last option: dl = DataLoader(dp_graph,[DisableLint('shuffle-shard')], ...)

But, I am worried about disabling linter via an attribute from DataPipe. If the Dataset is more complicated like datapipe = datapipe.sharding_filter().map().shuffle().map().filter()...., where should developers place the disable_lint() in the pipeline?. And, do we want to allow developers to disable multiple linters at different places in the pipeline?

How about we always append a linter DataPipe at the end of the pipeline if it's not presented at the end? And, the Adapter is used to disable certain linters. For Dataset developers, they could provide the graph of DataPipe with a Linter DataPipe appended by explicitly disabling the linters.

And, before iteration, we can drop the linter DataPipe from the graph.

Jul 08 '22 18:07 ejguan

I share the same concern as @ejguan regarding option 1 since we need the linter to work during runtime.

I personally prefer Option 2, specially the Global based rather than context based, since that will allow linter to work outside of DataLoaderV2 (i.e. users call just call linter whenever they want on a DataPipeGraph. Option 3 can also work but users will not be able to call/disable certain linters wherever they want, and linter can only work when passed into a DataLoaderV2.

Jul 15 '22 21:07 NivekT

data data copied to clipboard

[Linter] Ability to disable some lints

🚀 The feature

data
data copied to clipboard