pytorch-lightning
Support `str(datamodule)`
🚀 Feature
Add support for a meaningful string representation of a datamodule:
print(str(MyDataModule()))
Motivation
It currently prints:
<__main__.MyDataModule object at 0x10284c970>
Pitch
It could print the DataLoader structure:
MyDataModule(
train_dataloader: {"a": DataLoaderClass(batch_size=8, num_batches=16, num_workers=2), "b": DataLoaderClass(batch_size=2, num_batches=16, num_workers=2)}
val_dataloader: [DataLoaderClass(batch_size=3, num_batches=14, num_workers=0), DataLoaderClass(batch_size=8, num_batches=4, num_workers=0)]
test_dataloader: DataLoaderClass(batch_size=4, num_batches=7, num_workers=2)
)
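A minimal sketch of the pitched output. It assumes a hypothetical `DataLoaderClass` dataclass standing in for a real dataloader, and a plain `MyDataModule` class (not a `LightningDataModule` subclass) whose hooks reuse Lightning's names. `__str__` calls each hook and relies on `repr()` of dicts and lists, which renders every element with its own `repr()`. As a result, dict keys come out with Python's single quotes rather than the double quotes shown above.

```python
from dataclasses import dataclass


@dataclass
class DataLoaderClass:
    """Hypothetical stand-in for a DataLoader, holding only the fields from the pitch."""

    batch_size: int
    num_batches: int
    num_workers: int


class MyDataModule:
    """Hypothetical datamodule; the hook names mirror LightningDataModule."""

    def train_dataloader(self):
        return {"a": DataLoaderClass(8, 16, 2), "b": DataLoaderClass(2, 16, 2)}

    def val_dataloader(self):
        return [DataLoaderClass(3, 14, 0), DataLoaderClass(8, 4, 0)]

    def test_dataloader(self):
        return DataLoaderClass(4, 7, 2)

    def __str__(self) -> str:
        lines = [f"{type(self).__name__}("]
        for hook in ("train_dataloader", "val_dataloader", "test_dataloader"):
            # Call the hook, then let repr() of the dict/list render each element.
            lines.append(f"{hook}: {getattr(self, hook)()!r}")
        lines.append(")")
        return "\n".join(lines)


print(str(MyDataModule()))
```

In a real datamodule, the hooks often depend on `setup()` having run. Calling them from `__str__` may therefore fail, or trigger work, if the object is printed too early.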
Alternatively, it could print the number of batches per dataloader, similar to what was done in https://github.com/PyTorchLightning/pytorch-lightning/issues/5965
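For the batch-count alternative, here is a minimal sketch of how a count can be derived. It assumes a map-style dataset with automatic batching, where PyTorch's `BatchSampler` yields ceil(n / batch_size) batches, or floor(n / batch_size) with `drop_last=True`. `num_batches` is a hypothetical helper. With a real `DataLoader` over a map-style dataset, `len(loader)` already reports this count.

```python
def num_batches(dataset_len: int, batch_size: int, drop_last: bool = False) -> int:
    """Number of batches for a map-style dataset (hypothetical helper)."""
    if drop_last:
        # The final incomplete batch is discarded.
        return dataset_len // batch_size
    # Ceiling division: a final partial batch still counts as one batch.
    return (dataset_len + batch_size - 1) // batch_size


# Batch counts per loader for a nested structure like the one in the pitch.
print({"a": num_batches(128, 8), "b": num_batches(32, 2)})  # {'a': 16, 'b': 16}
```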
Alternatives
Open to other ideas
cc @kingyiusuen
I could take care of this 👍
I am happy to let @Abelarm take it :)
Hi guys, I am currently stuck choosing between:

and

which are not consistent in how they print.
The problem is the `str()` of the dict :(
Do you have any ideas, or is one of the two solutions good enough?
If you really want the dict keys to be wrapped in double quotes (""), I can do it, but it won't be the nicest solution.
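The quoting behaviour comes from how Python renders containers. `str()` of a dict or list calls `repr()` on every key and value, so string keys get single quotes and values print through their `__repr__`. An object that customises only `__str__` therefore still shows up as `<... object at 0x...>` inside a container. The minimal sketch below uses hypothetical classes `OnlyStr` and `WithRepr`, plus a hypothetical helper `format_dict` that produces double-quoted keys via `json.dumps`.

```python
import json


class OnlyStr:
    """Hypothetical class that customises __str__ only."""

    def __str__(self) -> str:
        return "OnlyStr()"


class WithRepr:
    """Hypothetical class that customises __repr__ (str() falls back to it)."""

    def __repr__(self) -> str:
        return "WithRepr()"


def format_dict(d: dict) -> str:
    """Hypothetical helper: double-quoted string keys, repr()-ed values."""
    parts = []
    for key, value in d.items():
        # json.dumps wraps a str in double quotes; other keys fall back to repr().
        key_text = json.dumps(key) if isinstance(key, str) else repr(key)
        parts.append(f"{key_text}: {value!r}")
    return "{" + ", ".join(parts) + "}"


print(str({"a": WithRepr()}))  # {'a': WithRepr()}
print(str({"a": OnlyStr()}))  # the value shows as <...OnlyStr object at 0x...>
print(format_dict({"a": WithRepr()}))  # {"a": WithRepr()}
```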
Hey @Abelarm! You can open a draft PR so we can check your current implementation and discuss it.
In the spirit of https://docs.python.org/3.4/reference/datamodel.html#object.repr:
"If at all possible, this should look like a valid Python expression that could be used to recreate an object with the same value (given an appropriate environment)."
I recommend:
- keeping the quotes around dict keys but not dict values
- using an `=` after the name of initialization parameters instead of a `:`
Following these recommendations, @Abelarm's test expression would become:
MyDataModule(
train_dataloader={"a": DataLoaderClass(batch_size=8, num_batches=16, num_workers=2), "b": DataLoaderClass(batch_size=2, num_batches=16, num_workers=2)},
val_dataloader=[DataLoaderClass(batch_size=3, num_batches=14, num_workers=0), DataLoaderClass(batch_size=8, num_batches=4, num_workers=0)],
test_dataloader=DataLoaderClass(batch_size=4, num_batches=7, num_workers=2),
)
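A minimal sketch of the `=` variant as a `__repr__` whose output can be passed back to `eval()`. It assumes a hypothetical `MyDataModule` that receives its loaders as constructor arguments, and a hypothetical `DataLoaderClass` dataclass. `@dataclass` generates a `name=value` `__repr__` automatically. `repr()` of the nested dict keeps quotes around the keys but not around the values, and the trailing comma is still valid Python.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class DataLoaderClass:
    """Hypothetical stand-in for a DataLoader; @dataclass supplies __repr__ and __eq__."""

    batch_size: int
    num_batches: int
    num_workers: int


class MyDataModule:
    """Hypothetical datamodule that receives its loaders in the constructor."""

    def __init__(self, train_dataloader: Any = None, val_dataloader: Any = None,
                 test_dataloader: Any = None) -> None:
        self.train_dataloader = train_dataloader
        self.val_dataloader = val_dataloader
        self.test_dataloader = test_dataloader

    def __repr__(self) -> str:
        # !r renders nested dicts/lists via repr(): keys keep quotes, values do not.
        args = ",\n".join(
            f"    {name}={getattr(self, name)!r}"
            for name in ("train_dataloader", "val_dataloader", "test_dataloader")
        )
        return f"{type(self).__name__}(\n{args},\n)"


dm = MyDataModule(
    train_dataloader={"a": DataLoaderClass(8, 16, 2), "b": DataLoaderClass(2, 16, 2)},
    val_dataloader=[DataLoaderClass(3, 14, 0), DataLoaderClass(8, 4, 0)],
    test_dataloader=DataLoaderClass(4, 7, 2),
)
print(repr(dm))
# The "appropriate environment" from the data model docs: the class names in scope.
clone = eval(repr(dm), {"MyDataModule": MyDataModule, "DataLoaderClass": DataLoaderClass})
```

The round trip only works because every value has an `eval`-able `repr()`. A real `DataLoader` wrapping a dataset would not satisfy this, so the output would at best *look like* a Python expression.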
Hey @carmocca,
I believe adding support for `str()` has the same drawbacks as adding `len()` did.
It might be worth considering a `describe` method on `LightningDataModule` instead.
Best, T.C
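A minimal sketch of such a `describe` method. It assumes the method returns the number of batches per dataloader as a nested dict/list that mirrors what each hook returns. `describe` and `_lengths` are hypothetical names, not an existing `LightningDataModule` API. Plain `range` objects stand in for real dataloaders, since only `len()` is used. Because `describe` is an explicit call, `str()` and `repr()` keep their default behaviour.

```python
from typing import Any, Dict


def _lengths(loaders: Any) -> Any:
    """Map each loader in a (possibly nested) dict/list to len(loader)."""
    if isinstance(loaders, dict):
        return {key: _lengths(value) for key, value in loaders.items()}
    if isinstance(loaders, (list, tuple)):
        return [_lengths(value) for value in loaders]
    return len(loaders)


class MyDataModule:
    """Hypothetical datamodule; ranges stand in for dataloaders of N batches."""

    def train_dataloader(self):
        return {"a": range(16), "b": range(16)}

    def val_dataloader(self):
        return [range(14), range(4)]

    def test_dataloader(self):
        return range(7)

    def describe(self) -> Dict[str, Any]:
        # Only computed on request, so printing the object stays side-effect free.
        return {
            hook: _lengths(getattr(self, hook)())
            for hook in ("train_dataloader", "val_dataloader", "test_dataloader")
        }


print(MyDataModule().describe())
```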
The main reason for reverting `len` was its impact on existing truthiness checks. That should not be a problem for `str`.
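To illustrate the truthiness point: when a class defines `__len__` but not `__bool__`, Python's truth testing falls back to `__len__`. An object that reports zero length therefore becomes falsy, and a check like `if datamodule:` changes meaning. Defining `__str__` has no such effect. The classes below are hypothetical and only demonstrate this language rule. The zero-length case is an assumed example of how existing checks could break.

```python
class WithLen:
    """Hypothetical object that defines __len__ (and no __bool__)."""

    def __init__(self, n: int) -> None:
        self.n = n

    def __len__(self) -> int:
        return self.n


class WithStr:
    """Hypothetical object that only customises __str__."""

    def __str__(self) -> str:
        return "WithStr()"


def is_set(obj) -> bool:
    # The kind of truthiness check existing code may contain, e.g. `if datamodule:`.
    return bool(obj)


print(is_set(WithLen(0)))  # False: truth testing falls back to __len__
print(is_set(WithLen(4)))  # True
print(is_set(WithStr()))  # True: __str__ plays no part in truth testing
```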
@ananthsub, do you think the other points you raised in https://github.com/PyTorchLightning/pytorch-lightning/issues/5965#issuecomment-948862064 are reason enough to drop this feature? We would still have the initialization problem.
In my PR I already go with `:` instead of `=`, but I am struggling to add double quotes ("") around the dict keys :(
It seems like this feature is still not implemented. Would it be possible to work on this issue?