Check compatibility of license of each entry with the original dataset license

Open tvercaut opened this issue 7 years ago • 5 comments

As per #1, CC-BY was chosen as the default licence for the model zoo entries. However, this might not be compatible with the licence of the training dataset that was used to compute the weights.

OASIS, for example, has a permissive CC-BY licence (https://www.oasis-brains.org/#access) but adds citation requirements that are not currently met in https://github.com/NifTK/NiftyNetModelZoo/tree/5-reorganising-with-lfs/OASIS

We need to check each entry individually.

  • What does the BRATS license say?
  • The VISCERAL paper mentions a "license agreement that assured the use of the data in its given environment and for its research purpose". We currently do not mention any non-commercial restriction.
  • etc.

tvercaut avatar Oct 14 '18 14:10 tvercaut

For OASIS, there's an additional license file included in the .tar.gz. For BRATS, it's a few volumes extracted from the original set; I have contacted Spyros, and he agreed that we can host these volumes with a citation to the original papers. I'll double-check the other downloadables...

wyli avatar Oct 16 '18 09:10 wyli

Thanks. Note that it's not only about the data but also about the pre-trained weights, as these might be considered a derived work. I'm not 100% sure about it, but it would be worth looking into.

Re OASIS, for clarity, we could copy (or point to) the OASIS licence in a README file (in line with the discussion in #6).

tvercaut avatar Oct 16 '18 10:10 tvercaut

@tvercaut, do you have any reference that explains what licenses are needed for machine learning models?

fepegar avatar Oct 16 '18 10:10 fepegar

That is a complex question, and in many cases the answer might depend on the licences under which the training data was released. You will need someone with an actual law background to help navigate these questions, I am afraid.

Even when the training data consists of photographs from, say, ImageNet, Flickr, etc., there are copyright questions. Whether pre-trained weights from there fall under "fair use" (not convinced, but see e.g. https://fairuse.stanford.edu/overview/fair-use/what-is-fair-use/), whether they fall under "databases/fact compilations" (never really looked into these), or whether I am just fantasising (very plausible, but I don't think this has been tested in court yet) is a great question. You will find many Reddit and similar discussions on the topic, e.g.:

  • https://www.reddit.com/r/MachineLearning/comments/7eor11/d_do_the_weights_trained_from_a_dataset_also_come/
  • https://www.reddit.com/r/MachineLearning/comments/4eu2vd/can_pretrained_networks_be_used_in_commercial/
  • https://www.reddit.com/r/MachineLearning/comments/3a24wx/copyright_laws_and_machine_learning_algorithms/
  • https://www.reddit.com/r/MachineLearning/comments/6ss5mw/d_can_i_share_a_model_trained_on_a_non_free/

In short, we won't have a clear-cut answer unless the licence of the original dataset helps us out...

tvercaut avatar Oct 16 '18 17:10 tvercaut

Thanks, Tom! I'll take a look.

fepegar avatar Oct 16 '18 17:10 fepegar