[WIP] a proposal to document all datasets and models
Initial proposal, pretty vague for now but establishing what I think should be done as a first step.
Please provide feedback on the content, not necessarily the form or typos (we'll fix those later on)
I'm especially interested in @eiso's and @vmarkovtsev's opinions, but feel free to provide yours too!
This is to let you know that I am here and struggling to find the time for a proper review. ETA is Friday.
Any news, @vmarkovtsev ?
Damn I am squeezed but will do my best to review ASAP, sorry
I read through the proposal carefully and @campoy and I also discussed it in person. I am a big fan of this approach to information design. It makes it a lot clearer.
One minor comment on the name "predictors": take, for example, "given input code, give me all tests that are likely to be related to it". To me this is not prediction, it's inference.
Good point, I replaced "predictor" with "inferencer", which is probably more accurate.
Any other reviews are welcome. Maybe it's time to discuss this in a meeting?
@vmarkovtsev, based on your really nice/insightful feedback, I feel that you're not rejecting the proposal but want to amend the implementation.
I feel that @campoy's main point here is how we build/present/communicate a mental model of how you build on top of the source{d} stack.

I think at this point it makes sense to have a small meeting about this. I would also like to invite our new Head of Architecture, @smola, to review this proposal.
Yes, this better explains my rationale, thanks Eiso. I really love this graph BTW, and additional :heart: for using Graphviz to plot it.
So what should the next step be? Should I drop this PR and follow the engineering workflow? I'm totally fine with that.
I would go with a DD (template). It is easier to discuss and fight over. Also, this change would require action from the ML team or even Apps, depending on how deep we go.
@vmarkovtsev @campoy I agree with the next steps, and I'm looping in @marnovo here.
FYI, I'm working on going through with this and creating a Design Document for my ideas on the topic. I'll update this issue once I have a draft.
I wrote an initial Design Document. For now it's intentionally very empty.
https://docs.google.com/document/d/1EbwfOd4UpVXCprW-9ApPhX-HN6PXHODVbKj4ajJtDfM/edit?usp=sharing
What do you all think?
@campoy, @dennwc has proposed JSON-LD and the Dataset schema to annotate datasets: https://github.com/src-d/datasets/issues/51
More info: https://developers.google.com/search/docs/data-types/dataset
It seems like a nice solution compared to inventing our own format and schema.
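For reference, here is a minimal sketch of what such an annotation could look like when generated from Python. It only uses standard schema.org `Dataset` and `DataDownload` properties; the `dataset_jsonld` helper and every value passed to it (name, URLs, license) are hypothetical placeholders, not taken from any of our actual datasets.

```python
import json


def dataset_jsonld(name, description, url, license_url, download_url, encoding_format):
    """Build a schema.org Dataset description as a JSON-LD-compatible dict.

    Hypothetical helper for illustration; not part of any existing tooling.
    """
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "license": license_url,
        # Each downloadable file is described as a DataDownload entry.
        "distribution": [
            {
                "@type": "DataDownload",
                "encodingFormat": encoding_format,
                "contentUrl": download_url,
            }
        ],
    }


if __name__ == "__main__":
    # All values below are placeholders, not a real dataset.
    doc = dataset_jsonld(
        name="example-dataset",
        description="Placeholder description of an example dataset.",
        url="https://example.org/datasets/example",
        license_url="https://example.org/licenses/example",
        download_url="https://example.org/datasets/example.csv",
        encoding_format="text/csv",
    )
    print(json.dumps(doc, indent=2))
```

The printed JSON could then be stored next to each dataset or embedded in its landing page, so we would not have to maintain a custom metadata format.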
I am back to this, finally. Reading and editing the document.
I am frozen: there are some urgent blocking tasks in style-analyzer.
I am back to life. Actually, I have already written most of my thoughts and the document is ready for @campoy's and @marnovo's review.
Commented on the doc