
[WIP] a proposal to document all datasets and models

Open campoy opened this issue 8 years ago • 18 comments

Signed-off-by: Francesc Campoy [email protected]

campoy avatar Mar 09 '18 21:03 campoy

Initial proposal, pretty vague for now but establishing what I think should be done as a first step.

Please provide feedback on the content, not necessarily the form or typos (we'll fix those later on)

Especially interested in @eiso's and @vmarkovtsev's opinions, but feel free to provide yours too!

campoy avatar Mar 09 '18 21:03 campoy

This is to notify that I am here and struggling to find the time for a proper review. ETA is Friday.

vmarkovtsev avatar Mar 14 '18 11:03 vmarkovtsev

Any news, @vmarkovtsev ?

campoy avatar Mar 20 '18 18:03 campoy

Damn I am squeezed but will do my best to review ASAP, sorry

vmarkovtsev avatar Mar 20 '18 19:03 vmarkovtsev

I read through the proposal carefully and @campoy and I also discussed it in person. I am a big fan of this approach to information design. It makes it a lot clearer.

A minor comment on the name "predictors": take, for example, "given input code, give me all tests that are likely to be related to it." To me this is not prediction, it's inference.

eiso avatar Mar 21 '18 11:03 eiso

Good point, I replaced "predictor" with "inferencer", which is probably more accurate. Any other reviews are welcome. Maybe it's time to discuss this in a meeting?

campoy avatar Mar 21 '18 18:03 campoy

@vmarkovtsev based on your really nice and insightful feedback, I feel that you're not rejecting the proposal but want to amend the implementation.

I feel that @campoy's main point here is how we build, present, and communicate a mental model of how you build on top of the source{d} stack.

image

I think at this point it makes sense to have a small meeting about this. I would also like to invite our new Head of Architecture to review this proposal @smola

eiso avatar Mar 23 '18 11:03 eiso

Yes, this better explains my rationale, thanks Eiso. I really love this graph BTW, and additional :heart: for using Graphviz to plot it.

vmarkovtsev avatar Mar 23 '18 15:03 vmarkovtsev

So what should the next step be? Should I drop this PR and follow the engineering workflow? I'm totally fine with that

campoy avatar Apr 04 '18 18:04 campoy

I would go with a DD (template). It is easier to discuss and argue over. Also, this change would require action from the ML team or even Apps, depending on how deep we go.

vmarkovtsev avatar Apr 04 '18 19:04 vmarkovtsev

@vmarkovtsev @campoy agree with next steps and looping in @marnovo here.

eiso avatar Apr 06 '18 18:04 eiso

FYI, I'm working on going through with this and creating a Design Document for my ideas on the topic. I'll update this issue once I have a draft.

campoy avatar Apr 09 '18 21:04 campoy

I wrote an initial Design Document. For now it's very empty intentionally.

https://docs.google.com/document/d/1EbwfOd4UpVXCprW-9ApPhX-HN6PXHODVbKj4ajJtDfM/edit?usp=sharing

What do you all think?

campoy avatar Apr 10 '18 01:04 campoy

@campoy @dennwc has proposed JSON-LD and the Dataset schema to annotate datasets: https://github.com/src-d/datasets/issues/51

More info: https://developers.google.com/search/docs/data-types/dataset

It seems like a nice alternative to designing our own format and schema.
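
For illustration, here is a minimal sketch of what such an annotation could look like, assuming the schema.org `Dataset` and `DataDownload` types. The helper `dataset_jsonld`, the dataset name, and all URLs are hypothetical placeholders, not anything taken from the proposal or the linked issue.

```python
import json


def dataset_jsonld(name, description, url, license_url, download_url, encoding_format):
    """Build a schema.org Dataset description as a JSON-LD dictionary.

    Hypothetical helper: only the property names come from schema.org.
    """
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "license": license_url,
        # Each distribution entry describes one downloadable form of the data.
        "distribution": [
            {
                "@type": "DataDownload",
                "encodingFormat": encoding_format,
                "contentUrl": download_url,
            }
        ],
    }


# Placeholder values, chosen only to make the example self-contained.
doc = dataset_jsonld(
    name="example-dataset",
    description="A placeholder dataset used to illustrate the annotation.",
    url="https://example.com/datasets/example-dataset",
    license_url="https://creativecommons.org/licenses/by/4.0/",
    download_url="https://example.com/datasets/example-dataset.tar.gz",
    encoding_format="application/gzip",
)

# Serialize to the JSON text that would be published alongside the dataset.
print(json.dumps(doc, indent=2))
```

The output is plain JSON, so it could live next to each dataset or be embedded in a page inside a `<script type="application/ld+json">` element.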

smola avatar Apr 26 '18 07:04 smola

I am back to this, finally. Reading and editing the document.

vmarkovtsev avatar Nov 05 '18 13:11 vmarkovtsev

I am frozen, there are some urgent blocking tasks in style-analyzer

vmarkovtsev avatar Nov 21 '18 10:11 vmarkovtsev

I am back to life. Actually, I have already written most of my thoughts and the document is ready for @campoy's and @marnovo's review.

vmarkovtsev avatar Nov 22 '18 18:11 vmarkovtsev

Commented on the doc

campoy avatar Nov 26 '18 21:11 campoy