dvc.org icon indicating copy to clipboard operation
dvc.org copied to clipboard

Document dvcfs API

Open dberenbaum opened this issue 3 years ago • 8 comments

Report

Dvcfs can be a useful public api that is more flexible than the methods in the current dvc api. Some use cases for it:

  • It can support walking through a repo or directory and getting the url or opening the files inside.
  • It can also be used to pass arbitrary credentials or other configuration to connect to remote storage.

dberenbaum avatar Sep 07 '22 14:09 dberenbaum

Have we considered simplifying API refs and moving the details to auto-generated sites like https://docs.iterative.ai/dvc-task/reference/dvc_task/ ?

jorgeorpinel avatar Sep 08 '22 03:09 jorgeorpinel

@jorgeorpinel, that’s a ā€œDeveloper docsā€, not a user facing one. It does not make sense to me to redirect to a different site for what is part of dvc’s offering.

But I am not against auto-generation if the engine can support.

skshetry avatar Sep 08 '22 03:09 skshetry

We had that discussion for dvc.api and decided against it (primarily bc team wanted to keep docstrings really simple - e.g. no full examples like we have, etc, etc). Do we plan in this case bring those into docstring?

shcheklein avatar Sep 08 '22 03:09 shcheklein

We do have docstrings now in dvc.api thanks to @daavoo. Docstrings are useful when you are working in IDEs/Editors. Personally, I wouldn't want to leave my text-editor/REPL to docs as much as possible because it involves context switching from the task I have in hand. So I prefer docstrings.

But I also think the docs have a different responsibility compared to docstrings, such as introducing the dvcfs to new users, suggesting scenarios that it can solve, etc that @dberenbaum mentioned above. That said, we also need an API reference too in the docs (which could be generated from the docstrings).

skshetry avatar Sep 08 '22 06:09 skshetry

Yes, I was talking specifically about API reference. If you feel that docstring that we have (and plan to have for dvcfs) cover enough of a doc like this https://dvc.org/doc/api-reference/get_url , we should def auto generate. Since docstrings are 100% needed (I would never want to jump into browser also).

shcheklein avatar Sep 08 '22 06:09 shcheklein

simplifying API refs and moving the details to auto-generated sites like https://docs.iterative.ai/dvc-task/reference/dvc_task/ ?

that’s a ā€œDeveloper docsā€, not a user facing one

Not sure it's a meaningful distinction @skshetry: The user is a "developer" here (API ref).

If the sites in docs.iterative.ai just repeat docstrings, do we even need them? Apparently unrelated, but the idea was: if we do need them, then we could leave all the dvc.api/fs formalities (signature, types, arguments, etc.) that can be easily auto-generated there, and only use dvc.org/docs/api-ref for the Description and Examples.

does not make sense to me to redirect to a different site

It's pretty common to see different apps for technical explanations (example) vs. specifications (example). That said if the engine could incorporate auto-generated content that would be nice indeed.

jorgeorpinel avatar Sep 13 '22 08:09 jorgeorpinel

Remaining work here is to have all supported methods linked from the docs:

It makes sense to start small with examples for this PR, but can we hold off closing #3927 until have a reference of supported methods either in the docs or linked from the docs (to an auto-generated docs site)? Compare to the s3fs docs and gcsfs docs. Both of those document the full API of methods they support. Otherwise, we are leaving users to either look through the codebase (without pointing to where they should look) or go to the fsspec AbstractFileSystem docs and use trial and error to find which methods are implemented.

Originally posted by @dberenbaum in https://github.com/iterative/dvc.org/pull/3932#pullrequestreview-1110727121

dberenbaum avatar Sep 19 '22 13:09 dberenbaum

Related discussion: https://github.com/iterative/mlem.ai/issues/171 And initial proposal (with more discussion): https://github.com/iterative/mlem.ai/pull/172

jorgeorpinel avatar Sep 22 '22 04:09 jorgeorpinel

Docstrings were added in https://github.com/iterative/dvc/pull/8332. Now how can we publish some docs from them?

dberenbaum avatar Sep 29 '22 20:09 dberenbaum

Docstrings were added in iterative/dvc#8332. Now how can we publish some docs from them?

That's THE question šŸ˜…

daavoo avatar Sep 30 '22 11:09 daavoo

Is there a plan with plugins to separate dvcfs from dvc?

dberenbaum avatar Sep 30 '22 15:09 dberenbaum

Do you mean to generate something like this? https://docs.iterative.ai/dvc-task/

That could work. We just need to link these sites from DVC docs more, I think.

jorgeorpinel avatar Oct 05 '22 04:10 jorgeorpinel

Yup @jorgeorpinel! The problem here is that dvcfs is one of the few things we haven't made into a separate package, so there's no obvious way to generate that.

dberenbaum avatar Oct 05 '22 15:10 dberenbaum

OK but if the plan is to document it outside of dvc.org maybe we should move this ticket to the core repo.

jorgeorpinel avatar Oct 12 '22 04:10 jorgeorpinel

Added a ticket in https://github.com/iterative/dvc/issues/8428. I'd vote to keep this one also to remember to link to whatever gets generated, but if you prefer to close it to reduce noise, no big deal @jorgeorpinel .

dberenbaum avatar Oct 12 '22 16:10 dberenbaum

@yathomasi @rogermparent technical question - can we use docstring from an external repo as a gatsby node?

shcheklein avatar Oct 12 '22 18:10 shcheklein

Yep! Anything that can be accessed with Node.js can be turned into a Gatsby node. I was thinking of making prism commands exported like this to get around our multi-repo command woes. We can also consume that node in any way we want, including having doc pages display the docstring content alongside some optional extra elaboration.

rogermparent avatar Oct 12 '22 21:10 rogermparent

@rogermparent that sounds cool! and I see that multiple repos need this. It can simplify the docs flow - for certain pages we'll use Python files as a source, not Markdown. Let's create a ticket and discuss during the next call.

shcheklein avatar Oct 12 '22 21:10 shcheklein

The generated reference docs will have the same content as fsspec's docs. What is our goal with the reference docs?

Do we want to expand upon those docs with examples? If so, then example-driven docs might be better than having a reference. I prefer User Guides that help users get things done (and looking at fastapi, they don't seem to have reference docs).

The only issue that I think this will solve is regarding discoverability, and that won't be fixed if we use a separate site for just dvcfs.

skshetry avatar Oct 17 '22 10:10 skshetry

and looking at fastapi, they don't seem to have reference docs).

Not a popular decision, though: https://github.com/tiangolo/fastapi/issues/804 šŸ˜…

daavoo avatar Oct 17 '22 10:10 daavoo

Not a popular decision, though

For a popular project like that, someone will be unhappy, and only few people have commented. And dvcfs is not in the scale of fastapi. I am not against reference docs, but just want to set goals and how we can improve compared to fsspec's existing reference docs (given the same content except for the initializer/constructor).

skshetry avatar Oct 17 '22 11:10 skshetry

and only few people have commented.

Well . . .

Captura de Pantalla 2022-10-17 a las 13 15 36

I am not against reference docs, but just want to set goals and how we can improve compared to fsspec's existing reference docs (given the same content except for the initializer/constructor).

Yes, in this case, I agree that there is too much overlap with the existing fsspec API ref and that we already cover the differences with examples, so I am ok with not duplicating.

daavoo avatar Oct 17 '22 11:10 daavoo

My main concern is that it's unclear which methods from fsspec are implemented by dvcfs.

dberenbaum avatar Oct 17 '22 13:10 dberenbaum

My main concern is that it's unclear which methods from fsspec are implemented by dvcfs.

Maybe I am biased here, but I think "Read only filesystem" may be enough here. Most of the API will raise NotImplementedError or EROFS during runtime.

Also, most of the APIs are implemented in fsspec and only the write-operations are not available during runtime. So this may require us to extend some methods to add docs that they are unavailable.

Anyway, here is the current state of APIs that are unsupported:

  1. mkdir/makedirs/rmdir: are no-ops + optional to implement.

All of the following raises or should raise NotImplementedError/OSError(EROFS):

  1. cp_file/sign/created/modified/rm_file: not extended in dvcfs, default implementation is to raise NotImplementedError.
  2. write_text/pipe_file/put_file/touch -> because open does not support write operations.
  3. copy -> depends on cp_file.
  4. mv -> depends on copy/rm.
  5. rm -> depends on rm_file.
  6. put -> depends on put_file.
  7. pipe -> depends on pipe_file.

skshetry avatar Oct 17 '22 14:10 skshetry

Good points @skshetry. It looks like auto suggestions and docstrings from within the IDE are working well, and between that, the existings fsspec reference, and the "read only filesystem" description, it should be enough. I'm fine to close this, but if someone else feels more is needed, feel free to reopen.

dberenbaum avatar Oct 17 '22 17:10 dberenbaum