dvc.org
                                
                                
                                
                                    dvc.org copied to clipboard
                            
                            
                            
                        Document dvcfs API
Report
Dvcfs can be a useful public api that is more flexible than the methods in the current dvc api. Some use cases for it:
- It can support walking through a repo or directory and getting the url or opening the files inside.
 - It can also be used to pass arbitrary credentials or other configuration to connect to remote storage.
 
Have we considered simplifying API refs and moving the details to auto-generated sites like https://docs.iterative.ai/dvc-task/reference/dvc_task/ ?
@jorgeorpinel, thatās a āDeveloper docsā, not a user facing one. It does not make sense to me to redirect to a different site for what is part of dvcās offering.
But I am not against auto-generation if the engine can support.
We had that discussion for dvc.api and decided against it (primarily bc team wanted to keep docstrings really simple - e.g. no full examples like we have, etc, etc). Do we plan in this case bring those into docstring?
We do have docstrings now in dvc.api thanks to @daavoo. Docstrings are useful when you are working in IDEs/Editors. Personally, I wouldn't want to leave my text-editor/REPL to docs as much as possible because it involves context switching from the task I have in hand. So I prefer docstrings.
But I also think the docs have a different responsibility compared to docstrings, such as introducing the dvcfs to new users, suggesting scenarios that it can solve, etc that @dberenbaum mentioned above. That said, we also need an API reference too in the docs (which could be generated from the docstrings).
Yes, I was talking specifically about API reference. If you feel that docstring that we have (and plan to have for dvcfs) cover enough of a doc like this https://dvc.org/doc/api-reference/get_url , we should def auto generate. Since docstrings are 100% needed (I would never want to jump into browser also).
simplifying API refs and moving the details to auto-generated sites like https://docs.iterative.ai/dvc-task/reference/dvc_task/ ?
thatās a āDeveloper docsā, not a user facing one
Not sure it's a meaningful distinction @skshetry: The user is a "developer" here (API ref).
If the sites in docs.iterative.ai just repeat docstrings, do we even need them? Apparently unrelated, but the idea was: if we do need them, then we could leave all the dvc.api/fs formalities (signature, types, arguments, etc.) that can be easily auto-generated there, and only use dvc.org/docs/api-ref for the Description and Examples.
does not make sense to me to redirect to a different site
It's pretty common to see different apps for technical explanations (example) vs. specifications (example). That said if the engine could incorporate auto-generated content that would be nice indeed.
Remaining work here is to have all supported methods linked from the docs:
It makes sense to start small with examples for this PR, but can we hold off closing #3927 until have a reference of supported methods either in the docs or linked from the docs (to an auto-generated docs site)? Compare to the s3fs docs and gcsfs docs. Both of those document the full API of methods they support. Otherwise, we are leaving users to either look through the codebase (without pointing to where they should look) or go to the fsspec AbstractFileSystem docs and use trial and error to find which methods are implemented.
Originally posted by @dberenbaum in https://github.com/iterative/dvc.org/pull/3932#pullrequestreview-1110727121
Related discussion: https://github.com/iterative/mlem.ai/issues/171 And initial proposal (with more discussion): https://github.com/iterative/mlem.ai/pull/172
Docstrings were added in https://github.com/iterative/dvc/pull/8332. Now how can we publish some docs from them?
Docstrings were added in iterative/dvc#8332. Now how can we publish some docs from them?
That's THE question š
Is there a plan with plugins to separate dvcfs from dvc?
Do you mean to generate something like this? https://docs.iterative.ai/dvc-task/
That could work. We just need to link these sites from DVC docs more, I think.
Yup @jorgeorpinel! The problem here is that dvcfs is one of the few things we haven't made into a separate package, so there's no obvious way to generate that.
OK but if the plan is to document it outside of dvc.org maybe we should move this ticket to the core repo.
Added a ticket in https://github.com/iterative/dvc/issues/8428. I'd vote to keep this one also to remember to link to whatever gets generated, but if you prefer to close it to reduce noise, no big deal @jorgeorpinel .
@yathomasi @rogermparent technical question - can we use docstring from an external repo as a gatsby node?
Yep! Anything that can be accessed with Node.js can be turned into a Gatsby node. I was thinking of making prism commands exported like this to get around our multi-repo command woes. We can also consume that node in any way we want, including having doc pages display the docstring content alongside some optional extra elaboration.
@rogermparent that sounds cool! and I see that multiple repos need this. It can simplify the docs flow - for certain pages we'll use Python files as a source, not Markdown. Let's create a ticket and discuss during the next call.
The generated reference docs will have the same content as fsspec's docs. What is our goal with the reference docs?
Do we want to expand upon those docs with examples? If so, then example-driven docs might be better than having a reference. I prefer User Guides that help users get things done (and looking at fastapi, they don't seem to have reference docs).
The only issue that I think this will solve is regarding discoverability, and that won't be fixed if we use a separate site for just dvcfs.
and looking at
fastapi, they don't seem to have reference docs).
Not a popular decision, though: https://github.com/tiangolo/fastapi/issues/804 š
Not a popular decision, though
For a popular project like that, someone will be unhappy, and only few people have commented. And dvcfs is not in the scale of fastapi. I am not against reference docs, but just want to set goals and how we can improve compared to fsspec's existing reference docs (given the same content except for the initializer/constructor).
and only few people have commented.
Well . . .
I am not against reference docs, but just want to set goals and how we can improve compared to fsspec's existing reference docs (given the same content except for the initializer/constructor).
Yes, in this case, I agree that there is too much overlap with the existing fsspec API ref and that we already cover the differences with examples, so I am ok with not duplicating.
My main concern is that it's unclear which methods from fsspec are implemented by dvcfs.
My main concern is that it's unclear which methods from fsspec are implemented by dvcfs.
Maybe I am biased here, but I think "Read only filesystem" may be enough here. Most of the API will raise NotImplementedError or EROFS during runtime.
Also, most of the APIs are implemented in fsspec and only the write-operations are not available during runtime. So this may require us to extend some methods to add docs that they are unavailable.
Anyway, here is the current state of APIs that are unsupported:
mkdir/makedirs/rmdir: are no-ops + optional to implement.
All of the following raises or should raise NotImplementedError/OSError(EROFS):
cp_file/sign/created/modified/rm_file: not extended in dvcfs, default implementation is to raise NotImplementedError.write_text/pipe_file/put_file/touch-> becauseopendoes not support write operations.copy-> depends oncp_file.mv-> depends oncopy/rm.rm-> depends onrm_file.put-> depends onput_file.pipe-> depends onpipe_file.
Good points @skshetry. It looks like auto suggestions and docstrings from within the IDE are working well, and between that, the existings fsspec reference, and the "read only filesystem" description, it should be enough. I'm fine to close this, but if someone else feels more is needed, feel free to reopen.