
Remove globbing from the spec

Open RubenVerborgh opened this issue 6 years ago • 35 comments

I strongly think that globbing should be removed from the spec.

Reasons for removing

  1. No one really wants globbing. People want cross-file data access, and there are better ways of achieving that. Globbing has always been a hack for accessing data in multiple files. It was not thought through well (see below).

  2. Globbing is expensive on the server-side.

  3. Globbing can lead to denial of service (on server and client).

  4. Whatever can be achieved through globbing, can be achieved as efficiently without. With HTTP/2, there is zero overhead in just going through the files on the client side.

  5. Let's remove it soon before it is actually widely used and implemented.

Reasons for keeping it plus mitigations

  1. A (very low) number of people are using it. Let's upgrade them.

  2. It has been in the spec for many years. That doesn't make it a good idea, and only a low number of apps are using it anyway (see 1).

Conditions

  1. Globbing should NOT be removed until there is a replacement for the functionality it provides.

People who need to have a say in this

@timbl @melvincarvalho

RubenVerborgh avatar Mar 21 '19 19:03 RubenVerborgh

+1

michielbdejong avatar Mar 21 '19 20:03 michielbdejong

:+1: I questioned the need for it a long time ago in https://github.com/solid/solid/issues/116

elf-pavlik avatar Mar 21 '19 20:03 elf-pavlik

+1

dmitrizagidulin avatar Mar 21 '19 20:03 dmitrizagidulin

Eventually it is probably ideal for developer trust and adoption to adopt a Linux-like "never break userspace" policy, and to never make backward-incompatible changes like this. What is in the way of adopting such a policy? (i.e. what other spec features should be considered 'at risk for removal' before a 'v1'?)

Edit: I don't think it's necessary to adopt such a policy today. But maybe by a year from now? I am +1 on this proposal in the interest of applying Occam's Razor to the core of the spec. I just think it's good to trim and apply learnings all at once here instead of continuing a piecemeal feature-removal strategy ad infinitum. AS2 made backward-incompatible changes far too late in the spec's life, IMO, and it actively undercut my ability to get it adopted inside my organization at the time.

gobengo avatar Mar 21 '19 20:03 gobengo

Eventually it is probably ideal for developer trust and adoption to adopt a Linux-like "never break userspace" policy

Agreed: eventually.

What is in the way of adopting such a policy?

W3C standardization would be a good way of going through the spec with scrutiny, and identifying and fixing issues (such as globbing).

RubenVerborgh avatar Mar 21 '19 21:03 RubenVerborgh

-1

This is in use

Consider this a formal objection.

Please by all means work on an HTTP/2 library that could possibly act as a replacement. That would go some way toward me withdrawing this objection.

melvincarvalho avatar Mar 22 '19 15:03 melvincarvalho

More specifically, I think we did discuss this in the past, and globbing is not quite the same as grabbing all the content in a directory. Is that useful? I think there can be an argument for yes and one for no. For example, in an LDPC you get all the file content in a directory, and that's useful. I'm not suggesting those are equivalents; it is more of a meta point.

Also, some stats on how well deployed HTTP/2 is would be handy, as would a proposal on how HTTP/2 could replace globbing. I've had lots of interest in my social network app, in private discussions, including from former Facebook people. And darcy etc. That uses globbing extensively. Could it be replaced? Possibly, but it might take some design and time. Not time I currently have in the next quarter or two.

I'm increasingly deploying Solid servers on all my devices now, including, as of yesterday, my Android phone. So eventually I could see Solid deployed widely, including on IoT. So HTTP/2 usage would be an interesting data point here. We can't just be a Chrome-specific project; we should think about Solid as web servers running everywhere: in your home, your fridge, your watch, your phone etc.

There are a few things I'd like to see fixed and working before tackling this. So my more in-depth answer is: not for now. We could mark it as "revisit", like many of our issues are.

However, speccing out a possible http2 solution seems to be a good idea, and I'd support that proposal.

melvincarvalho avatar Mar 22 '19 16:03 melvincarvalho

@melvincarvalho I'm in full agreement. Nothing will or should be removed until there is a replacement. I planned to explicitly state this in the issue, but forgot; will update now.

RubenVerborgh avatar Mar 22 '19 16:03 RubenVerborgh

Tracking implementation of such a client-side feature in https://github.com/solid/solid/issues/253. Just a small note that we will likely not need anything HTTP/2-specific: HTTP/2 will automatically optimize the sequence of requests.

RubenVerborgh avatar Mar 22 '19 16:03 RubenVerborgh

On pros/cons (from https://github.com/solid/solid/issues/253#issuecomment-475723216)

So let's just be honest about what the pros/cons are. From #145 the OP starts with

  1. No one really wants globbing. People want cross-file data access, and there are better ways of achieving that. Globbing has always been a hack for accessing data in multiple files. It was not thought through well (see below).

I don't pretend to know about this.

  2. Globbing is expensive on the server-side.

Only if you implement it naively. Alternate datastores (or if the datastore is a filesystem, shell out to bash or a C lib instead of globbing in node) can make this an O(1) lookup. This assertion needs more justification.

  3. Globbing can lead to denial of service (on server and client).

See 2. DDOS are a risk no matter what, e.g. by repeatedly getting full directory listings that are huge and taking up all available OS connections. It's not that unique to globbing. Practically, pros would deploy behind a DDOS-protecting middleware that makes this a non-issue.

  4. Whatever can be achieved through globbing, can be achieved as efficiently without. With HTTP/2, there is zero overhead in just going through the files on the client side.

My argument here is meant to analyze if this is true.

At this point I think everyone agrees the 'HTTP/2' mention isn't what's important. Even over HTTP there is no 'zero overhead', but the overhead is probably negligible in the vast majority of near-term scenarios.

  5. Let's remove it soon before it is actually widely used and implemented.

No argument here.


So 1 and 5 are likely good reasons. 4 is a bit of an overstatement ('zero overhead'), but can be rephrased to be just as convincing.


And the scalable solution here will be querying. Globbing is just a poor man's query; let's replace that with a client-side solution, and build proper query interfaces instead.

Totally agree! +1

gobengo avatar Mar 22 '19 18:03 gobengo

Important update: it seems that globbing is much more loosely defined in the spec than how @melvincarvalho intends it. My objection here has been to the loose version; some of your objections might also have been.

So please have a look at https://github.com/solid/solid-spec/pull/148 for a proposal to already narrow down the current definition of globbing.

RubenVerborgh avatar Mar 26 '19 21:03 RubenVerborgh

While I prefer removing globbing altogether, in case it stays, maybe the response could at least use a dataset (quad) representation (TriG, JSON-LD), so that the client knows which graphs / documents each statement came from. Otherwise I don't see how the client could perform updates when it needs to.

elf-pavlik avatar Mar 28 '19 14:03 elf-pavlik

@timbl I noticed that you thumbs-upped this one.

I think this is the first time in living memory that I possibly disagreed with you.

Globbing is in use. I spent months of time and work building apps based on this pattern. If this had been at risk, I would not have started that work, and left it until we had other patterns in place.

My intention was to revive work after the server work had stabilized, for which I have waited patiently.

The main question is on what timeline you would want this. On a longer timeline I could see myself getting behind this, particularly if there are like-for-like replacements. My concern is that there will be unilateral changes to the spec at short notice.

melvincarvalho avatar Mar 28 '19 17:03 melvincarvalho

Whatever can be achieved through globbing, can be achieved as efficiently without. With HTTP/2, there is zero overhead in just going through the files on the client side.

@RubenVerborgh There is a burden of proof for you to prove a number of things. But this one is foundational. So examine the apps that use globbing, and that also includes cimba, and make the case that globbing can do all the things that are done. In fact it needs to be said what the functional requirements are, because globbing is not just used to fetch files. It will be a good conversation and a learning experience, for those that follow, I think. And also, importantly for me, I will get some breathing space to digest the detail of the proposal and assess the timeline, which is the main thing that matters to me. I'd say 3 of our best 5 apps ever have used globbing and solid would not exist without them. Let's get to the bottom of the above, because I suspect there's some fine detail you've missed.

EDIT: or even better, if you feel like you are in superhero mode (which you sometimes are imbued with), why not take a crack at taking one of the apps and porting it to node solid server 5 / HTTP/2? I think such an effort would likely be the ultimate win-win.

melvincarvalho avatar Mar 28 '19 18:03 melvincarvalho

Whatever can be achieved through globbing, can be achieved as efficiently without. With HTTP/2, there is zero overhead in just going through the files on the client side.

@RubenVerborgh There is a burden of proof for you to prove a number of things.

Happy to oblige:

  • a glob is nothing but a concatenation of RDF files in a container
  • can be replicated on the client side by a) fetching the container b) GETting those files individually
  • under HTTP/2, multiple requests have virtually no overhead compared to a single request (that was one of the explicit design goals)
  • only overhead we thus have downstream is receiving the names of files that are not RDF
  • only overhead we thus have upstream is sending the URLs of files that are RDF (but the upstream channel will never be the bottleneck)
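
The steps in this list can be sketched roughly as follows. This is a minimal illustration, not the code in solid/ldp-glob: `clientSideGlob`, `concatenateBodies`, `FetchedDocument`, and the two callbacks are hypothetical names, and how the container listing is parsed is left to the caller (`listRdfMembers` is assumed to return the URLs of the RDF files only).

```typescript
// A fetched resource, reduced to what this sketch needs (hypothetical shape).
interface FetchedDocument {
  url: string;
  body: string;
}

// Both callbacks are assumptions of this sketch, supplied by the caller:
// fetchDocument performs one GET; listRdfMembers extracts the URLs of the
// RDF files from the container listing.
type FetchDocument = (url: string) => Promise<FetchedDocument>;
type ListRdfMembers = (container: FetchedDocument) => string[];

async function clientSideGlob(
  containerUrl: string,
  fetchDocument: FetchDocument,
  listRdfMembers: ListRdfMembers,
): Promise<FetchedDocument[]> {
  // a) Fetch the container listing.
  const container = await fetchDocument(containerUrl);
  // b) GET every RDF member. The requests are started without waiting for
  // one another, so an HTTP/2 connection can multiplex them.
  const memberUrls = listRdfMembers(container);
  return Promise.all(memberUrls.map((url) => fetchDocument(url)));
}

// Naive concatenation of the bodies, in listing order. Prefixes and
// relative IRIs of the individual files are not reconciled here.
function concatenateBodies(documents: FetchedDocument[]): string {
  return documents.map((doc) => doc.body).join("\n");
}
```

Returning the documents separately, rather than only the concatenation, also keeps track of which file each statement came from.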

But this one is foundational. So examine the apps that use globbing, and that also includes cimba, and make the case that globbing can do all the things that are done. In fact it needs to be said what the functional requirements are, because globbing is not just used to fetch files.

Hmm, this is new information.

For all we know (= your earlier statement at https://github.com/solid/solid/issues/253#issuecomment-476838647 and TestGlob at https://github.com/linkeddata/gold/blob/b000d003f9e2aa40e4977839ca063f09435f80c8/server_test.go#L1193), the only implemented functionality is GET /data/* (confirmed by manual inspection of the GOLD code).

I'd say 3 of our best 5 apps ever have used globbing and solid would not exist without them.

They would just client-side loop over all files in the container.

RubenVerborgh avatar Mar 28 '19 23:03 RubenVerborgh

Added PR for removal as well, given that seems to be the demand of most: https://github.com/solid/solid-spec/pull/151 No need to rush.

RubenVerborgh avatar Mar 28 '19 23:03 RubenVerborgh

Nothing will or should be removed until there is a replacement.

Replacement at https://github.com/solid/ldp-glob; live demo at https://solid.github.io/ldp-glob/demo.html?https://drive.verborgh.org/public/

RubenVerborgh avatar Mar 28 '19 23:03 RubenVerborgh

Replacement at solid/ldp-glob; live demo at solid.github.io/ldp-glob/demo.html?https://drive.verborgh.org/public

@RubenVerborgh thanks for taking the time to create this. It's in the first place rather difficult to evaluate whether this is a like for like replacement, as it doesn't even have a README. I have had a very quick look at it, but will take some more time to do so.

I've re-added the on-hold tag, as I would like to discuss this over a longer period of time. Would appreciate it if you didn't unilaterally remove it. Cheers!

melvincarvalho avatar Mar 29 '19 06:03 melvincarvalho

It's in the first place rather difficult to evaluate whether this is a like for like replacement, as it doesn't even have a README.

It's just 9 lines, so I figured it would be overkill to turn it into a lib. Went through it with @timbl and works for its purpose.

I've re-added the on-hold tag, as I would like to discuss this over a longer period of time. Would appreciate it if you didn't unilaterally remove it.

on-hold is for things that are technically blocked. There are no technical blockers on this issue. I understand you don't have time, but that is not a technical blocker. So please remove that label and only use it when one technical issue needs to be resolved before another.

RubenVerborgh avatar Mar 29 '19 06:03 RubenVerborgh

Went through it with @timbl and works for its purpose

Citation required. Would appreciate to see the context, or better still, hear from Tim himself. Pain I know, but the bar for changing specs is necessarily high.

melvincarvalho avatar Mar 29 '19 06:03 melvincarvalho

Citation required.

That's it right there. No need to doubt my word.

Would appreciate to see the context

It's a private conversation that I hence cannot share.

Assigned the issues to @timbl, and will ping him to take a look.

RubenVerborgh avatar Mar 29 '19 07:03 RubenVerborgh

Discussed out of band with @melvincarvalho: I agree that https://github.com/solid/solid-spec/pull/148 and https://github.com/solid/solid-spec/pull/151 should be on-hold; I propose for this issue to not be on-hold (since it is not being blocked) so people can discuss.

RubenVerborgh avatar Mar 29 '19 08:03 RubenVerborgh

@NoelDeMartin Since you are using globbing in Solid Focus you should be aware of this

angelo-v avatar Apr 07 '19 10:04 angelo-v

@angelo-v thanks for the heads up.

When it comes to my use case, the spec is already compatible with the things I want to do. I'm only using globbing because there is no support for SPARQL in the node-solid-server implementation, as is being tracked in this issue: https://github.com/solid/node-solid-server/issues/962

NoelDeMartin avatar Apr 07 '19 18:04 NoelDeMartin

@NoelDeMartin do you think it is possible to replace your current use of globbing with the client-side replacement @RubenVerborgh shared in https://github.com/solid/solid-spec/issues/145#issuecomment-477812829 ?

elf-pavlik avatar Apr 08 '19 12:04 elf-pavlik

@elf-pavlik Yes it is possible, assuming the server uses HTTP/2 as @RubenVerborgh mentions. If it doesn't it's still possible but the performance won't be great.

NoelDeMartin avatar Apr 08 '19 18:04 NoelDeMartin

If it doesn't it's still possible but the performance won't be great.

It's quite alright as long as there are not hundreds of RDF files (and there usually never are). All the rest is premature optimization 😉

RubenVerborgh avatar Apr 08 '19 19:04 RubenVerborgh

@RubenVerborgh Well, considering I'm building a task manager there will probably be hundreds of files :). But yeah, I can live with that for the time being (and there is always HTTP/2).
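
For the hundreds-of-files case on a server without HTTP/2, one possible mitigation is to cap how many requests are in flight at once. The helper below is only a sketch under that assumption: `mapWithConcurrency` is a hypothetical name, not part of any Solid library, and a sensible `limit` would have to be chosen per deployment.

```typescript
// Runs `worker` over all items with at most `limit` calls in flight at once.
// Results are returned in the same order as the input items.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0; // index of the next item that has not been claimed yet

  // Each lane repeatedly claims the next unclaimed item until none are left.
  async function runLane(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index]);
    }
  }

  const laneCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: laneCount }, () => runLane()));
  return results;
}
```

A client could pass the list of file URLs as `items` and a function performing one GET as `worker`.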

NoelDeMartin avatar Apr 08 '19 19:04 NoelDeMartin

Well, considering I'm building a task manager there will probably be hundreds of files :).

And every task is a file? In that case, yes.

RubenVerborgh avatar Apr 08 '19 19:04 RubenVerborgh

and there is always HTTP/2

I can't see any reason why any Solid server would not use HTTP/2. I run NSS behind nginx and it just takes listen 443 ssl http2; to have HTTP/2 enabled. I think that if NSS doesn't have it already, it should have a config option to use Node's native HTTP/2. Enabling HTTP/2 should really add no extra effort to deployment of NSS.

elf-pavlik avatar Apr 08 '19 19:04 elf-pavlik