Content-Type - insisting on it as a MUST will create a barrier to adoption

Open emmettownsend opened this issue 5 years ago • 21 comments

https://github.com/solid/specification/blob/0a5c1a2b19c4a80429177e8527811b68fcaa87d3/main/protocol.bs#L37

The Solid specification currently changes the Content-Type requirement from SHOULD to MUST. This is of course cleaner and safer. However, it is also a change to the HTTP specification. The relevant text from RFC 7231 is shown below.

A sender that generates a message containing a payload body SHOULD generate a Content-Type header field in that message unless the intended media type of the enclosed representation is unknown to the sender. If a Content-Type header field is not present, the recipient MAY either assume a media type of "application/octet-stream" ([RFC2046], Section 4.5.1) or examine the data to determine its type.

Changing this to a MUST means that a lot of code written over the past couple of decades by millions of developers cannot be reused in the Solid world without modification. This will have an impact on the adoption of Solid across the world. Much of that code depends on the fact that HTTP servers do not enforce a MUST and often examine the data to determine the type. If we want to make it as easy as possible to reuse existing code, so that the transition to Solid-based apps is encouraged, then I suggest that this decision is counterproductive.
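The lenient recipient behavior that RFC 7231 permits can be sketched as follows. This is a minimal illustration, not code from any Solid server: the function name `assumedMediaType` and the plain-object header map are assumptions made for the example, and only the first option in the RFC text (assume application/octet-stream) is shown.

```javascript
// Hypothetical helper: returns the media type a lenient recipient would
// assume for a payload, following the MAY clause quoted from RFC 7231.
function assumedMediaType(headers) {
  // Header names are case-insensitive, so normalize before the lookup.
  const entry = Object.entries(headers)
    .find(([name]) => name.toLowerCase() === 'content-type');
  const value = entry ? String(entry[1]).trim() : '';
  // No usable Content-Type: fall back to the generic binary type.
  return value === '' ? 'application/octet-stream' : value;
}

console.log(assumedMediaType({}));                              // application/octet-stream
console.log(assumedMediaType({ 'Content-Type': 'text/turtle' })); // text/turtle
```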

emmettownsend avatar Nov 25 '20 16:11 emmettownsend

This MUST was put there at the request of @timbl (and with consensus from several others). The reasoning is that data without a content type is not meaningful and is, in almost all cases, an error. We do not know what to do with it, so we should reject it. In that sense, it makes things easier, because it gives an error when something goes wrong, as opposed to silently failing and causing problems down the line.

Clarification (by me): application/octet-stream is meaningless and cannot be amended later by the client (without another addition that goes beyond HTTP). Content type sniffing has security consequences.

In general, I don't think the notion of going beyond HTTP works as an argument for there being a problem. When a write operation such as PUT, POST, or PATCH is being performed, it likely involves authentication as well, which already goes beyond HTTP, so the millions of developers' code cannot be reused in any case. Asking for a content type is even minor in that regard; plus it would be hard to find a library that does not support it.

RubenVerborgh avatar Nov 25 '20 16:11 RubenVerborgh

https://github.com/solid/specification/issues/70#issuecomment-547924171

We found it is advantageous to avoid lack of clarity in content types, since a fallback to defaults like application/octet-stream would result in that apps cannot determine the content type, and therefore not present a suitable UX for it.

Perhaps not the best summary of the discussion of the thread and the meeting..

Put differently, if server defaults to application/octet-stream (or text/plain) from one client, another client can't realistically be expected to process it and provide a useful UI. Likely a miss.

The expectation is that sender should know what it is sending, and if not, application/octet-stream is perfectly fine to send - at least a valid field-value (media type) for Content-Type. In that case, sender understands the consequences in that a recipient application may not necessarily be able to use it in a meaningful way - besides downloading the representation to local environment.

csarven avatar Nov 25 '20 17:11 csarven

This issue is invalid as it is nonsense as phrased. The Solid spec does not CHANGE the http spec at all. It refers to it and calls out a small subset of it. Goodness how many things are allowed in HTTP which Solid does not allow.

This particular restriction improves the simplicity and security of Solid systems. It does not prevent non-Solid systems from doing whatever they like.

We can discuss whether to change the Solid spec. Here is an issue where that was done a year ago. Let's not repeat that conversation here without reading it first:

https://github.com/solid/specification/issues/70

timbl avatar Nov 25 '20 18:11 timbl

Specifically the point made at https://github.com/solid/specification/issues/70#issuecomment-547924171 (Thanks @csarven )

timbl avatar Nov 25 '20 18:11 timbl

Just a note that NSS currently accepts a PUT without a specified content-type and defaults to text/plain.

jeff-zucker avatar Nov 25 '20 18:11 jeff-zucker

It does? It should not.

timbl avatar Nov 25 '20 18:11 timbl

Yes, I thought it was agreed to send a 4xx when a request didn't include a content-type. I just ran this script and it created local.something with content-type text/plain. Same for any other extension or no extension.

const SolidNodeClient = require('solid-node-client').SolidNodeClient;
const auth = new SolidNodeClient();

let url     = "https://jeff-zucker.solidcommunity.net/public/local.something"
let content = "some content"

async function run(){
  console.log("logging in ...")
  await auth.login()
  let session = await auth.currentSession()
  console.log( `logged  in as <${session.webId}>` )
  // PUT without any Content-Type header, to see how the server handles it
  await auth.fetch( url,{
    method:"PUT",
    body:content
  })
  // Read the resource back and report the content type the server assigned
  let response = await auth.fetch( url, {method:"GET"} )
  let got = await response.text()
  console.log( (content===got ? "ok!" : "fail!") + ` write/read <${url}>` )
  console.log( `got content-type <${response.headers.get('content-type')}>` )
}
run()
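
For comparison, a request that satisfies the MUST only needs an explicit Content-Type header in the options passed to fetch. The sketch below builds those options without touching the network; `putOptions` is a hypothetical helper introduced for illustration, and text/plain is just an example value.

```javascript
// Hypothetical helper: builds fetch options for a PUT that always carries
// an explicit Content-Type, so the server never has to guess.
function putOptions(body, contentType) {
  if (typeof contentType !== 'string' || contentType.trim() === '') {
    throw new TypeError('a non-empty Content-Type is required for writes');
  }
  return {
    method: 'PUT',
    headers: { 'Content-Type': contentType },
    body,
  };
}

const options = putOptions('some content', 'text/plain');
console.log(options.headers['Content-Type']); // text/plain
```

The resulting object could be passed as the second argument to `auth.fetch( url, options )` in the script above.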

jeff-zucker avatar Nov 25 '20 18:11 jeff-zucker

@jeff-zucker which Content-Type is received when you do the GET for the request? (I suspect it's application/octet-stream.)

megoth avatar Nov 25 '20 18:11 megoth

I did indeed read all of the previous thread from the end of 2019.

There was a good debate. And it seemed to be in favour of SHOULD.

Then there was an offline conversation and the outcome of that conversation was presented as the official answer. And it boiled down to the following paragraphs:

"Proposal following F2F meeting with @csarven , @timbl and @kjetilk present of 2019-10-30:

We found it is advantageous to avoid lack of clarity in content types, since a fallback to defaults like application/octet-stream would result in that apps cannot determine the content type, and therefore not present a suitable UX for it.

Users of basic UAs (e.g. curl) should be prevented from skipping the content type, because that may cause subsequent problems for apps using the data.

The cost of requiring clients to submit content type is thus much lower than cost of the requirement on servers and clients to deal with the consequences of wrong or useless content types.

This points towards a strict interpretation, i.e. MUST."

That was followed by an observation from @TallTed which was not taken into consideration from what I can see.

My point still stands regarding the inability to reuse lots of existing client code that assumes Content-Type is a SHOULD.

The debate made perfect sense up until the point where it went offline.

emmettownsend avatar Nov 25 '20 19:11 emmettownsend

@megoth

@jeff-zucker which Content-Type is received when you do the GET for the request? (I suspect it's application/octet-stream.)

Nope, it comes back as text/plain. I edited the script above to print it out.

jeff-zucker avatar Nov 25 '20 19:11 jeff-zucker

@jeff-zucker ok, I thought you might get a different result, since we're observing different behavior with the PodBrowser. Weird that the behavior differs.

megoth avatar Nov 25 '20 19:11 megoth

@emmettownsend -- It is helpful to include links to the specific comments you cite, especially when they occur in long threads. (They're hrefs beneath the timestamp on each comment.) I think that the observation you referred to above is in this comment on #70, but maybe you meant this other comment on #70? You might also quote (via copy/paste) the text of the specific observation, so as to remove all question.


@csarven, @timbl, @kjetilk -- I find it troubling when an offline conversation results in a decision without any opportunity for other interested and previously participating (online) parties to counter whatever arguments were made offline, especially when that offline conversation is not clearly summarized along with presentation of the decision.

That said, I do not think we're actually in disagreement. It seems that the expected behavior is for the receiving software, whether on upload ("client" sender to "server" receiver) or download ("server" sender to "client" receiver), to fall back to application/octet-stream (as the HTTP RFC calls for, and as I will continue to strenuously argue for) when the sending software fails to include a Content-Type, while some people's observed behavior is a fallback to text/plain. The latter appears to signal a bug somewhere; it's not entirely clear to me what "client" software was being used to upload or download, what "server" software was in play, or whether all experimenters (@megoth, @emmettownsend, others?) were using the same software (including version) on either end of either transfer.

Further, I think there's some forgetfulness about what SHOULD actually means in this context. It's just shy of a MUST, allowing implementers who have good reason to accept the consequences of not complying with that SHOULD to not comply. (It's also perfectly reasonable to include the arguments for compliance with it right alongside that SHOULD in the spec -- summarizing all the arguments made here and in #70 -- which ahem SHOULD result in most implementers deciding to comply -- and letting those who have the good reason alluded to earlier to ignore it.) Conformance/compliance testing for SHOULD is also not typically difficult -- it's just like for MUST, except that the result of failure is a WARNING instead of a REJECT. Awareness of non-compliance, and the potential impacts thereof, is sufficient for deployment decision-makers to make their decisions.
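
The difference between testing a SHOULD and testing a MUST, as described above, can be sketched in a few lines. This is an illustration only: `checkContentTypeRequirement` and the result labels are assumptions made for the example, not part of any Solid test suite.

```javascript
// Hypothetical conformance check: the same test runs for both requirement
// levels; only the severity of a failure differs.
function checkContentTypeRequirement(headers, level) {
  const present = Object.keys(headers).some((name) =>
    name.toLowerCase() === 'content-type' &&
    String(headers[name]).trim() !== '');
  if (present) return 'PASS';
  // A failed MUST is rejected; a failed SHOULD is only flagged.
  return level === 'MUST' ? 'REJECT' : 'WARNING';
}

console.log(checkContentTypeRequirement({}, 'SHOULD')); // WARNING
console.log(checkContentTypeRequirement({}, 'MUST'));   // REJECT
```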

TallTed avatar Nov 25 '20 21:11 TallTed

I didn't interpret the conclusion in the same way but if what you are saying is what the group in that meeting meant then I agree with that. It would be good to get clarity.

emmettownsend avatar Nov 25 '20 21:11 emmettownsend

@emmettownsend , and for anyone else that's curious about this:

That was followed by an observation from @TallTed which was not taken into consideration from what I can see.

I'll take the blame for not following up on the issue before or after the PR I made, https://github.com/solid/specification/pull/157 . Note that a significant amount of time had passed between the last comment in issue 70 and the PR, and there was no strong objection - as opposed to requests for clarity and discussion - at either place, or in the chats before the requirement was merged after review. While not unanimous in the CG ( https://www.w3.org/2020/Process-20200915/#def-Unanimity ), which is often difficult to achieve anyway, it is as is; call it rough consensus if you will.

FWIW, I can only attempt to assure you that I have considered Ted's comment, and do take each public contribution (whether in the issues, chat, or elsewhere) on equal grounds, including the ones made in private to me - encouraging people to take it up in public and have it recorded so it is given equal consideration towards a stronger consensus. I've simply made an editorial decision to proceed based on what appeared to be sufficiently representative.

FWIW:

"Other clients will not be able to use it for anything" overstates the impact.

True. This is just loose language to get the point across.

"Other clients" may inspect the payload of .ttl typed as application/octet-stream, discover that it's actually JSON-LD, and take appropriate action.

As there was no significant case where a client needs to inspect payload, the design decision was to not allow that situation to occur in the first place. This generally reflects RFC 7231's guideline for determining intent:

In practice, resource owners do not always properly configure their origin server to provide the correct Content-Type for a given representation, with the result that some clients will examine a payload's content and override the specified type. Clients that do so risk drawing incorrect conclusions, which might expose additional security risks (e.g., "privilege escalation"). Furthermore, it is impossible to determine the sender's intent by examining the data format: many data formats match multiple media types that differ only in processing semantics. Implementers are encouraged to provide a means of disabling such "content sniffing" when it is used.
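
The ambiguity the RFC describes can be seen with a toy sniffer. The sketch below is deliberately tiny and is not a real sniffing algorithm; `plausibleMediaTypes` is a hypothetical function that lists media types whose syntax a string payload satisfies, under the simplifying assumption that any JSON object or array might be intended as JSON-LD.

```javascript
// Hypothetical sniffer: lists media types whose syntax the payload fits.
function plausibleMediaTypes(payload) {
  const candidates = ['text/plain']; // any string can be served as plain text
  try {
    const parsed = JSON.parse(payload);
    candidates.push('application/json');
    // JSON-LD is serialized as JSON, so the same bytes fit both types.
    if (parsed !== null && typeof parsed === 'object') {
      candidates.push('application/ld+json');
    }
  } catch (err) {
    // Not JSON: leave the JSON-based types out.
  }
  return candidates;
}

console.log(plausibleMediaTypes('{"@id": "https://example.org/a"}'));
```

Several candidates come back for the same bytes, so inspection alone cannot tell which one the sender intended.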

As for:

"Making the chance of breakage smaller" sounds to me much more like a SHOULD rule, than a MUST rule, as the latter will tend to lead to expectations that will not necessarily be satisfied.

The spec needs to be reasonably airtight, and lean on MUSTs, rather than SHOULDs or MAYs. The decision - just as any decision, one can raise a why or a counter - was to minimise potential risks as per available guidelines and implementation experience.

csarven avatar Nov 25 '20 21:11 csarven

@megoth - stranger and strangerer - on NSS, PUT without a specified content-type returns text/plain, but POST without a specified content-type returns application/octet-stream. Let's move the implementation part of this discussion to NSS where, apparently, I opened this same issue more than a year ago. https://github.com/solid/node-solid-server/issues/1245#issuecomment-733959234

jeff-zucker avatar Nov 25 '20 21:11 jeff-zucker

So just for clarity...

The decision is not as per @TallTed interpretation?

If the Content-Type is not provided on a PUT and POST then a 400 will always be returned i.e. there will be no fallback?

emmettownsend avatar Nov 25 '20 21:11 emmettownsend

For context: web browsers (or at least, Firefox and Chromium) set the mime type to an empty string when they can't determine the file type in a file selector. Try it out here: https://xuibk.codesandbox.io/

Vinnl avatar Nov 26 '20 08:11 Vinnl

Sender must include a media-type for the Content-Type header as per RFC 7231. The media-type is a non-empty string ( https://tools.ietf.org/html/rfc7231#section-3.1.1.1 ).

While an empty File type value is allowed as per File API if the type cannot be determined ( https://www.w3.org/TR/FileAPI/#dfn-type ), the processing algorithm for the Content-Type header must fail when mimeType is null ( https://fetch.spec.whatwg.org/#content-type-header ).

Sender should either i) omit the Content-Type header in the request, ii) use a fallback (eg. application/octet-stream) media-type for the Content-Type or iii) determine the media-type by other means and use that for the Content-Type header in the request.
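
Options (ii) and (iii) can be combined on the sending side. The sketch below is an assumption-laden illustration: `contentTypeFor` and the small extension table are hypothetical, and guessing from the file extension is only one possible "other means" of determining the media type.

```javascript
// Hypothetical extension table; a real client would use a fuller registry.
const TYPE_BY_EXTENSION = new Map([
  ['ttl', 'text/turtle'],
  ['jsonld', 'application/ld+json'],
  ['txt', 'text/plain'],
]);

// Picks a Content-Type for an upload: the type the File API reported if it
// is non-empty, else a guess from the extension, else the generic fallback.
function contentTypeFor(fileName, reportedType) {
  if (reportedType) return reportedType;
  const ext = fileName.includes('.')
    ? fileName.split('.').pop().toLowerCase()
    : '';
  return TYPE_BY_EXTENSION.get(ext) || 'application/octet-stream';
}

console.log(contentTypeFor('local.something', '')); // application/octet-stream
console.log(contentTypeFor('profile.ttl', ''));     // text/turtle
```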

Edit: The current Solid spec requirement is that if sender does not include the Content-Type header in the request of a write operation, recipient will reject the request (400). For invalid Content-Type field-values, it is within recipient's right to reject the request.
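
On the receiving side, the rule stated above amounts to a small check before a write is processed. This is a sketch under assumptions, not the behavior of any particular server: `contentTypeStatus` is a hypothetical function, the set of write methods is taken from the PUT, POST, and PATCH mentioned earlier in this thread, and treating an empty value like a missing header is a choice made for the example.

```javascript
// Write methods named earlier in the thread.
const WRITE_METHODS = new Set(['PUT', 'POST', 'PATCH']);

// Hypothetical check: returns 400 when a write request lacks a usable
// Content-Type, or null when this rule has no objection to the request.
function contentTypeStatus(method, headers) {
  if (!WRITE_METHODS.has(method.toUpperCase())) return null;
  const entry = Object.entries(headers)
    .find(([name]) => name.toLowerCase() === 'content-type');
  // Assumption: an empty field-value is handled like a missing header.
  const value = entry ? String(entry[1]).trim() : '';
  return value === '' ? 400 : null;
}

console.log(contentTypeStatus('PUT', {}));                               // 400
console.log(contentTypeStatus('PUT', { 'Content-Type': 'text/plain' })); // null
```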

Edit: Web browsers making a request with the Content-Type header with an empty field-value should be considered a bug since they don't conform to the specifications. Unless already recorded, I suggest filing browser bugs.

csarven avatar Nov 26 '20 10:11 csarven

I'm OK with rejection for empty Content-Type, though I prefer fallback to application/octet-stream.

My big complaint earlier, here and elsewhere, was with fallback to anything other than application/octet-stream (e.g., text/plain, text/turtle), presuming no or unsuccessful content sniffing. It now seems clear that the text/plain and text/turtle fallbacks were actual bugs and should be treated as such.

TallTed avatar Nov 29 '20 02:11 TallTed

@emmettownsend , are you content with the technical explanations given as well as the assessment made towards consensus? If you have no formal objection, I'd like to close this issue with "status: Commenter Satisfied" or "Not Satisfied" depending on your response.

csarven avatar Apr 08 '22 10:04 csarven

(It's late in the game, but I'll also refer to Postel's Law, i.e., be generous in what you accept and conservative in what you produce. Accept missing header(s) value(s), handling the payload as best you can for whatever the general purpose. In other words, Solid Server which is acting in any way as a fileserver, not just an RDF triple/quad server, SHOULD accept malformed or missing header values and fall back to treating the payload as application/octet-stream, as I described earlier.)

TallTed avatar Apr 08 '22 17:04 TallTed

Closing this issue at this point. Anyone running into this issue in the future, please refer to https://solidproject.org/ED/protocol and related issues (if any) on the topic presenting new information for consideration to change the requirement.

csarven avatar Aug 27 '23 09:08 csarven