Akshay Sharma

Results: 18 comments by Akshay Sharma

Hello @Gallaecio, I have submitted a proposal for this year's GSoC program. Here is the link to it: https://docs.google.com/document/d/1X9g62mNxYI305nfiAAmYkrOWkQMiaToni7kIWx30TKA/edit?usp=sharing My apologies for showing it to you this late, I wasn't...

From what I understand after looking into https://github.com/scrapy/scrapy/blob/master/scrapy/responsetypes.py, `from_args` is the main function that other Scrapy files rely on for MIME sniffing. I think calling `xtractmime.extract_mime` with different parameters based...

> Related to that, although not achievable simply extending `CLASSES`: the standard taught me that any MIME type ending in `+xml` is to be treated as an XML file, so...
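The `+xml` rule quoted above can be sketched as a small check. This is only a minimal illustration, assuming the WHATWG MIME Sniffing definition of an XML MIME type; `is_xml_mime_type` is a hypothetical helper and not part of Scrapy or xtractmime.

```python
def is_xml_mime_type(mime_type: str) -> bool:
    """Return True if mime_type looks like an XML MIME type.

    Hypothetical helper: follows the WHATWG MIME Sniffing definition
    (essence is text/xml or application/xml, or the subtype ends in +xml).
    """
    # Drop parameters such as "; charset=utf-8" and normalise case.
    essence = mime_type.split(";", 1)[0].strip().lower()
    if essence in ("text/xml", "application/xml"):
        return True
    # Everything after the first "/" is the subtype.
    _, _, subtype = essence.partition("/")
    return subtype.endswith("+xml")
```

A check like this cannot be expressed by adding entries to a fixed lookup table, which is presumably why the quoted comment says it is not achievable by simply extending `CLASSES`.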

What can be the value of the `supported_types` parameter for `extract_mime`? Is that required here or not?

I have added the pre- and post-xtractmime tests with the expected behavior as comments. There may be more failing scenarios; if I find one, I will add it later. Still,...

> E AssertionError: {'headers': {b'Content-Disposition': [b'attachment; filename="data.xml.gz"']}, 'url': 'http://www.example.com/page/'} ==> != This is failing because `mimetypes.MimeTypes()` returns a `text/xml` content type instead of `application/gzip` ``` >>> MimeTypes().guess_type("data.xml.gz") ('text/xml', 'gzip')...
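The behavior above comes from `guess_type` returning a `(type, encoding)` pair: for `data.xml.gz` the type describes the inner file and `gzip` is reported separately as the encoding. A rough sketch of one way to prefer the compressed container type follows; `ENCODING_MIME_TYPES` and `effective_mime_type` are hypothetical names for illustration, not what Scrapy actually does.

```python
from mimetypes import MimeTypes

# Hypothetical mapping from the encoding reported by mimetypes to the
# MIME type of the compressed container. Assumed for this sketch only.
ENCODING_MIME_TYPES = {
    "gzip": "application/gzip",
    "bzip2": "application/x-bzip2",
}


def effective_mime_type(filename):
    """Prefer the container type when mimetypes reports a known encoding."""
    # guess_type returns (type, encoding); encoding is None for plain files.
    mime_type, encoding = MimeTypes().guess_type(filename)
    # Fall back to the guessed type when the encoding is None or unmapped.
    return ENCODING_MIME_TYPES.get(encoding, mime_type)


print(effective_mime_type("data.xml.gz"))
print(effective_mime_type("page.html"))
```

With this approach the encoding, when present, decides the result, and the inner type is used only for files that are not compressed.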

> E AssertionError: {'body': b'\x00\xfe\xff', 'url': 'http://www.example.com/item/', 'headers': {b'Content-Type': [b'text/plain']}} ==> != This is failing because we no longer consider the NULL byte, and xtractmime detects `b"\xfe\xff"` as `text/plain`...

I thought the integration part would be simpler, I was wrong 😅

> Actually, I believe the current NULL byte replacement is too simple. We should only replace if there are no other binary data bytes, and the current approach just replaces...
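One possible reading of the suggestion above, as a minimal sketch: strip NULL bytes only when the body contains no other binary data bytes. The byte ranges follow the WHATWG MIME Sniffing definition of a binary data byte; `strip_null_bytes_if_text` is a hypothetical helper, and since the quoted comment is truncated, the intended rule may differ.

```python
# Binary data bytes as defined by the WHATWG MIME Sniffing standard:
# 0x00-0x08, 0x0B, 0x0E-0x1A and 0x1C-0x1F.
BINARY_DATA_BYTES = frozenset(
    [*range(0x00, 0x09), 0x0B, *range(0x0E, 0x1B), *range(0x1C, 0x20)]
)


def strip_null_bytes_if_text(body: bytes) -> bytes:
    """Drop NULL bytes only when they are the sole binary data bytes.

    Hypothetical helper for illustration; not Scrapy or xtractmime code.
    """
    # Any binary data byte other than NULL means the body is left untouched.
    has_other_binary = any(
        byte != 0x00 and byte in BINARY_DATA_BYTES for byte in body
    )
    if has_other_binary:
        return body
    return body.replace(b"\x00", b"")
```

Under this rule `b"a\x00b"` becomes `b"ab"`, while `b"a\x00\x01b"` is returned unchanged because `0x01` is also a binary data byte.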

I have created a separate PR for the response class computation using mimegroups. Please review https://github.com/akshaysharmajs/scrapy/pull/2/files