Better support for web scraping

Open ajusa opened this issue 7 years ago • 10 comments

Hey, first off, I have been having a lot of fun with this language. It makes me think about what I write, and I haven't had that feeling in years.

One shortcoming I have noticed is the lack of web related tools. You do have a sockets library, however I didn't see an easy way to get the HTML of a website. I feel like it would be nice to have GET and POST built in to the library similar to the way Nim handles them. I honestly feel like the example provided was too much and should be abstracted away by a sigil built into min.

I am personally planning on hooking up nimquery by using a dynamic library. Basically, I want an easy way to get the full text of an HTML page.

ajusa avatar Feb 03 '18 03:02 ajusa

Hello! Glad to hear you are enjoying min!

I see... perhaps a wrapper on nim's httpclient library? I am always trying to keep min... well, minimal! But perhaps that could be a useful addition.

On the other hand, something like nimquery would be a bit too specialized I think, but it could make a great dynamic library of course!

I didn't quite understand what you mean with "the example provided was too much and should be abstracted away by a sigil built into min" -- are you referring to the example in the net module?

h3rald avatar Feb 04 '18 08:02 h3rald

Yes, I was referring to the get request example with httpbin. I feel like get requests (which are always going to happen on port 80) should be a simple function that takes a URL and returns the string of the webpage.

As a scripting language, this is nice for creating a terminal interface for websites without an API
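
To make the request concrete, here is a minimal sketch of the kind of helper described above: one function that takes a URL and returns the page body as a string. It is written in Python with only the standard library, purely as an illustration; the name `get_content` and the `timeout` parameter are hypothetical and say nothing about how min or Nim's httpclient would expose this.

```python
from urllib.request import urlopen


def get_content(url: str, timeout: float = 10.0) -> str:
    """Fetch a URL and return its body decoded as text.

    Hypothetical helper for illustration only; not min's API.
    """
    with urlopen(url, timeout=timeout) as response:
        # Use the charset declared by the response, falling back to UTF-8
        # when none is declared.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Usage would be a single call such as `html = get_content("http://httpbin.org/html")`, which is roughly the one-liner experience being asked for.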

ajusa avatar Feb 04 '18 13:02 ajusa

So, do you think that a wrapper on nim's httpclient library would add too much bloat to Min?

I personally really want it in, but if you don't want it, then I understand.

ajusa avatar Feb 06 '18 02:02 ajusa

Hello again,

I think it would be a useful addition to min! Now I only need to wait for a weekend to implement it ;-)

I'll keep this issue open and I'll give it a shot when I get a chance!

h3rald avatar Feb 06 '18 17:02 h3rald

Okay. I'll also see if I can get it done before.

ajusa avatar Feb 06 '18 20:02 ajusa

Great! Of course pull requests are always welcome :-)

h3rald avatar Feb 06 '18 21:02 h3rald

OK... I managed to put together a small http module. I still have to document it before I can release a new version of min, but it's there if you want to try it out!

h3rald avatar Feb 10 '18 17:02 h3rald

Thanks for adding this! I haven't tried it out yet, but I read through the source and it appears to be exactly what I wanted.

Close this issue if you want, or keep it open as a reminder to document.

ajusa avatar Feb 13 '18 23:02 ajusa

I'm leaving it open for now... I am nearly done with the docs and I also figured I'd add two more operators to start and stop a simple HTTP server -- it may be useful to have an easy way to code a simple API server in min for testing purposes maybe ;)

My preliminary test of wrapping asynchttpserver looks promising, so I'll hopefully release the whole thing today.

h3rald avatar Feb 18 '18 07:02 h3rald

Reopening this, it seems a good idea after all now that min has become more of a "batteries included" language.

h3rald avatar Feb 21 '21 21:02 h3rald

I know it has been... 6 years 🙈 but I started to think about an xml module with the following symbols:

- (s -> xnode)            xparse
- (xnode sl -> s)         xget
- (xnode -> (xnode))      xchildren
- (xnode -> dict)         xattrs
- (xnode -> s)            xtype
- (xnode sl sl -> xnode)  xset
- (xnode sl -> xnode)     xdelete
- (xnode xnode -> xnode)  xpush 
- (xnode -> xnode)        xpop 
- (dict (xnode) -> xnode) xentity
- (s -> xnode)            xcomment
- (s -> xnode)            xtext 
- (s -> xnode)            xcdata 
- (d -> xnode)            xentity
- (xnode -> s)            xstring
- (xnode sl -> xnode)     xquery
- (xnode sl -> (xnode))   xqueryall

Basically leveraging xmltree, parsexml, and nimquery under the hood. Shouldn't be too hard.
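
For readers who want a feel for how a few of these symbols would behave, here is a rough Python sketch using the standard library's `xml.etree.ElementTree` as a stand-in for xmltree, parsexml, and nimquery. The function names mirror the proposed symbols, but the bodies are assumptions made for illustration. In particular, ElementTree queries use a limited XPath subset rather than the CSS selectors nimquery provides, and the input is assumed to be well-formed XML.

```python
import xml.etree.ElementTree as ET


def xparse(source: str) -> ET.Element:
    """(s -> xnode): parse a well-formed XML string into a node."""
    return ET.fromstring(source)


def xattrs(node: ET.Element) -> dict:
    """(xnode -> dict): return a copy of the node's attributes."""
    return dict(node.attrib)


def xchildren(node: ET.Element) -> list:
    """(xnode -> (xnode)): return the node's direct children."""
    return list(node)


def xquery(node: ET.Element, path: str):
    """(xnode sl -> xnode): first match for the path, or None."""
    return node.find(path)


def xqueryall(node: ET.Element, path: str) -> list:
    """(xnode sl -> (xnode)): all matches for the path."""
    return node.findall(path)


def xstring(node: ET.Element) -> str:
    """(xnode -> s): serialize the node back to a string."""
    return ET.tostring(node, encoding="unicode")


doc = xparse('<ul><li class="a">one</li><li class="b">two</li></ul>')
texts = [li.text for li in xqueryall(doc, "li")]    # text of every <li>
second = xattrs(xquery(doc, "li[@class='b']"))      # attributes of one <li>
first_markup = xstring(xchildren(doc)[0])           # markup of the first child
```

Parsing, querying, and reading attributes cover most of what a lightweight scraping script needs, which is why those operations are the ones sketched here.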

h3rald avatar Jul 18 '23 19:07 h3rald

Haha, it's fine. I never lost hope :laughing:

That API looks pretty solid to me, I'd have to think about it more. I believe my original use case 5-6 years ago was to do some lightweight web scraping, and it seems like everything here would be enough to do just that.

ajusa avatar Jul 22 '23 22:07 ajusa

...and it's finally done! As of v0.39.0, min now has a new xml module. In the end I implemented fewer methods, because to do things like add children or attributes you can manipulate them with existing APIs, as they are implemented as a quotation and a dictionary, respectively.

h3rald avatar Jul 31 '23 16:07 h3rald