Call for content

Open ebenolson opened this issue 10 years ago • 30 comments

@benanne @dnouri @craffel @f0k @skaae (and anyone else of course)

With first release imminent it would be nice to have a bit more here... I know a couple of you have stuff written up already, but I bet everyone has some suitable code lying around.

If you have anything you're willing to contribute, please open a PR... don't worry if it's not perfect, I can take care of making sure everything functions with the latest Lasagne.

ebenolson avatar Aug 06 '15 13:08 ebenolson

I have a notebook from the recent Kaggle Diabetic Retinopathy competition that loads a trained model and uses it to predict on some images, get some activations, etc. But it was written quite quickly and is a little messy, so it probably needs to be cleaned up a bit. Nevertheless, something quite applied like that might be interesting for some people, maybe?

JeffreyDF avatar Aug 06 '15 13:08 JeffreyDF

Forgot to mention: it's built on top of an older version of Lasagne as well, at commit cf1a23c21666fc0225a05d284134b255e3613335. Same as my own fork.

JeffreyDF avatar Aug 06 '15 13:08 JeffreyDF

I can write a spatial transformer example, when the layer is added.

I would need to clean up the transformer code and create a repeat layer similar to http://keras.io/layers/core/#repeatvector (which I think is quite useful for encoder/decoder stuff).
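For reference, a minimal sketch of the repeat operation, assuming it behaves like Keras's RepeatVector: an input of shape (batch, features) becomes (batch, n, features), so that an encoder output can be fed to a decoder at every time step. Plain Python lists stand in for tensors, and `repeat_vector` is a hypothetical name, not an existing Lasagne layer.

```python
def repeat_vector(batch, n):
    """Repeat each feature vector n times along a new time axis.

    batch: list of feature vectors, shape (batch_size, num_features)
    returns: nested list, shape (batch_size, n, num_features)
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    # Copy each vector so the repeated steps do not share one list object.
    return [[list(vec) for _ in range(n)] for vec in batch]


encoded = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]  # shape (2, 3)
repeated = repeat_vector(encoded, 4)            # shape (2, 4, 3)
print(len(repeated), len(repeated[0]), len(repeated[0][0]))
```

An actual layer would do the same thing symbolically on the tensor rather than on lists.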

I have code for combining a GRU with a spatial transformer network.

[Image: model]

The output looks like: pics

I could also just rewrite the spatial transformer network results. What do people prefer?

skaae avatar Aug 06 '15 13:08 skaae

btw I also have a Penntree example. The setup is similar to http://arxiv.org/abs/1409.2329 except that I would have to use a GRU because of a minor technical issue: https://github.com/craffel/nntools/commit/f4d15bace8057731ab24b5a1f92d56fee80d2187#diff-48f9f9d93ed00587a6c49197b4d6e64eL992

Penntree example https://github.com/skaae/nntools/blob/pentree_recurrent/examples/pentree.py

skaae avatar Aug 06 '15 13:08 skaae

@JeffreyDF your notebook looks really interesting, but it seems like it's dependent on a lot of other stuff in your repo, so it might be quite a bit of work to move it.

Perhaps we should add an "External Resources" section to README.md with description and links to repos like yours? We could also try including it as a submodule.

@skaae I think the pentree example would be great, and spatial transformer as well once https://github.com/Lasagne/Lasagne/issues/355 is resolved.

ebenolson avatar Aug 06 '15 14:08 ebenolson

> Perhaps we should add an "External Resources" section to README.md with description and links to repos like yours?

We also still have this wiki page for links to things that don't fit into Recipes (or were not turned into Recipes yet): https://github.com/Lasagne/Lasagne/wiki/3rd-party-extensions-and-code We could just as well link that from somewhere (e.g., README.md).

f0k avatar Aug 06 '15 14:08 f0k

@ebenolson Yes, you're right. But in essence, the same functionality can be achieved with just a few extra lines of code in the notebook, I think. It's mostly the DataLoader which only reads images and resizes them (at least for testing) and the custom metrics, iirc. I can try to clean that up a bit (a little busy at the moment but should be able to find some time in the week). But I'm totally fine if you would rather have it as a more "complete" project to maybe point to (or not!). It makes some sense to do that. :-)

JeffreyDF avatar Aug 06 '15 14:08 JeffreyDF

> We also still have this wiki page for links to things that don't fit into Recipes (or were not turned into Recipes yet): https://github.com/Lasagne/Lasagne/wiki/3rd-party-extensions-and-code We could just as well link that from somewhere.

Yeah, I feel like not many people know about/visit the wiki though, it would be nice if we could increase visibility (although Recipes is in a similar state at the moment).

ebenolson avatar Aug 06 '15 14:08 ebenolson

Ah, yes, extra remark: for my notebook I'm using these 80-90MB model dumps to load first. So yes ... that probably makes it a little more unsuitable to have here! Forgot about that.

JeffreyDF avatar Aug 06 '15 14:08 JeffreyDF

> @ebenolson Yes, you're right. But in essence, the same functionality can be achieved with just a few extra lines of code in the notebook, I think. It's mostly the DataLoader which only reads images and resizes them (at least for testing) and the custom metrics, iirc. I can try to clean that up a bit (a little busy at the moment but should be able to find some time in the week). But I'm totally fine if you would rather have it as a more "complete" project to maybe point to (or not!). It makes some sense to do that. :-)

If you've got time to package it, I think that would be great; I'm happy with either, though. I would like to keep large binary files out of the repo (data, snapshots), so if you decide to rework it, it would be good if you could host those on Dropbox/S3 or something similar.

ebenolson avatar Aug 06 '15 14:08 ebenolson

> Ah, yes, extra remark: for my notebook I'm using these 80-90MB model dumps to load first. So yes ... that probably makes it a little more unsuitable to have here! Forgot about that.

I do have an S3 bucket I made for Recipes stuff that I'm happy to put them in if you want.

ebenolson avatar Aug 06 '15 14:08 ebenolson

I'll try to clean up the Penntree example and make it use GRU units instead of LSTM. I don't have access to a GPU before I return to Denmark in ~10 days, so I cannot test the results before then.

skaae avatar Aug 06 '15 14:08 skaae

> I'll try to clean up the Penntree example and make it use GRU units instead of LSTM. I don't have access to a GPU before I return to Denmark in ~10 days, so I cannot test the results before then.

Great, thank you!

ebenolson avatar Aug 06 '15 14:08 ebenolson

@ebenolson Thank you! Will take that into account. Might do something with it in a little while. :-)

JeffreyDF avatar Aug 06 '15 15:08 JeffreyDF

I will submit PRs for the highway networks and hidden factors of variation notebooks when I find some time. Hopefully soon! I'll also have a think about what else I could share. Maybe some of the cyclic pooling/rolling stuff (with custom CUDA kernels) would be useful to have on here as well.

> I could also just rewrite the spatial transformer network results. What do people prefer?

@skaae please do both :D That cluttered MNIST example looks really cool by the way! Is this published anywhere?

benanne avatar Aug 06 '15 15:08 benanne

We could also add links on http://gitxiv.com/ if the example reproduces a specific paper.

skaae avatar Aug 06 '15 15:08 skaae

I have the code from the LSTM benchmark, a noisy speech recognition experiment, which needs a little updating. It would be a little silly to host it in both places though.

I also have code (also online) for my ISMIR paper, but it's a bit obscure for a simple example.

One thing I've been asked for is examples for using a CNN on spectrograms. I could try to make up a simple example for this, or maybe @f0k should reproduce his ISMIR paper from last year with Lasagne ;)

For any of the above, data is an issue as none of them are common datasets (like CIFAR or MNIST). How can we get stuff in the S3 bucket @ebenolson ?

@skaae A number of people have asked me for a "char-rnn" example in Lasagne, which I think your penn treebank example is close to. Any chance you want to do that too/instead?

craffel avatar Aug 06 '15 15:08 craffel

Yes. I think it's the same as the char-rnn, except that I need a sample function, which I'm not sure how to implement. As I understand it, the model outputs a probability distribution over words, which you sample from and use as input in the next time step (?).

We could implement sampling by compiling a single-step model and then running it in a for-loop.

  1. {h_t, wordprobs_t} = f(h_{t-1}, sampledword_{t-1})
  2. sampledword_t = multinomial_sample(wordprobs_t)

Here f is the recurrent model compiled to run a single step. Does anyone have a better solution for sampling?
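The two steps above can be sketched in plain Python. This is only an illustration under assumptions: `make_toy_step` builds a stand-in for f with random weights and a simple tanh recurrence (not a trained GRU, and not compiled with Theano), and all function names here are hypothetical.

```python
import math
import random


def softmax(scores):
    """Turn raw scores into a probability distribution."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def make_toy_step(vocab_size, hidden_size, rng):
    """Build a stand-in for f: (h_prev, word_prev) -> (h, word_probs).

    The random weights are placeholders for a trained recurrent model.
    """
    w_in = [[rng.uniform(-1, 1) for _ in range(hidden_size)]
            for _ in range(vocab_size)]
    w_rec = [[rng.uniform(-1, 1) for _ in range(hidden_size)]
             for _ in range(hidden_size)]
    w_out = [[rng.uniform(-1, 1) for _ in range(vocab_size)]
             for _ in range(hidden_size)]

    def step(h_prev, word_prev):
        # Simple tanh recurrence; a real model would apply GRU updates here.
        h = [math.tanh(w_in[word_prev][j]
                       + sum(h_prev[i] * w_rec[i][j] for i in range(hidden_size)))
             for j in range(hidden_size)]
        scores = [sum(h[i] * w_out[i][k] for i in range(hidden_size))
                  for k in range(vocab_size)]
        return h, softmax(scores)

    return step


def sample_sequence(step, start_word, hidden_size, length, rng):
    """Run the single-step model in a loop, feeding each sample back in."""
    h = [0.0] * hidden_size
    word = start_word
    sampled = []
    for _ in range(length):
        # Step 1: advance the model by one time step.
        h, probs = step(h, word)
        # Step 2: draw the next word index from the predicted distribution.
        word = rng.choices(range(len(probs)), weights=probs, k=1)[0]
        sampled.append(word)
    return sampled


rng = random.Random(0)
step = make_toy_step(vocab_size=5, hidden_size=4, rng=rng)
print(sample_sequence(step, start_word=0, hidden_size=4, length=10, rng=rng))
```

The hidden state is carried explicitly between calls, which is what makes a single-step function sufficient: nothing needs to be unrolled ahead of time.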

skaae avatar Aug 06 '15 16:08 skaae

> maybe @f0k should reproduce his ISMIR paper from last year with Lasagne ;)

> For any of the above, data is an issue as none of them are common datasets (like CIFAR or MNIST).

I've actually already reproduced it, but it depends on other code of mine that I'd need to strip off. Also the training data is not available online. It'd be easier to reproduce the onset detection paper, for which at least part of the training data is public (but still not enough). These data issues are a bit annoying... Maybe this year's ISMIR paper would be suited best. I'll see what I can do.

f0k avatar Aug 06 '15 16:08 f0k

> How can we get stuff in the S3 bucket @ebenolson ?

I've emailed access details to you and @skaae. If anyone else needs access let me know.

ebenolson avatar Aug 06 '15 16:08 ebenolson

I updated the language model to use GRU. It seems to run, but I cannot test the performance until I have access to a GPU.

https://github.com/skaae/nntools/tree/penntree_lasagne/examples

Todo:

  • put data in S3 + add download function
  • Create sampling function
  • Test performance
  • Move to Recipes.

Comments are welcome :)

skaae avatar Aug 06 '15 17:08 skaae

Re: data storage and S3, maybe a better solution would be to use Git LFS (or one of the equivalents)? That way it won't ever go anywhere and we won't need any downloader scripts.

craffel avatar Aug 06 '15 19:08 craffel

I'm not sure - it looks like Github LFS is still under construction (I applied for the early access though, we'll see what happens...).

Also I don't know if asking people to install git-lfs is easier than a download script (with an ipython notebook you can do it inline with !wget http://URL BTW)

ebenolson avatar Aug 06 '15 19:08 ebenolson

> I'm not sure - it looks like Github LFS is still under construction (I applied for the early access though, we'll see what happens...).

I have access. There are a bunch of other options, but I'm predicting git LFS to come out on top.

> Also I don't know if asking people to install git-lfs is easier than a download script (with an ipython notebook you can do it inline with !wget http://URL BTW)

Yeah, I mostly mean for data provenance (what happens when you go broke and your S3 disappears?)

craffel avatar Aug 06 '15 21:08 craffel

Hi @skaae, I like the Penntree example very much. Glad that it will work with what's checked in now. I had it working before, but needed to change some parameters for LSTM to fit Colin's recurrent branch.

What is the issue with data for that example? It's quite small. Or is that about a different paper? Sorry, I'm having a hard time following.

moscow25 avatar Aug 07 '15 04:08 moscow25

> I have access. There are a bunch of other options, but I'm predicting git LFS to come out on top.

Cool! Do you know if it's possible to get a download URL, like the zipball links for repos? Is there any more info on quota/pricing available? The 1GB storage limit I saw seems pretty small.

> Yeah, I mostly mean for data provenance (what happens when you go broke and your S3 disappears?)

Yeah, free github hosting would be ideal - although I wonder how sustainable binary hosting will be for them.

Anyway for now I think everyone should just use whatever they're comfortable with and we can always mirror/migrate in the future. S3 has cost me $0.52 so far, so I'm not too concerned about breaking the bank :)

ebenolson avatar Aug 07 '15 10:08 ebenolson

> Do you know if it's possible to get a download URL, like the zipball links for repos?

That's a good question. I'm pretty sure it isn't, which limits its utility...

> Is there any more info on quota/pricing available? The 1GB storage limit I saw seems pretty small.

https://help.github.com/articles/billing-plans-for-git-large-file-storage/

> Anyway for now I think everyone should just use whatever they're comfortable with and we can always mirror/migrate in the future. S3 has cost me $0.52 so far, so I'm not too concerned about breaking the bank :)

Hah ok, I didn't realize it was so cheap! I think you're right that that's the way to go for now.

craffel avatar Aug 07 '15 16:08 craffel

@moscow25 can you move the question to the Lasagne Google group? Maybe be a bit more specific about the problem and I'll try to help.

skaae avatar Aug 07 '15 16:08 skaae

I was gonna sort out the notebooks I did before and submit PRs for them, but it looks like they depended on some functions provided by the old mnist example that are no longer there. So it'll take a bit longer than I anticipated, as I don't have time to fix them right now. If anyone else wants to do it, be my guest. It's just a question of getting rid of "from mnist import create_iter_functions, train", afaik (we can still import load_dataset, as that still exists). You can find them in this branch on the main repo: https://github.com/Lasagne/Lasagne/tree/highway_example
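For anyone picking this up, a rough sketch of what an inline replacement could look like, assuming the notebooks only need a minibatch loop that reports the mean training loss per epoch. `iterate_minibatches`, `run_epochs` and `train_fn` are hypothetical names, not the API of the removed helpers; `train_fn` stands in for a compiled training function that takes one batch and returns its loss.

```python
def iterate_minibatches(inputs, targets, batch_size):
    """Yield consecutive (inputs, targets) slices of at most batch_size items."""
    assert len(inputs) == len(targets)
    for start in range(0, len(inputs), batch_size):
        end = start + batch_size
        yield inputs[start:end], targets[start:end]


def run_epochs(train_fn, inputs, targets, batch_size, num_epochs):
    """Call train_fn on every minibatch and yield (epoch, mean loss)."""
    for epoch in range(1, num_epochs + 1):
        losses = [train_fn(x, y)
                  for x, y in iterate_minibatches(inputs, targets, batch_size)]
        yield epoch, sum(losses) / len(losses)


# Dummy train_fn so the sketch runs without Theano: the "loss" is the batch size.
def dummy_train_fn(x_batch, y_batch):
    return float(len(x_batch))


data = list(range(10))
for epoch, loss in run_epochs(dummy_train_fn, data, data, batch_size=4, num_epochs=2):
    print(epoch, loss)
```

Keeping the loop inside the notebook removes the dependency on the example module, at the cost of a few repeated lines per notebook.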

Otherwise, hopefully next weekend.

benanne avatar Aug 09 '15 21:08 benanne

@skaae all I meant was that @craffel pointed me to your Penntree example, and I found it useful to run, before the RNN change was checked in. So thanks for creating it. If that kind of comment belongs outside of GitHub, I'm not sure why, but I'll keep that in mind.

moscow25 avatar Aug 10 '15 22:08 moscow25