
[Variogram Point Cloud] [Scatterplot] Make a swarmplot from the scatterplot of semivariances per lag

Open SimonMolinsky opened this issue 3 years ago • 4 comments

Is your feature request related to a problem? Please describe.
Yes, scatterplots generated with matplotlib are ugly and uninformative when we plot categorical data (lags). The example:

[Image: scatterplot of semivariances per lag]

More examples are in the Variogram Point Cloud Tutorial.

Describe the solution you'd like
The best idea is to either:

  • create a swarmplot - https://stackoverflow.com/questions/36153410/how-to-create-a-swarm-plot-with-matplotlib,
  • or slightly disperse the points along the x axis (so they do not all sit on a single line) - https://seaborn.pydata.org/tutorial/categorical.html?highlight=bar%20plot#categorical-scatterplots

I would like to avoid seaborn for now and use only matplotlib if possible. If the overhead of creating a swarm plot in matplotlib is too high, then use seaborn.
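The second option (dispersing points along the x axis) can be sketched with matplotlib alone. This is only an illustration, not the library's implementation: it assumes the point cloud is available as a dict mapping each lag to a list of semivariances, and the names `jitter_positions` and `plot_jittered_cloud` are hypothetical helpers that do not exist in pyinterpolate. Note that this is random jitter, not a true swarm layout, so points may still overlap.

```python
import random


def jitter_positions(cloud, width=0.3, seed=0):
    """Return (xs, ys) with each lag's points spread randomly around its slot.

    `cloud` is ASSUMED to be a dict {lag: [semivariance, ...]}.
    Lags are treated as categories: the i-th sorted lag is centered at x = i.
    """
    rng = random.Random(seed)
    xs, ys = [], []
    for idx, lag in enumerate(sorted(cloud)):
        for value in cloud[lag]:
            # Shift each point horizontally by a random offset in [-width, width]
            xs.append(idx + rng.uniform(-width, width))
            ys.append(value)
    return xs, ys


def plot_jittered_cloud(cloud, width=0.3, seed=0):
    """Draw the jittered point cloud; matplotlib is imported only when plotting."""
    import matplotlib.pyplot as plt

    xs, ys = jitter_positions(cloud, width=width, seed=seed)
    lags = sorted(cloud)
    fig, ax = plt.subplots()
    # Small, semi-transparent markers keep dense lags readable
    ax.scatter(xs, ys, s=4, alpha=0.4)
    ax.set_xticks(range(len(lags)))
    ax.set_xticklabels([str(lag) for lag in lags])
    ax.set_xlabel("Lag")
    ax.set_ylabel("Semivariance")
    return fig, ax
```

Keeping `width` below 0.5 prevents points from neighboring lags from mixing, and the fixed seed makes the figure reproducible between runs.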

Describe alternatives you've considered
n/a

Additional context
The exact place in the code where the scatterplot is prepared: https://github.com/DataverseLabs/pyinterpolate/blob/67ca44a4a312850b929309d020fe9f94ea95c61e/pyinterpolate/variogram/empirical/cloud.py#L553

SimonMolinsky, Oct 08 '22 09:10

Hi @SimonMolinsky , I am interested in working on this issue. Please assign this to me. Thanks!

SubinDuresh, Oct 08 '22 15:10

Hi @SubinDuresh !

Thank you! If something is not clear, ask here, and I'll respond quickly. If you fork the repo, please open your PR against the dev branch later :) I've created a branch for this issue, and you can use it if you want to.

This task is specific. If you are forced to write too many lines of code or perform heavy data processing before plotting, consider using seaborn. I'm not a fan of overengineering :)

SimonMolinsky, Oct 08 '22 15:10

Hi @SimonMolinsky, I am having trouble testing my code changes in my local codebase. The code takes too long to run and seems frozen, so I am unable to judge whether my code works. Could you suggest a way to test my changes locally?

SubinDuresh, Oct 10 '22 16:10

Hi @SubinDuresh ,

Are you using the tutorial linked here? (https://pyinterpolate.readthedocs.io/en/latest/usage/tutorials/Variogram%20Point%20Cloud%20%28Basic%29.html)

If so, this tutorial uses a huge dataset, and matplotlib's scatter() renders it very slowly. You can modify the tutorial code so that it takes approx. 5% of the points in the dataset:

import numpy as np

from pyinterpolate import Blocks, read_txt, calc_point_to_point_distance, VariogramCloud
from pyinterpolate.variogram.empirical.experimental_variogram import calculate_semivariance

dem = read_txt('samples/point_data/txt/pl_dem_epsg2180.txt')

# HERE IS THE NEW BLOCK: keep a random 5% sample of the points

sample_size = int(0.05 * len(dem))
# Draw unique row indices (no replacement) and keep only those rows
indices = np.random.choice(len(dem), sample_size, replace=False)
dem = dem[indices]

and all other lines are the same. Now it should be much faster :)

This should be a separate issue: tutorials should run fast.

SimonMolinsky, Oct 11 '22 05:10

Hi @SubinDuresh,

Have you worked on this issue? Do you plan to? Please let me know; I want to close it soon and I need to know if you have had a chance to look into it. If you haven't had time (and won't have it), then let me know too, and I will reassign the issue to myself ;)

Thanks, Szymon

SimonMolinsky, Jan 28 '23 15:01

Hi @SimonMolinsky, really sorry, I've tried working on it but I'm unable to get it to work. Please reassign it. Sorry for the trouble, and thank you!

Regards, Subin Duresh

SubinDuresh, Jan 28 '23 15:01

That is not a problem, thank you for the quick response :) Have a great day,

Szymon

SimonMolinsky, Jan 28 '23 16:01