Manage co-occurrence of events

Open turpaultn opened this issue 3 years ago • 2 comments

Would it be possible to manage the co-occurrence of events?

To generate the DESED dataset, I used the "p" parameter of np.random.choice to supply class probabilities ("probas"). It is quite simple: everything depends only on the first event sampled, which determines the co_occur_params dictionary to use (each dictionary is specific to one event class):

import numpy as np


def choose_cooccurence_class(co_occur_params, random_state=None):
    """ Choose another class given a dictionary of parameters (from an already specified class).
    Args:
        co_occur_params: dict, defines the co-occurrence parameters of the classes
            Example of a co_occur_params dictionary::
                {
                  "max_events": 13,
                  "classes": [
                    "Alarm_bell_ringing",
                    "Dog",
                  ],
                  "probas": [
                    70,
                    30
                  ]
                }
            classes and probas map to each other, index by index
        random_state: int, or RandomState object
    Returns:
        str, the class name.
    """
    # choice() requires `p` to sum to 1, so normalize weights such as 70/30.
    probas = np.asarray(co_occur_params['probas'], dtype=float)
    probas = probas / probas.sum()
    if random_state is not None:
        # _check_random_state is a helper defined outside this snippet.
        random_state = _check_random_state(random_state)
        chosen_class = random_state.choice(co_occur_params['classes'], p=probas)
    else:
        chosen_class = np.random.choice(co_occur_params['classes'], p=probas)
    return chosen_class

(max_events is used to draw a random number of events for the soundscape, again depending on the class of the first event sampled. Not very good, but easy to implement and at least class dependent.)

This is very simplistic code. A goal could be better co-occurrence sampling (n-grams, or other ideas inspired by text generation from language models, I guess?). What do you think?
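To make the n-gram idea concrete, here is a minimal bigram-style sketch in which each new event class depends only on the previous one. The TRANSITIONS table, its probabilities, and the sample_sequence name are all made up for illustration; none of this is existing Scaper or DESED code.

```python
import numpy as np

# Hypothetical bigram table: P(next class | previous class). Values are invented.
TRANSITIONS = {
    "Alarm_bell_ringing": {"Alarm_bell_ringing": 0.8, "Dog": 0.2},
    "Dog": {"Alarm_bell_ringing": 0.3, "Dog": 0.7},
}


def sample_sequence(first, n_events, transitions, rng):
    """Sample n_events class labels, each conditioned on the previous label."""
    sequence = [first]
    while len(sequence) < n_events:
        next_probs = transitions[sequence[-1]]
        classes = list(next_probs)
        probas = [next_probs[c] for c in classes]
        # str() converts the NumPy string scalar back to a plain Python str.
        sequence.append(str(rng.choice(classes, p=probas)))
    return sequence


rng = np.random.RandomState(0)
events = sample_sequence("Dog", 5, TRANSITIONS, rng)
```

Unlike the dictionary approach above, every event here (not only the first) influences what comes next.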

turpaultn avatar Dec 18 '20 00:12 turpaultn

Cheers @turpaultn !

We could definitely add support for non-uniform discrete sampling, e.g. via a new choose_weighted distribution tuple.
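As a rough illustration of what such a tuple might resolve to, here is a sketch assuming a hypothetical tuple layout ("choose_weighted", items, weights). Both the layout and the sample_choose_weighted helper are assumptions for discussion, not an existing Scaper API.

```python
import numpy as np


def sample_choose_weighted(dist_tuple, rng):
    """Resolve a hypothetical ('choose_weighted', items, weights) tuple to one item."""
    name, items, weights = dist_tuple
    if name != "choose_weighted":
        raise ValueError("expected a 'choose_weighted' tuple")
    weights = np.asarray(weights, dtype=float)
    if len(items) != len(weights):
        raise ValueError("items and weights must have the same length")
    if np.any(weights < 0) or weights.sum() <= 0:
        raise ValueError("weights must be non-negative with a positive sum")
    # Normalize so that relative weights such as 70/30 are accepted too.
    probas = weights / weights.sum()
    index = rng.choice(len(items), p=probas)
    return items[index]


rng = np.random.RandomState(0)
label = sample_choose_weighted(
    ("choose_weighted", ["Alarm_bell_ringing", "Dog"], [0.7, 0.3]), rng
)
```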

IIUC in the example above you're providing the probability for each event being chosen, and then choosing one of these events, but that's not the same as co-occurrence probabilities, right? That is, it's different to say

  1. Choose between alarm/dog with prob .7/.3
  2. Give me a soundscape where alarm and dog co-occur with probability X.

My understanding from today's meeting was that the team is interested in the latter, but maybe I misunderstood?
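To spell out the difference, a small sketch of both readings. The function names are hypothetical, and the fallback to a single uniformly chosen class in the second function is an arbitrary choice made only for illustration.

```python
import numpy as np

LABELS = ["alarm", "dog"]


def pick_one(rng):
    """Reading 1: choose a single label, alarm with prob .7 and dog with prob .3."""
    return str(rng.choice(LABELS, p=[0.7, 0.3]))


def soundscape_labels(rng, x):
    """Reading 2: alarm and dog both appear in the soundscape with probability x."""
    if rng.random_sample() < x:
        return set(LABELS)
    # Otherwise keep only one class (uniform pick, an assumption of this sketch).
    return {LABELS[rng.randint(len(LABELS))]}


rng = np.random.RandomState(0)
single = pick_one(rng)
scene = soundscape_labels(rng, x=0.5)
```

The first function never says anything about two classes appearing together; the second controls exactly that.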

Regardless, it looks like we'd need something like choose_weighted to support Gibbs or related types of sampling methods?

justinsalamon avatar Dec 18 '20 00:12 justinsalamon

Cool !

Well, I understand it's not clear, since I only posted this small piece of code.

But the algorithm is like this:

  • Sample an event from the 10 classes (uniformly)
  • Take the co-occurrence dictionary for that event, as above (what's the probability of alarm/dog when "cat" is the first event)
  • Sample an event using this dictionary (for example, there is no chance a vacuum_cleaner would be picked; only dog and alarm are left)
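The three steps above can be sketched as follows. The CO_OCCUR table covers only three classes with invented probabilities, and sample_event_pair is a hypothetical name; the real DESED configuration is not reproduced here.

```python
import numpy as np

# Hypothetical table: one co_occur_params-style entry per possible first event.
CO_OCCUR = {
    "Cat": {"classes": ["Alarm_bell_ringing", "Dog"], "probas": [0.7, 0.3]},
    "Dog": {"classes": ["Dog", "Cat"], "probas": [0.5, 0.5]},
    "Alarm_bell_ringing": {"classes": ["Alarm_bell_ringing", "Dog"], "probas": [0.9, 0.1]},
}


def sample_event_pair(co_occur, rng):
    """Sample a first event uniformly, then a second event conditioned on it."""
    # Step 1: uniform choice over all classes (sorted for reproducibility).
    first = str(rng.choice(sorted(co_occur)))
    # Step 2: look up the co-occurrence parameters for that first event.
    params = co_occur[first]
    # Step 3: sample the second event; classes absent from the entry can never be picked.
    second = str(rng.choice(params["classes"], p=params["probas"]))
    return first, second


rng = np.random.RandomState(0)
first, second = sample_event_pair(CO_OCCUR, rng)
```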

The idea was that if an alarm ("beep") occurs, for example, there is a good chance you will hear another one. As I said, it is simple, but at least it gave us a class balance closer to the real set without spending too much time.

> Regardless, it looks like we'd need something like choose_weighted to support Gibbs or related types of sampling methods?

I agree.

turpaultn avatar Dec 18 '20 08:12 turpaultn