reproman icon indicating copy to clipboard operation
reproman copied to clipboard

Potential for lost remaining_files_to_trace in identify_distributions()

Open chaselgrove opened this issue 6 years ago • 3 comments
trafficstars

At reproman/interface/retrace.py:211, we iterate over a tracer to get distributions and files, using the distribution every iteration but the files only from the last:

           for env, remaining_files_to_trace in tracer.identify_distributions(
                   files_to_trace):
               distibutions.append(env)
               nenvs += 1
           files_processed |= files_to_trace - remaining_files_to_trace
           files_to_trace = remaining_files_to_trace

We should either proccess remaing_files_to_trace every iteration or make tracer.identify_distributions() a single call.

chaselgrove avatar May 16 '19 14:05 chaselgrove

At reproman/interface/retrace.py:211, we iterate over a tracer to get distributions and files, using the distribution every iteration but the files only from the last: [...] We should either proccess remaing_files_to_trace every iteration or make tracer.identify_distributions() a single call.

For clarity and maintainability, I agree we should change this. But looking through the identify_distributions() definitions for the different tracers, I didn't spot one that doesn't yield the same set of files each time (and most definitions actually only have one yield), so I wouldn't think it matters (yet) in practice.

But perhaps I missed something in my scan because

Lost remaining_files_to_trace in identify_distributions()

makes me think that you actually hit a case where you noticed unknown files being silently dropped.

kyleam avatar May 16 '19 16:05 kyleam

No, I didn't find a problem case -- I just chose that as the hard-hitting headline. :) In fact, I'd rather if we made the tracer.identify_distributions() functions so we don't have to iterate over them. I'm ultimately trying to fix the duplicate distributions, and it would be a good start if the tracer.identify_distributions() didn't have the option of yielding several distributions.

chaselgrove avatar May 16 '19 17:05 chaselgrove

No, I didn't find a problem case

All right, thanks for confirming.

In fact, I'd rather if we made the tracer.identify_distributions() functions so we don't have to iterate over them

Converting the generator return value to distributions::list, unknown::set sounds fine to me. (I don't see a way around returning a list of distributions rather than a single one given that the conda tracer can find multiple distributions.)

kyleam avatar May 16 '19 17:05 kyleam