reproman
reproman copied to clipboard
Potential for lost remaining_files_to_trace in identify_distributions()
At reproman/interface/retrace.py:211, we iterate over a tracer to get distributions and files, using the distribution every iteration but the files only from the last:
for env, remaining_files_to_trace in tracer.identify_distributions( files_to_trace): distibutions.append(env) nenvs += 1 files_processed |= files_to_trace - remaining_files_to_trace files_to_trace = remaining_files_to_trace
We should either proccess remaing_files_to_trace every iteration or make tracer.identify_distributions() a single call.
At reproman/interface/retrace.py:211, we iterate over a tracer to get distributions and files, using the distribution every iteration but the files only from the last: [...] We should either proccess remaing_files_to_trace every iteration or make tracer.identify_distributions() a single call.
For clarity and maintainability, I agree we should change this. But looking through the identify_distributions() definitions for the different tracers, I didn't spot one that doesn't yield the same set of files each time (and most definitions actually only have one yield), so I wouldn't think it matters (yet) in practice.
But perhaps I missed something in my scan because
Lost remaining_files_to_trace in identify_distributions()
makes me think that you actually hit a case where you noticed unknown files being silently dropped.
No, I didn't find a problem case -- I just chose that as the hard-hitting headline. :) In fact, I'd rather if we made the tracer.identify_distributions() functions so we don't have to iterate over them. I'm ultimately trying to fix the duplicate distributions, and it would be a good start if the tracer.identify_distributions() didn't have the option of yielding several distributions.
No, I didn't find a problem case
All right, thanks for confirming.
In fact, I'd rather if we made the tracer.identify_distributions() functions so we don't have to iterate over them
Converting the generator return value to distributions::list, unknown::set sounds fine to me. (I don't see a way around returning a list of distributions rather than a single one given that the conda tracer can find multiple distributions.)