pycobertura Support for (generated) source files in different directories?

I had a quick test with pycobertura on Weston and hit a crash trying the diff: pycobertura.filesystem.FileNotFound: /home/pq/git/weston/protocol/fullscreen-shell-unstable-v1-client-protocol.h

That is expected: the file will never be found at that path. Instead, it is generated during the build into the build directory, completely separate from the source directory. Since the file is generated, it will not be present in any source-code or artifact tarball alongside the normal sources.

I'm just looking at different tools to diff code coverage reports for now, so I cannot promise to use pycobertura even with this fixed, but it seems like a show-stopper here. One would have to search for a source file first in the source dir, then in the build dir.

Nov 22 '19 12:11 ppaalanen

Thank you @ppaalanen for reporting this!

Hm, that's interesting, I have never encountered this use-case before. What could we do about it? Any ideas?

A few things that cross my mind:

Without modifying pycobertura, you could ensure that the files are present where pycobertura expects to find them. You might need a little script before running the coverage diff that copies the files over to the expected locations. You could have a map of known files or filename patterns to be copied over. This means that you'd need to have the build around, readily available, before running pycobertura.
Can we affect how paths in the coverage report are generated? Maybe generate two separate coverage reports instead? One for the source and one for the build such that each report would provide an accurate location of each file? You'd then run pycobertura twice and generate two coverage diff reports. It might not be as convenient.
We could have pycobertura skip over missing files instead of crashing. The coverage diff report will be unable to render the source of the missing files. But then it might not be an ideal scenario for you.
Another thought would be to provide pycobertura with extra search directory paths from which it could try to scan and hopefully find a match. But that feels brittle and I can imagine how it could go wrong in so many ways by reporting on the wrong files, or never reporting on some files.
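The first option above (copying generated files into place before the diff) could look roughly like the sketch below. It is only an illustration: `copy_generated_files` and the glob patterns are hypothetical names, not part of pycobertura, and it assumes the generated files sit at the same relative path in the build directory as the coverage report expects under the source directory.

```python
import shutil
from pathlib import Path


def copy_generated_files(build_dir, source_dir, patterns):
    """Copy files matching glob patterns from build_dir into source_dir.

    Relative paths are preserved. Returns the list of destination paths
    that were created, so the caller can delete them again afterwards.
    """
    build_dir, source_dir = Path(build_dir), Path(source_dir)
    copied = []
    for pattern in patterns:
        for src in sorted(build_dir.glob(pattern)):
            if not src.is_file():
                continue
            dest = source_dir / src.relative_to(build_dir)
            if dest.exists():
                # Never overwrite a file that is already in the source tree.
                continue
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dest)
            copied.append(dest)
    return copied
```

Keeping the returned list and removing those files once the diff has run would limit the number of stray copies left in the source tree.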

What do you think?

Nov 22 '19 16:11 aconrad

> Thank you @ppaalanen for reporting this!
>
> Hm, that's interesting, I have never encountered this use-case before. What could we do about it? Any ideas?
>
> A few things that cross my mind:
>
> Without modifying pycobertura, you could ensure that the files are present where pycobertura expects to find them. You might need a little script before running the coverage diff that copies the files over to the expected locations. You could have a map of known files or filename patterns to be copied over. This means that you'd need to have the build around, readily available, before running pycobertura.

I think this would be unacceptable for the project. Sphinx is already causing us trouble, because we need to copy some documentation source files as is into the build directory, since we also generate some documentation source files and Sphinx et al. refuse to use more than one search path. The copying has a high risk of leaving stale source files around in build dirs when switching git revisions or branches, and tools that load all files from the search path (Sphinx again) will load also the stale files, causing failures.

The complication seems too much compared to the benefit. It would be better if pycobertura simply ignored missing files.

> Can we affect how paths in the coverage report are generated? Maybe generate two separate coverage reports instead? One for the source and one for the build such that each report would provide an accurate location of each file? You'd then run pycobertura twice and generate two coverage diff reports. It might not be as convenient.

Right, seems inconvenient.

> We could have pycobertura skip over missing files instead of crashing. The coverage diff report will be unable to render the source of the missing files. But then it might not be an ideal scenario for you.

I think this would be acceptable - definitely better than nothing.

> Another thought would be to provide pycobertura with extra search directory paths from which it could try to scan and hopefully find a match. But that feels brittle and I can imagine how it could go wrong in so many ways by reporting on the wrong files, or never reporting on some files.

Yet, that is what the build system Meson does by passing complete source file paths and more -I options (adds a path to the header search path list) to the C compiler suite.

I don't know the Cobertura file format, so I don't know how precisely source files are identified there. Is it just the base name and extension without any path? Seems unlikely because that would break in any project having two files with the same name, just in separate directories.
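For reference, a Cobertura report normally records a path for each file in the `filename` attribute of its `<class>` elements, and may list base directories in `<source>` elements. How these are filled in depends on the tool that generated the report. A small sketch to inspect both, using a made-up sample report with hypothetical paths:

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed report; real reports carry more attributes.
SAMPLE = """<coverage>
  <sources><source>/home/user/project</source></sources>
  <packages><package name="protocol"><classes>
    <class name="a.h" filename="protocol/a.h" line-rate="1.0"><lines/></class>
    <class name="b.c" filename="src/b.c" line-rate="0.5"><lines/></class>
  </classes></package></packages>
</coverage>"""


def list_report_files(xml_text):
    """Return (source base directories, sorted file names) found in a report."""
    root = ET.fromstring(xml_text)
    sources = [s.text for s in root.iter("source")]
    filenames = sorted({c.get("filename") for c in root.iter("class")})
    return sources, filenames
```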

The obvious idea to me would be to list the base directories. Searching for the files would then be plain concatenation of each base directory and a file path from a Cobertura file until a hit is found. What problems do you see with that approach?
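As a rough sketch of that lookup (the function name `resolve_source` is made up for illustration and is not a pycobertura API):

```python
import os


def resolve_source(filename, base_dirs):
    """Return the first existing path built from a base dir plus filename.

    base_dirs is searched in order, so an earlier directory (for example
    the source dir) wins over a later one (for example the build dir).
    Returns None when no base directory contains the file.
    """
    for base in base_dirs:
        candidate = os.path.join(base, filename)
        if os.path.isfile(candidate):
            return candidate
    return None
```

The order of `base_dirs` decides which copy is used when the same relative path exists in more than one directory, much like the order of `-I` options.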

Any confusion between files with the same path and name in the source and build directories would be self-inflicted. Such a project would be fragile to begin with, especially with C headers, as one would have to carefully control the search path order of the C compiler. So I think we can assume that people will avoid that situation in any case.

Maybe one more idea would be to let pycobertura search the source directory like it does already, but allow giving a list of additional files with complete (absolute or not) paths. Generating the additional files list would be doable with Meson, I believe. Maybe that could be matched with a simple tail comparison to file names in Cobertura files.
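A tail comparison of that kind could be sketched as follows. The helper name `match_by_tail` is hypothetical, and the sketch assumes POSIX-style paths. It compares whole path components, so `protocol/a.h` does not accidentally match `xprotocol/a.h`:

```python
from pathlib import PurePosixPath


def match_by_tail(report_filename, extra_files):
    """Return the entries of extra_files whose trailing path components
    are equal to the components of report_filename."""
    wanted = PurePosixPath(report_filename).parts
    matches = []
    for extra in extra_files:
        parts = PurePosixPath(extra).parts
        if len(parts) >= len(wanted) and parts[-len(wanted):] == wanted:
            matches.append(extra)
    return matches
```

More than one match would mean the extra file list is ambiguous for that report entry, which a tool could flag rather than guess.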

If none of this seems like an obviously good idea to you, then I might prefer to wait with this issue until there are more interested projects or I have serious plans to make use of this. For now, it seems diff-cover could cover the most pressing needs. The case where it seems like I would need something like pycobertura is when comparing the effect of adding more tests across the whole code base and not just the added code lines.

Nov 25 '19 08:11 ppaalanen

I think we should start by skipping files that aren't found, that seems the simplest first step.

I'm on vacation this Thanksgiving week, but if you want to take a stab at getting pycobertura to skip missing files, I can take a look when I get back in December.

I think skipping files should be the default when generating reports from the command line, to be more user-friendly. The missing files should be reported as missing rather than skipped silently, to let the reader of the report know that there was a problem finding the files.
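To illustrate the behaviour I have in mind (this is only a sketch with a made-up helper name, not how pycobertura is implemented): collect the files that cannot be found instead of raising, and hand that list to the report.

```python
import os


def load_sources(filenames, source_dir):
    """Read each file under source_dir, collecting missing ones.

    Returns (sources, missing): sources maps a file name to its list of
    lines, missing lists the names that could not be found on disk.
    """
    sources, missing = {}, []
    for name in filenames:
        path = os.path.join(source_dir, name)
        try:
            with open(path, encoding="utf-8") as f:
                sources[name] = f.read().splitlines()
        except FileNotFoundError:
            # Skip the file, but remember it so the report can say so.
            missing.append(name)
    return sources, missing
```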

Pycobertura will also let you find changes in coverage in files you haven't touched. Say you refactor code and the code flows differently elsewhere, causing coverage to change: pycobertura will let you know. For example, if a refactor introduced inaccessible/dead code because you stopped calling a function, you will easily spot where that happened and can act upon it.

Nov 25 '19 08:11 aconrad

> I think we should start by skipping files that aren't found, that seems the simplest first step.
>
> I'm on vacation this Thanksgiving week, but if you want to take a stab at getting pycobertura to skip missing files, I can take a look when I get back in December.

Thanks, but I'll leave it for you, I'm not in a hurry with this, this year.

> Pycobertura will also let you find changes in coverage in files you haven't touched. Say you refactor code and the code flows differently elsewhere, causing coverage to change: pycobertura will let you know. For example, if a refactor introduced inaccessible/dead code because you stopped calling a function, you will easily spot where that happened and can act upon it.

Yes, that will be valuable, and is the reason why I'm looking into pycobertura - I have not found another tool for the same yet.

Nov 25 '19 09:11 ppaalanen

Hi, sorry to enter this conversation out of the blue, but I have the same use case, where source files are generated at compilation time. I tried the idea of copying the right files with a small script; it isn't ideal, but it's okay. But I then faced another error: "UnicodeDecodeError: 'utf8' codec can't decode byte 0xe9 in position 1039: invalid continuation byte". The idea of skipping a file when an error occurs while decoding, parsing, or even finding it is very interesting. Do you have any update on whether this feature will be picked up, or can I help in any way? Great piece of software by the way, and thank you. P.S. It would be great to enrich the error with the name of the file causing the Unicode error.
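To make the P.S. concrete, here is a small sketch of what I mean. The helper names are made up and this is not pycobertura code. The first function re-raises the decoding error with the file path included; the second reads leniently, replacing undecodable bytes (such as a Latin-1 0xe9) with U+FFFD.

```python
def read_source_lines(path, encoding="utf-8"):
    """Read a source file, naming the file in the error if decoding fails."""
    with open(path, "rb") as f:
        raw = f.read()
    try:
        text = raw.decode(encoding)
    except UnicodeDecodeError as exc:
        raise ValueError(f"cannot decode {path!r} as {encoding}: {exc}") from exc
    return text.splitlines()


def read_source_lines_lenient(path, encoding="utf-8"):
    """Like read_source_lines, but substitute U+FFFD for undecodable bytes."""
    with open(path, encoding=encoding, errors="replace") as f:
        return f.read().splitlines()
```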

Feb 04 '20 11:02 omarala

Hello,

I had another try with pycobertura-diff on Weston, and it works! It does not choke on "missing" files anymore. The reports look very useful.

I'm also lucky in that I don't care about the test coverage of the generated files. OTOH I can well imagine projects that would care, so I hesitate to just close this issue.

Thanks for a great tool!

I still don't know of any other cobertura diff tool that would report coverage changes also in unchanged source files, but I haven't really searched in a long time either.

Oct 12 '22 10:10 ppaalanen

> I had another try with pycobertura-diff on Weston, and it works! It does not choke on "missing" files anymore. The reports look very useful.

I'm glad it works for you! Did you try it with pycobertura v3.0.0?

> I'm also lucky in that I don't care about the test coverage of the generated files. OTOH I can well imagine projects that would care, so I hesitate to just close this issue.

Each project has their own context and needs. This issue doesn't seem active so I'll close it until someone has this need again and we can revisit with a fresh mind and new inputs.

> Thanks for a great tool!

Thanks for your support, it's always appreciated!

> I still don't know of any other cobertura diff tool that would report coverage changes also in unchanged source files, but I haven't really searched in a long time either.

Likewise, I don't know of any other tool that achieves that. Most tools I've seen overlay coverage information over the code diff. Pycobertura-diff does it the other way around by design: we overlay code information over the coverage diff so we can get the full picture.

Oct 12 '22 11:10 aconrad

Yes, I took what pip3 happened to offer, which is 3.0.0.

Oct 12 '22 12:10 ppaalanen