ccan icon indicating copy to clipboard operation
ccan copied to clipboard

problem with generator on 64 bit compters

Open picca opened this issue 7 years ago • 14 comments

Hello, I am using generator and coroutine on an i386 computer in order to build my hkl project. It works fine.

The problem arise when I switch to 64 bits computer and can be reproduce using the current debian unstable source package from here

https://packages.debian.org/source/unstable/hkl

(the problematic version is this one https://tracker.debian.org/news/930678)

It generates this kind of failure via a Segfault.

https://buildd.debian.org/status/fetch.php?pkg=hkl&arch=amd64&ver=5.0.0.2447-1&stamp=1518020479&raw=0

I tryed to debug the problem via gdb and I got this

Program received signal SIGSEGV, Segmentation fault.
generator_new_ (fn=fn@entry=0xaaaaaaae65b8 <trajectory_gen_generator__>, retsize=48, retsize@entry=40) at generator/generator.c:38
38      generator/generator.c: No such file or directory.
(gdb) bt
#0  0x0000aaaaaaae50d8 in generator_new_ (fn=fn@entry=0xaaaaaaae65b8 <trajectory_gen_generator__>, retsize=48, retsize@entry=40) at generator/generator.c:38
#1  0x0000aaaaaaae6b78 in trajectory_gen (tconfig=...) at hkl2.c:246
#2  0x0000aaaaaaae6c2c in Trajectory_solve (tconfig=..., gconfig=..., sconfig=..., move=43690) at hkl2.c:288
#3  0x0000aaaaaaabf324 in main () at sirius.c:161

If we look at the generator line,we get this

gen->base = base;

I do not understand why it work great on 32 bits computer and not on 64 bits computer.

can you help me find the problem. I am ready to test this until it works on 64 bits computer also :))

Cheers

Frederic

picca avatar Feb 07 '18 18:02 picca

here the valgrind output on the arm64 porter box from debian

(sid_arm64-dchroot)picca@amdahl:~/hkl/Documentation/figures$ ../../libtool --mode=execute valgrind  --leak-check=full --show-leak-kinds=all --main-stacksize=1000000 sirius
==7368== Memcheck, a memory error detector
==7368== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==7368== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==7368== Command: /home/picca/hkl/Documentation/figures/.libs/sirius
==7368==
==7368== Invalid write of size 8
==7368==    at 0x1430D8: generator_new_ (generator.c:38)
==7368==    by 0x144B77: trajectory_gen (hkl2.c:246)
==7368==    by 0x144C2B: Trajectory_solve (hkl2.c:288)
==7368==    by 0x11D323: main (sirius.c:161)
==7368==  Address 0x23c0 is not stack'd, malloc'd or (recently) free'd
==7368==
==7368==
==7368== Process terminating with default action of signal 11 (SIGSEGV)
==7368==  Access not within mapped region at address 0x23C0
==7368==    at 0x1430D8: generator_new_ (generator.c:38)
==7368==    by 0x144B77: trajectory_gen (hkl2.c:246)
==7368==    by 0x144C2B: Trajectory_solve (hkl2.c:288)
==7368==    by 0x11D323: main (sirius.c:161)
==7368==  If you believe this happened as a result of a stack
==7368==  overflow in your program's main thread (unlikely but
==7368==  possible), you can try to increase the size of the
==7368==  main thread stack using the --main-stacksize= flag.
==7368==  The main thread stack size used in this run was 1048576.
==7368==
==7368== HEAP SUMMARY:
==7368==     in use at exit: 45,052 bytes in 234 blocks
==7368==   total heap usage: 278 allocs, 44 frees, 52,804 bytes allocated
==7368==
==7368== 64 bytes in 4 blocks are possibly lost in loss record 1 of 8
==7368==    at 0x4844BBC: malloc (in /usr/lib/valgrind/vgpreload_memcheck-arm64-linux.so)
==7368==
==7368== 184 bytes in 1 blocks are possibly lost in loss record 2 of 8
==7368==    at 0x4846EE8: realloc (in /usr/lib/valgrind/vgpreload_memcheck-arm64-linux.so)
==7368==
==7368== 688 bytes in 45 blocks are still reachable in loss record 3 of 8
==7368==    at 0x4844BBC: malloc (in /usr/lib/valgrind/vgpreload_memcheck-arm64-linux.so)
==7368==
==7368== 1,104 bytes in 13 blocks are possibly lost in loss record 4 of 8
==7368==    at 0x4846CDC: calloc (in /usr/lib/valgrind/vgpreload_memcheck-arm64-linux.so)
==7368==
==7368== 4,096 bytes in 1 blocks are still reachable in loss record 5 of 8
==7368==    at 0x4846EE8: realloc (in /usr/lib/valgrind/vgpreload_memcheck-arm64-linux.so)
==7368==
==7368== 8,192 bytes in 1 blocks are definitely lost in loss record 6 of 8
==7368==    at 0x4844C84: malloc (in /usr/lib/valgrind/vgpreload_memcheck-arm64-linux.so)
==7368==
==7368== 12,976 bytes in 138 blocks are still reachable in loss record 7 of 8
==7368==    at 0x4846CDC: calloc (in /usr/lib/valgrind/vgpreload_memcheck-arm64-linux.so)
==7368==
==7368== 17,748 bytes in 31 blocks are still reachable in loss record 8 of 8
==7368==    at 0x4844C84: malloc (in /usr/lib/valgrind/vgpreload_memcheck-arm64-linux.so)
==7368==
==7368== LEAK SUMMARY:
==7368==    definitely lost: 8,192 bytes in 1 blocks
==7368==    indirectly lost: 0 bytes in 0 blocks
==7368==      possibly lost: 1,352 bytes in 18 blocks
==7368==    still reachable: 35,508 bytes in 215 blocks
==7368==                       of which reachable via heuristic:
==7368==                         newarray           : 1,536 bytes in 16 blocks
==7368==         suppressed: 0 bytes in 0 blocks
==7368==
==7368== For counts of detected and suppressed errors, rerun with: -v
==7368== ERROR SUMMARY: 5 errors from 5 contexts (suppressed: 0 from 0)
Segmentation fault

picca avatar Feb 07 '18 18:02 picca

here my generator

struct Trajectory {
	enum trajectory_e tag;
	union {
		struct {double h0; double k0; double l0;
			double h1; double k1; double l1;
			uint n; struct Mode mode;} hklfromto;
	};
};

#define TrajectoryHklFromTo(h0_, k0_, l0_, h1_, k1_, l1_, n_, mode_) {.tag=TRAJECTORY_HKL_FROM_TO, .hklfromto={h0_, k0_, l0_, h1_, k1_, l1_, n_, .mode=mode_}}

extern generator_declare(trajectory_gen, struct Engine, struct Trajectory, tconfig);

picca avatar Feb 07 '18 18:02 picca

And indeed te code which call the generator

HklGeometryList *Trajectory_solve(struct Trajectory tconfig,
				  struct Geometry gconfig,
				  struct Sample sconfig,
				  uint move)
{
	const struct Engine *econfig;
	HklGeometryList *solutions = hkl_geometry_list_new();
	generator_t(struct Engine) gen = trajectory_gen(tconfig);

	HklGeometry *geometry = newGeometry(gconfig);
	HklEngineList *engines = newEngines(gconfig);
	HklSample *sample = newSample(sconfig);
	HklDetector *detector = hkl_detector_factory_new(HKL_DETECTOR_TYPE_0D);
	HklTrajectoryStats *stats = hkl_trajectory_stats_new(Trajectory_len(tconfig));

	hkl_engine_list_init(engines, geometry, detector, sample);

	while((econfig = generator_next(gen)) != NULL){
		/* Engine_fprintf(stdout, *econfig); */
		HklGeometryList *geometries = Engine_solve(engines, *econfig);
		if(NULL != geometries){
			const HklGeometryListItem *solution;

			hkl_trajectory_stats_add(stats, geometries);
			solution = hkl_geometry_list_items_first_get(geometries);
			if(move)
				hkl_engine_list_select_solution(engines, solution);

			hkl_geometry_list_add(solutions,
					      hkl_geometry_list_item_geometry_get(solution));
			/* hkl_geometry_list_fprintf(stdout, geometries); */
			hkl_geometry_list_free(geometries);
		}
	}

	/* hkl_trajectory_stats_fprintf(stdout, stats); */

	hkl_trajectory_stats_free(stats);
	hkl_detector_free(detector);
	hkl_sample_free(sample);
	hkl_engine_list_free(engines);
	hkl_geometry_free(geometry);
	generator_free(gen);

	return solutions;
}

picca avatar Feb 07 '18 18:02 picca

Ok. The first thing to know is that I wrote generator most to see if I could. I was quite surprised how well it worked out, but I don't know that I'd really call it ready for serious use. But let's see what I can work out..

I developed and tested the module mostly on x86_64, so it's certainly not broken on all 64-bit systems in all circumstances.

My guess is simply that you're running out of space on the generator's stack - 64-bit code will typically consume more stack space, because of the 64-bit pointers it needs to store there. Currently generator always allocates a fixed - and tiny (~8kiB) - stack for each generator, meaning you're pretty limited in what you can do inside the actual generator function. Your return type is also a reasonably sized structure, so if you're manipulating similar structures within your generator code that could consume stack pretty quickly.

I had some fairly ambitious plans to improve the stack handling in coroutine and generator, but I didn't get very far with it before being distracted by other projects and I've never had a chance to go back to it.

So first step is to see if that actually is the problem: go into generator.c and change DEFAULT_STACK_SIZE to something much larger (say a couple of megabytes) and see if the problem goes away. If it does, we can work together on improving the out of the box behaviour.

dgibson avatar Feb 08 '18 04:02 dgibson

Thanks a lot for your reply,

I can confirm that using 4 times the current stack_size (8192*4), make the problem vanish for my use case. Now it is time to improve the out of box behaviour ;)

picca avatar Feb 08 '18 07:02 picca

It vanish on the arm64 architecture, but even if I increase a lot the stack size, on amd64 it still faill

picca avatar Feb 08 '18 11:02 picca

the stack trace is a bit different

#0  0x000055555559311f in trajectory_gen_generator__ (ret=0x557be730) at hkl2.c:246
#1  0x00007ffff6aefd70 in  () at /lib/x86_64-linux-gnu/libc.so.6
#2  0x0000000000000000 in  ()

picca avatar Feb 08 '18 11:02 picca

the sefgault arise when I define the generator.

generator_def(trajectory_gen, struct Engine, struct Trajectory, tconfig)
{
	switch(tconfig.tag){
	case TRAJECTORY_HKL_FROM_TO:
	{
		uint i;
		double dh = (tconfig.hklfromto.h1 - tconfig.hklfromto.h0) / (tconfig.hklfromto.n);
		double dk = (tconfig.hklfromto.k1 - tconfig.hklfromto.k0) / (tconfig.hklfromto.n);
		double dl = (tconfig.hklfromto.l1 - tconfig.hklfromto.l0) / (tconfig.hklfromto.n);
		for(i=0; i<tconfig.hklfromto.n + 1; ++i){
			double h = i * dh + tconfig.hklfromto.h0;
			double k = i * dk + tconfig.hklfromto.k0;
			double l = i * dl + tconfig.hklfromto.l0;

			struct Engine econfig = EngineHkl(h, k, l, tconfig.hklfromto.mode);
			generator_yield(econfig);
		}
	}
	break;
	}
}

picca avatar Feb 08 '18 11:02 picca

Here the valgrind output on amd64

(sid_amd64-dchroot)picca@barriere:~/hkl$ ./libtool --mode=execute valgrind  --track-origins=yes Documentation/figures/sirius
==21449== Memcheck, a memory error detector
==21449== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==21449== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==21449== Command: /home/picca/hkl/Documentation/figures/.libs/sirius
==21449==
==21449== Warning: client switching stacks?  SP change: 0x1fff0001b8 --> 0x6c72e08
==21449==          to suppress, use: --max-stackframe=137308459952 or greater
==21449== Warning: client switching stacks?  SP change: 0x6c72d88 --> 0x1fff0001c0
==21449==          to suppress, use: --max-stackframe=137308460088 or greater
==21449== Conditional jump or move depends on uninitialised value(s)
==21449==    at 0x121A2B: perm_r (hkl-geometry.c:1213)
==21449==    by 0x121C2D: hkl_geometry_list_multiply_from_range (hkl-geometry.c:1281)
==21449==    by 0x11CC04: hkl_engine_set.part.7 (hkl-pseudoaxis-private.h:581)
==21449==    by 0x126778: hkl_engine_set (hkl-pseudoaxis-private.h:560)
==21449==    by 0x126778: hkl_engine_pseudo_axis_values_set (hkl-pseudoaxis.c:207)
==21449==    by 0x147664: Engine_solve (hkl2.c:233)
==21449==    by 0x1478C4: Trajectory_solve (hkl2.c:300)
==21449==    by 0x11CEE1: main (sirius.c:161)
==21449==  Uninitialised value was created by a stack allocation
==21449==    at 0x121BB1: hkl_geometry_list_multiply_from_range (hkl-geometry.c:1266)

The stack switch information worried me.

picca avatar Feb 08 '18 12:02 picca

Ok. So, the first step is altering generator to use coroutine_stack_alloc() instead of coroutine_stack_init(). The former is safer, more flexible, and if it fails is much more likely to fail with a relatively clear SEGV, instead of corrupting unrelated memory.

The complication is that the generator code makes a temporary allocation at the "bottom" of the stack area (that is, at the far end of the buffer from where the stack itself will first occupy). That's used to pass the initial parameters into the generator code. So far, this requires manually managing the stack buffer, because coroutine_stack_*() doesn't provide a built in way to make such an allocation.

So we either need to extend coroutine with a way to make such temporary allocations, or use a different means of passing the arguments in.

Let's worry about x86_64 first - there may be further complications with arm64.

The "switching stack" warnings are kind of expected - the generator code does indeed switch stacks, that's how it works. I have some valgrind calls which are supposed to tell it what's going on, but apparently they're not really supported any more, so we might just have to live with valgrind not understanding this code.

dgibson avatar Feb 11 '18 08:02 dgibson

Hello, I can not be of a great help here except if you want me to test patches :),

have a good weekend

picca avatar Feb 17 '18 12:02 picca

Just for information it works with gcc6, on an amd64 computer

:~/hkl/hkl$ gcc --version gcc (Debian 6.3.0-18) 6.3.0 20170516 Copyright (C) 2016 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

So there is something different with gcc-7

picca avatar Feb 22 '18 08:02 picca

My guess would just be that gcc7 is doing some better optimizations that result in less stack usage, and so we're not hitting the overrun that we did before.

dgibson avatar Feb 23 '18 02:02 dgibson

Hello, I reveive this today

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=889878#26

so it seems that this fantastic guy found the culprite :))

picca avatar Aug 04 '18 08:08 picca