Problems & Solutions

Compilation Problems

Compilation with g++ v4.3

When compiling POP-C++ v1.1.1 with g++ v4.3, the following compilation error occurs:

../include/paroc_array.h:23: error: request for member ‘~char [64]’ in 
     ‘* data’, which is of non-class type ‘char [64]’

You need to add these three lines to include/paroc_array.h :

diff --git a/include/paroc_array.h b/include/paroc_array.h
index d8750be..169e569 100644
--- a/include/paroc_array.h
+++ b/include/paroc_array.h
@@ -26,6 +26,11 @@ template<class T> void paroc_destruct_element(T *data, int n)
 inline void paroc_construct_element(char *data, int n) {};
 inline void paroc_destruct_element(char  *data, int n){};

+typedef char string64[64];
+
+inline void paroc_construct_element(string64 *data, int n) {};
+inline void paroc_destruct_element(string64  *data, int n){};
+
 template<class T> class paroc_array
 {
  public:

You also need to change one include in test/integer/main.cc : replace iostream.h with iostream.

Note : This bug has been corrected since version 1.3 (current version)
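
Why the patch works can be sketched in standalone C++. The error message shows that the generic helper was instantiated with T = char[64], and an explicit destructor call is not valid for an array type. A non-template overload for that exact type is preferred by overload resolution, so the template body is never instantiated for it. The names below (construct_elements, destruct_elements, Tracked) are illustrative stand-ins, not the POP-C++ sources.

```cpp
#include <cassert>
#include <new>

// Generic helpers, standing in for the templates in paroc_array.h.
template<class T> void construct_elements(T *data, int n)
{
  assert(n >= 0);
  for (int i = 0; i < n; ++i) new (data + i) T();  // placement new on each slot
}

template<class T> void destruct_elements(T *data, int n)
{
  assert(n >= 0);
  for (int i = 0; i < n; ++i) data[i].~T();        // explicit destructor call
}

// Non-template overloads for the array type: nothing to construct or destroy.
// They win over the templates, so "data[i].~T()" is never compiled for char[64].
typedef char string64[64];
inline void construct_elements(string64 *, int) {}
inline void destruct_elements(string64 *, int) {}

// Hypothetical element type that counts its destructor calls.
struct Tracked
{
  static int destroyed;
  ~Tracked() { ++destroyed; }
};
int Tracked::destroyed = 0;
```

Calling destruct_elements on an array of Tracked goes through the template and runs each destructor, while the same call on an array of string64 selects the empty overload.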


Parser limitations

Several standard C++ syntaxes in header files raise an error of this kind :

slfio/A3DIO.ph:44: ERROR :syntax error

The known limitations are :

  1. Variables cannot be declared 'static' or 'const'.
  2. The syntax Method(void) is not supported; it must be replaced by Method().
  3. virtual void Method(…) = 0; : the '= 0' raises an error, so methods cannot be declared pure virtual. (fixed in version 1.2)
  4. void cleanup() throw(); : the exception specification raises an error.
  5. An error (or warning) should also be raised if an async method returns a value : int async Method(). (fixed for version 1.3)
  6. A method cannot be declared const : int Meth() const; (probably unfixable, see developer page)
  7. Inheritance : the parser has trouble with this kind of syntax : class POPintVector : public vector<int>, public POPBase. A template cannot be used as a base class.

Multiple inheritance issue

The POP-C++ parser ignores the second base class of a class that has two base classes, because a DataType may only have one base class. Using two base classes may still work, but with some restrictions.

E.g. this line

class POPintVector : public vector<int>, public POPBase

can cause this error

ERROR in ParObject::SetData: unable to marshal argument 1.

because only the first base class is taken into account and not POPBase which is the class that can be marshalled. The solution is to re-write the line this way :

class POPintVector : public POPBase, public vector<int>

Compilation of an object

Error message :

g++: no input files

Explanation : Version 1.2 beta of parocpp segfaults when compiling certain files, because a '::' in a class initialization is mistaken for a ':'.

Solution: parser.lex must be modified

Index: parser.lex
===================================================================
--- parser.lex	(revision 23)
+++ parser.lex	(working copy)
@@ -236,8 +236,8 @@
 	  int n=ReadUntil(":;{", buf, 10240);
 	  othercodes.InsertAt(-1,buf,n);
 	  linenumber+=CountLine(buf);
-	  if (n && buf[n-1]==':')
-	    {
+         if (n && buf[n-1]==':')
+           {
 	      while (1)
 		{
 		  //extract base class name
@@ -252,9 +252,12 @@
 		    }
 		  sscanf(buf," %[_a-zA-Z0-9]",clname);
 
-		  if (thisCodeFile->FindClass(clname)!=NULL)
+                  // Modif : Ignore "::" which was mistaken for ":" 
+                  if (*buf!=':' && thisCodeFile->FindClass(clname)!=NULL)
 		    {
 		      char *t=strstr(buf,clname)+ strlen(clname);
+                      if(strstr(buf,clname)==NULL){
+                        fprintf(stderr, "ERROR: %s:%d: Bad base class initialization (2)\n",filename, linenumber);exit(1);}
 		      othercodes.InsertAt(-1,buf,t-buf);
 		      othercodes.InsertAt(-1,postfix,len1);
 		      othercodes.InsertAt(-1,t,strlen(t));

Then run :

cd parser ; make gen

to regenerate the parser files


multiple definition of CheckIfPacked

With versions of POP-C++ later than 1.2, this error may occur when compiling an object :

/home/lwinkler/prog/VMS/projects/testPhototheque/vmspack.cc:16: multiple definition of `CheckIfPacked(char const*)'

This may occur because you are compiling two files with a @pack command at the same time.

Solution : Edit your .cc files and, if needed, create separate pack_myobj.cc files that look like this :

#include "myobj.ph"
@pack(MyObj);

You can then compile each object separately with its own pack_myobj.cc.

Class attribute declared as reference

description : A class attribute declared as a reference is considered as a non-reference after parsing.

fix :

Index: parser/classmember.cc
===================================================================
--- parser/classmember.cc	(revision 24)
+++ parser/classmember.cc	(revision 25)
@@ -173,8 +173,12 @@
 bool Param::DeclareVariable(char *output)
 {
   if (mytype==NULL) return false;
+  
+  char tmpvar[1024];
+  if (isRef&&!GetType()->IsParClass()) sprintf(tmpvar,"&%s", name);
+  else strcpy(tmpvar,name);
 
-  if (mytype->GetDeclaration(name, output))
+  if (mytype->GetDeclaration(tmpvar, output))
     {
       strcat(output,";\n");
       return true;

Error : undefined Class__parocobj::Method

At link time, the linker complains about an undefined reference to a _parocobj method.

error :

snowdrift/libsnowdriftparoc.a(SnowDriftWorker_par.o):(.rodata._ZTV25SnowDriftWorker__parocobj[vtable for SnowDriftWorker__parocobj]+0x94): undefined reference to `SnowDrift__parocobj::resetArray(CArray<double>&)'

solution : Your .o file was compiled with g++ instead of POP-C++. It therefore does not contain the methods generated by POP-C++ and cannot be linked.


Error : undefined reference to `yywrap' (for POP-C++ versions < 1.2)

error :

/home/lwinkler/dsp/popc-1.1.1/parser/parser.yy.cc:2136: undefined reference to `yywrap'

solution : add library -lfl in parser/Makefile

	# LIBS = -ldl -lnsl -lpthread
	LIBS = -ldl -lnsl -lpthread -lfl

or type :

cd parser
make LIBS=-lfl



Runtime Problems

Marshalling of large arrays/vectors

When very large arrays (>300MB) are passed as parameters to remote methods, an exception is raised. This is due to a memory problem : the system cannot find a sufficiently large contiguous block of memory to allocate the array used to pack the buffer.
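
One possible workaround, which is an assumption and not something this page documents, is to transfer the data in several smaller calls so that no single packed buffer needs a very large contiguous allocation. The helper below only computes the (offset, length) pieces; split_into_chunks is a hypothetical name and the chunk size is up to the caller.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Returns (offset, length) pairs that cover `total` elements
// in consecutive pieces of at most `chunk` elements each.
std::vector<std::pair<std::size_t, std::size_t> >
split_into_chunks(std::size_t total, std::size_t chunk)
{
  assert(chunk > 0);
  std::vector<std::pair<std::size_t, std::size_t> > ranges;
  for (std::size_t offset = 0; offset < total; offset += chunk)
  {
    // The last piece may be shorter than `chunk`.
    const std::size_t length = (total - offset < chunk) ? total - offset : chunk;
    ranges.push_back(std::make_pair(offset, length));
  }
  return ranges;
}
```

Each range could then be sent with its own remote call, for example through a hypothetical method that takes the offset, the length and a pointer to the first element of the piece.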

Bug in benchmark

When any optimization flag is used, g++ does not compute operations whose results are never used, so the benchmark computation is skipped. This is fixed by printing the result with printf.

diff --git a/lib/benchmark.cc b/lib/benchmark.cc
index 779c46b..7e0364a 100644
--- a/lib/benchmark.cc
+++ b/lib/benchmark.cc
@@ -21,6 +21,7 @@ float paroc_utils::benchmark_power()
         c[i][j]=tmp;
       }
   unsigned cl2=clock()-cl1;
+  printf("c %f",c[44][34]); // HIS LINE MUST BE HERE : gcc with optimization flag does not comput
   return ((1.0*MATRIXSIZE*MATRIXSIZE*MATRIXSIZE*6.0)/(cl2*1.0E6/CLOCKS_PER_SEC));

 }

Note : This bug has been corrected for versions > 1.1.1
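
The effect can be illustrated outside POP-C++ with a small timing routine. This is a sketch with hypothetical names (multiply_ones, timed_multiply), not the code of lib/benchmark.cc: the result of the computation is printed and returned, so the optimizer cannot discard the loop as dead code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <ctime>
#include <vector>

// Multiplies two n x n matrices filled with ones and returns the last
// element of the product (which equals n).
double multiply_ones(std::size_t n)
{
  assert(n > 0);
  std::vector<double> a(n * n, 1.0), b(n * n, 1.0), c(n * n, 0.0);
  for (std::size_t i = 0; i < n; ++i)
    for (std::size_t j = 0; j < n; ++j)
    {
      double tmp = 0.0;
      for (std::size_t k = 0; k < n; ++k) tmp += a[i * n + k] * b[k * n + j];
      c[i * n + j] = tmp;
    }
  return c[n * n - 1];
}

// Times the multiplication. Printing the value makes it observable,
// so the computation has to be carried out even with optimization on.
double timed_multiply(std::size_t n)
{
  const std::clock_t start = std::clock();
  const double value = multiply_ones(n);
  const std::clock_t ticks = std::clock() - start;
  std::printf("c %f (%ld clock ticks)\n", value, static_cast<long>(ticks));
  return value;
}
```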


Object with od.memory(...) cannot start

If you want to run an object and use an object descriptor specifying the memory, you need to add a line to your /usr/local/popc/etc/jobmgr.conf.

For example this specifies that your local machine has 2048 MB of RAM available :

ram 2048

Destruction of object is impossible due to cross references

  • Description : All objects are explicitly destroyed, but they keep running and the main routine cannot exit cleanly.
  • Explanation : Objects that are referenced by other objects keep running even after their explicit destruction. If several objects hold cross references to each other, you need to explicitly delete these references before the objects can be destroyed.
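
The same behaviour can be reproduced in plain C++ with reference-counted pointers. The sketch below is only an analogy built on std::shared_ptr, not the POP-C++ reference mechanism, and Node and alive_after_release are hypothetical names. Two nodes that point at each other stay alive after their owners release them, unless the cross references are cleared first.

```cpp
#include <cassert>
#include <memory>

// Hypothetical object that may hold a reference to another object.
struct Node
{
  static int alive;
  std::shared_ptr<Node> peer;
  Node() { ++alive; }
  ~Node() { --alive; }
};
int Node::alive = 0;

// Creates two nodes that reference each other, optionally clears the
// cross references, drops the local references, and returns how many
// of the two nodes are still alive at that point.
int alive_after_release(bool break_cycle)
{
  const int before = Node::alive;
  std::shared_ptr<Node> a = std::make_shared<Node>();
  std::shared_ptr<Node> b = std::make_shared<Node>();
  assert(Node::alive - before == 2);
  a->peer = b;
  b->peer = a;
  std::weak_ptr<Node> watch = a;   // non-owning handle, used only for cleanup

  if (break_cycle)
  {
    a->peer.reset();
    b->peer.reset();
  }
  a.reset();
  b.reset();
  const int still_alive = Node::alive - before;

  // Cleanup so the sketch itself does not leak: break the cycle now.
  if (std::shared_ptr<Node> kept = watch.lock()) kept->peer.reset();
  return still_alive;
}
```

With the cross references cleared first, both nodes are destroyed as soon as the local references are dropped; without that step, both remain alive.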

Polymorphism issue

The program fails with a runtime error after a method call in which one of the arguments is a child class object passed through a parent class pointer.

Explanation : This is a limitation of POPC. Methods must be called with arguments whose dynamic type matches their static (declared) type.

Modification : The code of POPC++ can be modified in the following way to report the error :

--- parser/class.cc	(revision 4)
+++ parser/class.cc	(working copy)
@@ -63,6 +63,12 @@
   sprintf(tmpstr,"%s.Push(\"%s\",\"paroc_interface\",1);\n",bufname,paramname);
   output.InsertAt(-1,tmpstr,strlen(tmpstr));
 
+  sprintf(tmpstr, "if(!paroc_utils::MatchWildcard(typeid(%s).name(),\"*%s\"))\n",varname,GetName());
+  output.InsertAt(-1,tmpstr,strlen(tmpstr));
+  
+  sprintf(tmpstr, "{printf(\"POPC Error at method call: dynamic type of %s must correspond with static type %s\\n\");exit(-1);}\n",varname,GetName());
+  output.InsertAt(-1,tmpstr,strlen(tmpstr));
+
   sprintf(tmpstr, "((%s &)(%s)).Serialize(%s, true);",GetName(),varname,bufname);
   output.InsertAt(-1,tmpstr,strlen(tmpstr));

Note : This must be added to the limitations in the POPC documentation
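
The check that the patch generates can be illustrated in plain C++. The patch compares type names with a wildcard match, whereas this sketch compares std::type_info objects directly, which is a simplification. Parent, Child, has_exact_type and check_argument are hypothetical names.

```cpp
#include <cassert>
#include <typeinfo>

// Hypothetical class hierarchy; a virtual member makes the classes
// polymorphic, so typeid reports the dynamic type.
struct Parent { virtual ~Parent() {} };
struct Child : Parent {};

// True when the dynamic type of obj is exactly its static type.
template<class Static> bool has_exact_type(const Static &obj)
{
  return typeid(obj) == typeid(Static);
}

// Aborts (in debug builds) when a derived object is passed where the
// declared parameter type is the base class.
template<class Static> void check_argument(const Static &obj)
{
  assert(has_exact_type(obj) && "dynamic type must match static type");
}
```

A Child object seen through a Parent reference fails the check, which is the situation that leads to the runtime error described above.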


Connection refused

You launch an application with parocrun :

parocrun <mapfile> <application>

and you get a message like the following :

<remote_node>: Connection refused

example :

[jfr@grid212 integer]$ parocrun map ./main
Object Integer on grid212.tic.hefr.ch
Object Integer on grid212.tic.hefr.ch
grid211.tic.hefr.ch: Connection refused ←—- this is the error message

Destroying the object…
Destroying the object…
Object creation failure
Kill all parallel objects are requested
[jfr@grid212 integer]$ [codemgr.cc:13]Now destroy CodeMgr
[jfr@grid212 integer]$

you have to check the $PAROC_RSH variable :

echo $PAROC_RSH

the answer should be

/usr/bin/ssh

if this is not the case do

export PAROC_RSH=/usr/bin/ssh

Explanation :

POP-C++ up to and including version 1.1.1 uses rsh by default to contact remote machines. rsh, rlogin and rexec are usually blocked by the sysadmin because they are a weak point in security, while most installations allow the use of ssh. Use ssh instead of rsh by setting the PAROC_RSH variable to the path of ssh (normally /usr/bin/ssh).


Error: fail to connect to callback (for POP-C++ version < 1.3)

You launch the jobmanager with the command

SXXparoc start

and you receive something like :

Starting PAROC Job manager service:
Error: fail to connect to callback
Error: starting broker….
: No route to host

You can have the same problem with an application that does not use the jobmanager but directly specifies the machine on which its objects will run (using od.url). The error message is the same : “Error: fail to connect to callback”

Explanation : A remote object cannot contact its creator because the address specified for the callback is wrong (probably 127.0.0.1).

Some Linux distributions add a line to /etc/hosts that maps the loopback address to the name of the machine, for example :

127.0.0.1    my_machine_name

In this case, remove the machine name from the line with the loopback address and add a new line with the IP address and the name of your machine :

127.0.0.1        localhost
<IP address>     <machine name>

or

127.0.0.1        localhost
<IP address>     <machine name>   <machine name.domain>

where <IP address> is the IP address of the machine on which you launch the jobmanager, <machine name> is the name of the machine, and <machine name.domain> is the name of the machine followed by the domain in which the machine is known. You can of course have other lines with other IP addresses and other machine names.

Once you have made the changes in your hosts file, the output of :

hostname -i

should be the IP address of your computer on your network and not 127.0.0.1.

If you cannot modify your /etc/hosts file, you must add the runtime environment variable PAROC_IP in the install procedure and set its value to your IP address.
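
The check on the hosts file can be automated. The sketch below is a minimal illustration with hypothetical names (is_loopback, names_on_loopback): it takes the text of a hosts file and lists the names, other than those starting with "localhost", that are mapped to a 127.x.x.x address. Any name it returns is a candidate for the problem described above.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// True when the address is in the 127.x.x.x loopback range.
bool is_loopback(const std::string &address)
{
  assert(!address.empty());
  return address.compare(0, 4, "127.") == 0;
}

// Returns the host names in hosts_text that are mapped to a loopback
// address, ignoring comments and names that start with "localhost".
std::vector<std::string> names_on_loopback(const std::string &hosts_text)
{
  std::vector<std::string> names;
  std::istringstream lines(hosts_text);
  std::string line;
  while (std::getline(lines, line))
  {
    line = line.substr(0, line.find('#'));   // strip trailing comment
    std::istringstream fields(line);
    std::string address, name;
    if (!(fields >> address) || !is_loopback(address)) continue;
    while (fields >> name)
      if (name.compare(0, 9, "localhost") != 0) names.push_back(name);
  }
  return names;
}
```

To use it on a real machine, read the contents of /etc/hosts into a string and pass it to names_on_loopback; an empty result means no machine name is attached to the loopback address.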


Out of resource

This message occurs when the program is trying to launch an object but cannot find a suitable resource or object executable file :

  • Check if the path to the object in the map file is right
  • Check that an object is listed for each type of architecture in your map file
    • Important : The notation of the platform architecture changed between versions 1.1.1 and 1.2. If you have changed your version of POP-C++, you need to do a full installation and adapt your map files.
    • The architecture is case sensitive
  • Check that your Job Managers meet the requirements to run the object (number of processes, power, memory, …). Tip : use the jobmgrquery command

Other problems

printf / cout

If you use a printf command in your program do not forget to add a '\n' at the end : otherwise your message will remain in the buffer until the next '\n' is printed.

If you want to use cout instead of printf, you will need to end your message with endl and not '\n' so that the buffer is flushed and the result is printed immediately.
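
Both variants can be wrapped in small helpers. This is a sketch with hypothetical names (report_stdio, report_stream). The stdio version also calls fflush, which is an addition to the advice above: when the output is not a terminal, the C library usually buffers whole blocks, so a trailing '\n' alone may not be enough.

```cpp
#include <cassert>
#include <cstdio>
#include <iostream>
#include <string>

// C stdio version: write the message, end the line, then flush explicitly.
void report_stdio(std::FILE *out, const char *msg)
{
  assert(out != NULL && msg != NULL);
  std::fprintf(out, "%s\n", msg);
  std::fflush(out);   // does not rely on line buffering
}

// iostream version: std::endl writes '\n' and flushes the stream.
void report_stream(std::ostream &out, const std::string &msg)
{
  out << msg << std::endl;
}
```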

jobmgr

Standard installation

During a standard installation, the system asks you to enter the fully qualified master host name. Do not answer this question with the local host name. Either enter nothing (just press the <return> key) if you will only use your local machine, or give the name of a remote machine.

Launching the jobmgr on several machines

If you launch the jobmgr simultaneously on several machines, it will hang on all of them. If you intend to use several machines, start the jobmgr on one machine after the other.

problems_solutions.txt · Last modified: 2011/03/28 08:56 by jfroche
CC Attribution-Noncommercial-Share Alike 4.0 International