


Simple STL String Tokenizer Function

Sunday, January 9, 2005 by digitalpeer

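This function takes an STL string and a string of delimiter characters, and returns a vector of tokens. Runs of consecutive delimiters are treated as a single separator, so empty tokens are never produced.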
#include <string>
#include <vector>
using namespace std;

vector<string> tokenize(const string& str, const string& delimiters)
{
    vector<string> tokens;

    // Skip any delimiters at the beginning.
    string::size_type lastPos = str.find_first_not_of(delimiters, 0);

    // Find the first delimiter after the start of the token.
    string::size_type pos = str.find_first_of(delimiters, lastPos);

    while (string::npos != pos || string::npos != lastPos)
    {
        // Found a token, add it to the vector.
        tokens.push_back(str.substr(lastPos, pos - lastPos));

        // Skip delimiters. Note the "not_of".
        lastPos = str.find_first_not_of(delimiters, pos);

        // Find the next delimiter.
        pos = str.find_first_of(delimiters, lastPos);
    }

    return tokens;
}
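
As a quick illustration of that behavior, the sketch below repeats the function in compact form and checks a few inputs. The helper name `check_tokenize_examples` is made up for this example; the point it tries to show is that runs of delimiters collapse and no empty tokens come out.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Same algorithm as the listing above, repeated so the example is self-contained.
std::vector<std::string> tokenize(const std::string& str, const std::string& delimiters)
{
    std::vector<std::string> tokens;
    std::string::size_type lastPos = str.find_first_not_of(delimiters, 0);
    std::string::size_type pos = str.find_first_of(delimiters, lastPos);
    while (std::string::npos != pos || std::string::npos != lastPos)
    {
        tokens.push_back(str.substr(lastPos, pos - lastPos));
        lastPos = str.find_first_not_of(delimiters, pos);
        pos = str.find_first_of(delimiters, lastPos);
    }
    return tokens;
}

// Hypothetical helper: exercises a few representative inputs.
void check_tokenize_examples()
{
    // Leading, trailing, and repeated delimiters are skipped.
    std::vector<std::string> words = tokenize("  a b ", " ");
    assert(words.size() == 2);
    assert(words[0] == "a" && words[1] == "b");

    // Every character in the delimiter string counts as a separator.
    std::vector<std::string> mixed = tokenize("a,,b;c", ",;");
    assert(mixed.size() == 3);
    assert(mixed[2] == "c");

    // An empty input yields no tokens at all.
    assert(tokenize("", ",").empty());
}
```

Calling `check_tokenize_examples()` from `main` runs the assertions; nothing is printed when they pass.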

Eric Hu posted the following update to retain empty fields between delimiters. Some comments below report that it is buggy, so see Eli's version further down:
vector<string> tokenize(const string& str, const string& delimiters)
{
    vector<string> tokens;

    string::size_type lastPos = 0, pos = 0;
    int count = 0;

    if (str.length() < 1) return tokens;

    // skip delimiters at beginning.
    lastPos = str.find_first_not_of(delimiters, 0);

    if ((str.substr(0, lastPos - pos).length()) > 0)
    {
        count = str.substr(0, lastPos - pos).length();

        for (int i = 0; i < count; i++)
            tokens.push_back("");

        if (string::npos == lastPos)
            tokens.push_back("");
    }

    // find first delimiter.
    pos = str.find_first_of(delimiters, lastPos);

    while (string::npos != pos || string::npos != lastPos)
    {
        // found a token, add it to the vector.
        tokens.push_back(str.substr(lastPos, pos - lastPos));

        // skip delimiters.  Note the "not_of"
        lastPos = str.find_first_not_of(delimiters, pos);

        if ((string::npos != pos) && (str.substr(pos, lastPos - pos).length() > 1))
        {
            count = str.substr(pos, lastPos - pos).length();

            for (int i = 0; i < count; i++)
                tokens.push_back("");
        }

        pos = str.find_first_of(delimiters, lastPos);
    }

    return tokens;
}

Here's an alternative to Eric's implementation by Eli.
vector<string> Tokenize(const string& str,const string& delimiters)
{
 vector<string> tokens;
 string::size_type delimPos = 0, tokenPos = 0, pos = 0;

 if(str.length()<1)  return tokens;
 while(1){
   delimPos = str.find_first_of(delimiters, pos);
   tokenPos = str.find_first_not_of(delimiters, pos);

   if(string::npos != delimPos){
     if(string::npos != tokenPos){
       if(tokenPos<delimPos){
         tokens.push_back(str.substr(pos,delimPos-pos));
       }else{
         tokens.push_back("");
       }
     }else{
       tokens.push_back("");
     }
     pos = delimPos+1;
   } else {
     if(string::npos != tokenPos){
       tokens.push_back(str.substr(pos));
     } else {
       tokens.push_back("");
     }
     break;
   }
 }
 return tokens;
}
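
To see what retaining empty fields means in practice, here is a minimal sketch with a few checks. `tokenize_keep_empty` is a hypothetical condensed variant written for this example, not Eli's code: it assumes the nested branches above can be collapsed because `substr(pos, delimPos - pos)` is already empty whenever a delimiter sits at `pos`.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical condensed variant of the listing above (name and shape are assumptions).
std::vector<std::string> tokenize_keep_empty(const std::string& str, const std::string& delimiters)
{
    std::vector<std::string> tokens;
    if (str.empty()) return tokens; // same early exit as the listing above

    std::string::size_type pos = 0;
    while (true)
    {
        std::string::size_type delimPos = str.find_first_of(delimiters, pos);
        if (delimPos == std::string::npos)
        {
            tokens.push_back(str.substr(pos)); // last field, possibly empty
            break;
        }
        // When a delimiter sits at pos, this substring is already "".
        tokens.push_back(str.substr(pos, delimPos - pos));
        pos = delimPos + 1;
    }
    return tokens;
}

// Hypothetical helper: checks that empty fields are kept.
void check_keep_empty_examples()
{
    std::vector<std::string> t = tokenize_keep_empty("a,,b,", ",");
    assert(t.size() == 4);
    assert(t[0] == "a" && t[1].empty() && t[2] == "b" && t[3].empty());

    // A leading delimiter produces a leading empty field.
    assert(tokenize_keep_empty(",a", ",").size() == 2);

    // Empty input still yields no tokens.
    assert(tokenize_keep_empty("", ",").empty());
}
```

Each delimiter character ends exactly one field, so an input with n delimiters always produces n + 1 tokens (except for the empty string).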

Comment Tuesday, February 15, 2005 by  anonymous
very nice thanks :)
Comment Sunday, November 27, 2005 by  anonymous
thanks, really helpful =)
Comment Thursday, December 22, 2005 by  Mgk
thanks, it's really great!
Comment Wednesday, February 8, 2006 by  anonymous
thanks!
Comment Tuesday, March 14, 2006 by  j. ilski
thank you!
Comment Wednesday, May 17, 2006 by  Ross MacGregor
Here is an alternative to listing two. I wrote it myself after examining the verbose listing above.

#include <string>
#include <vector>

void tokenize(
    std::string const & input,
    std::string const & delimiters,
    std::vector<std::string> & tokens)
{
    using namespace std;

    string::size_type last_pos = 0;
    string::size_type pos = 0;

    while (true)
    {
        // Find the next delimiter at or after last_pos.
        pos = input.find_first_of(delimiters, last_pos);
        if (pos == string::npos)
        {
            // No more delimiters: the rest of the input is the last token.
            tokens.push_back(input.substr(last_pos));
            break;
        }
        else
        {
            tokens.push_back(input.substr(last_pos, pos - last_pos));
            last_pos = pos + 1;
        }
    }
}
Comment Wednesday, May 31, 2006 by  anonymous
The top tokenizer code on this page allocates the tokens vector from the stack, then uses it as the return value. Therefore the return will be garbage. Amateur error.
Comment Wednesday, May 31, 2006 by  digitalpeer
Anonymous, you are entirely incorrect. When the vector of strings is returned, a copy is made. What you said would be true if it were a pointer to a stack address, but it simply is not the case.
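
The point about returning by value can be checked with a small sketch. The function names here are made up for illustration; the assumption is only that the vector is returned by value, as in the tokenizer at the top of the page.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Returns a local vector by value: the caller receives its own object
// (copied, moved, or constructed in place, depending on compiler and standard).
std::vector<std::string> make_tokens()
{
    std::vector<std::string> tokens;
    tokens.push_back("alpha");
    tokens.push_back("beta");
    return tokens; // safe: no pointer or reference to the local escapes
}

// By contrast, returning the address of a local would dangle:
//   std::vector<std::string>* bad() { std::vector<std::string> t; return &t; }

// Hypothetical helper: the returned vector is intact after the call.
void check_return_by_value()
{
    std::vector<std::string> result = make_tokens();
    assert(result.size() == 2);
    assert(result[0] == "alpha");
    assert(result[1] == "beta");
}
```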
Comment Monday, November 13, 2006 by  Holger
Hi there,
there is something which I don't understand. When I have a string with tabs as separators, and use a "\t" as the delimiter argument, the routine doesn't work as I would expect:
For a line looking like
a\tb\tc\td
the tokens vector only contains "a" instead of "a", "b", "c", "d".
The whole thing works for "," as delimiter and an input a,b,c,d.

Is there something that I misunderstood about escaping here?

Holger
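
A hedged sketch of one way to check this case: in C++ source, "\t" is a single tab character, and the tokenizer itself appears to split on it correctly. One possible cause of seeing only "a" (an assumption, not confirmed by the comment) is that the line was read with `operator>>`, which stops at the first whitespace character, rather than with `std::getline`. The helper name `check_tab_examples` is made up for this example.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Compact copy of the article's first tokenizer so the example is self-contained.
std::vector<std::string> tokenize(const std::string& str, const std::string& delimiters)
{
    std::vector<std::string> tokens;
    std::string::size_type lastPos = str.find_first_not_of(delimiters, 0);
    std::string::size_type pos = str.find_first_of(delimiters, lastPos);
    while (std::string::npos != pos || std::string::npos != lastPos)
    {
        tokens.push_back(str.substr(lastPos, pos - lastPos));
        lastPos = str.find_first_not_of(delimiters, pos);
        pos = str.find_first_of(delimiters, lastPos);
    }
    return tokens;
}

void check_tab_examples()
{
    // "\t" is one tab character, so a tab-separated string splits into four tokens.
    assert(tokenize("a\tb\tc\td", "\t").size() == 4);

    // operator>> stops at the first whitespace, including a tab...
    std::istringstream in1("a\tb\tc\td");
    std::string word;
    in1 >> word;
    assert(word == "a");
    assert(tokenize(word, "\t").size() == 1);

    // ...whereas std::getline reads the whole line, tabs included.
    std::istringstream in2("a\tb\tc\td");
    std::string line;
    std::getline(in2, line);
    assert(tokenize(line, "\t").size() == 4);
}
```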
Comment Thursday, July 3, 2008 by  Henry Liu
A small bug was found in Eric Hu's version. For the input
one,,two,three,four,five
we expect to get
[one] [] [two] [three] [four] [five]
In fact, the following vector is returned:
[one] [] [] [two] [three] [four] [five]
Here is an updated version:
vector<string> tokenize(const string& str, const string& delimiters)
{
    string client = str;
    vector<string> result;

    while (!client.empty())
    {
        string::size_type dPos = client.find_first_of(delimiters);
        if (dPos == 0) { // head is a delimiter
            // find_first_of matches one character, so remove exactly one.
            client = client.substr(1);
            result.push_back("");
        } else { // head is a real node
            string element = client.substr(0, dPos);
            result.push_back(element);

            if (dPos == string::npos) { // node is the last element, no more delimiters
                return result;
            } else {
                client = client.substr(dPos + 1); // skip the delimiter character
            }
        }
    }
    if (client.empty()) { // input ended with a delimiter (or was empty)
        result.push_back("");
    }
    return result;
}
Comment Monday, September 8, 2008 by  Pix
Thanx Henry for the fix because you are right, the version of Eric is buggy!
Comment Wednesday, January 14, 2009 by  Ross MacGregor
I noticed there is a small typo in my original posting. Here is an updated version that supports string or wstring using a template.

template <typename T>
void tokenize(
    T const & input,
    T const & delimiters,
    std::vector<T> & tokens)
{
    // "typename" is required because size_type depends on T.
    typename T::size_type last_pos = 0;
    typename T::size_type pos = 0;

    while (true)
    {
        pos = input.find_first_of(delimiters, last_pos);
        if (pos == T::npos)
        {
            tokens.push_back(input.substr(last_pos));
            break;
        }
        else
        {
            tokens.push_back(input.substr(last_pos, pos - last_pos));
            last_pos = pos + 1;
        }
    }
}
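
A short usage sketch for a templated tokenizer along these lines, with both `std::string` and `std::wstring`. The helper name `check_template_examples` is made up for this example, and the template is repeated in compact form so the snippet stands alone.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Templated tokenizer along the lines of the listing above (keeps empty fields).
template <typename T>
void tokenize(T const& input, T const& delimiters, std::vector<T>& tokens)
{
    typename T::size_type last_pos = 0;
    while (true)
    {
        typename T::size_type pos = input.find_first_of(delimiters, last_pos);
        if (pos == T::npos)
        {
            tokens.push_back(input.substr(last_pos));
            break;
        }
        tokens.push_back(input.substr(last_pos, pos - last_pos));
        last_pos = pos + 1;
    }
}

// Hypothetical helper: the same template works for narrow and wide strings.
void check_template_examples()
{
    std::vector<std::string> narrow;
    tokenize(std::string("a,b,,c"), std::string(","), narrow);
    assert(narrow.size() == 4);
    assert(narrow[2].empty());

    std::vector<std::wstring> wide;
    tokenize(std::wstring(L"x;y"), std::wstring(L";"), wide);
    assert(wide.size() == 2);
    assert(wide[1] == L"y");
}
```

Both arguments are wrapped in an explicit string type because a bare literal such as "," would make template deduction for `T` ambiguous.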
