Simple STL String Tokenizer Function

Sunday, January 9, 2005 by digitalpeer

This function takes an STL string and a string of delimiter characters, and returns a vector of tokens. Consecutive delimiters are treated as one, so empty fields are not returned.
#include <string>
#include <vector>
using namespace std;

vector<string> tokenize(const string& str,const string& delimiters)
{
	vector<string> tokens;
    	
	// skip delimiters at beginning.
    	string::size_type lastPos = str.find_first_not_of(delimiters, 0);
    	
	// find the first delimiter after that position.
    	string::size_type pos = str.find_first_of(delimiters, lastPos);

    	while (string::npos != pos || string::npos != lastPos)
    	{
        	// found a token, add it to the vector.
        	tokens.push_back(str.substr(lastPos, pos - lastPos));
		
        	// skip delimiters.  Note the "not_of"
        	lastPos = str.find_first_not_of(delimiters, pos);
		
        	// find the next delimiter
        	pos = str.find_first_of(delimiters, lastPos);
    	}

	return tokens;
}
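
A short usage sketch may make the behaviour clearer. The function body repeats the listing above (with `std::` qualification instead of the using-directive) so that the example compiles on its own; the helper name `demo_tokenize` is hypothetical and not part of the original post. Note that every character in the delimiter string counts as a delimiter, and that runs of delimiters are collapsed.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Same algorithm as the listing above, repeated so this sketch is self-contained.
std::vector<std::string> tokenize(const std::string& str, const std::string& delimiters)
{
    std::vector<std::string> tokens;

    // Start of the first token (skips any leading delimiters).
    std::string::size_type lastPos = str.find_first_not_of(delimiters, 0);
    // First delimiter after the start of that token.
    std::string::size_type pos = str.find_first_of(delimiters, lastPos);

    while (std::string::npos != pos || std::string::npos != lastPos)
    {
        tokens.push_back(str.substr(lastPos, pos - lastPos));
        lastPos = str.find_first_not_of(delimiters, pos);
        pos = str.find_first_of(delimiters, lastPos);
    }
    return tokens;
}

// Hypothetical helper, for illustration only.
void demo_tokenize()
{
    // Both ',' and ' ' act as delimiters here.
    std::vector<std::string> t = tokenize("one, two,,three", ", ");
    assert(t.size() == 3);  // the empty field between the two commas is dropped
    assert(t[0] == "one" && t[1] == "two" && t[2] == "three");
}
```

If empty fields matter (for example when parsing CSV-like rows), this version is not the right fit; the variants further down keep them.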

Eric Hu posted the following update to retain empty fields between delimiters. Comments below report that it is buggy (it returns an extra empty token for input such as one,,two), so see Eli's version further down:
vector<string> tokenize(const string& str,const string& delimiters)
{
  vector<string> tokens;
  
  string::size_type lastPos = 0, pos = 0;  
  int count = 0;
  
  if(str.length()<1)  return tokens;
  
  // skip delimiters at beginning.  
  lastPos = str.find_first_not_of(delimiters, 0);
      
  if((str.substr(0, lastPos-pos).length()) > 0)
  {  	
  	count = str.substr(0, lastPos-pos).length();  	

  	for(int i=0; i < count; i++)  	
  	 	tokens.push_back("");
  	
  	if(string::npos == lastPos)
  		tokens.push_back("");
  }

  // find the first delimiter.
  pos = str.find_first_of(delimiters, lastPos);
  
  while (string::npos != pos || string::npos != lastPos)
  {  	      	    
     	// found a token, add it to the vector.
     	tokens.push_back( str.substr(lastPos, pos - lastPos));
				
    	// skip delimiters.  Note the "not_of"
     	lastPos = str.find_first_not_of(delimiters, pos);   	   	    
		
		if((string::npos != pos) && (str.substr(pos, lastPos-pos).length() > 1))  		
  		{
  			count = str.substr(pos, lastPos-pos).length();

  			for(int i=0; i < count; i++)
  	 			tokens.push_back("");
		}
		
  		pos = str.find_first_of(delimiters, lastPos);
  }

	return tokens;
}

Here's Eli's alternative to Eric's implementation.
vector<string> Tokenize(const string& str,const string& delimiters)
{
 vector<string> tokens;
 string::size_type delimPos = 0, tokenPos = 0, pos = 0;

 if(str.length()<1)  return tokens;
 while(1){
   delimPos = str.find_first_of(delimiters, pos);
   tokenPos = str.find_first_not_of(delimiters, pos);

   if(string::npos != delimPos){
     if(string::npos != tokenPos){
       if(tokenPos<delimPos){
         tokens.push_back(str.substr(pos,delimPos-pos));
       }else{
         tokens.push_back("");
       }
     }else{
       tokens.push_back("");
     }
     pos = delimPos+1;
   } else {
     if(string::npos != tokenPos){
       tokens.push_back(str.substr(pos));
     } else {
       tokens.push_back("");
     }
     break;
   }
 }
 return tokens;
}
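
As a rough check of how this variant treats empty fields, here is a small sketch. The function is Eli's listing above, repeated with `std::` qualification so the block stands alone; `demo_keep_empty` is a hypothetical name used only for this illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Eli's version, repeated so this sketch is self-contained.
std::vector<std::string> Tokenize(const std::string& str, const std::string& delimiters)
{
    std::vector<std::string> tokens;
    std::string::size_type delimPos = 0, tokenPos = 0, pos = 0;

    if (str.length() < 1) return tokens;
    while (true) {
        delimPos = str.find_first_of(delimiters, pos);
        tokenPos = str.find_first_not_of(delimiters, pos);

        if (std::string::npos != delimPos) {
            if (std::string::npos != tokenPos && tokenPos < delimPos) {
                // A token precedes the next delimiter.
                tokens.push_back(str.substr(pos, delimPos - pos));
            } else {
                // The next character is itself a delimiter: empty field.
                tokens.push_back("");
            }
            pos = delimPos + 1;
        } else {
            // No more delimiters: the remainder (possibly empty) is the last field.
            tokens.push_back(std::string::npos != tokenPos ? str.substr(pos) : "");
            break;
        }
    }
    return tokens;
}

// Hypothetical helper, for illustration only.
void demo_keep_empty()
{
    std::vector<std::string> t = Tokenize("one,,two", ",");
    assert(t.size() == 3);           // the empty middle field is kept
    assert(t[0] == "one" && t[1].empty() && t[2] == "two");

    // A trailing delimiter yields a trailing empty field.
    assert(Tokenize("a,", ",").size() == 2);
}
```

One thing to be aware of: an empty input string returns an empty vector rather than a single empty field.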

Comment Tuesday, February 15, 2005 by  anonymous
very nice thanks :)
Comment Sunday, November 27, 2005 by  anonymous
thanks, really helpful =)
Comment Thursday, December 22, 2005 by  Mgk
thanks, it's really great!
Comment Wednesday, February 8, 2006 by  anonymous
thanks!
Comment Tuesday, March 14, 2006 by  j. ilski
thank you!
Comment Wednesday, May 17, 2006 by  Ross MacGregor
Here is an alternative to listing two. I wrote it myself after examining the verbose listing above.

void tokenize(
    std::string const & input,
    std::string const & delimiters,
    std::vector<std::string> & tokens)
{
    using namespace std;

    string::size_type last_pos = 0;
    string::size_type pos = 0;

    while(true)
    {
        // find the next delimiter at or after last_pos
        pos = input.find_first_of(delimiters, last_pos);
        if( pos == string::npos )
        {
            // no more delimiters: the rest of the input is the last token
            tokens.push_back(input.substr(last_pos));
            break;
        }
        else
        {
            tokens.push_back(input.substr(last_pos, pos - last_pos));
            last_pos = pos + 1;
        }
    }
}
Comment Wednesday, May 31, 2006 by  anonymous
The top tokenizer code on this page allocates the tokens vector from the stack, then uses it as the return value. Therefore the return will be garbage. Amateur error.
Comment Wednesday, May 31, 2006 by  digitalpeer
Anonymous, you are entirely incorrect. When the vector of strings is returned, a copy is made. What you said would be true if it were a pointer to a stack address, but it simply is not the case.
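To illustrate the point in the reply above, here is a minimal sketch of returning a local vector by value. The names `make_tokens` and `demo_return_by_value` are hypothetical and exist only for this example. The caller receives its own vector (copied, moved, or constructed in place, depending on the compiler and language standard), so nothing refers to the destroyed local.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Returns a local vector by value. The caller ends up with its own object,
// so the lifetime of the local variable does not matter.
std::vector<std::string> make_tokens()
{
    std::vector<std::string> tokens;
    tokens.push_back("alpha");
    tokens.push_back("beta");
    return tokens;
}

// For contrast, this is the pattern the objection would apply to.
// It is left commented out because using the result is undefined behaviour.
//
// std::vector<std::string>* broken()
// {
//     std::vector<std::string> tokens;
//     return &tokens;   // pointer to a local that is destroyed on return
// }

// Hypothetical helper, for illustration only.
void demo_return_by_value()
{
    std::vector<std::string> result = make_tokens();
    assert(result.size() == 2);
    assert(result[0] == "alpha" && result[1] == "beta");
}
```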
Comment Monday, November 13, 2006 by  Holger
Hi there,
there is something which I don't understand. When I have a string with tabs as separators, and use a "\t" as the delimiter argument, the routine doesn't work as I would expect:
For a line looking like
a\tb\tc\td
the tokens vector only contains "a" instead of "a", "b", "c", "d".
The whole thing works for "," as delimiter and an input a,b,c,d.

Is there something that I misunderstood about escaping here?

Holger
Comment Thursday, July 3, 2008 by  Henry Liu
There is a small bug in Eric Hu's version. Given the input
one,,two,three,four,five
we expect to get
[one] [] [two] [three] [four] [five]
In fact, the following vector is returned:
[one] [] [] [two] [three] [four] [five]
-----------------------------------------------------------------------------------------
Here is an updated version:
--------------------------------------------------------------------------------------------
vector<string> tokenize(const string& str,const string& delimiters)
{
    string client = str;
    vector<string> result;

    while (!client.empty())
    {
        string::size_type dPos = client.find_first_of( delimiters );
        if ( dPos == 0 ) { // head is a delimiter
            // find_first_of matches a single character, so remove exactly one
            client = client.substr(1);
            result.push_back("");
        } else { // head is a real node
            string element = client.substr(0, dPos);
            result.push_back(element);

            if (dPos == string::npos) { // node is last element, no more delimiters
                return result;
            } else {
                client = client.substr(dPos+1);
            }
        }
    }
    if (client.empty()) { // input was empty or ended with a delimiter
        result.push_back("");
    }
    return result;
}
Comment Monday, September 8, 2008 by  Pix
Thanx Henry for the fix because you are right, the version of Eric is buggy!
Comment Wednesday, January 14, 2009 by  Ross MacGregor
I noticed there is a small typo in my original posting. Here is an updated version that supports string or wstring using a template.

template <typename T>
void tokenize(
    T const & input,
    T const & delimiters,
    std::vector<T> & tokens)
{
    using namespace std;

    // typename is required because size_type depends on T
    typename T::size_type last_pos = 0;
    typename T::size_type pos = 0;

    while(true)
    {
        pos = input.find_first_of(delimiters, last_pos);
        if( pos == T::npos )
        {
            // no more delimiters: the rest of the input is the last token
            tokens.push_back(input.substr(last_pos));
            break;
        }
        else
        {
            tokens.push_back(input.substr(last_pos, pos - last_pos));
            last_pos = pos + 1;
        }
    }
}
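
A sketch of how the templated version might be called with both string types. The template parameter list, the `std::vector<T>` element type, and the `typename` keywords are written out here as they need to be for the code to compile; `demo_template` is a hypothetical name used only for this illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Templated tokenizer, written out in full so this sketch is self-contained.
template <typename T>
void tokenize(T const& input, T const& delimiters, std::vector<T>& tokens)
{
    typename T::size_type last_pos = 0;
    typename T::size_type pos = 0;

    while (true)
    {
        pos = input.find_first_of(delimiters, last_pos);
        if (pos == T::npos)
        {
            // No more delimiters: the rest of the input is the last token.
            tokens.push_back(input.substr(last_pos));
            break;
        }
        tokens.push_back(input.substr(last_pos, pos - last_pos));
        last_pos = pos + 1;
    }
}

// Hypothetical helper, for illustration only.
void demo_template()
{
    // Arguments must already be std::string / std::wstring objects so that
    // T can be deduced consistently from all three parameters.
    std::vector<std::string> narrow;
    tokenize(std::string("a,,b"), std::string(","), narrow);
    assert(narrow.size() == 3);       // empty fields are kept
    assert(narrow[1].empty());

    std::vector<std::wstring> wide;
    tokenize(std::wstring(L"x;y"), std::wstring(L";"), wide);
    assert(wide.size() == 2 && wide[0] == L"x" && wide[1] == L"y");
}
```

Passing string literals directly would not compile, because `T` cannot be deduced from a character array. Also note that tokens are appended to whatever the output vector already holds, and that an empty input produces one empty token.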
